Sample records for develop statistical models

  1. Helping Students Develop Statistical Reasoning: Implementing a Statistical Reasoning Learning Environment

    ERIC Educational Resources Information Center

    Garfield, Joan; Ben-Zvi, Dani

    2009-01-01

    This article describes a model for an interactive, introductory secondary- or tertiary-level statistics course that is designed to develop students' statistical reasoning. This model is called a "Statistical Reasoning Learning Environment" and is built on the constructivist theory of learning.

  2. Online Statistical Modeling (Regression Analysis) for Independent Responses

    NASA Astrophysics Data System (ADS)

    Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus

    2017-06-01

    Regression analysis (statistical modelling) is among the statistical methods most frequently needed in analyzing quantitative data, especially to model the relationship between response and explanatory variables. Statistical models have since been developed in various directions to model various types of data and complex relationships. A rich variety of advanced and recent statistical modelling methods is available in open-source software (one example is R). However, these advanced methods are not very friendly to novice R users, since they are based on programming scripts or a command-line interface. Our research aims to develop a web interface (based on R and shiny), so that the most recent and advanced statistical modelling methods are readily available, accessible and applicable on the web. We have previously made interfaces in the form of e-tutorials for several modern and advanced statistical modelling methods in R, especially for independent responses (including linear models/LM, generalized linear models/GLM, generalized additive models/GAM and generalized additive models for location, scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including models using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web interface makes statistical modeling easier to apply and makes it easier to compare models in order to find the most appropriate one for the data.
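
    The abstract above mentions linear models and the bootstrap without showing either. As a rough illustration only, the sketch below fits a one-predictor least-squares line and computes a percentile bootstrap interval for the slope. It is written in Python rather than R, and the function names, the case-resampling scheme, and the default settings are assumptions of this sketch, not part of the authors' Virtual Statistics Laboratory.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def bootstrap_slope_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the slope (case resampling).

    Hypothetical helper: resamples (x, y) pairs with replacement,
    refits the line each time, and reads off empirical quantiles.
    """
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # skip degenerate resamples with a single x value
        slopes.append(fit_line(bx, by)[1])
    slopes.sort()
    lower = slopes[int((alpha / 2) * len(slopes))]
    upper = slopes[int((1 - alpha / 2) * len(slopes)) - 1]
    return lower, upper
```

    Calling `bootstrap_slope_ci(xs, ys)` on paired observations returns a `(lower, upper)` tuple; resampling whole cases rather than residuals is one common choice among several.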

  3. Modified Likelihood-Based Item Fit Statistics for the Generalized Graded Unfolding Model

    ERIC Educational Resources Information Center

    Roberts, James S.

    2008-01-01

    Orlando and Thissen (2000) developed an item fit statistic for binary item response theory (IRT) models known as S-X². This article generalizes their statistic to polytomous unfolding models. Four alternative formulations of S-X² are developed for the generalized graded unfolding model (GGUM). The GGUM is a…

  4. Developing Statistical Knowledge for Teaching during Design-Based Research

    ERIC Educational Resources Information Center

    Groth, Randall E.

    2017-01-01

    Statistical knowledge for teaching is not precisely equivalent to statistics subject matter knowledge. Teachers must know how to make statistics understandable to others as well as understand the subject matter themselves. This dual demand on teachers calls for the development of viable teacher education models. This paper offers one such model,…

  5. Progress of statistical analysis in biomedical research through the historical review of the development of the Framingham score.

    PubMed

    Ignjatović, Aleksandra; Stojanović, Miodrag; Milošević, Zoran; Anđelković Apostolović, Marija

    2017-12-02

    The interest in developing risk models in medicine is not only appealing but also associated with many obstacles in different aspects of predictive model development. Initially, the association of one or more biomarkers with a specific outcome was established by statistical significance, but novel and demanding questions required the development of new and more complex statistical techniques. The progress of statistical analysis in biomedical research can best be observed through the history of the Framingham study and the development of the Framingham score. Evaluation of predictive models draws on a combination of results from several metrics. Using logistic regression and Cox proportional hazards regression analysis, the calibration test, and the ROC curve analysis should be mandatory and eliminatory, and the central place should be taken by some new statistical techniques. To obtain complete information about a new marker in the model, there is a recent recommendation to use reclassification tables by calculating the net reclassification index and the integrated discrimination improvement. Decision curve analysis is a novel method for evaluating the clinical usefulness of a predictive model. It may be noted that customizing and fine-tuning of the Framingham risk score initiated the development of statistical analysis. A clinically applicable predictive model should be a trade-off between all the abovementioned statistical metrics: a trade-off between calibration and discrimination, accuracy and decision-making, costs and benefits, and quality and quantity of the patient's life.
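
    The net reclassification index named above can be made concrete with a small calculation. The sketch below uses the usual categorical definition (net proportion of events moved to a higher risk category plus net proportion of non-events moved to a lower one); the function name and the integer coding of risk categories are assumptions for illustration, and the abstract itself gives no formula.

```python
def net_reclassification_index(old_cat, new_cat, event):
    """Categorical NRI for an old versus a new risk model.

    old_cat, new_cat: ordered risk categories (e.g. 0 = low, 1 = high)
    event: truthy where the outcome occurred
    """
    if not (len(old_cat) == len(new_cat) == len(event)):
        raise ValueError("inputs must have equal length")
    n_event = sum(1 for e in event if e)
    n_nonevent = len(event) - n_event
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")

    up_e = down_e = up_n = down_n = 0
    for old, new, e in zip(old_cat, new_cat, event):
        if new > old:  # moved to a higher risk category
            if e:
                up_e += 1
            else:
                up_n += 1
        elif new < old:  # moved to a lower risk category
            if e:
                down_e += 1
            else:
                down_n += 1

    # Upward moves are "correct" for events, downward moves for non-events.
    return (up_e - down_e) / n_event + (down_n - up_n) / n_nonevent
```

    The result ranges from -2 to 2; positive values indicate that the new model reclassifies subjects in the right direction more often than in the wrong one.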

  6. Watershed Regressions for Pesticides (WARP) models for predicting stream concentrations of multiple pesticides

    USGS Publications Warehouse

    Stone, Wesley W.; Crawford, Charles G.; Gilliom, Robert J.

    2013-01-01

    Watershed Regressions for Pesticides for multiple pesticides (WARP-MP) are statistical models developed to predict concentration statistics for a wide range of pesticides in unmonitored streams. The WARP-MP models use the national atrazine WARP models in conjunction with an adjustment factor for each additional pesticide. The WARP-MP models perform best for pesticides with application timing and methods similar to those used with atrazine. For other pesticides, WARP-MP models tend to overpredict concentration statistics for the model development sites. For WARP and WARP-MP, the less-than-ideal sampling frequency for the model development sites leads to underestimation of the shorter-duration concentration; hence, the WARP models tend to underpredict 4- and 21-d maximum moving-average concentrations, with median errors ranging from 9 to 38%. As a result of this sampling bias, pesticides that performed well with the model development sites are expected to have predictions that are biased low for these shorter-duration concentration statistics. The overprediction by WARP-MP apparent for some of the pesticides is variably offset by underestimation of the model development concentration statistics. Of the 112 pesticides used in the WARP-MP application to stream segments nationwide, 25 were predicted to have concentration statistics with a 50% or greater probability of exceeding one or more aquatic life benchmarks in one or more stream segments. Geographically, many of the modeled streams in the Corn Belt Region were predicted to have one or more pesticides that exceeded an aquatic life benchmark during 2009, indicating the potential vulnerability of streams in this region.

  7. Non-equilibrium dog-flea model

    NASA Astrophysics Data System (ADS)

    Ackerson, Bruce J.

    2017-11-01

    We develop the open dog-flea model to serve as a check of proposed non-equilibrium theories of statistical mechanics. The model is developed in detail. Then it is applied to four recent models for non-equilibrium statistical mechanics. Comparison of the dog-flea solution with these different models allows checking claims and giving a concrete example of the theoretical models.
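
    For readers unfamiliar with the dog-flea model, the sketch below simulates the classic closed version (the Ehrenfest urn model): a fixed number of fleas sit on two dogs, and at each step one flea chosen uniformly at random jumps to the other dog. This is an assumption-labelled illustration of the textbook model only; the open, non-equilibrium variant developed in the paper is not specified in the abstract and is not reproduced here.

```python
import random


def simulate_dog_flea(n_fleas, n_steps, start_on_a=None, seed=0):
    """Simulate the closed two-dog Ehrenfest model.

    Returns the number of fleas on dog A after each step, starting
    (by default) with every flea on dog A.
    """
    rng = random.Random(seed)
    on_a = n_fleas if start_on_a is None else start_on_a
    history = [on_a]
    for _ in range(n_steps):
        # A uniformly chosen flea is on dog A with probability on_a / n_fleas.
        if rng.randrange(n_fleas) < on_a:
            on_a -= 1  # the flea jumps from A to B
        else:
            on_a += 1  # the flea jumps from B to A
        history.append(on_a)
    return history
```

    After an initial relaxation the count fluctuates around `n_fleas / 2`, which is the equilibrium behaviour that non-equilibrium extensions are compared against.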

  8. Statistical Models of At-Grade Intersection Accidents. Addendum.

    DOT National Transportation Integrated Search

    2000-03-01

    This report is an addendum to the work published in FHWA-RD-96-125 titled Statistical Models of At-Grade Intersection Accidents. The objective of both research studies was to develop statistical models of the relationship between traffic accide...

  9. The Development of Statistical Models for Predicting Surgical Site Infections in Japan: Toward a Statistical Model-Based Standardized Infection Ratio.

    PubMed

    Fukuda, Haruhisa; Kuroki, Manabu

    2016-03-01

    To develop and internally validate a surgical site infection (SSI) prediction model for Japan. Retrospective observational cohort study. We analyzed surveillance data submitted to the Japan Nosocomial Infections Surveillance system for patients who had undergone target surgical procedures from January 1, 2010, through December 31, 2012. Logistic regression analyses were used to develop statistical models for predicting SSIs. An SSI prediction model was constructed for each of the procedure categories by statistically selecting the appropriate risk factors from among the collected surveillance data and determining their optimal categorization. Standard bootstrapping techniques were applied to assess potential overfitting. The C-index was used to compare the predictive performances of the new statistical models with those of models based on conventional risk index variables. The study sample comprised 349,987 cases from 428 participant hospitals throughout Japan, and the overall SSI incidence was 7.0%. The C-indices of the new statistical models were significantly higher than those of the conventional risk index models in 21 (67.7%) of the 31 procedure categories (P<.05). No significant overfitting was detected. Japan-specific SSI prediction models were shown to generally have higher accuracy than conventional risk index models. These new models may have applications in assessing hospital performance and identifying high-risk patients in specific procedure categories.
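
    The C-index used above to compare models can be computed directly for a binary outcome: it is the fraction of (case, non-case) pairs in which the case received the higher predicted risk, with ties counted as one half. The following is a minimal sketch under that definition; the function name is hypothetical and the quadratic pair loop is chosen for clarity, not for the data volumes reported in the study.

```python
def c_index(scores, outcomes):
    """Concordance index (equal to the ROC AUC) for a binary outcome."""
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have equal length")
    cases = [s for s, y in zip(scores, outcomes) if y]
    controls = [s for s, y in zip(scores, outcomes) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for s_case in cases:
        for s_control in controls:
            if s_case > s_control:
                concordant += 1.0
            elif s_case == s_control:
                concordant += 0.5  # tied predictions count as half
    return concordant / (len(cases) * len(controls))
```

    A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from non-cases.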

  10. Moment-Based Physical Models of Broadband Clutter due to Aggregations of Fish

    DTIC Science & Technology

    2013-09-30

    statistical models for signal-processing algorithm development. These in turn will help to develop a capability to statistically forecast the impact of...aggregations of fish based on higher-order statistical measures describable in terms of physical and system parameters. Environmentally, these models...processing. In this experiment, we had good ground truth on (1) and (2), and had control over (3) and (4) except for environmentally-imposed restrictions

  11. A two-component rain model for the prediction of attenuation statistics

    NASA Technical Reports Server (NTRS)

    Crane, R. K.

    1982-01-01

    A two-component rain model has been developed for calculating attenuation statistics. In contrast to most other attenuation prediction models, the two-component model calculates the occurrence probability for volume cells or debris attenuation events. The model performed significantly better than the International Radio Consultative Committee model when used for predictions on earth-satellite paths. It is expected that the model will have applications in modeling the joint statistics required for space diversity system design, the statistics of interference due to rain scatter at attenuating frequencies, and the duration statistics for attenuation events.

  12. Applications of spatial statistical network models to stream data

    USGS Publications Warehouse

    Isaak, Daniel J.; Peterson, Erin E.; Ver Hoef, Jay M.; Wenger, Seth J.; Falke, Jeffrey A.; Torgersen, Christian E.; Sowder, Colin; Steel, E. Ashley; Fortin, Marie-Josée; Jordan, Chris E.; Ruesch, Aaron S.; Som, Nicholas; Monestiez, Pascal

    2014-01-01

    Streams and rivers host a significant portion of Earth's biodiversity and provide important ecosystem services for human populations. Accurate information regarding the status and trends of stream resources is vital for their effective conservation and management. Most statistical techniques applied to data measured on stream networks were developed for terrestrial applications and are not optimized for streams. A new class of spatial statistical model, based on valid covariance structures for stream networks, can be used with many common types of stream data (e.g., water quality attributes, habitat conditions, biological surveys) through application of appropriate distributions (e.g., Gaussian, binomial, Poisson). The spatial statistical network models account for spatial autocorrelation (i.e., nonindependence) among measurements, which allows their application to databases with clustered measurement locations. Large amounts of stream data exist in many areas where spatial statistical analyses could be used to develop novel insights, improve predictions at unsampled sites, and aid in the design of efficient monitoring strategies at relatively low cost. We review the topic of spatial autocorrelation and its effects on statistical inference, demonstrate the use of spatial statistics with stream datasets relevant to common research and management questions, and discuss additional applications and development potential for spatial statistics on stream networks. Free software for implementing the spatial statistical network models has been developed that enables custom applications with many stream databases.

  13. An adaptive state of charge estimation approach for lithium-ion series-connected battery system

    NASA Astrophysics Data System (ADS)

    Peng, Simin; Zhu, Xuelai; Xing, Yinjiao; Shi, Hongbing; Cai, Xu; Pecht, Michael

    2018-07-01

    Due to the incorrect or unknown noise statistics of a battery system and its cell-to-cell variations, state of charge (SOC) estimation of a lithium-ion series-connected battery system is usually inaccurate or even divergent using model-based methods, such as extended Kalman filter (EKF) and unscented Kalman filter (UKF). To resolve this problem, an adaptive unscented Kalman filter (AUKF) based on a noise statistics estimator and a model parameter regulator is developed to accurately estimate the SOC of a series-connected battery system. An equivalent circuit model is first built based on the model parameter regulator that illustrates the influence of cell-to-cell variation on the battery system. A noise statistics estimator is then used to attain adaptively the estimated noise statistics for the AUKF when its prior noise statistics are not accurate or exactly Gaussian. The accuracy and effectiveness of the SOC estimation method is validated by comparing the developed AUKF and UKF when model and measurement statistics noises are inaccurate, respectively. Compared with the UKF and EKF, the developed method shows the highest SOC estimation accuracy.

  14. The Development of the Children's Services Statistical Neighbour Benchmarking Model. Final Report

    ERIC Educational Resources Information Center

    Benton, Tom; Chamberlain, Tamsin; Wilson, Rebekah; Teeman, David

    2007-01-01

    In April 2006, the Department for Education and Skills (DfES) commissioned the National Foundation for Educational Research (NFER) to conduct an independent external review in order to develop a single "statistical neighbour" model. This single model aimed to combine the key elements of the different models currently available and be…

  15. Statistical Methodologies to Integrate Experimental and Computational Research

    NASA Technical Reports Server (NTRS)

    Parker, P. A.; Johnson, R. T.; Montgomery, D. C.

    2008-01-01

    Development of advanced algorithms for simulating engine flow paths requires the integration of fundamental experiments with the validation of enhanced mathematical models. In this paper, we provide an overview of statistical methods to strategically and efficiently conduct experiments and computational model refinement. Moreover, the integration of experimental and computational research efforts is emphasized. With a statistical engineering perspective, scientific and engineering expertise is combined with statistical sciences to gain deeper insights into experimental phenomena and code development performance, supporting the overall research objectives. The particular statistical methods discussed are design of experiments, response surface methodology, and uncertainty analysis and planning. Their application is illustrated with a coaxial free jet experiment and a turbulence model refinement investigation. Our goal is to provide an overview, focusing on concepts rather than practice, to demonstrate the benefits of using statistical methods in research and development, thereby encouraging their broader and more systematic application.

  16. Probabilistic Graphical Model Representation in Phylogenetics

    PubMed Central

    Höhna, Sebastian; Heath, Tracy A.; Boussau, Bastien; Landis, Michael J.; Ronquist, Fredrik; Huelsenbeck, John P.

    2014-01-01

    Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis–Hastings or Gibbs sampling of the posterior distribution. [Computation; graphical models; inference; modularization; statistical phylogenetics; tree plate.] PMID:24951559

  17. Strategies for Reduced-Order Models in Uncertainty Quantification of Complex Turbulent Dynamical Systems

    NASA Astrophysics Data System (ADS)

    Qi, Di

    Turbulent dynamical systems are ubiquitous in science and engineering. Uncertainty quantification (UQ) in turbulent dynamical systems is a grand challenge where the goal is to obtain statistical estimates for key physical quantities. In the development of a proper UQ scheme for systems characterized by both a high-dimensional phase space and a large number of instabilities, significant model errors compared with the true natural signal are always unavoidable due to both the imperfect understanding of the underlying physical processes and the limited computational resources available. One central issue in contemporary research is the development of a systematic methodology for reduced order models that can recover the crucial features both with model fidelity in statistical equilibrium and with model sensitivity in response to perturbations. In the first part, we discuss a general mathematical framework to construct statistically accurate reduced-order models that have skill in capturing the statistical variability in the principal directions of a general class of complex systems with quadratic nonlinearity. A systematic hierarchy of simple statistical closure schemes, which are built through new global statistical energy conservation principles combined with statistical equilibrium fidelity, are designed and tested for UQ of these problems. Second, the capacity of imperfect low-order stochastic approximations to model extreme events in a passive scalar field advected by turbulent flows is investigated. The effects in complicated flow systems are considered including strong nonlinear and non-Gaussian interactions, and much simpler and cheaper imperfect models with model error are constructed to capture the crucial statistical features in the stationary tracer field. Several mathematical ideas are introduced to improve the prediction skill of the imperfect reduced-order models. 
    Most importantly, empirical information theory and statistical linear response theory are applied in the training phase for calibrating model errors to achieve optimal imperfect model parameters; and total statistical energy dynamics are introduced to improve the model sensitivity in the prediction phase especially when strong external perturbations are exerted. The validity of reduced-order models for predicting statistical responses and intermittency is demonstrated on a series of instructive models with increasing complexity, including the stochastic triad model, the Lorenz '96 model, and models for barotropic and baroclinic turbulence. The skillful low-order modeling methods developed here should also be useful for other applications such as efficient algorithms for data assimilation.

  18. Risk prediction model: Statistical and artificial neural network approach

    NASA Astrophysics Data System (ADS)

    Paiman, Nuur Azreen; Hariri, Azian; Masood, Ibrahim

    2017-04-01

    Prediction models are increasingly gaining popularity and have been used in numerous areas of study to complement and support clinical reasoning and decision making. The adoption of such models assists physicians' decision making and individuals' behavior, and consequently improves individual outcomes and the cost-effectiveness of care. The objective of this paper is to review articles related to risk prediction models in order to understand suitable approaches to their development and validation. A qualitative review was done of the aims, methods and significant main outcomes of nineteen published articles that developed risk prediction models in numerous fields. This paper also reviews how researchers develop and validate risk prediction models based on statistical and artificial neural network approaches. From the review, some methodological recommendations for developing and validating prediction models are highlighted. According to the studies reviewed, the artificial neural network approach to developing prediction models was more accurate than the statistical approach. However, only limited published literature currently discusses which approach is more accurate for risk prediction model development.

  19. Development of a statistical model for cervical cancer cell death with irreversible electroporation in vitro.

    PubMed

    Yang, Yongji; Moser, Michael A J; Zhang, Edwin; Zhang, Wenjun; Zhang, Bing

    2018-01-01

    The aim of this study was to develop a statistical model for cell death by irreversible electroporation (IRE) and to show that the statistical model is more accurate than the electric field threshold model in the literature using cervical cancer cells in vitro. A HeLa cell line was cultured and treated with different IRE protocols in order to obtain data for modeling the statistical relationship between cell death and pulse-setting parameters. In total, 340 in vitro experiments were performed with a commercial IRE pulse system, including a pulse generator and an electric cuvette. The trypan blue staining technique was used to evaluate cell death after 4 hours of incubation following IRE treatment. The Peleg-Fermi model was used in the study to build the statistical relationship using the cell viability data obtained from the in vitro experiments. A finite element model of IRE for the electric field distribution was also built. Comparison of ablation zones between the statistical model and the electric threshold model (drawn from the finite element model) was used to show the accuracy of the proposed statistical model in the description of the ablation zone and its applicability in different pulse-setting parameters. The statistical models describing the relationships between HeLa cell death and pulse length and the number of pulses, respectively, were built. The values of the curve fitting parameters were obtained using the Peleg-Fermi model for the treatment of cervical cancer with IRE. The difference in the ablation zone between the statistical model and the electric threshold model was also illustrated to show the accuracy of the proposed statistical model in the representation of the ablation zone in IRE.
    This study concluded that: (1) the proposed statistical model accurately described the ablation zone of IRE with cervical cancer cells, and was more accurate compared with the electric field model; (2) the proposed statistical model was able to estimate the value of electric field threshold for the computer simulation of IRE in the treatment of cervical cancer; and (3) the proposed statistical model was able to express the change in ablation zone with the change in pulse-setting parameters.
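
    The abstract names the Peleg-Fermi model but gives neither its equation nor the fitted values. The sketch below uses a form of the model commonly cited in the electroporation literature, in which the surviving fraction is a logistic function of field strength and both the critical field and the spread decay exponentially with the number of pulses. That functional form, the parameter names, and any numbers passed in are assumptions of this sketch, not results from the study.

```python
import math


def peleg_fermi_survival(e_field, n_pulses, ec0, a0, k1, k2):
    """Surviving cell fraction under an assumed Peleg-Fermi form.

    S(E, n) = 1 / (1 + exp((E - Ec(n)) / A(n)))
    Ec(n)   = ec0 * exp(-k1 * n)   critical field, where S = 0.5
    A(n)    = a0  * exp(-k2 * n)   spread of the transition
    All parameters are hypothetical placeholders to be fitted to data.
    """
    e_c = ec0 * math.exp(-k1 * n_pulses)
    spread = a0 * math.exp(-k2 * n_pulses)
    z = (e_field - e_c) / spread
    if z > 700.0:
        return 0.0  # avoid overflow in exp; survival is effectively zero
    return 1.0 / (1.0 + math.exp(z))
```

    Under this form, raising the field strength or adding pulses lowers the predicted survival, which is the qualitative dependence on pulse-setting parameters that the abstract describes.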

  20. Development of a Predictive Corrosion Model Using Locality-Specific Corrosion Indices

    DTIC Science & Technology

    2017-09-12

    6 3.2.1 Statistical data analysis methods ...6 3.2.2 Algorithm development method ...components, and method) were compiled into an executable program that uses mathematical models of materials degradation, and statistical calculations

  1. Exponential order statistic models of software reliability growth

    NASA Technical Reports Server (NTRS)

    Miller, D. R.

    1985-01-01

    Failure times of a software reliability growth process are modeled as order statistics of independent, nonidentically distributed exponential random variables. The Jelinski-Moranda, Goel-Okumoto, Littlewood, Musa-Okumoto Logarithmic, and Power Law models are all special cases of Exponential Order Statistic Models, but there are many additional examples also. Various characterizations, properties and examples of this class of models are developed and presented.
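
    The construction described above is easy to simulate: draw one exponential detection time per fault, each with its own rate, and sort them to obtain the observed failure times. The sketch below does exactly that; the function name and the example rates are illustrative assumptions. Passing identical rates gives the Jelinski-Moranda special case, in which every fault is equally likely to be found.

```python
import random


def simulate_failure_times(rates, seed=0):
    """Failure times as order statistics of independent exponentials.

    rates: one positive detection rate per latent fault; the rates
    need not be equal.
    """
    rng = random.Random(seed)
    # Each fault has its own exponential time to detection; sorting the
    # draws yields the successive failure times seen during testing.
    return sorted(rng.expovariate(rate) for rate in rates)


# Jelinski-Moranda special case: 10 faults with a common (hypothetical) rate.
jm_times = simulate_failure_times([0.5] * 10)
```

    With unequal rates, faults with large rates tend to surface early and the gaps between later failures lengthen, which is the reliability-growth behaviour the model class is meant to capture.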

  2. Peer Review of EPA's Draft BMDS Document: Exponential ...

    EPA Pesticide Factsheets

    BMDS is one of the Agency's premier tools for estimating risk assessments, therefore the validity and reliability of its statistical models are of paramount importance. This page provides links to peer review of the BMDS applications and its models as they were developed and eventually released documenting the rigorous review process taken to provide the best science tools available for statistical modeling.

  3. 78 FR 70303 - Announcement of Requirements and Registration for the Predict the Influenza Season Challenge

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-11-25

    ... public. Mathematical and statistical models can be useful in predicting the timing and impact of the... applying any mathematical, statistical, or other approach to predictive modeling. This challenge will... Services (HHS) region level(s) in the United States by developing mathematical and statistical models that...

  4. Use of statistical and neural net approaches in predicting toxicity of chemicals.

    PubMed

    Basak, S C; Grunwald, G D; Gute, B D; Balasubramanian, K; Opitz, D

    2000-01-01

    Hierarchical quantitative structure-activity relationships (H-QSAR) have been developed as a new approach in constructing models for estimating physicochemical, biomedicinal, and toxicological properties of interest. This approach uses increasingly more complex molecular descriptors in a graduated approach to model building. In this study, statistical and neural network methods have been applied to the development of H-QSAR models for estimating the acute aquatic toxicity (LC50) of 69 benzene derivatives to Pimephales promelas (fathead minnow). Topostructural, topochemical, geometrical, and quantum chemical indices were used as the four levels of the hierarchical method. It is clear from both the statistical and neural network models that topostructural indices alone cannot adequately model this set of congeneric chemicals. Not surprisingly, topochemical indices greatly increase the predictive power of both statistical and neural network models. Quantum chemical indices also add significantly to the modeling of this set of acute aquatic toxicity data.

  5. Collaborative Professional Development for Statistics Teaching: A Case Study of Two Middle-School Mathematics Teachers

    ERIC Educational Resources Information Center

    de Oliveira Souza, Leandro; Lopes, Celi Espasandin; Pfannkuch, Maxine

    2015-01-01

    The recent introduction of statistics into the Brazilian curriculum has presented a multi-problematic situation for teacher professional development. Drawing on research in the areas of teacher development and statistical inquiry, we propose a Teacher Professional Development Cycle (TPDC) model. This paper focuses on two teachers who planned a…

  6. Watershed regressions for pesticides (warp) models for predicting atrazine concentrations in Corn Belt streams

    USGS Publications Warehouse

    Stone, Wesley W.; Gilliom, Robert J.

    2012-01-01

    Watershed Regressions for Pesticides (WARP) models, previously developed for atrazine at the national scale, are improved for application to the United States (U.S.) Corn Belt region by developing region-specific models that include watershed characteristics that are influential in predicting atrazine concentration statistics within the Corn Belt. WARP models for the Corn Belt (WARP-CB) were developed for annual maximum moving-average (14-, 21-, 30-, 60-, and 90-day durations) and annual 95th-percentile atrazine concentrations in streams of the Corn Belt region. The WARP-CB models accounted for 53 to 62% of the variability in the various concentration statistics among the model-development sites. Model predictions were within a factor of 5 of the observed concentration statistic for over 90% of the model-development sites. The WARP-CB residuals and uncertainty are lower than those of the National WARP model for the same sites. Although atrazine-use intensity is the most important explanatory variable in the National WARP models, it is not a significant variable in the WARP-CB models. The WARP-CB models provide improved predictions for Corn Belt streams draining watersheds with atrazine-use intensities of 17 kg/km² of watershed area or greater.
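
    The "within a factor of 5" criterion above is a ratio-based accuracy measure: a prediction counts as a hit when it lies between one fifth and five times the observed value. A minimal sketch of that calculation follows; the function name and signature are assumptions for illustration and are not taken from the WARP software.

```python
def fraction_within_factor(predicted, observed, factor=5.0):
    """Share of sites whose prediction is within `factor` of the observation."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    if factor < 1.0:
        raise ValueError("factor must be at least 1")

    hits = 0
    for pred, obs in zip(predicted, observed):
        if pred <= 0 or obs <= 0:
            raise ValueError("concentrations must be positive")
        ratio = pred / obs
        # Symmetric on the ratio scale: over- and underprediction by the
        # same multiple are treated alike.
        if 1.0 / factor <= ratio <= factor:
            hits += 1
    return hits / len(predicted)
```

    Because concentrations often span orders of magnitude, a ratio criterion of this kind is less dominated by the largest values than an absolute-error criterion would be.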

  7. Spatial Statistical Network Models for Stream and River Temperature in the Chesapeake Bay Watershed, USA

    EPA Science Inventory

    Regional temperature models are needed for characterizing and mapping stream thermal regimes, establishing reference conditions, predicting future impacts and identifying critical thermal refugia. Spatial statistical models have been developed to improve regression modeling techn...

  8. Discharge destination following lower limb fracture: development of a prediction model to assist with decision making.

    PubMed

    Kimmel, Lara A; Holland, Anne E; Edwards, Elton R; Cameron, Peter A; De Steiger, Richard; Page, Richard S; Gabbe, Belinda

    2012-06-01

    Accurate prediction of the likelihood of discharge to inpatient rehabilitation following lower limb fracture made on admission to hospital may assist patient discharge planning and decrease the burden on the hospital system caused by delays in decision making. To develop a prognostic model for discharge to inpatient rehabilitation. Isolated lower extremity fracture cases (excluding fractured neck of femur), captured by the Victorian Orthopaedic Trauma Outcomes Registry (VOTOR), were extracted for analysis. A training data set was created for model development and validation data set for evaluation. A multivariable logistic regression model was developed based on patient and injury characteristics. Models were assessed using measures of discrimination (C-statistic) and calibration (Hosmer-Lemeshow (H-L) statistic). A total of 1429 patients met the inclusion criteria and were randomly split into training and test data sets. Increasing age, more proximal fracture type, compensation or private fund source for the admission, metropolitan location of residence, not working prior to injury and having a self-reported pre-injury disability were included in the final prediction model. The C-statistic for the model was 0.92 (95% confidence interval (CI) 0.88, 0.95) with an H-L statistic of χ(2)=11.62, p=0.17. For the test data set, the C-statistic was 0.86 (95% CI 0.83, 0.90) with an H-L statistic of χ(2)=37.98, p<0.001. A model to predict discharge to inpatient rehabilitation following lower limb fracture was developed with excellent discrimination although the calibration was reduced in the test data set. This model requires prospective testing but could form an integral part of decision making in regards to discharge disposition to facilitate timely and accurate referral to rehabilitation and optimise resource allocation. Copyright © 2011 Elsevier Ltd. All rights reserved.
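
    The Hosmer-Lemeshow statistic reported above measures calibration by grouping subjects on predicted risk and comparing observed with expected event counts in each group. The sketch below implements the standard grouped form; the function name, the equal-size grouping rule, and the default of ten groups are assumptions of this sketch, and the study's exact grouping is not stated in the abstract.

```python
def hosmer_lemeshow(probs, outcomes, n_groups=10):
    """Hosmer-Lemeshow chi-square statistic for a binary-outcome model.

    Sorts subjects by predicted probability, splits them into n_groups
    groups of near-equal size, and sums
    (observed - expected)^2 / (n_g * p_bar * (1 - p_bar)).
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have equal length")
    n = len(probs)
    if n < n_groups:
        raise ValueError("need at least one subject per group")

    pairs = sorted(zip(probs, outcomes), key=lambda pair: pair[0])
    statistic = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        n_g = len(group)
        expected = sum(p for p, _ in group)
        observed = sum(1 for _, y in group if y)
        p_bar = expected / n_g
        variance = n_g * p_bar * (1.0 - p_bar)
        if variance == 0.0:
            raise ValueError("group with mean predicted risk of 0 or 1")
        statistic += (observed - expected) ** 2 / variance
    return statistic
```

    The statistic is conventionally compared against a chi-square distribution with `n_groups - 2` degrees of freedom when evaluated on the data used to fit the model; larger values indicate poorer calibration, consistent with the drop in calibration reported for the test data set.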

  9. Gene-Based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions.

    PubMed

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y; Chen, Wei

    2016-02-01

    Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, here we develop Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than or similar power as Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. © 2016 WILEY PERIODICALS, INC.

  10. Gene-based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions

    PubMed Central

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E.; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y.; Chen, Wei

    2015-01-01

    Summary: Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, we develop here Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than or similar power as Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. PMID:26782979

  11. The Development of Statistics Textbook Supported with ICT and Portfolio-Based Assessment

    NASA Astrophysics Data System (ADS)

    Hendikawati, Putriaji; Yuni Arini, Florentina

    2016-02-01

    This development research aimed to produce a model Statistics textbook supported by information and communication technology (ICT) and portfolio-based assessment. The book was designed for college mathematics students, to improve their ability in mathematical connection and communication. The research had three stages: define, design, and develop. The textbook consists of 10 chapters, each containing an introduction, core materials, examples, and exercises. The development phase began with an initial design of the book (draft 1), which was then validated by experts. Revision of draft 1 produced draft 2, which then underwent a limited readability test. Revision of draft 2 in turn produced draft 3, which was simulated on a small sample to produce a valid model textbook. The data were analysed with descriptive statistics. The analysis showed that the Statistics textbook model supported by ICT and portfolio-based assessment is valid and meets the criteria of practicality.

  12. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

    Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.

  13. Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models

    DTIC Science & Technology

    2015-09-12

    Report AFRL-AFOSR-VA-TR-2015-0278. Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models. Katya Scheinberg... Grant number: FA9550-11-1-0239... developed, which has been the focus of our research. Subject terms: optimization, Derivative-Free Optimization, Statistical Machine Learning.

  14. Development and evaluation of statistical shape modeling for principal inner organs on torso CT images.

    PubMed

    Zhou, Xiangrong; Xu, Rui; Hara, Takeshi; Hirano, Yasushi; Yokoyama, Ryujiro; Kanematsu, Masayuki; Hoshi, Hiroaki; Kido, Shoji; Fujita, Hiroshi

    2014-07-01

    The shapes of the inner organs are important information for medical image analysis. Statistical shape modeling provides a way of quantifying and measuring shape variations of the inner organs in different patients. In this study, we developed a universal scheme that can be used for building the statistical shape models for different inner organs efficiently. This scheme combines the traditional point distribution modeling with a group-wise optimization method based on a measure called minimum description length to provide a practical means for 3D organ shape modeling. In experiments, the proposed scheme was applied to the building of five statistical shape models for hearts, livers, spleens, and right and left kidneys by use of 50 cases of 3D torso CT images. The performance of these models was evaluated by three measures: model compactness, model generalization, and model specificity. The experimental results showed that the constructed shape models have good "compactness" and satisfied the "generalization" performance for different organ shape representations; however, the "specificity" of these models should be improved in the future.

  15. Moving in Parallel Toward a Modern Modeling Epistemology: Bayes Factors and Frequentist Modeling Methods.

    PubMed

    Rodgers, Joseph Lee

    2016-01-01

    The Bayesian-frequentist debate typically portrays these statistical perspectives as opposing views. However, both Bayesian and frequentist statisticians have expanded their epistemological basis away from a singular focus on the null hypothesis, to a broader perspective involving the development and comparison of competing statistical/mathematical models. For frequentists, statistical developments such as structural equation modeling and multilevel modeling have facilitated this transition. For Bayesians, the Bayes factor has facilitated this transition. The Bayes factor is treated in articles within this issue of Multivariate Behavioral Research. The current presentation provides brief commentary on those articles and more extended discussion of the transition toward a modern modeling epistemology. In certain respects, Bayesians and frequentists share common goals.

  16. The potential of composite cognitive scores for tracking progression in Huntington's disease.

    PubMed

    Jones, Rebecca; Stout, Julie C; Labuschagne, Izelle; Say, Miranda; Justo, Damian; Coleman, Allison; Dumas, Eve M; Hart, Ellen; Owen, Gail; Durr, Alexandra; Leavitt, Blair R; Roos, Raymund; O'Regan, Alison; Langbehn, Doug; Tabrizi, Sarah J; Frost, Chris

    2014-01-01

    Composite scores derived from joint statistical modelling of individual risk factors are widely used to identify individuals who are at increased risk of developing disease or of faster disease progression. We investigated the ability of composite measures developed using statistical models to differentiate progressive cognitive deterioration in Huntington's disease (HD) from natural decline in healthy controls. Using longitudinal data from TRACK-HD, the optimal combinations of quantitative cognitive measures to differentiate premanifest and early stage HD individuals respectively from controls was determined using logistic regression. Composite scores were calculated from the parameters of each statistical model. Linear regression models were used to calculate effect sizes (ES) quantifying the difference in longitudinal change over 24 months between premanifest and early stage HD groups respectively and controls. ES for the composites were compared with ES for individual cognitive outcomes and other measures used in HD research. The 0.632 bootstrap was used to eliminate biases which result from developing and testing models in the same sample. In early HD, the composite score from the HD change prediction model produced an ES for difference in rate of 24-month change relative to controls of 1.14 (95% CI: 0.90 to 1.39), larger than the ES for any individual cognitive outcome and UHDRS Total Motor Score and Total Functional Capacity. In addition, this composite gave a statistically significant difference in rate of change in premanifest HD compared to controls over 24-months (ES: 0.24; 95% CI: 0.04 to 0.44), even though none of the individual cognitive outcomes produced statistically significant ES over this period. Composite scores developed using appropriate statistical modelling techniques have the potential to materially reduce required sample sizes for randomised controlled trials.
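
    For orientation, the 0.632 bootstrap named above is usually written in the following general form. The abstract does not say how the estimator was adapted to effect sizes, so this is the textbook version for a generic performance measure, given as a sketch and not as the study's exact procedure.

```latex
% Textbook .632 bootstrap estimator (Efron); the notation is assumed, not from the abstract.
% \overline{\mathrm{err}}        : apparent performance, model fitted and evaluated on the full sample
% \widehat{\mathrm{Err}}^{(1)}   : out-of-bag performance, each observation evaluated only by
%                                  models fitted to bootstrap samples that do not contain it
\[
  \widehat{\mathrm{Err}}^{(.632)}
    = 0.368\,\overline{\mathrm{err}} + 0.632\,\widehat{\mathrm{Err}}^{(1)},
  \qquad
  0.632 \approx 1 - e^{-1} = \lim_{n \to \infty}\left[1 - \left(1 - \tfrac{1}{n}\right)^{n}\right].
\]
```

    The weight 0.632 is the limiting probability that a given observation appears in a bootstrap sample of size n. The weighted average offsets the optimism of the apparent estimate against the pessimism of the out-of-bag estimate.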

  17. The Statistical Interpretation of Classical Thermodynamic Heating and Expansion Processes

    ERIC Educational Resources Information Center

    Cartier, Stephen F.

    2011-01-01

    A statistical model has been developed and applied to interpret thermodynamic processes typically presented from the macroscopic, classical perspective. Through this model, students learn and apply the concepts of statistical mechanics, quantum mechanics, and classical thermodynamics in the analysis of the (i) constant volume heating, (ii)…

  18. Addressing economic development goals through innovative teaching of university statistics: a case study of statistical modelling in Nigeria

    NASA Astrophysics Data System (ADS)

    Oseloka Ezepue, Patrick; Ojo, Adegbola

    2012-12-01

    A challenging problem in some developing countries such as Nigeria is inadequate training of students in effective problem solving using the core concepts of their disciplines. Related to this is a disconnection between their learning and socio-economic development agenda of a country. These problems are more vivid in statistical education which is dominated by textbook examples and unbalanced assessment 'for' and 'of' learning within traditional curricula. The problems impede the achievement of socio-economic development objectives such as those stated in the Nigerian Vision 2020 blueprint and United Nations Millennium Development Goals. They also impoverish the ability of (statistics) graduates to creatively use their knowledge in relevant business and industry sectors, thereby exacerbating mass graduate unemployment in Nigeria and similar developing countries. This article uses a case study in statistical modelling to discuss the nature of innovations in statistics education vital to producing new kinds of graduates who can link their learning to national economic development goals, create wealth and alleviate poverty through (self) employment. Wider implications of the innovations for repositioning mathematical sciences education globally are explored in this article.

  19. The Development and Demonstration of Multiple Regression Models for Operant Conditioning Questions.

    ERIC Educational Resources Information Center

    Fanning, Fred; Newman, Isadore

    Based on the assumption that inferential statistics can make the operant conditioner more sensitive to possible significant relationships, regression models were developed to test the statistical significance between slopes and Y intercepts of the experimental and control group subjects. These results were then compared to the traditional operant…

  20. Statistical aspects of carbon fiber risk assessment modeling. [fire accidents involving aircraft

    NASA Technical Reports Server (NTRS)

    Gross, D.; Miller, D. R.; Soland, R. M.

    1980-01-01

    The probabilistic and statistical aspects of the carbon fiber risk assessment modeling of fire accidents involving commercial aircraft are examined. Three major sources of uncertainty in the modeling effort are identified. These are: (1) imprecise knowledge in establishing the model; (2) parameter estimation; and (3) Monte Carlo sampling error. All three sources of uncertainty are treated and statistical procedures are utilized and/or developed to control them wherever possible.

  1. Teaching Classical Statistical Mechanics: A Simulation Approach.

    ERIC Educational Resources Information Center

    Sauer, G.

    1981-01-01

    Describes a one-dimensional model for an ideal gas to study development of disordered motion in Newtonian mechanics. A Monte Carlo procedure for simulation of the statistical ensemble of an ideal gas with fixed total energy is developed. Compares both approaches for a pseudoexperimental foundation of statistical mechanics. (Author/JN)

  2. Fast, Statistical Model of Surface Roughness for Ion-Solid Interaction Simulations and Efficient Code Coupling

    NASA Astrophysics Data System (ADS)

    Drobny, Jon; Curreli, Davide; Ruzic, David; Lasa, Ane; Green, David; Canik, John; Younkin, Tim; Blondel, Sophie; Wirth, Brian

    2017-10-01

    Surface roughness greatly impacts material erosion, and thus plays an important role in Plasma-Surface Interactions. Developing strategies for efficiently introducing rough surfaces into ion-solid interaction codes will be an important step towards whole-device modeling of plasma devices and future fusion reactors such as ITER. Fractal TRIDYN (F-TRIDYN) is an upgraded version of the Monte Carlo, BCA program TRIDYN developed for this purpose that includes an explicit fractal model of surface roughness and extended input and output options for file-based code coupling. Code coupling with both plasma and material codes has been achieved and allows for multi-scale, whole-device modeling of plasma experiments. These code coupling results will be presented. F-TRIDYN has been further upgraded with an alternative, statistical model of surface roughness. The statistical model is significantly faster than and compares favorably to the fractal model. Additionally, the statistical model compares well to alternative computational surface roughness models and experiments. Theoretical links between the fractal and statistical models are made, and further connections to experimental measurements of surface roughness are explored. This work was supported by the PSI-SciDAC Project funded by the U.S. Department of Energy through contract DOE-DE-SC0008658.

  3. Addressing Economic Development Goals through Innovative Teaching of University Statistics: A Case Study of Statistical Modelling in Nigeria

    ERIC Educational Resources Information Center

    Ezepue, Patrick Oseloka; Ojo, Adegbola

    2012-01-01

    A challenging problem in some developing countries such as Nigeria is inadequate training of students in effective problem solving using the core concepts of their disciplines. Related to this is a disconnection between their learning and socio-economic development agenda of a country. These problems are more vivid in statistical education which…

  4. 'Chain pooling' model selection as developed for the statistical analysis of a rotor burst protection experiment

    NASA Technical Reports Server (NTRS)

    Holms, A. G.

    1977-01-01

    A statistical decision procedure called chain pooling had been developed for model selection in fitting the results of a two-level fixed-effects full or fractional factorial experiment not having replication. The basic strategy included the use of one nominal level of significance for a preliminary test and a second nominal level of significance for the final test. The subject has been reexamined from the point of view of using as many as three successive statistical model deletion procedures in fitting the results of a single experiment. The investigation consisted of random number studies intended to simulate the results of a proposed aircraft turbine-engine rotor-burst-protection experiment. As a conservative approach, population model coefficients were chosen to represent a saturated 2 to the 4th power experiment with a distribution of parameter values unfavorable to the decision procedures. Three model selection strategies were developed.

  5. Genetic Programming as Alternative for Predicting Development Effort of Individual Software Projects

    PubMed Central

    Chavoya, Arturo; Lopez-Martin, Cuauhtemoc; Andalon-Garcia, Irma R.; Meda-Campaña, M. E.

    2012-01-01

    Statistical and genetic programming techniques have been used to predict the software development effort of large software projects. In this paper, a genetic programming model was used for predicting the effort required in individually developed projects. Accuracy obtained from a genetic programming model was compared against one generated from the application of a statistical regression model. A sample of 219 projects developed by 71 practitioners was used for generating the two models, whereas another sample of 130 projects developed by 38 practitioners was used for validating them. The models used two kinds of lines of code as well as programming language experience as independent variables. Accuracy results from the model obtained with genetic programming suggest that it could be used to predict the software development effort of individual projects when these projects have been developed in a disciplined manner within a development-controlled environment. PMID:23226305

  6. A statistical model of operational impacts on the framework of the bridge crane

    NASA Astrophysics Data System (ADS)

    Antsev, V. Yu; Tolokonnikov, A. S.; Gorynin, A. D.; Reutov, A. A.

    2017-02-01

    The technical regulations of the Customs Union require that a risk analysis of bridge crane operation be carried out at the design stage. A statistical model has been developed for randomized risk calculations, allowing possible operational impacts on the bridge crane's metal structure to be modelled in their various combinations. The statistical model is implemented in a software product for automated calculation of the risk of bridge crane failure.

  7. Combining Statistics and Physics to Improve Climate Downscaling

    NASA Astrophysics Data System (ADS)

    Gutmann, E. D.; Eidhammer, T.; Arnold, J.; Nowak, K.; Clark, M. P.

    2017-12-01

    Getting useful information from climate models is an ongoing problem that has plagued climate science and hydrologic prediction for decades. While it is possible to develop statistical corrections for climate models that mimic current climate almost perfectly, this does not necessarily guarantee that future changes are portrayed correctly. In contrast, convection permitting regional climate models (RCMs) have begun to provide an excellent representation of the regional climate system purely from first principles, providing greater confidence in their change signal. However, the computational cost of such RCMs prohibits the generation of ensembles of simulations or long time periods, thus limiting their applicability for hydrologic applications. Here we discuss a new approach combining statistical corrections with physical relationships for a modest computational cost. We have developed the Intermediate Complexity Atmospheric Research model (ICAR) to provide a climate and weather downscaling option that is based primarily on physics for a fraction of the computational requirements of a traditional regional climate model. ICAR also enables the incorporation of statistical adjustments directly within the model. We demonstrate that applying even simple corrections to precipitation while the model is running can improve the simulation of land atmosphere feedbacks in ICAR. For example, by incorporating statistical corrections earlier in the modeling chain, we permit the model physics to better represent the effect of mountain snowpack on air temperature changes.

  8. Improved analyses using function datasets and statistical modeling

    Treesearch

    John S. Hogland; Nathaniel M. Anderson

    2014-01-01

    Raster modeling is an integral component of spatial analysis. However, conventional raster modeling techniques can require a substantial amount of processing time and storage space and have limited statistical functionality and machine learning algorithms. To address this issue, we developed a new modeling framework using C# and ArcObjects and integrated that framework...

  9. Predicting lettuce canopy photosynthesis with statistical and neural network models

    NASA Technical Reports Server (NTRS)

    Frick, J.; Precetti, C.; Mitchell, C. A.

    1998-01-01

    An artificial neural network (NN) and a statistical regression model were developed to predict canopy photosynthetic rates (Pn) for 'Waldman's Green' leaf lettuce (Lactuca sativa L.). All data used to develop and test the models were collected for crop stands grown hydroponically and under controlled-environment conditions. In the NN and regression models, canopy Pn was predicted as a function of three independent variables: shootzone CO2 concentration (600 to 1500 micromoles mol-1), photosynthetic photon flux (PPF) (600 to 1100 micromoles m-2 s-1), and canopy age (10 to 20 days after planting). The models were used to determine the combinations of CO2 and PPF setpoints required each day to maintain maximum canopy Pn. The statistical model (a third-order polynomial) predicted Pn more accurately than the simple NN (a three-layer, fully connected net). Over an 11-day validation period, average percent difference between predicted and actual Pn was 12.3% and 24.6% for the statistical and NN models, respectively. Both models lost considerable accuracy when used to determine relatively long-range Pn predictions (≥ 6 days into the future).

  10. Development of a funding, cost, and spending model for satellite projects

    NASA Technical Reports Server (NTRS)

    Johnson, Jesse P.

    1989-01-01

    The need for a predictive budget/funding model is obvious. The current models used by the Resource Analysis Office (RAO) are used to predict the total costs of satellite projects. An effort to extend the modeling capabilities from total budget analysis to total budget and budget outlays over time analysis was conducted. A statistically based and data-driven methodology was used to derive and develop the model. The budget data for the last 18 GSFC-sponsored satellite projects were analyzed and used to build a funding model which would describe the historical spending patterns. This raw data consisted of dollars spent in that specific year and their 1989 dollar equivalent. This data was converted to the standard format used by the RAO group and placed in a database. A simple statistical analysis was performed to calculate the gross statistics associated with project length and project cost and the conditional statistics on project length and project cost. The modeling approach used is derived from the theory of embedded statistics, which states that properly analyzed data will produce the underlying generating function. The process of funding large scale projects over extended periods of time is described by Life Cycle Cost Models (LCCM). The data was analyzed to find a model in the generic form of a LCCM. The model developed is based on a Weibull function whose parameters are found by both nonlinear optimization and nonlinear regression. In order to use this model it is necessary to transform the problem from a dollar/time space to a percentage of total budget/time space. This transformation is equivalent to moving to a probability space. By using the basic rules of probability, the validity of both the optimization and the regression steps is ensured. This statistically significant model is then integrated and inverted. The resulting output represents a project schedule which relates the amount of money spent to the percentage of project completion.
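
    The integrate-and-invert step described above can be sketched as follows, assuming the standard two-parameter Weibull form. Integrating the Weibull density gives the cumulative fraction of budget spent, and inverting that curve gives the project time at which a given fraction has been spent. The function names and the shape and scale values are hypothetical; the fitted parameters of the report's model are not given in the abstract.

```python
import math


def fraction_spent(t, shape, scale):
    """Cumulative fraction of total budget spent by normalized project time t.

    This is the Weibull cumulative distribution function, i.e. the integral
    of the Weibull density from 0 to t.
    """
    return 1.0 - math.exp(-((t / scale) ** shape))


def time_at_fraction(p, shape, scale):
    """Inverse of fraction_spent: normalized time at which fraction p is spent."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


# Hypothetical parameters, chosen for illustration only.
SHAPE, SCALE = 2.0, 0.6

for p in (0.25, 0.50, 0.75):
    t = time_at_fraction(p, SHAPE, SCALE)
    print(f"{p:.0%} of budget spent at normalized time {t:.3f}")
```

    Working in fractions of total budget, as the abstract describes, is what lets the spending curve be treated as a probability distribution: it starts at 0, is non-decreasing, and approaches 1.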

  11. A meta-analysis and statistical modelling of nitrates in groundwater at the African scale

    NASA Astrophysics Data System (ADS)

    Ouedraogo, Issoufou; Vanclooster, Marnik

    2016-06-01

    Contamination of groundwater with nitrate poses a major health risk to millions of people around Africa. Assessing the space-time distribution of this contamination, as well as understanding the factors that explain this contamination, is important for managing sustainable drinking water at the regional scale. This study aims to assess the variables that contribute to nitrate pollution in groundwater at the African scale by statistical modelling. We compiled a literature database of nitrate concentration in groundwater (around 250 studies) and combined it with digital maps of physical attributes such as soil, geology, climate, hydrogeology, and anthropogenic data for statistical model development. The maximum, medium, and minimum observed nitrate concentrations were analysed. In total, 13 explanatory variables were screened to explain observed nitrate pollution in groundwater. For the mean nitrate concentration, four variables are retained in the statistical explanatory model: (1) depth to groundwater (shallow groundwater, typically < 50 m); (2) recharge rate; (3) aquifer type; and (4) population density. The first three variables represent intrinsic vulnerability of groundwater systems to pollution, while the latter variable is a proxy for anthropogenic pollution pressure. The model explains 65 % of the variation of mean nitrate contamination in groundwater at the African scale. Using the same proxy information, we could develop a statistical model for the maximum nitrate concentrations that explains 42 % of the nitrate variation. For the maximum concentrations, other environmental attributes such as soil type, slope, rainfall, climate class, and region type improve the prediction of maximum nitrate concentrations at the African scale. As to minimal nitrate concentrations, in the absence of normal distribution assumptions of the data set, we do not develop a statistical model for these data. 
    The data-based statistical model presented here represents an important step towards developing tools that will allow us to accurately predict nitrate distribution at the African scale and thus may support groundwater monitoring and water management that aims to protect groundwater systems. Yet they should be further refined and validated when more detailed and harmonized data become available and/or combined with more conceptual descriptions of the fate of nutrients in the hydrosystem.

  12. A statistical rain attenuation prediction model with application to the advanced communication technology satellite project. Part 2: Theoretical development of a dynamic model and application to rain fade durations and tolerable control delays for fade countermeasures

    NASA Technical Reports Server (NTRS)

    Manning, Robert M.

    1987-01-01

    A dynamic rain attenuation prediction model is developed for use in obtaining the temporal characteristics, on time scales of minutes or hours, of satellite communication link availability. Analogous to the associated static rain attenuation model, which yields yearly attenuation predictions, this dynamic model is applicable at any location in the world that is characterized by the static rain attenuation statistics peculiar to the geometry of the satellite link and the rain statistics of the location. Such statistics are calculated by employing the formalism of Part I of this report. In fact, the dynamic model presented here is an extension of the static model and reduces to the static model in the appropriate limit. By assuming that rain attenuation is dynamically described by a first-order stochastic differential equation in time and that this random attenuation process is a Markov process, an expression for the associated transition probability is obtained by solving the related forward Kolmogorov equation. This transition probability is then used to obtain such temporal rain attenuation statistics as attenuation durations and allowable attenuation margins versus control system delay.

  13. A new approach to fracture modelling in reservoirs using deterministic, genetic and statistical models of fracture growth

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rawnsley, K.; Swaby, P.

    1996-08-01

    It is increasingly acknowledged that in order to understand and forecast the behavior of fracture influenced reservoirs we must attempt to reproduce the fracture system geometry and use this as a basis for fluid flow calculation. This article aims to present a recently developed fracture modelling prototype designed specifically for use in hydrocarbon reservoir environments. The prototype "FRAME" (FRActure Modelling Environment) aims to provide a tool which will allow the generation of realistic 3D fracture systems within a reservoir model, constrained to the known geology of the reservoir by both mechanical and statistical considerations, and which can be used as a basis for fluid flow calculation. Two newly developed modelling techniques are used. The first is an interactive tool which allows complex fault surfaces and their associated deformations to be reproduced. The second is a "genetic" model which grows fracture patterns from seeds using conceptual models of fracture development. The user defines the mechanical input and can retrieve all the statistics of the growing fractures to allow comparison to assumed statistical distributions for the reservoir fractures. Input parameters include growth rate, fracture interaction characteristics, orientation maps and density maps. More traditional statistical stochastic fracture models are also incorporated. FRAME is designed to allow the geologist to input hard or soft data including seismically defined surfaces, well fractures, outcrop models, analogue or numerical mechanical models or geological "feeling". The geologist is not restricted to "a priori" models of fracture patterns that may not correspond to the data.

  14. A Comparison Study of Rule Space Method and Neural Network Model for Classifying Individuals and an Application.

    ERIC Educational Resources Information Center

    Hayashi, Atsuhiro

    Both the Rule Space Method (RSM) and the Neural Network Model (NNM) are techniques of statistical pattern recognition and classification approaches developed for applications from different fields. RSM was developed in the domain of educational statistics. It started from the use of an incidence matrix Q that characterizes the underlying cognitive…

  15. Assessing Discriminative Performance at External Validation of Clinical Prediction Models

    PubMed Central

    Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W.

    2016-01-01

    Introduction: External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures to judge any changes in c-statistic from development to external validation setting. Methods: We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework to judge transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated it in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. Results: The permutation test indicated that the validation and development set were homogeneous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. Conclusion: The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation population. To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients. PMID:26881753

  16. Assessing Discriminative Performance at External Validation of Clinical Prediction Models.

    PubMed

    Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W

    2016-01-01

    External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures to judge any changes in c-statistic from development to external validation setting. We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework to judge transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated it in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. The permutation test indicated that the validation and development set were homogeneous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation population. To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients.
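
    As a reading aid for the record above, the c-statistic for a binary outcome can be sketched as the proportion of (event, non-event) pairs in which the event received the higher predicted risk, with ties counted as one half. The function name `c_statistic` and the toy data are illustrative assumptions; this is not the authors' code and it does not implement their permutation test or benchmark values.

```python
def c_statistic(y_true, y_score):
    """Concordance (c) statistic for a binary outcome.

    y_true  : iterable of 0/1 outcomes (1 = event)
    y_score : iterable of predicted risks, same length as y_true
    """
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # event ranked above non-event
            elif e == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data: 3 of the 4 event/non-event pairs are concordant.
print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

    Because the statistic depends only on how events and non-events are ranked against each other, it can shift when the spread of predicted risks in a population changes even if the regression coefficients are unchanged, which is the case-mix issue the abstract raises.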

  17. Metrological traceability in education: A practical online system for measuring and managing middle school mathematics instruction

    NASA Astrophysics Data System (ADS)

    Torres Irribarra, D.; Freund, R.; Fisher, W.; Wilson, M.

    2015-02-01

    Computer-based, online assessments modelled, designed, and evaluated for adaptively administered invariant measurement are uniquely suited to defining and maintaining traceability to standardized units in education. An assessment of this kind is embedded in the Assessing Data Modeling and Statistical Reasoning (ADM) middle school mathematics curriculum. Diagnostic information about middle school students' learning of statistics and modeling is provided via computer-based formative assessments for seven constructs that comprise a learning progression for statistics and modeling from late elementary through the middle school grades. The seven constructs are: Data Display, Meta-Representational Competence, Conceptions of Statistics, Chance, Modeling Variability, Theory of Measurement, and Informal Inference. The end product is a web-delivered system built with Ruby on Rails for use by curriculum development teams working with classroom teachers in designing, developing, and delivering formative assessments. The online accessible system allows teachers to accurately diagnose students' unique comprehension and learning needs in a common language of real-time assessment, logging, analysis, feedback, and reporting.

  18. Statistical Models for Predicting Automobile Driving Postures for Men and Women Including Effects of Age.

    PubMed

    Park, Jangwoon; Ebert, Sheila M; Reed, Matthew P; Hallman, Jason J

    2016-03-01

    Previously published statistical models of driving posture have been effective for vehicle design but have not taken into account the effects of age. The present study developed new statistical models for predicting driving posture. Driving postures of 90 U.S. drivers with a wide range of age and body size were measured in a laboratory mockup in nine package conditions. Posture-prediction models for female and male drivers were separately developed by employing a stepwise regression technique using age, body dimensions, vehicle package conditions, and two-way interactions, among other variables. Driving posture was significantly associated with age, and the effects of other variables depended on age. A set of posture-prediction models is presented for women and men. The results are compared with a previously developed model. The present study is the first study of driver posture to include a large cohort of older drivers and the first to report a significant effect of age. The posture-prediction models can be used to position computational human models or crash-test dummies for vehicle design and assessment. © 2015, Human Factors and Ergonomics Society.

  19. Bayesian statistics in medicine: a 25 year review.

    PubMed

    Ashby, Deborah

    2006-11-15

    This review examines the state of Bayesian thinking as Statistics in Medicine was launched in 1982, reflecting particularly on its applicability and uses in medical research. It then looks at each subsequent five-year epoch, with a focus on papers appearing in Statistics in Medicine, putting these in the context of major developments in Bayesian thinking and computation with reference to important books, landmark meetings and seminal papers. It charts the growth of Bayesian statistics as it is applied to medicine and makes predictions for the future. From sparse beginnings, where Bayesian statistics was barely mentioned, Bayesian statistics has now permeated all the major areas of medical statistics, including clinical trials, epidemiology, meta-analyses and evidence synthesis, spatial modelling, longitudinal modelling, survival modelling, molecular genetics and decision-making in respect of new technologies.

  20. A Model for Developing and Assessing Community College Students' Conceptions of the Range, Interquartile Range, and Standard Deviation

    ERIC Educational Resources Information Center

    Turegun, Mikhail

    2011-01-01

    Traditional curricular materials and pedagogical strategies have not been effective in developing conceptual understanding of statistics topics and statistical reasoning abilities of students. Much of the changes proposed by statistics education research and the reform movement over the past decade have supported efforts to transform teaching…

  1. The Effect on the 8th Grade Students' Attitude towards Statistics of Project Based Learning

    ERIC Educational Resources Information Center

    Koparan, Timur; Güven, Bülent

    2014-01-01

    This study investigates the effect of the project based learning approach on 8th grade students' attitude towards statistics. With this aim, an attitude scale towards statistics was developed. Quasi-experimental research model was used in this study. Following this model in the control group the traditional method was applied to teach statistics…

  2. Secondary Statistical Modeling with the National Assessment of Adult Literacy: Implications for the Design of the Background Questionnaire. Working Paper Series.

    ERIC Educational Resources Information Center

    Kaplan, David

    This paper offers recommendations to the National Center for Education Statistics (NCES) on the development of the background questionnaire for the National Assessment of Adult Literacy (NAAL). The recommendations are from the viewpoint of a researcher interested in applying sophisticated statistical models to address important issues in adult…

  3. A statistical approach to develop a detailed soot growth model using PAH characteristics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Raj, Abhijeet; Celnik, Matthew; Shirley, Raphael

    A detailed PAH growth model is developed, which is solved using a kinetic Monte Carlo algorithm. The model describes the structure and growth of planar PAH molecules, and is referred to as the kinetic Monte Carlo-aromatic site (KMC-ARS) model. A detailed PAH growth mechanism based on reactions at radical sites available in the literature, and additional reactions obtained from quantum chemistry calculations are used to model the PAH growth processes. New rates for the reactions involved in the cyclodehydrogenation process for the formation of 6-member rings on PAHs are calculated in this work based on density functional theory simulations. The KMC-ARS model is validated by comparing experimentally observed ensembles on PAHs with the computed ensembles for a C2H2 and a C6H6 flame at different heights above the burner. The motivation for this model is the development of a detailed soot particle population balance model which describes the evolution of an ensemble of soot particles based on their PAH structure. However, at present incorporating such a detailed model into a population balance is computationally unfeasible. Therefore, a simpler model referred to as the site-counting model has been developed, which replaces the structural information of the PAH molecules by their functional groups augmented with statistical closure expressions. This closure is obtained from the KMC-ARS model, which is used to develop correlations and statistics in different flame environments which describe such PAH structural information. These correlations and statistics are implemented in the site-counting model, and results from the site-counting model and the KMC-ARS model are in good agreement. Additionally the effect of steric hindrance in large PAH structures is investigated and correlations for sites unavailable for reaction are presented. (author)

  4. Cognitive Components Underpinning the Development of Model-Based Learning

    PubMed Central

    Potter, Tracey C.S.; Bryce, Nessa V.; Hartley, Catherine A.

    2016-01-01

    Reinforcement learning theory distinguishes “model-free” learning, which fosters reflexive repetition of previously rewarded actions, from “model-based” learning, which recruits a mental model of the environment to flexibly select goal-directed actions. Whereas model-free learning is evident across development, recruitment of model-based learning appears to increase with age. However, the cognitive processes underlying the development of model-based learning remain poorly characterized. Here, we examined whether age-related differences in cognitive processes underlying the construction and flexible recruitment of mental models predict developmental increases in model-based choice. In a cohort of participants aged 9–25, we examined whether the abilities to infer sequential regularities in the environment (“statistical learning”), maintain information in an active state (“working memory”) and integrate distant concepts to solve problems (“fluid reasoning”) predicted age-related improvements in model-based choice. We found that age-related improvements in statistical learning performance did not mediate the relationship between age and model-based choice. Ceiling performance on our working memory assay prevented examination of its contribution to model-based learning. However, age-related improvements in fluid reasoning statistically mediated the developmental increase in the recruitment of a model-based strategy. These findings suggest that gradual development of fluid reasoning may be a critical component process underlying the emergence of model-based learning. PMID:27825732

  5. Cognitive components underpinning the development of model-based learning.

    PubMed

    Potter, Tracey C S; Bryce, Nessa V; Hartley, Catherine A

    2017-06-01

    Reinforcement learning theory distinguishes "model-free" learning, which fosters reflexive repetition of previously rewarded actions, from "model-based" learning, which recruits a mental model of the environment to flexibly select goal-directed actions. Whereas model-free learning is evident across development, recruitment of model-based learning appears to increase with age. However, the cognitive processes underlying the development of model-based learning remain poorly characterized. Here, we examined whether age-related differences in cognitive processes underlying the construction and flexible recruitment of mental models predict developmental increases in model-based choice. In a cohort of participants aged 9-25, we examined whether the abilities to infer sequential regularities in the environment ("statistical learning"), maintain information in an active state ("working memory") and integrate distant concepts to solve problems ("fluid reasoning") predicted age-related improvements in model-based choice. We found that age-related improvements in statistical learning performance did not mediate the relationship between age and model-based choice. Ceiling performance on our working memory assay prevented examination of its contribution to model-based learning. However, age-related improvements in fluid reasoning statistically mediated the developmental increase in the recruitment of a model-based strategy. These findings suggest that gradual development of fluid reasoning may be a critical component process underlying the emergence of model-based learning. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  6. VMT-based traffic impact assessment : development of a trip length model.

    DOT National Transportation Integrated Search

    2010-06-01

    This report develops models that relate the trip-lengths to the land-use characteristics at the trip-ends (both production- and attraction-ends). Separate models were developed by trip purpose. The results indicate several statistically significa...

  7. Lung Cancer Risk Prediction Model Incorporating Lung Function: Development and Validation in the UK Biobank Prospective Cohort Study.

    PubMed

    Muller, David C; Johansson, Mattias; Brennan, Paul

    2017-03-10

    Purpose: Several lung cancer risk prediction models have been developed, but none to date have assessed the predictive ability of lung function in a population-based cohort. We sought to develop and internally validate a model incorporating lung function using data from the UK Biobank prospective cohort study. Methods: This analysis included 502,321 participants without a previous diagnosis of lung cancer, predominantly between 40 and 70 years of age. We used flexible parametric survival models to estimate the 2-year probability of lung cancer, accounting for the competing risk of death. Models included predictors previously shown to be associated with lung cancer risk, including sex, variables related to smoking history and nicotine addiction, medical history, family history of lung cancer, and lung function (forced expiratory volume in 1 second [FEV1]). Results: During accumulated follow-up of 1,469,518 person-years, there were 738 lung cancer diagnoses. A model incorporating all predictors had excellent discrimination (concordance (c)-statistic [95% CI] = 0.85 [0.82 to 0.87]). Internal validation suggested that the model will discriminate well when applied to new data (optimism-corrected c-statistic = 0.84). The full model, including FEV1, also had modestly superior discriminatory power to one that was designed solely on the basis of questionnaire variables (c-statistic = 0.84 [0.82 to 0.86]; optimism-corrected c-statistic = 0.83; p for FEV1 = 3.4 × 10^-13). The full model had better discrimination than standard lung cancer screening eligibility criteria (c-statistic = 0.66 [0.64 to 0.69]). Conclusion: A risk prediction model that includes lung function has strong predictive ability, which could improve eligibility criteria for lung cancer screening programs.

  8. An ensemble Kalman filter for statistical estimation of physics constrained nonlinear regression models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harlim, John, E-mail: jharlim@psu.edu; Mahdi, Adam, E-mail: amahdi@ncsu.edu; Majda, Andrew J., E-mail: jonjon@cims.nyu.edu

    2014-01-15

    A central issue in contemporary science is the development of nonlinear data driven statistical–dynamical models for time series of noisy partial observations from nature or a complex model. It has been established recently that ad-hoc quadratic multi-level regression models can have finite-time blow-up of statistical solutions and/or pathological behavior of their invariant measure. Recently, a new class of physics constrained nonlinear regression models was developed to ameliorate this pathological behavior. Here a new finite ensemble Kalman filtering algorithm is developed for estimating the state, the linear and nonlinear model coefficients, the model and the observation noise covariances from available partial noisy observations of the state. Several stringent tests and applications of the method are developed here. In the most complex application, the perfect model has 57 degrees of freedom involving a zonal (east–west) jet, two topographic Rossby waves, and 54 nonlinearly interacting Rossby waves; the perfect model has significant non-Gaussian statistics in the zonal jet with blocked and unblocked regimes and a non-Gaussian skewed distribution due to interaction with the other 56 modes. We only observe the zonal jet contaminated by noise and apply the ensemble filter algorithm for estimation. Numerically, we find that a three dimensional nonlinear stochastic model with one level of memory mimics the statistical effect of the other 56 modes on the zonal jet in an accurate fashion, including the skew non-Gaussian distribution and autocorrelation decay. On the other hand, a similar stochastic model with zero memory levels fails to capture the crucial non-Gaussian behavior of the zonal jet from the perfect 57-mode model.

  9. Peer Review Documents Related to the Evaluation of ...

    EPA Pesticide Factsheets

    BMDS is one of the Agency's premier tools for risk assessment; the validity and reliability of its statistical models are therefore of paramount importance. This page provides links to peer reviews and expert summaries of the BMDS application and its models as they were developed and eventually released, documenting the rigorous review process taken to provide the best science tools available for statistical modeling.

  10. Developing statistical wildlife habitat relationships for assessing cumulative effects of fuels treatments: Final Report for Joint Fire Science Program Project

    Treesearch

    Samuel A. Cushman; Kevin S. McKelvey

    2006-01-01

    The primary weakness in our current ability to evaluate future landscapes in terms of wildlife lies in the lack of quantitative models linking wildlife to forest stand conditions, including fuels treatments. This project focuses on 1) developing statistical wildlife habitat relationships models (WHR) utilizing Forest Inventory and Analysis (FIA) and National Vegetation...

  11. Summary goodness-of-fit statistics for binary generalized linear models with noncanonical link functions.

    PubMed

    Canary, Jana D; Blizzard, Leigh; Barry, Ronald P; Hosmer, David W; Quinn, Stephen J

    2016-05-01

    Generalized linear models (GLM) with a canonical logit link function are the primary modeling technique used to relate a binary outcome to predictor variables. However, noncanonical links can offer more flexibility, producing convenient analytical quantities (e.g., probit GLMs in toxicology) and desired measures of effect (e.g., relative risk from log GLMs). Many summary goodness-of-fit (GOF) statistics exist for logistic GLM. Their properties make the development of GOF statistics relatively straightforward, but it can be more difficult under noncanonical links. Although GOF tests for logistic GLM with continuous covariates (GLMCC) have been applied to GLMCCs with log links, we know of no GOF tests in the literature specifically developed for GLMCCs that can be applied regardless of link function chosen. We generalize the Tsiatis GOF statistic originally developed for logistic GLMCCs (TG) so that it can be applied under any link function. Further, we show that the algebraically related Hosmer-Lemeshow (HL) and Pigeon-Heyse (J²) statistics can be applied directly. In a simulation study, TG, HL, and J² were used to evaluate the fit of probit, log-log, complementary log-log, and log models, all calculated with a common grouping method. The TG statistic consistently maintained Type I error rates, while those of HL and J² were often lower than expected if terms with little influence were included. Generally, the statistics had similar power to detect an incorrect model. An exception occurred when a log GLMCC was incorrectly fit to data generated from a logistic GLMCC. In this case, TG had more power than HL or J². © 2015 John Wiley & Sons Ltd/London School of Economics.
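
    To make the Hosmer-Lemeshow (HL) statistic named in the record above concrete, the sketch below computes it from fitted probabilities in its usual textbook form: sort observations by fitted probability, split them into groups, and sum (O - E)² / (n · p̄ · (1 - p̄)) over groups, where O is the observed event count, E the sum of fitted probabilities and p̄ their group mean. The equal-size grouping, the function name `hosmer_lemeshow` and the handling of degenerate groups are assumptions made for illustration; the abstract does not specify the authors' grouping method, and the TG and J² statistics are not implemented here.

```python
def hosmer_lemeshow(y_true, p_hat, n_groups=10):
    """Hosmer-Lemeshow statistic from 0/1 outcomes and fitted probabilities.

    Groups are formed by sorting on p_hat and cutting into n_groups
    roughly equal-sized blocks (an assumed grouping rule).
    """
    pairs = sorted(zip(p_hat, y_true))
    n = len(pairs)
    if n == 0 or n_groups < 1:
        raise ValueError("need data and at least one group")

    stat = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        n_g = len(group)
        observed = sum(y for _, y in group)   # O: events observed in group
        expected = sum(p for p, _ in group)   # E: events expected under the fit
        mean_p = expected / n_g
        denom = n_g * mean_p * (1.0 - mean_p)
        if denom == 0.0:
            continue  # sketch-only choice: skip groups with p-bar of 0 or 1
        stat += (observed - expected) ** 2 / denom
    return stat


# Hypothetical toy data: one group, fitted probability 0.5 for everyone.
print(hosmer_lemeshow([0, 1, 0, 1], [0.5] * 4, n_groups=1))  # observed matches expected
print(hosmer_lemeshow([1, 1, 1, 1], [0.5] * 4, n_groups=1))  # observed exceeds expected
```

    The function takes fitted probabilities rather than a fitted model, so it does not depend on the link function used to produce them; the reference distribution for the resulting value is a separate question that this sketch does not address.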

  12. A Stochastic Model of Space-Time Variability of Mesoscale Rainfall: Statistics of Spatial Averages

    NASA Technical Reports Server (NTRS)

    Kundu, Prasun K.; Bell, Thomas L.

    2003-01-01

    A characteristic feature of rainfall statistics is that they depend on the space and time scales over which rain data are averaged. A previously developed spectral model of rain statistics that is designed to capture this property predicts power law scaling behavior for the second moment statistics of area-averaged rain rate on the averaging length scale L as L → 0. In the present work a more efficient method of estimating the model parameters is presented, and used to fit the model to the statistics of area-averaged rain rate derived from gridded radar precipitation data from TOGA COARE. Statistical properties of the data and the model predictions are compared over a wide range of averaging scales. An extension of the spectral model scaling relations to describe the dependence of the average fraction of grid boxes within an area containing nonzero rain (the "rainy area fraction") on the grid scale L is also explored.

  13. Assessment of the scale effect on statistical downscaling quality at a station scale using a weather generator-based model

    USDA-ARS's Scientific Manuscript database

    The resolution of General Circulation Models (GCMs) is too coarse to assess the fine scale or site-specific impacts of climate change. Downscaling approaches including dynamical and statistical downscaling have been developed to meet this requirement. As the resolution of climate model increases, it...

  14. A Modeling Approach to the Development of Students' Informal Inferential Reasoning

    ERIC Educational Resources Information Center

    Doerr, Helen M.; Delmas, Robert; Makar, Katie

    2017-01-01

    Teaching from an informal statistical inference perspective can address the challenge of teaching statistics in a coherent way. We argue that activities that promote model-based reasoning address two additional challenges: providing a coherent sequence of topics and promoting the application of knowledge to novel situations. We take a models and…

  15. Statistics Graduate Students' Professional Development for Teaching: A Communities of Practice Model

    ERIC Educational Resources Information Center

    Justice, Nicola

    2007-01-01

    Graduate teaching assistants (GTAs) are responsible for instructing approximately 25% of introductory statistics courses in the United States (Blair, Kirkman, & Maxwell, 2013). Most research on GTA professional development focuses on structured activities (e.g., courses, workshops) that have been developed to improve GTAs' pedagogy and content…

  16. Statistical modelling of software reliability

    NASA Technical Reports Server (NTRS)

    Miller, Douglas R.

    1991-01-01

    During the six-month period from 1 April 1991 to 30 September 1991 the following research papers in statistical modeling of software reliability appeared: (1) A Nonparametric Software Reliability Growth Model; (2) On the Use and the Performance of Software Reliability Growth Models; (3) Research and Development Issues in Software Reliability Engineering; (4) Special Issues on Software; and (5) Software Reliability and Safety.

  17. Finding the Root Causes of Statistical Inconsistency in Community Earth System Model Output

    NASA Astrophysics Data System (ADS)

    Milroy, D.; Hammerling, D.; Baker, A. H.

    2017-12-01

    Baker et al (2015) developed the Community Earth System Model Ensemble Consistency Test (CESM-ECT) to provide a metric for software quality assurance by determining statistical consistency between an ensemble of CESM outputs and new test runs. The test has proved useful for detecting statistical difference caused by compiler bugs and errors in physical modules. However, detection is only the necessary first step in finding the causes of statistical difference. The CESM is a vastly complex model comprised of millions of lines of code which is developed and maintained by a large community of software engineers and scientists. Any root cause analysis is correspondingly challenging. We propose a new capability for CESM-ECT: identifying the sections of code that cause statistical distinguishability. The first step is to discover CESM variables that cause CESM-ECT to classify new runs as statistically distinct, which we achieve via Randomized Logistic Regression. Next we use a tool developed to identify CESM components that define or compute the variables found in the first step. Finally, we employ the application Kernel GENerator (KGEN) created in Kim et al (2016) to detect fine-grained floating point differences. We demonstrate an example of the procedure and advance a plan to automate this process in our future work.

  18. Design of a testing strategy using non-animal based test methods: lessons learnt from the ACuteTox project.

    PubMed

    Kopp-Schneider, Annette; Prieto, Pilar; Kinsner-Ovaskainen, Agnieszka; Stanzel, Sven

    2013-06-01

    In the framework of toxicology, a testing strategy can be viewed as a series of steps which are taken to come to a final prediction about a characteristic of a compound under study. The testing strategy is performed as a single-step procedure, usually called a test battery, using simultaneously all information collected on different endpoints, or as a tiered approach in which a decision tree is followed. Design of a testing strategy involves statistical considerations, such as the development of a statistical prediction model. During the EU FP6 ACuteTox project, several prediction models were proposed on the basis of statistical classification algorithms which we illustrate here. The final choice of testing strategies was not based on statistical considerations alone. However, without thorough statistical evaluations a testing strategy cannot be identified. We present here a number of observations made from the statistical viewpoint which relate to the development of testing strategies. The points we make were derived from problems we had to deal with during the evaluation of this large research project. A central issue during the development of a prediction model is the danger of overfitting. Procedures are presented to deal with this challenge. Copyright © 2012 Elsevier Ltd. All rights reserved.

  19. Heterogeneous variances in multi-environment yield trials for corn hybrids

    USDA-ARS's Scientific Manuscript database

    Recent developments in statistics and computing have enabled much greater levels of complexity in statistical models of multi-environment yield trial data. One particular feature of interest to breeders is simultaneously modeling heterogeneity of variances among environments and cultivars. Our obj...

  20. On prognostic models, artificial intelligence and censored observations.

    PubMed

    Anand, S S; Hamilton, P W; Hughes, J G; Bell, D A

    2001-03-01

    The development of prognostic models for assisting medical practitioners with decision making is not a trivial task. Models need to possess a number of desirable characteristics and few, if any, current modelling approaches based on statistical or artificial intelligence can produce models that display all these characteristics. The inability of modelling techniques to provide truly useful models has led to interest in these models being purely academic in nature. This in turn has resulted in only a very small percentage of models that have been developed being deployed in practice. On the other hand, new modelling paradigms are being proposed continuously within the machine learning and statistical community and claims, often based on inadequate evaluation, being made on their superiority over traditional modelling methods. We believe that for new modelling approaches to deliver true net benefits over traditional techniques, an evaluation centric approach to their development is essential. In this paper we present such an evaluation centric approach to developing extensions to the basic k-nearest neighbour (k-NN) paradigm. We use standard statistical techniques to enhance the distance metric used and a framework based on evidence theory to obtain a prediction for the target example from the outcome of the retrieved exemplars. We refer to this new k-NN algorithm as Censored k-NN (Ck-NN). This reflects the enhancements made to k-NN that are aimed at providing a means for handling censored observations within k-NN.

  1. Joint inversion of marine seismic AVA and CSEM data using statistical rock-physics models and Markov random fields: Stochastic inversion of AVA and CSEM data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, J.; Hoversten, G.M.

    2011-09-15

    Joint inversion of seismic AVA and CSEM data requires rock-physics relationships to link seismic attributes to electrical properties. Ideally, we can connect them through reservoir parameters (e.g., porosity and water saturation) by developing physical-based models, such as Gassmann’s equations and Archie’s law, using nearby borehole logs. This could be difficult in the exploration stage because information available is typically insufficient for choosing suitable rock-physics models and for subsequently obtaining reliable estimates of the associated parameters. The use of improper rock-physics models and the inaccuracy of the estimates of model parameters may cause misleading inversion results. Conversely, it is easy to derive statistical relationships among seismic and electrical attributes and reservoir parameters from distant borehole logs. In this study, we develop a Bayesian model to jointly invert seismic AVA and CSEM data for reservoir parameter estimation using statistical rock-physics models; the spatial dependence of geophysical and reservoir parameters is carried out by lithotypes through Markov random fields. We apply the developed model to a synthetic case, which simulates a CO2 monitoring application. We derive statistical rock-physics relations from borehole logs at one location and estimate seismic P- and S-wave velocity ratio, acoustic impedance, density, electrical resistivity, lithotypes, porosity, and water saturation at three different locations by conditioning to seismic AVA and CSEM data. Comparison of the inversion results with their corresponding true values shows that the correlation-based statistical rock-physics models provide significant information for improving the joint inversion results.

  2. Statistical error model for a solar electric propulsion thrust subsystem

    NASA Technical Reports Server (NTRS)

    Bantell, M. H.

    1973-01-01

    The solar electric propulsion thrust subsystem statistical error model was developed as a tool for investigating the effects of thrust subsystem parameter uncertainties on navigation accuracy. The model is currently being used to evaluate the impact of electric engine parameter uncertainties on navigation system performance for a baseline mission to Encke's Comet in the 1980s. The data given represent the next generation in statistical error modeling for low-thrust applications. Principal improvements include the representation of thrust uncertainties and random process modeling in terms of random parametric variations in the thrust vector process for a multi-engine configuration.

  3. Nonelastic nuclear reactions and accompanying gamma radiation

    NASA Technical Reports Server (NTRS)

    Snow, R.; Rosner, H. R.; George, M. C.; Hayes, J. D.

    1971-01-01

    Several aspects of nonelastic nuclear reactions which proceed through the formation of a compound nucleus are dealt with. The full statistical model and the partial statistical model are described and computer programs based on these models are presented along with operating instructions and input and output for sample problems. A theoretical development of the expression for the reaction cross section for the hybrid case which involves a combination of the continuum aspects of the full statistical model with the discrete level aspects of the partial statistical model is presented. Cross sections for level excitation and gamma production by neutron inelastic scattering from the nuclei Al-27, Fe-56, Si-28, and Pb-208 are calculated and compared with available experimental data.

  4. Statistical downscaling of general-circulation-model- simulated average monthly air temperature to the beginning of flowering of the dandelion (Taraxacum officinale) in Slovenia

    NASA Astrophysics Data System (ADS)

    Bergant, Klemen; Kajfež-Bogataj, Lučka; Črepinšek, Zalika

    2002-02-01

    Phenological observations are a valuable source of information for investigating the relationship between climate variation and plant development. Potential climate change in the future will shift the occurrence of phenological phases. Information about future climate conditions is needed in order to estimate this shift. General circulation models (GCM) provide the best information about future climate change. They are able to simulate reliably the most important mean features on a large scale, but they fail on a regional scale because of their low spatial resolution. A common approach to bridging the scale gap is statistical downscaling, which was used to relate the beginning of flowering of Taraxacum officinale in Slovenia with the monthly mean near-surface air temperature for January, February and March in Central Europe. Statistical models were developed and tested with NCAR/NCEP Reanalysis predictor data and EARS predictand data for the period 1960-1999. Prior to developing statistical models, empirical orthogonal function (EOF) analysis was employed on the predictor data. Multiple linear regression was used to relate the beginning of flowering with expansion coefficients of the first three EOF for the January, February and March air temperatures, and a strong correlation was found between them. Developed statistical models were employed on the results of two GCM (HadCM3 and ECHAM4/OPYC3) to estimate the potential shifts in the beginning of flowering for the periods 1990-2019 and 2020-2049 in comparison with the period 1960-1989. The HadCM3 model predicts, on average, 4 days earlier occurrence and ECHAM4/OPYC3 5 days earlier occurrence of flowering in the period 1990-2019. The analogous results for the period 2020-2049 are a 10- and 11-day earlier occurrence.

  5. Statistical methods for the beta-binomial model in teratology.

    PubMed Central

    Yamamoto, E; Yanagimoto, T

    1994-01-01

    The beta-binomial model is widely used for analyzing teratological data involving littermates. Recent developments in statistical analyses of teratological data are briefly reviewed with emphasis on the model. For statistical inference of the parameters in the beta-binomial distribution, separation of the likelihood introduces a likelihood inference. This leads to reducing biases of estimators and also to improving accuracy of empirical significance levels of tests. Separate inference of the parameters can be conducted in a unified way. PMID:8187716
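
    As an illustration of the distribution named in this record, the sketch below evaluates the beta-binomial probability of observing k affected fetuses in a litter of size n. The function names, the parameter names (alpha, beta) and the example values are assumptions made for illustration; the code is not taken from the paper and does not implement its separate-likelihood inference.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b), via log-gamma for stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(K = k) for a beta-binomial with n trials and shape parameters alpha, beta.

    pmf = C(n, k) * B(k + alpha, n - k + beta) / B(alpha, beta)
    """
    if not 0 <= k <= n:
        return 0.0
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


# Hypothetical litter of 10 fetuses; alpha and beta are made-up values.
litter_size = 10
pmf = [beta_binomial_pmf(k, litter_size, 2.0, 8.0) for k in range(litter_size + 1)]
total_probability = sum(pmf)
mean_affected = sum(k * p for k, p in enumerate(pmf))
```

    Under these assumed parameters the mean number affected equals n * alpha / (alpha + beta), while the variance exceeds that of a plain binomial, which is why the model suits litter data where outcomes within a litter are correlated.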

  6. Machine Learning Algorithms Outperform Conventional Regression Models in Predicting Development of Hepatocellular Carcinoma

    PubMed Central

    Singal, Amit G.; Mukherjee, Ashin; Elmunzer, B. Joseph; Higgins, Peter DR; Lok, Anna S.; Zhu, Ji; Marrero, Jorge A; Waljee, Akbar K

    2015-01-01

    Background Predictive models for hepatocellular carcinoma (HCC) have been limited by modest accuracy and lack of validation. Machine learning algorithms offer a novel methodology, which may improve HCC risk prognostication among patients with cirrhosis. Our study's aim was to develop and compare predictive models for HCC development among cirrhotic patients, using conventional regression analysis and machine learning algorithms. Methods We enrolled 442 patients with Child A or B cirrhosis at the University of Michigan between January 2004 and September 2006 (UM cohort) and prospectively followed them until HCC development, liver transplantation, death, or study termination. Regression analysis and machine learning algorithms were used to construct predictive models for HCC development, which were tested on an independent validation cohort from the Hepatitis C Antiviral Long-term Treatment against Cirrhosis (HALT-C) Trial. Both models were also compared to the previously published HALT-C model. Discrimination was assessed using receiver operating characteristic curve analysis and diagnostic accuracy was assessed with net reclassification improvement and integrated discrimination improvement statistics. Results After a median follow-up of 3.5 years, 41 patients developed HCC. The UM regression model had a c-statistic of 0.61 (95%CI 0.56-0.67), whereas the machine learning algorithm had a c-statistic of 0.64 (95%CI 0.60–0.69) in the validation cohort. The machine learning algorithm had significantly better diagnostic accuracy as assessed by net reclassification improvement (p<0.001) and integrated discrimination improvement (p=0.04). The HALT-C model had a c-statistic of 0.60 (95%CI 0.50-0.70) in the validation cohort and was outperformed by the machine learning algorithm (p=0.047). Conclusion Machine learning algorithms improve the accuracy of risk stratifying patients with cirrhosis and can be used to accurately identify patients at high-risk for developing HCC. PMID:24169273
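
    The c-statistic reported in this record measures discrimination for a binary outcome. A minimal sketch of how such a statistic can be computed is given below, assuming the usual pairwise definition (the share of case/non-case pairs in which the case has the higher predicted risk, with ties counted as one half). The function name and the example scores are hypothetical and are not data from the study.

```python
from itertools import product


def c_statistic(scores, outcomes):
    """Concordance (c) statistic for a binary outcome.

    scores   : predicted risks, one per subject
    outcomes : 1 if the subject developed the event, 0 otherwise
    Returns the fraction of (case, non-case) pairs in which the case
    has the higher score; tied scores contribute 0.5.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score, non_case_score in product(cases, non_cases):
        if case_score > non_case_score:
            concordant += 1.0
        elif case_score == non_case_score:
            concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Made-up risk scores and outcomes, for illustration only.
example_scores = [0.9, 0.4, 0.35, 0.8, 0.2]
example_outcomes = [1, 0, 1, 0, 0]
example_c = c_statistic(example_scores, example_outcomes)
```

    A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from non-cases. This pairwise loop is quadratic in the sample size; it is meant to show the definition, not to be an efficient implementation.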

  7. Statistical Learning is Related to Early Literacy-Related Skills

    PubMed Central

    Spencer, Mercedes; Kaschak, Michael P.; Jones, John L.; Lonigan, Christopher J.

    2015-01-01

    It has been demonstrated that statistical learning, or the ability to use statistical information to learn the structure of one’s environment, plays a role in young children’s acquisition of linguistic knowledge. Although most research on statistical learning has focused on language acquisition processes, such as the segmentation of words from fluent speech and the learning of syntactic structure, some recent studies have explored the extent to which individual differences in statistical learning are related to literacy-relevant knowledge and skills. The present study extends on this literature by investigating the relations between two measures of statistical learning and multiple measures of skills that are critical to the development of literacy—oral language, vocabulary knowledge, and phonological processing—within a single model. Our sample included a total of 553 typically developing children from prekindergarten through second grade. Structural equation modeling revealed that statistical learning accounted for a unique portion of the variance in these literacy-related skills. Practical implications for instruction and assessment are discussed. PMID:26478658

  8. Development of the AFRL Aircrew Performance and Protection Data Bank

    DTIC Science & Technology

    2007-12-01

    Growth model and statistical model of hypobaric chamber simulations. It offers a quick and readily accessible online DCS risk assessment tool for...are used for the DCS prediction instead of the original model. ADRAC is based on more than 20 years of hypobaric chamber studies using human...prediction based on the combined Bubble Growth model and statistical model of hypobaric chamber simulations was integrated into the Data Bank. It

  9. Predicting Statistical Response and Extreme Events in Uncertainty Quantification through Reduced-Order Models

    NASA Astrophysics Data System (ADS)

    Qi, D.; Majda, A.

    2017-12-01

    A low-dimensional reduced-order statistical closure model is developed for quantifying the uncertainty in statistical sensitivity and intermittency in principal model directions with largest variability in high-dimensional turbulent systems and turbulent transport models. Imperfect model sensitivity is improved through a recent mathematical strategy for calibrating model errors in a training phase, where information theory and linear statistical response theory are combined in a systematic fashion to achieve the optimal model performance. The idea in the reduced-order method is from a self-consistent mathematical framework for general systems with quadratic nonlinearity, where crucial high-order statistics are approximated by a systematic model calibration procedure. Model efficiency is improved through additional damping and noise corrections to replace the expensive energy-conserving nonlinear interactions. Model errors due to the imperfect nonlinear approximation are corrected by tuning the model parameters using linear response theory with an information metric in a training phase before prediction. A statistical energy principle is adopted to introduce a global scaling factor in characterizing the higher-order moments in a consistent way to improve model sensitivity. Stringent models of barotropic and baroclinic turbulence are used to display the feasibility of the reduced-order methods. Principal statistical responses in mean and variance can be captured by the reduced-order models with accuracy and efficiency. In addition, the reduced-order models are used to capture the crucial passive tracer field that is advected by the baroclinic turbulent flow. It is demonstrated that crucial principal statistical quantities like the tracer spectrum and fat-tails in the tracer probability density functions in the most important large scales can be captured efficiently with accuracy using the reduced-order tracer model in various dynamical regimes of the flow field with distinct statistical structures.

  10. Collaborative Project: The problem of bias in defining uncertainty in computationally enabled strategies for data-driven climate model development. Final Technical Report.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huerta, Gabriel

    The objective of the project is to develop strategies for better representing scientific sensibilities within statistical measures of model skill that then can be used within a Bayesian statistical framework for data-driven climate model development and improved measures of model scientific uncertainty. One of the thorny issues in model evaluation is quantifying the effect of biases on climate projections. While any bias is not desirable, only those biases that affect feedbacks affect scatter in climate projections. The effort at the University of Texas is to analyze previously calculated ensembles of CAM3.1 with perturbed parameters to discover how biases affect projections of global warming. The hypothesis is that compensating errors in the control model can be identified by their effect on a combination of processes and that developing metrics that are sensitive to dependencies among state variables would provide a way to select versions of climate models that may reduce scatter in climate projections. Gabriel Huerta at the University of New Mexico is responsible for developing statistical methods for evaluating these field dependencies. The UT effort will incorporate these developments into MECS, which is a set of python scripts being developed at the University of Texas for managing the workflow associated with data-driven climate model development over HPC resources. This report reflects the main activities at the University of New Mexico where the PI (Huerta) and the Postdocs (Nosedal, Hattab and Karki) worked on the project.

  11. Landau's statistical mechanics for quasi-particle models

    NASA Astrophysics Data System (ADS)

    Bannur, Vishnu M.

    2014-04-01

    Landau's formalism of statistical mechanics [following L. D. Landau and E. M. Lifshitz, Statistical Physics (Pergamon Press, Oxford, 1980)] is applied to the quasi-particle model of quark-gluon plasma. Here, one starts from the expression for pressure and develops all thermodynamics. It is a general formalism and consistent with our earlier studies [V. M. Bannur, Phys. Lett. B647, 271 (2007)] based on Pathria's formalism [following R. K. Pathria, Statistical Mechanics (Butterworth-Heinemann, Oxford, 1977)]. In Pathria's formalism, one starts from the expression for energy density and develops thermodynamics. Both formalisms are consistent with thermodynamics and statistical mechanics. Under certain conditions, which are wrongly called the thermodynamic consistency relation, we recover other formalisms of quasi-particle systems, such as that in M. I. Gorenstein and S. N. Yang, Phys. Rev. D52, 5206 (1995), widely studied in quark-gluon plasma.

  12. Tree injury and mortality in fires: developing process-based models

    Treesearch

    Bret W. Butler; Matthew B. Dickinson

    2010-01-01

    Wildland fire managers are often required to predict tree injury and mortality when planning a prescribed burn or when considering wildfire management options; and, currently, statistical models based on post-fire observations are the only tools available for this purpose. Implicit in the derivation of statistical models is the assumption that they are strictly...

  13. A smoothed residual based goodness-of-fit statistic for nest-survival models

    Treesearch

    Rodney X. Sturdivant; Jay J. Rotella; Robin E. Russell

    2008-01-01

    Estimating nest success and identifying important factors related to nest-survival rates is an essential goal for many wildlife researchers interested in understanding avian population dynamics. Advances in statistical methods have led to a number of estimation methods and approaches to modeling this problem. Recently developed models allow researchers to include a...

  14. Quantifying uncertainty in climate change science through empirical information theory.

    PubMed

    Majda, Andrew J; Gershgorin, Boris

    2010-08-24

    Quantifying the uncertainty for the present climate and the predictions of climate change in the suite of imperfect Atmosphere Ocean Science (AOS) computer models is a central issue in climate change science. Here, a systematic approach to these issues with firm mathematical underpinning is developed through empirical information theory. An information metric to quantify AOS model errors in the climate is proposed here which incorporates both coarse-grained mean model errors as well as covariance ratios in a transformation invariant fashion. The subtle behavior of model errors with this information metric is quantified in an instructive statistically exactly solvable test model with direct relevance to climate change science including the prototype behavior of tracer gases such as CO2. Formulas for identifying the most sensitive climate change directions using statistics of the present climate or an AOS model approximation are developed here; these formulas just involve finding the eigenvector associated with the largest eigenvalue of a quadratic form computed through suitable unperturbed climate statistics. These climate change concepts are illustrated on a statistically exactly solvable one-dimensional stochastic model with relevance for low frequency variability of the atmosphere. Viable algorithms for implementation of these concepts are discussed throughout the paper.

  15. Risk prediction models of breast cancer: a systematic review of model performances.

    PubMed

    Anothaisintawee, Thunyarat; Teerawattananon, Yot; Wiratkapun, Chollathip; Kasamesup, Vijj; Thakkinstian, Ammarin

    2012-05-01

    An increasing number of risk prediction models have been developed for estimating breast cancer risk in individual women. However, the performance of these models is questionable. We therefore conducted a study to systematically review previous risk prediction models. The results from this review help to identify the most reliable model and indicate the strengths and weaknesses of each model for guiding future model development. We searched MEDLINE (PubMed) from 1949 and EMBASE (Ovid) from 1974 until October 2010. Observational studies which constructed models using regression methods were selected. Information about model development and performance was extracted. Twenty-five out of 453 studies were eligible. Of these, 18 developed prediction models and 7 validated existing prediction models. Up to 13 variables were included in the models and sample sizes for each study ranged from 550 to 2,404,636. Internal validation was performed in four models, while five models had external validation. The Gail and the Rosner and Colditz models were the significant models which were subsequently modified by other scholars. Calibration performance of most models was fair to good (expected/observed ratio: 0.87-1.12), but discriminatory accuracy was poor to fair both in internal validation (concordance statistics: 0.53-0.66) and in external validation (concordance statistics: 0.56-0.63). Most models yielded relatively poor discrimination in both internal and external validation. This poor discriminatory accuracy of existing models might be because of a lack of knowledge about risk factors, heterogeneous subtypes of breast cancer, and different distributions of risk factors across populations. In addition, the concordance statistic itself is insensitive for measuring improvement in discrimination. Therefore, newer methods such as the net reclassification index should be considered to evaluate the improvement in performance of a newly developed model.

  16. A comparison of large-scale climate signals and the North American Multi-Model Ensemble (NMME) for drought prediction in China

    NASA Astrophysics Data System (ADS)

    Xu, Lei; Chen, Nengcheng; Zhang, Xiang

    2018-02-01

    Drought is an extreme natural disaster that can lead to huge socioeconomic losses. Drought prediction months ahead is helpful for early drought warning and preparations. In this study, we developed a statistical model, two weighted dynamic models and a statistical-dynamic (hybrid) model for 1-6 month lead drought prediction in China. Specifically, the statistical component refers to the weighting of climate signals by support vector regression (SVR), the dynamic components consist of the ensemble mean (EM) and Bayesian model averaging (BMA) of the North American Multi-Model Ensemble (NMME) climatic models, and the hybrid part denotes a combination of statistical and dynamic components by assigning weights based on their historical performances. The results indicate that the statistical and hybrid models show better rainfall predictions than the NMME-EM and NMME-BMA models, which have good predictability only in southern China. In the 2011 China winter-spring drought event, the statistical model predicted the spatial extent and severity of drought nationwide well, although the severity was underestimated in the mid-lower reaches of Yangtze River (MLRYR) region. The NMME-EM and NMME-BMA models largely overestimated rainfall in northern and western China in the 2011 drought. In the 2013 China summer drought, the NMME-EM model forecasted the drought extent and severity in eastern China well, while the statistical and hybrid models falsely detected negative precipitation anomaly (NPA) in some areas. Model ensembles such as multiple statistical approaches, multiple dynamic models or multiple hybrid models for drought predictions were highlighted. These conclusions may be helpful for drought prediction and early drought warnings in China.

  17. Empirical support for global integrated assessment modeling: Productivity trends and technological change in developing countries' agriculture and electric power sectors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sathaye, Jayant A.

    2000-04-01

    Integrated assessment (IA) modeling of climate policy is increasingly global in nature, with models incorporating regional disaggregation. The existing empirical basis for IA modeling, however, largely arises from research on industrialized economies. Given the growing importance of developing countries in determining long-term global energy and carbon emissions trends, filling this gap with improved statistical information on developing countries' energy and carbon-emissions characteristics is an important priority for enhancing IA modeling. Earlier research at LBNL on this topic has focused on assembling and analyzing statistical data on productivity trends and technological change in the energy-intensive manufacturing sectors of five developing countries, India, Brazil, Mexico, Indonesia, and South Korea. The proposed work will extend this analysis to the agriculture and electric power sectors in India, South Korea, and two other developing countries. They will also examine the impact of alternative model specifications on estimates of productivity growth and technological change for each of the three sectors, and estimate the contribution of various capital inputs--imported vs. indigenous, rigid vs. malleable-- in contributing to productivity growth and technological change. The project has already produced a data resource on the manufacturing sector which is being shared with IA modelers. This will be extended to the agriculture and electric power sectors, which would also be made accessible to IA modeling groups seeking to enhance the empirical descriptions of developing country characteristics. The project will entail basic statistical and econometric analysis of productivity and energy trends in these developing country sectors, with parameter estimates also made available to modeling groups. The parameter estimates will be developed using alternative model specifications that could be directly utilized by the existing IAMs for the manufacturing, agriculture, and electric power sectors.

  18. The Answer Is in the Question: A Guide for Describing and Investigating the Conceptual Foundations and Statistical Properties of Cognitive Psychometric Models

    ERIC Educational Resources Information Center

    Rupp, Andre A.

    2007-01-01

    One of the most revolutionary advances in psychometric research during the last decades has been the systematic development of statistical models that allow for cognitive psychometric research (CPR) to be conducted. Many of the models currently available for such purposes are extensions of basic latent variable models in item response theory…

  19. Statistical ecology comes of age.

    PubMed

    Gimenez, Olivier; Buckland, Stephen T; Morgan, Byron J T; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-12-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1-4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data.

  20. Statistical ecology comes of age

    PubMed Central

    Gimenez, Olivier; Buckland, Stephen T.; Morgan, Byron J. T.; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M.; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M.; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-01-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1–4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data. PMID:25540151

  1. Security of statistical data bases: invasion of privacy through attribute correlational modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Palley, M.A.

    This study develops, defines, and applies a statistical technique for the compromise of confidential information in a statistical data base. Attribute Correlational Modeling (ACM) recognizes that the information contained in a statistical data base represents real world statistical phenomena. As such, ACM assumes correlational behavior among the database attributes. ACM proceeds to compromise confidential information through creation of a regression model, where the confidential attribute is treated as the dependent variable. The typical statistical data base may preclude the direct application of regression. In this scenario, the research introduces the notion of a synthetic data base, created through legitimate queries of the actual data base, and through proportional random variation of responses to these queries. The synthetic data base is constructed to resemble the actual data base as closely as possible in a statistical sense. ACM then applies regression analysis to the synthetic data base, and utilizes the derived model to estimate confidential information in the actual database.

  2. Incorporating signal-dependent noise for hyperspectral target detection

    NASA Astrophysics Data System (ADS)

    Morman, Christopher J.; Meola, Joseph

    2015-05-01

    The majority of hyperspectral target detection algorithms are developed from statistical data models employing stationary background statistics or white Gaussian noise models. Stationary background models are inaccurate as a result of two separate physical processes. First, varying background classes often exist in the imagery that possess different clutter statistics. Many algorithms can account for this variability through the use of subspaces or clustering techniques. The second physical process, which is often ignored, is a signal-dependent sensor noise term. For photon counting sensors that are often used in hyperspectral imaging systems, sensor noise increases as the measured signal level increases as a result of Poisson random processes. This work investigates the impact of this sensor noise on target detection performance. A linear noise model is developed describing sensor noise variance as a linear function of signal level. The linear noise model is then incorporated for detection of targets using data collected at Wright Patterson Air Force Base.
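
    The linear noise model in this record treats sensor noise variance as a linear function of signal level. The sketch below shows one way such a relation could be estimated, assuming paired measurements of signal level and noise variance and an ordinary least-squares fit. The fitting method, the function names and the sample numbers are assumptions for illustration; the abstract does not say how the authors estimated their model.

```python
def fit_linear_noise_model(signal_levels, noise_variances):
    """Least-squares fit of variance = offset + slope * signal.

    Returns (offset, slope). The offset plays the role of a
    signal-independent noise floor and the slope the signal-dependent term.
    """
    n = len(signal_levels)
    if n < 2 or n != len(noise_variances):
        raise ValueError("need two or more paired observations")
    mean_signal = sum(signal_levels) / n
    mean_variance = sum(noise_variances) / n
    sxx = sum((s - mean_signal) ** 2 for s in signal_levels)
    if sxx == 0.0:
        raise ValueError("signal levels must not all be equal")
    sxy = sum(
        (s - mean_signal) * (v - mean_variance)
        for s, v in zip(signal_levels, noise_variances)
    )
    slope = sxy / sxx
    offset = mean_variance - slope * mean_signal
    return offset, slope


def predicted_noise_variance(signal, offset, slope):
    """Noise variance the fitted linear model assigns to a given signal level."""
    return offset + slope * signal


# Made-up measurements lying exactly on variance = 2 + 0.5 * signal.
signals = [10.0, 20.0, 30.0, 40.0]
variances = [7.0, 12.0, 17.0, 22.0]
offset, slope = fit_linear_noise_model(signals, variances)
variance_at_50 = predicted_noise_variance(50.0, offset, slope)
```

    A positive slope is what Poisson-type photon-counting noise would produce, since variance grows with the measured signal; a detector could then use a per-pixel variance from this model in place of a single stationary noise level.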

  3. A Statistical Skull Geometry Model for Children 0-3 Years Old

    PubMed Central

    Li, Zhigang; Park, Byoung-Keon; Liu, Weiguo; Zhang, Jinhuan; Reed, Matthew P.; Rupp, Jonathan D.; Hoff, Carrie N.; Hu, Jingwen

    2015-01-01

    Head injury is the leading cause of fatality and long-term disability for children. Pediatric heads change rapidly in both size and shape during growth, especially for children under 3 years old (YO). To accurately assess the head injury risks for children, it is necessary to understand the geometry of the pediatric head and how morphologic features influence injury causation within the 0–3 YO population. In this study, head CT scans from fifty-six 0–3 YO children were used to develop a statistical model of pediatric skull geometry. Geometric features important for injury prediction, including skull size and shape, skull thickness and suture width, along with their variations among the sample population, were quantified through a series of image and statistical analyses. The size and shape of the pediatric skull change significantly with age and head circumference. The skull thickness and suture width vary with age, head circumference and location, which will have important effects on skull stiffness and injury prediction. The statistical geometry model developed in this study can provide a geometrical basis for future development of child anthropomorphic test devices and pediatric head finite element models. PMID:25992998

  4. A statistical skull geometry model for children 0-3 years old.

    PubMed

    Li, Zhigang; Park, Byoung-Keon; Liu, Weiguo; Zhang, Jinhuan; Reed, Matthew P; Rupp, Jonathan D; Hoff, Carrie N; Hu, Jingwen

    2015-01-01

    Head injury is the leading cause of fatality and long-term disability for children. Pediatric heads change rapidly in both size and shape during growth, especially for children under 3 years old (YO). To accurately assess the head injury risks for children, it is necessary to understand the geometry of the pediatric head and how morphologic features influence injury causation within the 0-3 YO population. In this study, head CT scans from fifty-six 0-3 YO children were used to develop a statistical model of pediatric skull geometry. Geometric features important for injury prediction, including skull size and shape, skull thickness and suture width, along with their variations among the sample population, were quantified through a series of image and statistical analyses. The size and shape of the pediatric skull change significantly with age and head circumference. The skull thickness and suture width vary with age, head circumference and location, which will have important effects on skull stiffness and injury prediction. The statistical geometry model developed in this study can provide a geometrical basis for future development of child anthropomorphic test devices and pediatric head finite element models.

  5. New statistical scission-point model to predict fission fragment observables

    NASA Astrophysics Data System (ADS)

    Lemaître, Jean-François; Panebianco, Stefano; Sida, Jean-Luc; Hilaire, Stéphane; Heinrich, Sophie

    2015-09-01

    The development of high performance computing facilities makes possible a massive production of nuclear data in a full microscopic framework. Taking advantage of the individual potential calculations of more than 7000 nuclei, a new statistical scission-point model, called SPY, has been developed. It gives access to the absolute available energy at the scission point, which allows the use of a parameter-free microcanonical statistical description to calculate the distributions and the mean values of all fission observables. SPY uses the richness of microscopy in a rather simple theoretical framework, without any parameter except the scission-point definition, to draw clear answers based on perfect knowledge of the ingredients involved in the model, with very limited computing cost.

  6. Statistical prescission point model of fission fragment angular distributions

    NASA Astrophysics Data System (ADS)

    John, Bency; Kataria, S. K.

    1998-03-01

    In light of recent developments in fission studies such as slow saddle to scission motion and spin equilibration near the scission point, the theory of fission fragment angular distribution is examined and a new statistical prescission point model is developed. The conditional equilibrium of the collective angular bearing modes at the prescission point, which is guided mainly by their relaxation times and population probabilities, is taken into account in the present model. The present model gives a consistent description of the fragment angular and spin distributions for a wide variety of heavy and light ion induced fission reactions.

  7. Modeling the spatial distribution of landslide-prone colluvium and shallow groundwater on hillslopes of Seattle, WA

    USGS Publications Warehouse

    Schulz, W.H.; Lidke, D.J.; Godt, J.W.

    2008-01-01

    Landslides in partially saturated colluvium on Seattle, WA, hillslopes have resulted in property damage and human casualties. We developed statistical models of colluvium and shallow-groundwater distributions to aid landslide hazard assessments. The models were developed using a geographic information system, digital geologic maps, digital topography, subsurface exploration results, the groundwater flow modeling software VS2DI and regression analyses. Input to the colluvium model includes slope, distance to a hillslope-crest escarpment, and escarpment slope and height. We developed different statistical relations for thickness of colluvium on four landforms. Groundwater model input includes colluvium basal slope and distance from the Fraser aquifer. This distance was used to estimate hydraulic conductivity based on the assumption that addition of finer-grained material from down-section would result in lower conductivity. Colluvial groundwater is perched so we estimated its saturated thickness. We used VS2DI to establish relations between saturated thickness and the hydraulic conductivity and basal slope of the colluvium. We developed different statistical relations for three groundwater flow regimes. All model results were validated using observational data that were excluded from calibration. Eighty percent of colluvium thickness predictions were within 25% of observed values and 88% of saturated thickness predictions were within 20% of observed values. The models are based on conditions common to many areas, so our method can provide accurate results for similar regions; relations in our statistical models require calibration for new regions. Our results suggest that Seattle landslides occur in native deposits and colluvium, ultimately in response to surface-water erosion of hillslope toes. Regional groundwater conditions do not appear to strongly affect the general distribution of Seattle landslides; historical landslides were equally dispersed within and outside of the area potentially affected by regional groundwater conditions.

  8. ROTAS: a rotamer-dependent, atomic statistical potential for assessment and prediction of protein structures.

    PubMed

    Park, Jungkap; Saitou, Kazuhiro

    2014-09-18

    Multibody potentials accounting for cooperative effects of molecular interactions have shown better accuracy than typical pairwise potentials. The main challenge in the development of such potentials is to find relevant structural features that characterize the tightly folded proteins. Also, the side-chains of residues adopt several specific, staggered conformations, known as rotamers, within protein structures. Different molecular conformations result in different dipole moments and induce charge reorientations. However, until now modeling of the rotameric state of residues had not been incorporated into the development of multibody potentials for modeling non-bonded interactions in protein structures. In this study, we develop a new multibody statistical potential which can account for the influence of rotameric states on the specificity of atomic interactions. In this potential, named "rotamer-dependent atomic statistical potential" (ROTAS), the interaction between two atoms is specified not only by the distance and relative orientation but also by two state parameters concerning the rotameric state of the residues to which the interacting atoms belong. It was clearly found that the rotameric state is correlated to the specificity of atomic interactions. Such rotamer-dependencies are not limited to a specific type or a certain range of interactions. The performance of ROTAS was tested using 13 sets of decoys and was compared to those of existing atomic-level statistical potentials which incorporate orientation-dependent energy terms. The results show that ROTAS performs better than other competing potentials not only in native structure recognition, but also in best model selection and correlation coefficients between energy and model quality. A new multibody statistical potential, ROTAS, accounting for the influence of rotameric states on the specificity of atomic interactions, was developed and tested on decoy sets. The results show that ROTAS has improved ability to recognize native structure from decoy models compared to other potentials. The effectiveness of ROTAS may provide insightful information for the development of many applications which require accurate side-chain modeling such as protein design, mutation analysis, and docking simulation.

  9. On Lack of Robustness in Hydrological Model Development Due to Absence of Guidelines for Selecting Calibration and Evaluation Data: Demonstration for Data-Driven Models

    NASA Astrophysics Data System (ADS)

    Zheng, Feifei; Maier, Holger R.; Wu, Wenyan; Dandy, Graeme C.; Gupta, Hoshin V.; Zhang, Tuqiao

    2018-02-01

    Hydrological models are used for a wide variety of engineering purposes, including streamflow forecasting and flood-risk estimation. To develop such models, it is common to allocate the available data to calibration and evaluation data subsets. Surprisingly, the issue of how this allocation can affect model evaluation performance has been largely ignored in the research literature. This paper discusses the evaluation performance bias that can arise from how available data are allocated to calibration and evaluation subsets. As a first step to assessing this issue in a statistically rigorous fashion, we present a comprehensive investigation of the influence of data allocation on the development of data-driven artificial neural network (ANN) models of streamflow. Four well-known formal data splitting methods are applied to 754 catchments from Australia and the U.S. to develop 902,483 ANN models. Results clearly show that the choice of the method used for data allocation has a significant impact on model performance, particularly for runoff data that are more highly skewed, highlighting the importance of considering the impact of data splitting when developing hydrological models. The statistical behavior of the data splitting methods investigated is discussed and guidance is offered on the selection of the most appropriate data splitting methods to achieve representative evaluation performance for streamflow data with different statistical properties. Although our results are obtained for data-driven models, they highlight the fact that this issue is likely to have a significant impact on all types of hydrological models, especially conceptual rainfall-runoff models.

  10. A cloud and radiation model-based algorithm for rainfall retrieval from SSM/I multispectral microwave measurements

    NASA Technical Reports Server (NTRS)

    Xiang, Xuwu; Smith, Eric A.; Tripoli, Gregory J.

    1992-01-01

    A hybrid statistical-physical retrieval scheme is explored which combines a statistical approach with an approach based on the development of cloud-radiation models designed to simulate precipitating atmospheres. The algorithm employs the detailed microphysical information from a cloud model as input to a radiative transfer model which generates a cloud-radiation model database. Statistical procedures are then invoked to objectively generate an initial guess composite profile data set from the database. The retrieval algorithm has been tested for a tropical typhoon case using Special Sensor Microwave/Imager (SSM/I) data and has shown satisfactory results.

  11. The epistemology of mathematical and statistical modeling: a quiet methodological revolution.

    PubMed

    Rodgers, Joseph Lee

    2010-01-01

    A quiet methodological revolution, a modeling revolution, has occurred over the past several decades, almost without discussion. In contrast, the 20th century ended with contentious argument over the utility of null hypothesis significance testing (NHST). The NHST controversy may have been at least partially irrelevant, because in certain ways the modeling revolution obviated the NHST argument. I begin with a history of NHST and modeling and their relation to one another. Next, I define and illustrate principles involved in developing and evaluating mathematical models. Following, I discuss the difference between using statistical procedures within a rule-based framework and building mathematical models from a scientific epistemology. Only the former is treated carefully in most psychology graduate training. The pedagogical implications of this imbalance and the revised pedagogy required to account for the modeling revolution are described. To conclude, I discuss how attention to modeling implies shifting statistical practice in certain progressive ways. The epistemological basis of statistics has moved away from being a set of procedures, applied mechanistically, and moved toward building and evaluating statistical and scientific models. Copyright 2009 APA, all rights reserved.

  12. Maximum entropy models of ecosystem functioning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bertram, Jason, E-mail: jason.bertram@anu.edu.au

    2014-12-05

    Using organism-level traits to deduce community-level relationships is a fundamental problem in theoretical ecology. This problem parallels the physical one of using particle properties to deduce macroscopic thermodynamic laws, which was successfully achieved with the development of statistical physics. Drawing on this parallel, theoretical ecologists from Lotka onwards have attempted to construct statistical mechanistic theories of ecosystem functioning. Jaynes’ broader interpretation of statistical mechanics, which hinges on the entropy maximisation algorithm (MaxEnt), is of central importance here because the classical foundations of statistical physics do not have clear ecological analogues (e.g. phase space, dynamical invariants). However, models based on the information theoretic interpretation of MaxEnt are difficult to interpret ecologically. Here I give a broad discussion of statistical mechanical models of ecosystem functioning and the application of MaxEnt in these models. Emphasising the sample frequency interpretation of MaxEnt, I show that MaxEnt can be used to construct models of ecosystem functioning which are statistical mechanical in the traditional sense using a savanna plant ecology model as an example.

  13. Developing Risk Prediction Models for Kidney Injury and Assessing Incremental Value for Novel Biomarkers

    PubMed Central

    Kerr, Kathleen F.; Meisner, Allison; Thiessen-Philbrook, Heather; Coca, Steven G.

    2014-01-01

    The field of nephrology is actively involved in developing biomarkers and improving models for predicting patients’ risks of AKI and CKD and their outcomes. However, some important aspects of evaluating biomarkers and risk models are not widely appreciated, and statistical methods are still evolving. This review describes some of the most important statistical concepts for this area of research and identifies common pitfalls. Particular attention is paid to metrics proposed within the last 5 years for quantifying the incremental predictive value of a new biomarker. PMID:24855282

  14. Exploring Contextual Models in Chemical Patent Search

    NASA Astrophysics Data System (ADS)

    Urbain, Jay; Frieder, Ophir

    We explore the development of probabilistic retrieval models for integrating term statistics with entity search using multiple levels of document context to improve the performance of chemical patent search. A distributed indexing model was developed to enable efficient named entity search and aggregation of term statistics at multiple levels of patent structure including individual words, sentences, claims, descriptions, abstracts, and titles. The system can be scaled to an arbitrary number of compute instances in a cloud computing environment to support concurrent indexing and query processing operations on large patent collections.

  15. Analysis of model development strategies: predicting ventral hernia recurrence.

    PubMed

    Holihan, Julie L; Li, Linda T; Askenasy, Erik P; Greenberg, Jacob A; Keith, Jerrod N; Martindale, Robert G; Roth, J Scott; Liang, Mike K

    2016-11-01

    There have been many attempts to identify variables associated with ventral hernia recurrence; however, it is unclear which statistical modeling approach results in models with greatest internal and external validity. We aim to assess the predictive accuracy of models developed using five common variable selection strategies to determine variables associated with hernia recurrence. Two multicenter ventral hernia databases were used. Database 1 was randomly split into "development" and "internal validation" cohorts. Database 2 was designated "external validation". The dependent variable for model development was hernia recurrence. Five variable selection strategies were used: (1) "clinical"-variables considered clinically relevant, (2) "selective stepwise"-all variables with a P value <0.20 were assessed in a step-backward model, (3) "liberal stepwise"-all variables were included and step-backward regression was performed, (4) "restrictive internal resampling," and (5) "liberal internal resampling." Variables were included with P < 0.05 for the Restrictive model and P < 0.10 for the Liberal model. A time-to-event analysis using Cox regression was performed using these strategies. The predictive accuracy of the developed models was tested on the internal and external validation cohorts using Harrell's C-statistic where C > 0.70 was considered "reasonable". The recurrence rate was 32.9% (n = 173/526; median/range follow-up, 20/1-58 mo) for the development cohort, 36.0% (n = 95/264, median/range follow-up 20/1-61 mo) for the internal validation cohort, and 12.7% (n = 155/1224, median/range follow-up 9/1-50 mo) for the external validation cohort. Internal validation demonstrated reasonable predictive accuracy (C-statistics = 0.772, 0.760, 0.767, 0.757, 0.763), while on external validation, predictive accuracy dipped precipitously (C-statistic = 0.561, 0.557, 0.562, 0.553, 0.560). Predictive accuracy was equally adequate on internal validation among models; however, on external validation, all five models failed to demonstrate utility. Future studies should report multiple variable selection techniques and demonstrate predictive accuracy on external data sets for model validation. Copyright © 2016 Elsevier Inc. All rights reserved.
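
    Harrell's C-statistic, the accuracy measure used in this abstract, can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `harrell_c` and the toy data are hypothetical, and the sketch ignores tied follow-up times for simplicity. A pair of subjects is comparable when the one with the shorter follow-up had an observed event; the pair is concordant when that subject also has the higher predicted risk.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i has the shorter follow-up
    time and an observed event. The pair is concordant when subject i also
    has the higher predicted risk; tied risk scores count as one half.
    Pairs with tied follow-up times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times (months), event flags (1 = recurrence), risk scores.
times = [5, 12, 20, 30, 44]
events = [1, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.5, 0.1]
print(harrell_c(times, events, scores))  # 6 of 8 comparable pairs are concordant
```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why values near 0.56 on external validation indicate little discriminative ability.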

  16. Comparative evaluation of statistical and mechanistic models of Escherichia coli at beaches in southern Lake Michigan

    USGS Publications Warehouse

    Safaie, Ammar; Wendzel, Aaron; Ge, Zhongfu; Nevers, Meredith; Whitman, Richard L.; Corsi, Steven R.; Phanikumar, Mantha S.

    2016-01-01

    Statistical and mechanistic models are popular tools for predicting the levels of indicator bacteria at recreational beaches. Researchers tend to use one class of model or the other, and it is difficult to generalize statements about their relative performance due to differences in how the models are developed, tested, and used. We describe a cooperative modeling approach for freshwater beaches impacted by point sources in which insights derived from mechanistic modeling were used to further improve the statistical models and vice versa. The statistical models provided a basis for assessing the mechanistic models which were further improved using probability distributions to generate high-resolution time series data at the source, long-term “tracer” transport modeling based on observed electrical conductivity, better assimilation of meteorological data, and the use of unstructured-grids to better resolve nearshore features. This approach resulted in improved models of comparable performance for both classes including a parsimonious statistical model suitable for real-time predictions based on an easily measurable environmental variable (turbidity). The modeling approach outlined here can be used at other sites impacted by point sources and has the potential to improve water quality predictions resulting in more accurate estimates of beach closures.

  17. Development of uncertainty-based work injury model using Bayesian structural equation modelling.

    PubMed

    Chatterjee, Snehamoy

    2014-01-01

    This paper proposed a Bayesian method-based structural equation model (SEM) of miners' work injury for an underground coal mine in India. The environmental and behavioural variables for work injury were identified and causal relationships were developed. For Bayesian modelling, prior distributions of SEM parameters are necessary to develop the model. In this paper, two approaches were adopted to obtain prior distributions for the factor loading parameters and structural parameters of the SEM. In the first approach, the prior distributions were considered as a fixed distribution function with specific parameter values, whereas, in the second approach, prior distributions of the parameters were generated from experts' opinions. The posterior distributions of these parameters were obtained by applying Bayes' rule. Markov Chain Monte Carlo sampling in the form of Gibbs sampling was applied for sampling from the posterior distribution. The results revealed that all coefficients of structural and measurement model parameters are statistically significant with experts' opinion-based priors, whereas two coefficients are not statistically significant when fixed prior-based distributions are applied. The error statistics reveal that the Bayesian structural model provides a reasonably good fit of work injury, with a high coefficient of determination (0.91) and lower mean squared error than traditional SEM.

  18. Spatial Dynamics and Determinants of County-Level Education Expenditure in China

    ERIC Educational Resources Information Center

    Gu, Jiafeng

    2012-01-01

    In this paper, a multivariate spatial autoregressive model of local public education expenditure determination with autoregressive disturbance is developed and estimated. The existence of spatial interdependence is tested using Moran's I statistic and Lagrange multiplier test statistics for both the spatial error and spatial lag models. The full…
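
    Moran's I, the statistic this abstract uses to test for spatial interdependence, can be computed as in the following minimal sketch. The function name `morans_i`, the four-region chain, and the adjacency weights are hypothetical and chosen only for illustration; the formula is the standard one, I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all weights.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * num / den


# Four hypothetical regions arranged in a line; neighbours share a border.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], chain))    # smooth gradient: positive autocorrelation
print(morans_i([1, -1, 1, -1], chain))  # alternating values: negative autocorrelation
```

    Positive values indicate that neighbouring regions tend to have similar values, and negative values indicate that they tend to differ.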

  19. Additive hazards regression and partial likelihood estimation for ecological monitoring data across space.

    PubMed

    Lin, Feng-Chang; Zhu, Jun

    2012-01-01

    We develop continuous-time models for the analysis of environmental or ecological monitoring data such that subjects are observed at multiple monitoring time points across space. Of particular interest are additive hazards regression models where the baseline hazard function can take on flexible forms. We consider time-varying covariates and take into account spatial dependence via autoregression in space and time. We develop statistical inference for the regression coefficients via partial likelihood. Asymptotic properties, including consistency and asymptotic normality, are established for parameter estimates under suitable regularity conditions. Feasible algorithms utilizing existing statistical software packages are developed for computation. We also consider a simpler additive hazards model with homogeneous baseline hazard and develop hypothesis testing for homogeneity. A simulation study demonstrates that the statistical inference using partial likelihood has sound finite-sample properties and offers a viable alternative to maximum likelihood estimation. For illustration, we analyze data from an ecological study that monitors bark beetle colonization of red pines in a plantation of Wisconsin.

  20. Bayesian models: A statistical primer for ecologists

    USGS Publications Warehouse

    Hobbs, N. Thompson; Hooten, Mevin B.

    2015-01-01

    Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods, in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach. Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability and develops a step-by-step sequence of connected ideas, including basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and inference from single and multiple models. This unique book places less emphasis on computer coding, favoring instead a concise presentation of the mathematical statistics needed to understand how and why Bayesian analysis works. It also explains how to write out properly formulated hierarchical Bayesian models and use them in computing, research papers, and proposals. This primer enables ecologists to understand the statistical principles behind Bayesian modeling and apply them to research, teaching, policy, and management. Presents the mathematical and statistical foundations of Bayesian modeling in language accessible to non-statisticians. Covers basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and more. Deemphasizes computer coding in favor of basic principles. Explains how to write out properly factored statistical expressions representing Bayesian models.

  1. Web 2.0 Articles: Content Analysis and a Statistical Model to Predict Recognition of the Need for New Instructional Design Strategies

    ERIC Educational Resources Information Center

    Liu, Leping; Maddux, Cleborne D.

    2008-01-01

    This article presents a study of Web 2.0 articles intended to (a) analyze the content of what is written and (b) develop a statistical model to predict whether authors write about the need for new instructional design strategies and models. Eighty-eight technology articles were subjected to lexical analysis and a logistic regression model was…

  2. Monte Carlo based statistical power analysis for mediation models: methods and software.

    PubMed

    Zhang, Zhiyong

    2014-12-01

    The existing literature on statistical power analysis for mediation models often assumes data normality and is based on a less powerful Sobel test instead of the more powerful bootstrap test. This study proposes to estimate statistical power to detect mediation effects on the basis of the bootstrap method through Monte Carlo simulation. Nonnormal data with excessive skewness and kurtosis are allowed in the proposed method. A free R package called bmem is developed to conduct the power analysis discussed in this study. Four examples, including a simple mediation model, a multiple-mediator model with a latent mediator, a multiple-group mediation model, and a longitudinal mediation model, are provided to illustrate the proposed method.
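
    The idea of estimating power for a mediation effect by combining Monte Carlo simulation with a bootstrap test can be sketched as below. This is a simplified illustration under stated assumptions, not the bmem package's implementation: it uses a single-mediator model with normal errors (the method in the abstract also allows nonnormal data), a percentile bootstrap interval, and hypothetical function names and parameter values.

```python
import random


def indirect_effect(x, m, y):
    """Estimate a*b, where m ~ a*x and y ~ b*m + c'*x (ordinary least squares)."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (smm * sxx - sxm ** 2)
    return a * b


def bootstrap_detects(x, m, y, n_boot, rng, alpha=0.05):
    """True if the percentile bootstrap interval for a*b excludes zero."""
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ms = [m[i] for i in idx]
        ys = [y[i] for i in idx]
        estimates.append(indirect_effect(xs, ms, ys))
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo > 0 or hi < 0


def estimate_power(a, b, c_prime, n, n_rep, n_boot, seed=0):
    """Share of simulated data sets in which the bootstrap test detects a*b."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        x = [rng.gauss(0, 1) for _ in range(n)]
        m = [a * xi + rng.gauss(0, 1) for xi in x]
        y = [b * mi + c_prime * xi + rng.gauss(0, 1) for mi, xi in zip(m, x)]
        hits += bootstrap_detects(x, m, y, n_boot, rng)
    return hits / n_rep


# Hypothetical design: medium-sized paths, 50 subjects, small replication counts.
print(estimate_power(a=0.5, b=0.5, c_prime=0.0, n=50, n_rep=40, n_boot=200))
```

    The replication counts here are kept small so the sketch runs quickly; a real power analysis would use many more Monte Carlo replications and bootstrap draws.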

  3. Statistical Techniques Complement UML When Developing Domain Models of Complex Dynamical Biosystems.

    PubMed

    Williams, Richard A; Timmis, Jon; Qwarnstrom, Eva E

    2016-01-01

    Computational modelling and simulation is increasingly being used to complement traditional wet-lab techniques when investigating the mechanistic behaviours of complex biological systems. In order to ensure computational models are fit for purpose, it is essential that the abstracted view of biology captured in the computational model, is clearly and unambiguously defined within a conceptual model of the biological domain (a domain model), that acts to accurately represent the biological system and to document the functional requirements for the resultant computational model. We present a domain model of the IL-1 stimulated NF-κB signalling pathway, which unambiguously defines the spatial, temporal and stochastic requirements for our future computational model. Through the development of this model, we observe that, in isolation, UML is not sufficient for the purpose of creating a domain model, and that a number of descriptive and multivariate statistical techniques provide complementary perspectives, in particular when modelling the heterogeneity of dynamics at the single-cell level. We believe this approach of using UML to define the structure and interactions within a complex system, along with statistics to define the stochastic and dynamic nature of complex systems, is crucial for ensuring that conceptual models of complex dynamical biosystems, which are developed using UML, are fit for purpose, and unambiguously define the functional requirements for the resultant computational model.

  4. Statistical Techniques Complement UML When Developing Domain Models of Complex Dynamical Biosystems

    PubMed Central

    Timmis, Jon; Qwarnstrom, Eva E.

    2016-01-01

    Computational modelling and simulation is increasingly being used to complement traditional wet-lab techniques when investigating the mechanistic behaviours of complex biological systems. In order to ensure computational models are fit for purpose, it is essential that the abstracted view of biology captured in the computational model, is clearly and unambiguously defined within a conceptual model of the biological domain (a domain model), that acts to accurately represent the biological system and to document the functional requirements for the resultant computational model. We present a domain model of the IL-1 stimulated NF-κB signalling pathway, which unambiguously defines the spatial, temporal and stochastic requirements for our future computational model. Through the development of this model, we observe that, in isolation, UML is not sufficient for the purpose of creating a domain model, and that a number of descriptive and multivariate statistical techniques provide complementary perspectives, in particular when modelling the heterogeneity of dynamics at the single-cell level. We believe this approach of using UML to define the structure and interactions within a complex system, along with statistics to define the stochastic and dynamic nature of complex systems, is crucial for ensuring that conceptual models of complex dynamical biosystems, which are developed using UML, are fit for purpose, and unambiguously define the functional requirements for the resultant computational model. PMID:27571414

  5. Development of Composite Materials with High Passive Damping Properties

    DTIC Science & Technology

    2006-05-15

    frequency response function analysis. Sound transmission through sandwich panels was studied using the statistical energy analysis (SEA). Modal density...2.2.3 Finite element models 14 2.2.4 Statistical energy analysis method 15 CHAPTER 3 ANALYSIS OF DAMPING IN SANDWICH MATERIALS. 24 3.1 Equation of...sheets and the core. 2.2.4 Statistical energy analysis method Finite element models are generally only efficient for problems at low and middle frequencies

  6. Watershed Regressions for Pesticides (WARP) for Predicting Annual Maximum and Annual Maximum Moving-Average Concentrations of Atrazine in Streams

    USGS Publications Warehouse

    Stone, Wesley W.; Gilliom, Robert J.; Crawford, Charles G.

    2008-01-01

    Regression models were developed for predicting annual maximum and selected annual maximum moving-average concentrations of atrazine in streams using the Watershed Regressions for Pesticides (WARP) methodology developed by the National Water-Quality Assessment Program (NAWQA) of the U.S. Geological Survey (USGS). The current effort builds on the original WARP models, which were based on the annual mean and selected percentiles of the annual frequency distribution of atrazine concentrations. Estimates of annual maximum and annual maximum moving-average concentrations for selected durations are needed to characterize the levels of atrazine and other pesticides for comparison to specific water-quality benchmarks for evaluation of potential concerns regarding human health or aquatic life. Separate regression models were derived for the annual maximum and annual maximum 21-day, 60-day, and 90-day moving-average concentrations. Development of the regression models used the same explanatory variables, transformations, model development data, model validation data, and regression methods as those used in the original development of WARP. The models accounted for 72 to 75 percent of the variability in the concentration statistics among the 112 sampling sites used for model development. Predicted concentration statistics from the four models were within a factor of 10 of the observed concentration statistics for most of the model development and validation sites. Overall, performance of the models for the development and validation sites supports the application of the WARP models for predicting annual maximum and selected annual maximum moving-average atrazine concentration in streams and provides a framework to interpret the predictions in terms of uncertainty. For streams with inadequate direct measurements of atrazine concentrations, the WARP model predictions for the annual maximum and the annual maximum moving-average atrazine concentrations can be used to characterize the probable levels of atrazine for comparison to specific water-quality benchmarks. Sites with a high probability of exceeding a benchmark for human health or aquatic life can be prioritized for monitoring.

  7. Evaluation of a New Mean Scaled and Moment Adjusted Test Statistic for SEM

    ERIC Educational Resources Information Center

    Tong, Xiaoxiao; Bentler, Peter M.

    2013-01-01

    Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and 2 well-known robust test…

  8. Data-driven fuel consumption estimation: A multivariate adaptive regression spline approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Yuche; Zhu, Lei; Gonder, Jeffrey

    Providing guidance and information to drivers to help them make fuel-efficient route choices remains an important and effective strategy in the near term to reduce fuel consumption from the transportation sector. One key component in implementing this strategy is a fuel-consumption estimation model. In this paper, we developed a mesoscopic fuel consumption estimation model that can be implemented into an eco-routing system. Our proposed model presents a framework that utilizes large-scale, real-world driving data, clusters road links by free-flow speed and fits one statistical model for each cluster. This model includes predicting variables that were rarely or never considered before, such as free-flow speed and number of lanes. We applied the model to a real-world driving data set based on a global positioning system travel survey in the Philadelphia-Camden-Trenton metropolitan area. Results from the statistical analyses indicate that the independent variables we chose influence the fuel consumption rates of vehicles. But the magnitude and direction of the influences are dependent on the type of road links, specifically free-flow speeds of links. Here, a statistical diagnostic is conducted to ensure the validity of the models and results. Although the real-world driving data we used to develop statistical relationships are specific to one region, the framework we developed can be easily adjusted and used to explore the fuel consumption relationship in other regions.

  9. Data-driven fuel consumption estimation: A multivariate adaptive regression spline approach

    DOE PAGES

    Chen, Yuche; Zhu, Lei; Gonder, Jeffrey; ...

    2017-08-12

    Providing guidance and information to drivers to help them make fuel-efficient route choices remains an important and effective strategy in the near term to reduce fuel consumption from the transportation sector. One key component in implementing this strategy is a fuel-consumption estimation model. In this paper, we developed a mesoscopic fuel consumption estimation model that can be implemented into an eco-routing system. Our proposed model presents a framework that utilizes large-scale, real-world driving data, clusters road links by free-flow speed and fits one statistical model for each cluster. This model includes predicting variables that were rarely or never considered before, such as free-flow speed and number of lanes. We applied the model to a real-world driving data set based on a global positioning system travel survey in the Philadelphia-Camden-Trenton metropolitan area. Results from the statistical analyses indicate that the independent variables we chose influence the fuel consumption rates of vehicles. But the magnitude and direction of the influences are dependent on the type of road links, specifically free-flow speeds of links. Here, a statistical diagnostic is conducted to ensure the validity of the models and results. Although the real-world driving data we used to develop statistical relationships are specific to one region, the framework we developed can be easily adjusted and used to explore the fuel consumption relationship in other regions.

  10. Development of a mathematical model for the dissolution of uranium dioxide. II. Statistical model for the dissolution of uranium dioxide tablets in nitric acid

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhukovskii, Yu.M.; Luksha, O.P.; Nenarokomov, E.A.

    1988-03-01

    We have derived a statistical model for the dissolution of uranium dioxide tablets for the 6 to 12 M concentration range and temperatures from 80 °C to the boiling point. The model differs qualitatively from the dissolution model for ground uranium dioxide. In the indicated range of experimental conditions, the mean-square deviation of the curves for the model from the experimental curves is not greater than 6%.

  11. Recent updates in developing a statistical pseudo-dynamic source-modeling framework to capture the variability of earthquake rupture scenarios

    NASA Astrophysics Data System (ADS)

    Song, Seok Goo; Kwak, Sangmin; Lee, Kyungbook; Park, Donghee

    2017-04-01

    Predicting the intensity and variability of strong ground motions is a critical element of seismic hazard assessment. The characteristics and variability of the earthquake rupture process may be a dominant factor in determining the intensity and variability of near-source strong ground motions. Song et al. (2014) demonstrated that the variability of earthquake rupture scenarios could be effectively quantified in the framework of 1-point and 2-point statistics of earthquake source parameters, constrained by rupture dynamics and past events. The developed pseudo-dynamic source modeling schemes were also validated against the recorded ground motion data of past events and empirical ground motion prediction equations (GMPEs) at the broadband platform (BBP) developed by the Southern California Earthquake Center (SCEC). Recently we improved the computational efficiency of the developed pseudo-dynamic source-modeling scheme by adopting the nonparametric co-regionalization algorithm, initially introduced and applied in geostatistics. We also investigated the effect of the earthquake rupture process on near-source ground motion characteristics in the framework of 1-point and 2-point statistics, particularly focusing on the forward directivity region. Finally we will discuss whether the pseudo-dynamic source modeling can reproduce the variability (standard deviation) of empirical GMPEs and the efficiency of 1-point and 2-point statistics to address the variability of ground motions.

  12. Statistically Qualified Neuro-Analytic system and Method for Process Monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vilim, Richard B.; Garcia, Humberto E.; Chen, Frederick W.

    1998-11-04

    An apparatus and method for monitoring a process involves development and application of a statistically qualified neuro-analytic (SQNA) model to accurately and reliably identify process change. The development of the SQNA model is accomplished in two steps: deterministic model adaptation and stochastic model adaptation. Deterministic model adaptation involves formulating an analytic model of the process representing known process characteristics, augmenting the analytic model with a neural network that captures unknown process characteristics, and training the resulting neuro-analytic model by adjusting the neural network weights according to a unique scaled equation error minimization technique. Stochastic model adaptation involves qualifying any remaining uncertainty in the trained neuro-analytic model by formulating a likelihood function, given an error propagation equation, for computing the probability that the neuro-analytic model generates measured process output. Preferably, the developed SQNA model is validated using known sequential probability ratio tests and applied to the process as an on-line monitoring system.
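
    The abstract above mentions validation with sequential probability ratio tests but gives no details. As a rough illustration only, the sketch below applies Wald's classic test to model residuals, assuming (hypothetically) Gaussian residuals with known standard deviation and a process change that shows up as a fixed mean shift. The function name, parameters, and error rates are illustrative and are not taken from the record.

```python
import math


def sprt_mean_shift(residuals, shift, sigma, alpha=0.01, beta=0.01):
    """Wald SPRT for H0: mean 0 versus H1: mean `shift` (Gaussian, known sigma).

    alpha is the tolerated false-alarm rate, beta the tolerated miss rate.
    Returns (decision, number_of_samples_used).
    """
    upper = math.log((1.0 - beta) / alpha)  # crossing it favours H1 (change)
    lower = math.log(beta / (1.0 - alpha))  # crossing it favours H0 (no change)
    llr = 0.0  # cumulative log-likelihood ratio
    for count, r in enumerate(residuals, start=1):
        # Log-likelihood ratio contribution of one Gaussian observation.
        llr += (shift / sigma ** 2) * (r - shift / 2.0)
        if llr >= upper:
            return "changed", count
        if llr <= lower:
            return "normal", count
    return "undecided", len(residuals)


# Hypothetical residual streams (measured output minus model output).
print(sprt_mean_shift([1.0] * 20, shift=1.0, sigma=1.0))
print(sprt_mean_shift([0.0] * 20, shift=1.0, sigma=1.0))
```

    The test stops as soon as the accumulated evidence crosses either threshold, which is what makes it attractive for on-line monitoring.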

  13. Statistical Ensemble of Large Eddy Simulations

    NASA Technical Reports Server (NTRS)

    Carati, Daniele; Rogers, Michael M.; Wray, Alan A.; Mansour, Nagi N. (Technical Monitor)

    2001-01-01

    A statistical ensemble of large eddy simulations (LES) is run simultaneously for the same flow. The information provided by the different large scale velocity fields is used to propose an ensemble averaged version of the dynamic model. This produces local model parameters that only depend on the statistical properties of the flow. An important property of the ensemble averaged dynamic procedure is that it does not require any spatial averaging and can thus be used in fully inhomogeneous flows. Also, the ensemble of LES's provides statistics of the large scale velocity that can be used for building new models for the subgrid-scale stress tensor. The ensemble averaged dynamic procedure has been implemented with various models for three flows: decaying isotropic turbulence, forced isotropic turbulence, and the time developing plane wake. It is found that the results are almost independent of the number of LES's in the statistical ensemble provided that the ensemble contains at least 16 realizations.

  14. Statistical colour models: an automated digital image analysis method for quantification of histological biomarkers.

    PubMed

    Shu, Jie; Dolman, G E; Duan, Jiang; Qiu, Guoping; Ilyas, Mohammad

    2016-04-27

    Colour is the most important feature used in quantitative immunohistochemistry (IHC) image analysis; IHC is used to provide information relating to aetiology and to confirm malignancy. Statistical modelling is a technique widely used for colour detection in computer vision. We have developed a statistical model of colour detection applicable to detection of stain colour in digital IHC images. The model was first trained on a large set of colour pixels collected semi-automatically. To speed up the training and detection processes, we removed the luminance channel (the Y channel of the YCbCr colour space) and chose 128 histogram bins, which is the optimal number. A maximum likelihood classifier is used to classify pixels in digital slides into positively or negatively stained pixels automatically. The model-based tool was developed within ImageJ to quantify targets identified using IHC and histochemistry. The purpose of evaluation was to compare the computer model with human evaluation. Several large datasets were prepared and obtained from human oesophageal cancer, colon cancer and liver cirrhosis with different colour stains. Experimental results have demonstrated the model-based tool achieves more accurate results than colour deconvolution and the CMYK model in the detection of brown colour, and is comparable to colour deconvolution in the detection of pink colour. We have also demonstrated the proposed model has little inter-dataset variation. A robust and effective statistical model is introduced in this paper. The model-based interactive tool in ImageJ, which can create a visual representation of the statistical model and detect a specified colour automatically, is easy to use and available freely at http://rsb.info.nih.gov/ij/plugins/ihc-toolbox/index.html . Testing of the tool by different users showed only minor inter-observer variations in results.

  15. Improving the Validity of Activity of Daily Living Dependency Risk Assessment

    PubMed Central

    Clark, Daniel O.; Stump, Timothy E.; Tu, Wanzhu; Miller, Douglas K.

    2015-01-01

    Objectives: Efforts to prevent activity of daily living (ADL) dependency may be improved through models that assess older adults’ dependency risk. We evaluated whether cognition and gait speed measures improve the predictive validity of interview-based models. Method: Participants were 8,095 self-respondents in the 2006 Health and Retirement Survey who were aged 65 years or over and independent in five ADLs. Incident ADL dependency was determined from the 2008 interview. Models were developed using random 2/3rd cohorts and validated in the remaining 1/3rd. Results: Compared to a c-statistic of 0.79 in the best interview model, the model including cognitive measures had c-statistics of 0.82 and 0.80 while the best fitting gait speed model had c-statistics of 0.83 and 0.79 in the development and validation cohorts, respectively. Conclusion: Two relatively brief models, one that requires an in-person assessment and one that does not, had excellent validity for predicting incident ADL dependency but did not significantly improve the predictive validity of the best fitting interview-based models. PMID:24652867
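
    The c-statistic reported in the record above is the concordance statistic: the proportion of (event, non-event) pairs in which the person who had the event was assigned the higher predicted risk. A minimal sketch of that calculation follows; the risk scores and outcomes are made-up illustrative values, not data from the study.

```python
def c_statistic(risk_scores, outcomes):
    """Concordance (c) statistic for a binary outcome.

    Counts every (event, non-event) pair; a pair is concordant when the
    event case has the higher predicted risk. Tied scores count as 0.5.
    """
    events = [s for s, y in zip(risk_scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(risk_scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted risks and observed outcomes (1 = incident dependency).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
observed = [1, 0, 1, 0, 0]
print(round(c_statistic(scores, observed), 3))  # 5 of 6 pairs are concordant
```

    A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of events from non-events.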

  16. Model Robust Calibration: Method and Application to Electronically-Scanned Pressure Transducers

    NASA Technical Reports Server (NTRS)

    Walker, Eric L.; Starnes, B. Alden; Birch, Jeffery B.; Mays, James E.

    2010-01-01

    This article presents the application of a recently developed statistical regression method to the controlled instrument calibration problem. The statistical method of Model Robust Regression (MRR), developed by Mays, Birch, and Starnes, is shown to improve instrument calibration by reducing the reliance of the calibration on a predetermined parametric (e.g. polynomial, exponential, logarithmic) model. This is accomplished by allowing fits from the predetermined parametric model to be augmented by a certain portion of a fit to the residuals from the initial regression using a nonparametric (locally parametric) regression technique. The method is demonstrated for the absolute scale calibration of silicon-based pressure transducers.
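
    The idea described above, a parametric fit augmented by a portion of a nonparametric fit to its residuals, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the parametric model is a straight line, the residual smoother is a simple Gaussian kernel average standing in for the locally parametric technique, and the mixing portion and bandwidth are supplied by hand rather than chosen from the data.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def kernel_smooth(x, values, x0, bandwidth):
    """Gaussian-kernel weighted average of `values` around the point x0."""
    weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def model_robust_fit(x, y, mixing, bandwidth):
    """Parametric fit plus `mixing` (0 to 1) times the smoothed residuals."""
    intercept, slope = fit_line(x, y)
    parametric = [intercept + slope * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, parametric)]
    smoothed = [kernel_smooth(x, residuals, xi, bandwidth) for xi in x]
    return [pi + mixing * si for pi, si in zip(parametric, smoothed)]


# Hypothetical calibration data with curvature that a straight line misses.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]
print(model_robust_fit(xs, ys, mixing=1.0, bandwidth=0.3))
```

    With mixing set to 0 the result is the purely parametric fit; increasing it lets the residual smoother correct for misspecification of the parametric form.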

  17. Chain Pooling modeling selection as developed for the statistical analysis of a rotor burst protection experiment

    NASA Technical Reports Server (NTRS)

    Holms, A. G.

    1977-01-01

    As many as three iterated statistical model deletion procedures were considered for an experiment. Population model coefficients were chosen to simulate a saturated 2 to the 4th power experiment having an unfavorable distribution of parameter values. Using random number studies, three model selection strategies were developed, namely, (1) a strategy to be used in anticipation of large coefficients of variation, approximately 65 percent, (2) a strategy to be used in anticipation of small coefficients of variation, 4 percent or less, and (3) a security regret strategy to be used in the absence of such prior knowledge.

  18. Statistical modeling of space shuttle environmental data

    NASA Technical Reports Server (NTRS)

    Tubbs, J. D.; Brewer, D. W.

    1983-01-01

    Statistical models which use a class of bivariate gamma distributions are examined. Topics discussed include: (1) the ratio of positively correlated gamma variates; (2) a method to determine if unequal shape parameters are necessary in a bivariate gamma distribution; (3) differential equations for modal location of a family of bivariate gamma distributions; and (4) analysis of some wind gust data using the analytical results developed for modeling application.

  19. Statistical Signal Models and Algorithms for Image Analysis

    DTIC Science & Technology

    1984-10-25

    In this report, two-dimensional stochastic linear models are used in developing algorithms for image analysis such as classification, segmentation, and object detection in images characterized by textured backgrounds. These models generate two-dimensional random processes as outputs to which statistical inference procedures can naturally be applied. A common thread throughout our algorithms is the interpretation of the inference procedures in terms of linear prediction

  20. A scan statistic for binary outcome based on hypergeometric probability model, with an application to detecting spatial clusters of Japanese encephalitis.

    PubMed

    Zhao, Xing; Zhou, Xiao-Hua; Feng, Zijian; Guo, Pengfei; He, Hongyan; Zhang, Tao; Duan, Lei; Li, Xiaosong

    2013-01-01

    As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for the binary outcome was developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the Hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, the likelihood function under the null hypothesis is an alternative and indirect method to identify the potential cluster, and the test statistic is the extreme value of the likelihood function. Similar to Kulldorff's methods, we adopt a Monte Carlo test for the test of significance. Both methods are applied for detecting spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. A simulation on independent benchmark data indicates that the test statistic based on the Hypergeometric model outperforms Kulldorff's statistics for clusters of high population density or large size; otherwise Kulldorff's statistics are superior.

  1. Statistical characterization of the fatigue behavior of composite lamina

    NASA Technical Reports Server (NTRS)

    Yang, J. N.; Jones, D. L.

    1979-01-01

    A theoretical model was developed to predict statistically the effects of constant and variable amplitude fatigue loadings on the residual strength and fatigue life of composite lamina. The parameters in the model were established from the results of a series of static tensile tests and a fatigue scan, and a number of verification tests were performed. Abstracts for two other papers on the effect of load sequence on the statistical fatigue of composites are also presented.

  2. Statistical Compression for Climate Model Output

    NASA Astrophysics Data System (ADS)

    Hammerling, D.; Guinness, J.; Soh, Y. J.

    2017-12-01

    Numerical climate model simulations run at high spatial and temporal resolutions generate massive quantities of data. As our computing capabilities continue to increase, storing all of the data is not sustainable, and thus it is important to develop methods for representing the full datasets by smaller compressed versions. We propose a statistical compression and decompression algorithm based on storing a set of summary statistics as well as a statistical model describing the conditional distribution of the full dataset given the summary statistics. We decompress the data by computing conditional expectations and conditional simulations from the model given the summary statistics. Conditional expectations represent our best estimate of the original data but are subject to oversmoothing in space and time. Conditional simulations introduce realistic small-scale noise so that the decompressed fields are neither too smooth nor too rough compared with the original data. Considerable attention is paid to accurately modeling the original dataset (one year of daily mean temperature data), particularly with regard to the inherent spatial nonstationarity in global fields, and to determining the statistics to be stored, so that the variation in the original data can be closely captured, while allowing for fast decompression and conditional emulation on modest computers.

  3. Comparison between two statistically based methods, and two physically based models developed to compute daily mean streamflow at ungaged locations in the Cedar River Basin, Iowa

    USGS Publications Warehouse

    Linhart, S. Mike; Nania, Jon F.; Christiansen, Daniel E.; Hutchinson, Kasey J.; Sanders, Curtis L.; Archfield, Stacey A.

    2013-01-01

    A variety of individuals, from water resource managers to recreational users, need streamflow information for planning and decisionmaking at locations where there are no streamgages. To address this problem, two statistically based methods, the Flow Duration Curve Transfer method and the Flow Anywhere method, were developed for statewide application, and two physically based models, the Precipitation-Runoff Modeling System and the Soil and Water Assessment Tool, were developed only for application to the Cedar River Basin. Observed and estimated streamflows for the two methods and models were compared for goodness of fit at 13 streamgages modeled in the Cedar River Basin by using Nash-Sutcliffe efficiency and percent-bias values. Based on median and mean Nash-Sutcliffe values for the 13 streamgages, the Precipitation-Runoff Modeling System and Soil and Water Assessment Tool models appear to have performed similarly and better than the Flow Duration Curve Transfer and Flow Anywhere methods. Based on median and mean percent bias values, the Soil and Water Assessment Tool model appears to have generally overestimated daily mean streamflows, whereas the Precipitation-Runoff Modeling System model and statistical methods appear to have underestimated daily mean streamflows. The Flow Duration Curve Transfer method produced the lowest median and mean percent bias values and appears to perform better than the other models.
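
    The two goodness-of-fit measures named in the record above can be computed in a few lines. The sketch below uses the standard Nash-Sutcliffe efficiency definition; for percent bias it assumes the sign convention in which positive values mean overestimation (some references use the opposite sign). The streamflow values are invented for illustration.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; 0 means the model is
    no better than predicting the observed mean at every time step."""
    mean_obs = sum(observed) / len(observed)
    error_ss = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    total_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error_ss / total_ss


def percent_bias(observed, simulated):
    """Percent bias; positive means the model overestimates in total
    (assumed sign convention)."""
    total_error = sum(s - o for o, s in zip(observed, simulated))
    return 100.0 * total_error / sum(observed)


# Hypothetical daily mean streamflows (observed versus estimated).
obs = [10.0, 20.0, 30.0, 40.0]
sim = [12.0, 18.0, 33.0, 41.0]
print(round(nash_sutcliffe(obs, sim), 3))  # 0.964
print(round(percent_bias(obs, sim), 1))    # 4.0
```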

  4. Estimating urban ground-level PM10 using MODIS 3km AOD product and meteorological parameters from WRF model

    NASA Astrophysics Data System (ADS)

    Ghotbi, Saba; Sotoudeheian, Saeed; Arhami, Mohammad

    2016-09-01

    Satellite remote sensing products of AOD from MODIS along with appropriate meteorological parameters were used to develop statistical models and estimate ground-level PM10. Most previous studies obtained meteorological data from synoptic weather stations, with rather sparse spatial distribution, and used it along with the 10 km AOD product to develop statistical models applicable to PM variations at regional scale (resolution of ≥10 km). In the current study, meteorological parameters were simulated with 3 km resolution using the WRF model and used along with the rather new 3 km AOD product (launched in 2014). The resulting PM statistical models were assessed for a polluted and largely variable urban area, Tehran, Iran. Despite the critical particulate pollution problem, very few PM studies have been conducted in this area. Direct PM-AOD associations were rather poor, owing to factors such as variations in particle optical properties and the bright-background issue for satellite data, as the studied area is located in the semi-arid Middle East. The statistical approach of linear mixed effect (LME) was used, and three types of statistical models were examined: a single-variable LME model (using AOD as the independent variable) and multiple-variable LME models using meteorological data from two sources, the WRF model and synoptic stations. Meteorological simulations were performed using a multiscale approach with physics configured for the studied region, and the results showed rather good agreement with recordings of the synoptic stations. The single-variable LME model was able to explain about 61%-73% of daily PM10 variations, reflecting a rather acceptable performance. Statistical model performance improved through using multivariable LME and incorporating meteorological data as auxiliary variables, particularly by using fine resolution outputs from WRF (R2 = 0.73-0.81).
In addition, rather fine resolution for PM estimates was mapped for the studied city, and resulting concentration maps were consistent with PM recordings at the existing stations.

  5. Statistical mechanics of the Huxley-Simmons model

    NASA Astrophysics Data System (ADS)

    Caruel, M.; Truskinovsky, L.

    2016-06-01

    The chemomechanical model of Huxley and Simmons (HS) [A. F. Huxley and R. M. Simmons, Nature 233, 533 (1971), 10.1038/233533a0] provides a paradigmatic description of mechanically induced collective conformational changes relevant in a variety of biological contexts, from muscle power stroke and hair cell gating to integrin binding and hairpin unzipping. We develop a statistical mechanical perspective on the HS model by exploiting a formal analogy with a paramagnetic Ising model. We first study the equilibrium HS model with a finite number of elements and compute explicitly its mechanical and thermal properties. To model kinetics, we derive a master equation and solve it for several loading protocols. The developed formalism is applicable to a broad range of allosteric systems with mean-field interactions.

  6. Recent statistical methods for orientation data

    NASA Technical Reports Server (NTRS)

    Batschelet, E.

    1972-01-01

    The application of statistical methods for determining the areas of animal orientation and navigation are discussed. The method employed is limited to the two-dimensional case. Various tests for determining the validity of the statistical analysis are presented. Mathematical models are included to support the theoretical considerations and tables of data are developed to show the value of information obtained by statistical analysis.

  7. Statistical Cost Estimation in Higher Education: Some Alternatives.

    ERIC Educational Resources Information Center

    Brinkman, Paul T.; Niwa, Shelley

    Recent developments in econometrics that are relevant to the task of estimating costs in higher education are reviewed. The relative effectiveness of alternative statistical procedures for estimating costs is also tested. Statistical cost estimation involves three basic parts: a model, a data set, and an estimation procedure. Actual data are used…

  8. A Three-Dimensional Statistical Average Skull: Application of Biometric Morphing in Generating Missing Anatomy.

    PubMed

    Teshima, Tara Lynn; Patel, Vaibhav; Mainprize, James G; Edwards, Glenn; Antonyshyn, Oleh M

    2015-07-01

    The utilization of three-dimensional modeling technology in craniomaxillofacial surgery has grown exponentially during the last decade. Future development, however, is hindered by the lack of a normative three-dimensional anatomic dataset and a statistical mean three-dimensional virtual model. The purpose of this study is to develop and validate a protocol to generate a statistical three-dimensional virtual model based on a normative dataset of adult skulls. Two hundred adult skull CT images were reviewed. The average three-dimensional skull was computed by processing each CT image in the series using thin-plate spline geometric morphometric protocol. Our statistical average three-dimensional skull was validated by reconstructing patient-specific topography in cranial defects. The experiment was repeated 4 times. In each case, computer-generated cranioplasties were compared directly to the original intact skull. The errors describing the difference between the prediction and the original were calculated. A normative database of 33 adult human skulls was collected. Using 21 anthropometric landmark points, a protocol for three-dimensional skull landmarking and data reduction was developed and a statistical average three-dimensional skull was generated. Our results show the root mean square error (RMSE) for restoration of a known defect using the native best match skull, our statistical average skull, and worst match skull was 0.58, 0.74, and 4.4 mm, respectively. The ability to statistically average craniofacial surface topography will be a valuable instrument for deriving missing anatomy in complex craniofacial defects and deficiencies as well as in evaluating morphologic results of surgery.

  9. Estimating Traffic Accidents in Turkey Using Differential Evolution Algorithm

    NASA Astrophysics Data System (ADS)

    Akgüngör, Ali Payıdar; Korkmaz, Ersin

    2017-06-01

    Estimating traffic accidents plays a vital role in applying road safety procedures. This study proposes Differential Evolution Algorithm (DEA) models to estimate the number of accidents in Turkey. In the model development, population (P) and the number of vehicles (N) are selected as model parameters. Three model forms, linear, exponential and semi-quadratic, are developed using DEA with data covering 2000 to 2014. The developed models are statistically compared to select the best-fitting model. The results of the DE models show that the linear model form is suitable for estimating the number of accidents. The statistics of this form are better than those of the other forms in terms of the performance criteria, which are the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE). To investigate the performance of the linear DE model for future estimations, a ten-year period from 2015 to 2024 is considered. The results obtained from future estimations reveal the suitability of the DE method for road safety applications.
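
    The two performance criteria named above have standard definitions, sketched below so that the comparison between model forms is concrete. The accident counts are hypothetical placeholders, not figures from the study, and the fitted model forms themselves are not reproduced here.

```python
import math


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def rmse(actual, predicted):
    """Root Mean Square Error, in the same units as the data."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical observed and estimated annual accident counts.
actual_counts = [100.0, 200.0, 400.0]
estimated_counts = [110.0, 190.0, 400.0]
print(round(mape(actual_counts, estimated_counts), 2))  # 5.0
print(round(rmse(actual_counts, estimated_counts), 3))  # 8.165
```

    MAPE is scale-free, which helps when comparing across years with very different accident totals, while RMSE penalizes large individual errors more heavily.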

  10. Developing risk prediction models for kidney injury and assessing incremental value for novel biomarkers.

    PubMed

    Kerr, Kathleen F; Meisner, Allison; Thiessen-Philbrook, Heather; Coca, Steven G; Parikh, Chirag R

    2014-08-07

    The field of nephrology is actively involved in developing biomarkers and improving models for predicting patients' risks of AKI and CKD and their outcomes. However, some important aspects of evaluating biomarkers and risk models are not widely appreciated, and statistical methods are still evolving. This review describes some of the most important statistical concepts for this area of research and identifies common pitfalls. Particular attention is paid to metrics proposed within the last 5 years for quantifying the incremental predictive value of a new biomarker. Copyright © 2014 by the American Society of Nephrology.

  11. Factorial analysis of trihalomethanes formation in drinking water.

    PubMed

    Chowdhury, Shakhawat; Champagne, Pascale; McLellan, P James

    2010-06-01

    Disinfection of drinking water reduces pathogenic infection, but may pose risks to human health through the formation of disinfection byproducts. The effects of different factors on the formation of trihalomethanes were investigated using a statistically designed experimental program, and a predictive model for trihalomethanes formation was developed. Synthetic water samples with different factor levels were produced, and trihalomethanes concentrations were measured. A replicated fractional factorial design with center points was performed, and significant factors were identified through statistical analysis. A second-order trihalomethanes formation model was developed from 92 experiments, and the statistical adequacy was assessed through appropriate diagnostics. This model was validated using additional data from the Drinking Water Surveillance Program database and was applied to the Smiths Falls water supply system in Ontario, Canada. The model predictions were correlated strongly to the measured trihalomethanes, with correlations of 0.95 and 0.91, respectively. The resulting model can assist in analyzing risk-cost tradeoffs in the design and operation of water supply systems.

  12. Family Environment and Cognitive Development: Twelve Analytic Models

    ERIC Educational Resources Information Center

    Walberg, Herbert J.; Marjoribanks, Kevin

    1976-01-01

    The review indicates that refined measures of the family environment and the use of complex statistical models increase the understanding of the relationships between socioeconomic status, sibling variables, family environment, and cognitive development. (RC)

  13. Future missions studies: Combining Schatten's solar activity prediction model with a chaotic prediction model

    NASA Technical Reports Server (NTRS)

    Ashrafi, S.

    1991-01-01

    K. Schatten (1991) recently developed a method for combining his prediction model with our chaotic model. The philosophy behind this combined model and his method of combination is explained. Because the Schatten solar prediction model (KS) uses a dynamo to mimic solar dynamics, accurate prediction is limited to long-term solar behavior (10 to 20 years). The Chaotic prediction model (SA) uses the recently developed techniques of nonlinear dynamics to predict solar activity. It can be used to predict activity only up to the horizon. In theory, the chaotic prediction should be several orders of magnitude better than statistical predictions up to that horizon; beyond the horizon, chaotic predictions would theoretically be just as good as statistical predictions. Therefore, chaos theory puts a fundamental limit on predictability.

  14. Model Error Estimation for the CPTEC Eta Model

    NASA Technical Reports Server (NTRS)

    Tippett, Michael K.; daSilva, Arlindo

    1999-01-01

    Statistical data assimilation systems require the specification of forecast and observation error statistics. Forecast error is due to model imperfections and differences between the initial condition and the actual state of the atmosphere. Practical four-dimensional variational (4D-Var) methods try to fit the forecast state to the observations and assume that the model error is negligible. Here, with a number of simplifying assumptions, a framework is developed for isolating the model error given the forecast error at two lead-times. Two definitions are proposed for the Talagrand ratio tau, the fraction of the forecast error due to model error rather than initial condition error. Data from the CPTEC Eta Model running operationally over South America are used to calculate forecast error statistics and lower bounds for tau.

  15. Spontaneous cortical activity reveals hallmarks of an optimal internal model of the environment.

    PubMed

    Berkes, Pietro; Orbán, Gergo; Lengyel, Máté; Fiser, József

    2011-01-07

    The brain maintains internal models of its environment to interpret sensory inputs and to prepare actions. Although behavioral studies have demonstrated that these internal models are optimally adapted to the statistics of the environment, the neural underpinning of this adaptation is unknown. Using a Bayesian model of sensory cortical processing, we related stimulus-evoked and spontaneous neural activities to inferences and prior expectations in an internal model and predicted that they should match if the model is statistically optimal. To test this prediction, we analyzed visual cortical activity of awake ferrets during development. Similarity between spontaneous and evoked activities increased with age and was specific to responses evoked by natural scenes. This demonstrates the progressive adaptation of internal models to the statistics of natural stimuli at the neural level.

  16. New powerful statistics for alignment-free sequence comparison under a pattern transfer model.

    PubMed

    Liu, Xuemei; Wan, Lin; Li, Jing; Reinert, Gesine; Waterman, Michael S; Sun, Fengzhu

    2011-09-07

    Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D*2 and D(s)2 showed that their power approximates a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D2, D*2 and D(s)2 by comparing local sequence pairs and then summing over all the local sequence pairs of certain length. We show that the new statistics are much more powerful than the corresponding statistics and the power tends to 1 as the sequence length tends to infinity under the pattern transfer model. Copyright © 2011 Elsevier Ltd. All rights reserved.
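
    For orientation, the basic D2 statistic mentioned above is commonly defined as the sum, over all words of length k, of the product of that word's counts in the two sequences. The sketch below illustrates only that textbook definition; the function names are hypothetical, and it does not reproduce the authors' new local statistics or the D*2 and D(s)2 variants.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Basic D2: sum over words w of count_A(w) * count_B(w)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for words that never occur in seq_b.
    return sum(n * counts_b[w] for w, n in counts_a.items())


if __name__ == "__main__":
    # Illustrative toy sequences, not data from the paper.
    print(d2("ACGTACGT", "TTACGTAA", 3))
```

    Larger values indicate more shared words; the normalized variants discussed in the abstract adjust these raw counts for background word frequencies.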

  17. New Powerful Statistics for Alignment-free Sequence Comparison Under a Pattern Transfer Model

    PubMed Central

    Liu, Xuemei; Wan, Lin; Li, Jing; Reinert, Gesine; Waterman, Michael S.; Sun, Fengzhu

    2011-01-01

    Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D2∗ and D2s showed that their power approximates a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D2, D2∗ and D2s by comparing local sequence pairs and then summing over all the local sequence pairs of certain length. We show that the new statistics are much more powerful than the corresponding statistics and the power tends to 1 as the sequence length tends to infinity under the pattern transfer model. PMID:21723298

  18. P values are only an index to evidence: 20th- vs. 21st-century statistical science.

    PubMed

    Burnham, K P; Anderson, D R

    2014-03-01

    Early statistical methods focused on pre-data probability statements (i.e., data as random variables) such as P values; these are not really inferences nor are P values evidential. Statistical science clung to these principles throughout much of the 20th century as a wide variety of methods were developed for special cases. Looking back, it is clear that the underlying paradigm (i.e., testing and P values) was weak. As Kuhn (1970) suggests, new paradigms have taken the place of earlier ones: this is a goal of good science. New methods have been developed and older methods extended and these allow proper measures of strength of evidence and multimodel inference. It is time to move forward with sound theory and practice for the difficult practical problems that lie ahead. Given data, the useful foundation shifts to post-data probability statements such as model probabilities (Akaike weights) or related quantities such as odds ratios and likelihood intervals. These new methods allow formal inference from multiple models in the a priori set. These quantities are properly evidential. The past century was aimed at finding the "best" model and making inferences from it. The goal in the 21st century is to base inference on all the models weighted by their model probabilities (model averaging). Estimates of precision can include model selection uncertainty leading to variances conditional on the model set. The 21st century will be about the quantification of information, proper measures of evidence, and multi-model inference. Nelder (1999:261) concludes, "The most important task before us in developing statistical science is to demolish the P-value culture, which has taken root to a frightening extent in many areas of both pure and applied science and technology".
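
    As a rough illustration of the model probabilities mentioned above, Akaike weights are usually computed from AIC differences as w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), where Δ_i is the AIC of model i minus the smallest AIC in the set. The sketch below assumes that standard formula; the AIC values and per-model estimates are made-up numbers, and the function names are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values into Akaike weights that sum to 1."""
    best = min(aic_values)
    # Relative likelihood of each model given the data and the model set.
    rel = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


def model_average(estimates, weights):
    """Model-averaged estimate: weighted sum of per-model estimates."""
    return sum(w * e for w, e in zip(weights, estimates))


if __name__ == "__main__":
    aics = [100.0, 102.0, 110.0]   # hypothetical AIC values
    estimates = [1.8, 2.1, 3.0]    # hypothetical per-model estimates
    weights = akaike_weights(aics)
    print(weights, model_average(estimates, weights))
```

    The ratio of two weights is the evidence ratio between the corresponding models, which is one of the odds-type quantities the abstract refers to.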

  19. An Update on Statistical Boosting in Biomedicine.

    PubMed

    Mayr, Andreas; Hofner, Benjamin; Waldmann, Elisabeth; Hepp, Tobias; Meyer, Sebastian; Gefeller, Olaf

    2017-01-01

    Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression, and advanced time-to-event modelling. Additionally, we provide a short overview on relevant applications of statistical boosting in biomedicine.

  20. Statistical model for forecasting monthly large wildfire events in western United States

    Treesearch

    Haiganoush K. Preisler; Anthony L. Westerling

    2006-01-01

    The ability to forecast the number and location of large wildfire events (with specified confidence bounds) is important to fire managers attempting to allocate and distribute suppression efforts during severe fire seasons. This paper describes the development of a statistical model for assessing the forecasting skills of fire-danger predictors and producing 1-month-...

  1. Developing Statistical Evaluation Model of Introduction Effect of MSW Thermal Recycling

    NASA Astrophysics Data System (ADS)

    Aoyama, Makoto; Kato, Takeyoshi; Suzuoki, Yasuo

    For the effective utilization of municipal solid waste (MSW) through thermal recycling, new technologies, such as an incineration plant using a Molten Carbonate Fuel Cell (MCFC), are being developed. The impact of new technologies should be evaluated statistically for various municipalities, so that the target of technological development or the potential cost reduction due to the increased cumulative number of installed systems can be discussed. For this purpose, we developed a model for discussing the impact of new technologies, where a statistical mesh data set was utilized to estimate the heat demand around the incineration plant. This paper examines a case study using the developed model, where a conventional-type and an MCFC-type MSW incineration plant are compared in terms of the reduction in primary energy and the revenue from both electricity and heat supply. Based on the difference in annual revenue, we calculate the allowable investment in an MCFC-type MSW incineration plant in addition to a conventional plant. The results suggest that the allowable investment can be about 30 million yen/(t/day) in small municipalities, while it is only 10 million yen/(t/day) in large municipalities. The sensitivity analysis shows the model can be useful for discussing the difference of impact of material recycling of plastics on thermal recycling technologies.

  2. An analytic technique for statistically modeling random atomic clock errors in estimation

    NASA Technical Reports Server (NTRS)

    Fell, P. J.

    1981-01-01

    Minimum variance estimation requires that the statistics of random observation errors be modeled properly. If measurements are derived through the use of atomic frequency standards, then one source of error affecting the observable is random fluctuation in frequency. This is the case, for example, with range and integrated Doppler measurements from satellites of the Global Positioning System and baseline determination for geodynamic applications. An analytic method is presented which approximates the statistics of this random process. The procedure starts with a model of the Allan variance for a particular oscillator and develops the statistics of range and integrated Doppler measurements. A series of five first order Markov processes is used to approximate the power spectral density obtained from the Allan variance.

  3. Load Model Verification, Validation and Calibration Framework by Statistical Analysis on Field Data

    NASA Astrophysics Data System (ADS)

    Jiao, Xiangqing; Liao, Yuan; Nguyen, Thai

    2017-11-01

    Accurate load models are critical for power system analysis and operation. A large amount of research work has been done on load modeling. Most of the existing research focuses on developing load models, while little has been done on developing formal load model verification and validation (V&V) methodologies or procedures. Most of the existing load model validation is based on qualitative rather than quantitative analysis. In addition, not all aspects of model V&V problem have been addressed by the existing approaches. To complement the existing methods, this paper proposes a novel load model verification and validation framework that can systematically and more comprehensively examine load model's effectiveness and accuracy. Statistical analysis, instead of visual check, quantifies the load model's accuracy, and provides a confidence level of the developed load model for model users. The analysis results can also be used to calibrate load models. The proposed framework can be used as a guidance to systematically examine load models for utility engineers and researchers. The proposed method is demonstrated through analysis of field measurements collected from a utility system.

  4. Comparison of climate envelope models developed using expert-selected variables versus statistical selection

    USGS Publications Warehouse

    Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romañach, Stephanie; Watling, James I.; Mazzotti, Frank J.

    2017-01-01

    Climate envelope models are widely used to describe potential future distribution of species under different climate change scenarios. It is broadly recognized that there are both strengths and limitations to using climate envelope models and that outcomes are sensitive to initial assumptions, inputs, and modeling methods. Selection of predictor variables, a central step in modeling, is one of the areas where different techniques can yield varying results. Selection of climate variables to use as predictors is often done using statistical approaches that develop correlations between occurrences and climate data. These approaches have received criticism in that they rely on the statistical properties of the data rather than directly incorporating biological information about species responses to temperature and precipitation. We evaluated and compared models and prediction maps for 15 threatened or endangered species in Florida based on two variable selection techniques: expert opinion and a statistical method. We compared model performance between these two approaches for contemporary predictions, and the spatial correlation, spatial overlap and area predicted for contemporary and future climate predictions. In general, experts identified more variables as being important than the statistical method and there was low overlap in the variable sets (<40%) between the two methods. Despite these differences in variable sets (expert versus statistical), models had high performance metrics (>0.9 for area under the curve (AUC) and >0.7 for true skill statistic (TSS)). Spatial overlap, which compares the spatial configuration between maps constructed using the different variable selection techniques, was only moderate overall (about 60%), with a great deal of variability across species. Difference in spatial overlap was even greater under future climate projections, indicating additional divergence of model outputs from different variable selection techniques. 
Our work is in agreement with other studies which have found that for broad-scale species distribution modeling, using statistical methods of variable selection is a useful first step, especially when there is a need to model a large number of species or expert knowledge of the species is limited. Expert input can then be used to refine models that seem unrealistic or for species that experts believe are particularly sensitive to change. It also emphasizes the importance of using multiple models to reduce uncertainty and improve map outputs for conservation planning. Where outputs overlap or show the same direction of change there is greater certainty in the predictions. Areas of disagreement can be used for learning by asking why the models do not agree, and may highlight areas where additional on-the-ground data collection could improve the models.

  5. Modifying climate change habitat models using tree species-specific assessments of model uncertainty and life history-factors

    Treesearch

    Stephen N. Matthews; Louis R. Iverson; Anantha M. Prasad; Matthew P. Peters; Paul G. Rodewald

    2011-01-01

    Species distribution models (SDMs) to evaluate trees' potential responses to climate change are essential for developing appropriate forest management strategies. However, there is a great need to better understand these models' limitations and evaluate their uncertainties. We have previously developed statistical models of suitable habitat, based on both...

  6. Prediction of Chemical Function: Model Development and Application

    EPA Science Inventory

    The United States Environmental Protection Agency’s Exposure Forecaster (ExpoCast) project is developing both statistical and mechanism-based computational models for predicting exposures to thousands of chemicals, including those in consumer products. The high-throughput (...

  7. Synthesis of geophysical data with space-acquired imagery: a review

    USGS Publications Warehouse

    Hastings, David A.

    1983-01-01

    Statistical correlation has been used to determine the applicability of specific data sets to the development of geologic or exploration models. Various arithmetic functions have proven useful in developing models from such data sets.

  8. QSAR study of curcumine derivatives as HIV-1 integrase inhibitors.

    PubMed

    Gupta, Pawan; Sharma, Anju; Garg, Prabha; Roy, Nilanjan

    2013-03-01

    A QSAR study was performed on curcumine derivatives as HIV-1 integrase inhibitors using multiple linear regression. The statistically significant model was developed with squared correlation coefficients (r(2)) 0.891 and cross validated r(2) (r(2) cv) 0.825. The developed model revealed that electronic, shape, size, geometry, substitution's information and hydrophilicity were important atomic properties for determining the inhibitory activity of these molecules. The model was also tested successfully for external validation (r(2) pred = 0.849) as well as Tropsha's test for model predictability. Furthermore, the domain analysis was carried out to evaluate the prediction reliability of external set molecules. The model was statistically robust and had good predictive power which can be successfully utilized for screening of new molecules.
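
    The distinction between the fitted r(2) and the cross-validated r(2) reported above can be illustrated with a leave-one-out calculation, in which each compound is predicted by a model fitted without it. The sketch below is a simplified stand-in, not the authors' procedure: it assumes a single descriptor rather than multiple linear regression, uses the common definition q2 = 1 - PRESS/SS_tot, and the data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_sum_of_squares(ys):
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r_squared(xs, ys):
    """Fitted r^2: all points are used both to fit and to score."""
    slope, intercept = fit_line(xs, ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / total_sum_of_squares(ys)


def loo_q_squared(xs, ys):
    """Leave-one-out cross-validated r^2 (often written q^2)."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without point i, then predict the held-out point.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1.0 - press / total_sum_of_squares(ys)


if __name__ == "__main__":
    descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical descriptor values
    activity = [2.1, 3.9, 6.2, 7.8, 10.1]    # hypothetical activities
    print(r_squared(descriptor, activity), loo_q_squared(descriptor, activity))
```

    For least-squares fits the cross-validated value cannot exceed the fitted value, which is consistent with the reported 0.825 versus 0.891.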

  9. Development of a new eyellipse and seating accommodation model for trucks and buses.

    DOT National Transportation Integrated Search

    2005-11-01

    Driver posture data from a laboratory study and an in-vehicle test-track study were used to develop and to evaluate a new seating accommodation model and eyellipse for SAE Class-B vehicles. The new statistical models are configurable for populati...

  10. A Survey of Statistical Models for Reverse Engineering Gene Regulatory Networks

    PubMed Central

    Huang, Yufei; Tienda-Luna, Isabel M.; Wang, Yufeng

    2009-01-01

    Statistical models for reverse engineering gene regulatory networks are surveyed in this article. To provide readers with a system-level view of the modeling issues in this research, a graphical modeling framework is proposed. This framework serves as the scaffolding on which the review of different models can be systematically assembled. Based on the framework, we review many existing models for many aspects of gene regulation; the pros and cons of each model are discussed. In addition, network inference algorithms are also surveyed under the graphical modeling framework by the categories of point solutions and probabilistic solutions and the connections and differences among the algorithms are provided. This survey has the potential to elucidate the development and future of reverse engineering GRNs and bring statistical signal processing closer to the core of this research. PMID:20046885

  11. The construction and assessment of a statistical model for the prediction of protein assay data.

    PubMed

    Pittman, J; Sacks, J; Young, S Stanley

    2002-01-01

    The focus of this work is the development of a statistical model for a bioinformatics database whose distinctive structure makes model assessment an interesting and challenging problem. The key components of the statistical methodology, including a fast approximation to the singular value decomposition and the use of adaptive spline modeling and tree-based methods, are described, and preliminary results are presented. These results are shown to compare favorably to selected results achieved using comparative methods. An attempt to determine the predictive ability of the model through the use of cross-validation experiments is discussed. In conclusion, a synopsis of the results of these experiments and their implications for the analysis of bioinformatic databases in general is presented.

  12. Developing Teachers' Reasoning about Comparing Distributions: A Cross-Institutional Effort

    ERIC Educational Resources Information Center

    Tran, Dung; Lee, Hollylynne; Doerr, Helen

    2016-01-01

    The research reported here uses a pre/post-test model and stimulated recall interviews to assess teachers' statistical reasoning about comparing distributions, when enrolled in a graduate-level statistics education course. We discuss key aspects of the course design aimed at improving teachers' learning and teaching of statistics, and the…

  13. Statistically qualified neuro-analytic failure detection method and system

    DOEpatents

    Vilim, Richard B.; Garcia, Humberto E.; Chen, Frederick W.

    2002-03-02

    An apparatus and method for monitoring a process involve development and application of a statistically qualified neuro-analytic (SQNA) model to accurately and reliably identify process change. The development of the SQNA model is accomplished in two stages: deterministic model adaption and stochastic model modification of the deterministic model adaptation. Deterministic model adaption involves formulating an analytic model of the process representing known process characteristics, augmenting the analytic model with a neural network that captures unknown process characteristics, and training the resulting neuro-analytic model by adjusting the neural network weights according to a unique scaled equation error minimization technique. Stochastic model modification involves qualifying any remaining uncertainty in the trained neuro-analytic model by formulating a likelihood function, given an error propagation equation, for computing the probability that the neuro-analytic model generates measured process output. Preferably, the developed SQNA model is validated using known sequential probability ratio tests and applied to the process as an on-line monitoring system. Illustrative of the method and apparatus, the method is applied to a peristaltic pump system.

  14. Spatial scan statistics for detection of multiple clusters with arbitrary shapes.

    PubMed

    Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray

    2016-12-01

    In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan. © 2016, The International Biometric Society.
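
    The Benjamini-Hochberg step mentioned above can be sketched as follows. This shows only the standard step-up rule for choosing a p-value threshold at false discovery rate q; how the authors combine it with their scan statistics is not described here, and the p-values in the example are invented.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return (threshold, reject_flags) for the BH step-up procedure.

    threshold is the largest p-value p_(i) with p_(i) <= q * i / m,
    or None when no hypothesis is rejected.
    """
    m = len(p_values)
    threshold = None
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= q * rank / m:
            threshold = p  # keep the largest p-value passing its bound
    if threshold is None:
        return None, [False] * m
    # Reject every hypothesis whose p-value is at or below the threshold.
    return threshold, [p <= threshold for p in p_values]


if __name__ == "__main__":
    # Hypothetical p-values for ten candidate clusters.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042,
             0.060, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals, q=0.05))
```

    Because the rule is step-up, a p-value that fails its own bound can still be rejected if a larger p-value passes, which is why the loop keeps the last passing value rather than stopping at the first failure.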

  15. Estimating the impact of mineral aerosols on crop yields in food insecure regions using statistical crop models

    NASA Astrophysics Data System (ADS)

    Hoffman, A.; Forest, C. E.; Kemanian, A.

    2016-12-01

    A significant number of food-insecure nations exist in regions of the world where dust plays a large role in the climate system. While the impacts of common climate variables (e.g. temperature, precipitation, ozone, and carbon dioxide) on crop yields are relatively well understood, the impact of mineral aerosols on yields has not yet been thoroughly investigated. This research aims to develop the data and tools to progress our understanding of mineral aerosol impacts on crop yields. Suspended dust affects crop yields by altering the amount and type of radiation reaching the plant and by modifying local temperature and precipitation, while dust events (i.e. dust storms) affect crop yields by depleting the soil of nutrients or by defoliation via particle abrasion. The impact of dust on yields is modeled statistically because we are uncertain which impacts will dominate the response on the national and regional scales considered in this study. Multiple linear regression is used in a number of large-scale statistical crop modeling studies to estimate yield responses to various climate variables. In alignment with previous work, we develop linear crop models, but build upon this simple method of regression with machine-learning techniques (e.g. random forests) to identify important statistical predictors and isolate how dust affects yields on the scales of interest. To perform this analysis, we develop a crop-climate dataset for maize, soybean, groundnut, sorghum, rice, and wheat for the regions of West Africa, East Africa, South Africa, and the Sahel. Random forest regression models consistently model historic crop yields better than the linear models. In several instances, the random forest models accurately capture the temperature and precipitation threshold behavior in crops. Additionally, improving agricultural technology has caused a well-documented positive trend that dominates time series of global and regional yields. 
This trend is often removed before regression with traditional crop models, but likely at the cost of removing climate information. Our random forest models consistently discover the positive trend without removing any additional data. The application of random forests as a statistical crop model provides insight into understanding the impact of dust on yields in marginal food producing regions.

  16. Linking Statistically- and Physically-Based Models for Improved Streamflow Simulation in Gaged and Ungaged Areas

    NASA Astrophysics Data System (ADS)

    Lafontaine, J.; Hay, L.; Archfield, S. A.; Farmer, W. H.; Kiang, J. E.

    2014-12-01

    The U.S. Geological Survey (USGS) has developed a National Hydrologic Model (NHM) to support coordinated, comprehensive and consistent hydrologic model development, and facilitate the application of hydrologic simulations within the continental US. The portion of the NHM located within the Gulf Coastal Plains and Ozarks Landscape Conservation Cooperative (GCPO LCC) is being used to test the feasibility of improving streamflow simulations in gaged and ungaged watersheds by linking statistically- and physically-based hydrologic models. The GCPO LCC covers part or all of 12 states and 5 sub-geographies, totaling approximately 726,000 km2, and is centered on the lower Mississippi Alluvial Valley. A total of 346 USGS streamgages in the GCPO LCC region were selected to evaluate the performance of this new calibration methodology for the period 1980 to 2013. Initially, the physically-based models are calibrated to measured streamflow data to provide a baseline for comparison. An enhanced calibration procedure then is used to calibrate the physically-based models in the gaged and ungaged areas of the GCPO LCC using statistically-based estimates of streamflow. For this application, the calibration procedure is adjusted to address the limitations of the statistically generated time series to reproduce measured streamflow in gaged basins, primarily by incorporating error and bias estimates. As part of this effort, estimates of uncertainty in the model simulations are also computed for the gaged and ungaged watersheds.

  17. Statistical prediction of September Arctic Sea Ice minimum based on stable teleconnections with global climate and oceanic patterns

    NASA Astrophysics Data System (ADS)

    Ionita, M.; Grosfeld, K.; Scholz, P.; Lohmann, G.

    2016-12-01

    Sea ice in both Polar Regions is an important indicator for the expression of global climate change and its polar amplification. Consequently, a broad information interest exists on sea ice, its coverage, variability and long term change. Knowledge on sea ice requires high quality data on ice extent, thickness and its dynamics. However, its predictability depends on various climate parameters and conditions. In order to provide insights into the potential development of a monthly/seasonal signal, we developed a robust statistical model based on ocean heat content, sea surface temperature and atmospheric variables to calculate an estimate of the September minimum sea ice extent for every year. Although previous statistical attempts at monthly/seasonal forecasts of September sea ice minimum show a relatively reduced skill, here it is shown that more than 97% (r = 0.98) of the September sea ice extent can be predicted three months in advance by using previous months' conditions via a multiple linear regression model based on global sea surface temperature (SST), mean sea level pressure (SLP), air temperature at 850hPa (TT850), surface winds and sea ice extent persistence. The statistical model is based on the identification of regions with stable teleconnections between the predictors (climatological parameters) and the predictand (here sea ice extent). The results based on our statistical model contribute to the sea ice prediction network for the sea ice outlook report (https://www.arcus.org/sipn) and could provide a tool for identifying relevant regions and climate parameters that are important for the sea ice development in the Arctic and for detecting sensitive and critical regions in global coupled climate models with focus on sea ice formation.

  18. A statistical approach for generating synthetic tip stress data from limited CPT soundings

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Basalams, M.K.

    CPT tip stress data obtained from a Uranium mill tailings impoundment are treated as time series. A statistical class of models that was developed to model time series is explored to investigate its applicability in modeling the tip stress series. These models were developed by Box and Jenkins (1970) and are known as Autoregressive Moving Average (ARMA) models. This research demonstrates how to apply the ARMA models to tip stress series. Generation of synthetic tip stress series that preserve the main statistical characteristics of the measured series is also investigated. Multiple regression analysis is used to model the regional variation of the ARMA model parameters as well as the regional variation of the mean and the standard deviation of the measured tip stress series. The reliability of the generated series is investigated from a geotechnical point of view as well as from a statistical point of view. Estimation of the total settlement using the measured and the generated series subjected to the same loading condition are performed. The variation of friction angle with depth of the impoundment materials is also investigated. This research shows that these series can be modeled by the Box and Jenkins ARMA models. A third degree Autoregressive model AR(3) is selected to represent these series. A theoretical double exponential density function is fitted to the AR(3) model residuals. Synthetic tip stress series are generated at nearby locations. The generated series are shown to be reliable in estimating the total settlement and the friction angle variation with depth for this particular site.
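
    The generation step described above can be sketched as an AR(3) recursion driven by random residuals. The sketch makes several assumptions that are not in the abstract: "double exponential" is taken to mean the Laplace distribution, the coefficients, mean and noise scale are invented placeholders, and the series is treated as deviations about a constant mean.

```python
import random


def laplace_noise(rng, scale):
    """Laplace(0, scale) draw: difference of two iid exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def simulate_ar3(n, phi, mean, noise_scale, seed=0, burn_in=200):
    """Generate n values of x_t = mean + d_t, where
    d_t = phi[0]*d_{t-1} + phi[1]*d_{t-2} + phi[2]*d_{t-3} + e_t."""
    rng = random.Random(seed)
    dev = [0.0, 0.0, 0.0]  # deviations from the mean, started at zero
    for _ in range(n + burn_in):
        nxt = (phi[0] * dev[-1] + phi[1] * dev[-2] + phi[2] * dev[-3]
               + laplace_noise(rng, noise_scale))
        dev.append(nxt)
    # Discard the burn-in so the start-up values do not affect the output.
    return [mean + d for d in dev[-n:]]


if __name__ == "__main__":
    # Hypothetical parameters; real values would be estimated from soundings.
    series = simulate_ar3(n=50, phi=(0.5, 0.2, 0.1), mean=10.0,
                          noise_scale=1.0, seed=1)
    print(series[:5])
```

    In the study the model parameters vary regionally and are modeled by multiple regression; here they are fixed constants for simplicity.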

  19. Melanoma Risk Prediction Models

    Cancer.gov

    Developing statistical models that estimate the probability of developing melanoma cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  20. The Influence Factor Model for the Popularity of Mobile Phone without Considering the Price Factor

    NASA Astrophysics Data System (ADS)

    Long, Hongming; Peng, Diefei; Wu, Hailin; Yang, Zihui

    2018-01-01

    Based on statistical data such as economic development, social development, and population indicators, this paper establishes a linear regression model of the factors that influence the popularity rate of mobile phones.

  1. Modeling Systematicity and Individuality in Nonlinear Second Language Development: The Case of English Grammatical Morphemes

    ERIC Educational Resources Information Center

    Murakami, Akira

    2016-01-01

    This article introduces two sophisticated statistical modeling techniques that allow researchers to analyze systematicity, individual variation, and nonlinearity in second language (L2) development. Generalized linear mixed-effects models can be used to quantify individual variation and examine systematic effects simultaneously, and generalized…

  2. Statistical analyses to support guidelines for marine avian sampling. Final report

    USGS Publications Warehouse

    Kinlan, Brian P.; Zipkin, Elise; O'Connell, Allan F.; Caldow, Chris

    2012-01-01

    Interest in development of offshore renewable energy facilities has led to a need for high-quality, statistically robust information on marine wildlife distributions. A practical approach is described to estimate the amount of sampling effort required to have sufficient statistical power to identify species-specific “hotspots” and “coldspots” of marine bird abundance and occurrence in an offshore environment divided into discrete spatial units (e.g., lease blocks), where “hotspots” and “coldspots” are defined relative to a reference (e.g., regional) mean abundance and/or occurrence probability for each species of interest. For example, a location with average abundance or occurrence that is three times larger than the mean (3x effect size) could be defined as a “hotspot,” and a location that is three times smaller than the mean (1/3x effect size) as a “coldspot.” The choice of the effect size used to define hotspots and coldspots will generally depend on a combination of ecological and regulatory considerations. A method is also developed for testing the statistical significance of possible hotspots and coldspots. Both methods are illustrated with historical seabird survey data from the USGS Avian Compendium Database. Our approach consists of five main components: 1. A review of the primary scientific literature on statistical modeling of animal group size and avian count data to develop a candidate set of statistical distributions that have been used or may be useful to model seabird counts. 2. Statistical power curves for one-sample, one-tailed Monte Carlo significance tests of differences of observed small-sample means from a specified reference distribution. These curves show the power to detect "hotspots" or "coldspots" of occurrence and abundance at a range of effect sizes, given assumptions which we discuss. 3. A model selection procedure, based on maximum likelihood fits of models in the candidate set, to determine an appropriate statistical distribution to describe counts of a given species in a particular region and season. 4. Using a large database of historical at-sea seabird survey data, we applied this technique to identify appropriate statistical distributions for modeling a variety of species, allowing the distribution to vary by season. For each species and season, we used the selected distribution to calculate and map retrospective statistical power to detect hotspots and coldspots, and map p-values from Monte Carlo significance tests of hotspots and coldspots, in discrete lease blocks designated by the U.S. Department of Interior, Bureau of Ocean Energy Management (BOEM). 5. Because our definition of hotspots and coldspots does not explicitly include variability over time, we examine the relationship between the temporal scale of sampling and the proportion of variance captured in time series of key environmental correlates of marine bird abundance, as well as available marine bird abundance time series, and use these analyses to develop recommendations for the temporal distribution of sampling to adequately represent both short-term and long-term variability. We conclude by presenting a schematic “decision tree” showing how this power analysis approach would fit in a general framework for avian survey design, and discuss implications of model assumptions and results. We discuss avenues for future development of this work, and recommendations for practical implementation in the context of siting and wildlife assessment for offshore renewable energy development projects.
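
    A one-sample, one-tailed Monte Carlo significance test of the kind described in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the report's implementation: the negative binomial reference distribution, its parameters, the function names, and the observed counts are all hypothetical.

```python
import numpy as np

def negative_binomial_sampler(mean, dispersion):
    """Return a sampler for negative binomial counts with the given mean.

    NumPy parameterizes the distribution by (n, p) with mean n * (1 - p) / p,
    so p = dispersion / (dispersion + mean) gives the requested mean.
    """
    p = dispersion / (dispersion + mean)

    def sampler(rng, size):
        return rng.negative_binomial(dispersion, p, size=size)

    return sampler

def monte_carlo_hotspot_pvalue(observed_counts, reference_sampler, n_sim=2000, rng=None):
    """One-tailed Monte Carlo p-value for 'observed mean exceeds the reference'."""
    rng = np.random.default_rng() if rng is None else rng
    observed_counts = np.asarray(observed_counts)
    n = observed_counts.size
    observed_mean = observed_counts.mean()
    # Means of simulated samples of the same size drawn from the reference
    sim_means = np.array([reference_sampler(rng, n).mean() for _ in range(n_sim)])
    # The +1 terms count the observed sample itself, which keeps the p-value above zero
    return (1 + np.sum(sim_means >= observed_mean)) / (n_sim + 1)

# Hypothetical reference: regional mean of 2 birds per survey, overdispersed counts
reference = negative_binomial_sampler(mean=2.0, dispersion=0.5)
rng = np.random.default_rng(1)
candidate_hotspot = np.full(10, 6)   # ten surveys averaging 3x the reference mean
empty_block = np.zeros(10, dtype=int)
p_hot = monte_carlo_hotspot_pvalue(candidate_hotspot, reference, rng=rng)
p_empty = monte_carlo_hotspot_pvalue(empty_block, reference, rng=rng)
print(p_hot, p_empty)
```

    A coldspot test would reverse the inequality and count simulated means at or below the observed mean.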

  3. Improving the Document Development Process: Integrating Relational Data and Statistical Process Control.

    ERIC Educational Resources Information Center

    Miller, John

    1994-01-01

    Presents an approach to document numbering, document titling, and process measurement which, when used with fundamental techniques of statistical process control, reveals meaningful process-element variation as well as nominal productivity models. (SR)

  4. Variational stereo imaging of oceanic waves with statistical constraints.

    PubMed

    Gallego, Guillermo; Yezzi, Anthony; Fedele, Francesco; Benetazzo, Alvise

    2013-11-01

    An image processing observational technique for the stereoscopic reconstruction of the waveform of oceanic sea states is developed. The technique incorporates the enforcement of any given statistical wave law modeling the quasi-Gaussianity of oceanic waves observed in nature. The problem is posed in a variational optimization framework, where the desired waveform is obtained as the minimizer of a cost functional that combines image observations, smoothness priors and a weak statistical constraint. The minimizer is obtained by combining gradient descent and multigrid methods on the necessary optimality equations of the cost functional. Robust photometric error criteria and a spatial intensity compensation model are also developed to improve the performance of the presented image matching strategy. The weak statistical constraint is thoroughly evaluated in combination with other elements presented to reconstruct and enforce constraints on experimental stereo data, demonstrating the improvement in the estimation of the observed ocean surface.

  5. A statistical approach to estimate O3 uptake of ponderosa pine in a mediterranean climate

    Treesearch

    N.E. Grulke; H.K. Preisler; C.C. Fan; W.A. Retzlaff

    2002-01-01

    In highly polluted sites, stomatal behavior is sluggish with respect to light, vapor pressure deficit, and internal CO2 concentration (Ci) and poorly described by existing models. Statistical models were developed to estimate stomatal conductance (gs) of 40-year-old ponderosa pine at three sites differing in pollutant exposure for the purpose of...

  6. Discrimination of dynamical system models for biological and chemical processes.

    PubMed

    Lorenz, Sönke; Diederichs, Elmar; Telgmann, Regina; Schütte, Christof

    2007-06-01

    In technical chemistry, systems biology and biotechnology, the construction of predictive models has become an essential step in process design and product optimization. Accurate modelling of the reactions requires detailed knowledge about the processes involved. However, in the development of new products and production techniques, for example, this knowledge is often not available owing to the lack of experimental data. Thus, when one has to work with a selection of proposed models, the main task of early development is to discriminate among these models. In this article, a new statistical approach to model discrimination is described that ranks models with respect to the probability with which they reproduce the given data. The article introduces the new approach, discusses its statistical background, presents numerical techniques for its implementation and illustrates the application to examples from biokinetics.

  7. Evaluation of the Williams-type spring wheat model in North Dakota and Minnesota

    NASA Technical Reports Server (NTRS)

    Leduc, S. (Principal Investigator)

    1982-01-01

    The Williams-type model, developed similarly to previous models of C.V.D. Williams, uses monthly temperature and precipitation data as well as soil and topological variables to predict the yield of the spring wheat crop. The models are statistically developed using the regression technique. Eight model characteristics are examined in the evaluation of the model. Evaluation is at the crop reporting district level, the state level and for the entire region. A ten-year bootstrap test was the basis of the statistical evaluation. The accuracy and current indication of modeled yield reliability could show improvement. There is great variability in the bias measured over the districts, but there is a slight overall positive bias. The model estimates for the east central crop reporting district in Minnesota are not accurate. The estimates of yield for 1974 were inaccurate for all of the models.

  8. Intercomparison of four regional climate models for the German State of Saxonia

    NASA Astrophysics Data System (ADS)

    Kreienkamp, F.; Spekat, A.; Enke, W.

    2009-09-01

    Results from four regional climate models which focus on Central Europe are presented: CCLM, the climate version of the German Weather Service's Local Model - REMO, the regional dynamic model from the Max Planck Institute for Meteorology in Hamburg - STAR, the statistical model developed at the PIK Potsdam Institute and WETTREG, the statistic-dynamic model developed by the company CEC Potsdam. For the area of the German State of Saxonia a host of properties and indicators were analyzed aiming to show the models' abilities to reconstruct the current climate and compare climate model scenarios. These include a group of thermal indicators, such as the number of ice, frost, summer and hot days, the number of tropical nights; then there are hydrometeorological indicators such as the exceedance of low and high precipitation thresholds; humidity, cloudiness and wind indicators complement the array. A selection of them showing similarities and differences of the models investigated will be presented.

  9. Systems Engineering Metrics: Organizational Complexity and Product Quality Modeling

    NASA Technical Reports Server (NTRS)

    Mog, Robert A.

    1997-01-01

    Innovative organizational complexity and product quality models applicable to performance metrics for NASA-MSFC's Systems Analysis and Integration Laboratory (SAIL) missions and objectives are presented. An intensive research effort focuses on the synergistic combination of stochastic process modeling, nodal and spatial decomposition techniques, organizational and computational complexity, systems science and metrics, chaos, and proprietary statistical tools for accelerated risk assessment. This is followed by the development of a preliminary model, which is uniquely applicable and robust for quantitative purposes. Exercise of the preliminary model using a generic system hierarchy and the AXAF-I architectural hierarchy is provided. The Kendall test for positive dependence provides an initial verification and validation of the model. Finally, the research and development of the innovation is revisited, prior to peer review. This research and development effort results in near-term, measurable SAIL organizational and product quality methodologies, enhanced organizational risk assessment and evolutionary modeling results, and improved statistical quantification of SAIL productivity interests.

  10. Ensemble engineering and statistical modeling for parameter calibration towards optimal design of microbial fuel cells

    NASA Astrophysics Data System (ADS)

    Sun, Hongyue; Luo, Shuai; Jin, Ran; He, Zhen

    2017-07-01

    Mathematical modeling is an important tool for investigating the performance of microbial fuel cells (MFCs) with a view to optimizing their design. To overcome the shortcomings of traditional MFC models, an ensemble model that integrates an engineering model with statistical analytics is developed in this study for extrapolation scenarios. Such an ensemble model can reduce the labor of parameter calibration and requires fewer measurement data to achieve accuracy comparable to that of a traditional statistical model in both the normal and extreme operation regions. Depending on the relative weight given to current generation and organic removal efficiency, the ensemble model can recommend input factor settings that achieve the best current generation and organic removal efficiency. The model predicts a set of optimal design factors for the present tubular MFCs, including an anode flow rate of 3.47 mL min⁻¹, an organic concentration of 0.71 g L⁻¹, and a catholyte pumping flow rate of 14.74 mL min⁻¹, to achieve a peak current of 39.2 mA. To maintain 100% organic removal efficiency, the anode flow rate and organic concentration should be kept below 1.04 mL min⁻¹ and 0.22 g L⁻¹, respectively. The developed ensemble model can potentially be modified to model other types of MFCs or bioelectrochemical systems.

  11. Low-complexity stochastic modeling of wall-bounded shear flows

    NASA Astrophysics Data System (ADS)

    Zare, Armin

    Turbulent flows are ubiquitous in nature and they appear in many engineering applications. Transition to turbulence, in general, increases skin-friction drag in air/water vehicles compromising their fuel-efficiency and reduces the efficiency and longevity of wind turbines. While traditional flow control techniques combine physical intuition with costly experiments, their effectiveness can be significantly enhanced by control design based on low-complexity models and optimization. In this dissertation, we develop a theoretical and computational framework for the low-complexity stochastic modeling of wall-bounded shear flows. Part I of the dissertation is devoted to the development of a modeling framework which incorporates data-driven techniques to refine physics-based models. We consider the problem of completing partially known sample statistics in a way that is consistent with underlying stochastically driven linear dynamics. Neither the statistics nor the dynamics are precisely known. Thus, our objective is to reconcile the two in a parsimonious manner. To this end, we formulate optimization problems to identify the dynamics and directionality of input excitation in order to explain and complete available covariance data. For problem sizes that general-purpose solvers cannot handle, we develop customized optimization algorithms based on alternating direction methods. The solution to the optimization problem provides information about critical directions that have maximal effect in bringing model and statistics in agreement. In Part II, we employ our modeling framework to account for statistical signatures of turbulent channel flow using low-complexity stochastic dynamical models. We demonstrate that white-in-time stochastic forcing is not sufficient to explain turbulent flow statistics and develop models for colored-in-time forcing of the linearized Navier-Stokes equations. 
We also examine the efficacy of stochastically forced linearized NS equations and their parabolized equivalents in the receptivity analysis of velocity fluctuations to external sources of excitation as well as capturing the effect of the slowly-varying base flow on streamwise streaks and Tollmien-Schlichting waves. In Part III, we develop a model-based approach to design surface actuation of turbulent channel flow in the form of streamwise traveling waves. This approach is capable of identifying the drag reducing trends of traveling waves in a simulation-free manner. We also use the stochastically forced linearized NS equations to examine the Reynolds number independent effects of spanwise wall oscillations on drag reduction in turbulent channel flows. This allows us to extend the predictive capability of our simulation-free approach to high Reynolds numbers.

  12. Mathematical and statistical models for determining the crop load in grapevine

    NASA Astrophysics Data System (ADS)

    Alina, Dobrei; Alin, Dobrei; Eleonora, Nistor; Teodor, Cristea; Marius, Boldea; Florin, Sala

    2016-06-01

    Ensuring a balance between vine crop load and vine vegetative growth is a dynamic process, so it is necessary to develop models for describing this relationship. This study analyzed the interrelationship between the crop load and growth-specific parameters (viable buds - VB, dead (frost-injured) buds - DB, total shoot growth - TSG, one-year-old wood - MSG) in two grapevine varieties: the Muscat Ottonel cultivar for wine and the Victoria cultivar for fresh grapes. In both varieties, the interrelationship between bud number and vegetative growth parameters was described by statistically significant polynomial functions. Using regression analysis it was possible to develop predictive models for one-year-old wood (MSG), an important parameter for the yield and quality of wine grape production, with statistically significant results (R² = 0.884, p < 0.001, F = 45.957 in the Muscat Ottonel cultivar and R² = 0.893, p = 0.001, F = 49.886 in the Victoria cultivar).
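
    A polynomial regression with its coefficient of determination, of the general kind this abstract reports, can be sketched in Python with NumPy. The data below are hypothetical and noise-free, the variable names are assumptions, and the abstract does not state the polynomial degree used, so degree 2 is chosen only for illustration.

```python
import numpy as np

def fit_polynomial(x, y, degree=2):
    """Fit a polynomial by least squares and return (coefficients, R squared)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)          # highest power first
    fitted = np.polyval(coeffs, x)
    ss_res = np.sum((y - fitted) ** 2)         # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)       # total sum of squares
    return coeffs, 1.0 - ss_res / ss_tot

# Hypothetical data: bud number (x) against one-year-old wood (y), exact quadratic
bud_number = np.arange(10, dtype=float)
wood = 2.0 * bud_number ** 2 - 3.0 * bud_number + 1.0
coeffs, r_squared = fit_polynomial(bud_number, wood, degree=2)
print(coeffs, r_squared)
```

    With real measurements the fit would not be exact, and R squared would fall below 1, as in the values reported above.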

  13. The beta distribution: A statistical model for world cloud cover

    NASA Technical Reports Server (NTRS)

    Falls, L. W.

    1973-01-01

    Much work has been performed in developing empirical global cloud cover models. This investigation was made to determine an underlying theoretical statistical distribution to represent worldwide cloud cover. The beta distribution with probability density function is given to represent the variability of this random variable. It is shown that the beta distribution possesses the versatile statistical characteristics necessary to assume the wide variety of shapes exhibited by cloud cover. A total of 160 representative empirical cloud cover distributions were investigated and the conclusion was reached that this study provides sufficient statistical evidence to accept the beta probability distribution as the underlying model for world cloud cover.
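
    As an illustration of fitting a beta distribution to cloud cover fractions, the Python sketch below uses the method of moments with NumPy. The estimator choice, the function name, and the simulated data are assumptions; the abstract does not say how the 160 empirical distributions were fitted.

```python
import numpy as np

def fit_beta_moments(samples):
    """Method-of-moments estimates (a, b) for a beta distribution on (0, 1)."""
    samples = np.asarray(samples, dtype=float)
    m = samples.mean()
    v = samples.var(ddof=1)
    if not 0.0 < m < 1.0 or v >= m * (1.0 - m):
        raise ValueError("sample moments are not compatible with a beta distribution")
    # For Beta(a, b): mean = a / (a + b), variance = mean * (1 - mean) / (a + b + 1)
    common = m * (1.0 - m) / v - 1.0           # equals a + b
    return m * common, (1.0 - m) * common

# Hypothetical cloud cover fractions drawn from a U-shaped beta distribution
rng = np.random.default_rng(2)
cloud_cover = rng.beta(0.6, 0.4, size=20000)
a_hat, b_hat = fit_beta_moments(cloud_cover)
print(a_hat, b_hat)
```

    Shape parameters below 1 give a U-shaped density, while parameters above 1 give a single interior peak, which is one sense in which the beta family can take a wide variety of shapes.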

  14. Guidelines 13 and 14—Prediction uncertainty

    USGS Publications Warehouse

    Hill, Mary C.; Tiedeman, Claire

    2005-01-01

    An advantage of using optimization for model development and calibration is that optimization provides methods for evaluating and quantifying prediction uncertainty. Both deterministic and statistical methods can be used. Guideline 13 discusses using regression and post-audits, which we classify as deterministic methods. Guideline 14 discusses inferential statistics and Monte Carlo methods, which we classify as statistical methods.

  15. Vibration Transmission through Rolling Element Bearings in Geared Rotor Systems

    DTIC Science & Technology

    1990-11-01

    ...and dynamic finite element techniques are used to develop the discrete vibration models while statistical energy analysis method is used for the broad...bearing system studies, geared rotor system studies, and statistical energy analysis. Each chapter is self sufficient since it is written in a

  16. Prostate Cancer Risk Prediction Models

    Cancer.gov

    Developing statistical models that estimate the probability of developing prostate cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  17. Bladder Cancer Risk Prediction Models

    Cancer.gov

    Developing statistical models that estimate the probability of developing bladder cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  18. Ovarian Cancer Risk Prediction Models

    Cancer.gov

    Developing statistical models that estimate the probability of developing ovarian cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  19. Pancreatic Cancer Risk Prediction Models

    Cancer.gov

    Developing statistical models that estimate the probability of developing pancreatic cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  20. Breast Cancer Risk Prediction Models

    Cancer.gov

    Developing statistical models that estimate the probability of developing breast cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  1. Esophageal Cancer Risk Prediction Models

    Cancer.gov

    Developing statistical models that estimate the probability of developing esophageal cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  2. Cervical Cancer Risk Prediction Models

    Cancer.gov

    Developing statistical models that estimate the probability of developing cervical cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  3. Liver Cancer Risk Prediction Models

    Cancer.gov

    Developing statistical models that estimate the probability of developing liver cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  4. Lung Cancer Risk Prediction Models

    Cancer.gov

    Developing statistical models that estimate the probability of developing lung cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  5. Colorectal Cancer Risk Prediction Models

    Cancer.gov

    Developing statistical models that estimate the probability of developing colorectal cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  6. Statistical analysis of target acquisition sensor modeling experiments

    NASA Astrophysics Data System (ADS)

    Deaver, Dawne M.; Moyer, Steve

    2015-05-01

    The U.S. Army RDECOM CERDEC NVESD Modeling and Simulation Division is charged with the development and advancement of military target acquisition models to estimate expected soldier performance when using all types of imaging sensors. Two elements of sensor modeling are (1) laboratory-based psychophysical experiments used to measure task performance and calibrate the various models and (2) field-based experiments used to verify the model estimates for specific sensors. In both types of experiments, it is common practice to control or measure environmental, sensor, and target physical parameters in order to minimize uncertainty of the physics based modeling. Predicting the minimum number of test subjects required to calibrate or validate the model should be, but is not always, done during test planning. The objective of this analysis is to develop guidelines for test planners which recommend the number and types of test samples required to yield a statistically significant result.
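
    One common way to estimate the number of test samples at the planning stage is the normal-approximation sample size formula for a two-sided, one-sample test: n = ((z_(1-alpha/2) + z_power) * sigma / delta)^2. The Python sketch below applies it using only the standard library. It is a generic textbook calculation offered as an assumption, not the guideline developed in the paper, and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Samples needed to detect a mean shift `delta` with a two-sided z-test.

    `sigma` is the assumed standard deviation of a single observation.
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)              # quantile for the target power
    n = ((z_alpha + z_power) * sigma / delta) ** 2
    return math.ceil(n)

# Hypothetical planning values: detect a shift of half a standard deviation
n_subjects = required_sample_size(delta=0.5, sigma=1.0)
print(n_subjects)  # 32
```

    Halving the detectable shift roughly quadruples the required sample size, which is why the effect size should be agreed before the test is planned.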

  7. Assessing the prediction accuracy of a cure model for censored survival data with long-term survivors: Application to breast cancer data.

    PubMed

    Asano, Junichi; Hirakawa, Akihiro

    2017-01-01

    The Cox proportional hazards cure model is a survival model incorporating a cure rate with the assumption that the population contains both uncured and cured individuals. It contains a logistic regression for the cure rate, and a Cox regression to estimate the hazard for uncured patients. A single predictive model for both the cure and hazard can be developed by using a cure model that simultaneously predicts the cure rate and hazards for uncured patients; however, model selection is a challenge because of the lack of a measure for quantifying the predictive accuracy of a cure model. Recently, we developed an area under the receiver operating characteristic curve (AUC) for determining the cure rate in a cure model (Asano et al., 2014), but the hazards measure for uncured patients was not resolved. In this article, we propose novel C-statistics that are weighted by the patients' cure status (i.e., cured, uncured, or censored cases) for the cure model. The operating characteristics of the proposed C-statistics and their confidence interval were examined by simulation analyses. We also illustrate methods for predictive model selection and for further interpretation of variables using the proposed AUCs and C-statistics via application to breast cancer data.

  8. A new in silico classification model for ready biodegradability, based on molecular fragments.

    PubMed

    Lombardo, Anna; Pizzo, Fabiola; Benfenati, Emilio; Manganaro, Alberto; Ferrari, Thomas; Gini, Giuseppina

    2014-08-01

    Regulations such as the European REACH (Registration, Evaluation, Authorization and restriction of Chemicals) often require chemicals to be evaluated for ready biodegradability, to assess the potential risk for environmental and human health. Because not all chemicals can be tested, there is an increasing demand for tools for quick and inexpensive biodegradability screening, such as computer-based (in silico) theoretical models. We developed an in silico model starting from a dataset of 728 chemicals with ready biodegradability data (MITI-test Ministry of International Trade and Industry). We used the novel software SARpy to automatically extract, through a structural fragmentation process, a set of substructures statistically related to ready biodegradability. Then, we analysed these substructures in order to build some general rules. The model consists of a rule-set made up of the combination of the statistically relevant fragments and of the expert-based rules. The model gives good statistical performance with 92%, 82% and 76% accuracy on the training, test and external set respectively. These results are comparable with other in silico models like BIOWIN developed by the United States Environmental Protection Agency (EPA); moreover this new model includes an easily understandable explanation. Copyright © 2014 Elsevier Ltd. All rights reserved.

  9. A Predictive Statistical Model of Navy Career Enlisted Retention Behavior Utilizing Economic Variables.

    DTIC Science & Technology

    1980-12-01

    career retention rates, and to predict future career retention rates in the Navy. The statistical model utilizes economic variables as predictors...The model developed has a high correlation with Navy career retention rates. The problem of Navy career retention has not been adequately studied,...findings indicate Navy policymakers must be cognizant of the relationships of economic factors to Navy career retention rates.

  10. Investigation of Biotransport in a Tumor With Uncertain Material Properties Using a Nonintrusive Spectral Uncertainty Quantification Method.

    PubMed

    Alexanderian, Alen; Zhu, Liang; Salloum, Maher; Ma, Ronghui; Yu, Meilin

    2017-09-01

    In this study, statistical models are developed for modeling uncertain heterogeneous permeability and porosity in tumors, and the resulting uncertainties in pressure and velocity fields during an intratumoral injection are quantified using a nonintrusive spectral uncertainty quantification (UQ) method. Specifically, the uncertain permeability is modeled as a log-Gaussian random field, represented using a truncated Karhunen-Loève (KL) expansion, and the uncertain porosity is modeled as a log-normal random variable. The efficacy of the developed statistical models is validated by simulating the concentration fields with permeability and porosity of different uncertainty levels. The irregularity in the concentration field bears reasonable visual agreement with that in MicroCT images from experiments. The pressure and velocity fields are represented using polynomial chaos (PC) expansions to enable efficient computation of their statistical properties. The coefficients in the PC expansion are computed using a nonintrusive spectral projection method with the Smolyak sparse quadrature. The developed UQ approach is then used to quantify the uncertainties in the random pressure and velocity fields. A global sensitivity analysis is also performed to assess the contribution of individual KL modes of the log-permeability field to the total variance of the pressure field. It is demonstrated that the developed UQ approach can effectively quantify the flow uncertainties induced by uncertain material properties of the tumor.
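
    A truncated Karhunen-Loève representation of a log-Gaussian field can be illustrated with a discrete analogue on a one-dimensional grid, sketched below in Python with NumPy. The exponential covariance kernel, correlation length, grid, number of modes, and function name are all assumptions made for illustration; the study's actual kernel, geometry, and discretization are not given in the abstract.

```python
import numpy as np

def truncated_kl_log_gaussian(grid, corr_length, sigma, n_modes, xi):
    """Sample a log-Gaussian field from a truncated discrete KL expansion.

    The log-field is sum_k sqrt(lambda_k) * phi_k * xi_k, where (lambda_k, phi_k)
    are the leading eigenpairs of the covariance matrix on the grid and xi_k are
    independent standard normal coefficients.
    """
    grid = np.asarray(grid, dtype=float)
    dist = np.abs(grid[:, None] - grid[None, :])
    cov = sigma ** 2 * np.exp(-dist / corr_length)   # assumed exponential kernel
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_modes]      # keep the largest n_modes
    lam = eigvals[order]
    phi = eigvecs[:, order]
    log_field = phi @ (np.sqrt(lam) * np.asarray(xi, dtype=float))
    return np.exp(log_field), lam

rng = np.random.default_rng(3)
grid = np.linspace(0.0, 1.0, 50)
xi = rng.standard_normal(5)
permeability, kl_eigenvalues = truncated_kl_log_gaussian(
    grid, corr_length=0.3, sigma=0.5, n_modes=5, xi=xi
)
print(permeability[:3], kl_eigenvalues)
```

    Because the field is the exponential of a Gaussian sum, every sampled value is positive, which is the property that makes a log-Gaussian model suitable for a quantity such as permeability.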

  11. Infinitely divisible cascades to model the statistics of natural images.

    PubMed

    Chainais, Pierre

    2007-12-01

    We propose to model the statistics of natural images thanks to the large class of stochastic processes called Infinitely Divisible Cascades (IDC). IDC were first introduced in one dimension to provide multifractal time series to model the so-called intermittency phenomenon in hydrodynamical turbulence. We have extended the definition of scalar infinitely divisible cascades from 1 to N dimensions and commented on the relevance of such a model in fully developed turbulence in [1]. In this article, we focus on the particular 2 dimensional case. IDC appear as good candidates to model the statistics of natural images. They share most of their usual properties and appear to be consistent with several independent theoretical and experimental approaches of the literature. We point out the interest of IDC for applications to procedural texture synthesis.

  12. Customizing national models for a medical center's population to rapidly identify patients at high risk of 30-day all-cause hospital readmission following a heart failure hospitalization.

    PubMed

    Cox, Zachary L; Lai, Pikki; Lewis, Connie M; Lindenfeld, JoAnn; Collins, Sean P; Lenihan, Daniel J

    2018-05-28

    Nationally-derived models predicting 30-day readmissions following heart failure (HF) hospitalizations yield insufficient discrimination for institutional use. Develop a customized readmission risk model from Medicare-employed and institutionally-customized risk factors and compare the performance against national models in a medical center. Medicare patients age ≥ 65 years hospitalized for HF (n = 1,454) were studied in a derivation cohort and in a separate validation cohort (n = 243). All 30-day hospital readmissions were documented. The primary outcome was risk discrimination (c-statistic) compared to national models. A customized model demonstrated improved discrimination (c-statistic 0.72; 95% CI 0.69 - 0.74) compared to national models (c-statistics of 0.60 and 0.61) with a c-statistic of 0.63 in the validation cohort. Compared to national models, a customized model demonstrated superior readmission risk profiling by distinguishing a high-risk (38.3%) from a low-risk (9.4%) quartile. A customized model improved readmission risk discrimination from HF hospitalizations compared to national models. Copyright © 2018 Elsevier Inc. All rights reserved.
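
    For a binary outcome such as 30-day readmission, the c-statistic is the proportion of (readmitted, not readmitted) patient pairs in which the readmitted patient received the higher predicted risk, with ties counted as one half. A minimal pure-Python sketch follows; the function name and the example risks are hypothetical and are not taken from the study.

```python
def c_statistic(predicted_risks, outcomes):
    """Concordance (c-statistic) for binary outcomes coded 1 (event) and 0 (no event)."""
    events = [r for r, y in zip(predicted_risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(predicted_risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for risk_event in events:
        for risk_non_event in non_events:
            if risk_event > risk_non_event:
                concordant += 1.0      # the model ranks this pair correctly
            elif risk_event == risk_non_event:
                concordant += 0.5      # a tie counts as half
    return concordant / (len(events) * len(non_events))

# Hypothetical predicted risks for four patients, two of whom were readmitted
risks = [0.9, 0.8, 0.3, 0.2]
readmitted = [1, 0, 1, 0]
print(c_statistic(risks, readmitted))  # 0.75
```

    A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which gives a scale for reading the 0.60 to 0.72 range reported above.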

  13. Treated cabin acoustic prediction using statistical energy analysis

    NASA Technical Reports Server (NTRS)

    Yoerkie, Charles A.; Ingraham, Steven T.; Moore, James A.

    1987-01-01

    The application of statistical energy analysis (SEA) to the modeling and design of helicopter cabin interior noise control treatment is demonstrated. The information presented here is obtained from work sponsored at NASA Langley for the development of analytic modeling techniques and the basic understanding of cabin noise. Utility and executive interior models are developed directly from existing S-76 aircraft designs. The relative importance of panel transmission loss (TL), acoustic leakage, and absorption to the control of cabin noise is shown using the SEA modeling parameters. It is shown that the major cabin noise improvement below 1000 Hz comes from increased panel TL, while above 1000 Hz it comes from reduced acoustic leakage and increased absorption in the cabin and overhead cavities.

  14. Bayesian Sensitivity Analysis of Statistical Models with Missing Data

    PubMed Central

    ZHU, HONGTU; IBRAHIM, JOSEPH G.; TANG, NIANSHENG

    2013-01-01

    Methods for handling missing data depend strongly on the mechanism that generated the missing values, such as missing completely at random (MCAR) or missing at random (MAR), as well as other distributional and modeling assumptions at various stages. It is well known that the resulting estimates and tests may be sensitive to these assumptions as well as to outlying observations. In this paper, we introduce various perturbations to modeling assumptions and individual observations, and then develop a formal sensitivity analysis to assess these perturbations in the Bayesian analysis of statistical models with missing data. We develop a geometric framework, called the Bayesian perturbation manifold, to characterize the intrinsic structure of these perturbations. We propose several intrinsic influence measures to perform sensitivity analysis and quantify the effect of various perturbations to statistical models. We use the proposed sensitivity analysis procedure to systematically investigate the tenability of the non-ignorable missing at random (NMAR) assumption. Simulation studies are conducted to evaluate our methods, and a dataset is analyzed to illustrate the use of our diagnostic measures. PMID:24753718

  15. A Stochastic Fractional Dynamics Model of Rainfall Statistics

    NASA Astrophysics Data System (ADS)

    Kundu, Prasun; Travis, James

    2013-04-01

    Rainfall varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate, that allows a concise description of the second moment statistics of rain at any prescribed space-time averaging scale. The model is designed to faithfully reflect the scale dependence and is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain but strongly constrains the predicted statistical behavior as a function of the averaging length and time scales. The main restriction is the assumption that the statistics of the precipitation field is spatially homogeneous and isotropic and stationary in time. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and in Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to the second moment statistics of the radar data. The model predictions are then found to fit the second moment statistics of the gauge data reasonably well without any further adjustment. Some data sets containing periods of non-stationary behavior that involves occasional anomalously correlated rain events present a challenge for the model.

  16. Classroom Research: Assessment of Student Understanding of Sampling Distributions of Means and the Central Limit Theorem in Post-Calculus Probability and Statistics Classes

    ERIC Educational Resources Information Center

    Lunsford, M. Leigh; Rowell, Ginger Holmes; Goodson-Espy, Tracy

    2006-01-01

    We applied a classroom research model to investigate student understanding of sampling distributions of sample means and the Central Limit Theorem in post-calculus introductory probability and statistics courses. Using a quantitative assessment tool developed by previous researchers and a qualitative assessment tool developed by the authors, we…

  17. Data Model Performance in Data Warehousing

    NASA Astrophysics Data System (ADS)

    Rorimpandey, G. C.; Sangkop, F. I.; Rantung, V. P.; Zwart, J. P.; Liando, O. E. S.; Mewengkang, A.

    2018-02-01

    Data warehouses have become increasingly important in organizations that hold large amounts of data. A data warehouse is not a product but part of a solution for the decision support system in those organizations. The data model is the starting point for designing and developing data warehouse architectures. Thus, the data model needs interfaces that remain stable and consistent over a long period of time. The aim of this research is to determine which data model in data warehousing has the best performance. The research method is descriptive analysis, which has three main tasks: data collection and organization, analysis of data, and interpretation of data. The results, assessed with a statistical analysis method, show that there is no statistical difference among the data models used in data warehousing. An organization can use any of the four proposed data models when designing and developing a data warehouse.

  18. Statistical models for fever forecasting based on advanced body temperature monitoring.

    PubMed

    Jordan, Jorge; Miro-Martinez, Pau; Vargas, Borja; Varela-Entrecanales, Manuel; Cuesta-Frau, David

    2017-02-01

    Body temperature monitoring provides health carers with key clinical information about the physiological status of patients. Temperature readings are taken periodically to detect febrile episodes and consequently implement the appropriate medical countermeasures. However, fever is often difficult to assess at early stages, or remains undetected until the next reading, probably a few hours later. The objective of this article is to develop a statistical model to forecast fever before a temperature threshold is exceeded to improve the therapeutic approach to the subjects involved. To this end, temperature series of 9 patients admitted to a general internal medicine ward were obtained with a continuous monitoring Holter device, collecting measurements of peripheral and core temperature once per minute. These series were used to develop different statistical models that could quantify the probability of having a fever spike in the following 60 minutes. A validation series was collected to assess the accuracy of the models. Finally, the results were compared with the analysis of some series by experienced clinicians. Two different models were developed: a logistic regression model and a linear discrimination analysis model. Both of them exhibited a fever peak forecasting accuracy greater than 84%. When compared with experts' assessment, both models identified 35 (97.2%) of 36 fever spikes. The models proposed are highly accurate in forecasting the appearance of fever spikes within a short period in patients with suspected or confirmed febrile-related illnesses. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. Assessment of credit risk based on fuzzy relations

    NASA Astrophysics Data System (ADS)

    Tsabadze, Teimuraz

    2017-06-01

    The purpose of this paper is to develop a new approach for assessing the credit risk of corporate borrowers. There are different models for borrowers' risk assessment. These models are divided into two groups: statistical and theoretical. When assessing the credit risk of corporate borrowers, a statistical model is unacceptable due to the lack of a sufficiently large history of defaults. At the same time, some theoretical models cannot be used due to the lack of a stock exchange. In such cases, when a particular borrower is studied and no statistical base exists, the decision-making process always relies on expert judgment. The paper describes a new approach that may be used in group decision-making. An example of the application of the proposed approach is given.

  20. Use of statistical and pharmacokinetic-pharmacodynamic modeling and simulation to improve decision-making: A section summary report of the trends and innovations in clinical trial statistics conference.

    PubMed

    Kimko, Holly; Berry, Seth; O'Kelly, Michael; Mehrotra, Nitin; Hutmacher, Matthew; Sethuraman, Venkat

    2017-01-01

    The application of modeling and simulation (M&S) methods to improve decision-making was discussed during the Trends & Innovations in Clinical Trial Statistics Conference held in Durham, North Carolina, USA on May 1-4, 2016. Uses of both pharmacometric and statistical M&S were presented during the conference, highlighting the diversity of the methods employed by pharmacometricians and statisticians to address a broad range of quantitative issues in drug development. Five presentations are summarized herein, which cover the development strategy of employing M&S to drive decision-making; European initiatives on best practice in M&S; case studies of pharmacokinetic/pharmacodynamics modeling in regulatory decisions; estimation of exposure-response relationships in the presence of confounding; and the utility of estimating the probability of a correct decision for dose selection when prior information is limited. While M&S has been widely used during the last few decades, it is expected to play an essential role as more quantitative assessments are employed in the decision-making process. By integrating M&S as a tool to compile the totality of evidence collected throughout the drug development program, more informed decisions will be made.

  1. On-Line Analysis of Physiologic and Neurobehavioral Variables During Long-Duration Space Missions

    NASA Technical Reports Server (NTRS)

    Brown, Emery N.

    1999-01-01

    The goal of this project is to develop reliable statistical algorithms for on-line analysis of physiologic and neurobehavioral variables monitored during long-duration space missions. Maintenance of physiologic and neurobehavioral homeostasis during long-duration space missions is crucial for ensuring optimal crew performance. If countermeasures are not applied, alterations in homeostasis will occur in nearly all physiologic systems. During such missions data from most of these systems will be either continually and/or continuously monitored. Therefore, if these data can be analyzed as they are acquired and the status of these systems can be continually assessed, then once alterations are detected, appropriate countermeasures can be applied to correct them. One of the most important physiologic systems in which to maintain homeostasis during long-duration missions is the circadian system. To detect and treat alterations in circadian physiology during long-duration space missions requires development of: 1) a ground-based protocol to assess the status of the circadian system under the light-dark environment in which crews in space will typically work; and 2) appropriate statistical methods to make this assessment. The protocol in Project 1, Circadian Entrainment, Sleep-Wake Regulation and Neurobehavioral will study human volunteers under the simulated light-dark environment of long-duration space missions. Therefore, we propose to develop statistical models to characterize in near real time circadian and neurobehavioral physiology under these conditions. The specific aims of this project are to test the hypotheses that: 1) Dynamic statistical methods based on the Kronauer model of the human circadian system can be developed to estimate circadian phase, period, and amplitude from core-temperature data collected under simulated light-dark conditions of long-duration space missions. 2) Analytic formulae and numerical algorithms can be developed to compute the error in the estimates of circadian phase, period and amplitude determined from the data in Specific Aim 1. 3) Statistical models can reliably detect, in near real time (daily), significant alterations in the circadian physiology of individual subjects by analyzing the circadian and neurobehavioral data collected in Project 1. 4) Criteria can be developed using the Kronauer model and the recently developed Jewett model of cognitive performance and subjective alertness to define altered circadian and neurobehavioral physiology and to set conditions for immediate administration of countermeasures.

  2. ANEMOS: Development of a next generation wind power forecasting system for the large-scale integration of onshore and offshore wind farms.

    NASA Astrophysics Data System (ADS)

    Kariniotakis, G.; Anemos Team

    2003-04-01

    Objectives: Accurate forecasting of wind energy production up to two days ahead is recognized as a major contribution to reliable large-scale wind power integration. Especially in a liberalized electricity market, prediction tools enhance the position of wind energy compared to other forms of dispatchable generation. ANEMOS is a new 3.5-year R&D project, supported by the European Commission, that assembles research organizations and end-users with significant experience in the domain. The project aims to develop advanced forecasting models that will substantially outperform current methods. Emphasis is given to situations like complex terrain and extreme weather conditions, as well as to offshore prediction, for which no specific tools currently exist. The prediction models will be implemented in a software platform and installed for online operation at onshore and offshore wind farms by the end-users participating in the project. Approach: The paper presents the methodology of the project. Initially, the prediction requirements are identified according to the profiles of the end-users. The project develops prediction models based on both a physical and an alternative statistical approach. Research on physical models gives emphasis to techniques for use in complex terrain and the development of prediction tools based on CFD techniques, advanced model output statistics or high-resolution meteorological information. Statistical models (i.e. based on artificial intelligence) are developed for downscaling, power curve representation, upscaling for prediction at regional or national level, etc. A benchmarking process is set up to evaluate the performance of the developed models and to compare them with existing ones using a number of case studies. The synergy between statistical and physical approaches is examined to identify promising areas for further improvement of forecasting accuracy. Appropriate physical and statistical prediction models are also developed for offshore wind farms, taking into account advances in marine meteorology (interaction between wind and waves, coastal effects). The benefits from the use of satellite radar images for modeling local weather patterns are investigated. A next-generation forecasting software, ANEMOS, will be developed to integrate the various models. The tool is enhanced by advanced Information Communication Technology (ICT) functionality and can operate in stand-alone or remote mode, or be interfaced with standard Energy or Distribution Management Systems (EMS/DMS). Contribution: The project provides an advanced technology for wind resource forecasting applicable at a large scale: at a single wind farm, at regional or national level, and for both interconnected and island systems. A major milestone is the on-line operation of the developed software by the participating utilities for onshore and offshore wind farms and the demonstration of the economic benefits. The outcome of the ANEMOS project will consistently help increase wind integration at two levels: at the operational level, through better management of wind farms, and by contributing to an increase in the installed capacity of wind farms. This is because accurate prediction of the resource reduces the risk to wind farm developers, who are then more willing to undertake new wind farm installations, especially in a liberalized electricity market environment.

  3. Nonlinear Hebbian Learning as a Unifying Principle in Receptive Field Formation.

    PubMed

    Brito, Carlos S N; Gerstner, Wulfram

    2016-09-01

    The development of sensory receptive fields has been modeled in the past by a variety of models including normative models such as sparse coding or independent component analysis and bottom-up models such as spike-timing dependent plasticity or the Bienenstock-Cooper-Munro model of synaptic plasticity. Here we show that the above variety of approaches can all be unified into a single common principle, namely nonlinear Hebbian learning. When nonlinear Hebbian learning is applied to natural images, receptive field shapes were strongly constrained by the input statistics and preprocessing, but exhibited only modest variation across different choices of nonlinearities in neuron models or synaptic plasticity rules. Neither overcompleteness nor sparse network activity are necessary for the development of localized receptive fields. The analysis of alternative sensory modalities such as auditory models or V2 development lead to the same conclusions. In all examples, receptive fields can be predicted a priori by reformulating an abstract model as nonlinear Hebbian learning. Thus nonlinear Hebbian learning and natural statistics can account for many aspects of receptive field formation across models and sensory modalities.
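
    As a rough illustration of the unifying principle named above, a nonlinear Hebbian update changes each weight in proportion to the input times a nonlinear function of the neuron's total drive, followed by a normalization that keeps the weights bounded. The sketch below assumes a cubic nonlinearity, a learning rate of 0.01, and unit-norm weights; these choices and all names are illustrative assumptions, not the authors' simulation setup.

```python
import math


def cubic(y):
    """One possible nonlinearity; chosen here only for illustration."""
    return y ** 3


def hebbian_step(weights, x, f=cubic, lr=0.01):
    """Apply w <- w + lr * x * f(w . x), then rescale w to unit length."""
    y = sum(w * xi for w, xi in zip(weights, x))
    updated = [w + lr * xi * f(y) for w, xi in zip(weights, x)]
    norm = math.sqrt(sum(w * w for w in updated))
    return [w / norm for w in updated]


# Hypothetical two-dimensional inputs that mostly point along the first axis.
w = [0.6, 0.8]
for pattern in ([1.0, 0.0], [0.9, 0.1], [1.0, -0.1]):
    w = hebbian_step(w, pattern)
print(w)
```

    Swapping `cubic` for another nonlinearity changes only the function passed as `f`, which mirrors the abstract's point that the learned fields depend more on the input statistics than on the particular nonlinearity.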

  4. Nonlinear Hebbian Learning as a Unifying Principle in Receptive Field Formation

    PubMed Central

    Gerstner, Wulfram

    2016-01-01

    The development of sensory receptive fields has been modeled in the past by a variety of models including normative models such as sparse coding or independent component analysis and bottom-up models such as spike-timing dependent plasticity or the Bienenstock-Cooper-Munro model of synaptic plasticity. Here we show that the above variety of approaches can all be unified into a single common principle, namely nonlinear Hebbian learning. When nonlinear Hebbian learning is applied to natural images, receptive field shapes were strongly constrained by the input statistics and preprocessing, but exhibited only modest variation across different choices of nonlinearities in neuron models or synaptic plasticity rules. Neither overcompleteness nor sparse network activity are necessary for the development of localized receptive fields. The analysis of alternative sensory modalities such as auditory models or V2 development lead to the same conclusions. In all examples, receptive fields can be predicted a priori by reformulating an abstract model as nonlinear Hebbian learning. Thus nonlinear Hebbian learning and natural statistics can account for many aspects of receptive field formation across models and sensory modalities. PMID:27690349

  5. Statistical metrology—measurement and modeling of variation for advanced process development and design rule generation

    NASA Astrophysics Data System (ADS)

    Boning, Duane S.; Chung, James E.

    1998-11-01

    Advanced process technology will require more detailed understanding and tighter control of variation in devices and interconnects. The purpose of statistical metrology is to provide methods to measure and characterize variation, to model systematic and random components of that variation, and to understand the impact of variation on both yield and performance of advanced circuits. Of particular concern are spatial or pattern-dependencies within individual chips; such systematic variation within the chip can have a much larger impact on performance than wafer-level random variation. Statistical metrology methods will play an important role in the creation of design rules for advanced technologies. For example, a key issue in multilayer interconnect is the uniformity of interlevel dielectric (ILD) thickness within the chip. For the case of ILD thickness, we describe phases of statistical metrology development and application to understanding and modeling thickness variation arising from chemical-mechanical polishing (CMP). These phases include screening experiments including design of test structures and test masks to gather electrical or optical data, techniques for statistical decomposition and analysis of the data, and approaches to calibrating empirical and physical variation models. These models can be integrated with circuit CAD tools to evaluate different process integration or design rule strategies. One focus for the generation of interconnect design rules are guidelines for the use of "dummy fill" or "metal fill" to improve the uniformity of underlying metal density and thus improve the uniformity of oxide thickness within the die. Trade-offs that can be evaluated via statistical metrology include the improvements to uniformity possible versus the effect of increased capacitance due to additional metal.

  6. Development of an errorable car-following driver model

    NASA Astrophysics Data System (ADS)

    Yang, H.-H.; Peng, H.

    2010-06-01

    An errorable car-following driver model is presented in this paper. An errorable driver model is one that emulates a human driver's functions and can generate both nominal (error-free) and devious (with error) behaviours. This model was developed for evaluation and design of active safety systems. The car-following data used for developing and validating the model were obtained from a large-scale naturalistic driving database. The stochastic car-following behaviour was first analysed and modelled as a random process. Three error-inducing behaviours were then introduced. First, human perceptual limitation was studied and implemented. Distraction due to non-driving tasks was then identified based on the statistical analysis of the driving data. Finally, time delay of human drivers was estimated through a recursive least-square identification process. By including these three error-inducing behaviours, rear-end collisions with the lead vehicle could occur. The simulated crash rate was found to be similar to, but somewhat higher than, that reported in traffic statistics.
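
    The recursive least-squares identification mentioned above updates parameter estimates one sample at a time instead of refitting on the whole data set. The sketch below is the generic textbook recursion for a linear model, not the authors' delay-estimation procedure; the function names, the large initial covariance `p0`, and the toy data are assumptions made for illustration.

```python
def rls_update(theta, P, phi, y, lam=1.0):
    """One recursive least-squares step for the model y ~ phi . theta.

    theta: current parameter estimates (list of n floats)
    P:     current n x n symmetric covariance-like matrix (list of lists)
    phi:   regressor vector for this sample
    lam:   forgetting factor (1.0 weights all samples equally)
    """
    n = len(theta)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    error = y - sum(phi[i] * theta[i] for i in range(n))
    new_theta = [theta[i] + gain[i] * error for i in range(n)]
    # P <- (P - gain * phi^T P) / lam; phi^T P equals P_phi because P is symmetric.
    new_P = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)]
             for i in range(n)]
    return new_theta, new_P


def fit(samples, n_params, p0=1e6):
    """Run the recursion over (phi, y) pairs, starting from zero estimates."""
    theta = [0.0] * n_params
    P = [[p0 if i == j else 0.0 for j in range(n_params)]
         for i in range(n_params)]
    for phi, y in samples:
        theta, P = rls_update(theta, P, phi, y)
    return theta


# Hypothetical noise-free samples from y = 2.0 * x + 0.5.
samples = [([x, 1.0], 2.0 * x + 0.5) for x in (0.0, 1.0, 2.0, 3.0, 4.0)]
print(fit(samples, 2))
```

    Setting `lam` slightly below 1.0 discounts older samples, which is the usual choice when the identified parameter may drift over time.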

  7. Final Report - Enhanced LAW Glass Property - Composition Models - Phase 1 VSL-13R2940-1, Rev. 0, dated 9/27/2013

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kruger, Albert A.; Muller, I.; Gilbo, K.

    2013-11-13

    The objectives of this work are aimed at the development of enhanced LAW property-composition models that expand the composition region covered by the models. The models of interest include PCT, VHT, viscosity and electrical conductivity. This is planned as a multi-year effort that will be performed in phases with the objectives listed below for the current phase. Incorporate property-composition data from the new glasses into the database. Assess the database and identify composition spaces in the database that need augmentation. Develop statistically-designed composition matrices to cover the composition regions identified in the above analysis. Prepare crucible melts of glass compositions from the statistically-designed composition matrix and measure the properties of interest. Incorporate the above property-composition data into the database. Assess existing models against the complete dataset and, as necessary, start development of new models.

  8. The Prediction of Noise Due to Jet Turbulence Convecting Past Flight Vehicle Trailing Edges

    NASA Technical Reports Server (NTRS)

    Miller, Steven A. E.

    2014-01-01

    High intensity acoustic radiation occurs when turbulence convects past airframe trailing edges. A mathematical model is developed to predict this acoustic radiation. The model is dependent on the local flow and turbulent statistics above the trailing edge of the flight vehicle airframe. These quantities are dependent on the jet and flight vehicle Mach numbers and jet temperature. A term in the model approximates the turbulent statistics of single-stream heated jet flows and is developed based upon measurement. The developed model is valid for a wide range of jet Mach numbers, jet temperature ratios, and flight vehicle Mach numbers. The model predicts traditional trailing edge noise if the jet is not interacting with the airframe. Predictions of mean-flow quantities and the cross-spectrum of static pressure near the airframe trailing edge are compared with measurement. Finally, predictions of acoustic intensity are compared with measurement and the model is shown to accurately capture the phenomenon.

  9. Multiplicative Modeling of Children's Growth and Its Statistical Properties

    NASA Astrophysics Data System (ADS)

    Kuninaka, Hiroto; Matsushita, Mitsugu

    2014-03-01

    We develop a numerical growth model that can predict the statistical properties of the height distribution of Japanese children. Our previous studies have clarified that the height distribution of schoolchildren shows a transition from the lognormal distribution to the normal distribution during puberty. In this study, we demonstrate by simulation that the transition occurs owing to the variability of the onset of puberty.

  10. Adjusting the Adjusted X²/df Ratio Statistic for Dichotomous Item Response Theory Analyses: Does the Model Fit?

    ERIC Educational Resources Information Center

    Tay, Louis; Drasgow, Fritz

    2012-01-01

    Two Monte Carlo simulation studies investigated the effectiveness of the mean adjusted X²/df statistic proposed by Drasgow and colleagues and, because of problems with the method, a new approach for assessing the goodness of fit of an item response theory model was developed. It has been previously recommended that mean adjusted…

  11. Blended particle filters for large-dimensional chaotic dynamical systems

    PubMed Central

    Majda, Andrew J.; Qi, Di; Sapsis, Themistoklis P.

    2014-01-01

    A major challenge in contemporary data science is the development of statistically accurate particle filters to capture non-Gaussian features in large-dimensional chaotic dynamical systems. Blended particle filters that capture non-Gaussian features in an adaptively evolving low-dimensional subspace through particles interacting with evolving Gaussian statistics on the remaining portion of phase space are introduced here. These blended particle filters are constructed in this paper through a mathematical formalism involving conditional Gaussian mixtures combined with statistically nonlinear forecast models compatible with this structure developed recently with high skill for uncertainty quantification. Stringent test cases for filtering involving the 40-dimensional Lorenz 96 model with a 5-dimensional adaptive subspace for nonlinear blended filtering in various turbulent regimes with at least nine positive Lyapunov exponents are used here. These cases demonstrate the high skill of the blended particle filter algorithms in capturing both highly non-Gaussian dynamical features as well as crucial nonlinear statistics for accurate filtering in extreme filtering regimes with sparse infrequent high-quality observations. The formalism developed here is also useful for multiscale filtering of turbulent systems and a simple application is sketched below. PMID:24825886
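
    The 40-dimensional Lorenz 96 model used as the test bed above is a ring of variables coupled through quadratic advection, linear damping, and constant forcing: dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F. The sketch below integrates that system with a classical fourth-order Runge-Kutta step. It covers only the forward model, not the blended particle filter, and the forcing F = 8, the step size, and the initial perturbation are assumed values for illustration.

```python
def lorenz96_rhs(x, forcing=8.0):
    """Right-hand side of the Lorenz 96 equations with cyclic indexing."""
    n = len(x)
    # Negative indices wrap around in Python, which supplies the cyclic boundary.
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing
            for i in range(n)]


def rk4_step(x, dt, forcing=8.0):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(base, k, scale):
        return [b + scale * ki for b, ki in zip(base, k)]

    k1 = lorenz96_rhs(x, forcing)
    k2 = lorenz96_rhs(shifted(x, k1, dt / 2), forcing)
    k3 = lorenz96_rhs(shifted(x, k2, dt / 2), forcing)
    k4 = lorenz96_rhs(shifted(x, k3, dt), forcing)
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


# Start from the uniform steady state x_i = F and perturb one component.
state = [8.0] * 40
state[19] += 0.01
for _ in range(100):
    state = rk4_step(state, dt=0.01)
print(state[:5])
```

    The uniform state x_i = F is an exact fixed point of the equations, so the small perturbation is what lets the trajectory leave it.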

  12. Evolution of Precipitation Particle Size Distributions within MC3E Systems and its Impact on Aerosol-Cloud-Precipitation Interactions: Final Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kollias, Pavlos

    2017-08-08

    This is a multi-institutional, collaborative project using observations and modeling to study the evolution (e.g. formation and growth) of hydrometeors in continental convective clouds. Our contribution was in data analysis for the generation of high-value cloud and precipitation products and derive cloud statistics for model validation. There are two areas in data analysis that we contributed: i) the development of novel, state-of-the-art dual-wavelength radar algorithms for the retrieval of cloud microphysical properties and ii) the evaluation of large domain, high-resolution models using comprehensive multi-sensor observations. Our research group developed statistical summaries from numerous sensors and developed retrievals of vertical air motion in deep convection.

  13. National Centers for Environmental Prediction

    Science.gov Websites

    The Mesoscale Modeling Branch conducts a program of research and development in support of the prediction. This research and development includes mesoscale four-dimensional data assimilation of domestic

  14. Forecast and virtual weather driven plant disease risk modeling system

    USDA-ARS's Scientific Manuscript database

    We describe a system in use and development that leverages public weather station data, several spatialized weather forecast types, leaf wetness estimation, generic plant disease models, and online statistical evaluation. Convergent technological developments in all these areas allow, with funding f...

  15. Statistical models for predicting pair dispersion and particle clustering in isotropic turbulence and their applications

    NASA Astrophysics Data System (ADS)

    Zaichik, Leonid I.; Alipchenkov, Vladimir M.

    2009-10-01

    The purpose of this paper is twofold: (i) to advance and extend the statistical two-point models of pair dispersion and particle clustering in isotropic turbulence that were previously proposed by Zaichik and Alipchenkov (2003 Phys. Fluids 15, 1776-87; 2007 Phys. Fluids 19, 113308) and (ii) to present some applications of these models. The models developed are based on a kinetic equation for the two-point probability density function of the relative velocity distribution of two particles. These models predict the pair relative velocity statistics and the preferential accumulation of heavy particles in stationary and decaying homogeneous isotropic turbulent flows. Moreover, the models are applied to predict the effect of particle clustering on turbulent collisions, sedimentation and intensity of microwave radiation as well as to calculate the mean filtered subgrid stress of the particulate phase. Model predictions are compared with direct numerical simulations and experimental measurements.

  16. THE ATMOSPHERIC MODEL EVALUATION TOOL

    EPA Science Inventory

    This poster describes a model evaluation tool that is currently being developed and applied for meteorological and air quality model evaluation. The poster outlines the framework and provides examples of statistical evaluations that can be performed with the model evaluation tool...

  17. Ten Years of Cloud Properties from MODIS: Global Statistics and Use in Climate Model Evaluation

    NASA Technical Reports Server (NTRS)

    Platnick, Steven E.

    2011-01-01

    The NASA Moderate Resolution Imaging Spectroradiometer (MODIS), launched onboard the Terra and Aqua spacecraft, began Earth observations on February 24, 2000 and June 24, 2002, respectively. Among the algorithms developed and applied to this sensor, a suite of cloud products includes cloud masking/detection, cloud-top properties (temperature, pressure), and optical properties (optical thickness, effective particle radius, water path, and thermodynamic phase). All cloud algorithms underwent numerous changes and enhancements for the latest Collection 5 production version; this process continues with the current Collection 6 development. We will show example MODIS Collection 5 cloud climatologies derived from global spatial and temporal aggregations provided in the archived gridded Level-3 MODIS atmosphere team product (product names MOD08 and MYD08 for MODIS Terra and Aqua, respectively). Data sets in this Level-3 product include scalar statistics as well as 1- and 2-D histograms of many cloud properties, allowing for higher order information and correlation studies. In addition to these statistics, we will show trends and statistical significance in annual and seasonal means for a variety of the MODIS cloud properties, as well as the time required for detection given assumed trends. To assist in climate model evaluation, we have developed a MODIS cloud simulator with an accompanying netCDF file containing subsetted monthly Level-3 statistical data sets that correspond to the simulator output. Correlations of cloud properties with ENSO offer the potential to evaluate model cloud sensitivity; initial results will be discussed.

  18. Model Uncertainty Quantification Methods In Data Assimilation

    NASA Astrophysics Data System (ADS)

    Pathiraja, S. D.; Marshall, L. A.; Sharma, A.; Moradkhani, H.

    2017-12-01

    Data Assimilation involves utilising observations to improve model predictions in a seamless and statistically optimal fashion. Its applications are wide-ranging, from improving weather forecasts to tracking targets such as in the Apollo 11 mission. The use of Data Assimilation methods in high dimensional complex geophysical systems is an active area of research, where there exist many opportunities to enhance existing methodologies. One of the central challenges is model uncertainty quantification; the outcome of any Data Assimilation study is strongly dependent on the uncertainties assigned to both observations and models. I focus on developing improved model uncertainty quantification methods that are applicable to challenging real world scenarios. These include developing methods for cases where the system states are only partially observed, where there is little prior knowledge of the model errors, and where the model error statistics are likely to be highly non-Gaussian.

  19. Modeling Cell Size Regulation: From Single-Cell-Level Statistics to Molecular Mechanisms and Population-Level Effects.

    PubMed

    Ho, Po-Yi; Lin, Jie; Amir, Ariel

    2018-05-20

    Most microorganisms regulate their cell size. In this article, we review some of the mathematical formulations of the problem of cell size regulation. We focus on coarse-grained stochastic models and the statistics that they generate. We review the biologically relevant insights obtained from these models. We then describe cell cycle regulation and its molecular implementations, protein number regulation, and population growth, all in relation to size regulation. Finally, we discuss several future directions for developing understanding beyond phenomenological models of cell size regulation.

  20. Parametric Cost Models for Space Telescopes

    NASA Technical Reports Server (NTRS)

    Stahl, H. Philip

    2010-01-01

    A study is in progress to develop a multivariable parametric cost model for space telescopes. Cost and engineering parametric data have been collected on 30 different space telescopes. Statistical correlations have been developed between 19 of the 59 variables sampled. Single Variable and Multi-Variable Cost Estimating Relationships have been developed. Results are being published.

  1. Evolution in Cloud Population Statistics of the MJO: From AMIE Field Observations to Global-Cloud Permitting Models Final Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kollias, Pavlos

    This is a multi-institutional, collaborative project using a three-tier modeling approach to bridge field observations and global cloud-permitting models, with emphases on cloud population structural evolution through various large-scale environments. Our contribution was in data analysis for the generation of high value cloud and precipitation products and the derivation of cloud statistics for model validation. We contributed to two areas of data analysis: the development of a synergistic cloud and precipitation classification that identifies different cloud types (e.g. shallow cumulus, cirrus) and precipitation types (shallow, deep, convective, stratiform) using profiling ARM observations, and the development of a quantitative precipitation rate retrieval algorithm using profiling ARM observations. Similar efforts have been developed in the past for precipitation (weather radars), but not for the millimeter-wavelength (cloud) radar deployed at the ARM sites.

  2. Rapid analysis of pharmaceutical drugs using LIBS coupled with multivariate analysis.

    PubMed

    Tiwari, P K; Awasthi, S; Kumar, R; Anand, R K; Rai, P K; Rai, A K

    2018-02-01

    Type 2 diabetes drug tablets of various brands containing voglibose at dose strengths of 0.2 and 0.3 mg have been examined using the laser-induced breakdown spectroscopy (LIBS) technique. Statistical methods such as principal component analysis (PCA) and partial least squares regression (PLSR) have been applied to the LIBS spectral data to classify the drug samples and develop calibration models. We have developed a ratio-based calibration model applying PLSR in which the relative spectral intensity ratios H/C, H/N and O/N are used. Further, the developed model has been employed to predict the relative concentrations of elements in unknown drug samples. The experiment has been performed in air and argon atmospheres, respectively, and the obtained results have been compared. The present model provides a rapid spectroscopic method for drug analysis with high statistical significance for online control and measurement processes in a wide variety of pharmaceutical industrial applications.

  3. The Effect of Project Based Learning on the Statistical Literacy Levels of Student 8th Grade

    ERIC Educational Resources Information Center

    Koparan, Timur; Güven, Bülent

    2014-01-01

    This study examines the effect of project based learning on 8th grade students' statistical literacy levels. A performance test was developed for this aim. Quasi-experimental research model was used in this article. In this context, the statistics were taught with traditional method in the control group and it was taught using project based…

  4. Probability of identification: a statistical model for the validation of qualitative botanical identification methods.

    PubMed

    LaBudde, Robert A; Harnly, James M

    2012-01-01

    A qualitative botanical identification method (BIM) is an analytical procedure that returns a binary result (1 = Identified, 0 = Not Identified). A BIM may be used by a buyer, manufacturer, or regulator to determine whether a botanical material being tested is the same as the target (desired) material, or whether it contains excessive nontarget (undesirable) material. The report describes the development and validation of studies for a BIM based on the proportion of replicates identified, or probability of identification (POI), as the basic observed statistic. The statistical procedures proposed for data analysis follow closely those of the probability of detection, and harmonize the statistical concepts and parameters between quantitative and qualitative method validation. Use of POI statistics also harmonizes statistical concepts for botanical, microbiological, toxin, and other analyte identification methods that produce binary results. The POI statistical model provides a tool for graphical representation of response curves for qualitative methods, reporting of descriptive statistics, and application of performance requirements. Single collaborator and multicollaborative study examples are given.
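    The POI statistic described above is the proportion of replicates identified. The sketch below computes that proportion together with a confidence interval. The Wilson score interval is an assumption made here for illustration, since the abstract does not say which interval the authors use, and the function name is hypothetical.

```python
import math


def poi_with_wilson_interval(identified, replicates, z=1.96):
    """Return (POI, lower, upper) for a binary identification method.

    POI is the observed proportion of replicates identified. The Wilson
    score interval is an illustrative choice, not taken from the report.
    """
    if replicates <= 0 or not 0 <= identified <= replicates:
        raise ValueError("need 0 <= identified <= replicates and replicates > 0")
    poi = identified / replicates
    denom = 1.0 + z ** 2 / replicates
    centre = (poi + z ** 2 / (2 * replicates)) / denom
    half_width = z * math.sqrt(
        poi * (1.0 - poi) / replicates + z ** 2 / (4 * replicates ** 2)
    ) / denom
    # Clip to [0, 1] to guard against floating-point overshoot.
    return poi, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical example: 11 of 12 replicates returned "Identified".
poi, lower, upper = poi_with_wilson_interval(11, 12)
print(f"POI = {poi:.3f}, approx. 95% interval = ({lower:.3f}, {upper:.3f})")
```

    Plotting POI with such an interval against the concentration of target material would give one form of the response curve mentioned in the abstract.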

  5. Statistics Graduate Students' Professional Development for Teaching: A Communities of Practice Model

    NASA Astrophysics Data System (ADS)

    Justice, Nicola

    Graduate teaching assistants (GTAs) are responsible for instructing approximately 25% of introductory statistics courses in the United States (Blair, Kirkman, & Maxwell, 2013). Most research on GTA professional development focuses on structured activities (e.g., courses, workshops) that have been developed to improve GTAs' pedagogy and content knowledge. Few studies take into account the social contexts of GTAs' professional development. However, GTAs perceive their social interactions with other GTAs to be a vital part of their preparation and support for teaching (e.g., Staton & Darling, 1989). Communities of practice (CoPs) are one way to bring together the study of the social contexts and structured activities of GTA professional development. CoPs are defined as groups of practitioners who deepen their knowledge and expertise by interacting with each other on an ongoing basis (e.g., Lave & Wenger, 1991). Graduate students may participate in CoPs related to teaching in many ways, including attending courses or workshops, participating in weekly meetings, engaging in informal discussions about teaching, or participating in e-mail conversations related to teaching tasks. This study explored the relationship between statistics graduate students' experiences in CoPs and the extent to which they hold student-centered teaching beliefs. A framework for characterizing GTAs' experiences in CoPs was described and a theoretical model relating these characteristics to GTAs' beliefs was developed. To gather data to test the model, the Graduate Students' Experiences Teaching Statistics (GETS) Inventory was created. Items were written to collect information about GTAs' current teaching beliefs, teaching beliefs before entering their degree programs, characteristics of GTAs' experiences in CoPs, and demographic information. Using an online program, the GETS Inventory was administered to N = 218 statistics graduate students representing 37 institutions in 24 different U.S. states. The data gathered from the national survey suggest that statistics graduate students often experience CoPs through required meetings and voluntary discussions about teaching. Participants feel comfortable disagreeing with the people they perceive to be most influential on their teaching beliefs. Most participants perceive a faculty member to have the most influential role in shaping their teaching beliefs. The survey data did not provide evidence to support the proposed theoretical model relating characteristics of experiences in CoPs and beliefs about teaching statistics. Based on cross-validation results, prior beliefs about teaching statistics was the best predictor of current beliefs. Additional models were retained that included student characteristics suggested by previous literature to be associated with student-centered or traditional teaching beliefs (e.g., prior teaching experience, international student status). The results of this study can be used to inform future efforts to help promote student-centered teaching beliefs and teaching practices among statistics GTAs. Modifications to the GETS Inventory are suggested for use in future research designed to gather information about GTAs, their teaching beliefs, and their experiences in CoPs. Suggestions are also made for aspects of CoPs that might be studied further in order to learn how CoPs can promote teaching beliefs and practices that support student learning.

  6. Multi-fidelity stochastic collocation method for computation of statistical moments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhu, Xueyu, E-mail: xueyu-zhu@uiowa.edu; Linebarger, Erin M., E-mail: aerinline@sci.utah.edu; Xiu, Dongbin, E-mail: xiu.16@osu.edu

    We present an efficient numerical algorithm to approximate the statistical moments of stochastic problems, in the presence of models with different fidelities. The method extends the multi-fidelity approximation method developed in . By combining the efficiency of low-fidelity models and the accuracy of high-fidelity models, our method exhibits fast convergence with a limited number of high-fidelity simulations. We establish an error bound of the method and present several numerical examples to demonstrate the efficiency and applicability of the multi-fidelity algorithm.

  7. Sex-Specific Prediction Models for Sleep Apnea From the Hispanic Community Health Study/Study of Latinos.

    PubMed

    Shah, Neomi; Hanna, David B; Teng, Yanping; Sotres-Alvarez, Daniela; Hall, Martica; Loredo, Jose S; Zee, Phyllis; Kim, Mimi; Yaggi, H Klar; Redline, Susan; Kaplan, Robert C

    2016-06-01

    We developed and validated the first-ever sleep apnea (SA) risk calculator in a large population-based cohort of Hispanic/Latino subjects. Cross-sectional data on adults from the Hispanic Community Health Study/Study of Latinos (2008-2011) were analyzed. Subjective and objective sleep measurements were obtained. Clinically significant SA was defined as an apnea-hypopnea index ≥ 15 events per hour. Using logistic regression, four prediction models were created: three sex-specific models (female-only, male-only, and a sex × covariate interaction model to allow differential predictor effects), and one overall model with sex included as a main effect only. Models underwent 10-fold cross-validation and were assessed by using the C statistic. SA and its predictive variables; a total of 17 variables were considered. A total of 12,158 participants had complete sleep data available; 7,363 (61%) were women. The population-weighted prevalence of SA (apnea-hypopnea index ≥ 15 events per hour) was 6.1% in female subjects and 13.5% in male subjects. Male-only (C statistic, 0.808) and female-only (C statistic, 0.836) prediction models had the same predictor variables (ie, age, BMI, self-reported snoring). The sex-interaction model (C statistic, 0.836) contained sex, age, age × sex, BMI, BMI × sex, and self-reported snoring. The final overall model (C statistic, 0.832) contained age, BMI, snoring, and sex. We developed two websites for our SA risk calculator: one in English (https://www.montefiore.org/sleepapneariskcalc.html) and another in Spanish (http://www.montefiore.org/sleepapneariskcalc-es.html). We created an internally validated, highly discriminating, well-calibrated, and parsimonious prediction model for SA. Contrary to the study hypothesis, the variables did not have different predictive magnitudes in male and female subjects. Copyright © 2016 American College of Chest Physicians. Published by Elsevier Inc. All rights reserved.
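    The C statistic used above to assess the prediction models can be read as the proportion of (case, non-case) pairs in which the case receives the higher predicted risk. The sketch below is a minimal pairwise implementation under that standard definition. The labels and scores are made-up illustrative values, not data from the study, and the function name is hypothetical.

```python
def c_statistic(labels, scores):
    """Concordance (C) statistic for a binary outcome.

    labels: 1 for a case, 0 for a non-case.
    scores: predicted risks in the same order as labels.
    Ties between a case and a non-case count as half-concordant.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for other_score in non_cases:
            if case_score > other_score:
                concordant += 1.0
            elif case_score == other_score:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Made-up predicted risks for five hypothetical participants.
example_labels = [1, 0, 1, 0, 0]
example_scores = [0.9, 0.2, 0.4, 0.5, 0.1]
print(round(c_statistic(example_labels, example_scores), 3))
```

    In a 10-fold cross-validation such as the one the abstract describes, a function like this would be applied to the held-out predictions rather than to the data used for fitting.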

  8. Western classical music development: a statistical analysis of composers similarity, differentiation and evolution.

    PubMed

    Georges, Patrick

    2017-01-01

    This paper proposes a statistical analysis that captures similarities and differences between classical music composers with the eventual aim to understand why particular composers 'sound' different even if their 'lineages' (influences network) are similar or why they 'sound' alike if their 'lineages' are different. In order to do this we use statistical methods and measures of association or similarity (based on presence/absence of traits such as specific 'ecological' characteristics and personal musical influences) that have been developed in biosystematics, scientometrics, and bibliographic coupling. This paper also represents a first step towards a more ambitious goal of developing an evolutionary model of Western classical music.

  9. Multi-level emulation of complex climate model responses to boundary forcing data

    NASA Astrophysics Data System (ADS)

    Tran, Giang T.; Oliver, Kevin I. C.; Holden, Philip B.; Edwards, Neil R.; Sóbester, András; Challenor, Peter

    2018-04-01

    Climate model components involve both high-dimensional input and output fields. It is desirable to efficiently generate spatio-temporal outputs of these models for applications in integrated assessment modelling or to assess the statistical relationship between such sets of inputs and outputs, for example, uncertainty analysis. However, the need for efficiency often compromises the fidelity of output through the use of low complexity models. Here, we develop a technique which combines statistical emulation with a dimensionality reduction technique to emulate a wide range of outputs from an atmospheric general circulation model, PLASIM, as functions of the boundary forcing prescribed by the ocean component of a lower complexity climate model, GENIE-1. Although accurate and detailed spatial information on atmospheric variables such as precipitation and wind speed is well beyond the capability of GENIE-1's energy-moisture balance model of the atmosphere, this study demonstrates that the output of this model is useful in predicting PLASIM's spatio-temporal fields through multi-level emulation. Meaningful information from the fast model, GENIE-1 was extracted by utilising the correlation between variables of the same type in the two models and between variables of different types in PLASIM. We present here the construction and validation of several PLASIM variable emulators and discuss their potential use in developing a hybrid model with statistical components.

  10. Systematic review of prediction models for delirium in the older adult inpatient.

    PubMed

    Lindroth, Heidi; Bratzke, Lisa; Purvis, Suzanne; Brown, Roger; Coburn, Mark; Mrkobrada, Marko; Chan, Matthew T V; Davis, Daniel H J; Pandharipande, Pratik; Carlsson, Cynthia M; Sanders, Robert D

    2018-04-28

    The objective was to identify existing prognostic delirium prediction models and evaluate their validity and statistical methodology in the older adult (≥60 years) acute hospital population. This systematic review searched PubMed, CINAHL, PsychINFO, SocINFO, Cochrane, Web of Science and Embase from 1 January 1990 to 31 December 2016. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses and CHARMS Statement guided protocol development. Inclusion criteria were age >60 years, inpatient status, and development or validation of a prognostic delirium prediction model. Exclusion criteria were alcohol-related delirium and sample size ≤50. The primary performance measures were calibration and discrimination statistics. Two authors independently conducted the search and extracted data. The synthesis of data was done by the first author. Disagreement was resolved by the mentoring author. The initial search resulted in 7,502 studies. Following full-text review of 192 studies, 33 were excluded based on age criteria (<60 years) and 27 met the defined criteria. Twenty-three delirium prediction models were identified, 14 were externally validated and 3 were internally validated. The following populations were represented: 11 medical, 3 medical/surgical and 13 surgical. The assessment of delirium was often non-systematic, resulting in varied incidence. Fourteen models were externally validated with an area under the receiver operating curve range from 0.52 to 0.94. Limitations in design, data collection methods and model metric reporting statistics were identified. Delirium prediction models for older adults show variable and typically inadequate predictive capabilities. Our review highlights the need for development of robust models to predict delirium in older inpatients. We provide recommendations for the development of such models. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  11. A Weibull statistics-based lignocellulose saccharification model and a built-in parameter accurately predict lignocellulose hydrolysis performance.

    PubMed

    Wang, Mingyu; Han, Lijuan; Liu, Shasha; Zhao, Xuebing; Yang, Jinghua; Loh, Soh Kheang; Sun, Xiaomin; Zhang, Chenxi; Fang, Xu

    2015-09-01

    Renewable energy from lignocellulosic biomass has been deemed an alternative to depleting fossil fuels. In order to improve this technology, we aim to develop robust mathematical models for the enzymatic lignocellulose degradation process. By analyzing 96 groups of previously published and newly obtained lignocellulose saccharification results and fitting them to the Weibull distribution, we discovered that Weibull statistics can accurately predict lignocellulose saccharification data, regardless of the type of substrates, enzymes and saccharification conditions. A mathematical model for enzymatic lignocellulose degradation was subsequently constructed based on Weibull statistics. Further analysis of the mathematical structure of the model and experimental saccharification data showed the significance of the two parameters in this model. In particular, the λ value, defined as the characteristic time, represents the overall performance of the saccharification system. This suggestion was further supported by statistical analysis of experimental saccharification data and analysis of the glucose production levels when the λ and n values change. In conclusion, the constructed Weibull statistics-based model can accurately predict lignocellulose hydrolysis behavior, and the λ parameter can be used to assess the overall performance of enzymatic lignocellulose degradation. Advantages and potential applications of the model and the λ value in saccharification performance assessment were discussed. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
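    To illustrate why λ can act as a characteristic time, the sketch below assumes the saccharification yield follows the cumulative Weibull form y(t) = y_max * (1 - exp(-(t/λ)^n)). This functional form, the y_max scaling and the numeric values are assumptions, since the abstract names only the parameters λ and n and does not give the equation.

```python
import math


def weibull_yield(t, lam, n, y_max=1.0):
    """Assumed cumulative Weibull yield curve.

    t:     elapsed saccharification time (same unit as lam)
    lam:   characteristic time (the abstract's lambda)
    n:     shape parameter (the abstract's n)
    y_max: assumed plateau yield; not taken from the abstract
    """
    if t < 0 or lam <= 0 or n <= 0:
        raise ValueError("need t >= 0, lam > 0 and n > 0")
    return y_max * (1.0 - math.exp(-((t / lam) ** n)))


# Under this form the yield at t = lam is 1 - 1/e of the plateau for any n,
# so a smaller lam means the system reaches that level sooner.
for shape in (0.5, 1.0, 2.0):
    print(shape, round(weibull_yield(24.0, lam=24.0, n=shape), 4))
```

    Under this assumed form, two systems could be compared by λ alone without fixing a common n, which is consistent with the abstract's use of λ as an overall performance measure.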

  12. Risk Prediction Models for Other Cancers or Multiple Sites

    Cancer.gov

    Developing statistical models that estimate the probability of developing other multiple cancers over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  13. Non-linear mixed effects modeling - from methodology and software development to driving implementation in drug development science.

    PubMed

    Pillai, Goonaseelan Colin; Mentré, France; Steimer, Jean-Louis

    2005-04-01

    Few scientific contributions have made significant impact unless there was a champion who had the vision to see the potential for its use in seemingly disparate areas-and who then drove active implementation. In this paper, we present a historical summary of the development of non-linear mixed effects (NLME) modeling up to the more recent extensions of this statistical methodology. The paper places strong emphasis on the pivotal role played by Lewis B. Sheiner (1940-2004), who used this statistical methodology to elucidate solutions to real problems identified in clinical practice and in medical research and on how he drove implementation of the proposed solutions. A succinct overview of the evolution of the NLME modeling methodology is presented as well as ideas on how its expansion helped to provide guidance for a more scientific view of (model-based) drug development that reduces empiricism in favor of critical quantitative thinking and decision making.

  14. Leads Detection Using Mixture Statistical Distribution Based CRF Algorithm from Sentinel-1 Dual Polarization SAR Imagery

    NASA Astrophysics Data System (ADS)

    Zhang, Yu; Li, Fei; Zhang, Shengkai; Zhu, Tingting

    2017-04-01

    Synthetic Aperture Radar (SAR) is important for polar remote sensing because it provides continuous observations day and night and in all weather. SAR can be used to extract surface roughness information characterized by the variance of dielectric properties and by different polarization channels, which makes it possible to observe different ice types and surface structure for deformation analysis. In November 2016, the 33rd cruise of the Chinese National Antarctic Research Expedition (CHINARE) set sail in the Antarctic sea ice zone. An accurate spatial distribution of leads in the sea ice zone is essential for routine planning of ship navigation. In this study, the semantic relationship between leads and sea ice categories is described by a Conditional Random Fields (CRF) model, and leads characteristics are modeled by statistical distributions in SAR imagery. In the proposed algorithm, a mixture statistical distribution based CRF is developed by considering the contextual information and the statistical characteristics of sea ice to improve leads detection in Sentinel-1A dual polarization SAR imagery. The unary and pairwise potentials in the CRF model are constructed by integrating the posterior probability estimated from the statistical distributions. For mixture statistical distribution parameter estimation, the Method of Logarithmic Cumulants (MoLC) is used to estimate the parameters of each single statistical distribution. The iterative Expectation-Maximization (EM) algorithm is used to calculate the parameters of the mixture statistical distribution based CRF model. For posterior probability inference, a graph-cut energy minimization method is adopted in the initial leads detection. Post-processing procedures, including an aspect ratio constraint and spatial smoothing, are used to improve the visual result. The proposed method is validated on Sentinel-1A SAR C-band Extra Wide Swath (EW) Ground Range Detected (GRD) imagery with a pixel spacing of 40 meters near the Prydz Bay area, East Antarctica. The main contributions are as follows: 1) A mixture statistical distribution based CRF algorithm has been developed for leads detection from Sentinel-1A dual polarization images. 2) An assessment of the proposed mixture statistical distribution based CRF method and a single distribution based CRF algorithm has been presented. 3) The preferable parameter sets, including statistical distributions, the aspect ratio threshold and the spatial smoothing window size, have been provided. In the future, the proposed algorithm will be developed for operational processing of Sentinel series data sets, owing to its low time cost and high accuracy in leads detection.

  15. Statistics, Computation, and Modeling in Cosmology

    NASA Astrophysics Data System (ADS)

    Jewell, Jeff; Guiness, Joe; SAMSI 2016 Working Group in Cosmology

    2017-01-01

    Current and future ground and space based missions are designed to not only detect, but map out with increasing precision, details of the universe from its infancy to the present day. As a result we are faced with the challenge of analyzing and interpreting observations from a wide variety of instruments to form a coherent view of the universe. Finding solutions to a broad range of challenging inference problems in cosmology is one of the goals of the “Statistics, Computation, and Modeling in Cosmology” working groups, formed as part of the year-long program on ‘Statistical, Mathematical, and Computational Methods for Astronomy’, hosted by the Statistical and Applied Mathematical Sciences Institute (SAMSI), a National Science Foundation funded institute. Two application areas have emerged for focused development in the cosmology working group involving advanced algorithmic implementations of exact Bayesian inference for the Cosmic Microwave Background, and statistical modeling of galaxy formation. The former includes study and development of advanced Markov Chain Monte Carlo algorithms designed to confront challenging inference problems including inference for spatial Gaussian random fields in the presence of sources of galactic emission (an example of a source separation problem). Extending these methods to future redshift survey data probing the nonlinear regime of large scale structure formation is also included in the working group activities. In addition, the working group is also focused on the study of ‘Galacticus’, a galaxy formation model applied to dark matter-only cosmological N-body simulations operating on time-dependent halo merger trees. The working group is interested in calibrating the Galacticus model to match statistics of galaxy survey observations; specifically stellar mass functions, luminosity functions, and color-color diagrams. 
The group will use subsampling approaches and fractional factorial designs to statistically and computationally efficiently explore the Galacticus parameter space. The group will also use the Galacticus simulations to study the relationship between the topological and physical structure of the halo merger trees and the properties of the resulting galaxies.

  16. The Importance of Practice in the Development of Statistics.

    DTIC Science & Technology

    1983-01-01

    NRC Technical Summary Report #2471, The Importance of Practice in the Development of Statistics...component analysis, bioassay, limits for a ratio, quality control, sampling inspection, non-parametric tests, transformation theory, ARIMA time series...models, sequential tests, cumulative sum charts, data analysis plotting techniques, and a resolution of the Bayes-frequentist controversy. It appears

  17. Wave and Wind Model Performance Metrics Tools

    NASA Astrophysics Data System (ADS)

    Choi, J. K.; Wang, D. W.

    2016-02-01

    Continual improvements and upgrades of Navy ocean wave and wind models are essential to the assurance of battlespace environment predictability of ocean surface wave and surf conditions in support of Naval global operations. Thus, constant verification and validation of model performance is equally essential to assure the progress of model developments and maintain confidence in the predictions. Global and regional scale model evaluations may require large areas and long periods of time. For observational data to compare against, altimeter winds and waves along the tracks from past and current operational satellites as well as moored/drifting buoys can be used for global and regional coverage. Using data and model runs in previous trials such as the planned experiment, the Dynamics of the Adriatic in Real Time (DART), we demonstrated the use of accumulated altimeter wind and wave data over several years to obtain an objective evaluation of the performance of the SWAN (Simulating Waves Nearshore) model running in the Adriatic Sea. The assessment provided detailed performance of wind and wave models by using maps of cell-averaged statistical variables, with spatial statistics including slope, correlation, and scatter index to summarize model performance. Such a methodology is easily generalized to other regions and to global scales. Operational technology currently used by subject matter experts evaluating the Navy Coastal Ocean Model and the Hybrid Coordinate Ocean Model can be expanded to evaluate wave and wind models using tools developed for ArcMAP, a GIS application developed by ESRI. Recent inclusion of altimeter and buoy data into a format through the Naval Oceanographic Office's (NAVOCEANO) quality control system and the netCDF standards applicable to all model output make possible the fusion of these data and direct model verification. 
Also, procedures were developed for the accumulation of match-ups of modelled and observed parameters to form a data base with which statistics are readily calculated, for the short or long term. Such a system has potential for a quick transition to operations at NAVOCEANO.
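    The summary statistics named above (slope, correlation, and scatter index) can be computed from a set of model/observation match-ups as sketched below. The definitions are assumptions, because the abstract does not state them: slope is taken as the least-squares slope of model on observation, correlation as the Pearson coefficient, and scatter index as RMSE divided by the observed mean. The function name is hypothetical.

```python
import math


def matchup_statistics(model, observed):
    """Slope, correlation and scatter index for paired match-ups.

    Assumed definitions (not given in the abstract):
      slope         = least-squares slope of model regressed on observation
      correlation   = Pearson correlation coefficient
      scatter_index = RMSE / mean(observed)
    """
    n = len(model)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    mean_m = sum(model) / n
    mean_o = sum(observed) / n
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_mm = sum((m - mean_m) ** 2 for m in model)
    s_om = sum((o - mean_o) * (m - mean_m) for m, o in zip(model, observed))
    if s_oo == 0 or s_mm == 0 or mean_o == 0:
        raise ValueError("series must vary and the observed mean must be non-zero")
    rmse = math.sqrt(sum((m - o) ** 2 for m, o in zip(model, observed)) / n)
    return {
        "slope": s_om / s_oo,
        "correlation": s_om / math.sqrt(s_oo * s_mm),
        "scatter_index": rmse / mean_o,
    }


# Made-up wave heights (m): the model overpredicts every observation by 10%.
obs = [1.0, 2.0, 3.0, 4.0]
mod = [1.1 * value for value in obs]
print(matchup_statistics(mod, obs))
```

    Applying such a function to the match-ups falling in each grid cell would give one version of the cell-averaged maps the abstract mentions.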

  18. Predicting the potential distribution of invasive exotic species using GIS and information-theoretic approaches: A case of ragweed (Ambrosia artemisiifolia L.) distribution in China

    USGS Publications Warehouse

    Hao, Chen; LiJun, Chen; Albright, Thomas P.

    2007-01-01

    Invasive exotic species pose a growing threat to the economy, public health, and ecological integrity of nations worldwide. Explaining and predicting the spatial distribution of invasive exotic species is of great importance to prevention and early warning efforts. We are investigating the potential distribution of invasive exotic species, the environmental factors that influence these distributions, and the ability to predict them using statistical and information-theoretic approaches. For some species, detailed presence/absence occurrence data are available, allowing the use of a variety of standard statistical techniques. However, for most species, absence data are not available. Presented with the challenge of developing a model based on presence-only information, we developed an improved logistic regression approach using Information Theory and Frequency Statistics to produce a relative suitability map. This paper generated a variety of distributions of ragweed (Ambrosia artemisiifolia L.) from logistic regression models applied to herbarium specimen location data and a suite of GIS layers including climatic, topographic, and land cover information. Our logistic regression model was based on Akaike's Information Criterion (AIC) from a suite of ecologically reasonable predictor variables. Based on the results we provided a new Frequency Statistical method to compartmentalize habitat-suitability in the native range. Finally, we used the model and the compartmentalized criterion developed in native ranges to "project" a potential distribution onto the exotic ranges to build habitat-suitability maps. © Science in China Press 2007.

  19. Regional analyses of labor markets and demography: a model based Norwegian example.

    PubMed

    Stambol, L S; Stolen, N M; Avitsland, T

    1998-01-01

    The authors discuss the regional REGARD model, developed by Statistics Norway to analyze the regional implications of macroeconomic development of employment, labor force, and unemployment. "In building the model, empirical analyses of regional producer behavior in manufacturing industries have been performed, and the relation between labor market development and regional migration has been investigated. Apart from providing a short description of the REGARD model, this article demonstrates the functioning of the model, and presents some results of an application." excerpt

  20. Development of failure model for nickel cadmium cells

    NASA Technical Reports Server (NTRS)

    Gupta, A.

    1980-01-01

    The development of a method for the life prediction of nickel cadmium cells is discussed. The approach described involves acquiring an understanding of the mechanisms of degradation and failure and at the same time developing nondestructive evaluation techniques for the nickel cadmium cells. The development of a statistical failure model which will describe the mechanisms of degradation and failure is outlined.

  1. Statistical framework for evaluation of climate model simulations by use of climate proxy data from the last millennium - Part 1: Theory

    NASA Astrophysics Data System (ADS)

    Sundberg, R.; Moberg, A.; Hind, A.

    2012-08-01

    A statistical framework for comparing the output of ensemble simulations from global climate models with networks of climate proxy and instrumental records has been developed, focusing on near-surface temperatures for the last millennium. This framework includes the formulation of a joint statistical model for proxy data, instrumental data and simulation data, which is used to optimize a quadratic distance measure for ranking climate model simulations. An essential underlying assumption is that the simulations and the proxy/instrumental series have a shared component of variability that is due to temporal changes in external forcing, such as volcanic aerosol load, solar irradiance or greenhouse gas concentrations. Two statistical tests have been formulated. Firstly, a preliminary test establishes whether a significant temporal correlation exists between instrumental/proxy and simulation data. Secondly, the distance measure is expressed in the form of a test statistic of whether a forced simulation is closer to the instrumental/proxy series than unforced simulations. The proposed framework allows any number of proxy locations to be used jointly, with different seasons, record lengths and statistical precision. The goal is to objectively rank several competing climate model simulations (e.g. with alternative model parameterizations or alternative forcing histories) by means of their goodness of fit to the unobservable true past climate variations, as estimated from noisy proxy data and instrumental observations.

  2. Predicting the stochastic guiding of kinesin-driven microtubules in microfabricated tracks: a statistical-mechanics-based modeling approach.

    PubMed

    Lin, Chih-Tin; Meyhofer, Edgar; Kurabayashi, Katsuo

    2010-01-01

    Directional control of microtubule shuttles via microfabricated tracks is key to the development of controlled nanoscale mass transport by kinesin motor molecules. Here we develop and test a model to quantitatively predict the stochastic behavior of microtubule guiding when they mechanically collide with the sidewalls of lithographically patterned tracks. By taking into account appropriate probability distributions of microscopic states of the microtubule system, the model allows us to theoretically analyze the roles of collision conditions and kinesin surface densities in determining how the motion of microtubule shuttles is controlled. In addition, we experimentally observe the statistics of microtubule collision events and compare our theoretical prediction with experimental data to validate our model. The model will direct the design of future hybrid nanotechnology devices that integrate nanoscale transport systems powered by kinesin-driven molecular shuttles.

  3. Modeling of adsorption isotherms of water vapor on Tunisian olive leaves using statistical mechanical formulation

    NASA Astrophysics Data System (ADS)

    Knani, S.; Aouaini, F.; Bahloul, N.; Khalfaoui, M.; Hachicha, M. A.; Ben Lamine, A.; Kechaou, N.

    2014-04-01

    Analytical expression for modeling water adsorption isotherms of food or agricultural products is developed using the statistical mechanics formalism. The model developed in this paper is further used to fit and interpret the isotherms of four varieties of Tunisian olive leaves called “Chemlali, Chemchali, Chetoui and Zarrazi”. The parameters involved in the model such as the number of adsorbed water molecules per site, n, the receptor sites density, NM, and the energetic parameters, a1 and a2, were determined by fitting the experimental adsorption isotherms at temperatures ranging from 303 to 323 K. We interpret the results of fitting. After that, the model is further applied to calculate thermodynamic functions which govern the adsorption mechanism such as entropy, the free enthalpy of Gibbs and the internal energy.

  4. Exposure time independent summary statistics for assessment of drug dependent cell line growth inhibition.

    PubMed

    Falgreen, Steffen; Laursen, Maria Bach; Bødker, Julie Støve; Kjeldsen, Malene Krag; Schmitz, Alexander; Nyegaard, Mette; Johnsen, Hans Erik; Dybkær, Karen; Bøgsted, Martin

    2014-06-05

    In vitro generated dose-response curves of human cancer cell lines are widely used to develop new therapeutics. The curves are summarised by simplified statistics that ignore the conventionally used dose-response curves' dependency on drug exposure time and growth kinetics. This may lead to suboptimal exploitation of data and biased conclusions on the potential of the drug in question. Therefore we set out to improve the dose-response assessments by eliminating the impact of time dependency. First, a mathematical model for drug induced cell growth inhibition was formulated and used to derive novel dose-response curves and improved summary statistics that are independent of time under the proposed model. Next, a statistical analysis workflow for estimating the improved statistics was suggested consisting of 1) nonlinear regression models for estimation of cell counts and doubling times, 2) isotonic regression for modelling the suggested dose-response curves, and 3) resampling based method for assessing variation of the novel summary statistics. We document that conventionally used summary statistics for dose-response experiments depend on time so that fast growing cell lines compared to slowly growing ones are considered overly sensitive. The adequacy of the mathematical model is tested for doxorubicin and found to fit real data to an acceptable degree. Dose-response data from the NCI60 drug screen were used to illustrate the time dependency and demonstrate an adjustment correcting for it. The applicability of the workflow was illustrated by simulation and application on a doxorubicin growth inhibition screen. The simulations show that under the proposed mathematical model the suggested statistical workflow results in unbiased estimates of the time independent summary statistics. 
Variance estimates of the novel summary statistics are used to conclude that the doxorubicin screen covers a significant diverse range of responses ensuring it is useful for biological interpretations. Time independent summary statistics may aid the understanding of drugs' action mechanism on tumour cells and potentially renew previous drug sensitivity evaluation studies.

  5. Exposure time independent summary statistics for assessment of drug dependent cell line growth inhibition

    PubMed Central

    2014-01-01

    Background In vitro generated dose-response curves of human cancer cell lines are widely used to develop new therapeutics. The curves are summarised by simplified statistics that ignore the conventionally used dose-response curves’ dependency on drug exposure time and growth kinetics. This may lead to suboptimal exploitation of data and biased conclusions on the potential of the drug in question. Therefore we set out to improve the dose-response assessments by eliminating the impact of time dependency. Results First, a mathematical model for drug induced cell growth inhibition was formulated and used to derive novel dose-response curves and improved summary statistics that are independent of time under the proposed model. Next, a statistical analysis workflow for estimating the improved statistics was suggested consisting of 1) nonlinear regression models for estimation of cell counts and doubling times, 2) isotonic regression for modelling the suggested dose-response curves, and 3) resampling based method for assessing variation of the novel summary statistics. We document that conventionally used summary statistics for dose-response experiments depend on time so that fast growing cell lines compared to slowly growing ones are considered overly sensitive. The adequacy of the mathematical model is tested for doxorubicin and found to fit real data to an acceptable degree. Dose-response data from the NCI60 drug screen were used to illustrate the time dependency and demonstrate an adjustment correcting for it. The applicability of the workflow was illustrated by simulation and application on a doxorubicin growth inhibition screen. The simulations show that under the proposed mathematical model the suggested statistical workflow results in unbiased estimates of the time independent summary statistics. 
Variance estimates of the novel summary statistics are used to conclude that the doxorubicin screen covers a significant diverse range of responses ensuring it is useful for biological interpretations. Conclusion Time independent summary statistics may aid the understanding of drugs’ action mechanism on tumour cells and potentially renew previous drug sensitivity evaluation studies. PMID:24902483

  6. A Management Information System Model for Program Management. Ph.D. Thesis - Oklahoma State Univ.; [Computerized Systems Analysis

    NASA Technical Reports Server (NTRS)

    Shipman, D. L.

    1972-01-01

    The development of a model to simulate the information system of a program management type of organization is reported. The model statistically determines the following parameters: type of messages, destinations, delivery durations, type processing, processing durations, communication channels, outgoing messages, and priorities. The total management information system of the program management organization is considered, including formal and informal information flows and both facilities and equipment. The model is written in General Purpose System Simulation 2 computer programming language for use on the Univac 1108, Executive 8 computer. The model is simulated on a daily basis and collects queue and resource utilization statistics for each decision point. The statistics are then used by management to evaluate proposed resource allocations, to evaluate proposed changes to the system, and to identify potential problem areas. The model employs both empirical and theoretical distributions which are adjusted to simulate the information flow being studied.

  7. Heuristic Identification of Biological Architectures for Simulating Complex Hierarchical Genetic Interactions

    PubMed Central

    Moore, Jason H; Amos, Ryan; Kiralis, Jeff; Andrews, Peter C

    2015-01-01

    Simulation plays an essential role in the development of new computational and statistical methods for the genetic analysis of complex traits. Most simulations start with a statistical model using methods such as linear or logistic regression that specify the relationship between genotype and phenotype. This is appealing due to its simplicity and because these statistical methods are commonly used in genetic analysis. It is our working hypothesis that simulations need to move beyond simple statistical models to more realistically represent the biological complexity of genetic architecture. The goal of the present study was to develop a prototype genotype–phenotype simulation method and software that are capable of simulating complex genetic effects within the context of a hierarchical biology-based framework. Specifically, our goal is to simulate multilocus epistasis or gene–gene interaction where the genetic variants are organized within the framework of one or more genes, their regulatory regions and other regulatory loci. We introduce here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating data in this manner. This approach combines a biological hierarchy, a flexible mathematical framework, a liability threshold model for defining disease endpoints, and a heuristic search strategy for identifying high-order epistatic models of disease susceptibility. We provide several simulation examples using genetic models exhibiting independent main effects and three-way epistatic effects. PMID:25395175

  8. Supporting the Development of Conceptions of Statistics by Engaging Students in Measuring and Modeling Variability

    ERIC Educational Resources Information Center

    Lehrer, Richard; Kim, Min-joung; Schauble, Leona

    2007-01-01

    New capabilities in "TinkerPlots 2.0" supported the conceptual development of fifth- and sixth-grade students as they pursued several weeks of instruction that emphasized data modeling. The instruction highlighted links between data analysis, chance, and modeling in the context of describing and explaining the distributions of measures that result…

  9. Prediction of the presence of insulin resistance using general health checkup data in Japanese employees with metabolic risk factors.

    PubMed

    Takahara, Mitsuyoshi; Katakami, Naoto; Kaneto, Hideaki; Noguchi, Midori; Shimomura, Iichiro

    2014-01-01

    The aim of the current study was to develop a predictive model of insulin resistance using general health checkup data in Japanese employees with one or more metabolic risk factors. We used a database of 846 Japanese employees with one or more metabolic risk factors who underwent general health checkup and a 75-g oral glucose tolerance test (OGTT). Logistic regression models were developed to predict existing insulin resistance evaluated using the Matsuda index. The predictive performance of these models was assessed using the C statistic. The C statistics of body mass index (BMI), waist circumference and their combined use were 0.743, 0.732 and 0.749, with no significant differences. The multivariate backward selection model, in which BMI, the levels of plasma glucose, high-density lipoprotein (HDL) cholesterol, log-transformed triglycerides and log-transformed alanine aminotransferase and hypertension under treatment remained, had a C statistic of 0.816, with a significant difference compared to the combined use of BMI and waist circumference (p<0.01). The C statistic was not significantly reduced when the levels of log-transformed triglycerides and log-transformed alanine aminotransferase and hypertension under treatment were simultaneously excluded from the multivariate model (p=0.14). On the other hand, further exclusion of any of the remaining three variables significantly reduced the C statistic (all p<0.01). When predicting the presence of insulin resistance using general health checkup data in Japanese employees with metabolic risk factors, it is important to take into consideration the BMI and fasting plasma glucose and HDL cholesterol levels.
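
    The C statistic used in this abstract to compare models is the probability that a randomly chosen case with the outcome receives a higher predicted score than a randomly chosen case without it. A minimal sketch follows; the function name and the toy scores are hypothetical and are not taken from the study.

```python
def c_statistic(scores, labels):
    """Concordance (C) statistic for binary labels; ties count as one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    # Compare every positive case against every negative case.
    for p in positives:
        for q in negatives:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Toy, made-up predicted probabilities and outcomes.
example = c_statistic([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
```

    A value of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination, which is the scale on which the reported values of 0.743 to 0.816 should be read.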

  10. Statistical analysis and model validation of automobile emissions

    DOT National Transportation Integrated Search

    2000-09-01

    The article discusses the development of a comprehensive modal emissions model that is currently being integrated with a variety of transportation models as part of National Cooperative Highway Research Program project 25-11. Described is the second-...

  11. A review of statistical updating methods for clinical prediction models.

    PubMed

    Su, Ting-Li; Jaki, Thomas; Hickey, Graeme L; Buchan, Iain; Sperrin, Matthew

    2018-01-01

    A clinical prediction model is a tool for predicting healthcare outcomes, usually within a specific population and context. A common approach is to develop a new clinical prediction model for each population and context; however, this wastes potentially useful historical information. A better approach is to update or incorporate the existing clinical prediction models already developed for use in similar contexts or populations. In addition, clinical prediction models commonly become miscalibrated over time, and need replacing or updating. In this article, we review a range of approaches for re-using and updating clinical prediction models; these fall into three main categories: simple coefficient updating, combining multiple previous clinical prediction models in a meta-model and dynamic updating of models. We evaluated the performance (discrimination and calibration) of the different strategies using data on mortality following cardiac surgery in the United Kingdom: We found that no single strategy performed sufficiently well to be used to the exclusion of the others. In conclusion, useful tools exist for updating existing clinical prediction models to a new population or context, and these should be implemented rather than developing a new clinical prediction model from scratch, using a breadth of complementary statistical methods.

  12. Development of the statistical ARIMA model: an application for predicting the upcoming of MJO index

    NASA Astrophysics Data System (ADS)

    Hermawan, Eddy; Nurani Ruchjana, Budi; Setiawan Abdullah, Atje; Gede Nyoman Mindra Jaya, I.; Berliana Sipayung, Sinta; Rustiana, Shailla

    2017-10-01

    This study is mainly concerned with one of the most important equatorial atmospheric phenomena, the Madden Julian Oscillation (MJO), which has strong impacts on extreme rainfall anomalies over the Indonesian Maritime Continent (IMC). We focus on the big floods over Jakarta and the surrounding area that are suspected to have been caused by the MJO. We concentrate on modelling the MJO index, namely the RMM (Real-time Multivariate MJO) index represented by RMM1 and RMM2, using the Box-Jenkins (ARIMA) statistical model for 1996, 2002, and 2007. Developing the model involves several steps: identifying the data, estimating the parameters, and determining the model, before finally applying it to investigate the big floods that occurred in Jakarta in 1996, 2002, and 2007. We found that the best estimated model for RMM1 and RMM2 prediction is ARIMA(2,1,2). The detailed steps by which the model is derived and applied to predict rainfall anomalies over Jakarta 3 to 6 months ahead are discussed in this paper.

  13. Boosting Bayesian parameter inference of stochastic differential equation models with methods from statistical physics

    NASA Astrophysics Data System (ADS)

    Albert, Carlo; Ulzega, Simone; Stoop, Ruedi

    2016-04-01

    Measured time-series of both precipitation and runoff are known to exhibit highly non-trivial statistical properties. For making reliable probabilistic predictions in hydrology, it is therefore desirable to have stochastic models with output distributions that share these properties. When parameters of such models have to be inferred from data, we also need to quantify the associated parametric uncertainty. For non-trivial stochastic models, however, this latter step is typically very demanding, both conceptually and numerically, and almost never done in hydrology. Here, we demonstrate that methods developed in statistical physics make a large class of stochastic differential equation (SDE) models amenable to a full-fledged Bayesian parameter inference. For concreteness we demonstrate these methods by means of a simple yet non-trivial toy SDE model. We consider a natural catchment that can be described by a linear reservoir, at the scale of observation. All the neglected processes are assumed to happen at much shorter time-scales and are therefore modeled with a Gaussian white noise term, the standard deviation of which is assumed to scale linearly with the system state (water volume in the catchment). Even for constant input, the outputs of this simple non-linear SDE model show a wealth of desirable statistical properties, such as fat-tailed distributions and long-range correlations. Standard algorithms for Bayesian inference fail, for models of this kind, because their likelihood functions are extremely high-dimensional intractable integrals over all possible model realizations. The use of Kalman filters is illegitimate due to the non-linearity of the model. Particle filters could be used but become increasingly inefficient with growing number of data points. 
Hamiltonian Monte Carlo algorithms allow us to translate this inference problem to the problem of simulating the dynamics of a statistical mechanics system and give us access to most sophisticated methods that have been developed in the statistical physics community over the last few decades. We demonstrate that such methods, along with automated differentiation algorithms, allow us to perform a full-fledged Bayesian inference, for a large class of SDE models, in a highly efficient and largely automatized manner. Furthermore, our algorithm is highly parallelizable. For our toy model, discretized with a few hundred points, a full Bayesian inference can be performed in a matter of seconds on a standard PC.
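
    The toy model described in this abstract (a linear reservoir driven by Gaussian white noise whose standard deviation scales with the stored volume) can be simulated forward with a simple Euler-Maruyama scheme. The sketch below is only an illustration of such a forward model under assumptions made here: the equation is written as dV = (r - V/k) dt + sigma V dW in the Ito sense, and all names and parameter values are hypothetical. It does not reproduce the authors' Hamiltonian Monte Carlo inference.

```python
import math
import random


def simulate_reservoir(v0, inflow, k, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama path of dV = (inflow - V/k) dt + sigma * V dW (assumed form)."""
    rng = random.Random(seed)
    v = v0
    path = [v]
    for _ in range(n_steps):
        # Brownian increment over one time step: mean 0, variance dt.
        dw = rng.gauss(0.0, math.sqrt(dt))
        # Drift: constant inflow minus linear outflow V/k.
        # Diffusion: noise amplitude proportional to the current volume.
        v = v + (inflow - v / k) * dt + sigma * v * dw
        # Keep the volume non-negative; a crude numerical safeguard, not part of the model.
        v = max(v, 0.0)
        path.append(v)
    return path


# Hypothetical parameter values chosen only for illustration.
volumes = simulate_reservoir(v0=1.0, inflow=2.0, k=5.0, sigma=0.3, dt=0.01, n_steps=1000)
```

    With sigma set to zero the path relaxes to the deterministic steady state inflow * k, which gives a quick sanity check on the discretization.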

  14. Computational algebraic geometry for statistical modeling FY09Q2 progress.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thompson, David C.; Rojas, Joseph Maurice; Pebay, Philippe Pierre

    2009-03-01

    This is a progress report on polynomial system solving for statistical modeling. This quarter we have developed our first model of shock response data and an algorithm for identifying the chamber cone containing a polynomial system in n variables with n+k terms within polynomial time - a significant improvement over previous algorithms, all having exponential worst-case complexity. We have implemented and verified the chamber cone algorithm for n+3 and are working to extend the implementation to handle arbitrary k. Later sections of this report explain chamber cones in more detail; the next section provides an overview of the project and how the current progress fits into it.

  15. Cleanroom certification model

    NASA Technical Reports Server (NTRS)

    Currit, P. A.

    1983-01-01

    The Cleanroom software development methodology is designed to take the gamble out of product releases for both suppliers and receivers of the software. The ingredients of this procedure are a life cycle of executable product increments, representative statistical testing, and a standard estimate of the MTTF (Mean Time To Failure) of the product at the time of its release. A statistical approach to software product testing using randomly selected samples of test cases is considered. A statistical model is defined for the certification process which uses the timing data recorded during test. A reasonableness argument for this model is provided that uses previously published data on software product execution. Also included is a derivation of the certification model estimators and a comparison of the proposed least squares technique with the more commonly used maximum likelihood estimators.

  16. DEVELOPMENT OF RESIDENTIAL WOOD CONSUMPTION ESTIMATION MODELS

    EPA Science Inventory

    The report gives data on the distribution and usage of firewood, obtained from a pool of household wood use surveys. Based on a series of regression models developed using the STEPWISE procedure in the SAS statistical package, two variables appear to be most predictive of wood use...

  17. The Consolidation/Transition Model in Moral Reasoning Development.

    ERIC Educational Resources Information Center

    Walker, Lawrence J.; Gustafson, Paul; Hennig, Karl H.

    2001-01-01

    This longitudinal study with 62 children and adolescents examined the validity of the consolidation/transition model in the context of moral reasoning development. Results of standard statistical and Bayesian techniques supported the hypotheses regarding cyclical patterns of change and predictors of stage transition, and demonstrated the utility…

  18. Artificial neural network models for prediction of cardiovascular autonomic dysfunction in general Chinese population

    PubMed Central

    2013-01-01

    Background The present study aimed to develop an artificial neural network (ANN) based prediction model for cardiovascular autonomic (CA) dysfunction in the general population. Methods We analyzed a previous dataset based on a population sample consisting of 2,092 individuals aged 30–80 years. The prediction models were derived from an exploratory set using ANN analysis. Performances of these prediction models were evaluated in the validation set. Results Univariate analysis indicated that 14 risk factors showed statistically significant association with CA dysfunction (P < 0.05). The mean area under the receiver-operating curve was 0.762 (95% CI 0.732–0.793) for the prediction model developed using ANN analysis. The mean sensitivity, specificity, positive predictive value and negative predictive value of the prediction models were 0.751, 0.665, 0.330 and 0.924, respectively. All HL statistics were less than 15.0. Conclusion ANN is an effective tool for developing prediction models with high value for predicting CA dysfunction among the general population. PMID:23902963
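
    The four performance measures reported in this abstract all derive from a 2x2 confusion matrix of predicted against true status. A minimal sketch of the standard definitions follows; the function name and the counts are hypothetical and are not data from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        # Share of true cases that the model flags.
        "sensitivity": tp / (tp + fn),
        # Share of non-cases that the model clears.
        "specificity": tn / (tn + fp),
        # Share of flagged individuals who are true cases.
        "ppv": tp / (tp + fp),
        # Share of cleared individuals who are true non-cases.
        "npv": tn / (tn + fn),
    }


# Made-up counts for illustration only.
metrics = diagnostic_metrics(tp=30, fp=20, tn=40, fn=10)
```

    Because the predictive values depend on how common the condition is in the sample, a low positive predictive value can sit alongside a high negative predictive value, as in the figures reported above.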

  19. Personalizing oncology treatments by predicting drug efficacy, side-effects, and improved therapy: mathematics, statistics, and their integration.

    PubMed

    Agur, Zvia; Elishmereni, Moran; Kheifetz, Yuri

    2014-01-01

    Despite its great promise, personalized oncology still faces many hurdles, and it is increasingly clear that targeted drugs and molecular biomarkers alone yield only modest clinical benefit. One reason is the complex relationships between biomarkers and the patient's response to drugs, obscuring the true weight of the biomarkers in the overall patient's response. This complexity can be disentangled by computational models that integrate the effects of personal biomarkers into a simulator of drug-patient dynamic interactions, for predicting the clinical outcomes. Several computational tools have been developed for personalized oncology, notably evidence-based tools for simulating pharmacokinetics, Bayesian-estimated tools for predicting survival, etc. We describe representative statistical and mathematical tools, and discuss their merits, shortcomings and preliminary clinical validation attesting to their potential. Yet, the individualization power of mathematical models alone, or statistical models alone, is limited. More accurate and versatile personalization tools can be constructed by a new application of the statistical/mathematical nonlinear mixed effects modeling (NLMEM) approach, which until recently has been used only in drug development. Using these advanced tools, clinical data from patient populations can be integrated with mechanistic models of disease and physiology, for generating personal mathematical models. Upon a more substantial validation in the clinic, this approach will hopefully be applied in personalized clinical trials, P-trials, hence aiding the establishment of personalized medicine within the main stream of clinical oncology. © 2014 Wiley Periodicals, Inc.

  20. Statistical wind analysis for near-space applications

    NASA Astrophysics Data System (ADS)

    Roney, Jason A.

    2007-09-01

    Statistical wind models were developed based on the existing observational wind data for near-space altitudes between 60 000 and 100 000 ft (18–30 km) above ground level (AGL) at two locations, Akron, OH, USA, and White Sands, NM, USA. These two sites are envisioned as playing a crucial role in the first flights of high-altitude airships. The analysis shown in this paper has not been previously applied to this region of the stratosphere for such an application. Standard statistics were compiled for these data such as mean, median, maximum wind speed, and standard deviation, and the data were modeled with Weibull distributions. These statistics indicated, on a yearly average, there is a lull or a “knee” in the wind between 65 000 and 72 000 ft AGL (20–22 km). From the standard statistics, trends at both locations indicated substantial seasonal variation in the mean wind speed at these heights. The yearly and monthly statistical modeling indicated that Weibull distributions were a reasonable model for the data. Forecasts and hindcasts were done by using a Weibull model based on 2004 data and comparing the model with the 2003 and 2005 data. The 2004 distribution was also a reasonable model for these years. Lastly, the Weibull distribution and cumulative function were used to predict the 50%, 95%, and 99% winds, which are directly related to the expected power requirements of a near-space station-keeping airship. These values indicated that using only the standard deviation of the mean may underestimate the operational conditions.
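
    Percentile winds such as the 50%, 95%, and 99% values mentioned in this abstract follow from inverting the two-parameter Weibull cumulative function F(v) = 1 - exp(-(v/scale)^shape). The sketch below shows that inversion; the function names and the shape and scale values are hypothetical and are not the fitted values from the paper.

```python
import math


def weibull_cdf(v, shape, scale):
    """Cumulative probability that wind speed is at most v."""
    if v <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((v / scale) ** shape))


def weibull_quantile(p, shape, scale):
    """Wind speed not exceeded with probability p (inverse of the CDF)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


# Hypothetical fit parameters, for illustration only.
SHAPE = 2.0
SCALE = 12.0
design_winds = {p: weibull_quantile(p, SHAPE, SCALE) for p in (0.50, 0.95, 0.99)}
```

    The upper percentiles grow with the tail of the fitted distribution, which is why a design based only on the mean and standard deviation can understate the winds an airship must hold station against.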

  1. A Stochastic Fractional Dynamics Model of Space-time Variability of Rain

    NASA Technical Reports Server (NTRS)

    Kundu, Prasun K.; Travis, James E.

    2013-01-01

    Rainfall varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate, that allows a concise description of the second moment statistics of rain at any prescribed space-time averaging scale. The model is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain but strongly constrains the predicted statistical behavior as a function of the averaging length and times scales. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and in Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to the second moment statistics of radar data. The model predictions are then found to fit the second moment statistics of the gauge data reasonably well without any further adjustment.

  2. In silico model-based inference: a contemporary approach for hypothesis testing in network biology

    PubMed Central

    Klinke, David J.

    2014-01-01

    Inductive inference plays a central role in the study of biological systems where one aims to increase their understanding of the system by reasoning backwards from uncertain observations to identify causal relationships among components of the system. These causal relationships are postulated from prior knowledge as a hypothesis or simply a model. Experiments are designed to test the model. Inferential statistics are used to establish a level of confidence in how well our postulated model explains the acquired data. This iterative process, commonly referred to as the scientific method, either improves our confidence in a model or suggests that we revisit our prior knowledge to develop a new model. Advances in technology impact how we use prior knowledge and data to formulate models of biological networks and how we observe cellular behavior. However, the approach for model-based inference has remained largely unchanged since Fisher, Neyman and Pearson developed the ideas in the early 1900’s that gave rise to what is now known as classical statistical hypothesis (model) testing. Here, I will summarize conventional methods for model-based inference and suggest a contemporary approach to aid in our quest to discover how cells dynamically interpret and transmit information for therapeutic aims that integrates ideas drawn from high performance computing, Bayesian statistics, and chemical kinetics. PMID:25139179

  3. A statistical model including age to predict passenger postures in the rear seats of automobiles.

    PubMed

    Park, Jangwoon; Ebert, Sheila M; Reed, Matthew P; Hallman, Jason J

    2016-06-01

    Few statistical models of rear seat passenger posture have been published, and none has taken into account the effects of occupant age. This study developed new statistical models for predicting passenger postures in the rear seats of automobiles. Postures of 89 adults with a wide range of age and body size were measured in a laboratory mock-up in seven seat configurations. Posture-prediction models for female and male passengers were separately developed by stepwise regression using age, body dimensions, seat configurations and two-way interactions as potential predictors. Passenger posture was significantly associated with age and the effects of other two-way interaction variables depended on age. A set of posture-prediction models are presented for women and men, and the prediction results are compared with previously published models. This study is the first study of passenger posture to include a large cohort of older passengers and the first to report a significant effect of age for adults. The presented models can be used to position computational and physical human models for vehicle design and assessment. Practitioner Summary: The significant effects of age, body dimensions and seat configuration on rear seat passenger posture were identified. The models can be used to accurately position computational human models or crash test dummies for older passengers in known rear seat configurations.

  4. In silico model-based inference: a contemporary approach for hypothesis testing in network biology.

    PubMed

    Klinke, David J

    2014-01-01

    Inductive inference plays a central role in the study of biological systems where one aims to increase their understanding of the system by reasoning backwards from uncertain observations to identify causal relationships among components of the system. These causal relationships are postulated from prior knowledge as a hypothesis or simply a model. Experiments are designed to test the model. Inferential statistics are used to establish a level of confidence in how well our postulated model explains the acquired data. This iterative process, commonly referred to as the scientific method, either improves our confidence in a model or suggests that we revisit our prior knowledge to develop a new model. Advances in technology impact how we use prior knowledge and data to formulate models of biological networks and how we observe cellular behavior. However, the approach for model-based inference has remained largely unchanged since Fisher, Neyman and Pearson developed the ideas in the early 1900s that gave rise to what is now known as classical statistical hypothesis (model) testing. Here, I will summarize conventional methods for model-based inference and suggest a contemporary approach to aid in our quest to discover how cells dynamically interpret and transmit information for therapeutic aims that integrates ideas drawn from high performance computing, Bayesian statistics, and chemical kinetics. © 2014 American Institute of Chemical Engineers.

  5. Heterogeneous Structure of Stem Cells Dynamics: Statistical Models and Quantitative Predictions

    PubMed Central

    Bogdan, Paul; Deasy, Bridget M.; Gharaibeh, Burhan; Roehrs, Timo; Marculescu, Radu

    2014-01-01

    Understanding stem cell (SC) population dynamics is essential for developing models that can be used in basic science and medicine, to aid in predicting cells fate. These models can be used as tools e.g. in studying patho-physiological events at the cellular and tissue level, predicting (mal)functions along the developmental course, and personalized regenerative medicine. Using time-lapsed imaging and statistical tools, we show that the dynamics of SC populations involve a heterogeneous structure consisting of multiple sub-population behaviors. Using non-Gaussian statistical approaches, we identify the co-existence of fast and slow dividing subpopulations, and quiescent cells, in stem cells from three species. The mathematical analysis also shows that, instead of developing independently, SCs exhibit a time-dependent fractal behavior as they interact with each other through molecular and tactile signals. These findings suggest that more sophisticated models of SC dynamics should view SC populations as a collective and avoid the simplifying homogeneity assumption by accounting for the presence of more than one dividing sub-population, and their multi-fractal characteristics. PMID:24769917

  6. EVALUATION OF A NEW MEAN SCALED AND MOMENT ADJUSTED TEST STATISTIC FOR SEM.

    PubMed

    Tong, Xiaoxiao; Bentler, Peter M

    2013-01-01

    Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and two well-known robust test statistics. A modification to the Satorra-Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the four test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies seven sample sizes and three distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ² test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra-Bentler scaled test statistic performed best overall, while the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.

  7. The Population Tracking Model: A Simple, Scalable Statistical Model for Neural Population Data

    PubMed Central

    O'Donnell, Cian; Gonçalves, J. Tiago; Whiteley, Nick; Portera-Cailliau, Carlos; Sejnowski, Terrence J.

    2017-01-01

    Our understanding of neural population coding has been limited by a lack of analysis methods to characterize spiking data from large populations. The biggest challenge comes from the fact that the number of possible network activity patterns scales exponentially with the number of neurons recorded (∼2^Neurons). Here we introduce a new statistical method for characterizing neural population activity that requires semi-independent fitting of only as many parameters as the square of the number of neurons, requiring drastically smaller data sets and minimal computation time. The model works by matching the population rate (the number of neurons synchronously active) and the probability that each individual neuron fires given the population rate. We found that this model can accurately fit synthetic data from up to 1000 neurons. We also found that the model could rapidly decode visual stimuli from neural population data from macaque primary visual cortex about 65 ms after stimulus onset. Finally, we used the model to estimate the entropy of neural population activity in developing mouse somatosensory cortex and, surprisingly, found that it first increases, and then decreases during development. This statistical model opens new options for interrogating neural population data and can bolster the use of modern large-scale in vivo Ca²⁺ and voltage imaging tools. PMID:27870612

  8. Uranium resource assessment through statistical analysis of exploration geochemical and other data. Final report. [Codes EVAL, SURE

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Koch, G.S. Jr.; Howarth, R.J.; Schuenemeyer, J.H.

    1981-02-01

    We have developed a procedure that can help quadrangle evaluators to systematically summarize and use hydrogeochemical and stream sediment reconnaissance (HSSR) and occurrence data. Although we have not provided an independent estimate of uranium endowment, we have devised a methodology that will provide this independent estimate when additional calibration is done by enlarging the study area. Our statistical model for evaluation (system EVAL) ranks uranium endowment for each quadrangle. Because using this model requires experience in geology, statistics, and data analysis, we have also devised a simplified model, presented in the package SURE, a System for Uranium Resource Evaluation. We have developed and tested these models for the four quadrangles in southern Colorado that comprise the study area; to investigate their generality, the models should be applied to other quadrangles. Once they are calibrated with accepted uranium endowments for several well-known quadrangles, the models can be used to give independent estimates for less-known quadrangles. The point-oriented models structure the objective comparison of the quadrangles on the bases of: (1) Anomalies (a) derived from stream sediments, (b) derived from waters (stream, well, pond, etc.), (2) Geology (a) source rocks, as defined by the evaluator, (b) host rocks, as defined by the evaluator, and (3) Aerial radiometric anomalies.

  9. Jet Noise Diagnostics Supporting Statistical Noise Prediction Methods

    NASA Technical Reports Server (NTRS)

    Bridges, James E.

    2006-01-01

    The primary focus of my presentation is the development of the jet noise prediction code JeNo with most examples coming from the experimental work that drove the theoretical development and validation. JeNo is a statistical jet noise prediction code, based upon the Lilley acoustic analogy. Our approach uses time-average 2-D or 3-D mean and turbulent statistics of the flow as input. The output is source distributions and spectral directivity. NASA has been investing in development of statistical jet noise prediction tools because these seem to fit the middle ground that allows enough flexibility and fidelity for jet noise source diagnostics while having reasonable computational requirements. These tools rely on Reynolds-averaged Navier-Stokes (RANS) computational fluid dynamics (CFD) solutions as input for computing far-field spectral directivity using an acoustic analogy. There are many ways acoustic analogies can be created, each with a series of assumptions and models, many often taken unknowingly. And the resulting prediction can be easily reverse-engineered by altering the models contained within. However, only an approach which is mathematically sound, with assumptions validated and modeled quantities checked against direct measurement will give consistently correct answers. Many quantities are modeled in acoustic analogies precisely because they have been impossible to measure or calculate, making this requirement a difficult task. The NASA team has spent considerable effort identifying all the assumptions and models used to take the Navier-Stokes equations to the point of a statistical calculation via an acoustic analogy very similar to that proposed by Lilley. Assumptions have been identified and experiments have been developed to test these assumptions. In some cases this has resulted in assumptions being changed. 
Beginning with the CFD used as input to the acoustic analogy, models for turbulence closure used in RANS CFD codes have been explored and compared against measurements of mean and rms velocity statistics over a range of jet speeds and temperatures. Models for flow parameters used in the acoustic analogy, most notably the space-time correlations of velocity, have been compared against direct measurements, and modified to better fit the observed data. These measurements have been extremely challenging for hot, high speed jets, and represent a sizeable investment in instrumentation development. As an intermediate check that the analysis is predicting the physics intended, phased arrays have been employed to measure source distributions for a wide range of jet cases. And finally, careful far-field spectral directivity measurements have been taken for final validation of the prediction code. Examples of each of these experimental efforts will be presented. The main result of these efforts is a noise prediction code, named JeNo, which is in mid-development. JeNo is able to consistently predict spectral directivity, including aft angle directivity, for subsonic cold jets of most geometries. Current development on JeNo is focused on extending its capability to hot jets, requiring inclusion of a previously neglected second source associated with thermal fluctuations. A secondary result of the intensive experimentation is the archiving of various flow statistics applicable to other acoustic analogies and to development of time-resolved prediction methods. These will be of lasting value as we look ahead at future challenges to the aeroacoustic experimentalist.

  10. Multivariate statistical model for 3D image segmentation with application to medical images.

    PubMed

    John, Nigel M; Kabuka, Mansur R; Ibrahim, Mohamed O

    2003-12-01

    In this article we describe a statistical model that was developed to segment brain magnetic resonance images. The statistical segmentation algorithm was applied after a pre-processing stage involving the use of a 3D anisotropic filter along with histogram equalization techniques. The segmentation algorithm makes use of prior knowledge and a probability-based multivariate model designed to semi-automate the process of segmentation. The algorithm was applied to images obtained from the Center for Morphometric Analysis at Massachusetts General Hospital as part of the Internet Brain Segmentation Repository (IBSR). The developed algorithm showed improved accuracy over the k-means, adaptive Maximum Apriori Probability (MAP), biased MAP, and other algorithms. Experimental results showing the segmentation and the results of comparisons with other algorithms are provided. Results are based on an overlap criterion against expertly segmented images from the IBSR. The algorithm produced average results of approximately 80% overlap with the expertly segmented images (compared with 85% for manual segmentation and 55% for other algorithms).

  11. Global Sensitivity Analysis of Environmental Systems via Multiple Indices based on Statistical Moments of Model Outputs

    NASA Astrophysics Data System (ADS)

    Guadagnini, A.; Riva, M.; Dell'Oca, A.

    2017-12-01

    We propose to ground sensitivity of uncertain parameters of environmental models on a set of indices based on the main (statistical) moments, i.e., mean, variance, skewness and kurtosis, of the probability density function (pdf) of a target model output. This enables us to perform Global Sensitivity Analysis (GSA) of a model in terms of multiple statistical moments and yields a quantification of the impact of model parameters on features driving the shape of the pdf of model output. Our GSA approach includes the possibility of being coupled with the construction of a reduced complexity model that allows approximating the full model response at a reduced computational cost. We demonstrate our approach through a variety of test cases. These include a commonly used analytical benchmark, a simplified model representing pumping in a coastal aquifer, a laboratory-scale tracer experiment, and the migration of fracturing fluid through a naturally fractured reservoir (source) to reach an overlying formation (target). Our strategy allows discriminating the relative importance of model parameters to the four statistical moments considered. We also provide an appraisal of the error associated with the evaluation of our sensitivity metrics by replacing the original system model through the selected surrogate model. Our results suggest that one might need to construct a surrogate model with increasing level of accuracy depending on the statistical moment considered in the GSA. The methodological framework we propose can assist the development of analysis techniques targeted to model calibration, design of experiment, uncertainty quantification and risk assessment.

  12. The prediction of epidemics through mathematical modeling.

    PubMed

    Schaus, Catherine

    2014-01-01

    Mathematical models may be resorted to in an endeavor to predict the development of epidemics. The SIR model is one of the applications. Still too approximate, the use of statistics awaits more data in order to come closer to reality.
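
    For readers unfamiliar with the SIR model named above, a minimal sketch follows. It integrates the standard susceptible-infected-recovered equations with forward Euler steps; the function name, step size, and parameter values are assumptions chosen for illustration and do not come from the article.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the SIR equations with forward Euler steps.

    dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I
    Returns a list of (time, S, I, R) tuples.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history


# Hypothetical parameters: transmission rate 0.3/day, recovery rate 0.1/day.
history = simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0, days=160)
peak_time, _, peak_infected, _ = max(history, key=lambda row: row[2])
print(f"peak of {peak_infected:.0f} infected on day {peak_time:.1f}")
```

    With beta/gamma above 1 the infected count rises before it falls; below 1 it only declines. Real forecasts depend on estimating these rates from data, which is the limitation the abstract points to.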

  13. Parameter discovery in stochastic biological models using simulated annealing and statistical model checking.

    PubMed

    Hussain, Faraz; Jha, Sumit K; Jha, Susmit; Langmead, Christopher J

    2014-01-01

    Stochastic models are increasingly used to study the behaviour of biochemical systems. While the structure of such models is often readily available from first principles, unknown quantitative features of the model are incorporated into the model as parameters. Algorithmic discovery of parameter values from experimentally observed facts remains a challenge for the computational systems biology community. We present a new parameter discovery algorithm that uses simulated annealing, sequential hypothesis testing, and statistical model checking to learn the parameters in a stochastic model. We apply our technique to a model of glucose and insulin metabolism used for in-silico validation of artificial pancreata and demonstrate its effectiveness by developing parallel CUDA-based implementation for parameter synthesis in this model.

  14. A comparison of linear and nonlinear statistical techniques in performance attribution.

    PubMed

    Chan, N H; Genovese, C R

    2001-01-01

    Performance attribution is usually conducted under the linear framework of multifactor models. Although commonly used by practitioners in finance, linear multifactor models are known to be less than satisfactory in many situations. After a brief survey of nonlinear methods, nonlinear statistical techniques are applied to performance attribution of a portfolio constructed from a fixed universe of stocks using factors derived from some commonly used cross sectional linear multifactor models. By rebalancing this portfolio monthly, the cumulative returns for procedures based on standard linear multifactor model and three nonlinear techniques-model selection, additive models, and neural networks-are calculated and compared. It is found that the first two nonlinear techniques, especially in combination, outperform the standard linear model. The results in the neural-network case are inconclusive because of the great variety of possible models. Although these methods are more complicated and may require some tuning, toolboxes are developed and suggestions on calibration are proposed. This paper demonstrates the usefulness of modern nonlinear statistical techniques in performance attribution.

  15. Automated finite element modeling of the lumbar spine: Using a statistical shape model to generate a virtual population of models.

    PubMed

    Campbell, J Q; Petrella, A J

    2016-09-06

    Population-based modeling of the lumbar spine has the potential to be a powerful clinical tool. However, developing a fully parameterized model of the lumbar spine with accurate geometry has remained a challenge. The current study used automated methods for landmark identification to create a statistical shape model of the lumbar spine. The shape model was evaluated using compactness, generalization ability, and specificity. The primary shape modes were analyzed visually, quantitatively, and biomechanically. The biomechanical analysis was performed by using the statistical shape model with an automated method for finite element model generation to create a fully parameterized finite element model of the lumbar spine. Functional finite element models of the mean shape and the extreme shapes (±3 standard deviations) of all 17 shape modes were created demonstrating the robust nature of the methods. This study represents an advancement in finite element modeling of the lumbar spine and will allow population-based modeling in the future. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. Statistics of Statisticians: Critical Mass of Statistics and Operational Research Groups

    NASA Astrophysics Data System (ADS)

    Kenna, Ralph; Berche, Bertrand

    Using a recently developed model, inspired by mean field theory in statistical physics, and data from the UK's Research Assessment Exercise, we analyse the relationship between the qualities of statistics and operational research groups and the quantities of researchers in them. Similar to other academic disciplines, we provide evidence for a linear dependency of quality on quantity up to an upper critical mass, which is interpreted as the average maximum number of colleagues with whom a researcher can communicate meaningfully within a research group. The model also predicts a lower critical mass, which research groups should strive to achieve to avoid extinction. For statistics and operational research, the lower critical mass is estimated to be 9 ± 3. The upper critical mass, beyond which research quality does not significantly depend on group size, is 17 ± 6.

  17. Partial Coordination Numbers in Binary Metallic Glasses (Postprint)

    DTIC Science & Technology

    2011-12-07

    structural differences related to relative atom size and quench rate. The magnitude of chemical interactions between the atoms, eij, might also influence...vious calculations.[2] A statistical approach is used to develop the Zij equations from the product of four terms: (1) the number of reference sites...within experimental scatter. The development of equations for Zij from the ECP model uses a statistical view of topology, and the Zij values

  18. A statistical human rib cage geometry model accounting for variations by age, sex, stature and body mass index.

    PubMed

    Shi, Xiangnan; Cao, Libo; Reed, Matthew P; Rupp, Jonathan D; Hoff, Carrie N; Hu, Jingwen

    2014-07-18

    In this study, we developed a statistical rib cage geometry model accounting for variations by age, sex, stature and body mass index (BMI). Thorax CT scans were obtained from 89 subjects approximately evenly distributed among 8 age groups and both sexes. Threshold-based CT image segmentation was performed to extract the rib geometries, and a total of 464 landmarks on the left side of each subject's ribcage were collected to describe the size and shape of the rib cage as well as the cross-sectional geometry of each rib. Principal component analysis and multivariate regression analysis were conducted to predict rib cage geometry as a function of age, sex, stature, and BMI, all of which showed strong effects on rib cage geometry. Except for BMI, all parameters also showed significant effects on rib cross-sectional area using a linear mixed model. This statistical rib cage geometry model can serve as a geometric basis for developing a parametric human thorax finite element model for quantifying effects from different human attributes on thoracic injury risks. Copyright © 2014 Elsevier Ltd. All rights reserved.

  19. Application of multivariate Gaussian detection theory to known non-Gaussian probability density functions

    NASA Astrophysics Data System (ADS)

    Schwartz, Craig R.; Thelen, Brian J.; Kenton, Arthur C.

    1995-06-01

    A statistical parametric multispectral sensor performance model was developed by ERIM to support mine field detection studies, multispectral sensor design/performance trade-off studies, and target detection algorithm development. The model assumes target detection algorithms and their performance models which are based on data assumed to obey multivariate Gaussian probability distribution functions (PDFs). The applicability of these algorithms and performance models can be generalized to data having non-Gaussian PDFs through the use of transforms which convert non-Gaussian data to Gaussian (or near-Gaussian) data. An example of one such transform is the Box-Cox power law transform. In practice, such a transform can be applied to non-Gaussian data prior to the introduction of a detection algorithm that is formally based on the assumption of multivariate Gaussian data. This paper presents an extension of these techniques to the case where the joint multivariate probability density function of the non-Gaussian input data is known, and where the joint estimate of the multivariate Gaussian statistics, under the Box-Cox transform, is desired. The jointly estimated multivariate Gaussian statistics can then be used to predict the performance of a target detection algorithm which has an associated Gaussian performance model.
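
    The Box-Cox power transform mentioned above can be sketched in a few lines. This is a univariate illustration only, with hypothetical function names and made-up data; it does not reproduce the paper's joint multivariate estimation. It shows the transform, its inverse, and how a suitable exponent removes skewness from right-skewed data.

```python
import math


def box_cox(x, lam):
    """Box-Cox power transform of a positive value x with exponent lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def inverse_box_cox(y, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)


def skewness(values):
    """Population skewness: third central moment over variance to the 3/2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up, right-skewed data (geometric spacing), for illustration only.
data = [1.0, 2.0, 4.0, 8.0, 16.0]
transformed = [box_cox(x, 0) for x in data]
print(f"skewness before: {skewness(data):.3f}")
print(f"skewness after:  {skewness(transformed):.3f}")
```

    In practice the exponent is estimated from the data rather than fixed in advance; the value 0 (the log transform) is used here only because it makes this toy data set exactly symmetric.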

  20. Model Checking Techniques for Assessing Functional Form Specifications in Censored Linear Regression Models.

    PubMed

    León, Larry F; Cai, Tianxi

    2012-04-01

    In this paper we develop model checking techniques for assessing functional form specifications of covariates in censored linear regression models. These procedures are based on a censored data analog to taking cumulative sums of "robust" residuals over the space of the covariate under investigation. These cumulative sums are formed by integrating certain Kaplan-Meier estimators and may be viewed as "robust" censored data analogs to the processes considered by Lin, Wei & Ying (2002). The null distributions of these stochastic processes can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by computer simulation. Each observed process can then be graphically compared with a few realizations from the Gaussian process. We also develop formal test statistics for numerical comparison. Such comparisons enable one to assess objectively whether an apparent trend seen in a residual plot reflects model misspecification or natural variation. We illustrate the methods with a well known dataset. In addition, we examine the finite sample performance of the proposed test statistics in simulation experiments. In our simulation experiments, the proposed test statistics have good power of detecting misspecification while at the same time controlling the size of the test.

  1. Scientific, statistical, practical, and regulatory considerations in design space development.

    PubMed

    Debevec, Veronika; Srčič, Stanko; Horvat, Matej

    2018-03-01

    The quality by design (QbD) paradigm guides the pharmaceutical industry towards improved understanding of products and processes, and at the same time facilitates a high degree of manufacturing and regulatory flexibility throughout the establishment of the design space. This review article presents scientific, statistical and regulatory considerations in design space development. All key development milestones, starting with planning, selection of factors, experimental execution, data analysis, model development and assessment, verification, and validation, and ending with design space submission, are presented and discussed. The focus is especially on frequently ignored topics, like management of factors and CQAs that will not be included in experimental design, evaluation of risk of failure on design space edges, or modeling scale-up strategy. Moreover, development of a design space that is independent of manufacturing scale is proposed as the preferred approach.

  2. Statistics for stochastic modeling of volume reduction, hydrograph extension, and water-quality treatment by structural stormwater runoff best management practices (BMPs)

    USGS Publications Warehouse

    Granato, Gregory E.

    2014-01-01

    The U.S. Geological Survey (USGS) developed the Stochastic Empirical Loading and Dilution Model (SELDM) in cooperation with the Federal Highway Administration (FHWA) to indicate the risk for stormwater concentrations, flows, and loads to be above user-selected water-quality goals and the potential effectiveness of mitigation measures to reduce such risks. SELDM models the potential effect of mitigation measures by using Monte Carlo methods with statistics that approximate the net effects of structural and nonstructural best management practices (BMPs). In this report, structural BMPs are defined as the components of the drainage pathway between the source of runoff and a stormwater discharge location that affect the volume, timing, or quality of runoff. SELDM uses a simple stochastic statistical model of BMP performance to develop planning-level estimates of runoff-event characteristics. This statistical approach can be used to represent a single BMP or an assemblage of BMPs. The SELDM BMP-treatment module has provisions for stochastic modeling of three stormwater treatments: volume reduction, hydrograph extension, and water-quality treatment. In SELDM, these three treatment variables are modeled by using the trapezoidal distribution and the rank correlation with the associated highway-runoff variables. This report describes methods for calculating the trapezoidal-distribution statistics and rank correlation coefficients for stochastic modeling of volume reduction, hydrograph extension, and water-quality treatment by structural stormwater BMPs and provides the calculated values for these variables. This report also provides robust methods for estimating the minimum irreducible concentration (MIC), which is the lowest expected effluent concentration from a particular BMP site or a class of BMPs. These statistics are different from the statistics commonly used to characterize or compare BMPs. 
They are designed to provide a stochastic transfer function to approximate the quantity, duration, and quality of BMP effluent given the associated inflow values for a population of storm events. A database application and several spreadsheet tools are included in the digital media accompanying this report for further documentation of methods and for future use. In this study, analyses were done with data extracted from a modified copy of the January 2012 version of the International Stormwater Best Management Practices Database, designated herein as the January 2012a version. Statistics for volume reduction, hydrograph extension, and water-quality treatment were developed with selected data. Sufficient data were available to estimate statistics for 5 to 10 BMP categories by using data from 40 to more than 165 monitoring sites. Water-quality treatment statistics were developed for 13 runoff-quality constituents commonly measured in highway and urban runoff studies including turbidity, sediment and solids; nutrients; total metals; organic carbon; and fecal coliforms. The medians of the best-fit statistics for each category were selected to construct generalized cumulative distribution functions for the three treatment variables. For volume reduction and hydrograph extension, interpretation of available data indicates that selection of a Spearman’s rho value that is the average of the median and maximum values for the BMP category may help generate realistic simulation results in SELDM. The median rho value may be selected to help generate realistic simulation results for water-quality treatment variables. MIC statistics were developed for 12 runoff-quality constituents commonly measured in highway and urban runoff studies by using data from 11 BMP categories and more than 167 monitoring sites. Four statistical techniques were applied for estimating MIC values with monitoring data from each site. These techniques produce a range of lower-bound estimates for each site. 
Four MIC estimators are proposed as alternatives for selecting a value from among the estimates from multiple sites. Correlation analysis indicates that the MIC estimates from multiple sites were weakly correlated with the geometric mean of inflow values, which indicates that there may be a qualitative or semiquantitative link between the inflow quality and the MIC. Correlations probably are weak because the MIC is influenced by the inflow water quality and the capability of each individual BMP site to reduce inflow concentrations.
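
    As a rough illustration of how a treatment variable might be drawn from a trapezoidal distribution while keeping a chosen rank correlation with an inflow variable, the sketch below uses inverse-CDF sampling and a Gaussian copula. The copula step, the function names, and the parameter values are assumptions made for this example; the abstract does not say how SELDM itself induces the rank correlation.

```python
import math
import random
from statistics import NormalDist


def trapezoid_ppf(u, a, b, c, d):
    """Inverse CDF of a trapezoidal distribution with minimum a,
    lower mode b, upper mode c, and maximum d (a <= b <= c <= d, a < d)."""
    h = 2.0 / (d + c - a - b)      # height of the flat top of the density
    f_b = h * (b - a) / 2.0        # CDF value at the lower mode
    f_c = f_b + h * (c - b)        # CDF value at the upper mode
    if u <= f_b:                   # rising limb
        return a + math.sqrt(2.0 * u * (b - a) / h)
    if u <= f_c:                   # flat top
        return b + (u - f_b) / h
    return d - math.sqrt(2.0 * (1.0 - u) * (d - c) / h)  # falling limb


def correlated_trapezoid_samples(n, rho_s, a, b, c, d, seed=0):
    """Return n pairs (inflow_quantile, treatment_value).

    Assumption: a Gaussian copula is used to approximate a target
    Spearman's rho (rho_s) between the two variables.
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf
    # Pearson correlation of the normal scores that yields Spearman's rho_s.
    r = 2.0 * math.sin(math.pi * rho_s / 6.0)
    resid = math.sqrt(max(0.0, 1.0 - r * r))
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = r * z1 + resid * rng.gauss(0.0, 1.0)
        pairs.append((phi(z1), trapezoid_ppf(phi(z2), a, b, c, d)))
    return pairs
```

    The inflow quantile in each pair would be mapped to an inflow value through that variable's own inverse CDF; the trapezoid parameters (a, b, c, d) correspond to the kind of fitted statistics the report tabulates by BMP category.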

  3. Assessing the Impact of Climate Change on Stream Temperatures in the Methow River Basin, Washington

    NASA Astrophysics Data System (ADS)

    Gangopadhyay, S.; Caldwell, R. J.; Lai, Y.; Bountry, J.

    2011-12-01

    The Methow River in Washington offers prime spawning habitat for salmon and other cold-water fishes. During the summer months, low streamflows on the Methow result in cutoff side channels that limit the habitat available to these fishes. Future climate scenarios of increasing air temperature and decreasing precipitation suggest the potential for increasing loss of habitat and fish mortality as stream temperatures rise in response to lower flows and additional heating. To assess the impacts of climate change on stream temperature in the Methow River, the US Bureau of Reclamation is developing an hourly time-step, two-dimensional hydraulic model of the confluence of the Methow and Chewuch Rivers above Winthrop. The model will be coupled with a physical stream temperature model to generate spatial representations of stream conditions conducive for fish habitat. In this study, we develop a statistical framework for generating stream temperature time series from global climate model (GCM) and hydrologic model outputs. Regional observations of stream temperature and hydrometeorological conditions are used to develop statistical models of daily mean stream temperature for the Methow River at Winthrop, WA. Temperature and precipitation projections from 10 global climate models (GCMs) are coupled with the streamflow generated using the University of Washington Variable Infiltration Capacity model. The projections serve as input to the statistical models to generate daily time series of mean daily stream temperature. Since the output from the GCM, VIC, and statistical models offer only daily data, a k-nearest neighbor (k-nn) resampling technique is employed to select appropriate proportion vectors for disaggregating the Winthrop daily flow and temperature to an upstream location on each of the rivers above the confluence. Hourly proportion vectors are then used to disaggregate the daily flow and temperature to hourly values to be used in the hydraulic model. 
Historical meteorological variables are also selected using the k-nn method. We present the statistical modeling framework using Generalized Linear Models (GLMs), along with diagnostics and measurements of skill. We will also provide a comparison of the stream temperature projections from the future years of 2020, 2040, and 2080 and discuss the potential implications on fish habitat in the Methow River. Future integration of the hourly climate scenarios in the hydraulic model will provide the ability to assess the spatial extent of habitat impacts and allow the USBR to evaluate the effectiveness of various river restoration projects in maintaining or improving habitat in a changing climate.
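
    The hourly disaggregation step can be pictured with a small k-nn resampling sketch. It assumes a history of observed days, each stored as a daily flow total with a vector of hourly proportions that sums to 1, and it assumes a 1/rank weighting of the neighbors; the function name, the data layout, and the weighting are illustrative choices, not details taken from the abstract.

```python
import random


def knn_disaggregate(daily_value, history, k=3, rng=None):
    """Split a daily total into sub-daily values by resampling a proportion vector.

    history: list of (daily_total, proportions) pairs from observed days,
    where each proportions list sums to 1 (hypothetical layout).
    """
    rng = rng or random.Random(0)
    # Rank historical days by how close their daily total is to the target.
    neighbors = sorted(history, key=lambda day: abs(day[0] - daily_value))[:k]
    # Weight neighbor j (1 = closest) by 1/j so closer days are favored.
    weights = [1.0 / j for j in range(1, len(neighbors) + 1)]
    _, proportions = rng.choices(neighbors, weights=weights, k=1)[0]
    # Scale the borrowed proportions by the target day's total.
    return [daily_value * p for p in proportions]
```

    With k=1 the nearest historical day is always selected, which makes the behavior easy to check by hand; larger k adds variability across simulated days while preserving the daily total.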

  4. Regression Analysis as a Cost Estimation Model for Unexploded Ordnance Cleanup at Former Military Installations

    DTIC Science & Technology

    2002-06-01

    fits our actual data. To determine the goodness of fit, statisticians typically use the following four measures: R2 Statistic. The R2 statistic...reviewing instruction, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of...mathematical model is developed to better estimate cleanup costs using historical cost data that could be used by the Defense Department prior to placing

  5. Comparing statistical and machine learning classifiers: alternatives for predictive modeling in human factors research.

    PubMed

    Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann

    2003-01-01

    Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lewis, John R.; Brooks, Dusty Marie

    In pressurized water reactors, the prevention, detection, and repair of cracks within dissimilar metal welds is essential to ensure proper plant functionality and safety. Weld residual stresses, which are difficult to model and cannot be directly measured, contribute to the formation and growth of cracks due to primary water stress corrosion cracking. Additionally, the uncertainty in weld residual stress measurements and modeling predictions is not well understood, further complicating the prediction of crack evolution. The purpose of this document is to develop methodology to quantify the uncertainty associated with weld residual stress that can be applied to modeling predictions and experimental measurements. Ultimately, the results can be used to assess the current state of uncertainty and to build confidence in both modeling and experimental procedures. The methodology consists of statistically modeling the variation in the weld residual stress profiles using functional data analysis techniques. Uncertainty is quantified using statistical bounds (e.g. confidence and tolerance bounds) constructed with a semi-parametric bootstrap procedure. Such bounds describe the range in which quantities of interest, such as means, are expected to lie as evidenced by the data. The methodology is extended to provide direct comparisons between experimental measurements and modeling predictions by constructing statistical confidence bounds for the average difference between the two quantities. The statistical bounds on the average difference can be used to assess the level of agreement between measurements and predictions. The methodology is applied to experimental measurements of residual stress obtained using two strain relief measurement methods and predictions from seven finite element models developed by different organizations during a round robin study.

  7. Statistically accurate low-order models for uncertainty quantification in turbulent dynamical systems.

    PubMed

    Sapsis, Themistoklis P; Majda, Andrew J

    2013-08-20

    A framework for low-order predictive statistical modeling and uncertainty quantification in turbulent dynamical systems is developed here. These reduced-order, modified quasilinear Gaussian (ROMQG) algorithms apply to turbulent dynamical systems in which there is significant linear instability or linear nonnormal dynamics in the unperturbed system and energy-conserving nonlinear interactions that transfer energy from the unstable modes to the stable modes where dissipation occurs, resulting in a statistical steady state; such turbulent dynamical systems are ubiquitous in geophysical and engineering turbulence. The ROMQG method involves constructing a low-order, nonlinear, dynamical system for the mean and covariance statistics in the reduced subspace that has the unperturbed statistics as a stable fixed point and optimally incorporates the indirect effect of non-Gaussian third-order statistics for the unperturbed system in a systematic calibration stage. This calibration procedure is achieved through information involving only the mean and covariance statistics for the unperturbed equilibrium. The performance of the ROMQG algorithm is assessed on two stringent test cases: the 40-mode Lorenz 96 model mimicking midlatitude atmospheric turbulence and two-layer baroclinic models for high-latitude ocean turbulence with over 125,000 degrees of freedom. In the Lorenz 96 model, the ROMQG algorithm with just a single mode captures the transient response to random or deterministic forcing. For the baroclinic ocean turbulence models, the inexpensive ROMQG algorithm with 252 modes, less than 0.2% of the total, captures the nonlinear response of the energy, the heat flux, and even the one-dimensional energy and heat flux spectra.

  8. The Development of Web-based Graphical User Interface for Unified Modeling Data with Multi (Correlated) Responses

    NASA Astrophysics Data System (ADS)

    Made Tirta, I.; Anggraeni, Dian

    2018-04-01

    Statistical models have developed rapidly in various directions to accommodate various types of data. Data collected from longitudinal, repeated-measures, or clustered designs (whether continuous, binary, count, or ordinal) are likely to be correlated. Therefore, statistical models for independent responses, such as the Generalized Linear Model (GLM) and the Generalized Additive Model (GAM), are not appropriate. Several models are available for correlated responses, including GEEs (Generalized Estimating Equations) for marginal models and various mixed-effect models, such as GLMMs (Generalized Linear Mixed Models) and HGLMs (Hierarchical Generalized Linear Models), for subject-specific models. These models are available in the free, open-source software R, but they can only be accessed through a command-line interface (using scripts). On the other hand, most practical researchers rely heavily on menu-based or graphical user interfaces (GUIs). Using the Shiny framework, we develop a standard pull-down-menu Web-GUI that unifies most models for correlated responses. The Web-GUI accommodates almost all needed features. It enables users to fit and compare various models for repeated-measures data (GEE, GLMM, HGLM, GEE for nominal responses) much more easily through online menus. This paper discusses the features of the Web-GUI and illustrates their use. In general, we find that GEE, GLMM, and HGLM give very close results.

  9. Effect of genetic polymorphisms on development of gout.

    PubMed

    Urano, Wako; Taniguchi, Atsuo; Inoue, Eisuke; Sekita, Chieko; Ichikawa, Naomi; Koseki, Yumi; Kamatani, Naoyuki; Yamanaka, Hisashi

    2013-08-01

    To validate the association between genetic polymorphisms and gout in Japanese patients, and to investigate the cumulative effects of multiple genetic factors on the development of gout. Subjects were 153 Japanese male patients with gout and 532 male controls. The genotypes of 11 polymorphisms in the 10 genes that have been indicated to be associated with serum uric acid levels or gout were determined. The cumulative effects of the genetic polymorphisms were investigated using a weighted genotype risk score (wGRS) based on the number of risk alleles and the OR for gout. A model to discriminate between patients with gout and controls was constructed by incorporating the wGRS and clinical factors. C statistics method was applied to evaluate the capability of the model to discriminate gout patients from controls. Seven polymorphisms were shown to be associated with gout. The mean wGRS was significantly higher in patients with gout (15.2 ± 2.01) compared to controls (13.4 ± 2.10; p < 0.0001). The C statistic for the model using genetic information alone was 0.72, while the C statistic was 0.81 for the full model that incorporated all genetic and clinical factors. Accumulation of multiple genetic factors is associated with the development of gout. A prediction model for gout that incorporates genetic and clinical factors may be useful for identifying individuals who are at risk of gout.
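
    The two quantities named in this abstract can be sketched in a few lines. The abstract says only that the wGRS is based on the number of risk alleles and the OR for gout, so weighting each allele count by ln(OR) is an assumption here, and the study's actual scaling may differ. The C statistic is computed from its standard definition as the probability that a randomly chosen case scores higher than a randomly chosen control. Function names are hypothetical.

```python
import math


def weighted_grs(risk_allele_counts, odds_ratios):
    """Sum of risk-allele counts (0, 1, or 2 per polymorphism), each
    weighted by ln(OR). The ln(OR) weighting is an assumed convention."""
    return sum(n * math.log(odds) for n, odds in zip(risk_allele_counts, odds_ratios))


def c_statistic(case_scores, control_scores):
    """Fraction of case/control pairs in which the case has the higher
    score, counting ties as one half (equivalent to the area under the ROC curve)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

    A value of 0.5 means the score separates cases from controls no better than chance, and 1.0 means perfect separation, which gives a scale for reading the reported 0.72 and 0.81.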

  10. Graphene growth process modeling: a physical-statistical approach

    NASA Astrophysics Data System (ADS)

    Wu, Jian; Huang, Qiang

    2014-09-01

    As a zero-band semiconductor, graphene is an attractive material for a wide variety of applications such as optoelectronics. Among various techniques developed for graphene synthesis, chemical vapor deposition on copper foils shows high potential for producing few-layer and large-area graphene. Since fabrication of high-quality graphene sheets requires the understanding of growth mechanisms, and methods of characterization and control of grain size of graphene flakes, analytical modeling of graphene growth process is therefore essential for controlled fabrication. The graphene growth process starts with randomly nucleated islands that gradually develop into complex shapes, grow in size, and eventually connect together to cover the copper foil. To model this complex process, we develop a physical-statistical approach under the assumption of self-similarity during graphene growth. The growth kinetics is uncovered by separating island shapes from area growth rate. We propose to characterize the area growth velocity using a confined exponential model, which not only has clear physical explanation, but also fits the real data well. For the shape modeling, we develop a parametric shape model which can be well explained by the angular-dependent growth rate. This work can provide useful information for the control and optimization of graphene growth process on Cu foil.
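
    The abstract does not give the functional form of the confined exponential model. A minimal sketch, assuming the textbook form A(t) = A_max (1 - exp(-k t)) with hypothetical parameter names a_max (saturation area) and k (rate constant), shows why such a curve suits growth that slows as islands merge and cover the foil:

```python
import math


def confined_exponential(t, a_max, k):
    """Area at time t: starts at 0 and saturates at a_max with rate constant k."""
    return a_max * (1.0 - math.exp(-k * t))


def growth_velocity(t, a_max, k):
    """Time derivative of the area: largest at t = 0 and decaying toward zero."""
    return a_max * k * math.exp(-k * t)
```

    Under this assumed form the growth velocity is proportional to the area still uncovered, a_max - A(t), which is one way to read the "clear physical explanation" mentioned above.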

  11. Statistical modeling of software reliability

    NASA Technical Reports Server (NTRS)

    Miller, Douglas R.

    1992-01-01

    This working paper discusses the statistical simulation part of a controlled software development experiment being conducted under the direction of the System Validation Methods Branch, Information Systems Division, NASA Langley Research Center. The experiment uses guidance and control software (GCS) aboard a fictitious planetary landing spacecraft: real-time control software operating on a transient mission. Software execution is simulated to study the statistical aspects of reliability and other failure characteristics of the software during development, testing, and random usage. Quantification of software reliability is a major goal. Various reliability concepts are discussed. Experiments are described for performing simulations and collecting appropriate simulated software performance and failure data. This data is then used to make statistical inferences about the quality of the software development and verification processes as well as inferences about the reliability of software versions and reliability growth under random testing and debugging.

  12. Airborne Wireless Communication Modeling and Analysis with MATLAB

    DTIC Science & Technology

    2014-03-27

    research develops a physical layer model that combines antenna modeling using computational electromagnetics and the two-ray propagation model to...predict the received signal strength. The antenna is modeled with triangular patches and analyzed by extending the antenna modeling algorithm by Sergey...2.7. Propagation Modeling: Statistical Models...2.8. Antenna Modeling

  13. Phylogeography Takes a Relaxed Random Walk in Continuous Space and Time

    PubMed Central

    Lemey, Philippe; Rambaut, Andrew; Welch, John J.; Suchard, Marc A.

    2010-01-01

    Research aimed at understanding the geographic context of evolutionary histories is burgeoning across biological disciplines. Recent endeavors attempt to interpret contemporaneous genetic variation in the light of increasingly detailed geographical and environmental observations. Such interest has promoted the development of phylogeographic inference techniques that explicitly aim to integrate such heterogeneous data. One promising development involves reconstructing phylogeographic history on a continuous landscape. Here, we present a Bayesian statistical approach to infer continuous phylogeographic diffusion using random walk models while simultaneously reconstructing the evolutionary history in time from molecular sequence data. Moreover, by accommodating branch-specific variation in dispersal rates, we relax the most restrictive assumption of the standard Brownian diffusion process and demonstrate increased statistical efficiency in spatial reconstructions of overdispersed random walks by analyzing both simulated and real viral genetic data. We further illustrate how drawing inference about summary statistics from a fully specified stochastic process over both sequence evolution and spatial movement reveals important characteristics of a rabies epidemic. Together with recent advances in discrete phylogeographic inference, the continuous model developments furnish a flexible statistical framework for biogeographical reconstructions that is easily expanded upon to accommodate various landscape genetic features. PMID:20203288

  14. Use of observational and model-derived fields and regime model output statistics in mesoscale forecasting

    NASA Technical Reports Server (NTRS)

    Forbes, G. S.; Pielke, R. A.

    1985-01-01

    Various empirical and statistical weather-forecasting studies which utilize stratification by weather regime are described. Objective classification was used to determine weather regime in some studies. In other cases the weather pattern was determined on the basis of a parameter representing the physical and dynamical processes relevant to the anticipated mesoscale phenomena, such as low level moisture convergence and convective precipitation, or the Froude number and the occurrence of cold-air damming. For mesoscale phenomena already in existence, new forecasting techniques were developed. The use of cloud models in operational forecasting is discussed. Models to calculate the spatial scales of forcings and resultant response for mesoscale systems are presented. The use of these models to represent the climatologically most prevalent systems, and to perform case-by-case simulations is reviewed. Operational implementation of mesoscale data into weather forecasts, using both actual simulation output and method-output statistics is discussed.

  15. The FORE-SCE model: a practical approach for projecting land cover change using scenario-based modeling

    USGS Publications Warehouse

    Sohl, Terry L.; Sayler, Kristi L.; Drummond, Mark A.; Loveland, Thomas R.

    2007-01-01

    A wide variety of ecological applications require spatially explicit, historic, current, and projected land use and land cover data. The U.S. Land Cover Trends project is analyzing contemporary (1973–2000) land-cover change in the conterminous United States. The newly developed FORE-SCE model used Land Cover Trends data and theoretical, statistical, and deterministic modeling techniques to project future land cover change through 2020 for multiple plausible scenarios. Projected proportions of future land use were initially developed, and then sited on the lands with the highest potential for supporting that land use and land cover using a statistically based stochastic allocation procedure. Three scenarios of 2020 land cover were mapped for the western Great Plains in the US. The model provided realistic, high-resolution, scenario-based land-cover products suitable for multiple applications, including studies of climate and weather variability, carbon dynamics, and regional hydrology.

  16. Development of a new pan-European testate amoeba transfer function for reconstructing peatland palaeohydrology

    NASA Astrophysics Data System (ADS)

    Amesbury, Matthew J.; Swindles, Graeme T.; Bobrov, Anatoly; Charman, Dan J.; Holden, Joseph; Lamentowicz, Mariusz; Mallon, Gunnar; Mazei, Yuri; Mitchell, Edward A. D.; Payne, Richard J.; Roland, Thomas P.; Turner, T. Edward; Warner, Barry G.

    2016-11-01

    In the decade since the first pan-European testate amoeba-based transfer function for peatland palaeohydrological reconstruction was published, a vast amount of additional data collection has been undertaken by the research community. Here, we expand the pan-European dataset from 128 to 1799 samples, spanning 35° of latitude and 55° of longitude. After the development of a new taxonomic scheme to permit compilation of data from a wide range of contributors and the removal of samples with high pH values, we developed ecological transfer functions using a range of model types and a dataset of ∼1300 samples. We rigorously tested the efficacy of these models using both statistical validation and independent test sets with associated instrumental data. Model performance measured by statistical indicators was comparable to other published models. Comparison to test sets showed that taxonomic resolution did not impair model performance and that the new pan-European model can therefore be used as an effective tool for palaeohydrological reconstruction. Our results question the efficacy of relying on statistical validation of transfer functions alone and support a multi-faceted approach to the assessment of new models. We substantiated recent advice that model outputs should be standardised and presented as residual values in order to focus interpretation on secure directional shifts, avoiding potentially inaccurate conclusions relating to specific water-table depths. The extent and diversity of the dataset highlighted that, at the taxonomic resolution applied, a majority of taxa had broad geographic distributions, though some morphotypes appeared to have restricted ranges.

  17. Forecasting runout of rock and debris avalanches

    USGS Publications Warehouse

    Iverson, Richard M.; Evans, S.G.; Mugnozza, G.S.; Strom, A.; Hermanns, R.L.

    2006-01-01

    Physically based mathematical models and statistically based empirical equations each may provide useful means of forecasting runout of rock and debris avalanches. This paper compares the foundations, strengths, and limitations of a physically based model and a statistically based forecasting method, both of which were developed to predict runout across three-dimensional topography. The chief advantage of the physically based model results from its ties to physical conservation laws and well-tested axioms of soil and rock mechanics, such as the Coulomb friction rule and effective-stress principle. The output of this model provides detailed information about the dynamics of avalanche runout, at the expense of high demands for accurate input data, numerical computation, and experimental testing. In comparison, the statistical method requires relatively modest computation and no input data except identification of prospective avalanche source areas and a range of postulated avalanche volumes. Like the physically based model, the statistical method yields maps of predicted runout, but it provides no information on runout dynamics. Although the two methods differ significantly in their structure and objectives, insights gained from one method can aid refinement of the other.

  18. September Arctic Sea Ice minimum prediction - a new skillful statistical approach

    NASA Astrophysics Data System (ADS)

    Ionita-Scholz, Monica; Grosfeld, Klaus; Scholz, Patrick; Treffeisen, Renate; Lohmann, Gerrit

    2017-04-01

    Sea ice in both polar regions is an important indicator of global climate change and its polar amplification. Consequently, there is broad interest in sea ice, its coverage, variability, and long-term change. Knowledge of sea ice requires high-quality data on ice extent, thickness, and dynamics. However, its predictability is complex and depends on various climate and oceanic parameters and conditions. In order to provide insights into the potential development of a monthly/seasonal signal of sea ice evolution, we developed a robust statistical model based on ocean heat content, sea surface temperature, and different atmospheric variables to calculate an estimate of the September sea ice extent (SSIE) on a monthly time scale. Although previous statistical attempts at monthly/seasonal forecasts of SSIE show relatively low skill, we show here that more than 92% (r = 0.96) of the September sea ice extent can be predicted at the end of May by using previous months' climate and oceanic conditions. The skill of the model increases with a decrease in the time lag used for the forecast. At the end of August, our predictions are even able to explain 99% of the SSIE. Our statistical model captures both the general trend and the interannual variability of the SSIE. Moreover, it is able to properly forecast the years with extreme high/low SSIE (e.g. 1996/2007, 2012, 2013). Besides its forecast skill for SSIE, the model could provide a valuable tool for identifying relevant regions and climate parameters that are important for sea ice development in the Arctic and for detecting sensitive and critical regions in global coupled climate models with a focus on sea ice formation.
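
    The pairing of "more than 92%" with r = 0.96 follows from the standard relation between a correlation coefficient and the fraction of variance explained, r squared. The short sketch below makes that arithmetic explicit; the function names are illustrative and no data from the study are used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def explained_variance(r):
    """Fraction of variance in the observations explained by predictions
    that correlate with them at r (the squared correlation)."""
    return r * r
```

    For r = 0.96 this gives 0.96 × 0.96 = 0.9216, consistent with the "more than 92%" quoted for the end-of-May forecast.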

  19. Comparing estimates of climate change impacts from process-based and statistical crop models

    NASA Astrophysics Data System (ADS)

    Lobell, David B.; Asseng, Senthold

    2017-01-01

    The potential impacts of climate change on crop productivity are of widespread interest to those concerned with addressing climate change and improving global food security. Two common approaches to assess these impacts are process-based simulation models, which attempt to represent key dynamic processes affecting crop yields, and statistical models, which estimate functional relationships between historical observations of weather and yields. Examples of both approaches are increasingly found in the scientific literature, although often published in different disciplinary journals. Here we compare published sensitivities to changes in temperature, precipitation, carbon dioxide (CO2), and ozone from each approach for the subset of crops, locations, and climate scenarios for which both have been applied. Despite a common perception that statistical models are more pessimistic, we find no systematic differences between the predicted sensitivities to warming from process-based and statistical models up to +2 °C, with limited evidence at higher levels of warming. For precipitation, there are many reasons why estimates could be expected to differ, but few estimates exist to develop robust comparisons, and precipitation changes are rarely the dominant factor for predicting impacts given the prominent role of temperature, CO2, and ozone changes. A common difference between process-based and statistical studies is that the former tend to include the effects of CO2 increases that accompany warming, whereas statistical models typically do not. Major needs moving forward include incorporating CO2 effects into statistical studies, improving both approaches’ treatment of ozone, and increasing the use of both methods within the same study. 
At the same time, those who fund or use crop model projections should understand that in the short-term, both approaches when done well are likely to provide similar estimates of warming impacts, with statistical models generally requiring fewer resources to produce robust estimates, especially when applied to crops beyond the major grains.

  20. Systematic Mapping and Statistical Analyses of Valley Landform and Vegetation Asymmetries Across Hydroclimatic Gradients

    NASA Astrophysics Data System (ADS)

    Poulos, M. J.; Pierce, J. L.; McNamara, J. P.; Flores, A. N.; Benner, S. G.

    2015-12-01

    Terrain aspect alters the spatial distribution of insolation across topography, driving eco-pedo-hydro-geomorphic feedbacks that can alter landform evolution and result in valley asymmetries for a suite of land surface characteristics (e.g. slope length and steepness, vegetation, soil properties, and drainage development). Asymmetric valleys serve as natural laboratories for studying how landscapes respond to climate perturbation. In the semi-arid montane granodioritic terrain of the Idaho batholith, Northern Rocky Mountains, USA, prior works indicate that reduced insolation on northern (pole-facing) aspects prolongs snow pack persistence, and is associated with thicker, finer-grained soils, that retain more water, prolong the growing season, support coniferous forest rather than sagebrush steppe ecosystems, stabilize slopes at steeper angles, and produce sparser drainage networks. We hypothesize that the primary drivers of valley asymmetry development are changes in the pedon-scale water-balance that coalesce to alter catchment-scale runoff and drainage development, and ultimately cause the divide between north and south-facing land surfaces to migrate northward. We explore this conceptual framework by coupling land surface analyses with statistical modeling to assess relationships and the relative importance of land surface characteristics. Throughout the Idaho batholith, we systematically mapped and tabulated various statistical measures of landforms, land cover, and hydroclimate within discrete valley segments (n=~10,000). We developed a random forest based statistical model to predict valley slope asymmetry based upon numerous measures (n>300) of landscape asymmetries. Preliminary results suggest that drainages are tightly coupled with hillslopes throughout the region, with drainage-network slope being one of the strongest predictors of land-surface-averaged slope asymmetry. 
When slope-related statistics are excluded, due to possible autocorrelation, valley slope asymmetry is most strongly predicted by asymmetries of insolation and drainage density, which generally supports a water-balance based conceptual model of valley asymmetry development. Surprisingly, vegetation asymmetries had relatively low predictive importance.

  1. Scale Dependence of Statistics of Spatially Averaged Rain Rate Seen in TOGA COARE Comparison with Predictions from a Stochastic Model

    NASA Technical Reports Server (NTRS)

    Kundu, Prasun K.; Bell, T. L.; Lau, William K. M. (Technical Monitor)

    2002-01-01

    A characteristic feature of rainfall statistics is that they in general depend on the space and time scales over which rain data are averaged. As a part of an earlier effort to determine the sampling error of satellite rain averages, a space-time model of rainfall statistics was developed to describe the statistics of gridded rain observed in GATE. The model allows one to compute the second moment statistics of space- and time-averaged rain rate, which can be fitted to satellite or rain gauge data to determine the four model parameters appearing in the precipitation spectrum: an overall strength parameter, a characteristic length separating the long and short wavelength regimes, a characteristic relaxation time for decay of the autocorrelation of the instantaneous local rain rate, and a certain 'fractal' power law exponent. For area-averaged instantaneous rain rate, this exponent governs the power law dependence of these statistics on the averaging length scale $L$ predicted by the model in the limit of small $L$. In particular, the variance of rain rate averaged over an $L \times L$ area exhibits a power law singularity as $L \rightarrow 0$. In the present work the model is used to investigate how the statistics of area-averaged rain rate over the tropical Western Pacific, measured with shipborne radar during TOGA COARE (Tropical Ocean Global Atmosphere Coupled Ocean Atmospheric Response Experiment) and gridded on a 2 km grid, depend on the size of the spatial averaging scale. Good agreement is found between the data and predictions from the model over a wide range of averaging length scales.
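
    The power law dependence of the variance on the averaging length $L$ can be illustrated with a small sketch. This is not the authors' four-parameter spectral model, whose form is not given in the abstract; it only shows how a power law exponent could be estimated from variances at several averaging lengths by a straight-line fit in log-log space. The function name `fit_power_law`, the prefactor 5.0, and the exponent -0.6 are made-up values for illustration.

```python
import math


def fit_power_law(lengths, variances):
    """Least-squares fit of log(var) = log(c) + slope * log(L).

    Returns (c, slope) so that var is approximately c * L**slope.
    """
    xs = [math.log(length) for length in lengths]
    ys = [math.log(var) for var in variances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    c = math.exp(mean_y - slope * mean_x)
    return c, slope


# Synthetic variances on hypothetical averaging lengths (km); not real data.
lengths_km = [2, 4, 8, 16, 32]
variances = [5.0 * length ** -0.6 for length in lengths_km]

c, slope = fit_power_law(lengths_km, variances)
print(f"prefactor = {c:.3f}, exponent = {slope:.3f}")
```

    With real radar data the points would scatter around the line, and the fit would only be meaningful over the range of $L$ where the power law regime holds.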

  2. Strengthen forensic entomology in court--the need for data exploration and the validation of a generalised additive mixed model.

    PubMed

    Baqué, Michèle; Amendt, Jens

    2013-01-01

    Developmental data of juvenile blow flies (Diptera: Calliphoridae) are typically used to calculate the age of immature stages found on or around a corpse and thus to estimate a minimum post-mortem interval (PMI(min)). However, many of those data sets do not take into account that immature blow flies grow in a non-linear fashion. Linear models do not provide sufficiently reliable age estimates and may even lead to an erroneous determination of the PMI(min). According to the Daubert standard and the need for improvements in forensic science, new statistical tools like smoothing methods and mixed models allow the modelling of non-linear relationships and expand the field of statistical analyses. The present study introduces the background and application of these statistical techniques by analysing a model which describes the development of the forensically important blow fly Calliphora vicina at different temperatures. The comparison of three statistical methods (linear regression, generalised additive modelling and generalised additive mixed modelling) clearly demonstrates that only the latter provided regression parameters that reflect the data adequately. We focus explicitly on both the exploration of the data--to assure their quality and to show the importance of checking it carefully prior to conducting the statistical tests--and the validation of the resulting models. Hence, we present a common method for evaluating and testing forensic entomological data sets by using generalised additive mixed models for the first time.

  3. Regression modeling of ground-water flow

    USGS Publications Warehouse

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  4. Specialized data analysis of SSME and advanced propulsion system vibration measurements

    NASA Technical Reports Server (NTRS)

    Coffin, Thomas; Swanson, Wayne L.; Jong, Yen-Yi

    1993-01-01

    The basic objectives of this contract were to perform detailed analysis and evaluation of dynamic data obtained during Space Shuttle Main Engine (SSME) test and flight operations, including analytical/statistical assessment of component dynamic performance, and to continue the development and implementation of analytical/statistical models to effectively define nominal component dynamic characteristics, detect anomalous behavior, and assess machinery operational conditions. This study was to provide timely assessment of engine component operational status, identify probable causes of malfunction, and define feasible engineering solutions. The work was performed under three broad tasks: (1) Analysis, Evaluation, and Documentation of SSME Dynamic Test Results; (2) Data Base and Analytical Model Development and Application; and (3) Development and Application of Vibration Signature Analysis Techniques.

  5. Development of Turbulent Biological Closure Parameterizations

    DTIC Science & Technology

    2011-09-30

    LONG-TERM GOAL: The long-term goals of this project are: (1) to develop a theoretical framework to quantify turbulence induced NPZ interactions. (2) to apply the theory to develop parameterizations to be used in realistic environmental physical biological coupling numerical models. OBJECTIVES: Connect the Goodman and Robinson (2008) statistically based pdf theory to Advection Diffusion Reaction (ADR) modeling of NPZ interaction.

  6. How well can we predict forage species occurrence and abundance?

    USDA-ARS's Scientific Manuscript database

    As part of a larger effort focused on forage species production and management, we have been developing a statistical modeling approach to predict the probability of species occurrence and the abundance for Orchard Grass over the Northeast region of the United States using two selected statistical m...

  7. A Review of Meta-Analysis Packages in R

    ERIC Educational Resources Information Center

    Polanin, Joshua R.; Hennessy, Emily A.; Tanner-Smith, Emily E.

    2017-01-01

    Meta-analysis is a statistical technique that allows an analyst to synthesize effect sizes from multiple primary studies. To estimate meta-analysis models, the open-source statistical environment R is quickly becoming a popular choice. The meta-analytic community has contributed to this growth by developing numerous packages specific to…

  8. Applications of spatial statistical network models to stream data

    Treesearch

    Daniel J. Isaak; Erin E. Peterson; Jay M. Ver Hoef; Seth J. Wenger; Jeffrey A. Falke; Christian E. Torgersen; Colin Sowder; E. Ashley Steel; Marie-Josee Fortin; Chris E. Jordan; Aaron S. Ruesch; Nicholas Som; Pascal Monestiez

    2014-01-01

    Streams and rivers host a significant portion of Earth's biodiversity and provide important ecosystem services for human populations. Accurate information regarding the status and trends of stream resources is vital for their effective conservation and management. Most statistical techniques applied to data measured on stream networks were developed for...

  9. An empirical approach to sufficient similarity in dose-responsiveness: Utilization of statistical distance as a similarity measure.

    EPA Science Inventory

    Using statistical equivalence testing logic and mixed model theory an approach has been developed, that extends the work of Stork et al (JABES,2008), to define sufficient similarity in dose-response for chemical mixtures containing the same chemicals with different ratios ...

  10. Sampling stored product insect pests: a comparison of four statistical sampling models for probability of pest detection

    USDA-ARS's Scientific Manuscript database

    Statistically robust sampling strategies form an integral component of grain storage and handling activities throughout the world. Developing sampling strategies to target biological pests such as insects in stored grain is inherently difficult due to species biology and behavioral characteristics. ...

  11. Introduction to this special issue on statistics for wildfire processes

    Treesearch

    Marcia Gumpertz

    2009-01-01

    This special issue on statistics for wildfire processes brings together foresters, wildfire ecologists, statisticians, mathematicians, and economists. All of these disciplines bring different interests, approaches and expertise to the modeling of wildfire processes. It is not necessarily easy, however, to communicate across disciplines or follow the developments in a...

  12. New Methodology for Estimating Fuel Economy by Vehicle Class

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chin, Shih-Miao; Dabbs, Kathryn; Hwang, Ho-Ling

    2011-01-01

    Office of Highway Policy Information to develop a new methodology to generate annual estimates of average fuel efficiency and number of motor vehicles registered by vehicle class for Table VM-1 of the Highway Statistics annual publication. This paper describes the new methodology developed under this effort and compares the results of the existing manual method and the new systematic approach. The methodology developed under this study takes a two-step approach. First, the preliminary fuel efficiency rates are estimated based on vehicle stock models for different classes of vehicles. Then, a reconciliation model is used to adjust the initial fuel consumption rates from the vehicle stock models and match the VMT information for each vehicle class and the reported total fuel consumption. This reconciliation model utilizes a systematic approach that produces documentable and reproducible results. The basic framework utilizes a mathematical programming formulation to minimize the deviations between the fuel economy estimates published in the previous year's Highway Statistics and the results from the vehicle stock models, subject to the constraint that fuel consumptions for different vehicle classes must sum to the total fuel consumption estimate published in Table MF-21 of the current year Highway Statistics. The results generated from this new approach provide a smoother time series for the fuel economies by vehicle class. It also utilizes the most up-to-date and best available data with sound econometric models to generate MPG estimates by vehicle class.
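
    The reconciliation step can be sketched in a much-simplified form. The abstract does not give the actual mathematical program, so the version below is an assumption: it adjusts preliminary per-class fuel consumption so the classes sum to a reported total while minimizing a weighted sum of squared adjustments, which has a closed-form solution. The function name `reconcile`, the class names, and all numbers are hypothetical.

```python
def reconcile(preliminary, reported_total, weights=None):
    """Adjust per-class values so they sum to reported_total.

    Minimizes sum_i w_i * (x_i - p_i)**2 subject to sum_i x_i = T.
    The Lagrange conditions give x_i = p_i + gap * (1/w_i) / sum_j (1/w_j),
    where gap = T - sum_i p_i. A larger weight means a smaller adjustment.
    """
    if weights is None:
        weights = {name: 1.0 for name in preliminary}
    gap = reported_total - sum(preliminary.values())
    inverse = {name: 1.0 / weights[name] for name in preliminary}
    inverse_sum = sum(inverse.values())
    return {
        name: preliminary[name] + gap * inverse[name] / inverse_sum
        for name in preliminary
    }


# Hypothetical preliminary fuel consumption by class (arbitrary units).
preliminary = {"cars": 70.0, "light_trucks": 50.0, "heavy_trucks": 30.0}
adjusted = reconcile(preliminary, reported_total=156.0)
print(adjusted)
```

    With equal weights the gap is spread evenly across classes. The formulation described in the abstract works with fuel economy estimates and VMT by class, so it would need additional terms and constraints beyond this sketch.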

  13. Internationalizing Nussbaum's Model of Cosmopolitan Democratic Education

    ERIC Educational Resources Information Center

    Culp, Julian

    2018-01-01

    Nussbaum's moral cosmopolitanism informs her capability-based theory of justice, which she uses in order to develop a distinctive model of cosmopolitan democratic education. I characterize Nussbaum's educational model as a 'statist model,' however, because it regards cosmopolitan democratic education as necessary for realizing democratic…

  14. The application of feature selection to the development of Gaussian process models for percutaneous absorption.

    PubMed

    Lam, Lun Tak; Sun, Yi; Davey, Neil; Adams, Rod; Prapopoulou, Maria; Brown, Marc B; Moss, Gary P

    2010-06-01

    The aim was to employ Gaussian processes to assess mathematically the nature of a skin permeability dataset and to employ these methods, particularly feature selection, to determine the key physicochemical descriptors which exert the most significant influence on percutaneous absorption, and to compare such models with established existing models. Gaussian processes, including automatic relevance detection (GPRARD) methods, were employed to develop models of percutaneous absorption that identified key physicochemical descriptors of percutaneous absorption. Using MatLab software, the statistical performance of these models was compared with single linear networks (SLN) and quantitative structure-permeability relationships (QSPRs). Feature selection methods were used to examine in more detail the physicochemical parameters used in this study. A range of statistical measures to determine model quality were used. The inherently nonlinear nature of the skin data set was confirmed. The Gaussian process regression (GPR) methods yielded predictive models that offered statistically significant improvements over SLN and QSPR models with regard to predictivity (where the rank order was: GPR > SLN > QSPR). Feature selection analysis determined that the best GPR models were those that contained log P, melting point and the number of hydrogen bond donor groups as significant descriptors. Further statistical analysis also found that great synergy existed between certain parameters. It suggested that a number of the descriptors employed were effectively interchangeable, thus questioning the use of models where discrete variables are output, usually in the form of an equation. The use of a nonlinear GPR method produced models with significantly improved predictivity, compared with SLN or QSPR models. Feature selection methods were able to provide important mechanistic information. 
However, it was also shown that significant synergy existed between certain parameters, and as such it was possible to interchange certain descriptors (i.e. molecular weight and melting point) without incurring a loss of model quality. Such synergy suggested that a model constructed from discrete terms in an equation may not be the most appropriate way of representing mechanistic understandings of skin absorption.

  15. Stochastic Analysis and Design of Heterogeneous Microstructural Materials System

    NASA Astrophysics Data System (ADS)

    Xu, Hongyi

    Advanced materials systems are new materials that comprise multiple traditional constituents but have complex microstructure morphologies, which lead to superior properties over conventional materials. To accelerate the development of new advanced materials systems, the objective of this dissertation is to develop a computational design framework and the associated techniques for design automation of microstructure materials systems, with an emphasis on addressing the uncertainties associated with the heterogeneity of microstructural materials. Five key research tasks are identified: design representation, design evaluation, design synthesis, material informatics and uncertainty quantification. Design representation of microstructure includes statistical characterization and stochastic reconstruction. This dissertation develops a new descriptor-based methodology, which characterizes 2D microstructures using descriptors of composition, dispersion and geometry. Statistics of 3D descriptors are predicted based on 2D information to enable 2D-to-3D reconstruction. An efficient sequential reconstruction algorithm is developed to reconstruct statistically equivalent random 3D digital microstructures. In design evaluation, a stochastic decomposition and reassembly strategy is developed to deal with the high computational costs and uncertainties induced by material heterogeneity. The properties of Representative Volume Elements (RVE) are predicted by stochastically reassembling SVE elements with stochastic properties into a coarse representation of the RVE. In design synthesis, a new descriptor-based design framework is developed, which integrates computational methods of microstructure characterization and reconstruction, sensitivity analysis, Design of Experiments (DOE), metamodeling and optimization to enable parametric optimization of the microstructure for achieving the desired material properties. 
Material informatics is studied to efficiently reduce the dimension of microstructure design space. This dissertation develops a machine learning-based methodology to identify the key microstructure descriptors that highly impact properties of interest. In uncertainty quantification, a comparative study on data-driven random process models is conducted to provide guidance for choosing the most accurate model in statistical uncertainty quantification. Two new goodness-of-fit metrics are developed to provide quantitative measurements of random process models' accuracy. The benefits of the proposed methods are demonstrated by the example of designing the microstructure of polymer nanocomposites. This dissertation provides material-generic, intelligent modeling/design methodologies and techniques to accelerate the process of analyzing and designing new microstructural materials system.

  16. A Statistical Weather-Driven Streamflow Model: Enabling future flow predictions in data-scarce headwater streams

    NASA Astrophysics Data System (ADS)

    Rosner, A.; Letcher, B. H.; Vogel, R. M.

    2014-12-01

    Predicting streamflow in headwaters and over a broad spatial scale poses unique challenges due to limited data availability. Flow observation gages for headwater streams are less common than for larger rivers, and gages with record lengths of ten years or more are even more scarce. Thus, there is a great need for estimating streamflows in ungaged or sparsely-gaged headwaters. Further, there is often insufficient basin information to develop rainfall-runoff models that could be used to predict future flows under various climate scenarios. Headwaters in the northeastern U.S. are of particular concern to aquatic biologists, as these streams serve as essential habitat for native coldwater fish. In order to understand fish response to past or future environmental drivers, estimates of seasonal streamflow are needed. While there is limited flow data, there is a wealth of data for historic weather conditions. Observed data has been modeled to interpolate a spatially continuous historic weather dataset (Mauer et al. 2002). We present a statistical model developed by pairing streamflow observations with precipitation and temperature information for the same and preceding time-steps. We demonstrate this model's use to predict flow metrics at the seasonal time-step. While not a physical model, this statistical model represents the weather drivers. Since this model can predict flows not directly tied to reference gages, we can generate flow estimates for historic as well as potential future conditions.

  17. Biophysical model for assessment of risk of acute exposures in combination with low level chronic irradiation

    NASA Astrophysics Data System (ADS)

    Smirnova, O. A.

    A biophysical model is developed which describes the mortality dynamics in mammalian populations unexposed and exposed to radiation. The model relates statistical biometric functions (mortality rate, life span probability density, and life span probability) to statistical characteristics and dynamics of a critical body system in the individuals composing the population. The model describing the dynamics of thrombocytopoiesis in nonirradiated and irradiated mammals is also developed, this hematopoietic line being considered as the critical body system under the exposures in question. The mortality model constructed in the framework of the proposed approach was identified to reproduce the irradiation effects on populations of mice. Most parameters of the thrombocytopoiesis model were determined from the data available in the literature on hematology and radiobiology; the remaining parameters were evaluated by fitting some experimental data on the dynamics of this system in acutely irradiated mice. The successful verification of the thrombocytopoiesis model was carried out by quantitative comparison of the modeling predictions with experimental data on the dynamics of this system in mice exposed to either acute or chronic irradiation at wide ranges of doses and dose rates. It is important that only experimental data on the mortality rate in a nonirradiated population and the relevant statistical characteristics of the thrombocytopoiesis system in mice, which are also available in the literature on radiobiology, are needed for the final identification of

  18. Discriminative Random Field Models for Subsurface Contamination Uncertainty Quantification

    NASA Astrophysics Data System (ADS)

    Arshadi, M.; Abriola, L. M.; Miller, E. L.; De Paolis Kaluza, C.

    2017-12-01

    Application of flow and transport simulators for prediction of the release, entrapment, and persistence of dense non-aqueous phase liquids (DNAPLs) and associated contaminant plumes is a computationally intensive process that requires specification of a large number of material properties and hydrologic/chemical parameters. Given its computational burden, this direct simulation approach is particularly ill-suited for quantifying both the expected performance and uncertainty associated with candidate remediation strategies under real field conditions. Prediction uncertainties primarily arise from limited information about contaminant mass distributions, as well as the spatial distribution of subsurface hydrologic properties. Application of direct simulation to quantify uncertainty would, thus, typically require simulating multiphase flow and transport for a large number of permeability and release scenarios to collect statistics associated with remedial effectiveness, a computationally prohibitive process. The primary objective of this work is to develop and demonstrate a methodology that employs measured field data to produce equi-probable stochastic representations of a subsurface source zone that capture the spatial distribution and uncertainty associated with key features that control remediation performance (i.e., permeability and contamination mass). Here we employ probabilistic models known as discriminative random fields (DRFs) to synthesize stochastic realizations of initial mass distributions consistent with known, and typically limited, site characterization data. Using a limited number of full scale simulations as training data, a statistical model is developed for predicting the distribution of contaminant mass (e.g., DNAPL saturation and aqueous concentration) across a heterogeneous domain. 
Monte-Carlo sampling methods are then employed, in conjunction with the trained statistical model, to generate realizations conditioned on measured borehole data. Performance of the statistical model is illustrated through comparisons of generated realizations with the 'true' numerical simulations. Finally, we demonstrate how these realizations can be used to determine statistically optimal locations for further interrogation of the subsurface.

  19. Importance of regional variation in conservation planning: A rangewide example of the Greater Sage-Grouse

    USGS Publications Warehouse

    Doherty, Kevin E.; Evans, Jeffrey S.; Coates, Peter S.; Juliusson, Lara; Fedy, Bradley C.

    2016-01-01

    We developed rangewide population and habitat models for Greater Sage-Grouse (Centrocercus urophasianus) that account for regional variation in habitat selection and relative densities of birds for use in conservation planning and risk assessments. We developed a probabilistic model of occupied breeding habitat by statistically linking habitat characteristics within 4 miles of an occupied lek using a nonlinear machine learning technique (Random Forests). Habitat characteristics used were quantified in GIS and represent standard abiotic and biotic variables related to sage-grouse biology. Statistical model fit was high (mean correctly classified = 82.0%, range = 75.4–88.0%) as were cross-validation statistics (mean = 80.9%, range = 75.1–85.8%). We also developed a spatially explicit model to quantify the relative density of breeding birds across each Greater Sage-Grouse management zone. The models demonstrate distinct clustering of relative abundance of sage-grouse populations across all management zones. On average, approximately half of the breeding population is predicted to be within 10% of the occupied range. We also found that 80% of sage-grouse populations were contained in 25–34% of the occupied range within each management zone. Our rangewide population and habitat models account for regional variation in habitat selection and the relative densities of birds, and thus, they can serve as a consistent and common currency to assess how sage-grouse habitat and populations overlap with conservation actions or threats over the entire sage-grouse range. We also quantified differences in functional habitat responses and disturbance thresholds across the Western Association of Fish and Wildlife Agencies (WAFWA) management zones using statistical relationships identified during habitat modeling. 
Even for a species as specialized as Greater Sage-Grouse, our results show that ecological context matters in both the strength of habitat selection (i.e., functional response curves) and response to disturbance.

  20. A Selective Overview of Variable Selection in High Dimensional Feature Space

    PubMed Central

    Fan, Jianqing

    2010-01-01

    High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. What limits of the dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties are rapidly drive the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods. PMID:21572976

  1. A data-based conservation planning tool for Florida panthers

    USGS Publications Warehouse

    Murrow, Jennifer L.; Thatcher, Cindy A.; Van Manen, Frank T.; Clark, Joseph D.

    2013-01-01

    Habitat loss and fragmentation are the greatest threats to the endangered Florida panther (Puma concolor coryi). We developed a data-based habitat model and user-friendly interface so that land managers can objectively evaluate Florida panther habitat. We used a geographic information system (GIS) and the Mahalanobis distance statistic (D²) to develop a model based on broad-scale landscape characteristics associated with panther home ranges. Variables in our model were Euclidean distance to natural land cover, road density, distance to major roads, human density, amount of natural land cover, amount of semi-natural land cover, amount of permanent or semi-permanent flooded area–open water, and a cost–distance variable. We then developed a Florida Panther Habitat Estimator tool, which automates and replicates the GIS processes used to apply the statistical habitat model. The estimator can be used by persons with moderate GIS skills to quantify effects of land-use changes on panther habitat at local and landscape scales. Example applications of the tool are presented.
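
    The Mahalanobis distance statistic measures how far a location's landscape values lie from the mean of a reference set, scaled by the covariance of that set: the squared distance is (x - m)' S^-1 (x - m), where m is the mean vector and S the covariance matrix. The sketch below is a minimal pure-Python illustration and not the authors' GIS workflow; the two variables (road density, fraction of natural land cover) and all numbers are hypothetical.

```python
def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows, mean):
    """Sample covariance matrix (divides by n - 1)."""
    n, k = len(rows), len(mean)
    return [
        [sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis_d2(point, mean, cov):
    """Squared Mahalanobis distance of point from mean under cov."""
    diff = [p - mu for p, mu in zip(point, mean)]
    weighted = solve(cov, diff)  # equals cov^-1 * diff
    return sum(d * w for d, w in zip(diff, weighted))


# Hypothetical reference cells: (road density, natural cover fraction).
reference = [[0.2, 0.9], [0.4, 0.6], [0.3, 0.8], [0.6, 0.7], [0.1, 0.5]]
mu = mean_vector(reference)
cov = covariance_matrix(reference, mu)

candidate = [0.9, 0.2]  # a cell with dense roads and little natural cover
print(mahalanobis_d2(candidate, mu, cov))
```

    Smaller values indicate conditions closer to those of the reference set, so a habitat map could be produced by computing this statistic for every cell.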

  2. The Assessment of Climatological Impacts on Agricultural Production and Residential Energy Demand

    NASA Astrophysics Data System (ADS)

    Cooter, Ellen Jean

    The assessment of climatological impacts on selected economic activities is presented as a multi-step, interdisciplinary problem. The assessment process which is addressed explicitly in this report focuses on (1) user identification, (2) direct impact model selection, (3) methodological development, (4) product development and (5) product communication. Two user groups of major economic importance were selected for study: agriculture and gas utilities. The broad agricultural sector is further defined as U.S.A. corn production. The general category of utilities is narrowed to Oklahoma residential gas heating demand. The CERES physiological growth model was selected as the process model for corn production. The statistical analysis for corn production suggests that (1) although this is a statistically complex model, it can yield useful impact information, (2) as a result of output distributional biases, traditional statistical techniques are not adequate analytical tools, (3) the model yield distribution as a whole is probably non-Gaussian, particularly in the tails and (4) there appear to be identifiable weekly patterns of forecasted yields throughout the growing season. Agricultural quantities developed include point yield impact estimates and distributional characteristics, geographic corn weather distributions, return period estimates, decision making criteria (confidence limits) and time series of indices. These products were communicated in economic terms through the use of a Bayesian decision example and an econometric model. The NBSLD energy load model was selected to represent residential gas heating consumption. A cursory statistical analysis suggests relationships among weather variables across the Oklahoma study sites. No linear trend in "technology-free" modeled energy demand or input weather variables which would correspond to that contained in observed state-level residential energy use was detected. 
It is suggested that this trend is largely the result of non-weather factors such as population and home usage patterns rather than regional climate change. Year-to-year changes in modeled residential heating demand on the order of 10^6 Btu per household were determined and later related to state-level components of the Oklahoma economy. Products developed include the definition of regional forecast areas, likelihood estimates of extreme seasonal conditions and an energy/climate index. This information is communicated in economic terms through an input/output model which is used to estimate changes in Gross State Product and Household income attributable to weather variability.

  3. A stochastic fractional dynamics model of space-time variability of rain

    NASA Astrophysics Data System (ADS)

    Kundu, Prasun K.; Travis, James E.

    2013-09-01

    Rainfall varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate, which allows a concise description of the second moment statistics of rain at any prescribed space-time averaging scale. The model is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain but strongly constrains the predicted statistical behavior as a function of the averaging length and time scales. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and on the Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to fit the second moment statistics of radar data at the smaller spatiotemporal scales. The model predictions are then found to fit the second moment statistics of the gauge data reasonably well at these scales without any further adjustment.

  4. DEVELOPING MEANINGFUL COHORTS FOR HUMAN EXPOSURE MODELS

    EPA Science Inventory

    This paper summarizes numerous statistical analyses focused on the U.S. Environmental Protection Agency's Consolidated Human Activity Database (CHAD), used by many exposure modelers as the basis for data on what people do and where they spend their time. In doing so, modelers ...

  5. DEVELOPMENT OF THE VIRTUAL BEACH MODEL, PHASE 1: AN EMPIRICAL MODEL

    EPA Science Inventory

    With increasing attention focused on the use of multiple linear regression (MLR) modeling of beach fecal bacteria concentration, the validity of the entire statistical process should be carefully evaluated to assure satisfactory predictions. This work aims to identify pitfalls an...

  6. Demographic Accounting and Model-Building. Education and Development Technical Reports.

    ERIC Educational Resources Information Center

    Stone, Richard

    This report describes and develops a model for coordinating a variety of demographic and social statistics within a single framework. The framework proposed, together with its associated methods of analysis, serves both general and specific functions. The general aim of these functions is to give numerical definition to the pattern of society and…

  7. A new test statistic for climate models that includes field and spatial dependencies using Gaussian Markov random fields

    DOE PAGES

    Nosedal-Sanchez, Alvaro; Jackson, Charles S.; Huerta, Gabriel

    2016-07-20

    A new test statistic for climate model evaluation has been developed that potentially mitigates some of the limitations that exist for observing and representing field and space dependencies of climate phenomena. Traditionally such dependencies have been ignored when climate models have been evaluated against observational data, which makes it difficult to assess whether any given model is simulating observed climate for the right reasons. The new statistic uses Gaussian Markov random fields for estimating field and space dependencies within a first-order grid point neighborhood structure. We illustrate the ability of Gaussian Markov random fields to represent empirical estimates of field and space covariances using "witch hat" graphs. We further use the new statistic to evaluate the tropical response of a climate model (CAM3.1) to changes in two parameters important to its representation of cloud and precipitation physics. Overall, the inclusion of dependency information did not alter significantly the recognition of those regions of parameter space that best approximated observations. However, there were some qualitative differences in the shape of the response surface that suggest how such a measure could affect estimates of model uncertainty.

  8. A new test statistic for climate models that includes field and spatial dependencies using Gaussian Markov random fields

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nosedal-Sanchez, Alvaro; Jackson, Charles S.; Huerta, Gabriel

    A new test statistic for climate model evaluation has been developed that potentially mitigates some of the limitations that exist for observing and representing field and space dependencies of climate phenomena. Traditionally such dependencies have been ignored when climate models have been evaluated against observational data, which makes it difficult to assess whether any given model is simulating observed climate for the right reasons. The new statistic uses Gaussian Markov random fields for estimating field and space dependencies within a first-order grid point neighborhood structure. We illustrate the ability of Gaussian Markov random fields to represent empirical estimates of field and space covariances using "witch hat" graphs. We further use the new statistic to evaluate the tropical response of a climate model (CAM3.1) to changes in two parameters important to its representation of cloud and precipitation physics. Overall, the inclusion of dependency information did not alter significantly the recognition of those regions of parameter space that best approximated observations. However, there were some qualitative differences in the shape of the response surface that suggest how such a measure could affect estimates of model uncertainty.

  9. Nonlinear wave chaos: statistics of second harmonic fields.

    PubMed

    Zhou, Min; Ott, Edward; Antonsen, Thomas M; Anlage, Steven M

    2017-10-01

    Concepts from the field of wave chaos have been shown to successfully predict the statistical properties of linear electromagnetic fields in electrically large enclosures. The Random Coupling Model (RCM) describes these properties by incorporating both universal features described by Random Matrix Theory and the system-specific features of particular system realizations. In an effort to extend this approach to the nonlinear domain, we add an active nonlinear frequency-doubling circuit to an otherwise linear wave chaotic system, and we measure the statistical properties of the resulting second harmonic fields. We develop an RCM-based model of this system as two linear chaotic cavities coupled by means of a nonlinear transfer function. The harmonic field strengths are predicted to be the product of two statistical quantities and the nonlinearity characteristics. Statistical results from measurement-based calculation, RCM-based simulation, and direct experimental measurements are compared and show good agreement over many decades of power.

  10. Specifying and Refining a Complex Measurement Model.

    ERIC Educational Resources Information Center

    Levy, Roy; Mislevy, Robert J.

    This paper aims to describe a Bayesian approach to modeling and estimating cognitive models both in terms of statistical machinery and actual instrument development. Such a method taps the knowledge of experts to provide initial estimates for the probabilistic relationships among the variables in a multivariate latent variable model and refines…

  11. Routine Discovery of Complex Genetic Models using Genetic Algorithms

    PubMed Central

    Moore, Jason H.; Hahn, Lance W.; Ritchie, Marylyn D.; Thornton, Tricia A.; White, Bill C.

    2010-01-01

    Simulation studies are useful in various disciplines for a number of reasons including the development and evaluation of new computational and statistical methods. This is particularly true in human genetics and genetic epidemiology where new analytical methods are needed for the detection and characterization of disease susceptibility genes whose effects are complex, nonlinear, and partially or solely dependent on the effects of other genes (i.e. epistasis or gene-gene interaction). Despite this need, the development of complex genetic models that can be used to simulate data is not always intuitive. In fact, only a few such models have been published. We have previously developed a genetic algorithm approach to discovering complex genetic models in which two single nucleotide polymorphisms (SNPs) influence disease risk solely through nonlinear interactions. In this paper, we extend this approach for the discovery of high-order epistasis models involving three to five SNPs. We demonstrate that the genetic algorithm is capable of routinely discovering interesting high-order epistasis models in which each SNP influences risk of disease only through interactions with the other SNPs in the model. This study opens the door for routine simulation of complex gene-gene interactions among SNPs for the development and evaluation of new statistical and computational approaches for identifying common, complex multifactorial disease susceptibility genes. PMID:20948983

  12. Linking statistically-and physically-based models for improved streamflow simulation in gaged and ungaged watersheds

    Treesearch

    Jacob LaFontaine; Lauren Hay; Stacey Archfield; William Farmer; Julie Kiang

    2016-01-01

    The U.S. Geological Survey (USGS) has developed a National Hydrologic Model (NHM) to support coordinated, comprehensive and consistent hydrologic model development, and facilitate the application of hydrologic simulations within the continental US. The portion of the NHM located within the Gulf Coastal Plains and Ozarks Landscape Conservation Cooperative (GCPO LCC) is...

  13. Modeled Neutron and Charged-Particle Induced Nuclear Reaction Cross Sections for Radiochemistry in the Region of Yttrium, Zirconium, Niobium, and Molybdenum

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoffman, R D; Kelley, K; Dietrich, F S

    2006-06-13

    We have developed a set of modeled nuclear reaction cross sections for use in radiochemical diagnostics. Systematics for the input parameters required by the Hauser-Feshbach statistical model were developed and used to calculate neutron, proton, and deuteron induced nuclear reaction cross sections for targets ranging from strontium (Z = 38) to rhodium (Z = 45).

  14. A statistical rain attenuation prediction model with application to the advanced communication technology satellite project. 1: Theoretical development and application to yearly predictions for selected cities in the United States

    NASA Technical Reports Server (NTRS)

    Manning, Robert M.

    1986-01-01

    A rain attenuation prediction model is described for use in calculating satellite communication link availability for any specific location in the world that is characterized by an extended record of rainfall. Such a formalism is necessary for the accurate assessment of such availability predictions in the case of the small user-terminal concept of the Advanced Communication Technology Satellite (ACTS) Project. The model employs the theory of extreme value statistics to generate the necessary statistical rainrate parameters from rain data in the form compiled by the National Weather Service. These location dependent rain statistics are then applied to a rain attenuation model to obtain a yearly prediction of the occurrence of attenuation on any satellite link at that location. The predictions of this model are compared to those of the Crane Two-Component Rain Model and some empirical data and found to be very good. The model is then used to calculate rain attenuation statistics at 59 locations in the United States (including Alaska and Hawaii) for the 20 GHz downlinks and 30 GHz uplinks of the proposed ACTS system. The flexibility of this modeling formalism is such that it allows a complete and unified treatment of the temporal aspects of rain attenuation that leads to the design of an optimum stochastic power control algorithm, the purpose of which is to efficiently counter such rain fades on a satellite link.

  15. Rainfall Downscaling Conditional on Upper-air Atmospheric Predictors: Improved Assessment of Rainfall Statistics in a Changing Climate

    NASA Astrophysics Data System (ADS)

    Langousis, Andreas; Mamalakis, Antonis; Deidda, Roberto; Marrocu, Marino

    2015-04-01

    To improve the skill of Global Climate Models (GCMs) and Regional Climate Models (RCMs) in reproducing the statistics of rainfall at a basin level and at hydrologically relevant temporal scales (e.g. daily), two types of statistical approaches have been suggested. One is the statistical correction of climate model rainfall outputs using historical series of precipitation. The other is the use of stochastic models of rainfall to conditionally simulate precipitation series, based on large-scale atmospheric predictors produced by climate models (e.g. geopotential height, relative vorticity, divergence, mean sea level pressure). The latter approach, usually referred to as statistical rainfall downscaling, aims at reproducing the statistical character of rainfall, while accounting for the effects of large-scale atmospheric circulation (and, therefore, climate forcing) on rainfall statistics. While promising, statistical rainfall downscaling has not attracted much attention in recent years, since the suggested approaches involved complex (i.e. subjective or computationally intense) identification procedures of the local weather, in addition to demonstrating limited success in reproducing several statistical features of rainfall, such as seasonal variations, the distributions of dry and wet spell lengths, the distribution of the mean rainfall intensity inside wet periods, and the distribution of rainfall extremes. In an effort to remedy those shortcomings, Langousis and Kaleris (2014) developed a statistical framework for simulation of daily rainfall intensities conditional on upper air variables, which accurately reproduces the statistical character of rainfall at multiple time-scales.
Here, we study the relative performance of: a) quantile-quantile (Q-Q) correction of climate model rainfall products, and b) the statistical downscaling scheme of Langousis and Kaleris (2014), in reproducing the statistical structure of rainfall, as well as rainfall extremes, at a regional level. This is done for an intermediate-sized catchment in Italy, i.e. the Flumendosa catchment, using climate model rainfall and atmospheric data from the ENSEMBLES project (http://ensembleseu.metoffice.com). In doing so, we split the historical rainfall record of mean areal precipitation (MAP) in 15-year calibration and 45-year validation periods, and compare the historical rainfall statistics to those obtained from: a) Q-Q corrected climate model rainfall products, and b) synthetic rainfall series generated by the suggested downscaling scheme. To our knowledge, this is the first time that climate model rainfall and statistically downscaled precipitation are compared to catchment-averaged MAP at a daily resolution. The obtained results are promising, since the proposed downscaling scheme is more accurate and robust in reproducing a number of historical rainfall statistics, independent of the climate model used and the length of the calibration period. This is particularly the case for the yearly rainfall maxima, where direct statistical correction of climate model rainfall outputs shows increased sensitivity to the length of the calibration period and the climate model used. The robustness of the suggested downscaling scheme in modeling rainfall extremes at a daily resolution, is a notable feature that can effectively be used to assess hydrologic risk at a regional level under changing climatic conditions. 
Acknowledgments The research project is implemented within the framework of the Action «Supporting Postdoctoral Researchers» of the Operational Program "Education and Lifelong Learning" (Action's Beneficiary: General Secretariat for Research and Technology), and is co-financed by the European Social Fund (ESF) and the Greek State. CRS4 highly acknowledges the contribution of the Sardinian regional authorities.
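
The quantile-quantile (Q-Q) correction named in the abstract above can be illustrated with a generic empirical quantile mapping. This is only a minimal sketch and not the procedure used in the study: the function names (`empirical_cdf`, `quantile`, `qq_correct`) and the toy numbers are hypothetical, and the mapping simply sends each model value to the observed value at the same empirical non-exceedance probability in the calibration period.

```python
from bisect import bisect_right


def empirical_cdf(sorted_sample, x):
    """Fraction of calibration values that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def quantile(sorted_sample, p):
    """Linearly interpolated quantile of a sorted sample for p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    pos = p * (len(sorted_sample) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_sample) - 1)
    frac = pos - lo
    return sorted_sample[lo] * (1.0 - frac) + sorted_sample[hi] * frac


def qq_correct(model_calib, obs_calib, model_values):
    """Map model values onto the observed distribution (hypothetical helper).

    model_calib / obs_calib: model and observed rainfall over the same
    calibration period; model_values: model output to be corrected.
    """
    model_sorted = sorted(model_calib)
    obs_sorted = sorted(obs_calib)
    return [
        quantile(obs_sorted, empirical_cdf(model_sorted, v))
        for v in model_values
    ]


# Toy data (invented for illustration): the model underestimates by a factor of 2.
model_calib = [0.0, 1.0, 2.0, 3.0, 4.0]
obs_calib = [0.0, 2.0, 4.0, 6.0, 8.0]
corrected = qq_correct(model_calib, obs_calib, [-1.0, 1.5, 4.0])
```

Because the mapping is built from the calibration sample alone, values beyond the calibration range are clamped to the observed minimum or maximum, which is one reason such corrections can be sensitive to the length of the calibration period.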

  16. Development and validation of a risk calculator predicting exercise-induced ventricular arrhythmia in patients with cardiovascular disease.

    PubMed

    Hermes, Ilarraza-Lomelí; Marianna, García-Saldivia; Jessica, Rojano-Castillo; Carlos, Barrera-Ramírez; Rafael, Chávez-Domínguez; María Dolores, Rius-Suárez; Pedro, Iturralde

    2016-10-01

    Mortality due to cardiovascular disease is often associated with ventricular arrhythmias. Nowadays, patients with cardiovascular disease are more encouraged to take part in physical training programs. Nevertheless, high-intensity exercise is associated with a higher risk of sudden death, even in apparently healthy people. During exercise testing (ET), health care professionals provide patients, in a controlled scenario, an intense physiological stimulus that could precipitate cardiac arrhythmia in high-risk individuals. There is still no clinical or statistical tool to predict this incidence. The aim of this study was to develop a statistical model to predict the incidence of exercise-induced potentially life-threatening ventricular arrhythmia (PLVA) during high-intensity exercise. 6415 patients underwent a symptom-limited ET with a Balke ramp protocol. A multivariate logistic regression model where the primary outcome was PLVA was performed. Incidence of PLVA was 548 cases (8.5%). After a bivariate model, thirty-one clinical or ergometric variables were statistically associated with PLVA and were included in the regression model. In the multivariate model, 13 of these variables were found to be statistically significant. A regression model (G) with a χ² of 283.987 and p<0.001 was constructed. Significant variables included: heart failure, antiarrhythmic drugs, myocardial lower-VD, age and use of digoxin, nitrates, among others. This study allows clinicians to identify patients at risk of ventricular tachycardia or couplets during exercise, and to take preventive measures or appropriate supervision. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  17. Macro scale models for freight railroad terminals.

    DOT National Transportation Integrated Search

    2016-03-02

    The project has developed a yard capacity model for macro-level analysis. The study considers the detailed sequence and scheduling in classification yards and their impacts on yard capacities simulate typical freight railroad terminals, and statistic...

  18. The use of analysis of variance procedures in biological studies

    USGS Publications Warehouse

    Williams, B.K.

    1987-01-01

    The analysis of variance (ANOVA) is widely used in biological studies, yet there remains considerable confusion among researchers about the interpretation of hypotheses being tested. Ambiguities arise when statistical designs are unbalanced, and in particular when not all combinations of design factors are represented in the data. This paper clarifies the relationship among hypothesis testing, statistical modelling and computing procedures in ANOVA for unbalanced data. A simple two-factor fixed effects design is used to illustrate three common parametrizations for ANOVA models, and some associations among these parametrizations are developed. Biologically meaningful hypotheses for main effects and interactions are given in terms of each parametrization, and procedures for testing the hypotheses are described. The standard statistical computing procedures in ANOVA are given along with their corresponding hypotheses. Throughout the development unbalanced designs are assumed and attention is given to problems that arise with missing cells.

  19. Computational methods to extract meaning from text and advance theories of human cognition.

    PubMed

    McNamara, Danielle S

    2011-01-01

    Over the past two decades, researchers have made great advances in the area of computational methods for extracting meaning from text. This research has to a large extent been spurred by the development of latent semantic analysis (LSA), a method for extracting and representing the meaning of words using statistical computations applied to large corpora of text. Since the advent of LSA, researchers have developed and tested alternative statistical methods designed to detect and analyze meaning in text corpora. This research exemplifies how statistical models of semantics play an important role in our understanding of cognition and contribute to the field of cognitive science. Importantly, these models afford large-scale representations of human knowledge and allow researchers to explore various questions regarding knowledge, discourse processing, text comprehension, and language. This topic includes the latest progress by the leading researchers in the endeavor to go beyond LSA. Copyright © 2010 Cognitive Science Society, Inc.

  20. Rigorous force field optimization principles based on statistical distance minimization

    DOE PAGES

    Vlcek, Lukas; Chialvo, Ariel A.

    2015-10-12

    We use the concept of statistical distance to define a measure of distinguishability between a pair of statistical mechanical systems, i.e., a model and its target, and show that its minimization leads to general convergence of the model’s static measurable properties to those of the target. Here we exploit this feature to define a rigorous basis for the development of accurate and robust effective molecular force fields that are inherently compatible with coarse-grained experimental data. The new model optimization principles and their efficient implementation are illustrated through selected examples, whose outcome demonstrates the higher robustness and predictive accuracy of the approach compared to other currently used methods, such as force matching and relative entropy minimization. We also discuss relations between the newly developed principles and established thermodynamic concepts, which include the Gibbs-Bogoliubov inequality and the thermodynamic length.

  1. Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning.

    PubMed

    Li, Chuan; Sánchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego

    2016-06-17

    Fault diagnosis is important for the maintenance of rotating machinery. The detection of faults and fault patterns is a challenging part of machinery fault diagnosis. To tackle this problem, a model for deep statistical feature learning from vibration measurements of rotating machinery is presented in this paper. Vibration sensor signals collected from rotating mechanical systems are represented in the time, frequency, and time-frequency domains, each of which is then used to produce a statistical feature set. For learning statistical features, real-value Gaussian-Bernoulli restricted Boltzmann machines (GRBMs) are stacked to develop a Gaussian-Bernoulli deep Boltzmann machine (GDBM). The suggested approach is applied as a deep statistical feature learning tool for both gearbox and bearing systems. The fault classification performances in experiments using this approach are 95.17% for the gearbox, and 91.75% for the bearing system. The proposed approach is compared to such standard methods as a support vector machine, GRBM and a combination model. In experiments, the best fault classification rate was detected using the proposed model. The results show that deep learning with statistical feature extraction has an essential improvement potential for diagnosing rotating machinery faults.

  2. Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning

    PubMed Central

    Li, Chuan; Sánchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego

    2016-01-01

    Fault diagnosis is important for the maintenance of rotating machinery. The detection of faults and fault patterns is a challenging part of machinery fault diagnosis. To tackle this problem, a model for deep statistical feature learning from vibration measurements of rotating machinery is presented in this paper. Vibration sensor signals collected from rotating mechanical systems are represented in the time, frequency, and time-frequency domains, each of which is then used to produce a statistical feature set. For learning statistical features, real-value Gaussian-Bernoulli restricted Boltzmann machines (GRBMs) are stacked to develop a Gaussian-Bernoulli deep Boltzmann machine (GDBM). The suggested approach is applied as a deep statistical feature learning tool for both gearbox and bearing systems. The fault classification performances in experiments using this approach are 95.17% for the gearbox, and 91.75% for the bearing system. The proposed approach is compared to such standard methods as a support vector machine, GRBM and a combination model. In experiments, the best fault classification rate was detected using the proposed model. The results show that deep learning with statistical feature extraction has an essential improvement potential for diagnosing rotating machinery faults. PMID:27322273

  3. Use of a statistical model of the whole femur in a large scale, multi-model study of femoral neck fracture risk.

    PubMed

    Bryan, Rebecca; Nair, Prasanth B; Taylor, Mark

    2009-09-18

    Interpatient variability is often overlooked in orthopaedic computational studies due to the substantial challenges involved in sourcing and generating large numbers of bone models. A statistical model of the whole femur incorporating both geometric and material property variation was developed as a potential solution to this problem. The statistical model was constructed using principal component analysis, applied to 21 individual computed tomography scans. To test the ability of the statistical model to generate realistic, unique, finite element (FE) femur models it was used as a source of 1000 femurs to drive a study on femoral neck fracture risk. The study simulated the impact of an oblique fall to the side, a scenario known to account for a large proportion of hip fractures in the elderly and have a lower fracture load than alternative loading approaches. FE model generation, application of subject specific loading and boundary conditions, FE processing and post processing of the solutions were completed automatically. The generated models were within the bounds of the training data used to create the statistical model with a high mesh quality, able to be used directly by the FE solver without remeshing. The results indicated that 28 of the 1000 femurs were at highest risk of fracture. Closer analysis revealed the percentage of cortical bone in the proximal femur to be a crucial differentiator between the failed and non-failed groups. The likely fracture location was indicated to be intertrochanteric. Comparison to previous computational, clinical and experimental work revealed support for these findings.
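
The idea in the abstract above of building a statistical model by principal component analysis and then drawing new instances from it can be sketched as follows. This is a hedged illustration only: the random `training` array is a stand-in for 21 flattened scans, the feature count, the number of retained modes and the function names are hypothetical, and drawing Gaussian mode weights is an assumption of this sketch rather than the authors' sampling scheme.

```python
import numpy as np


def fit_pca_model(training, n_modes):
    """Fit a PCA model to a (n_subjects, n_features) array.

    Returns the mean vector, the leading principal modes (as rows)
    and the variance captured by each retained mode.
    """
    mean = training.mean(axis=0)
    centered = training - mean
    # SVD of the centered data: rows of vt are orthonormal principal modes.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values**2 / (training.shape[0] - 1)
    return mean, vt[:n_modes], variances[:n_modes]


def sample_instances(mean, modes, variances, n_samples, rng):
    """Generate new instances as mean + sum_k b_k * mode_k.

    Assumption: each weight b_k is drawn from N(0, variance_k).
    """
    weights = rng.standard_normal((n_samples, len(variances)))
    weights *= np.sqrt(variances)
    return mean + weights @ modes


rng = np.random.default_rng(0)
training = rng.normal(size=(21, 50))  # stand-in for 21 flattened scans
mean, modes, variances = fit_pca_model(training, n_modes=5)
synthetic = sample_instances(mean, modes, variances, n_samples=1000, rng=rng)
```

With only 21 training subjects there are at most 20 non-trivial modes, so the number of retained modes bounds how much variation the generated population can express.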

  4. Model Fit and Item Factor Analysis: Overfactoring, Underfactoring, and a Program to Guide Interpretation.

    PubMed

    Clark, D Angus; Bowles, Ryan P

    2018-04-23

    In exploratory item factor analysis (IFA), researchers may use model fit statistics and commonly invoked fit thresholds to help determine the dimensionality of an assessment. However, these indices and thresholds may mislead as they were developed in a confirmatory framework for models with continuous, not categorical, indicators. The present study used Monte Carlo simulation methods to investigate the ability of popular model fit statistics (chi-square, root mean square error of approximation, the comparative fit index, and the Tucker-Lewis index) and their standard cutoff values to detect the optimal number of latent dimensions underlying sets of dichotomous items. Models were fit to data generated from three-factor population structures that varied in factor loading magnitude, factor intercorrelation magnitude, number of indicators, and whether cross loadings or minor factors were included. The effectiveness of the thresholds varied across fit statistics, and was conditional on many features of the underlying model. Together, results suggest that conventional fit thresholds offer questionable utility in the context of IFA.

  5. The GenABEL Project for statistical genomics.

    PubMed

    Karssen, Lennart C; van Duijn, Cornelia M; Aulchenko, Yurii S

    2016-01-01

    Development of free/libre open source software is usually done by a community of people with an interest in the tool. For scientific software, however, this is less often the case. Most scientific software is written by only a few authors, often a student working on a thesis. Once the paper describing the tool has been published, the tool is no longer developed further and is left to its own devices. Here we describe the broad, multidisciplinary community we formed around a set of tools for statistical genomics. The GenABEL project for statistical omics actively promotes open interdisciplinary development of statistical methodology and its implementation in efficient and user-friendly software under an open source licence. The software tools developed within the project collectively make up the GenABEL suite, which currently consists of eleven tools. The open framework of the project actively encourages involvement of the community in all stages, from formulation of methodological ideas to application of software to specific data sets. A web forum is used to channel user questions and discussions, further promoting the use of the GenABEL suite. Developer discussions take place on a dedicated mailing list, and development is further supported by robust development practices including use of public version control, code review and continuous integration. Use of this open science model attracts contributions from users and developers outside the "core team", facilitating agile statistical omics methodology development and fast dissemination.

  6. Fully Bayesian tests of neutrality using genealogical summary statistics.

    PubMed

    Drummond, Alexei J; Suchard, Marc A

    2008-10-31

    Many data summary statistics have been developed to detect departures from neutral expectations of evolutionary models. However, questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions simultaneously about the mutational model and the population size model. Consequently, rejecting the null hypothesis of neutrality under these methods could result from violations of either or both assumptions, making interpretation troublesome. Here we harness posterior predictive simulation to exploit summary statistics of both the data and model parameters to test the goodness-of-fit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in non-recombining gene genealogies and we demonstrate the utility of our method on four real data sets, identifying significant departures from neutrality in human influenza A virus, even after controlling for variation in population size. Importantly, by employing a full model-based Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, thus potentially increasing sensitivity. Furthermore, our method remains useful in situations where analytical expectations and variances of summary statistics are not available. This aspect has great potential for the analysis of temporally spaced data, an expanding area previously ignored for limited availability of theory and methods.

  7. Development of a Bayesian Belief Network Runway Incursion and Excursion Model

    NASA Technical Reports Server (NTRS)

    Green, Lawrence L.

    2014-01-01

    In a previous work, a statistical analysis of runway incursion (RI) event data was conducted to ascertain the relevance of this data to the top ten Technical Challenges (TC) of the National Aeronautics and Space Administration (NASA) Aviation Safety Program (AvSP). The study revealed connections to several of the AvSP top ten TC and identified numerous primary causes and contributing factors of RI events. The statistical analysis served as the basis for developing a system-level Bayesian Belief Network (BBN) model for RI events, also previously reported. Through literature searches and data analysis, this RI event network has now been extended to also model runway excursion (RE) events. These RI and RE event networks have been further modified and vetted by a Subject Matter Expert (SME) panel. The combined system-level BBN model will allow NASA to generically model the causes of RI and RE events and to assess the effectiveness of technology products being developed under NASA funding. These products are intended to reduce the frequency of runway safety incidents/accidents, and to improve runway safety in general. The development and structure of the BBN for both RI and RE events are documented in this paper.

  8. A novel risk score model for prediction of contrast-induced nephropathy after emergent percutaneous coronary intervention.

    PubMed

    Lin, Kai-Yang; Zheng, Wei-Ping; Bei, Wei-Jie; Chen, Shi-Qun; Islam, Sheikh Mohammed Shariful; Liu, Yong; Xue, Lin; Tan, Ning; Chen, Ji-Yan

    2017-03-01

    A few studies have developed simple risk models for predicting CIN with poor prognosis after emergent PCI. The study aimed to develop and validate a novel tool for predicting the risk of contrast-induced nephropathy (CIN) in patients undergoing emergent percutaneous coronary intervention (PCI). 692 consecutive patients undergoing emergent PCI between January 2010 and December 2013 were randomly (2:1) assigned to a development dataset (n=461) and a validation dataset (n=231). Multivariate logistic regression was applied to identify independent predictors of CIN and to establish a CIN prediction model, whose prognostic accuracy was assessed using the c-statistic for discrimination and the Hosmer-Lemeshow test for calibration. The overall incidence of CIN was 55 (7.9%). A total of 11 variables were analyzed, including age >75 years, baseline serum creatinine (SCr) >1.5 mg/dl, hypotension and the use of intra-aortic balloon pump (IABP), which were identified to enter the risk score model (Chen). The incidence of CIN was 32 (6.9%) in the development dataset (low risk (score=0), 1.0%; moderate risk (score 1-2), 13.4%; high risk (score≥3), 90.0%). Compared to the classical Mehran's and ACEF CIN risk score models, the risk score (Chen) across the subgroup of the study population exhibited similar discrimination and predictive ability on CIN (c-statistic: 0.828, 0.776, 0.853, respectively), and on in-hospital mortality and 2- and 3-year mortality (c-statistic: 0.738, 0.750, 0.845, respectively) in the validation population. Our data showed that this simple risk model exhibited good discrimination and predictive ability on CIN, similar to Mehran's and ACEF score, and even on long-term mortality after emergent PCI. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
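
    The c-statistic used above to assess discrimination can be illustrated with a minimal sketch. It is not the authors' code: the function names and the toy scores and outcomes are hypothetical, and only the risk-group cut-offs (0, 1-2, ≥3) are taken from the abstract. The c-statistic is computed here as the fraction of (event, non-event) pairs in which the patient with the event has the higher score, with ties counted as one half.

```python
from itertools import product


def c_statistic(scores, outcomes):
    """Concordance (c) statistic for a risk score against binary outcomes.

    Counts, over all (event, non-event) pairs, how often the event case has
    the higher score; tied scores contribute 0.5.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


def risk_group(score):
    """Map an integer risk score to the groups named in the abstract."""
    if score == 0:
        return "low"
    if score <= 2:
        return "moderate"
    return "high"


# Hypothetical toy data (NOT from the study): six patients, two with CIN.
toy_scores = [0, 0, 1, 2, 3, 4]
toy_outcomes = [0, 0, 0, 1, 0, 1]
toy_c = c_statistic(toy_scores, toy_outcomes)
```

    A value of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination, which is the scale on which the reported values (for example 0.828) should be read.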

  9. Measurement of positive direct current corona pulse in coaxial wire-cylinder gap

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yin, Han, E-mail: hanyin1986@gmail.com; Zhang, Bo, E-mail: shizbcn@mail.tsinghua.edu.cn; He, Jinliang, E-mail: hejl@tsinghua.edu.cn

    In this paper, a system is designed and developed to measure the positive corona current in coaxial wire-cylinder gaps. The characteristic parameters of corona current pulses, such as the amplitude, rise time, half-wave time, and repetition frequency, are statistically analyzed and a new set of empirical formulas is derived by numerical fitting. The influence of space charges on corona currents is tested by using three corona cages with different radii. A numerical method is used to solve a simplified ion-flow model to explain the influence of space charges. Based on the statistical results, a stochastic model is developed to simulate the corona pulse trains. This model is verified by comparing the simulated frequency-domain responses with the measured ones.

  10. The Developing Infant Creates a Curriculum for Statistical Learning.

    PubMed

    Smith, Linda B; Jayaraman, Swapnaa; Clerkin, Elizabeth; Yu, Chen

    2018-04-01

    New efforts are using head cameras and eye-trackers worn by infants to capture everyday visual environments from the point of view of the infant learner. From this vantage point, the training sets for statistical learning develop as the sensorimotor abilities of the infant develop, yielding a series of ordered datasets for visual learning that differ in content and structure between timepoints but are highly selective at each timepoint. These changing environments may constitute a developmentally ordered curriculum that optimizes learning across many domains. Future advances in computational models will be necessary to connect the developmentally changing content and statistics of infant experience to the internal machinery that does the learning. Copyright © 2018 Elsevier Ltd. All rights reserved.

  11. SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.

    PubMed

    Chu, Annie; Cui, Jenny; Dinov, Ivo D

    2009-03-01

    The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for the most up-to-date information and newly added models.

  12. Development of LACIE CCEA-1 weather/wheat yield models. [regression analysis]

    NASA Technical Reports Server (NTRS)

    Strommen, N. D.; Sakamoto, C. M.; Leduc, S. K.; Umberger, D. E. (Principal Investigator)

    1979-01-01

    The advantages and disadvantages of the causal (phenological, dynamic, physiological), statistical regression, and analog approaches to modeling for grain yield are examined. Given LACIE's primary goal of estimating wheat production for the large areas of eight major wheat-growing regions, the statistical regression approach of correlating historical yield and climate data offered the Center for Climatic and Environmental Assessment the greatest potential return within the constraints of time and data sources. The basic equation for the first generation wheat-yield model is given. Topics discussed include truncation, trend variable, selection of weather variables, episodic events, strata selection, operational data flow, weighting, and model results.

  13. Logistic and linear regression model documentation for statistical relations between continuous real-time and discrete water-quality constituents in the Kansas River, Kansas, July 2012 through June 2015

    USGS Publications Warehouse

    Foster, Guy M.; Graham, Jennifer L.

    2016-04-06

    The Kansas River is a primary source of drinking water for about 800,000 people in northeastern Kansas. Source-water supplies are treated by a combination of chemical and physical processes to remove contaminants before distribution. Advanced notification of changing water-quality conditions and cyanobacteria and associated toxin and taste-and-odor compounds provides drinking-water treatment facilities time to develop and implement adequate treatment strategies. The U.S. Geological Survey (USGS), in cooperation with the Kansas Water Office (funded in part through the Kansas State Water Plan Fund), and the City of Lawrence, the City of Topeka, the City of Olathe, and Johnson County Water One, began a study in July 2012 to develop statistical models at two Kansas River sites located upstream from drinking-water intakes. Continuous water-quality monitors have been operated and discrete water-quality samples have been collected on the Kansas River at Wamego (USGS site number 06887500) and De Soto (USGS site number 06892350) since July 2012. Continuous and discrete water-quality data collected during July 2012 through June 2015 were used to develop statistical models for constituents of interest at the Wamego and De Soto sites. Logistic models to continuously estimate the probability of occurrence above selected thresholds were developed for cyanobacteria, microcystin, and geosmin. Linear regression models to continuously estimate constituent concentrations were developed for major ions, dissolved solids, alkalinity, nutrients (nitrogen and phosphorus species), suspended sediment, indicator bacteria (Escherichia coli, fecal coliform, and enterococci), and actinomycetes bacteria. These models will be used to provide real-time estimates of the probability that cyanobacteria and associated compounds exceed thresholds and of the concentrations of other water-quality constituents in the Kansas River. The models documented in this report are useful for characterizing changes in water-quality conditions through time, characterizing potentially harmful cyanobacterial events, and indicating changes in water-quality conditions that may affect drinking-water treatment processes.

  14. Analysis of the dependence of extreme rainfalls

    NASA Astrophysics Data System (ADS)

    Padoan, Simone; Ancey, Christophe; Parlange, Marc

    2010-05-01

    The aim of spatial analysis is to quantitatively describe the behavior of environmental phenomena such as precipitation levels, wind speed or daily temperatures. A number of generic approaches to spatial modeling have been developed [1], but these are not necessarily ideal for handling extremal aspects given their focus on mean process levels. The areal modelling of the extremes of a natural process observed at points in space is important in environmental statistics; for example, understanding extremal spatial rainfall is crucial in flood protection. In light of recent concerns over climate change, the use of robust mathematical and statistical methods for such analyses has grown in importance. Multivariate extreme value models and the class of max-stable processes [2] have a similar asymptotic motivation to the univariate Generalized Extreme Value (GEV) distribution, but provide a general approach to modeling extreme processes incorporating temporal or spatial dependence. Statistical methods for max-stable processes and data analyses of practical problems are discussed by [3] and [4]. This work illustrates methods for the statistical modelling of spatial extremes and gives examples of their use by means of a real extremal data analysis of Switzerland precipitation levels. [1] Cressie, N. A. C. (1993). Statistics for Spatial Data. Wiley, New York. [2] de Haan, L. and Ferreira, A. (2006). Extreme Value Theory: An Introduction. Springer, USA. [3] Padoan, S. A., Ribatet, M. and Sisson, S. A. (2009). Likelihood-Based Inference for Max-Stable Processes. Journal of the American Statistical Association, Theory & Methods. In press. [4] Davison, A. C. and Gholamrezaee, M. (2009). Geostatistics of extremes. Journal of the Royal Statistical Society, Series B. To appear.
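
    For orientation, the univariate GEV distribution mentioned above is usually written as follows. This is the standard textbook form, not a formula quoted from the abstract; the symbols \mu (location), \sigma > 0 (scale) and \xi (shape) are the conventional parameter names and are an assumption here.

```latex
G(z) = \exp\left\{ -\left[ 1 + \xi \, \frac{z - \mu}{\sigma} \right]_{+}^{-1/\xi} \right\},
\qquad [a]_{+} = \max(a, 0),
```

    The case \xi = 0 is read as the limit G(z) = \exp\{-\exp[-(z - \mu)/\sigma]\}, the Gumbel distribution.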

  15. Modelling of peak temperature during friction stir processing of magnesium alloy AZ91

    NASA Astrophysics Data System (ADS)

    Vaira Vignesh, R.; Padmanaban, R.

    2018-02-01

    Friction stir processing (FSP) is a solid state processing technique with potential to modify the properties of the material through microstructural modification. The study of heat transfer in FSP aids in the identification of defects like flash, inadequate heat input, poor material flow and mixing etc. In this paper, transient temperature distribution during FSP of magnesium alloy AZ91 was simulated using finite element modelling. The numerical model results were validated using the experimental results from the published literature. The model was used to predict the peak temperature obtained during FSP for various process parameter combinations. The simulated peak temperature results were used to develop a statistical model. The effect of process parameters namely tool rotation speed, tool traverse speed and shoulder diameter of the tool on the peak temperature was investigated using the developed statistical model. It was found that peak temperature was directly proportional to tool rotation speed and shoulder diameter and inversely proportional to tool traverse speed.

  16. A diagnostic model for chronic hypersensitivity pneumonitis

    PubMed Central

    Johannson, Kerri A; Elicker, Brett M; Vittinghoff, Eric; Assayag, Deborah; de Boer, Kaïssa; Golden, Jeffrey A; Jones, Kirk D; King, Talmadge E; Koth, Laura L; Lee, Joyce S; Ley, Brett; Wolters, Paul J; Collard, Harold R

    2017-01-01

    The objective of this study was to develop a diagnostic model that allows for a highly specific diagnosis of chronic hypersensitivity pneumonitis using clinical and radiological variables alone. Chronic hypersensitivity pneumonitis and other interstitial lung disease cases were retrospectively identified from a longitudinal database. High-resolution CT scans were blindly scored for radiographic features (eg, ground-glass opacity, mosaic perfusion) as well as the radiologist’s diagnostic impression. Candidate models were developed then evaluated using clinical and radiographic variables and assessed by the cross-validated C-statistic. Forty-four chronic hypersensitivity pneumonitis and eighty other interstitial lung disease cases were identified. Two models were selected based on their statistical performance, clinical applicability and face validity. Key model variables included age, down feather and/or bird exposure, radiographic presence of ground-glass opacity and mosaic perfusion and moderate or high confidence in the radiographic impression of chronic hypersensitivity pneumonitis. Models were internally validated with good performance, and cut-off values were established that resulted in high specificity for a diagnosis of chronic hypersensitivity pneumonitis. PMID:27245779

  17. Model identification using stochastic differential equation grey-box models in diabetes.

    PubMed

    Duun-Henriksen, Anne Katrine; Schmidt, Signe; Røge, Rikke Meldgaard; Møller, Jonas Bech; Nørgaard, Kirsten; Jørgensen, John Bagterp; Madsen, Henrik

    2013-03-01

    The acceptance of virtual preclinical testing of control algorithms is growing and thus also the need for robust and reliable models. Models based on ordinary differential equations (ODEs) can rarely be validated with standard statistical tools. Stochastic differential equations (SDEs) offer the possibility of building models that can be validated statistically and that are capable of predicting not only a realistic trajectory, but also the uncertainty of the prediction. In an SDE, the prediction error is split into two noise terms. This separation ensures that the errors are uncorrelated and provides the possibility to pinpoint model deficiencies. An identifiable model of the glucoregulatory system in a type 1 diabetes mellitus (T1DM) patient is used as the basis for development of a stochastic-differential-equation-based grey-box model (SDE-GB). The parameters are estimated on clinical data from four T1DM patients. The optimal SDE-GB is determined from likelihood-ratio tests. Finally, parameter tracking is used to track the variation in the "time to peak of meal response" parameter. We found that the transformation of the ODE model into an SDE-GB resulted in a significant improvement in the prediction and uncorrelated errors. Tracking of the "peak time of meal absorption" parameter showed that the absorption rate varied according to meal type. This study shows the potential of using SDE-GBs in diabetes modeling. Improved model predictions were obtained due to the separation of the prediction error. SDE-GBs offer a solid framework for using statistical tools for model validation and model development. © 2013 Diabetes Technology Society.

  18. Predicting survival of Escherichia coli O157:H7 in dry fermented sausage using artificial neural networks.

    PubMed

    Palanichamy, A; Jayas, D S; Holley, R A

    2008-01-01

    The Canadian Food Inspection Agency required the meat industry to ensure Escherichia coli O157:H7 does not survive (experiences ≥ 5 log CFU/g reduction) in dry fermented sausage (salami) during processing after a series of foodborne illness outbreaks resulting from this pathogenic bacterium occurred. The industry is in need of an effective technique like predictive modeling for estimating bacterial viability, because traditional microbiological enumeration is a time-consuming and laborious method. The accuracy and speed of artificial neural networks (ANNs) for this purpose is an attractive alternative (developed from predictive microbiology), especially for on-line processing in industry. Data from a study of interactive effects of different levels of pH, water activity, and the concentrations of allyl isothiocyanate at various times during sausage manufacture in reducing numbers of E. coli O157:H7 were collected. Data were used to develop predictive models using a general regression neural network (GRNN), a form of ANN, and a statistical linear polynomial regression technique. Both models were compared for their predictive error, using various statistical indices. GRNN predictions for training and test data sets had less serious errors when compared with the statistical model predictions. GRNN models were better and slightly better for training and test sets, respectively, than was the statistical model. Also, GRNN accurately predicted the level of allyl isothiocyanate required, ensuring a 5-log reduction, when an appropriate production set was created by interpolation. Because they are simple to generate, fast, and accurate, ANN models may be of value for industrial use in dry fermented sausage manufacture to reduce the hazard associated with E. coli O157:H7 in fresh beef and permit production of consistently safe products from this raw material.

  19. Keep it simple - A case study of model development in the context of the Dynamic Stocks and Flows (DSF) task

    NASA Astrophysics Data System (ADS)

    Halbrügge, Marc

    2010-12-01

    This paper describes the creation of a cognitive model submitted to the ‘Dynamic Stocks and Flows’ (DSF) modeling challenge. This challenge aims at comparing computational cognitive models for human behavior during an open ended control task. Participants in the modeling competition were provided with a simulation environment and training data for benchmarking their models while the actual specification of the competition task was withheld. To meet this challenge, the cognitive model described here was designed and optimized for generalizability. Only two simple assumptions about human problem solving were used to explain the empirical findings of the training data. In-depth analysis of the data set prior to the development of the model led to the dismissal of correlations or other parametric statistics as goodness-of-fit indicators. A new statistical measurement based on rank orders and sequence matching techniques is proposed instead. This measurement, when applied to the human sample, also identifies clusters of subjects that use different strategies for the task. The acceptability of the fits achieved by the model is verified using permutation tests.

  20. Statistical Analysis and Time Series Modeling of Air Traffic Operations Data From Flight Service Stations and Terminal Radar Approach Control Facilities : Two Case Studies

    DOT National Transportation Integrated Search

    1981-10-01

    Two statistical procedures have been developed to estimate hourly or daily aircraft counts. These counts can then be transformed into estimates of instantaneous air counts. The first procedure estimates the stable (deterministic) mean level of hourly...

  1. Knowledge-Sharing Intention among Information Professionals in Nigeria: A Statistical Analysis

    ERIC Educational Resources Information Center

    Tella, Adeyinka

    2016-01-01

    In this study, the researcher administered a survey and developed and tested a statistical model to examine the factors that determine the intention of information professionals in Nigeria to share knowledge with their colleagues. The result revealed correlations between the overall score for intending to share knowledge and other…

  2. A statistical approach to optimizing concrete mixture design.

    PubMed

    Ahmad, Shamsad; Alghamdi, Saeid A

    2014-01-01

    A step-by-step statistical approach is proposed to obtain optimum proportioning of concrete mixtures using the data obtained through a statistically planned experimental program. The utility of the proposed approach for optimizing the design of concrete mixture is illustrated considering a typical case in which trial mixtures were considered according to a full factorial experiment design involving three factors and their three levels (3³). A total of 27 concrete mixtures with three replicates (81 specimens) were considered by varying the levels of key factors affecting compressive strength of concrete, namely, water/cementitious materials ratio (0.38, 0.43, and 0.48), cementitious materials content (350, 375, and 400 kg/m³), and fine/total aggregate ratio (0.35, 0.40, and 0.45). The experimental data were utilized to carry out analysis of variance (ANOVA) and to develop a polynomial regression model for compressive strength in terms of the three design factors considered in this study. The developed statistical model was used to show how optimization of concrete mixtures can be carried out with different possible options.
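
    The three-factor, three-level full factorial design described above can be enumerated with a short sketch. The factor levels and the replicate count are taken from the abstract; the variable and key names are hypothetical, and the sketch only lists the design points, it does not reproduce the ANOVA or the regression model.

```python
from itertools import product

# Factor levels as stated in the abstract.
WATER_CEMENTITIOUS_RATIO = (0.38, 0.43, 0.48)
CEMENTITIOUS_CONTENT_KG_M3 = (350, 375, 400)
FINE_TOTAL_AGGREGATE_RATIO = (0.35, 0.40, 0.45)
REPLICATES = 3


def full_factorial_design():
    """Return every combination of the three factors (3 x 3 x 3 mixtures)."""
    return [
        {"w_cm_ratio": w, "cm_content_kg_m3": c, "fine_total_ratio": f}
        for w, c, f in product(
            WATER_CEMENTITIOUS_RATIO,
            CEMENTITIOUS_CONTENT_KG_M3,
            FINE_TOTAL_AGGREGATE_RATIO,
        )
    ]


design = full_factorial_design()
n_mixtures = len(design)              # number of distinct trial mixtures
n_specimens = n_mixtures * REPLICATES  # specimens cast across all replicates
```

    Enumerating the design this way makes the counts in the abstract easy to check: three levels of each of three factors give 27 mixtures, and three replicates of each give 81 specimens.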

  3. A Statistical Approach to Optimizing Concrete Mixture Design

    PubMed Central

    Alghamdi, Saeid A.

    2014-01-01

    A step-by-step statistical approach is proposed to obtain optimum proportioning of concrete mixtures using the data obtained through a statistically planned experimental program. The utility of the proposed approach for optimizing the design of concrete mixture is illustrated considering a typical case in which trial mixtures were considered according to a full factorial experiment design involving three factors and their three levels (3³). A total of 27 concrete mixtures with three replicates (81 specimens) were considered by varying the levels of key factors affecting compressive strength of concrete, namely, water/cementitious materials ratio (0.38, 0.43, and 0.48), cementitious materials content (350, 375, and 400 kg/m³), and fine/total aggregate ratio (0.35, 0.40, and 0.45). The experimental data were utilized to carry out analysis of variance (ANOVA) and to develop a polynomial regression model for compressive strength in terms of the three design factors considered in this study. The developed statistical model was used to show how optimization of concrete mixtures can be carried out with different possible options. PMID:24688405

  4. Assessment and prediction of inter-joint upper limb movement correlations based on kinematic analysis and statistical regression

    NASA Astrophysics Data System (ADS)

    Toth-Tascau, Mirela; Balanean, Flavia; Krepelka, Mircea

    2013-10-01

    Musculoskeletal impairment of the upper limb can cause difficulties in performing basic daily activities. Three-dimensional motion analyses can provide valuable data on arm movement in order to precisely determine arm movement and inter-joint coordination. The purpose of this study was to develop a method to evaluate the degree of impairment based on the influence of shoulder movements on the amplitude of elbow flexion and extension, based on the assumption that a lack of motion of the elbow joint will be compensated by an increased shoulder activity. In order to develop and validate a statistical model, one healthy young volunteer was involved in the study. The activity of choice simulated blowing the nose, starting from a slight flexion of the elbow and raising the hand until the middle finger touches the tip of the nose and returning to the start position. Inter-joint coordination between the elbow and shoulder movements showed significant correlation. Statistical regression was used to fit an equation model describing the influence of shoulder movements on the elbow mobility. The study provides a brief description of the kinematic analysis protocol and statistical models that may be useful in describing the relation between inter-joint movements of daily activities.

  5. A model for indexing medical documents combining statistical and symbolic knowledge.

    PubMed

    Avillach, Paul; Joubert, Michel; Fieschi, Marius

    2007-10-11

    To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context. We designed a model using both symbolic general knowledge extracted from the Unified Medical Language System (UMLS) and statistical knowledge extracted from a domain of application. Using statistical knowledge allowed us to contextualize the general knowledge for every particular situation. For each document studied, the extracted terms are ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs). The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases. The use of several terminologies leads to more precise indexing. The improvement achieved in the performance of the model's implementation as a result of using semantic relationships is encouraging.

  6. Ensuring Positiveness of the Scaled Difference Chi-square Test Statistic.

    PubMed

    Satorra, Albert; Bentler, Peter M

    2010-06-01

    A scaled difference test statistic [Formula: see text] that can be computed from standard software of structural equation models (SEM) by hand calculations was proposed in Satorra and Bentler (2001). The statistic [Formula: see text] is asymptotically equivalent to the scaled difference test statistic T̄(d) introduced in Satorra (2000), which requires more involved computations beyond standard output of SEM software. The test statistic [Formula: see text] has been widely used in practice, but in some applications it is negative due to negativity of its associated scaling correction. Using the implicit function theorem, this note develops an improved scaling correction leading to a new scaled difference statistic T̄(d) that avoids negative chi-square values.
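
    The negativity problem described above can be illustrated with a minimal sketch of the 2001 scaled difference statistic in its commonly stated form; this is an assumption about the formula, since the abstract does not spell it out, and it is not the improved correction developed in the note. Here t0 and t1 are the unscaled chi-square statistics of the nested and comparison models, d0 and d1 their degrees of freedom, and c0 and c1 their scaling corrections; all function names and numeric inputs are hypothetical.

```python
def scaling_correction_2001(d0, c0, d1, c1):
    """Difference-test scaling correction: (d0*c0 - d1*c1) / (d0 - d1).

    d0, c0: degrees of freedom and scaling correction of the nested model.
    d1, c1: the same quantities for the less restricted comparison model.
    """
    if d0 <= d1:
        raise ValueError("the nested model must have more degrees of freedom")
    return (d0 * c0 - d1 * c1) / (d0 - d1)


def scaled_difference_2001(t0, t1, d0, c0, d1, c1):
    """Scaled difference statistic: (t0 - t1) divided by the correction."""
    return (t0 - t1) / scaling_correction_2001(d0, c0, d1, c1)


# Well-behaved hypothetical case: the correction is positive.
ok_stat = scaled_difference_2001(t0=30.0, t1=20.0, d0=10, c0=1.5, d1=8, c1=1.25)

# Problem case: d1*c1 exceeds d0*c0, so the correction (and the statistic,
# which should behave like a chi-square variate) comes out negative.
bad_correction = scaling_correction_2001(d0=10, c0=1.2, d1=8, c1=1.6)
bad_stat = scaled_difference_2001(t0=30.0, t1=20.0, d0=10, c0=1.2, d1=8, c1=1.6)
```

    Because the correction is a difference of two estimated quantities, nothing in this form guarantees a positive sign in finite samples, which is the defect the improved scaling correction is designed to remove.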

  7. Terminology, concepts, and models in genetic epidemiology.

    PubMed

    Teare, M Dawn; Koref, Mauro F Santibàñez

    2011-01-01

    Genetic epidemiology brings together approaches and techniques developed in mathematical genetics and statistics, medical genetics, quantitative genetics, and epidemiology. In the 1980s, the focus was on the mapping and identification of genes where defects had large effects at the individual level. More recently, statistical and experimental advances have made it possible to identify and characterise genes associated with small effects at the individual level. In this chapter, we provide a brief outline of the models, concepts, and terminology used in genetic epidemiology.

  8. Statistical Tools for Fitting Models of the Population Consequences of Acoustic Disturbance to Data from Marine Mammal Populations (PCAD Tools II)

    DTIC Science & Technology

    2014-09-30

    Consequences of Acoustic Disturbance to Data from Marine Mammal Populations (PCAD Tools II) Len Thomas, John Harwood, Catriona Harris, and Robert S... mammals changes over time. This project will develop statistical tools to allow mathematical models of the population consequences of acoustic...disturbance to be fitted to data from marine mammal populations. We will work closely with Phase II of the ONR PCAD Working Group, and will provide

  9. Stochastical modeling for Viral Disease: Statistical Mechanics and Network Theory

    NASA Astrophysics Data System (ADS)

    Zhou, Hao; Deem, Michael

    2007-04-01

    Theoretical methods of statistical mechanics are developed and applied to study the immunological response against viral disease, such as dengue. We use this theory to show how the immune response to four different dengue serotypes may be sculpted. It is the ability of avian influenza to change and to mix that has given rise to the fear of a new human flu pandemic. Here we propose to utilize a scale-free network-based stochastic model to investigate the mitigation strategies and analyze the risk.

  10. Towards bridging the gap between climate change projections and maize producers in South Africa

    NASA Astrophysics Data System (ADS)

    Landman, Willem A.; Engelbrecht, Francois; Hewitson, Bruce; Malherbe, Johan; van der Merwe, Jacobus

    2018-05-01

    Multi-decadal regional projections of future climate change are introduced into a linear statistical model in order to produce an ensemble of austral mid-summer maximum temperature simulations for southern Africa. The statistical model uses atmospheric thickness fields from a high-resolution (0.5° × 0.5°) reanalysis-forced simulation as predictors in order to develop a linear recalibration model which represents the relationship between atmospheric thickness fields and gridded maximum temperatures across the region. The regional climate model, the conformal-cubic atmospheric model (CCAM), projects maximum temperature increases over southern Africa of the order of 4 °C or even higher under low mitigation towards the end of the century. The statistical recalibration model is able to replicate these increasing temperatures, and the atmospheric thickness-maximum temperature relationship is shown to be stable under future climate conditions. Since dry land crop yields are not explicitly simulated by climate models but are sensitive to maximum temperature extremes, the effect of projected maximum temperature change on dry land crops of the Witbank maize production district of South Africa, assuming other factors remain unchanged, is then assessed by employing a statistical approach similar to the one used for maximum temperature projections.

  11. Cross-validation of Peak Oxygen Consumption Prediction Models From OMNI Perceived Exertion.

    PubMed

    Mays, R J; Goss, F L; Nagle, E F; Gallagher, M; Haile, L; Schafer, M A; Kim, K H; Robertson, R J

    2016-09-01

    This study cross-validated statistical models for prediction of peak oxygen consumption using ratings of perceived exertion from the Adult OMNI Cycle Scale of Perceived Exertion. 74 participants (men: n=36; women: n=38) completed a graded cycle exercise test. Ratings of perceived exertion for the overall body, legs, and chest/breathing were recorded at each test stage and entered into previously developed 3-stage peak oxygen consumption prediction models. There were no significant differences (p>0.05) between measured and predicted peak oxygen consumption from ratings of perceived exertion for the overall body, legs, and chest/breathing within men (mean±standard deviation: 3.16±0.52 vs. 2.92±0.33 vs. 2.90±0.29 vs. 2.90±0.26 L·min(-1)) and women (2.17±0.29 vs. 2.02±0.22 vs. 2.03±0.19 vs. 2.01±0.19 L·min(-1)) participants. Previously developed statistical models for prediction of peak oxygen consumption based on subpeak OMNI ratings of perceived exertion responses were similar to measured peak oxygen consumption in a separate group of participants. These findings provide practical implications for the use of the original statistical models in standard health-fitness settings. © Georg Thieme Verlag KG Stuttgart · New York.

  12. Economic Impacts of Infrastructure Damages on Industrial Sector

    NASA Astrophysics Data System (ADS)

    Kajitani, Yoshio

    This paper proposes a basic model for evaluating economic impacts on industrial sectors when multiple infrastructures are simultaneously damaged during earthquake disasters. Focusing on the economic data available at the smallest spatial scale in Japan (small area statistics), an economic loss estimation model based on the small area statistics and its applicability are investigated. In detail, a loss estimation framework, utilizing survey results on firms' activities under electricity, water and gas disruptions together with route choice models from transportation engineering, is applied to the case of the 2004 Mid-Niigata Earthquake.

  13. Statistical description of non-Gaussian samples in the F2 layer of the ionosphere during heliogeophysical disturbances

    NASA Astrophysics Data System (ADS)

    Sergeenko, N. P.

    2017-11-01

    An adequate statistical method should be developed in order to predict probabilistically the range of ionospheric parameters. This problem is solved in this paper. The time series of the critical frequency of the F2 layer, foF2(t), were subjected to statistical processing. For the obtained samples {δ foF2}, statistical distributions and invariants up to the fourth order are calculated. The analysis shows that the distributions differ from the Gaussian law during the disturbances. At levels of sufficiently small probability distributions, there are arbitrarily large deviations from the model of the normal process. Therefore, it is attempted to describe statistical samples {δ foF2} based on the Poisson model. For the studied samples, the exponential characteristic function is selected under the assumption that time series are a superposition of some deterministic and random processes. Using the Fourier transform, the characteristic function is transformed into a nonholomorphic excessive-asymmetric probability-density function. The statistical distributions of the samples {δ foF2} calculated for the disturbed periods are compared with the obtained model distribution function. According to Kolmogorov's criterion, the probabilities of the coincidence of a posteriori distributions with the theoretical ones are P 0.7-0.9. The conducted analysis makes it possible to draw a conclusion about the applicability of a model based on the Poisson random process for the statistical description and probabilistic variation estimates during heliogeophysical disturbances of the variations {δ foF2}.

  14. Uncertainty Analysis of Inertial Model Attitude Sensor Calibration and Application with a Recommended New Calibration Method

    NASA Technical Reports Server (NTRS)

    Tripp, John S.; Tcheng, Ping

    1999-01-01

    Statistical tools, previously developed for nonlinear least-squares estimation of multivariate sensor calibration parameters and the associated calibration uncertainty analysis, have been applied to single- and multiple-axis inertial model attitude sensors used in wind tunnel testing to measure angle of attack and roll angle. The analysis provides confidence and prediction intervals of calibrated sensor measurement uncertainty as functions of applied input pitch and roll angles. A comparative performance study of various experimental designs for inertial sensor calibration is presented along with corroborating experimental data. The importance of replicated calibrations over extended time periods has been emphasized; replication provides independent estimates of calibration precision and bias uncertainties, statistical tests for calibration or modeling bias uncertainty, and statistical tests for sensor parameter drift over time. A set of recommendations for a new standardized model attitude sensor calibration method and usage procedures is included. The statistical information provided by these procedures is necessary for the uncertainty analysis of aerospace test results now required by users of industrial wind tunnel test facilities.

  15. Development Of Educational Programs In Renewable And Alternative Energy Processing: The Case Of Russia

    NASA Astrophysics Data System (ADS)

    Svirina, Anna; Shindor, Olga; Tatmyshevsky, Konstantin

    2014-12-01

    The paper deals with the main problems of Russian energy system development, which make it necessary to provide educational programs in the field of renewable and alternative energy. In the paper the process of curricula development and defining teaching techniques on the basis of expert opinion evaluation is described, and the competence model for renewable and alternative energy processing master students is suggested. On the basis of a distributed questionnaire and in-depth interviews, the data for statistical analysis was obtained. On the basis of this data, an optimization of curricula structure was performed, and three models of a structure for optimizing teaching techniques were developed. The suggested educational program structure which was adopted by employers is presented in the paper. The findings include quantitatively estimated importance of systemic thinking and professional skills and knowledge as basic competences of a masters' program graduate; statistically estimated necessity of practice-based learning approach; and optimization models for structuring curricula in renewable and alternative energy processing. These findings allow the establishment of a platform for the development of educational programs.

  16. Thermodynamic Model of Spatial Memory

    NASA Astrophysics Data System (ADS)

    Kaufman, Miron; Allen, P.

    1998-03-01

    We develop and test a thermodynamic model of spatial memory. Our model is an application of statistical thermodynamics to cognitive science. It is related to applications of the statistical mechanics framework in parallel distributed processes research. Our macroscopic model allows us to evaluate an entropy associated with spatial memory tasks. We find that older adults exhibit higher levels of entropy than younger adults. Thurstone's Law of Categorical Judgment, according to which the discriminal processes along the psychological continuum produced by presentations of a single stimulus are normally distributed, is explained by using a Hooke spring model of spatial memory. We have also analyzed a nonlinear modification of the ideal spring model of spatial memory. This work is supported by NIH/NIA grant AG09282-06.

  17. Process model comparison and transferability across bioreactor scales and modes of operation for a mammalian cell bioprocess.

    PubMed

    Craven, Stephen; Shirsat, Nishikant; Whelan, Jessica; Glennon, Brian

    2013-01-01

    A Monod kinetic model, logistic equation model, and statistical regression model were developed for a Chinese hamster ovary cell bioprocess operated under three different modes of operation (batch, bolus fed-batch, and continuous fed-batch) and grown on two different bioreactor scales (3 L bench-top and 15 L pilot-scale). The Monod kinetic model was developed for all modes of operation under study and predicted cell density, glucose, glutamine, lactate, and ammonia concentrations well for the bioprocess. However, it was computationally demanding due to the large number of parameters necessary to produce a good model fit. The transferability of the Monod kinetic model structure and parameter set across bioreactor scales and modes of operation was investigated and a parameter sensitivity analysis performed. The experimentally determined parameters had the greatest influence on model performance. They changed with scale and mode of operation, but were easily calculated. The remaining parameters, which were fitted using a differential evolutionary algorithm, were not as crucial. Logistic equation and statistical regression models were investigated as alternatives to the Monod kinetic model. They were less computationally intensive to develop due to the absence of a large parameter set. However, modeling of the nutrient and metabolite concentrations proved to be troublesome due to the logistic equation model structure and the inability of both models to incorporate a feed. The complexity, computational load, and effort required for model development have to be balanced with the necessary level of model sophistication when choosing which model type to develop for a particular application. Copyright © 2012 American Institute of Chemical Engineers (AIChE).
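
    As a rough illustration of the kind of kinetic model the abstract above refers to, the sketch below integrates a single-substrate Monod batch model with forward Euler steps. It is not the authors' model: the function name, the single-substrate simplification, and every parameter value are hypothetical and chosen only to make the example run.

```python
def simulate_monod_batch(x0, s0, mu_max, k_s, yield_xs, dt, steps):
    """Forward-Euler integration of a single-substrate Monod batch model.

    x: biomass, s: limiting substrate. All arguments are illustrative
    assumptions, not values taken from the cited study.
    """
    x, s = x0, s0
    trajectory = [(0.0, x, s)]
    for i in range(1, steps + 1):
        mu = mu_max * s / (k_s + s)            # Monod specific growth rate
        growth = mu * x * dt                   # biomass formed in this step
        growth = min(growth, s * yield_xs)     # cannot use more substrate than remains
        x += growth
        s = max(s - growth / yield_xs, 0.0)    # substrate consumed via the yield coefficient
        trajectory.append((i * dt, x, s))
    return trajectory


# Hypothetical parameter values, for demonstration only.
run = simulate_monod_batch(x0=0.2, s0=5.0, mu_max=0.04, k_s=0.5,
                           yield_xs=0.5, dt=0.5, steps=400)
t_end, x_end, s_end = run[-1]
print(f"t={t_end:.1f}  biomass={x_end:.3f}  substrate={s_end:.3f}")
```

    Even this reduced form needs three fitted or measured parameters (mu_max, k_s, yield_xs) for one substrate, which hints at why a model covering several nutrients and metabolites carries a large parameter set.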

  18. Statistical robustness of machine-learning estimates for characterizing a groundwater-surface water system, Southland, New Zealand

    NASA Astrophysics Data System (ADS)

    Friedel, M. J.; Daughney, C.

    2016-12-01

    The development of a successful surface-groundwater management strategy depends on the quality of data provided for analysis. This study evaluates the statistical robustness when using a modified self-organizing map (MSOM) technique to estimate missing values for three hypersurface models: synoptic groundwater-surface water hydrochemistry, time-series of groundwater-surface water hydrochemistry, and mixed-survey (combination of groundwater-surface water hydrochemistry and lithologies) hydrostratigraphic unit data. These models of increasing complexity are developed and validated based on observations from the Southland region of New Zealand. In each case, the estimation method is sufficiently robust to cope with groundwater-surface water hydrochemistry vagaries due to sample size and extreme data insufficiency, even when >80% of the data are missing. The estimation of surface water hydrochemistry time series values enabled the evaluation of seasonal variation, and the imputation of lithologies facilitated the evaluation of hydrostratigraphic controls on groundwater-surface water interaction. The robust statistical results for groundwater-surface water models of increasing data complexity provide justification to apply the MSOM technique in other regions of New Zealand and abroad.

  19. Assessment of the long-lead probabilistic prediction for the Asian summer monsoon precipitation (1983-2011) based on the APCC multimodel system and a statistical model

    NASA Astrophysics Data System (ADS)

    Sohn, Soo-Jin; Min, Young-Mi; Lee, June-Yi; Tam, Chi-Yung; Kang, In-Sik; Wang, Bin; Ahn, Joong-Bae; Yamagata, Toshio

    2012-02-01

    The performance of the probabilistic multimodel prediction (PMMP) system of the APEC Climate Center (APCC) in predicting the Asian summer monsoon (ASM) precipitation at a four-month lead (with February initial condition) was compared with that of a statistical model using hindcast data for 1983-2005 and real-time forecasts for 2006-2011. Particular attention was paid to probabilistic precipitation forecasts for the boreal summer after the mature phase of El Niño and Southern Oscillation (ENSO). Taking into account the fact that coupled models' skill for boreal spring and summer precipitation mainly comes from their ability to capture ENSO teleconnection, we developed the statistical model using linear regression with the preceding winter ENSO condition as the predictor. Our results reveal several advantages and disadvantages in both forecast systems. First, the PMMP appears to have higher skills for both above- and below-normal categories in the six-year real-time forecast period, whereas the cross-validated statistical model has higher skills during the 23-year hindcast period. This implies that the cross-validated statistical skill may be overestimated. Second, the PMMP is the better tool for capturing atypical ENSO (or non-canonical ENSO related) teleconnection, which has affected the ASM precipitation during the early 1990s and in the recent decade. Third, the statistical model is more sensitive to the ENSO phase and has an advantage in predicting the ASM precipitation after the mature phase of La Niña.
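
    The cross-validated single-predictor regression mentioned in the abstract above can be sketched as follows. This is a minimal leave-one-out example under stated assumptions: the helper names and the toy numbers are hypothetical, and the real study used an ENSO index and precipitation data that are not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_predictions(xs, ys):
    """Predict each case from a model fitted to all the other cases."""
    xs, ys = list(xs), list(ys)
    preds = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    return preds


# Toy data standing in for (predictor index, seasonal anomaly) pairs.
index = [-1.5, -0.8, -0.2, 0.3, 0.9, 1.6]
anomaly = [-0.9, -0.6, 0.1, 0.2, 0.4, 1.1]
for observed, predicted in zip(anomaly, loo_predictions(index, anomaly)):
    print(f"observed={observed:+.2f}  cross-validated={predicted:+.2f}")
```

    Because each prediction comes from a model that never saw the held-out case, skill computed from these predictions is a fairer estimate than in-sample fit, although, as the abstract notes, it can still be optimistic compared with genuine real-time forecasts.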

  20. Analysis of Time-Series Quasi-Experiments. Final Report.

    ERIC Educational Resources Information Center

    Glass, Gene V.; Maguire, Thomas O.

    The objective of this project was to investigate the adequacy of statistical models developed by G. E. P. Box and G. C. Tiao for the analysis of time-series quasi-experiments: (1) The basic model developed by Box and Tiao is applied to actual time-series experiment data from two separate experiments, one in psychology and one in educational…

  1. Raman spectroscopy-based screening of IgM positive and negative sera for dengue virus infection

    NASA Astrophysics Data System (ADS)

    Bilal, M.; Saleem, M.; Bilal, Maria; Ijaz, T.; Khan, Saranjam; Ullah, Rahat; Raza, A.; Khurram, M.; Akram, W.; Ahmed, M.

    2016-11-01

    A statistical method based on Raman spectroscopy for the screening of immunoglobulin M (IgM) in dengue virus (DENV) infected human sera is presented. In total, 108 sera samples were collected and their antibody indexes (AI) for IgM were determined through enzyme-linked immunosorbent assay (ELISA). Raman spectra of these samples were acquired using a 785 nm wavelength excitation laser. Seventy-eight Raman spectra were selected randomly and unbiasedly for the development of a statistical model using partial least squares (PLS) regression, while the remaining 30 were used for testing the developed model. An R-square (r²) value of 0.929 was determined using the leave-one-sample-out (LOO) cross validation method, showing the validity of this model. It considers all molecular changes related to IgM concentration, and describes their role in infection. A graphical user interface (GUI) platform has been developed to run a developed multivariate model for the prediction of AI of IgM for blindly tested samples, and an excellent agreement has been found between model predicted and clinically determined values. Parameters like sensitivity, specificity, accuracy, and area under receiver operator characteristic (ROC) curve for these tested samples are also reported to visualize model performance.

  2. Predicting High Health Care Resource Utilization in a Single-payer Public Health Care System: Development and Validation of the High Resource User Population Risk Tool (HRUPoRT).

    PubMed

    Rosella, Laura C; Kornas, Kathy; Yao, Zhan; Manuel, Douglas G; Bornbaum, Catherine; Fransoo, Randall; Stukel, Therese

    2017-11-17

    A large proportion of health care spending is incurred by a small proportion of the population. Population-based health planning tools that consider both the clinical and upstream determinants of high resource users (HRU) of the health system are lacking. To develop and validate the High Resource User Population Risk Tool (HRUPoRT), a predictive model of adults that will become the top 5% of health care users over a 5-year period, based on self-reported clinical, sociodemographic, and health behavioral predictors in population survey data. The HRUPoRT model was developed in a prospective cohort design using the combined 2005 and 2007/2008 Canadian Community Health Surveys (CCHS) (N=58,617), and validated using the external 2009/2010 CCHS cohort (N=28,721). Health care utilization for each of the 5 years following CCHS interview date was determined by applying a person-centered costing algorithm to the linked health administrative databases. Discrimination and calibration of the model were assessed using c-statistic and Hosmer-Lemeshow (HL) χ² statistic. The best prediction model for 5-year transition to HRU status included 12 predictors and had good discrimination (c-statistic=0.8213) and calibration (HL χ²=18.71) in the development cohort. The model performed similarly in the validation cohort (c-statistic=0.8171; HL χ²=19.95). The strongest predictors in the HRUPoRT model were age, perceived general health, and body mass index. HRUPoRT can accurately project the proportion of individuals in the population that will become a HRU over 5 years. HRUPoRT can be applied to inform health resource planning and prevention strategies at the community level.
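
    The c-statistic used above to report discrimination is the probability that a randomly chosen case with the outcome received a higher predicted risk than a randomly chosen case without it, with ties counted as one half. The sketch below computes it by direct pair counting; the function name and the toy inputs are illustrative assumptions and have no connection to the HRUPoRT data.

```python
def c_statistic(risks, outcomes):
    """Concordance (c-statistic) for a binary outcome coded 0/1.

    Counts, over all (event, non-event) pairs, how often the event case
    has the higher predicted risk; ties contribute 0.5.
    """
    events = [r for r, o in zip(risks, outcomes) if o == 1]
    non_events = [r for r, o in zip(risks, outcomes) if o == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Toy predicted risks and observed outcomes (hypothetical).
print(c_statistic([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]))  # 0.75
```

    A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. Pair counting is quadratic in sample size, so for cohorts of the size described above a rank-based computation would be the practical choice.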

  3. Water resources management: Hydrologic characterization through hydrograph simulation may bias streamflow statistics

    NASA Astrophysics Data System (ADS)

    Farmer, W. H.; Kiang, J. E.

    2017-12-01

    The development, deployment and maintenance of water resources management infrastructure and practices rely on hydrologic characterization, which requires an understanding of local hydrology. With regards to streamflow, this understanding is typically quantified with statistics derived from long-term streamgage records. However, a fundamental problem is how to characterize local hydrology without the luxury of streamgage records, a problem that complicates water resources management at ungaged locations and for long-term future projections. This problem has typically been addressed through the development of point estimators, such as regression equations, to estimate particular statistics. Physically-based precipitation-runoff models, which are capable of producing simulated hydrographs, offer an alternative to point estimators. The advantage of simulated hydrographs is that they can be used to compute any number of streamflow statistics from a single source (the simulated hydrograph) rather than relying on a diverse set of point estimators. However, the use of simulated hydrographs introduces a degree of model uncertainty that is propagated through to estimated streamflow statistics and may have drastic effects on management decisions. We compare the accuracy and precision of streamflow statistics (e.g. the mean annual streamflow, the annual maximum streamflow exceeded in 10% of years, and the minimum seven-day average streamflow exceeded in 90% of years, among others) derived from point estimators (e.g. regressions, kriging, machine learning) to that of statistics derived from simulated hydrographs across the continental United States. Initial results suggest that the error introduced through hydrograph simulation may substantially bias the resulting hydrologic characterization.

  4. Statistical considerations in the development of injury risk functions.

    PubMed

    McMurry, Timothy L; Poplin, Gerald S

    2015-01-01

    We address 4 frequently misunderstood and important statistical ideas in the construction of injury risk functions. These include the similarities of survival analysis and logistic regression, the correct scale on which to construct pointwise confidence intervals for injury risk, the ability to discern which form of injury risk function is optimal, and the handling of repeated tests on the same subject. The statistical models are explored through simulation and examination of the underlying mathematics. We provide recommendations for the statistically valid construction and correct interpretation of single-predictor injury risk functions. This article aims to provide useful and understandable statistical guidance to improve the practice in constructing injury risk functions.
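
    One of the points raised above, the scale on which pointwise confidence intervals are built, can be illustrated with a single-predictor logistic risk function. A common approach, assumed here because the abstract does not say which scale the authors recommend, is to form the interval on the logit (linear predictor) scale and then back-transform, which keeps the limits inside (0, 1). The coefficient and covariance values below are hypothetical.

```python
import math


def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def risk_with_ci(x, b0, b1, var_b0, var_b1, cov_b01, z=1.96):
    """Injury risk at predictor value x with a pointwise confidence interval.

    The interval is computed for the linear predictor b0 + b1 * x using the
    coefficient variances and covariance, then mapped through the logistic
    function, so the limits cannot fall outside (0, 1).
    """
    eta = b0 + b1 * x
    se_eta = math.sqrt(var_b0 + x * x * var_b1 + 2.0 * x * cov_b01)
    return logistic(eta), logistic(eta - z * se_eta), logistic(eta + z * se_eta)


# Hypothetical fitted coefficients and covariance terms.
risk, lower, upper = risk_with_ci(x=50.0, b0=-5.0, b1=0.1,
                                  var_b0=1.0, var_b1=0.0004, cov_b01=-0.018)
print(f"risk={risk:.3f}  95% CI=({lower:.3f}, {upper:.3f})")
```

    An interval formed directly on the probability scale as risk plus or minus z times a standard error can spill below 0 or above 1 near the tails of the risk curve, which is the practical motivation for the transform-then-invert construction.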

  5. Economic Impacts of Wind Turbine Development in U.S. Counties

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, J.; Hoen, B.; Lantz, E.

    2011-07-25

    The objective is to address the research question using post-project construction, county-level data, and econometric evaluation methods. Wind energy is expanding rapidly in the United States: Over the last 4 years, wind power has contributed approximately 35 percent of all new electric power capacity. Wind power plants are often developed in rural areas where local economic development impacts from the installation are projected, including land lease and property tax payments and employment growth during plant construction and operation. Wind energy represented 2.3 percent of the U.S. electricity supply in 2010, but studies show that penetrations of at least 20 percent are feasible. Several studies have used input-output models to predict direct, indirect, and induced economic development impacts. These analyses have often been completed prior to project construction. Available studies have not yet investigated the economic development impacts of wind development at the county level using post-construction econometric evaluation methods. Analysis of county-level impacts is limited. However, previous county-level analyses have estimated operation-period employment at 0.2 to 0.6 jobs per megawatt (MW) of power installed and earnings at $9,000/MW to $50,000/MW. We find statistically significant evidence of positive impacts of wind development on county-level per capita income from the OLS and spatial lag models when they are applied to the full set of wind and non-wind counties. The total impact on annual per capita income of wind turbine development (measured in MW per capita) in the spatial lag model was $21,604 per MW. This estimate is within the range of values estimated in the literature using input-output models. OLS results for the wind-only counties and matched samples are similar in magnitude, but are not statistically significant at the 10-percent level. We find a statistically significant impact of wind development on employment in the OLS analysis for wind counties only, but not in the other models. Our estimates of employment impacts are not precise enough to assess the validity of employment impacts from input-output models applied in advance of wind energy project construction. The analysis provides empirical evidence of positive income effects at the county level from cumulative wind turbine development, consistent with the range of impacts estimated using input-output models. Employment impacts are less clear.

  6. The Abdominal Aortic Aneurysm Statistically Corrected Operative Risk Evaluation (AAA SCORE) for predicting mortality after open and endovascular interventions.

    PubMed

    Ambler, Graeme K; Gohel, Manjit S; Mitchell, David C; Loftus, Ian M; Boyle, Jonathan R

    2015-01-01

    Accurate adjustment of surgical outcome data for risk is vital in an era of surgeon-level reporting. Current risk prediction models for abdominal aortic aneurysm (AAA) repair are suboptimal. We aimed to develop a reliable risk model for in-hospital mortality after intervention for AAA, using rigorous contemporary statistical techniques to handle missing data. Using data collected during a 15-month period in the United Kingdom National Vascular Database, we applied multiple imputation methodology together with stepwise model selection to generate preoperative and perioperative models of in-hospital mortality after AAA repair, using two thirds of the available data. Model performance was then assessed on the remaining third of the data by receiver operating characteristic curve analysis and compared with existing risk prediction models. Model calibration was assessed by Hosmer-Lemeshow analysis. A total of 8088 AAA repair operations were recorded in the National Vascular Database during the study period, of which 5870 (72.6%) were elective procedures. Both preoperative and perioperative models showed excellent discrimination, with areas under the receiver operating characteristic curve of .89 and .92, respectively. This was significantly better than any of the existing models (area under the receiver operating characteristic curve for best comparator model, .84 and .88; P < .001 and P = .001, respectively). Discrimination remained excellent when only elective procedures were considered. There was no evidence of miscalibration by Hosmer-Lemeshow analysis. We have developed accurate models to assess risk of in-hospital mortality after AAA repair. These models were carefully developed with rigorous statistical methodology and significantly outperform existing methods for both elective cases and overall AAA mortality. These models will be invaluable for both preoperative patient counseling and accurate risk adjustment of published outcome data. 
Copyright © 2015 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.

  7. Effects of Instructional Design with Mental Model Analysis on Learning.

    ERIC Educational Resources Information Center

    Hong, Eunsook

    This paper presents a model for systematic instructional design that includes mental model analysis together with the procedures used in developing computer-based instructional materials in the area of statistical hypothesis testing. The instructional design model is based on the premise that the objective for learning is to achieve expert-like…

  8. Editorial: Introduction to the Special Section on Causal Inference in Cross Sectional and Longitudinal Mediational Models

    PubMed Central

    West, Stephen G.

    2016-01-01

    Psychologists have long had an interest in the processes through which antecedent variables produce their effects on the outcomes of ultimate interest (e.g., Woodworth's Stimulus-Organism-Response model). Models involving such mediational processes have characterized many of the important psychological theories of the 20th century and continue to the present day. However, it was not until Judd and Kenny (1981) and Baron and Kenny (1986) combined ideas from experimental design and structural equation modeling that statistical methods for directly testing such models, now known as mediation analysis, began to be developed. Methodologists have improved these statistical methods, developing new, more efficient estimators for mediated effects. They have also extended mediation analysis to multilevel data structures, models involving multiple mediators, models in which interactions occur, and an array of noncontinuous outcome measures (see MacKinnon, 2008). This work nicely maps on to key questions of applied researchers and has led to an outpouring of research testing mediational models (as of August 2011, Baron and Kenny's article has had over 24,000 citations according to Google Scholar). PMID:26736046

  9. Universal Capacitance Model for Real-Time Biomass in Cell Culture.

    PubMed

    Konakovsky, Viktor; Yagtu, Ali Civan; Clemens, Christoph; Müller, Markus Michael; Berger, Martina; Schlatter, Stefan; Herwig, Christoph

    2015-09-02

    Capacitance probes have the potential to revolutionize bioprocess control due to their safe and robust use and ability to detect even the smallest capacitors in the form of biological cells. Several techniques have evolved to model biomass statistically; however, there are problems with model transfer between cell lines and process conditions. For linear models, errors of transferred models in the declining phase of the culture are around +100% or worse, causing unnecessary delays with test runs during bioprocess development. The goal of this work was to develop one single universal model which can be adapted by considering a potentially mechanistic factor to estimate biomass in yet untested clones and scales. The novelty of this work is a methodology to select sensitive frequencies to build a statistical model which can be shared among fermentations with an error between 9% and 38% (mean error around 20%) for the whole process, including the declining phase. A simple linear factor was found to be responsible for the transferability of biomass models between cell lines, indicating a link to their phenotype or physiology.

  10. Development and validation of Prediction models for Risks of complications in Early-onset Pre-eclampsia (PREP): a prospective cohort study.

    PubMed

    Thangaratinam, Shakila; Allotey, John; Marlin, Nadine; Mol, Ben W; Von Dadelszen, Peter; Ganzevoort, Wessel; Akkermans, Joost; Ahmed, Asif; Daniels, Jane; Deeks, Jon; Ismail, Khaled; Barnard, Ann Marie; Dodds, Julie; Kerry, Sally; Moons, Carl; Riley, Richard D; Khan, Khalid S

    2017-04-01

    The prognosis of early-onset pre-eclampsia (before 34 weeks' gestation) is variable. Accurate prediction of complications is required to plan appropriate management in high-risk women. To develop and validate prediction models for outcomes in early-onset pre-eclampsia. Prospective cohort for model development, with validation in two external data sets. Model development: 53 obstetric units in the UK. Model transportability: PIERS (Pre-eclampsia Integrated Estimate of RiSk for mothers) and PETRA (Pre-Eclampsia TRial Amsterdam) studies. Pregnant women with early-onset pre-eclampsia. Nine hundred and forty-six women in the model development data set and 850 women (634 in PIERS, 216 in PETRA) in the transportability (external validation) data sets. The predictors were identified from systematic reviews of tests to predict complications in pre-eclampsia and were prioritised by Delphi survey. The primary outcome was the composite of adverse maternal outcomes established using Delphi surveys. The secondary outcome was the composite of fetal and neonatal complications. We developed two prediction models: a logistic regression model (PREP-L) to assess the overall risk of any maternal outcome until postnatal discharge and a survival analysis model (PREP-S) to obtain individual risk estimates at daily intervals from diagnosis until 34 weeks. Shrinkage was used to adjust for overoptimism of predictor effects. For internal validation (of the full models in the development data) and external validation (of the reduced models in the transportability data), we computed the ability of the models to discriminate between those with and without poor outcomes (c-statistic), and the agreement between predicted and observed risk (calibration slope). 
The PREP-L model included maternal age, gestational age at diagnosis, medical history, systolic blood pressure, urine protein-to-creatinine ratio, platelet count, serum urea concentration, oxygen saturation, baseline treatment with antihypertensive drugs and administration of magnesium sulphate. The PREP-S model additionally included exaggerated tendon reflexes and serum alanine aminotransaminase and creatinine concentration. Both models showed good discrimination for maternal complications, with an optimism-adjusted c-statistic of 0.82 [95% confidence interval (CI) 0.80 to 0.84] for PREP-L and 0.75 (95% CI 0.73 to 0.78) for the PREP-S model in the internal validation. External validation of the reduced PREP-L model showed good performance with a c-statistic of 0.81 (95% CI 0.77 to 0.85) in PIERS and 0.75 (95% CI 0.64 to 0.86) in PETRA cohorts for maternal complications, and calibrated well with slopes of 0.93 (95% CI 0.72 to 1.10) and 0.90 (95% CI 0.48 to 1.32), respectively. In the PIERS data set, the reduced PREP-S model had a c-statistic of 0.71 (95% CI 0.67 to 0.75) and a calibration slope of 0.67 (95% CI 0.56 to 0.79). Low gestational age at diagnosis, high urine protein-to-creatinine ratio, increased serum urea concentration, treatment with antihypertensive drugs, magnesium sulphate, abnormal uterine artery Doppler scan findings and estimated fetal weight below the 10th centile were associated with fetal complications. The PREP-L model provided individualised risk estimates in early-onset pre-eclampsia to plan management of high- or low-risk individuals. The PREP-S model has the potential to be used as a triage tool for risk assessment. The impacts of the model use on outcomes need further evaluation. Current Controlled Trials ISRCTN40384046. The National Institute for Health Research Health Technology Assessment programme.

  11. Shock and Vibration Symposium (59th) Held in Albuquerque, New Mexico on 18-20 October 1988. Volume 3

    DTIC Science & Technology

    1988-10-01

    N. F. Rieger Statistical Energy Analysis : An Overview of Its Development and Engineering Applications J. E. Manning DATA BASES DOE/DOD Environmental...Vibroacoustic Response Using the Finite Element Method and Statistical Energy Analysis F. L. Gloyna Study of Helium Effect on Spacecraft Random Vibration...Analysis S. A. Wilkerson vi DYNAMIC ANALYSIS Modeling of Vibration Transmission in a Damped Beam Structure Using Statistical Energy Analysis S. S

  12. Development of a Cadaveric Model for Arthrocentesis.

    PubMed

    MacIver, Melissa A; Johnson, Matthew

    2015-01-01

    This article reports the development of a novel cadaveric model for future use in teaching arthrocentesis. In the clinical setting, animal safety is essential and practice is thus limited. Objectives of the study were to develop and compare a model to an unmodified cadaver by injecting one of two types of fluids to increase yield. The two fluids injected, mineral oil (MO) and hypertonic saline (HS), were compared to determine any difference on yield. Lastly, aspiration immediately after (T1) or three hours after (T2) injection were compared to determine any effect on diagnostic yield. Joints used included the stifle, elbow, and carpus in eight medium dog cadavers. Arthrocentesis was performed before injection (control) and yield measured. Test joints were injected with MO or HS and yield measured after range of motion (T1) and three hours post injection to simulate lab preparation (T2). Both models had statistically significantly higher yield compared with the unmodified cadaver in all joints at T1 and T2 (p<.05) with the exception of HST2 carpus. T2 aspiration had a statistically significant lower yield when compared to T1HS carpus, T1HS elbow, and T1MO carpus. Overall, irrespective of fluid volume or type, percent yield was lower in T2 compared to T1. No statistically significant difference was seen between HS and MO in most joints with the exception of MOT1 stifle and HST2 elbow. Within the time frame assessed, both models were acceptable. However, HS arthrocentesis models proved appropriate for student trial due to the difficult aspirations with MO.

  13. Estimating current and future streamflow characteristics at ungaged sites, central and eastern Montana, with application to evaluating effects of climate change on fish populations

    USGS Publications Warehouse

    Sando, Roy; Chase, Katherine J.

    2017-03-23

    A common statistical procedure for estimating streamflow statistics at ungaged locations is to develop a relational model between streamflow and drainage basin characteristics at gaged locations using least squares regression analysis; however, least squares regression methods are parametric and make constraining assumptions about the data distribution. The random forest regression method provides an alternative nonparametric method for estimating streamflow characteristics at ungaged sites and requires that the data meet fewer statistical conditions than least squares regression methods. Random forest regression analysis was used to develop predictive models for 89 streamflow characteristics using Precipitation-Runoff Modeling System simulated streamflow data and drainage basin characteristics at 179 sites in central and eastern Montana. The predictive models were developed from streamflow data simulated for current (baseline, water years 1982–99) conditions and three future periods (water years 2021–38, 2046–63, and 2071–88) under three different climate-change scenarios. These predictive models were then used to predict streamflow characteristics for baseline conditions and three future periods at 1,707 fish sampling sites in central and eastern Montana. The average root mean square error for all predictive models was about 50 percent. When streamflow predictions at 23 fish sampling sites were compared to nearby locations with simulated data, the mean relative percent difference was about 43 percent. When predictions were compared to streamflow data recorded at 21 U.S. Geological Survey streamflow-gaging stations outside of the calibration basins, the average mean absolute percent error was about 73 percent.

  14. Statistical Analysis of CFD Solutions from the Fourth AIAA Drag Prediction Workshop

    NASA Technical Reports Server (NTRS)

    Morrison, Joseph H.

    2010-01-01

    A graphical framework is used for statistical analysis of the results from an extensive N-version test of a collection of Reynolds-averaged Navier-Stokes computational fluid dynamics codes. The solutions were obtained by code developers and users from the U.S., Europe, Asia, and Russia using a variety of grid systems and turbulence models for the June 2009 4th Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration for this workshop was a new subsonic transport model, the Common Research Model, designed using a modern approach for the wing and included a horizontal tail. The fourth workshop focused on the prediction of both absolute and incremental drag levels for wing-body and wing-body-horizontal tail configurations. This work continues the statistical analysis begun in the earlier workshops and compares the results from the grid convergence study of the most recent workshop with earlier workshops using the statistical framework.

  15. Highway runoff quality models for the protection of environmentally sensitive areas

    NASA Astrophysics Data System (ADS)

    Trenouth, William R.; Gharabaghi, Bahram

    2016-11-01

    This paper presents novel highway runoff quality models using artificial neural networks (ANN) which take into account site-specific highway traffic and seasonal storm event meteorological factors to predict the event mean concentration (EMC) statistics and mean daily unit area load (MDUAL) statistics of common highway pollutants for the design of roadside ditch treatment systems (RDTS) to protect sensitive receiving environs. A dataset of 940 monitored highway runoff events from fourteen sites located in five countries (Canada, USA, Australia, New Zealand, and China) was compiled and used to develop ANN models for the prediction of highway runoff suspended solids (TSS) seasonal EMC statistical distribution parameters, as well as the MDUAL statistics for four different heavy metal species (Cu, Zn, Cr and Pb). TSS EMCs are needed to estimate the minimum required removal efficiency of the RDTS needed in order to improve highway runoff quality to meet applicable standards and MDUALs are needed to calculate the minimum required capacity of the RDTS to ensure performance longevity.

  16. Influences of credibility of testimony and strength of statistical evidence on children’s and adolescents’ reasoning

    PubMed Central

    Kail, Robert V.

    2013-01-01

    According to dual-process models that include analytic and heuristic modes of processing, analytic processing is often expected to become more common with development. Consistent with this view, on reasoning problems, adolescents are more likely than children to select alternatives that are backed by statistical evidence. It is shown here that this pattern depends on the quality of the statistical evidence and the quality of the testimonial that is the typical alternative to statistical evidence. In Experiment 1, 9- and 13-year-olds (N = 64) were presented with scenarios in which solid statistical evidence was contrasted with casual or expert testimonial evidence. When testimony was casual, children relied on it but adolescents did not; when testimony was expert, both children and adolescents relied on it. In Experiment 2, 9- and 13-year-olds (N = 83) were presented with scenarios in which casual testimonial evidence was contrasted with weak or strong statistical evidence. When statistical evidence was weak, children and adolescents relied on both testimonial and statistical evidence; when statistical evidence was strong, most children and adolescents relied on it. Results are discussed in terms of their implications for dual-process accounts of cognitive development. PMID:23735681

  17. Quantifying Variation in Gait Features from Wearable Inertial Sensors Using Mixed Effects Models

    PubMed Central

    Cresswell, Kellen Garrison; Shin, Yongyun; Chen, Shanshan

    2017-01-01

    The emerging technology of wearable inertial sensors has shown its advantages in collecting continuous longitudinal gait data outside laboratories. This freedom also presents challenges in collecting high-fidelity gait data. In the free-living environment, without constant supervision from researchers, sensor-based gait features are susceptible to variation from confounding factors such as gait speed and mounting uncertainty, which are challenging to control or estimate. This paper is one of the first attempts in the field to tackle such challenges using statistical modeling. By accepting the uncertainties and variation associated with wearable sensor-based gait data, we shift our efforts from detecting and correcting those variations to modeling them statistically. From gait data collected on one healthy, non-elderly subject during 48 full-factorial trials, we identified four major sources of variation, and quantified their impact on one gait outcome—range per cycle—using a random effects model and a fixed effects model. The methodology developed in this paper lays the groundwork for a statistical framework to account for sources of variation in wearable gait data, thus facilitating informative statistical inference for free-living gait analysis. PMID:28245602

  18. A Dynamic Intrusion Detection System Based on Multivariate Hotelling's T2 Statistics Approach for Network Environments

    PubMed Central

    Avalappampatty Sivasamy, Aneetha; Sundan, Bose

    2015-01-01

    The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T2 method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T2 statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better. PMID:26357668

  19. A Dynamic Intrusion Detection System Based on Multivariate Hotelling's T2 Statistics Approach for Network Environments.

    PubMed

    Sivasamy, Aneetha Avalappampatty; Sundan, Bose

    2015-01-01

    The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T(2) method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T(2) statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better.

  20. A hybrid model for predicting carbon monoxide from vehicular exhausts in urban environments

    NASA Astrophysics Data System (ADS)

    Gokhale, Sharad; Khare, Mukesh

    Several deterministic-based air quality models evaluate and predict the frequently occurring pollutant concentration well but, in general, are incapable of predicting the 'extreme' concentrations. In contrast, the statistical distribution models overcome the above limitation of the deterministic models and predict the 'extreme' concentrations. However, the environmental damages are caused by both extremes as well as by the sustained average concentration of pollutants. Hence, the model should predict not only 'extreme' ranges but also the 'middle' ranges of pollutant concentrations, i.e. the entire range. Hybrid modelling is one of the techniques that estimates/predicts the 'entire range' of the distribution of pollutant concentrations by combining the deterministic based models with suitable statistical distribution models (Jakeman et al., 1988). In the present paper, a hybrid model has been developed to predict the carbon monoxide (CO) concentration distributions at one of the traffic intersections, Income Tax Office (ITO), in Delhi city, where the traffic is heterogeneous in nature and meteorology is 'tropical'. The model combines the general finite line source model (GFLSM) as its deterministic component and the log logistic distribution (LLD) model as its statistical component. The hybrid (GFLSM-LLD) model is then applied at the ITO intersection. The results show that the hybrid model predictions match the observed CO concentration data within the 5-99 percentiles range. The model is further validated at a different street location, i.e. Sirifort roadway. The validation results show that the model predicts CO concentrations fairly well (d = 0.91) in the 10-95 percentiles range. The regulatory compliance is also developed to estimate the probability of exceedance of hourly CO concentration beyond the National Ambient Air Quality Standards (NAAQS) of India.
    The heterogeneous traffic consists of light vehicles, heavy vehicles, three-wheelers (auto rickshaws) and two-wheelers (scooters, motorcycles, etc.).

  1. TH-CD-202-07: A Methodology for Generating Numerical Phantoms for Radiation Therapy Using Geometric Attribute Distribution Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dolly, S; Chen, H; Mutic, S

    Purpose: A persistent challenge for the quality assessment of radiation therapy treatments (e.g. contouring accuracy) is the absence of the known, ground truth for patient data. Moreover, assessment results are often patient-dependent. Computer simulation studies utilizing numerical phantoms can be performed for quality assessment with a known ground truth. However, previously reported numerical phantoms do not include the statistical properties of inter-patient variations, as their models are based on only one patient. In addition, these models do not incorporate tumor data. In this study, a methodology was developed for generating numerical phantoms which encapsulate the statistical variations of patients within radiation therapy, including tumors. Methods: Based on previous work in contouring assessment, geometric attribute distribution (GAD) models were employed to model both the deterministic and stochastic properties of individual organs via principal component analysis. Using pre-existing radiation therapy contour data, the GAD models are trained to model the shape and centroid distributions of each organ. Then, organs with different shapes and positions can be generated by assigning statistically sound weights to the GAD model parameters. Organ contour data from 20 retrospective prostate patient cases were manually extracted and utilized to train the GAD models. As a demonstration, computer-simulated CT images of generated numerical phantoms were calculated and assessed subjectively and objectively for realism. Results: A cohort of numerical phantoms of the male human pelvis was generated. CT images were deemed realistic both subjectively and objectively in terms of image noise power spectrum. Conclusion: A methodology has been developed to generate realistic numerical anthropomorphic phantoms using pre-existing radiation therapy data.
    The GAD models guarantee that generated organs span the statistical distribution of observed radiation therapy patients, according to the training dataset. The methodology enables radiation therapy treatment assessment with multi-modality imaging and a known ground truth, and without patient-dependent bias.

  2. Adaptation of a Fast Optimal Interpolation Algorithm to the Mapping of Oceanographic Data

    NASA Technical Reports Server (NTRS)

    Menemenlis, Dimitris; Fieguth, Paul; Wunsch, Carl; Willsky, Alan

    1997-01-01

    A fast, recently developed, multiscale optimal interpolation algorithm has been adapted to the mapping of hydrographic and other oceanographic data. This algorithm produces solution and error estimates which are consistent with those obtained from exact least squares methods, but at a small fraction of the computational cost. Problems whose solution would be completely impractical using exact least squares, that is, problems with tens or hundreds of thousands of measurements and estimation grid points, can easily be solved on a small workstation using the multiscale algorithm. In contrast to methods previously proposed for solving large least squares problems, our approach provides estimation error statistics while permitting long-range correlations, using all measurements, and permitting arbitrary measurement locations. The multiscale algorithm itself, published elsewhere, is not the focus of this paper. However, the algorithm requires statistical models having a very particular multiscale structure; it is the development of a class of multiscale statistical models, appropriate for oceanographic mapping problems, with which we concern ourselves in this paper. The approach is illustrated by mapping temperature in the northeastern Pacific. The number of hydrographic stations is kept deliberately small to show that multiscale and exact least squares results are comparable. A portion of the data was not used in the analysis; these data serve to test the multiscale estimates. A major advantage of the present approach is the ability to repeat the estimation procedure a large number of times for sensitivity studies, parameter estimation, and model testing. We have made available by anonymous FTP a set of MATLAB-callable routines which implement the multiscale algorithm and the statistical models developed in this paper.

  3. Raising the bar for reproducible science at the U.S. Environmental Protection Agency Office of Research and Development.

    PubMed

    George, Barbara Jane; Sobus, Jon R; Phelps, Lara P; Rashleigh, Brenda; Simmons, Jane Ellen; Hines, Ronald N

    2015-05-01

    Considerable concern has been raised regarding research reproducibility both within and outside the scientific community. Several factors possibly contribute to a lack of reproducibility, including a failure to adequately employ statistical considerations during study design, bias in sample selection or subject recruitment, errors in developing data inclusion/exclusion criteria, and flawed statistical analysis. To address some of these issues, several publishers have developed checklists that authors must complete. Others have either enhanced statistical expertise on existing editorial boards, or formed distinct statistics editorial boards. Although the U.S. Environmental Protection Agency, Office of Research and Development, already has a strong Quality Assurance Program, an initiative was undertaken to further strengthen statistics consideration and other factors in study design and also to ensure these same factors are evaluated during the review and approval of study protocols. To raise awareness of the importance of statistical issues and provide a forum for robust discussion, a Community of Practice for Statistics was formed in January 2014. In addition, three working groups were established to develop a series of questions or criteria that should be considered when designing or reviewing experimental, observational, or modeling focused research. This article describes the process used to develop these study design guidance documents, their contents, how they are being employed by the Agency's research enterprise, and expected benefits to Agency science. The process and guidance documents presented here may be of utility for any research enterprise interested in enhancing the reproducibility of its science. © The Author 2015. Published by Oxford University Press on behalf of the Society of Toxicology.

  4. Statistical and Hydrological evaluation of precipitation forecasts from IMD MME and ECMWF numerical weather forecasts for Indian River basins

    NASA Astrophysics Data System (ADS)

    Mohite, A. R.; Beria, H.; Behera, A. K.; Chatterjee, C.; Singh, R.

    2016-12-01

    Flood forecasting using hydrological models is an important and cost-effective non-structural flood management measure. For forecasting at short lead times, empirical models using real-time precipitation estimates have proven to be reliable. However, their skill depreciates with increasing lead time. Coupling a hydrologic model with real-time rainfall forecasts issued from numerical weather prediction (NWP) systems could increase the lead time substantially. In this study, we compared 1-5 day precipitation forecasts from the India Meteorological Department (IMD) Multi-Model Ensemble (MME) with European Centre for Medium-Range Weather Forecasts (ECMWF) NWP forecasts for over 86 major river basins in India. We then evaluated the hydrologic utility of these forecasts over the Basantpur catchment (approx. 59,000 km2) of the Mahanadi River basin. Coupled MIKE 11 RR (NAM) and MIKE 11 hydrodynamic (HD) models were used for the development of a flood forecast system (FFS). The RR model was calibrated using IMD station rainfall data. Cross-sections extracted from SRTM 30 were used as input to the MIKE 11 HD model. IMD started issuing operational MME forecasts from the year 2008, and hence, both the statistical and hydrologic evaluation were carried out from 2008-2014. The performance of the FFS was evaluated using both the NWP datasets separately for the year 2011, which was a large flood year in the Mahanadi River basin. We will present figures and metrics for statistical (threshold based statistics, skill in terms of correlation and bias) and hydrologic (Nash Sutcliffe efficiency, mean and peak error statistics) evaluation. The statistical evaluation will be at pan-India scale for all the major river basins and the hydrologic evaluation will be for the Basantpur catchment of the Mahanadi River basin.

  5. Statistical variability comparison in MODIS and AERONET derived aerosol optical depth over Indo-Gangetic Plains using time series modeling.

    PubMed

    Soni, Kirti; Parmar, Kulwinder Singh; Kapoor, Sangeeta; Kumar, Nishant

    2016-05-15

    Many studies of Aerosol Optical Depth (AOD) in the literature have used Moderate Resolution Imaging Spectroradiometer (MODIS) derived data, but the accuracy of satellite data in comparison to ground data derived from the AErosol RObotic NETwork (AERONET) has always been questionable. To address this, a comparative study of comprehensive ground-based and satellite data for the period 2001-2012 is modeled. The time series model is used for the accurate prediction of AOD, and statistical variability is compared to assess the performance of the model in both cases. Root mean square error (RMSE), mean absolute percentage error (MAPE), stationary R-squared, R-squared, maximum absolute percentage error (MaxAPE), normalized Bayesian information criterion (NBIC) and Ljung-Box methods are used to check the applicability and validity of the developed ARIMA models, revealing significant precision in the model performance. It was found that it is possible to predict the AOD by statistical modeling using time series obtained from past data of MODIS and AERONET as input data. Moreover, the result shows that MODIS data can be formed from AERONET data by adding 0.251627 ± 0.133589 and vice-versa by subtracting. From the forecast available for AODs for the next four years (2013-2017) by using the developed ARIMA model, it is concluded that the forecasted ground AOD has an increasing trend. Copyright © 2016 Elsevier B.V. All rights reserved.

  6. Summary of hydrologic modeling for the Delaware River Basin using the Water Availability Tool for Environmental Resources (WATER)

    USGS Publications Warehouse

    Williamson, Tanja N.; Lant, Jeremiah G.; Claggett, Peter; Nystrom, Elizabeth A.; Milly, Paul C.D.; Nelson, Hugh L.; Hoffman, Scott A.; Colarullo, Susan J.; Fischer, Jeffrey M.

    2015-11-18

    The Water Availability Tool for Environmental Resources (WATER) is a decision support system for the nontidal part of the Delaware River Basin that provides a consistent and objective method of simulating streamflow under historical, forecasted, and managed conditions. In order to quantify the uncertainty associated with these simulations, however, streamflow and the associated hydroclimatic variables of potential evapotranspiration, actual evapotranspiration, and snow accumulation and snowmelt must be simulated and compared to long-term, daily observations from sites. This report details model development and optimization, statistical evaluation of simulations for 57 basins ranging from 2 to 930 km2 and 11.0 to 99.5 percent forested cover, and how this statistical evaluation of daily streamflow relates to simulating environmental changes and management decisions that are best examined at monthly time steps normalized over multiple decades. The decision support system provides a database of historical spatial and climatic data for simulating streamflow for 2001–11, in addition to land-cover and general circulation model forecasts that focus on 2030 and 2060. WATER integrates geospatial sampling of landscape characteristics, including topographic and soil properties, with a regionally calibrated hillslope-hydrology model, an impervious-surface model, and hydroclimatic models that were parameterized by using three hydrologic response units: forested, agricultural, and developed land cover. This integration enables the regional hydrologic modeling approach used in WATER without requiring site-specific optimization or those stationary conditions inferred when using a statistical model.

  7. A rigidity transition and glassy dynamics in a model for confluent 3D tissues

    NASA Astrophysics Data System (ADS)

    Merkel, Matthias; Manning, M. Lisa

    The origin of rigidity in disordered materials is an outstanding open problem in statistical physics. Recently, a new type of rigidity transition was discovered in a family of models for 2D biological tissues, but the mechanisms responsible for rigidity remain unclear. This is not just a statistical physics problem, but also relevant for embryonic development, cancer growth, and wound healing. To gain insight into this rigidity transition and make new predictions about biological bulk tissues, we have developed a fully 3D self-propelled Voronoi (SPV) model. The model takes into account shape, elasticity, and self-propelled motion of the individual cells. We find that in the absence of self-propulsion, this model exhibits a rigidity transition that is controlled by a dimensionless model parameter describing the preferred cell shape, with an accompanying structural order parameter. In the presence of self-propulsion, the rigidity transition appears as a glass-like transition featuring caging and aging effects. Given the similarities between this transition and jamming in particulate solids, it is natural to ask if the two transitions are related. By comparing statistics of Voronoi geometries, we show the transitions are surprisingly close but demonstrably distinct. Furthermore, an index theorem used to identify topologically protected mechanical modes in jammed systems can be extended to these vertex-type models. In our model, residual stresses govern the transition and enter the index theorem in a different way compared to jammed particles, suggesting the origin of rigidity may be different between the two.

  8. COLLABORATIVE RESEARCH:USING ARM OBSERVATIONS & ADVANCED STATISTICAL TECHNIQUES TO EVALUATE CAM3 CLOUDS FOR DEVELOPMENT OF STOCHASTIC CLOUD-RADIATION

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Somerville, Richard

    2013-08-22

    The long-range goal of several past and current projects in our DOE-supported research has been the development of new and improved parameterizations of cloud-radiation effects and related processes, using ARM data, and the implementation and testing of these parameterizations in global models. The main objective of the present project being reported on here has been to develop and apply advanced statistical techniques, including Bayesian posterior estimates, to diagnose and evaluate features of both observed and simulated clouds. The research carried out under this project has been novel in two important ways. The first is that it is a key step in the development of practical stochastic cloud-radiation parameterizations, a new category of parameterizations that offers great promise for overcoming many shortcomings of conventional schemes. The second is that this work has brought powerful new tools to bear on the problem, because it has been a collaboration between a meteorologist with long experience in ARM research (Somerville) and a mathematician who is an expert on a class of advanced statistical techniques that are well-suited for diagnosing model cloud simulations using ARM observations (Shen).

  9. No-Impact Threshold Values for NRAP's Reduced Order Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Last, George V.; Murray, Christopher J.; Brown, Christopher F.

    2013-02-01

    The purpose of this study was to develop methodologies for establishing baseline datasets and statistical protocols for determining statistically significant changes between background concentrations and predicted concentrations that would be used to represent a contamination plume in the Gen II models being developed by NRAP’s Groundwater Protection team. The initial effort examined selected portions of two aquifer systems: the urban shallow-unconfined aquifer system of the Edwards-Trinity Aquifer System (being used to develop the ROM for carbon-rock aquifers), and a portion of the High Plains Aquifer (an unconsolidated and semi-consolidated sand and gravel aquifer, being used to develop the ROM for sandstone aquifers). Threshold values were determined for Cd, Pb, As, pH, and TDS that could be used to identify contamination due to predicted impacts from carbon sequestration storage reservoirs, based on recommendations found in the EPA’s "Unified Guidance for Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities" (US Environmental Protection Agency 2009). Results from this effort can be used to inform a "no change" scenario with respect to groundwater impacts, rather than the use of an MCL that could be significantly higher than existing concentrations in the aquifer.

  10. Towards good practice for health statistics: lessons from the Millennium Development Goal health indicators.

    PubMed

    Murray, Christopher J L

    2007-03-10

    Health statistics are at the centre of an increasing number of worldwide health controversies. Several factors are sharpening the tension between the supply and demand for high quality health information, and the health-related Millennium Development Goals (MDGs) provide a high-profile example. With thousands of indicators recommended but few measured well, the worldwide health community needs to focus its efforts on improving measurement of a small set of priority areas. Priority indicators should be selected on the basis of public-health significance and several dimensions of measurability. Health statistics can be divided into three types: crude, corrected, and predicted. Health statistics are necessary inputs to planning and strategic decision making, programme implementation, monitoring progress towards targets, and assessment of what works and what does not. Crude statistics that are biased have no role in any of these steps; corrected statistics are preferred. For strategic decision making, when corrected statistics are unavailable, predicted statistics can play an important part. For monitoring progress towards agreed targets and assessment of what works and what does not, however, predicted statistics should not be used. Perhaps the most effective method to decrease controversy over health statistics and to encourage better primary data collection and the development of better analytical methods is a strong commitment to provision of an explicit data audit trail. This initiative would make available the primary data, all post-data collection adjustments, models including covariates used for farcasting and forecasting, and necessary documentation to the public.

  11. Hybrid regulatory models: a statistically tractable approach to model regulatory network dynamics.

    PubMed

    Ocone, Andrea; Millar, Andrew J; Sanguinetti, Guido

    2013-04-01

    Computational modelling of the dynamics of gene regulatory networks is a central task of systems biology. For networks of small/medium scale, the dominant paradigm is represented by systems of coupled non-linear ordinary differential equations (ODEs). ODEs afford great mechanistic detail and flexibility, but calibrating these models to data is often an extremely difficult statistical problem. Here, we develop a general statistical inference framework for stochastic transcription-translation networks. We use a coarse-grained approach, which represents the system as a network of stochastic (binary) promoter and (continuous) protein variables. We derive an exact inference algorithm and an efficient variational approximation that allows scalable inference and learning of the model parameters. We demonstrate the power of the approach on two biological case studies, showing that the method allows a high degree of flexibility and is capable of testable novel biological predictions. http://homepages.inf.ed.ac.uk/gsanguin/software.html. Supplementary data are available at Bioinformatics online.

  12. An Assessment of Phylogenetic Tools for Analyzing the Interplay Between Interspecific Interactions and Phenotypic Evolution.

    PubMed

    Drury, J P; Grether, G F; Garland, T; Morlon, H

    2018-05-01

    Much ecological and evolutionary theory predicts that interspecific interactions often drive phenotypic diversification and that species phenotypes in turn influence species interactions. Several phylogenetic comparative methods have been developed to assess the importance of such processes in nature; however, the statistical properties of these methods have gone largely untested. Focusing mainly on scenarios of competition between closely-related species, we assess the performance of available comparative approaches for analyzing the interplay between interspecific interactions and species phenotypes. We find that many currently used statistical methods often fail to detect the impact of interspecific interactions on trait evolution, that sister-taxa analyses are particularly unreliable in general, and that recently developed process-based models have more satisfactory statistical properties. Methods for detecting predictors of species interactions are generally more reliable than methods for detecting character displacement. In weighing the strengths and weaknesses of different approaches, we hope to provide a clear guide for empiricists testing hypotheses about the reciprocal effect of interspecific interactions and species phenotypes and to inspire further development of process-based models.

  13. Using statistical and artificial neural network models to forecast potentiometric levels at a deep well in South Texas

    NASA Astrophysics Data System (ADS)

    Uddameri, V.

    2007-01-01

    Reliable forecasts of monthly and quarterly fluctuations in groundwater levels are necessary for short- and medium-term planning and management of aquifers to ensure proper service of seasonal demands within a region. Development of physically based transient mathematical models at this time scale poses considerable challenges due to lack of suitable data and other uncertainties. Artificial neural networks (ANN) possess flexible mathematical structures and are capable of mapping highly nonlinear relationships. Feed-forward neural network models were constructed and trained using the back-percolation algorithm to forecast monthly and quarterly time-series water levels at a well that taps into the deeper Evangeline formation of the Gulf Coast aquifer in Victoria, TX. Unlike unconfined formations, no causal relationships exist between water levels and hydro-meteorological variables measured in the vicinity of the well. As such, an endogenous forecasting model using dummy variables to capture short-term seasonal fluctuations and longer-term (decadal) trends was constructed. The root mean square error, mean absolute deviation and correlation coefficient (R) were noted to be 1.40, 0.33 and 0.77 m, respectively, for an evaluation dataset of quarterly measurements and 1.17, 0.46, and 0.88 m for an evaluative monthly dataset not used to train or test the model. These statistics were better for the ANN model than those developed using statistical regression techniques.
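
    The three evaluation statistics named in this abstract can be computed as in the minimal sketch below. It assumes that "mean absolute deviation" means the mean absolute forecast error and that R is the Pearson correlation between observed and forecast levels; the function name and the sample values are hypothetical and are not taken from the paper.

```python
import math


def forecast_errors(observed, predicted):
    """Return (RMSE, mean absolute error, Pearson R) for paired series."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")

    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mad = sum(abs(e) for e in errors) / n

    # Pearson correlation between observations and forecasts.
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)
    return rmse, mad, r


# Hypothetical water levels (m), for illustration only.
rmse, mad, r = forecast_errors([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(rmse, mad, r)
```

    RMSE penalizes large misses more heavily than the mean absolute error, which is why the two are usually reported together.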

  14. Statistical reconstruction for cosmic ray muon tomography.

    PubMed

    Schultz, Larry J; Blanpied, Gary S; Borozdin, Konstantin N; Fraser, Andrew M; Hengartner, Nicolas W; Klimenko, Alexei V; Morris, Christopher L; Orum, Chris; Sossong, Michael J

    2007-08-01

    Highly penetrating cosmic ray muons constantly shower the earth at a rate of about 1 muon per cm² per minute. We have developed a technique which exploits the multiple Coulomb scattering of these particles to perform nondestructive inspection without the use of artificial radiation. In prior work [1]-[3], we have described heuristic methods for processing muon data to create reconstructed images. In this paper, we present a maximum likelihood/expectation maximization tomographic reconstruction algorithm designed for the technique. This algorithm borrows much from techniques used in medical imaging, particularly emission tomography, but the statistics of muon scattering dictates differences. We describe the statistical model for multiple scattering, derive the reconstruction algorithm, and present simulated examples. We also propose methods to improve the robustness of the algorithm to experimental errors and events departing from the statistical model.

  15. Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

    NASA Astrophysics Data System (ADS)

    Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno

    2017-03-01

    This paper presents statistical analyses of rock engineering properties and the measured penetration rate of a tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of the TBM. Finally, a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R²) of the fuzzy logic model is the highest at 0.714, ahead of 0.667 for the multiple-variable nonlinear regression model.

  16. An Embedded Statistical Method for Coupling Molecular Dynamics and Finite Element Analyses

    NASA Technical Reports Server (NTRS)

    Saether, E.; Glaessgen, E.H.; Yamakov, V.

    2008-01-01

    The coupling of molecular dynamics (MD) simulations with finite element methods (FEM) yields computationally efficient models that link fundamental material processes at the atomistic level with continuum field responses at higher length scales. The theoretical challenge involves developing a seamless connection along an interface between two inherently different simulation frameworks. Various specialized methods have been developed to solve particular classes of problems. Many of these methods link the kinematics of individual MD atoms with FEM nodes at their common interface, necessarily requiring that the finite element mesh be refined to atomic resolution. Some of these coupling approaches also require simulations to be carried out at 0 K and restrict modeling to two-dimensional material domains due to difficulties in simulating full three-dimensional material processes. In the present work, a new approach to MD-FEM coupling is developed based on a restatement of the standard boundary value problem used to define a coupled domain. The method replaces a direct linkage of individual MD atoms and finite element (FE) nodes with a statistical averaging of atomistic displacements in local atomic volumes associated with each FE node in an interface region. The FEM and MD computational systems are effectively independent and communicate only through an iterative update of their boundary conditions. With the use of statistical averages of the atomistic quantities to couple the two computational schemes, the developed approach is referred to as an embedded statistical coupling method (ESCM). ESCM provides an enhanced coupling methodology that is inherently applicable to three-dimensional domains, avoids discretization of the continuum model to atomic scale resolution, and permits finite temperature states to be applied.

  17. Data Modeling for Preservice Teachers and Everyone Else

    ERIC Educational Resources Information Center

    Petrosino, Anthony J.; Mann, Michele J.

    2018-01-01

    Although data modeling, the employment of statistical reasoning for the purpose of investigating questions about the world, is central to both mathematics and science, it is rarely emphasized in K-16 instruction. The current work focuses on developing thinking about data modeling with undergraduates in general and preservice teachers in…

  18. Tour-based model development for TxDOT: implementation steps for the tour-based model design option and the data needs.

    DOT National Transportation Integrated Search

    2009-10-01

    Travel demand modeling, in recent years, has seen a paradigm shift with an emphasis on analyzing travel at the individual level rather than using direct statistical projections of aggregate travel demand as in the trip-based approach. Specificall...

  19. Modeling evaporation of Jet A, JP-7 and RP-1 drops at 1 to 15 bars

    NASA Technical Reports Server (NTRS)

    Harstad, K.; Bellan, J.

    2003-01-01

    A model describing the evaporation of an isolated drop of a multicomponent fuel containing hundreds of species has been developed. The model is based on Continuous Thermodynamics concepts wherein the composition of a fuel is statistically described using a Probability Distribution Function (PDF).

  20. Examining Elementary Social Studies Marginalization: A Multilevel Model

    ERIC Educational Resources Information Center

    Fitchett, Paul G.; Heafner, Tina L.; Lambert, Richard G.

    2014-01-01

    Utilizing data from the National Center for Education Statistics Schools and Staffing Survey (SASS), a multilevel model (Hierarchical Linear Model) was developed to examine the association of teacher/classroom and state level indicators on reported elementary social studies instructional time. Findings indicated that state testing policy was a…

  1. Efficiency Analysis of Public Universities in Thailand

    ERIC Educational Resources Information Center

    Kantabutra, Saranya; Tang, John C. S.

    2010-01-01

    This paper examines the performance of Thai public universities in terms of efficiency, using a non-parametric approach called data envelopment analysis. Two efficiency models, the teaching efficiency model and the research efficiency model, are developed and the analysis is conducted at the faculty level. Further statistical analyses are also…

  2. Wheat mill stream properties for discrete element method modeling

    USDA-ARS's Scientific Manuscript database

    A discrete phase approach based on individual wheat kernel characteristics is needed to overcome the limitations of previous statistical models and accurately predict the milling behavior of wheat. As a first step to develop a discrete element method (DEM) model for the wheat milling process, this s...

  3. A modified F-test for evaluating model performance by including both experimental and simulation uncertainties

    USDA-ARS's Scientific Manuscript database

    Experimental and simulation uncertainties have not been included in many of the statistics used in assessing agricultural model performance. The objectives of this study were to develop an F-test that can be used to evaluate model performance considering experimental and simulation uncertainties, an...

  4. Physical concepts in the development of constitutive equations

    NASA Technical Reports Server (NTRS)

    Cassenti, B. N.

    1985-01-01

    Proposed viscoplastic material models include in their formulation observed material response but do not generally incorporate principles from thermodynamics, statistical mechanics, and quantum mechanics. Numerous hypotheses were made for material response based on first principles. Many of these hypotheses were tested experimentally. The proposed viscoplastic theories and the experimental basis of these hypotheses must be checked against the hypotheses. The physics of thermodynamics, statistical mechanics and quantum mechanics, and the effects of defects, are reviewed for their application to the development of constitutive laws.

  5. Radio Occultation Investigation of the Rings of Saturn and Uranus

    NASA Technical Reports Server (NTRS)

    Marouf, Essam A.

    1997-01-01

    The proposed work addresses two main objectives: (1) to pursue the development of the random diffraction screen model for analytical/computational characterization of the extinction and near-forward scattering by ring models that include particle crowding, uniform clustering, and clustering along preferred orientations (anisotropy). The characterization is crucial for proper interpretation of past (Voyager) and future (Cassini) ring occultation observations in terms of physical ring properties, and is needed to address outstanding puzzles in the interpretation of the Voyager radio occultation data sets; (2) to continue the development of spectral analysis techniques to identify and characterize the power scattered by all features of Saturn's rings that can be resolved in the Voyager radio occultation observations, and to use the results to constrain the maximum particle size and its abundance. Characterization of the variability of surface mass density among the main ring features and within individual features is important for constraining the ring mass and is relevant to investigations of ring dynamics and origin. We completed the development of the stochastic geometry (random screen) model for the interaction of electromagnetic waves with planetary ring models, and used the model to relate the oblique optical depth and the angular spectrum of the near-forward scattered signal to statistical averages of the stochastic geometry of the randomly blocked area. We developed analytical results based on the assumption of Poisson statistics for particle positions, and investigated the dependence of the oblique optical depth and angular spectrum on the fractional area blocked, vertical ring profile, and incidence angle when the volume fraction is small. We demonstrated agreement with the classical radiative transfer predictions for oblique incidence. We also developed simulation procedures to generate statistical realizations of random screens corresponding to uniformly packed ring models, and used the results to characterize the dependence of the extinction and near-forward scattering on ring thickness, packing fraction, and the ring opening angle.

  6. Mapping irrigated lands at 250-m scale by merging MODIS data and National Agricultural Statistics

    USGS Publications Warehouse

    Pervez, Md Shahriar; Brown, Jesslyn F.

    2010-01-01

    Accurate geospatial information on the extent of irrigated land improves our understanding of agricultural water use, local land surface processes, conservation or depletion of water resources, and components of the hydrologic budget. We have developed a method in a geospatial modeling framework that assimilates irrigation statistics with remotely sensed parameters describing vegetation growth conditions in areas with agricultural land cover to spatially identify irrigated lands at 250-m cell size across the conterminous United States for 2002. The geospatial model result, known as the Moderate Resolution Imaging Spectroradiometer (MODIS) Irrigated Agriculture Dataset (MIrAD-US), identified irrigated lands with reasonable accuracy in California and semiarid Great Plains states with overall accuracies of 92% and 75% and kappa statistics of 0.75 and 0.51, respectively. A quantitative accuracy assessment of MIrAD-US for the eastern region has not yet been conducted, and qualitative assessment shows that model improvements are needed for the humid eastern regions where the distinction in annual peak NDVI between irrigated and non-irrigated crops is minimal and county sizes are relatively small. This modeling approach enables consistent mapping of irrigated lands based upon USDA irrigation statistics and should lead to better understanding of spatial trends in irrigated lands across the conterminous United States. An improved version of the model with revised datasets is planned and will employ 2007 USDA irrigation statistics.
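
    The abstract reports both overall accuracy and the kappa statistic. The sketch below shows how the two are conventionally derived from a confusion matrix (Cohen's kappa). The function name and the counts are hypothetical and are not the MIrAD-US assessment data.

```python
def overall_accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] is the number of samples of reference class i
    that were mapped as class j.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(k)) / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical 2-class matrix (irrigated vs. non-irrigated), illustration only.
accuracy, kappa = overall_accuracy_and_kappa([[40, 10], [10, 40]])
print(accuracy, kappa)
```

    Kappa discounts the agreement that chance alone would produce, so it is lower than overall accuracy whenever chance agreement is above zero.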

  7. PyEvolve: a toolkit for statistical modelling of molecular evolution.

    PubMed

    Butterfield, Andrew; Vedagiri, Vivek; Lang, Edward; Lawrence, Cath; Wakefield, Matthew J; Isaev, Alexander; Huttley, Gavin A

    2004-01-05

    Examining the distribution of variation has proven an extremely profitable technique in the effort to identify sequences of biological significance. Most approaches in the field, however, evaluate only the conserved portions of sequences - ignoring the biological significance of sequence differences. A suite of sophisticated likelihood based statistical models from the field of molecular evolution provides the basis for extracting the information from the full distribution of sequence variation. The number of different problems to which phylogeny-based maximum likelihood calculations can be applied is extensive. Available software packages that can perform likelihood calculations suffer from a lack of flexibility and scalability, or employ error-prone approaches to model parameterisation. Here we describe the implementation of PyEvolve, a toolkit for the application of existing, and development of new, statistical methods for molecular evolution. We present the object architecture and design schema of PyEvolve, which includes an adaptable multi-level parallelisation schema. The approach for defining new methods is illustrated by implementing a novel dinucleotide model of substitution that includes a parameter for mutation of methylated CpG's, which required 8 lines of standard Python code to define. Benchmarking was performed using either a dinucleotide or codon substitution model applied to an alignment of BRCA1 sequences from 20 mammals, or a 10 species subset. Up to five-fold parallel performance gains over serial were recorded. Compared to leading alternative software, PyEvolve exhibited significantly better real world performance for parameter rich models with a large data set, reducing the time required for optimisation from approximately 10 days to approximately 6 hours. PyEvolve provides flexible functionality that can be used either for statistical modelling of molecular evolution, or the development of new methods in the field. The toolkit can be used interactively or by writing and executing scripts. The toolkit uses efficient processes for specifying the parameterisation of statistical models, and implements numerous optimisations that make highly parameter rich likelihood functions solvable within hours on multi-cpu hardware. PyEvolve can be readily adapted in response to changing computational demands and hardware configurations to maximise performance. PyEvolve is released under the GPL and can be downloaded from http://cbis.anu.edu.au/software.

  8. Parameterizing Phrase Based Statistical Machine Translation Models: An Analytic Study

    ERIC Educational Resources Information Center

    Cer, Daniel

    2011-01-01

    The goal of this dissertation is to determine the best way to train a statistical machine translation system. I first develop a state-of-the-art machine translation system called Phrasal and then use it to examine a wide variety of potential learning algorithms and optimization criteria and arrive at two very surprising results. First, despite the…

  9. Statistical validation and an empirical model of hydrogen production enhancement found by utilizing passive flow disturbance in the steam-reformation process

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Erickson, Paul A.; Liao, Chang-hsien

    2007-11-15

    A passive flow disturbance has been proven to enhance the conversion of fuel in a methanol-steam reformer. This study presents a statistical validation of the experiment based on a standard 2^k factorial experiment design and the resulting empirical model of the enhanced hydrogen producing process. A factorial experiment design was used to statistically analyze the effects and interactions of various input factors in the experiment. Three input factors, including the number of flow disturbers, catalyst size, and reactant flow rate were investigated for their effects on the fuel conversion in the steam-reformation process. Based on the experimental results, an empirical model was developed and further evaluated with an uncertainty analysis and interior point data. (author)
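
    In a two-level factorial design, the main effect of a factor is the mean response at its high level minus the mean response at its low level. The sketch below illustrates that calculation only. The function name, the two-factor layout, and the response values are hypothetical and are not the reformer data or the authors' analysis.

```python
from itertools import product


def main_effects(responses, k):
    """Main effects for a full two-level factorial design with k factors.

    responses maps each run, coded as a tuple of -1/+1 levels,
    to its measured response.
    """
    runs = list(product((-1, 1), repeat=k))
    effects = []
    for factor in range(k):
        high = [responses[run] for run in runs if run[factor] == 1]
        low = [responses[run] for run in runs if run[factor] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


# Hypothetical 2^2 design: (factor A level, factor B level) -> response.
example = {(-1, -1): 10.0, (1, -1): 20.0, (-1, 1): 14.0, (1, 1): 24.0}
print(main_effects(example, 2))
```

    Interaction effects follow the same pattern, with the sign of each run given by the product of the coded levels of the interacting factors.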

  10. Towards random matrix model of breaking the time-reversal invariance of elastic waves in chaotic cavities by feedback

    NASA Astrophysics Data System (ADS)

    Antoniuk, Oleg; Sprik, Rudolf

    2010-03-01

    We developed a random matrix model to describe the statistics of resonances in an acoustic cavity with broken time-reversal invariance. Time-reversal invariance breaking is achieved by connecting an amplified feedback loop between two transducers on the surface of the cavity. The model is based on the approach of [1], which describes time-reversal properties of the cavity without a feedback loop. Statistics of eigenvalues (nearest neighbor resonance spacing distributions and spectral rigidity) have been calculated and compared to the statistics obtained from our experimental data. Experiments have been performed on an aluminum block of chaotic shape confining ultrasound waves. [1] Carsten Draeger and Mathias Fink, One-channel time-reversal in chaotic cavities: Theoretical limits, Journal of Acoustical Society of America, vol. 105, Nr. 2, pp. 611-617 (1999)

  11. LES/PDF studies of joint statistics of mixture fraction and progress variable in piloted methane jet flames with inhomogeneous inlet flows

    NASA Astrophysics Data System (ADS)

    Zhang, Pei; Barlow, Robert; Masri, Assaad; Wang, Haifeng

    2016-11-01

    The mixture fraction and progress variable are often used as independent variables for describing turbulent premixed and non-premixed flames. There is a growing interest in using these two variables for describing partially premixed flames. The joint statistical distribution of the mixture fraction and progress variable is of great interest in developing models for partially premixed flames. In this work, we conduct predictive studies of the joint statistics of mixture fraction and progress variable in a series of piloted methane jet flames with inhomogeneous inlet flows. The employed models combine large eddy simulations with the Monte Carlo probability density function (PDF) method. The joint PDFs and marginal PDFs are examined in detail by comparing the model predictions and the measurements. Different presumed shapes of the joint PDFs are also evaluated.

  12. Introduction to the Special Issue: Advancing the State-of-the-Science in Reading Research through Modeling.

    PubMed

    Zevin, Jason D; Miller, Brett

    Reading research is increasingly a multi-disciplinary endeavor involving more complex, team-based science approaches. These approaches offer the potential of capturing the complexity of reading development, the emergence of individual differences in reading performance over time, how these differences relate to the development of reading difficulties and disability, and more fully understanding the nature of skilled reading in adults. This special issue focuses on the potential opportunities and insights that early and richly integrated advanced statistical and computational modeling approaches can provide to our foundational (and translational) understanding of reading. The issue explores how computational and statistical modeling, using both observed and simulated data, can serve as a contact point among research domains and topics, complement other data sources and critically provide analytic advantages over current approaches.

  13. Scenarios for Evolving Seismic Crises: Possible Communication Strategies

    NASA Astrophysics Data System (ADS)

    Steacy, S.

    2015-12-01

    Recent advances in operational earthquake forecasting mean that we are very close to being able to confidently compute changes in earthquake probability as seismic crises develop. For instance, we now have statistical models such as ETAS and STEP which demonstrate considerable skill in forecasting earthquake rates, and recent advances in Coulomb based models are also showing much promise. Communicating changes in earthquake probability is likely to be very difficult, however, as the absolute probability of a damaging event is likely to remain quite small despite a significant increase in the relative value. Here, we use a hybrid Coulomb/statistical model to compute probability changes for a series of earthquake scenarios in New Zealand. We discuss the strengths and limitations of the forecasts and suggest a number of possible mechanisms that might be used to communicate results in an actual developing seismic crisis.
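
    The communication problem described here (a large relative gain, but a small absolute probability) can be made concrete with a short calculation. The sketch assumes events follow a Poisson process with a constant rate over the forecast window. The rates are invented for illustration and do not come from the ETAS, STEP, or Coulomb models mentioned in the abstract.

```python
import math


def prob_at_least_one(rate_per_day, days):
    """P(at least one event) for a Poisson process with a constant rate."""
    # -expm1(-x) equals 1 - exp(-x) but is more accurate for small x.
    return -math.expm1(-rate_per_day * days)


# Hypothetical rates of damaging events per day (illustration only).
background_rate = 0.0001
crisis_rate = 100 * background_rate  # rate raised 100-fold during a sequence

baseline = prob_at_least_one(background_rate, 7)
elevated = prob_at_least_one(crisis_rate, 7)
print(f"weekly probability: {baseline:.5f} -> {elevated:.5f}")
print(f"relative gain: {elevated / baseline:.1f}x")
```

    Under these assumed numbers the weekly probability rises by nearly two orders of magnitude yet stays below 10%, which is the tension the abstract points to.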

  14. Computing maximum-likelihood estimates for parameters of the National Descriptive Model of Mercury in Fish

    USGS Publications Warehouse

    Donato, David I.

    2012-01-01

    This report presents the mathematical expressions and the computational techniques required to compute maximum-likelihood estimates for the parameters of the National Descriptive Model of Mercury in Fish (NDMMF), a statistical model used to predict the concentration of methylmercury in fish tissue. The expressions and techniques reported here were prepared to support the development of custom software capable of computing NDMMF parameter estimates more quickly and using less computer memory than is currently possible with available general-purpose statistical software. Computation of maximum-likelihood estimates for the NDMMF by numerical solution of a system of simultaneous equations through repeated Newton-Raphson iterations is described. This report explains the derivation of the mathematical expressions required for computational parameter estimation in sufficient detail to facilitate future derivations for any revised versions of the NDMMF that may be developed.
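
    The estimation strategy named in this abstract, solving the likelihood equations by repeated Newton-Raphson iterations, is sketched below on a deliberately simple stand-in: a two-parameter logistic regression. This is not the NDMMF, and the function name and data are hypothetical. Each iteration solves the 2x2 linear system formed by the score vector and the information matrix.

```python
import math


def newton_raphson_logistic(x, y, max_iter=25, tol=1e-10):
    """Maximum-likelihood fit of P(y=1) = 1 / (1 + exp(-(b0 + b1*x)))."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score vector (gradient of log-likelihood)
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system (information matrix) * step = score.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


# Hypothetical, non-separable data for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
print(newton_raphson_logistic(xs, ys))
```

    For a model with many parameters the same loop applies, but the hand-written 2x2 solve would be replaced by a general linear solver.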

  15. K-nearest neighbors based methods for identification of different gear crack levels under different motor speeds and loads: Revisited

    NASA Astrophysics Data System (ADS)

    Wang, Dong

    2016-03-01

    Gears are the most commonly used components in mechanical transmission systems. Their failures may cause transmission system breakdown and result in economic loss. Identification of different gear crack levels is important to prevent any unexpected gear failure because gear cracks lead to gear tooth breakage. Signal-processing-based methods mainly require expertise to interpret gear fault signatures, which ordinary users usually lack. In order to automatically identify different gear crack levels, intelligent gear crack identification methods should be developed. The previous case studies experimentally proved that K-nearest neighbors based methods exhibit high prediction accuracies for identification of 3 different gear crack levels under different motor speeds and loads. In this short communication, to further enhance prediction accuracies of existing K-nearest neighbors based methods and extend identification of 3 different gear crack levels to identification of 5 different gear crack levels, redundant statistical features are constructed by using Daubechies 44 (db44) binary wavelet packet transform at different wavelet decomposition levels, prior to the use of a K-nearest neighbors method. The dimensionality of redundant statistical features is 620, which provides richer gear fault signatures. Since many of these statistical features are redundant and highly correlated with each other, dimensionality reduction of redundant statistical features is conducted to obtain new significant statistical features. Finally, the K-nearest neighbors method is used to identify 5 different gear crack levels under different motor speeds and loads. A case study including 3 experiments is investigated to demonstrate that the developed method provides higher prediction accuracies than the existing K-nearest neighbors based methods for recognizing different gear crack levels under different motor speeds and loads. Based on the new significant statistical features, some other popular statistical models, including linear discriminant analysis, quadratic discriminant analysis, classification and regression tree and naive Bayes classifier, are compared with the developed method. The results show that the developed method has the highest prediction accuracies among these statistical models. Additionally, selection of the number of new significant features and parameter selection of K-nearest neighbors are thoroughly investigated.
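
    The final classification step described above can be sketched as a plain K-nearest neighbors vote. The sketch assumes Euclidean distance and a simple majority vote, and it leaves out the wavelet packet feature construction and dimensionality reduction entirely. The function name, feature vectors, and labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors for two crack levels (0 and 1).
features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = [0, 0, 0, 1, 1, 1]
print(knn_predict(features, labels, (0.2, 0.3)))
```

    The choice of k trades noise sensitivity (small k) against blurring of class boundaries (large k), which is why the abstract treats it as a parameter to be investigated.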

  16. A statistical model describing combined irreversible electroporation and electroporation-induced blood-brain barrier disruption.

    PubMed

    Sharabi, Shirley; Kos, Bor; Last, David; Guez, David; Daniels, Dianne; Harnof, Sagi; Mardor, Yael; Miklavcic, Damijan

    2016-03-01

    Electroporation-based therapies such as electrochemotherapy (ECT) and irreversible electroporation (IRE) are emerging as promising tools for treatment of tumors. When applied to the brain, electroporation can also induce transient blood-brain-barrier (BBB) disruption in volumes extending beyond IRE, thus enabling efficient drug penetration. The main objective of this study was to develop a statistical model predicting cell death and BBB disruption induced by electroporation. This model can be used for individual treatment planning. Cell death and BBB disruption models were developed based on the Peleg-Fermi model in combination with numerical models of the electric field. The model calculates the electric field thresholds for cell kill and BBB disruption and describes the dependence on the number of treatment pulses. The model was validated using in vivo experimental data consisting of rat brain MRIs acquired after electroporation treatments. Linear regression analysis confirmed that the model described the IRE and BBB disruption volumes as a function of the number of treatment pulses (r² = 0.79, p < 0.008; r² = 0.91, p < 0.001). The results presented a strong plateau effect as the pulse number increased. The ratio between complete cell death and no cell death thresholds was relatively narrow (between 0.88 and 0.91) even for small numbers of pulses and depended weakly on the number of pulses. For BBB disruption, the ratio increased with the number of pulses. BBB disruption radii were on average 67% ± 11% larger than IRE volumes. The statistical model can be used to describe the dependence of treatment effects on the number of pulses independent of the experimental setup.
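
    As a rough illustration of the kind of model named in this record, the sketch below evaluates a commonly cited form of the Peleg-Fermi survival curve, in which the critical field and the curve width both decay exponentially with the number of pulses. The abstract does not give the exact parameterization used in the study, so the functional form, the parameter names (ec0, k1, a0, k2), and the numerical values are assumptions chosen only for demonstration.

```python
import math


def peleg_fermi_survival(e_field, n_pulses, ec0, k1, a0, k2):
    """Surviving fraction under a commonly cited Peleg-Fermi form (assumed here)."""
    # Critical field (50% survival) and curve width, both decaying with pulses.
    e_c = ec0 * math.exp(-k1 * n_pulses)
    width = a0 * math.exp(-k2 * n_pulses)
    return 1.0 / (1.0 + math.exp((e_field - e_c) / width))


# Hypothetical parameters (fields in V/cm, decay rates per pulse), not fitted values.
params = dict(ec0=1000.0, k1=0.01, a0=150.0, k2=0.01)

# At the critical field the surviving fraction is one half by construction.
at_critical = peleg_fermi_survival(1000.0 * math.exp(-0.5), 50, **params)

# At a fixed field, more pulses lower the predicted surviving fraction.
few_pulses = peleg_fermi_survival(800.0, 10, **params)
many_pulses = peleg_fermi_survival(800.0, 90, **params)
```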

  17. Prediction Models for 30-Day Mortality and Complications After Total Knee and Hip Arthroplasties for Veteran Health Administration Patients With Osteoarthritis.

    PubMed

    Harris, Alex Hs; Kuo, Alfred C; Bowe, Thomas; Gupta, Shalini; Nordin, David; Giori, Nicholas J

    2018-05-01

    Statistical models to preoperatively predict patients' risk of death and major complications after total joint arthroplasty (TJA) could improve the quality of preoperative management and informed consent. Although risk models for TJA exist, they have limitations including poor transparency and/or unknown or poor performance. Thus, it is currently impossible to know how well currently available models predict short-term complications after TJA, or if newly developed models are more accurate. We sought to develop and conduct cross-validation of predictive risk models, and report details and performance metrics as benchmarks. Over 90 preoperative variables were used as candidate predictors of death and major complications within 30 days for Veterans Health Administration patients with osteoarthritis who underwent TJA. Data were split into 3 samples: for selection of model tuning parameters, model development, and cross-validation. C-indexes (discrimination) and calibration plots were produced. A total of 70,569 patients diagnosed with osteoarthritis who received primary TJA were included. C-statistics and bootstrapped confidence intervals for the cross-validation of the boosted regression models were highest for cardiac complications (0.75; 0.71-0.79) and 30-day mortality (0.73; 0.66-0.79) and lowest for deep vein thrombosis (0.59; 0.55-0.64) and return to the operating room (0.60; 0.57-0.63). Moderately accurate predictive models of 30-day mortality and cardiac complications after TJA in Veterans Health Administration patients were developed and internally cross-validated. By reporting model coefficients and performance metrics, other model developers can test these models on new samples and have a procedure- and indication-specific benchmark to surpass.
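
    The C-statistic reported in this record is the probability that a randomly chosen patient who had the event received a higher predicted risk than a randomly chosen patient who did not. A minimal sketch of that calculation follows; the predicted risks and outcomes are hypothetical, and the bootstrapped confidence intervals mentioned in the abstract are not reproduced.

```python
def c_statistic(risks, outcomes):
    """Concordance between predicted risks and binary outcomes (1 = event)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for ne in non_events:
            if e > ne:
                concordant += 1.0
            elif e == ne:
                concordant += 0.5  # tied predictions count as half
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted risks and observed 30-day outcomes.
risks = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60]
outcomes = [0, 0, 1, 0, 1, 1]
c_index = c_statistic(risks, outcomes)
```

    A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of events from non-events.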

  18. Statistically significant relational data mining

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann

    This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.

  19. Statistical Systems with Z

    NASA Astrophysics Data System (ADS)

    William, Peter

    In this dissertation several two-dimensional statistical systems exhibiting discrete Z(n) symmetries are studied. For this purpose a newly developed algorithm to compute the partition function of these models exactly is utilized. The zeros of the partition function are examined in order to obtain information about the observable quantities at the critical point. This occurs in the form of critical exponents of the order parameters which characterize phenomena at the critical point. The correlation length exponent is found to agree very well with those computed from strong coupling expansions for the mass gap and with Monte Carlo results. In Feynman's path integral formalism the partition function of a statistical system can be related to the vacuum expectation value of the time-ordered product of the observable quantities of the corresponding field theoretic model. Hence a generalization of ordinary scale invariance in the form of conformal invariance is focussed upon. This principle is well suited to two-dimensional statistical models undergoing second-order phase transitions at criticality. The conformal anomaly specifies the universality class to which these models belong. From an evaluation of the partition function, the free energy at criticality is computed to determine the conformal anomaly of these models. The conformal anomalies of all the models considered here are in good agreement with the predicted values.

  20. Risk prediction score for death of traumatised and injured children

    PubMed Central

    2014-01-01

    Background: Injury prediction scores facilitate the development of clinical management protocols to decrease mortality. However, most of the previously developed scores are limited in scope and are non-specific for use in children. We aimed to develop and validate a risk prediction model of death for injured and traumatised Thai children. Methods: Our cross-sectional study included 43,516 injured children from 34 emergency services. A risk prediction model was derived using a logistic regression analysis that included 15 predictors. Model performance was assessed using the concordance statistic (C-statistic) and the observed per expected (O/E) ratio. Internal validation of the model was performed using a 200-repetition bootstrap analysis. Results: Death occurred in 1.7% of the injured children (95% confidence interval [95% CI]: 1.57–1.82). Ten predictors (i.e., age, airway intervention, physical injury mechanism, three injured body regions, the Glasgow Coma Scale, and three vital signs) were significantly associated with death. The C-statistic and the O/E ratio were 0.938 (95% CI: 0.929–0.947) and 0.86 (95% CI: 0.70–1.02), respectively. The scoring scheme classified three risk stratifications with respective likelihood ratios of 1.26 (95% CI: 1.25–1.27), 2.45 (95% CI: 2.42–2.52), and 4.72 (95% CI: 4.57–4.88) for low, intermediate, and high risks of death. Internal validation showed good model performance (C-statistic = 0.938, 95% CI: 0.926–0.952) and a small calibration bias of 0.002 (95% CI: 0.0005–0.003). Conclusions: We developed a simplified Thai pediatric injury death prediction score with satisfactory calibrated and discriminative performance in emergency room settings. PMID:24575982

  1. Comparison of simulation modeling and satellite techniques for monitoring ecological processes

    NASA Technical Reports Server (NTRS)

    Box, Elgene O.

    1988-01-01

    In 1985 improvements were made in the world climatic data base for modeling and predictive mapping; in individual process models and the overall carbon-balance models; and in the interface software for mapping the simulation results. Statistical analysis of the data base was begun. In 1986 mapping was shifted to NASA-Goddard. The initial approach involving pattern comparisons was modified to a more statistical approach. A major accomplishment was the expansion and improvement of a global data base of measurements of biomass and primary production, to complement the simulation data. The main accomplishments during 1987 included: production of a master tape with all environmental and satellite data and model results for the 1600 sites; development of a complete mapping system used for the initial color maps comparing annual and monthly patterns of Normalized Difference Vegetation Index (NDVI), actual evapotranspiration, net primary productivity, gross primary productivity, and net ecosystem production; collection of more biosphere measurements for eventual improvement of the biological models; and development of some initial monthly models for primary productivity, based on satellite data.

  2. The GenABEL Project for statistical genomics

    PubMed Central

    Karssen, Lennart C.; van Duijn, Cornelia M.; Aulchenko, Yurii S.

    2016-01-01

    Development of free/libre open source software is usually done by a community of people with an interest in the tool. For scientific software, however, this is less often the case. Most scientific software is written by only a few authors, often a student working on a thesis. Once the paper describing the tool has been published, the tool is no longer developed further and is left to its own devices. Here we describe the broad, multidisciplinary community we formed around a set of tools for statistical genomics. The GenABEL project for statistical omics actively promotes open interdisciplinary development of statistical methodology and its implementation in efficient and user-friendly software under an open source licence. The software tools developed within the project collectively make up the GenABEL suite, which currently consists of eleven tools. The open framework of the project actively encourages involvement of the community in all stages, from formulation of methodological ideas to application of software to specific data sets. A web forum is used to channel user questions and discussions, further promoting the use of the GenABEL suite. Developer discussions take place on a dedicated mailing list, and development is further supported by robust development practices including use of public version control, code review and continuous integration. Use of this open science model attracts contributions from users and developers outside the “core team”, facilitating agile statistical omics methodology development and fast dissemination. PMID:27347381

  3. Statistics for laminar flamelet modeling

    NASA Technical Reports Server (NTRS)

    Cant, R. S.; Rutland, C. J.; Trouve, A.

    1990-01-01

    Statistical information required to support modeling of turbulent premixed combustion by laminar flamelet methods is extracted from a database of the results of Direct Numerical Simulation of turbulent flames. The simulations were carried out previously by Rutland (1989) using a pseudo-spectral code on a three dimensional mesh of 128 points in each direction. One-step Arrhenius chemistry was employed together with small heat release. A framework for the interpretation of the data is provided by the Bray-Moss-Libby model for the mean turbulent reaction rate. Probability density functions are obtained over surfaces of the constant reaction progress variable for the tangential strain rate and the principal curvature. New insights are gained which will greatly aid the development of modeling approaches.

  4. A statistical model of aggregate fragmentation

    NASA Astrophysics Data System (ADS)

    Spahn, F.; Vieira Neto, E.; Guimarães, A. H. F.; Gorban, A. N.; Brilliantov, N. V.

    2014-01-01

    A statistical model of fragmentation of aggregates is proposed, based on the stochastic propagation of cracks through the body. The propagation rules are formulated on a lattice and mimic two important features of the process: a crack moves against the stress gradient while dissipating energy during its growth. We perform numerical simulations of the model for a two-dimensional lattice and reveal that the mass distribution for small- and intermediate-size fragments obeys a power law, F(m) ∝ m^(-3/2), in agreement with experimental observations. We develop an analytical theory which explains the detected power law and demonstrate that the overall fragment mass distribution in our model agrees qualitatively with that observed in experiments.
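
    To show how a power-law exponent such as the 3/2 reported in this record might be checked against data, the sketch below draws synthetic fragment masses from a continuous power law and recovers the exponent by maximum likelihood. It does not reproduce the lattice crack model. Treating F(m) as a probability density, the lower cutoff m_min, the sample size, and the random seed are all assumptions made for illustration.

```python
import math
import random


def sample_power_law(n, alpha, m_min, rng):
    """Draw n masses from a density proportional to m**(-alpha) for m >= m_min."""
    # Inverse-transform sampling of the continuous power-law distribution.
    return [m_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


def estimate_exponent(masses, m_min):
    """Maximum-likelihood estimate of alpha for a continuous power law."""
    log_sum = sum(math.log(m / m_min) for m in masses)
    return 1.0 + len(masses) / log_sum


rng = random.Random(0)  # fixed seed so the sketch is reproducible
masses = sample_power_law(20000, alpha=1.5, m_min=1.0, rng=rng)
alpha_hat = estimate_exponent(masses, m_min=1.0)
```

    With 20,000 samples the standard error of the estimate is roughly (alpha - 1) / sqrt(n), so alpha_hat should land close to 1.5.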

  5. Improving a complex finite-difference ground water flow model through the use of an analytic element screening model

    USGS Publications Warehouse

    Hunt, R.J.; Anderson, M.P.; Kelson, V.A.

    1998-01-01

    This paper demonstrates that analytic element models have potential as powerful screening tools that can facilitate or improve calibration of more complicated finite-difference and finite-element models. We demonstrate how a two-dimensional analytic element model was used to identify errors in a complex three-dimensional finite-difference model caused by incorrect specification of boundary conditions. An improved finite-difference model was developed using boundary conditions developed from a far-field analytic element model. Calibration of a revised finite-difference model was achieved using fewer zones of hydraulic conductivity and lake bed conductance than the original finite-difference model. Calibration statistics were also improved in that simulated base-flows were much closer to measured values. The improved calibration is due mainly to improved specification of the boundary conditions made possible by first solving the far-field problem with an analytic element model.

  6. Statistical properties of exciton fine structure splitting and polarization angles in quantum dot ensembles

    NASA Astrophysics Data System (ADS)

    Gong, Ming; Hofer, B.; Zallo, E.; Trotta, R.; Luo, Jun-Wei; Schmidt, O. G.; Zhang, Chuanwei

    2014-05-01

    We develop an effective model to describe the statistical properties of exciton fine structure splitting (FSS) and polarization angle in quantum dot ensembles (QDEs) using only a few symmetry-related parameters. The connection between the effective model and the random matrix theory is established. This effective model is verified both theoretically and experimentally using several rather different types of QDEs, each of which contains hundreds to thousands of QDs. The model naturally addresses three fundamental issues regarding the FSS and polarization angles of QDEs, which are frequently encountered in both theories and experiments. The answers to these fundamental questions yield an approach to characterize the optical properties of QDEs. Potential applications of the effective model are also discussed.

  7. Watershed regressions for pesticides (WARP) for predicting atrazine concentration in Corn Belt streams

    USGS Publications Warehouse

    Stone, Wesley W.; Gilliom, Robert J.

    2011-01-01

    The 95-percent prediction intervals are well within a factor of 10 above and below the predicted concentration statistic. WARP-CB model predictions were within a factor of 5 of the observed concentration statistic for over 90 percent of the model-development sites. The WARP-CB residuals and uncertainty are lower than those of the National WARP model for the same sites. The WARP-CB models provide improved predictions of the probability of exceeding a specified criterion or benchmark for Corn Belt streams draining watersheds with high atrazine use intensities; however, National WARP models should be used for Corn Belt streams where atrazine use intensities are less than 17 kg/km² of watershed area.

  8. An overview of the mathematical and statistical analysis component of RICIS

    NASA Technical Reports Server (NTRS)

    Hallum, Cecil R.

    1987-01-01

    Mathematical and statistical analysis components of RICIS (Research Institute for Computing and Information Systems) can be used in the following problem areas: (1) quantification and measurement of software reliability; (2) assessment of changes in software reliability over time (reliability growth); (3) analysis of software-failure data; and (4) decision logic for whether to continue or stop testing software. Other areas of interest to NASA/JSC where mathematical and statistical analysis can be successfully employed include: math modeling of physical systems, simulation, statistical data reduction, evaluation methods, optimization, algorithm development, and mathematical methods in signal processing.

  9. iCFD: Interpreted Computational Fluid Dynamics - Degeneration of CFD to one-dimensional advection-dispersion models using statistical experimental design - The secondary clarifier.

    PubMed

    Guyonvarch, Estelle; Ramin, Elham; Kulahci, Murat; Plósz, Benedek Gy

    2015-10-15

    The present study aims at using statistically designed computational fluid dynamics (CFD) simulations as numerical experiments for the identification of one-dimensional (1-D) advection-dispersion models - computationally light tools, used e.g., as sub-models in systems analysis. The objective is to develop a new 1-D framework, referred to as interpreted CFD (iCFD) models, in which statistical meta-models are used to calculate the pseudo-dispersion coefficient (D) as a function of design and flow boundary conditions. The method - presented in a straightforward and transparent way - is illustrated using the example of a circular secondary settling tank (SST). First, the significant design and flow factors are screened out by applying the statistical method of two-level fractional factorial design of experiments. Second, based on the number of significant factors identified through the factor screening study and system understanding, 50 different sets of design and flow conditions are selected using Latin Hypercube Sampling (LHS). The boundary condition sets are imposed on a 2-D axi-symmetrical CFD simulation model of the SST. In the framework, to degenerate the 2-D model structure, CFD model outputs are approximated by the 1-D model through the calibration of three different model structures for D. Correlation equations for the D parameter then are identified as a function of the selected design and flow boundary conditions (meta-models), and their accuracy is evaluated against D values estimated in each numerical experiment. The evaluation and validation of the iCFD model structure is carried out using scenario simulation results obtained with parameters sampled from the corners of the LHS experimental region. 
For the studied SST, additional iCFD model development was carried out in terms of (i) assessing different density current sub-models; (ii) implementation of a combined flocculation, hindered, transient and compression settling velocity function; and (iii) assessment of modelling the onset of transient and compression settling. Furthermore, the optimal level of model discretization in both 2-D and 1-D was investigated. Results suggest that the iCFD model developed for the SST through the proposed methodology is able to predict solid distribution with high accuracy - taking a reasonable computational effort - when compared to multi-dimensional numerical experiments, under a wide range of flow and design conditions. iCFD tools could play a crucial role in reliably predicting systems' performance under normal and shock events.
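
    The Latin Hypercube Sampling step mentioned in this record can be sketched as follows. This is a generic illustration rather than the authors' procedure: the 50 samples match the abstract, but the number of factors (three), the unit-interval scaling, and the random seed are assumptions.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in the unit hypercube, one per stratum in every dimension."""
    columns = []
    for _ in range(n_dims):
        # One random point inside each of the n_samples equal-width strata.
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(column)  # decouple the strata across dimensions
        columns.append(column)
    # Transpose the per-dimension columns into a list of sample points.
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
# 50 condition sets as in the abstract; 3 factors is an illustrative assumption.
samples = latin_hypercube(50, 3, rng)
```

    Each unit-interval coordinate would then be rescaled to the physical range of the corresponding design or flow factor before being imposed on the simulation.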

  10. A conceptual model for the development process of confirmatory adaptive clinical trials within an emergency research network.

    PubMed

    Mawocha, Samkeliso C; Fetters, Michael D; Legocki, Laurie J; Guetterman, Timothy C; Frederiksen, Shirley; Barsan, William G; Lewis, Roger J; Berry, Donald A; Meurer, William J

    2017-06-01

    Adaptive clinical trials use accumulating data from enrolled subjects to alter trial conduct in pre-specified ways based on quantitative decision rules. In this research, we sought to characterize the perspectives of key stakeholders during the development process of confirmatory-phase adaptive clinical trials within an emergency clinical trials network and to build a model to guide future development of adaptive clinical trials. We used an ethnographic, qualitative approach to evaluate key stakeholders' views about the adaptive clinical trial development process. Stakeholders participated in a series of multidisciplinary meetings during the development of five adaptive clinical trials and completed a Strengths-Weaknesses-Opportunities-Threats questionnaire. In the analysis, we elucidated overarching themes across the stakeholders' responses to develop a conceptual model. Four major overarching themes emerged during the analysis of stakeholders' responses to questioning: the perceived statistical complexity of adaptive clinical trials and the roles of collaboration, communication, and time during the development process. Frequent and open communication and collaboration were viewed by stakeholders as critical during the development process, as were the careful management of time and logistical issues related to the complexity of planning adaptive clinical trials. The Adaptive Design Development Model illustrates how statistical complexity, time, communication, and collaboration are moderating factors in the adaptive design development process. The intensity and iterative nature of this process underscores the need for funding mechanisms for the development of novel trial proposals in academic settings.

  11. Automated Vocal Analysis of Children with Hearing Loss and Their Typical and Atypical Peers

    PubMed Central

    VanDam, Mark; Oller, D. Kimbrough; Ambrose, Sophie E.; Gray, Sharmistha; Richards, Jeffrey A.; Xu, Dongxin; Gilkerson, Jill; Silbert, Noah H.; Moeller, Mary Pat

    2014-01-01

    Objectives: This study investigated automatic assessment of vocal development in children with hearing loss as compared with children who are typically developing, have language delays, or have autism spectrum disorder. Statistical models are examined for performance in a classification model and to predict age within the four groups of children. Design: The vocal analysis system analyzed over 1900 whole-day, naturalistic acoustic recordings from 273 toddlers and preschoolers comprising children who were typically developing, hard of hearing, language delayed, or autistic. Results: Samples from children who were hard of hearing patterned more similarly to those of typically developing children than to the language-delayed or autistic samples. The statistical models were able to classify children from the four groups examined and estimate developmental age based on automated vocal analysis. Conclusions: This work shows a broad similarity between children with hearing loss and typically developing children, although children with hearing loss show some delay in their production of speech. Automatic acoustic analysis can now be used to quantitatively compare vocal development in children with and without speech-related disorders. The work may serve to better distinguish among various developmental disorders and ultimately contribute to improved intervention. PMID:25587667

  12. Testicular Cancer Risk Prediction Models

    Cancer.gov

    Developing statistical models that estimate the probability of testicular cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  13. Modeling Group Interactions via Open Data Sources

    DTIC Science & Technology

    2011-08-30

    data. The state-of-art search engines are designed to help general query-specific search and not suitable for finding disconnected online groups. The...groups, (2) developing innovative mathematical and statistical models and efficient algorithms that leverage existing search engines and employ

  14. Risk model of prolonged intensive care unit stay in Chinese patients undergoing heart valve surgery.

    PubMed

    Wang, Chong; Zhang, Guan-xin; Zhang, Hao; Lu, Fang-lin; Li, Bai-ling; Xu, Ji-bin; Han, Lin; Xu, Zhi-yun

    2012-11-01

    The aim of this study was to develop a preoperative risk prediction model and a scorecard for prolonged intensive care unit length of stay (PrlICULOS) in adult patients undergoing heart valve surgery. This is a retrospective observational study of collected data on 3925 consecutive patients older than 18 years who had undergone heart valve surgery between January 2000 and December 2010. Data were randomly split into a development dataset (n=2401) and a validation dataset (n=1524). A multivariate logistic regression analysis was undertaken using the development dataset to identify independent risk factors for PrlICULOS. Performance of the model was then assessed by observed and expected rates of PrlICULOS on the development and validation datasets. Model calibration and discriminatory ability were analysed by the Hosmer-Lemeshow goodness-of-fit statistic and the area under the receiver operating characteristic (ROC) curve, respectively. There were 491 patients who required PrlICULOS (12.5%). Preoperative independent predictors of PrlICULOS, with odds ratios, were as follows: (1) age, 1.4; (2) chronic obstructive pulmonary disease (COPD), 1.8; (3) atrial fibrillation, 1.4; (4) left bundle branch block, 2.7; (5) ejection fraction, 1.4; (6) left ventricle weight, 1.5; (7) New York Heart Association class III-IV, 1.8; (8) critical preoperative state, 2.0; (9) perivalvular leakage, 6.4; (10) tricuspid valve replacement, 3.8; (11) concurrent CABG, 2.8; and (12) concurrent other cardiac surgery, 1.8. The Hosmer-Lemeshow goodness-of-fit statistic was not statistically significant in either the development or the validation dataset (P=0.365 and P=0.310, respectively). The area under the ROC curve for the prediction of PrlICULOS in the development and validation datasets was 0.717 and 0.700, respectively. We developed and validated a local risk prediction model for PrlICULOS after adult heart valve surgery. 
This model can be used to calculate patient-specific risk with an equivalent predicted risk at our centre in future clinical practice.
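
    The calibration check named in this record, the Hosmer-Lemeshow goodness-of-fit statistic, compares observed and expected event counts within groups of patients ordered by predicted risk. A minimal sketch follows. The predicted risks and outcomes are hypothetical, the grouping rule (nearly equal-sized groups) is one common convention assumed here, and the P value, conventionally taken from a chi-square distribution, is not computed.

```python
def hosmer_lemeshow(risks, outcomes, n_groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-ordered groups."""
    pairs = sorted(zip(risks, outcomes), key=lambda p: p[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(n_groups):
        # Split the sorted cases into n_groups nearly equal-sized groups.
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        size = len(group)
        expected = sum(r for r, _ in group)
        observed = sum(y for _, y in group)
        mean_risk = expected / size
        # Skip groups where the variance term would be zero.
        if 0.0 < mean_risk < 1.0:
            statistic += (observed - expected) ** 2 / (
                size * mean_risk * (1.0 - mean_risk)
            )
    return statistic


# Hypothetical data: observed event counts match expected counts in both groups.
calibrated = hosmer_lemeshow(
    [0.25] * 4 + [0.75] * 4, [1, 0, 0, 0, 1, 1, 1, 0], n_groups=2
)
# Hypothetical data: predicted risk 0.5 but every case had the event.
miscalibrated = hosmer_lemeshow([0.5] * 4, [1, 1, 1, 1], n_groups=1)
```

    Small values of the statistic indicate close agreement between predicted and observed rates, which is consistent with the non-significant results reported above.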

  15. Statistical methods in personality assessment research.

    PubMed

    Schinka, J A; LaLone, L; Broeckel, J A

    1997-06-01

    Emerging models of personality structure and advances in the measurement of personality and psychopathology suggest that research in personality and personality assessment has entered a stage of advanced development. In this article we examine whether researchers in these areas have taken advantage of new and evolving statistical procedures. We conducted a review of articles published in the Journal of Personality Assessment during the past 5 years. Of the 449 articles that included some form of data analysis, 12.7% used only descriptive statistics, most employed only univariate statistics, and fewer than 10% used multivariate methods of data analysis. We discuss the cost of using limited statistical methods, the possible reasons for the apparent reluctance to employ advanced statistical procedures, and potential solutions to this technical shortcoming.

  16. Statistical fluctuations in pedestrian evacuation times and the effect of social contagion

    NASA Astrophysics Data System (ADS)

    Nicolas, Alexandre; Bouzat, Sebastián; Kuperman, Marcelo N.

    2016-08-01

    Mathematical models of pedestrian evacuation and the associated simulation software have become essential tools for the assessment of the safety of public facilities and buildings. While a variety of models is now available, their calibration and test against empirical data are generally restricted to global averaged quantities; the statistics compiled from the time series of individual escapes ("microscopic" statistics) measured in recent experiments are thus overlooked. In the same spirit, much research has primarily focused on the average global evacuation time, whereas the whole distribution of evacuation times over some set of realizations should matter. In the present paper we propose and discuss the validity of a simple relation between this distribution and the microscopic statistics, which is theoretically valid in the absence of correlations. To this purpose, we develop a minimal cellular automaton, with features that afford a semiquantitative reproduction of the experimental microscopic statistics. We then introduce a process of social contagion of impatient behavior in the model and show that the simple relation under test may dramatically fail at high contagion strengths, the latter being responsible for the emergence of strong correlations in the system. We conclude with comments on the potential practical relevance for safety science of calculations based on microscopic statistics.

  17. Statistical dependency in visual scanning

    NASA Technical Reports Server (NTRS)

    Ellis, Stephen R.; Stark, Lawrence

    1986-01-01

    A method to identify statistical dependencies in the positions of eye fixations is developed and applied to eye movement data from subjects who viewed dynamic displays of air traffic and judged future relative position of aircraft. Analysis of approximately 23,000 fixations on points of interest on the display identified statistical dependencies in scanning that were independent of the physical placement of the points of interest. Identification of these dependencies is inconsistent with random-sampling-based theories used to model visual search and information seeking.

  18. Curve fitting and modeling with splines using statistical variable selection techniques

    NASA Technical Reports Server (NTRS)

    Smith, P. L.

    1982-01-01

    The successful application of statistical variable selection techniques to fit splines is demonstrated. Major emphasis is given to knot selection, but order determination is also discussed. Two FORTRAN backward elimination programs, using the B-spline basis, were developed. The program for knot elimination is compared in detail with two other spline-fitting methods and several statistical software packages. An example is also given for the two-variable case using a tensor product basis, with a theoretical discussion of the difficulties of their use.

  19. Statistical appearance models based on probabilistic correspondences.

    PubMed

    Krüger, Julia; Ehrhardt, Jan; Handels, Heinz

    2017-04-01

    Model-based image analysis is indispensable in medical image processing. One key aspect of building statistical shape and appearance models is the determination of one-to-one correspondences in the training data set. At the same time, the identification of these correspondences is the most challenging part of such methods. In our earlier work, we developed an alternative method using correspondence probabilities instead of exact one-to-one correspondences for a statistical shape model (Hufnagel et al., 2008). In this work, a new approach for statistical appearance models without one-to-one correspondences is proposed. A sparse image representation is used to build a model that combines point position and appearance information at the same time. Probabilistic correspondences between the derived multi-dimensional feature vectors are used to omit the need for extensive preprocessing of finding landmarks and correspondences as well as to reduce the dependence of the generated model on the landmark positions. Model generation and model fitting can now be expressed by optimizing a single global criterion derived from a maximum a-posteriori (MAP) approach with respect to model parameters that directly affect both shape and appearance of the considered objects inside the images. The proposed approach describes statistical appearance modeling in a concise and flexible mathematical framework. Besides eliminating the demand for costly correspondence determination, the method allows for additional constraints such as topological regularity in the modeling process. In the evaluation the model was applied for segmentation and landmark identification in hand X-ray images. The results demonstrate the feasibility of the model to detect hand contours as well as the positions of the joints between finger bones for unseen test images.
Further, we evaluated the model on brain data of stroke patients to show the ability of the proposed model to handle partially corrupted data and to demonstrate a possible employment of the correspondence probabilities to indicate these corrupted/pathological areas. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. A high-frequency warm shallow water acoustic communications channel model and measurements.

    PubMed

    Chitre, Mandar

    2007-11-01

    Underwater acoustic communication is a core enabling technology with applications in ocean monitoring using remote sensors and autonomous underwater vehicles. One of the more challenging underwater acoustic communication channels is the medium-range very shallow warm-water channel, common in tropical coastal regions. This channel exhibits two key features, extensive time-varying multipath and high levels of non-Gaussian ambient noise due to snapping shrimp, both of which limit the performance of traditional communication techniques. A good understanding of the communications channel is key to the design of communication systems. It aids in the development of signal processing techniques as well as in the testing of the techniques via simulation. In this article, a physics-based channel model is developed for the very shallow warm-water acoustic channel at high frequencies, which are of interest to medium-range communication system developers. The model is based on ray acoustics and includes time-varying statistical effects as well as non-Gaussian ambient noise statistics observed during channel studies. The model is calibrated and its accuracy validated using measurements made at sea.

  1. Students' Emergent Articulations of Statistical Models and Modeling in Making Informal Statistical Inferences

    ERIC Educational Resources Information Center

    Braham, Hana Manor; Ben-Zvi, Dani

    2017-01-01

    A fundamental aspect of statistical inference is representation of real-world data using statistical models. This article analyzes students' articulations of statistical models and modeling during their first steps in making informal statistical inferences. An integrated modeling approach (IMA) was designed and implemented to help students…

  2. Sex-specific developmental models for Creophilus maxillosus (L.) (Coleoptera: Staphylinidae): searching for larger accuracy of insect age estimates.

    PubMed

    Frątczak-Łagiewska, Katarzyna; Matuszewski, Szymon

    2018-05-01

    Differences in size between males and females, called the sexual size dimorphism, are common in insects. These differences may be followed by differences in the duration of development. Accordingly, it is believed that insect sex may be used to increase the accuracy of insect age estimates in forensic entomology. Here, the sex-specific differences in the development of Creophilus maxillosus were studied at seven constant temperatures. We have also created separate developmental models for males and females of C. maxillosus and tested them in a validation study to answer a question whether sex-specific developmental models improve the accuracy of insect age estimates. Results demonstrate that males of C. maxillosus developed significantly longer than females. The sex-specific and general models for the total immature development had the same optimal temperature range and similar developmental threshold but different thermal constant K, which was the largest in the case of the male-specific model and the smallest in the case of the female-specific model. Despite these differences, validation study revealed just minimal and statistically insignificant differences in the accuracy of age estimates using sex-specific and general thermal summation models. This finding indicates that in spite of statistically significant differences in the duration of immature development between females and males of C. maxillosus, there is no increase in the accuracy of insect age estimates while using the sex-specific thermal summation models compared to the general model. Accordingly, this study does not support the use of sex-specific developmental data for the estimation of insect age in forensic entomology.
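
The thermal summation models mentioned in this record rest on the standard relation K = D × (T − T0), where D is the development time in days, T the rearing temperature, T0 the developmental threshold and K the thermal constant in degree-days. The sketch below illustrates that relation only. The function names and the numeric values of K and T0 are hypothetical and are not taken from the study.

```python
def development_days(temperature, thermal_constant, threshold):
    """Days needed to complete development at a constant temperature.

    Rearranges the thermal summation relation K = D * (T - T0) to D = K / (T - T0).
    """
    if temperature <= threshold:
        raise ValueError("no development at or below the developmental threshold")
    return thermal_constant / (temperature - threshold)


def accumulated_degree_days(daily_mean_temps, threshold):
    """Sum of the daily temperature excess over the threshold (degree-days)."""
    return sum(max(t - threshold, 0.0) for t in daily_mean_temps)


# Hypothetical parameters for illustration only (not values from the study).
K_GENERAL = 400.0  # thermal constant, degree-days
T0 = 10.0          # developmental threshold, degrees Celsius

print(development_days(20.0, K_GENERAL, T0))            # 400 / (20 - 10) = 40.0
print(accumulated_degree_days([18.0, 22.0, 9.0], T0))   # 8 + 12 + 0 = 20.0
```

An age estimate follows by comparing the degree-days accumulated at the scene with K; a sex-specific model would simply use a different K for males and females.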

  3. Hierarchical Bayesian spatial models for multispecies conservation planning and monitoring

    Treesearch

    Carlos Carroll; Devin S. Johnson; Jeffrey R. Dunk; William J. Zielinski

    2010-01-01

    Biologists who develop and apply habitat models are often familiar with the statistical challenges posed by their data’s spatial structure but are unsure of whether the use of complex spatial models will increase the utility of model results in planning. We compared the relative performance of nonspatial and hierarchical Bayesian spatial models for three vertebrate and...

  4. Development of a Stochastically-driven, Forward Predictive Performance Model for PEMFCs

    NASA Astrophysics Data System (ADS)

    Harvey, David Benjamin Paul

    A one-dimensional multi-scale coupled, transient, and mechanistic performance model for a PEMFC membrane electrode assembly has been developed. The model explicitly includes each of the 5 layers within a membrane electrode assembly and solves for the transport of charge, heat, mass, species, dissolved water, and liquid water. Key features of the model include the use of a multi-step implementation of the HOR reaction on the anode, agglomerate catalyst sub-models for both the anode and cathode catalyst layers, a unique approach that links the composition of the catalyst layer to key properties within the agglomerate model and the implementation of a stochastic input-based approach for component material properties. The model employs a new methodology for validation using statistically varying input parameters and statistically-based experimental performance data; this model represents the first stochastic input driven unit cell performance model. The stochastic input driven performance model was used to identify optimal ionomer content within the cathode catalyst layer, demonstrate the role of material variation in potential low performing MEA materials, provide explanation for the performance of low-Pt loaded MEAs, and investigate the validity of transient-sweep experimental diagnostic methods.

  5. Effect of Professional Development on Classroom Practices in Some Selected Saudi Universities

    ERIC Educational Resources Information Center

    Alghamdi, AbdulKhaliq Hajjad; Bin Sihes, Ahmad Johari

    2016-01-01

    "Scientific studies found the impact of professional development on effective classroom practices in Higher Education." This paper hypothesizes no statistically significant effect of lecturers' professional development on classroom practices in some selected Saudi Universities not as highlighted in the model. Hierarchical multiple…

  6. Development of a design space and predictive statistical model for capsule filling of low-fill-weight inhalation products.

    PubMed

    Faulhammer, E; Llusa, M; Wahl, P R; Paudel, A; Lawrence, S; Biserni, S; Calzolari, V; Khinast, J G

    2016-01-01

    The objectives of this study were to develop a predictive statistical model for low-fill-weight capsule filling of inhalation products with dosator nozzles via the quality by design (QbD) approach and based on that to create refined models that include quadratic terms for significant parameters. Various controllable process parameters and uncontrolled material attributes of 12 powders were initially screened using a linear model with partial least square (PLS) regression to determine their effect on the critical quality attributes (CQA; fill weight and weight variability). After identifying critical material attributes (CMAs) and critical process parameters (CPPs) that influenced the CQA, model refinement was performed to study if interactions or quadratic terms influence the model. Based on the assessment of the effects of the CPPs and CMAs on fill weight and weight variability for low-fill-weight inhalation products, we developed an excellent linear predictive model for fill weight (R² = 0.96, Q² = 0.96 for powders with good flow properties and R² = 0.94, Q² = 0.93 for cohesive powders) and a model that provides a good approximation of the fill weight variability for each powder group. We validated the model, established a design space for the performance of different types of inhalation grade lactose on low-fill weight capsule filling and successfully used the CMAs and CPPs to predict fill weight of powders that were not included in the development set.

  7. Spatial landscape model to characterize biological diversity using R statistical computing environment.

    PubMed

    Singh, Hariom; Garg, R D; Karnatak, Harish C; Roy, Arijit

    2018-01-15

    Due to urbanization and population growth, the degradation of natural forests and associated biodiversity are now widely recognized as a global environmental concern. Hence, there is an urgent need for rapid assessment and monitoring of biodiversity on priority using state-of-art tools and technologies. The main purpose of this research article is to develop and implement a new methodological approach to characterize biological diversity using spatial model developed during the study viz. Spatial Biodiversity Model (SBM). The developed model is scale, resolution and location independent solution for spatial biodiversity richness modelling. The platform-independent computation model is based on parallel computation. The biodiversity model based on open-source software has been implemented on R statistical computing platform. It provides information on high disturbance and high biological richness areas through different landscape indices and site specific information (e.g. forest fragmentation (FR), disturbance index (DI) etc.). The model has been developed based on the case study of Indian landscape; however it can be implemented in any part of the world. As a case study, SBM has been tested for Uttarakhand state in India. Inputs for landscape ecology are derived through multi-criteria decision making (MCDM) techniques in an interactive command line environment. MCDM with sensitivity analysis in spatial domain has been carried out to illustrate the model stability and robustness. Furthermore, spatial regression analysis has been made for the validation of the output. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. A statistical shape model of the human second cervical vertebra.

    PubMed

    Clogenson, Marine; Duff, John M; Luethi, Marcel; Levivier, Marc; Meuli, Reto; Baur, Charles; Henein, Simon

    2015-07-01

    Statistical shape and appearance models play an important role in reducing the segmentation processing time of a vertebra and in improving results for 3D model development. Here, we describe the different steps in generating a statistical shape model (SSM) of the second cervical vertebra (C2) and provide the shape model for general use by the scientific community. The main difficulties in its construction are the morphological complexity of the C2 and its variability in the population. The input dataset is composed of manually segmented anonymized patient computerized tomography (CT) scans. The alignment of the different datasets is done with the procrustes alignment on surface models, and then, the registration is cast as a model-fitting problem using a Gaussian process. A principal component analysis (PCA)-based model is generated which includes the variability of the C2. The SSM was generated using 92 CT scans. The resulting SSM was evaluated for specificity, compactness and generalization ability. The SSM of the C2 is freely available to the scientific community in Slicer (an open source software for image analysis and scientific visualization) with a module created to visualize the SSM using Statismo, a framework for statistical shape modeling. The SSM of the vertebra allows the shape variability of the C2 to be represented. Moreover, the SSM will enable semi-automatic segmentation and 3D model generation of the vertebra, which would greatly benefit surgery planning.

  9. Development of a statistical oil spill model for risk assessment.

    PubMed

    Guo, Weijun

    2017-11-01

    To gain a better understanding of the impacts from potential risk sources, we developed an oil spill model using probabilistic method, which simulates numerous oil spill trajectories under varying environmental conditions. The statistical results were quantified from hypothetical oil spills under multiple scenarios, including area affected probability, mean oil slick thickness, and duration of water surface exposed to floating oil. The three sub-indices together with marine area vulnerability are merged to compute the composite index, characterizing the spatial distribution of risk degree. Integral of the index can be used to identify the overall risk from an emission source. The developed model has been successfully applied in comparison to and selection of an appropriate oil port construction location adjacent to a marine protected area for Phoca largha in China. The results highlight the importance of selection of candidates before project construction, since that risk estimation from two adjacent potential sources may turn out to be significantly different regarding hydrodynamic conditions and eco-environmental sensitivity. Copyright © 2017. Published by Elsevier Ltd.

  10. Modeling Zero-Inflated and Overdispersed Count Data: An Empirical Study of School Suspensions

    ERIC Educational Resources Information Center

    Desjardins, Christopher David

    2016-01-01

    The purpose of this article is to develop a statistical model that best explains variability in the number of school days suspended. Number of school days suspended is a count variable that may be zero-inflated and overdispersed relative to a Poisson model. Four models were examined: Poisson, negative binomial, Poisson hurdle, and negative…
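
Whether a count variable is "zero-inflated and overdispersed relative to a Poisson model" can be checked informally before fitting any of the candidate models. The sketch below is one such check and is not the article's method. It compares the variance-to-mean ratio with 1 (the Poisson value) and the observed share of zeros with exp(−mean), the share a Poisson model with the same mean would predict. The function name and the sample data are hypothetical.

```python
import math
from statistics import mean, pvariance


def count_diagnostics(counts):
    """Informal overdispersion and zero-inflation checks for count data."""
    m = mean(counts)
    if m == 0:
        raise ValueError("all counts are zero; the ratios are undefined")
    return {
        "mean": m,
        # Equals 1 for a Poisson variable; values well above 1 suggest overdispersion.
        "dispersion": pvariance(counts) / m,
        "observed_zero": sum(1 for c in counts if c == 0) / len(counts),
        # Probability of a zero under a Poisson model with the same mean.
        "poisson_zero": math.exp(-m),
    }


# Hypothetical suspension-day counts for ten students.
sample = [0, 0, 0, 0, 0, 0, 1, 2, 5, 12]
print(count_diagnostics(sample))
```

For this sample the variance is several times the mean and zeros are far more common than a Poisson model with the same mean predicts, which is the pattern that motivates negative binomial and hurdle models.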

  11. A Simple Model for Estimating Total and Merchantable Tree Heights

    Treesearch

    Alan R. Ek; Earl T. Birdsall; Rebecca J. Spears

    1984-01-01

    A model is described for estimating total and merchantable tree heights for Lake States tree species. It is intended to be used for compiling forest survey data and in conjunction with growth models for developing projections of tree product yield. Model coefficients are given for 25 species along with fit statistics. Supporting data sets are also described.

  12. A Bayesian statistical analysis of mouse dermal tumor promotion assay data for evaluating cigarette smoke condensate.

    PubMed

    Kathman, Steven J; Potts, Ryan J; Ayres, Paul H; Harp, Paul R; Wilson, Cody L; Garner, Charles D

    2010-10-01

    The mouse dermal assay has long been used to assess the dermal tumorigenicity of cigarette smoke condensate (CSC). This mouse skin model has been developed for use in carcinogenicity testing utilizing the SENCAR mouse as the standard strain. Though the model has limitations, it remains as the most relevant method available to study the dermal tumor promoting potential of mainstream cigarette smoke. In the typical SENCAR mouse CSC bioassay, CSC is applied for 29 weeks following the application of a tumor initiator such as 7,12-dimethylbenz[a]anthracene (DMBA). Several endpoints are considered for analysis including: the percentage of animals with at least one mass, latency, and number of masses per animal. In this paper, a relatively straightforward analytic model and procedure is presented for analyzing the time course of the incidence of masses. The procedure considered here takes advantage of Bayesian statistical techniques, which provide powerful methods for model fitting and simulation. Two datasets are analyzed to illustrate how the model fits the data, how well the model may perform in predicting data from such trials, and how the model may be used as a decision tool when comparing the dermal tumorigenicity of cigarette smoke condensate from multiple cigarette types. The analysis presented here was developed as a statistical decision tool for differentiating between two or more prototype products based on the dermal tumorigenicity. Copyright (c) 2010 Elsevier Inc. All rights reserved.

  13. Silica exposure during construction activities: statistical modeling of task-based measurements from the literature.

    PubMed

    Sauvé, Jean-François; Beaudry, Charles; Bégin, Denis; Dion, Chantal; Gérin, Michel; Lavoué, Jérôme

    2013-05-01

    Many construction activities can put workers at risk of breathing silica containing dusts, and there is an important body of literature documenting exposure levels using a task-based strategy. In this study, statistical modeling was used to analyze a data set containing 1466 task-based, personal respirable crystalline silica (RCS) measurements gathered from 46 sources to estimate exposure levels during construction tasks and the effects of determinants of exposure. Monte-Carlo simulation was used to recreate individual exposures from summary parameters, and the statistical modeling involved multimodel inference with Tobit models containing combinations of the following exposure variables: sampling year, sampling duration, construction sector, project type, workspace, ventilation, and controls. Exposure levels by task were predicted based on the median reported duration by activity, the year 1998, absence of source control methods, and an equal distribution of the other determinants of exposure. The model containing all the variables explained 60% of the variability and was identified as the best approximating model. Of the 27 tasks contained in the data set, abrasive blasting, masonry chipping, scabbling concrete, tuck pointing, and tunnel boring had estimated geometric means above 0.1 mg m⁻³ based on the exposure scenario developed. Water-fed tools and local exhaust ventilation were associated with a reduction of 71 and 69% in exposure levels compared with no controls, respectively. The predictive model developed can be used to estimate RCS concentrations for many construction activities in a wide range of circumstances.
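
The step of recreating individual exposures from summary parameters can be illustrated with a minimal sketch. It assumes that each source reports a geometric mean (GM) and a geometric standard deviation (GSD) and that exposures are lognormal; the abstract states neither, so both are assumptions, and the function names and numeric values are hypothetical.

```python
import math
import random


def simulate_exposures(gm, gsd, n, seed=0):
    """Draw n exposure values from a lognormal distribution with the given GM and GSD.

    Assumes lognormality: log(exposure) is normal with mean ln(GM) and SD ln(GSD).
    """
    rng = random.Random(seed)
    mu = math.log(gm)
    sigma = math.log(gsd)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical summary statistics for one task (mg per cubic metre).
simulated = simulate_exposures(gm=0.05, gsd=2.5, n=20000, seed=1)
print(round(geometric_mean(simulated), 3))  # close to the input GM of 0.05
```

The simulated values can then be pooled across sources and passed to a regression model in place of the unavailable individual measurements.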

  14. A multibody knee model with discrete cartilage prediction of tibio-femoral contact mechanics.

    PubMed

    Guess, Trent M; Liu, Hongzeng; Bhashyam, Sampath; Thiagarajan, Ganesh

    2013-01-01

    Combining musculoskeletal simulations with anatomical joint models capable of predicting cartilage contact mechanics would provide a valuable tool for studying the relationships between muscle force and cartilage loading. As a step towards producing multibody musculoskeletal models that include representation of cartilage tissue mechanics, this research developed a subject-specific multibody knee model that represented the tibia plateau cartilage as discrete rigid bodies that interacted with the femur through deformable contacts. Parameters for the compliant contact law were derived using three methods: (1) simplified Hertzian contact theory, (2) simplified elastic foundation contact theory and (3) parameter optimisation from a finite element (FE) solution. The contact parameters and contact friction were evaluated during a simulated walk in a virtual dynamic knee simulator, and the resulting kinematics were compared with measured in vitro kinematics. The effects on predicted contact pressures and cartilage-bone interface shear forces during the simulated walk were also evaluated. The compliant contact stiffness parameters had a statistically significant effect on predicted contact pressures as well as all tibio-femoral motions except flexion-extension. The contact friction was not statistically significant to contact pressures, but was statistically significant to medial-lateral translation and all rotations except flexion-extension. The magnitude of kinematic differences between model formulations was relatively small, but contact pressure predictions were sensitive to model formulation. The developed multibody knee model was computationally efficient and had a computation time 283 times faster than a FE simulation using the same geometries and boundary conditions.

  15. A three-step approach for the derivation and validation of high-performing predictive models using an operational dataset: congestive heart failure readmission case study.

    PubMed

    AbdelRahman, Samir E; Zhang, Mingyuan; Bray, Bruce E; Kawamoto, Kensaku

    2014-05-27

    The aim of this study was to propose an analytical approach to develop high-performing predictive models for congestive heart failure (CHF) readmission using an operational dataset with incomplete records and changing data over time. Our analytical approach involves three steps: pre-processing, systematic model development, and risk factor analysis. For pre-processing, variables that were absent in >50% of records were removed. Moreover, the dataset was divided into a validation dataset and derivation datasets which were separated into three temporal subsets based on changes to the data over time. For systematic model development, using the different temporal datasets and the remaining explanatory variables, the models were developed by combining the use of various (i) statistical analyses to explore the relationships between the validation and the derivation datasets; (ii) adjustment methods for handling missing values; (iii) classifiers; (iv) feature selection methods; and (v) discretization methods. We then selected the best derivation dataset and the models with the highest predictive performance. For risk factor analysis, factors in the highest-performing predictive models were analyzed and ranked using (i) statistical analyses of the best derivation dataset, (ii) feature rankers, and (iii) a newly developed algorithm to categorize risk factors as being strong, regular, or weak. The analysis dataset consisted of 2,787 CHF hospitalizations at University of Utah Health Care from January 2003 to June 2013. In this study, we used the complete-case analysis and mean-based imputation adjustment methods; the wrapper subset feature selection method; and four ranking strategies based on information gain, gain ratio, symmetrical uncertainty, and wrapper subset feature evaluators.
The best-performing models resulted from the use of a complete-case analysis derivation dataset combined with the Class-Attribute Contingency Coefficient discretization method and a voting classifier which averaged the results of multinomial logistic regression and voting feature intervals classifiers. Of 42 final model risk factors, discharge disposition, discretized age, and indicators of anemia were the most significant. This model achieved a c-statistic of 86.8%. The proposed three-step analytical approach enhanced predictive model performance for CHF readmissions. It could potentially be leveraged to improve predictive model performance in other areas of clinical medicine.

  16. Calculating phase equilibrium properties of plasma pseudopotential model using hybrid Gibbs statistical ensemble Monte-Carlo technique

    NASA Astrophysics Data System (ADS)

    Butlitsky, M. A.; Zelener, B. B.; Zelener, B. V.

    2015-11-01

    Earlier, a two-component pseudopotential plasma model, which we called a “shelf Coulomb” model, was developed. A Monte-Carlo study of the canonical NVT ensemble with periodic boundary conditions was undertaken to calculate equations of state, pair distribution functions, internal energies and other thermodynamic properties of the model. In the present work, an attempt is made to apply the so-called hybrid Gibbs statistical ensemble Monte-Carlo technique to this model. The first simulation results are qualitatively similar in the critical point region for both methods. The Gibbs ensemble technique lets us estimate the melting curve position and a triple point of the model (in reduced temperature and specific volume coordinates): T* ≈ 0.0476, v* ≈ 6 × 10⁻⁴.

  17. Detecting temporal change in freshwater fisheries surveys: statistical power and the important linkages between management questions and monitoring objectives

    USGS Publications Warehouse

    Wagner, Tyler; Irwin, Brian J.; Bence, James R.; Hayes, Daniel B.

    2016-01-01

    Monitoring to detect temporal trends in biological and habitat indices is a critical component of fisheries management. Thus, it is important that management objectives are linked to monitoring objectives. This linkage requires a definition of what constitutes a management-relevant “temporal trend.” It is also important to develop expectations for the amount of time required to detect a trend (i.e., statistical power) and for choosing an appropriate statistical model for analysis. We provide an overview of temporal trends commonly encountered in fisheries management, review published studies that evaluated statistical power of long-term trend detection, and illustrate dynamic linear models in a Bayesian context, as an additional analytical approach focused on shorter term change. We show that monitoring programs generally have low statistical power for detecting linear temporal trends and argue that often management should be focused on different definitions of trends, some of which can be better addressed by alternative analytical approaches.

  18. Preparing systems engineering and computing science students in disciplined methods, quantitative, and advanced statistical techniques to improve process performance

    NASA Astrophysics Data System (ADS)

    McCray, Wilmon Wil L., Jr.

    The research was prompted by a need to conduct a study that assesses process improvement, quality management and analytical techniques taught to students in U.S. colleges and universities undergraduate and graduate systems engineering and the computing science discipline (e.g., software engineering, computer science, and information technology) degree programs during their academic training that can be applied to quantitatively manage processes for performance. Everyone involved in executing repeatable processes in the software and systems development lifecycle processes needs to become familiar with the concepts of quantitative management, statistical thinking, process improvement methods and how they relate to process-performance. Organizations are starting to embrace the de facto Software Engineering Institute (SEI) Capability Maturity Model Integration (CMMI®) Models as process improvement frameworks to improve business processes performance. High maturity process areas in the CMMI model imply the use of analytical, statistical, quantitative management techniques, and process performance modeling to identify and eliminate sources of variation, continually improve process-performance; reduce cost and predict future outcomes. The research study identifies and provides a detailed discussion of the gap analysis findings of process improvement and quantitative analysis techniques taught in U.S. universities systems engineering and computing science degree programs, gaps that exist in the literature, and a comparison analysis which identifies the gaps that exist between the SEI's "healthy ingredients" of a process performance model and courses taught in U.S. universities degree program. The research also heightens awareness that academicians have conducted little research on applicable statistics and quantitative techniques that can be used to demonstrate high maturity as implied in the CMMI models.
The research also includes a Monte Carlo simulation optimization model and dashboard that demonstrates the use of statistical methods, statistical process control, sensitivity analysis, quantitative and optimization techniques to establish a baseline and predict future customer satisfaction index scores (outcomes). The American Customer Satisfaction Index (ACSI) model and industry benchmarks were used as a framework for the simulation model.

  19. Hunting Solomonoff's Swans: Exploring the Boundary Between Physics and Statistics in Hydrological Modeling

    NASA Astrophysics Data System (ADS)

    Nearing, G. S.

    2014-12-01

    Statistical models consistently out-perform conceptual models in the short term, however to account for a nonstationary future (or an unobserved past) scientists prefer to base predictions on unchanging and commutable properties of the universe - i.e., physics. The problem with physically-based hydrology models is, of course, that they aren't really based on physics - they are based on statistical approximations of physical interactions, and we almost uniformly lack an understanding of the entropy associated with these approximations. Thermodynamics is successful precisely because entropy statistics are computable for homogeneous (well-mixed) systems, and ergodic arguments explain the success of Newton's laws to describe systems that are fundamentally quantum in nature. Unfortunately, similar arguments do not hold for systems like watersheds that are heterogeneous at a wide range of scales. Ray Solomonoff formalized the situation in 1968 by showing that given infinite evidence, simultaneously minimizing model complexity and entropy in predictions always leads to the best possible model. The open question in hydrology is about what happens when we don't have infinite evidence - for example, when the future will not look like the past, or when one watershed does not behave like another. How do we isolate stationary and commutable components of watershed behavior? I propose that one possible answer to this dilemma lies in a formal combination of physics and statistics. In this talk I outline my recent analogue (Solomonoff's theorem was digital) of Solomonoff's idea that allows us to quantify the complexity/entropy tradeoff in a way that is intuitive to physical scientists. I show how to formally combine "physical" and statistical methods for model development in a way that allows us to derive the theoretically best possible model given any given physics approximation(s) and available observations. 
Finally, I apply an analogue of Solomonoff's theorem to evaluate the tradeoff between model complexity and prediction power.

  20. Process air quality data

    NASA Technical Reports Server (NTRS)

    Butler, C. M.; Hogge, J. E.

    1978-01-01

    Air quality sampling was conducted. Data for air quality parameters, recorded on written forms, punched cards or magnetic tape, are available for 1972 through 1975. Computer software was developed to (1) calculate several daily statistical measures of location, (2) plot time histories of data or the calculated daily statistics, (3) calculate simple correlation coefficients, and (4) plot scatter diagrams. Computer software was developed for processing air quality data to include time series analysis and goodness of fit tests. Computer software was developed to (1) calculate a larger number of daily statistical measures of location, and a number of daily monthly and yearly measures of location, dispersion, skewness and kurtosis, (2) decompose the extended time series model and (3) perform some goodness of fit tests. The computer program is described, documented and illustrated by examples. Recommendations are made for continuation of the development of research on processing air quality data.
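
The daily measures of location, dispersion, skewness and kurtosis listed in this record can be sketched in a few lines. The code below is a modern illustration and not the software the report describes. It assumes the moment-based definitions (skewness m3 / m2^1.5 and non-excess kurtosis m4 / m2^2, with population moments); the report does not say which definitions were used, and the function name and sample readings are hypothetical.

```python
import math
from statistics import median


def daily_summary(values):
    """Location, dispersion, skewness and kurtosis for one day's readings."""
    n = len(values)
    mean = sum(values) / n
    # Central moments about the mean (population form, divisor n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    constant = m2 == 0  # skewness and kurtosis are undefined for constant data
    return {
        "mean": mean,
        "median": median(values),
        "std_dev": math.sqrt(m2),
        "skewness": float("nan") if constant else m3 / m2 ** 1.5,
        "kurtosis": float("nan") if constant else m4 / m2 ** 2,
    }


# Hypothetical pollutant readings for a single day.
print(daily_summary([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Applying the same function to the readings grouped by month or year gives the monthly and yearly measures the record mentions.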
