Sample records for regression models fitted

  1. A new approach to correct the QT interval for changes in heart rate using a nonparametric regression model in beagle dogs.

    PubMed

    Watanabe, Hiroyuki; Miyazaki, Hiroyasu

    2006-01-01

    Over- and/or under-correction of QT intervals for changes in heart rate may lead to misleading conclusions and/or mask the potential of a drug to prolong the QT interval. This study examines a nonparametric regression model (Loess Smoother) to adjust the QT interval for differences in heart rate, with improved fit over a wide range of heart rates. A total of 240 sets of (QT, RR) observations collected from each of 8 conscious and non-treated beagle dogs were used for the investigation. The fit of the nonparametric regression model to the QT-RR relationship was compared with that of four models (individual linear regression, common linear regression, and Bazett's and Fridericia's correction models) with reference to Akaike's Information Criterion (AIC). Residuals were visually assessed. The bias-corrected AIC of the nonparametric regression model was the best of the models examined in this study. Although the parametric models did not fit, the nonparametric regression model improved the fitting at both fast and slow heart rates. The nonparametric regression model is more flexible than the parametric methods. The mathematical fit for linear regression models was unsatisfactory at both fast and slow heart rates, while the nonparametric regression model showed significant improvement at all heart rates in beagle dogs.
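
    As a point of reference for the two fixed-exponent corrections named in the abstract above, the following minimal sketch applies the standard Bazett (QT / RR^(1/2)) and Fridericia (QT / RR^(1/3)) formulas to a single observation. The function names and the sample values are illustrative assumptions and are not taken from the study.

```python
import math


def qtc_bazett(qt_ms: float, rr_s: float) -> float:
    """Bazett correction: QTc = QT / RR^(1/2), with RR in seconds."""
    return qt_ms / math.sqrt(rr_s)


def qtc_fridericia(qt_ms: float, rr_s: float) -> float:
    """Fridericia correction: QTc = QT / RR^(1/3), with RR in seconds."""
    return qt_ms / rr_s ** (1.0 / 3.0)


# Hypothetical observation: QT = 200 ms at RR = 0.5 s (120 beats/min).
qt, rr = 200.0, 0.5
print("Bazett QTc (ms):    ", round(qtc_bazett(qt, rr), 1))
print("Fridericia QTc (ms):", round(qtc_fridericia(qt, rr), 1))
```

    At RR = 1 s both formulas return the uncorrected QT. Away from 1 s they diverge, which is the kind of rate-dependent over- or under-correction that the abstract's data-driven smoother is meant to reduce.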

  2. SPSS macros to compare any two fitted values from a regression model.

    PubMed

    Weaver, Bruce; Dubois, Sacha

    2012-12-01

    In regression models with first-order terms only, the coefficient for a given variable is typically interpreted as the change in the fitted value of Y for a one-unit increase in that variable, with all other variables held constant. Therefore, each regression coefficient represents the difference between two fitted values of Y. But the coefficients represent only a fraction of the possible fitted value comparisons that might be of interest to researchers. For many fitted value comparisons that are not captured by any of the regression coefficients, common statistical software packages do not provide the standard errors needed to compute confidence intervals or carry out statistical tests, particularly in more complex models that include interactions, polynomial terms, or regression splines. We describe two SPSS macros that implement a matrix algebra method for comparing any two fitted values from a regression model. The !OLScomp and !MLEcomp macros are for use with models fitted via ordinary least squares and maximum likelihood estimation, respectively. The output from the macros includes the standard error of the difference between the two fitted values, a 95% confidence interval for the difference, and a corresponding statistical test with its p-value.
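
    The matrix algebra behind such a comparison can be sketched as follows: for design rows x1 and x2, the difference in fitted values is (x1 - x2)'b and its variance is (x1 - x2)'V(x1 - x2), where b is the coefficient vector and V its estimated covariance matrix. This is a generic illustration in Python, not the macros' own code. The coefficients, covariance matrix, and the normal-approximation multiplier 1.96 are assumptions made for the example.

```python
import math


def fitted_value_difference(x1, x2, beta, cov):
    """Return (difference, standard error) for two fitted values.

    x1, x2: design rows (including the intercept column)
    beta:   estimated coefficients
    cov:    estimated covariance matrix of the coefficients
    """
    d = [a - b for a, b in zip(x1, x2)]
    diff = sum(di * bi for di, bi in zip(d, beta))
    # Quadratic form d' V d gives the variance of the difference.
    var = sum(d[i] * cov[i][j] * d[j]
              for i in range(len(d)) for j in range(len(d)))
    return diff, math.sqrt(var)


# Hypothetical quadratic model y = b0 + b1*x + b2*x^2.
beta = [1.0, 2.0, 0.5]
cov = [[0.04, 0.0, 0.0],
       [0.0, 0.01, 0.0],
       [0.0, 0.0, 0.0025]]
row_x3 = [1.0, 3.0, 9.0]  # design row at x = 3
row_x1 = [1.0, 1.0, 1.0]  # design row at x = 1

diff, se = fitted_value_difference(row_x3, row_x1, beta, cov)
# Normal-approximation interval; an OLS analysis would normally use a t quantile.
print("difference:", diff, "SE:", round(se, 4))
print("approx. 95% CI:", (round(diff - 1.96 * se, 3), round(diff + 1.96 * se, 3)))
```

    Because the intercept column cancels in x1 - x2, the comparison depends only on the terms that differ between the two rows. This is why no single coefficient captures it in a model with polynomial or interaction terms.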

  3. Evaluation of weighted regression and sample size in developing a taper model for loblolly pine

    Treesearch

    Kenneth L. Cormier; Robin M. Reich; Raymond L. Czaplewski; William A. Bechtold

    1992-01-01

    A stem profile model, fit using pseudo-likelihood weighted regression, was used to estimate merchantable volume of loblolly pine (Pinus taeda L.) in the southeast. The weighted regression increased model fit marginally, but did not substantially increase model performance. In all cases, the unweighted regression models performed as well as the...

  4. Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert

    2013-01-01

    Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linear, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.

  5. An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression

    PubMed Central

    Weiss, Brandi A.; Dardick, William

    2015-01-01

    This article introduces an entropy-based measure of data–model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify the quality of classification and separation of group membership. Entropy complements preexisting measures of data–model fit and provides unique information not contained in other measures. Hypothetical data scenarios, an applied example, and Monte Carlo simulation results are used to demonstrate the application of entropy in logistic regression. Entropy should be used in conjunction with other measures of data–model fit to assess how well logistic regression models classify cases into observed categories. PMID:29795897
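
    The abstract does not state the formula. As a rough illustration only, the sketch below computes the normalized (relative) entropy index commonly reported in mixture modeling, applied to predicted probabilities from a two-category logistic model. Treating this as the form of the article's measure is an assumption, and the function name and sample probabilities are hypothetical.

```python
import math


def normalized_entropy(probs):
    """Relative entropy index for binary predicted probabilities.

    probs: predicted P(y = 1) for each case.
    Returns a value in [0, 1]; 1 means every case is classified with
    certainty, 0 means every prediction is 0.5 (maximal fuzziness).
    """
    total = 0.0
    for p in probs:
        for q in (p, 1.0 - p):
            if q > 0.0:  # treat 0 * log(0) as 0
                total -= q * math.log(q)
    return 1.0 - total / (len(probs) * math.log(2))


# Hypothetical predicted probabilities from two fitted models.
sharp = [0.02, 0.97, 0.05, 0.99]
fuzzy = [0.45, 0.55, 0.50, 0.60]
print(round(normalized_entropy(sharp), 3), round(normalized_entropy(fuzzy), 3))
```

    Under this definition a model whose predictions sit near 0 or 1 scores close to 1, while a model whose predictions hover near 0.5 scores close to 0, regardless of whether the classifications are correct. This is consistent with the abstract's advice to use entropy alongside other fit measures.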

  6. An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression.

    PubMed

    Weiss, Brandi A; Dardick, William

    2016-12-01

    This article introduces an entropy-based measure of data-model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify the quality of classification and separation of group membership. Entropy complements preexisting measures of data-model fit and provides unique information not contained in other measures. Hypothetical data scenarios, an applied example, and Monte Carlo simulation results are used to demonstrate the application of entropy in logistic regression. Entropy should be used in conjunction with other measures of data-model fit to assess how well logistic regression models classify cases into observed categories.

  7. Regression Models for Identifying Noise Sources in Magnetic Resonance Images

    PubMed Central

    Zhu, Hongtu; Li, Yimei; Ibrahim, Joseph G.; Shi, Xiaoyan; An, Hongyu; Chen, Yashen; Gao, Wei; Lin, Weili; Rowe, Daniel B.; Peterson, Bradley S.

    2009-01-01

    Stochastic noise, susceptibility artifacts, magnetic field and radiofrequency inhomogeneities, and other noise components in magnetic resonance images (MRIs) can introduce serious bias into any measurements made with those images. We formally introduce three regression models including a Rician regression model and two associated normal models to characterize stochastic noise in various magnetic resonance imaging modalities, including diffusion-weighted imaging (DWI) and functional MRI (fMRI). Estimation algorithms are introduced to maximize the likelihood function of the three regression models. We also develop a diagnostic procedure for systematically exploring MR images to identify noise components other than simple stochastic noise, and to detect discrepancies between the fitted regression models and MRI data. The diagnostic procedure includes goodness-of-fit statistics, measures of influence, and tools for graphical display. The goodness-of-fit statistics can assess the key assumptions of the three regression models, whereas measures of influence can isolate outliers caused by certain noise components, including motion artifacts. The tools for graphical display permit graphical visualization of the values for the goodness-of-fit statistic and influence measures. Finally, we conduct simulation studies to evaluate performance of these methods, and we analyze a real dataset to illustrate how our diagnostic procedure localizes subtle image artifacts by detecting intravoxel variability that is not captured by the regression models. PMID:19890478

  8. Premium analysis for copula model: A case study for Malaysian motor insurance claims

    NASA Astrophysics Data System (ADS)

    Resti, Yulia; Ismail, Noriszura; Jaaman, Saiful Hafizah

    2014-06-01

    This study performs premium analysis for copula models with regression marginals. For illustration, the copula models are fitted to the Malaysian motor insurance claims data. In this study, we consider copula models from the Archimedean and Elliptical families, and marginal distributions from Gamma and Inverse Gaussian regression models. The simulated results from the independent model, which is obtained by fitting regression models separately to each claim category, and the dependent model, which is obtained by fitting copula models to all claim categories, are compared. The results show that the dependent model using the Frank copula is the best model, since the risk premiums estimated under this model approximate the actual claims experience more closely than those of the other copula models.

  9. Regression-Based Norms for a Bi-factor Model for Scoring the Brief Test of Adult Cognition by Telephone (BTACT).

    PubMed

    Gurnani, Ashita S; John, Samantha E; Gavett, Brandon E

    2015-05-01

    The current study developed regression-based normative adjustments for a bi-factor model of the Brief Test of Adult Cognition by Telephone (BTACT). Archival data from the Midlife Development in the United States-II Cognitive Project were used to develop eight separate linear regression models that predicted bi-factor BTACT scores, accounting for age, education, gender, and occupation, alone and in various combinations. All regression models provided statistically significant fit to the data. A three-predictor regression model fit best and accounted for 32.8% of the variance in the global bi-factor BTACT score. The fit of the regression models was not improved by gender. Eight different regression models are presented to allow the user flexibility in applying demographic corrections to the bi-factor BTACT scores. Occupation corrections, while not widely used, may provide useful demographic adjustments for adult populations or for those individuals who have attained an occupational status not commensurate with expected educational attainment.

  10. Goodness-Of-Fit Test for Nonparametric Regression Models: Smoothing Spline ANOVA Models as Example.

    PubMed

    Teran Hidalgo, Sebastian J; Wu, Michael C; Engel, Stephanie M; Kosorok, Michael R

    2018-06-01

    Nonparametric regression models do not require the specification of the functional form between the outcome and the covariates. Despite their popularity, the number of diagnostic statistics available for them is small in comparison to their parametric counterparts. We propose a goodness-of-fit test for nonparametric regression models with linear smoother form. In particular, we apply this testing framework to smoothing spline ANOVA models. The test can consider two sources of lack-of-fit: whether covariates that are not currently in the model need to be included, and whether the current model fits the data well. The proposed method derives estimated residuals from the model. Then, statistical dependence is assessed between the estimated residuals and the covariates using the HSIC. If dependence exists, the model does not capture all the variability in the outcome associated with the covariates; otherwise the model fits the data well. The bootstrap is used to obtain p-values. Application of the method is demonstrated with a neonatal mental development data analysis. We demonstrate correct type I error as well as power performance through simulations.

  11. Model building strategy for logistic regression: purposeful selection.

    PubMed

    Zhang, Zhongheng

    2016-03-01

    Logistic regression is one of the most commonly used models to account for confounders in medical literature. The article introduces how to perform the purposeful selection model building strategy with R. I stress the use of the likelihood ratio test to see whether deleting a variable will have a significant impact on model fit. A deleted variable should also be checked for whether it is an important adjustment of the remaining covariates. Interactions should be checked to disentangle complex relationships between covariates and their synergistic effect on the response variable. The model should be checked for goodness-of-fit (GOF), in other words, how well the fitted model reflects the real data. The Hosmer-Lemeshow GOF test is the most widely used for logistic regression models.
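
    The likelihood ratio test mentioned above compares the log-likelihoods of a full model and a model with one variable removed. The sketch below shows the arithmetic for the single-variable (1 degree of freedom) case using only the Python standard library; the article itself works in R, and the log-likelihood values here are hypothetical.

```python
import math


def likelihood_ratio_test_1df(loglik_full: float, loglik_reduced: float):
    """Likelihood ratio test for dropping one parameter.

    Returns (statistic, p_value). For 1 degree of freedom the chi-squared
    upper tail equals erfc(sqrt(statistic / 2)).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods from two nested logistic models.
stat, p = likelihood_ratio_test_1df(loglik_full=-100.0, loglik_reduced=-103.5)
print("LR statistic:", stat, "p-value:", round(p, 4))
```

    A small p-value suggests that deleting the variable worsens the fit noticeably. Dropping a categorical variable with several levels removes more than one parameter and would need the chi-squared tail for the corresponding degrees of freedom instead.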

  12. Robust mislabel logistic regression without modeling mislabel probabilities.

    PubMed

    Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun

    2018-03-01

    Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into consideration. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression.

  13. A generalized right truncated bivariate Poisson regression model with applications to health data.

    PubMed

    Islam, M Ataharul; Chowdhury, Rafiqul I

    2017-01-01

    A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and over or under dispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model.

  14. A generalized right truncated bivariate Poisson regression model with applications to health data

    PubMed Central

    Islam, M. Ataharul; Chowdhury, Rafiqul I.

    2017-01-01

    A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and over or under dispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model. PMID:28586344

  15. A global goodness-of-fit statistic for Cox regression models.

    PubMed

    Parzen, M; Lipsitz, S R

    1999-06-01

    In this paper, a global goodness-of-fit test statistic for a Cox regression model, which has an approximate chi-squared distribution when the model has been correctly specified, is proposed. Our goodness-of-fit statistic is global and has power to detect if interactions or higher order powers of covariates in the model are needed. The proposed statistic is similar to the Hosmer and Lemeshow (1980, Communications in Statistics A10, 1043-1069) goodness-of-fit statistic for binary data as well as Schoenfeld's (1980, Biometrika 67, 145-153) statistic for the Cox model. The methods are illustrated using data from a Mayo Clinic trial in primary biliary cirrhosis of the liver (Fleming and Harrington, 1991, Counting Processes and Survival Analysis), in which the outcome is the time until liver transplantation or death. There are 17 possible covariates. Two Cox proportional hazards models are fit to the data, and the proposed goodness-of-fit statistic is applied to the fitted models.

  16. [Application of negative binomial regression and modified Poisson regression in the research of risk factors for injury frequency].

    PubMed

    Cao, Qingqing; Wu, Zhenqiang; Sun, Ying; Wang, Tiezhu; Han, Tengwei; Gu, Chaomei; Sun, Yehuan

    2011-11-01

    To explore the application of negative binomial regression and modified Poisson regression analysis in analyzing the influential factors for injury frequency and the risk factors leading to the increase of injury frequency. 2917 primary and secondary school students were selected from Hefei by cluster random sampling and surveyed by questionnaire. The count data on injury events were used to fit modified Poisson regression and negative binomial regression models. The risk factors for increased unintentional injury frequency among juvenile students were explored, so as to probe the efficiency of these two models in studying the influential factors for injury frequency. The Poisson model exhibited over-dispersion (P < 0.0001) based on the Lagrange multiplier test. Therefore, the over-dispersed data were fitted better by the modified Poisson regression and negative binomial regression models. Both showed that male gender, younger age, father working outside of the hometown, guardian education above junior high school level, and smoking might be associated with higher injury frequencies. For clustered frequency data on injury events, both the modified Poisson regression analysis and negative binomial regression analysis can be used. However, based on our data, the modified Poisson regression fitted better and this model could give a more accurate interpretation of relevant factors affecting the frequency of injury.
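
    Over-dispersion means that the counts vary more than a Poisson model allows, since a Poisson variable has variance equal to its mean. The study used a Lagrange multiplier test; the sketch below is a simpler descriptive check, the variance-to-mean ratio of the raw counts, and is offered only as an illustration. The injury counts are hypothetical.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean.

    Values well above 1 suggest over-dispersion relative to a Poisson
    distribution with no covariates.
    """
    return variance(counts) / mean(counts)


# Hypothetical injury counts per student.
counts = [0, 0, 1, 0, 5, 0, 2, 0, 7, 1]
print("dispersion index:", round(dispersion_index(counts), 2))
```

    A ratio near 1 is compatible with a plain Poisson model. A ratio well above 1 is the situation in which the abstract turns to negative binomial regression or to Poisson regression with adjusted standard errors.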

  17. LiDAR based prediction of forest biomass using hierarchical models with spatially varying coefficients

    USGS Publications Warehouse

    Babcock, Chad; Finley, Andrew O.; Bradford, John B.; Kolka, Randall K.; Birdsey, Richard A.; Ryan, Michael G.

    2015-01-01

    Many studies and production inventory systems have shown the utility of coupling covariates derived from Light Detection and Ranging (LiDAR) data with forest variables measured on georeferenced inventory plots through regression models. The objective of this study was to propose and assess the use of a Bayesian hierarchical modeling framework that accommodates both residual spatial dependence and non-stationarity of model covariates through the introduction of spatial random effects. We explored this objective using four forest inventory datasets that are part of the North American Carbon Program, each comprising point-referenced measures of above-ground forest biomass and discrete LiDAR. For each dataset, we considered at least five regression model specifications of varying complexity. Models were assessed based on goodness of fit criteria and predictive performance using a 10-fold cross-validation procedure. Results showed that the addition of spatial random effects to the regression model intercept improved fit and predictive performance in the presence of substantial residual spatial dependence. Additionally, in some cases, allowing either some or all regression slope parameters to vary spatially, via the addition of spatial random effects, further improved model fit and predictive performance. In other instances, models showed improved fit but decreased predictive performance—indicating over-fitting and underscoring the need for cross-validation to assess predictive ability. The proposed Bayesian modeling framework provided access to pixel-level posterior predictive distributions that were useful for uncertainty mapping, diagnosing spatial extrapolation issues, revealing missing model covariates, and discovering locally significant parameters.

  18. Parametric regression model for survival data: Weibull regression model as an example

    PubMed Central

    2016-01-01

    The Weibull regression model is one of the most popular forms of parametric regression model because it provides an estimate of the baseline hazard function as well as coefficients for covariates. Because of technical difficulties, the Weibull regression model is seldom used in medical literature compared with the semi-parametric proportional hazards model. To make clinical investigators familiar with the Weibull regression model, this article introduces some basic knowledge of the model and then illustrates how to fit it with R software. The SurvRegCensCov package is useful in converting estimated coefficients to clinically relevant statistics such as the hazard ratio (HR) and event time ratio (ETR). Model adequacy can be assessed by inspecting Kaplan-Meier curves stratified by a categorical variable. The eha package provides an alternative method to fit the Weibull regression model. The check.dist() function helps to assess goodness-of-fit of the model. Variable selection is based on the importance of a covariate, which can be tested using the anova() function. Alternatively, backward elimination starting from a full model is an efficient way for model development. Visualization of the Weibull regression model after model development is useful because it provides another way to report findings. PMID:28149846
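
    The conversion from fitted coefficients to HR and ETR can be sketched with a few lines of arithmetic. This assumes the accelerated failure time parameterization log(T) = intercept + coef * x + scale * W, which is how Weibull fits are usually reported by R's survreg(). The function name and the numeric inputs are hypothetical, and the sketch is written in Python rather than reproducing any package's code.

```python
import math


def weibull_aft_to_ph(intercept: float, coef: float, scale: float) -> dict:
    """Convert AFT-form Weibull estimates to proportional-hazards quantities.

    Assumes log(T) = intercept + coef * x + scale * W (AFT form).
    """
    return {
        "shape": 1.0 / scale,                    # Weibull shape parameter
        "lambda": math.exp(-intercept / scale),  # baseline scale in PH form
        "hazard_ratio": math.exp(-coef / scale),
        "event_time_ratio": math.exp(coef),
    }


# Hypothetical estimates for a single binary covariate.
result = weibull_aft_to_ph(intercept=2.0, coef=-0.5, scale=0.5)
for name, value in result.items():
    print(name, round(value, 4))
```

    A negative AFT coefficient shortens event times (ETR below 1) and, equivalently, raises the hazard (HR above 1). The two summaries describe the same effect on different scales.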

  19. Model Robust Calibration: Method and Application to Electronically-Scanned Pressure Transducers

    NASA Technical Reports Server (NTRS)

    Walker, Eric L.; Starnes, B. Alden; Birch, Jeffery B.; Mays, James E.

    2010-01-01

    This article presents the application of a recently developed statistical regression method to the controlled instrument calibration problem. The statistical method of Model Robust Regression (MRR), developed by Mays, Birch, and Starnes, is shown to improve instrument calibration by reducing the reliance of the calibration on a predetermined parametric (e.g. polynomial, exponential, logarithmic) model. This is accomplished by allowing fits from the predetermined parametric model to be augmented by a certain portion of a fit to the residuals from the initial regression using a nonparametric (locally parametric) regression technique. The method is demonstrated for the absolute scale calibration of silicon-based pressure transducers.

  20. On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis

    ERIC Educational Resources Information Center

    Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas

    2011-01-01

    The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…

  1. [How to fit and interpret multilevel models using SPSS].

    PubMed

    Pardo, Antonio; Ruiz, Miguel A; San Martín, Rafael

    2007-05-01

    Hierarchical or multilevel models are used to analyse data when cases belong to known groups and sample units are selected both from the individual level and from the group level. In this work, the multilevel models most commonly discussed in the statistical literature are described, explaining how to fit these models using the SPSS program (any version from the 11th onward) and how to interpret the outcomes of the analysis. Five particular models are described, fitted, and interpreted: (1) one-way analysis of variance with random effects, (2) regression analysis with means-as-outcomes, (3) one-way analysis of covariance with random effects, (4) regression analysis with random coefficients, and (5) regression analysis with means- and slopes-as-outcomes. All models are explained, trying to make them understandable to researchers in health and behaviour sciences.

  2. Real estate value prediction using multivariate regression models

    NASA Astrophysics Data System (ADS)

    Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav

    2017-11-01

    The real estate market is one of the most competitive in terms of pricing, and prices tend to vary significantly based on many factors; hence it is a prime field in which to apply machine learning concepts to optimize and predict prices with high accuracy. In this paper, we therefore present various important features to use when predicting housing prices with good accuracy. We describe regression models that use various features to achieve a lower residual sum of squares error. When using features in a regression model, some feature engineering is required for better prediction. Often a set of features (multiple regression) or polynomial regression (applying various powers of the features) is used to achieve a better model fit. Because these models are expected to be susceptible to overfitting, ridge regression is used to reduce it. This paper thus points to the best application of regression models, in addition to other techniques, to optimize the result.
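
    Ridge regression counters overfitting by adding a penalty lam to the diagonal of X'X before solving the normal equations, which shrinks the coefficients toward zero. The following standard-library sketch shows that closed form on a tiny polynomial example. The data, the centering of the squared feature, and the penalty values are illustrative assumptions and do not come from the paper.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_fit(X, y, lam):
    """Solve (X'X + lam * I) beta = X'y; no intercept, so center X and y first."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical centered data: features x and (x^2 - mean(x^2)), response y = 3x.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
X = [[x, x * x - 2.0] for x in xs]
y = [3.0 * x for x in xs]
print("lam = 0: ", ridge_fit(X, y, 0.0))
print("lam = 10:", ridge_fit(X, y, 10.0))
```

    With lam = 0 the fit is ordinary least squares; increasing lam shrinks the coefficients toward zero, which trades some bias for reduced variance. In practice the penalty is usually chosen by cross-validation.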

  3. Spatially resolved regression analysis of pre-treatment FDG, FLT and Cu-ATSM PET from post-treatment FDG PET: an exploratory study

    PubMed Central

    Bowen, Stephen R; Chappell, Richard J; Bentzen, Søren M; Deveau, Michael A; Forrest, Lisa J; Jeraj, Robert

    2012-01-01

    Purpose: To quantify associations between pre-radiotherapy and post-radiotherapy PET parameters via spatially resolved regression. Materials and methods: Ten canine sinonasal cancer patients underwent PET/CT scans of [18F]FDG (FDGpre), [18F]FLT (FLTpre), and [61Cu]Cu-ATSM (Cu-ATSMpre). Following radiotherapy regimens of 50 Gy in 10 fractions, veterinary patients underwent FDG PET/CT scans at three months (FDGpost). Regression of standardized uptake values in baseline FDGpre, FLTpre and Cu-ATSMpre tumour voxels to those in FDGpost images was performed for linear, log-linear, generalized-linear and mixed-fit linear models. Goodness-of-fit in regression coefficients was assessed by R^2. Hypothesis testing of coefficients over the patient population was performed. Results: Multivariate linear model fits of FDGpre to FDGpost were significantly positive over the population (FDGpost ~ 0.17 FDGpre, p=0.03), and classified slopes of RECIST non-responders and responders to be different (0.37 vs. 0.07, p=0.01). Generalized-linear model fits related FDGpre to FDGpost by a linear power law (FDGpost ~ FDGpre^0.93, p<0.001). Univariate mixture model fits of FDGpre improved R^2 from 0.17 to 0.52. Neither baseline FLT PET nor Cu-ATSM PET uptake contributed statistically significant multivariate regression coefficients. Conclusions: Spatially resolved regression analysis indicates that pre-treatment FDG PET uptake is most strongly associated with three-month post-treatment FDG PET uptake in this patient population, though associations are histopathology-dependent. PMID:22682748

  4. Climate variations and salmonellosis transmission in Adelaide, South Australia: a comparison between regression models

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Bi, Peng; Hiller, Janet

    2008-01-01

    This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was also inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggest that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.

  5. Comparison of random regression models with Legendre polynomials and linear splines for production traits and somatic cell score of Canadian Holstein cows.

    PubMed

    Bohmanova, J; Miglior, F; Jamrozik, J; Misztal, I; Sullivan, P G

    2008-09-01

    A random regression model with both random and fixed regressions fitted by Legendre polynomials of order 4 was compared with 3 alternative models fitting linear splines with 4, 5, or 6 knots. The effects common for all models were a herd-test-date effect, fixed regressions on days in milk (DIM) nested within region-age-season of calving class, and random regressions for additive genetic and permanent environmental effects. Data were test-day milk, fat and protein yields, and SCS recorded from 5 to 365 DIM during the first 3 lactations of Canadian Holstein cows. A random sample of 50 herds consisting of 96,756 test-day records was generated to estimate variance components within a Bayesian framework via Gibbs sampling. Two sets of genetic evaluations were subsequently carried out to investigate performance of the 4 models. Models were compared by graphical inspection of variance functions, goodness of fit, error of prediction of breeding values, and stability of estimated breeding values. Models with splines gave lower estimates of variances at extremes of lactations than the model with Legendre polynomials. Differences among models in goodness of fit measured by percentages of squared bias, correlations between predicted and observed records, and residual variances were small. The deviance information criterion favored the spline model with 6 knots. Smaller error of prediction and higher stability of estimated breeding values were achieved by using spline models with 5 and 6 knots compared with the model with Legendre polynomials. In general, the spline model with 6 knots had the best overall performance based upon the considered model comparison criteria.

  6. Using Weighted Least Squares Regression for Obtaining Langmuir Sorption Constants

    USDA-ARS's Scientific Manuscript database

    One of the most commonly used models for describing phosphorus (P) sorption to soils is the Langmuir model. To obtain model parameters, the Langmuir model is fit to measured sorption data using least squares regression. Least squares regression is based on several assumptions including normally dist...

  7. Analyzing Student Learning Outcomes: Usefulness of Logistic and Cox Regression Models. IR Applications, Volume 5

    ERIC Educational Resources Information Center

    Chen, Chau-Kuang

    2005-01-01

    Logistic and Cox regression methods are practical tools used to model the relationships between certain student learning outcomes and their relevant explanatory variables. The logistic regression model fits an S-shaped curve into a binary outcome with data points of zero and one. The Cox regression model allows investigators to study the duration…

  8. Predicting Quantitative Traits With Regression Models for Dense Molecular Markers and Pedigree

    PubMed Central

    de los Campos, Gustavo; Naya, Hugo; Gianola, Daniel; Crossa, José; Legarra, Andrés; Manfredi, Eduardo; Weigel, Kent; Cotes, José Miguel

    2009-01-01

    The availability of genomewide dense markers brings opportunities and challenges to breeding programs. An important question concerns the ways in which dense markers and pedigrees, together with phenotypic records, should be used to arrive at predictions of genetic values for complex traits. If a large number of markers are included in a regression model, marker-specific shrinkage of regression coefficients may be needed. For this reason, the Bayesian least absolute shrinkage and selection operator (LASSO) (BL) appears to be an interesting approach for fitting marker effects in a regression model. This article adapts the BL to arrive at a regression model where markers, pedigrees, and covariates other than markers are considered jointly. Connections between BL and other marker-based regression models are discussed, and the sensitivity of BL with respect to the choice of prior distributions assigned to key parameters is evaluated using simulation. The proposed model was fitted to two data sets from wheat and mouse populations, and evaluated using cross-validation methods. Results indicate that inclusion of markers in the regression further improved the predictive ability of models. An R program that implements the proposed model is freely available. PMID:19293140

  9. Categorical regression dose-response modeling

    EPA Science Inventory

    The goal of this training is to provide participants with training on the use of the U.S. EPA’s Categorical Regression software (CatReg) and its application to risk assessment. Categorical regression fits mathematical models to toxicity data that have been assigned ord...

  10. Potential pitfalls when denoising resting state fMRI data using nuisance regression.

    PubMed

    Bright, Molly G; Tench, Christopher R; Murphy, Kevin

    2017-07-01

    In resting state fMRI, it is necessary to remove signal variance associated with noise sources, leaving cleaned fMRI time-series that more accurately reflect the underlying intrinsic brain fluctuations of interest. This is commonly achieved through nuisance regression, in which the fit is calculated of a noise model of head motion and physiological processes to the fMRI data in a General Linear Model, and the "cleaned" residuals of this fit are used in further analysis. We examine the statistical assumptions and requirements of the General Linear Model, and whether these are met during nuisance regression of resting state fMRI data. Using toy examples and real data we show how pre-whitening, temporal filtering and temporal shifting of regressors impact model fit. Based on our own observations, existing literature, and statistical theory, we make the following recommendations when employing nuisance regression: pre-whitening should be applied to achieve valid statistical inference of the noise model fit parameters; temporal filtering should be incorporated into the noise model to best account for changes in degrees of freedom; temporal shifting of regressors, although merited, should be achieved via optimisation and validation of a single temporal shift. We encourage all readers to make simple, practical changes to their fMRI denoising pipeline, and to regularly assess the appropriateness of the noise model used. By negotiating the potential pitfalls described in this paper, and by clearly reporting the details of nuisance regression in future manuscripts, we hope that the field will achieve more accurate and precise noise models for cleaning the resting state fMRI time-series. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  11. An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression

    ERIC Educational Resources Information Center

    Weiss, Brandi A.; Dardick, William

    2016-01-01

    This article introduces an entropy-based measure of data-model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify…

  12. Statistical model to perform error analysis of curve fits of wind tunnel test data using the techniques of analysis of variance and regression analysis

    NASA Technical Reports Server (NTRS)

    Alston, D. W.

    1981-01-01

    The considered research had the objective to design a statistical model that could perform an error analysis of curve fits of wind tunnel test data using analysis of variance and regression analysis techniques. Four related subproblems were defined, and by solving each of these a solution to the general research problem was obtained. The capabilities of the evolved true statistical model are considered. The least squares fit is used to determine the nature of the force, moment, and pressure data. The order of the curve fit is increased in order to delete the quadratic effect in the residuals. The analysis of variance is used to determine the magnitude and effect of the error factor associated with the experimental data.

  13. Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.

    PubMed

    Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko

    2016-03-01

    In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. Convex Regression with Interpretable Sharp Partitions

    PubMed Central

    Petersen, Ashley; Simon, Noah; Witten, Daniela

    2016-01-01

    We consider the problem of predicting an outcome variable on the basis of a small number of covariates, using an interpretable yet non-additive model. We propose convex regression with interpretable sharp partitions (CRISP) for this task. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. We explore the properties of CRISP, and evaluate its performance in a simulation study and on a housing price data set. PMID:27635120

  15. Goodness of Fit and Misspecification in Quantile Regressions

    ERIC Educational Resources Information Center

    Furno, Marilena

    2011-01-01

    The article considers a test of specification for quantile regressions. The test relies on the increase of the objective function and the worsening of the fit when unnecessary constraints are imposed. It compares the objective functions of restricted and unrestricted models and, in its different formulations, it verifies (a) forecast ability, (b)…

  16. Pseudo-second order models for the adsorption of safranin onto activated carbon: comparison of linear and non-linear regression methods.

    PubMed

    Kumar, K Vasanth

    2007-04-02

    Kinetic experiments were carried out for the sorption of safranin onto activated carbon particles. The kinetic data were fitted to the pseudo-second order models of Ho, Sobkowsk and Czerwinski, Blanchard et al., and Ritchie by linear and non-linear regression methods. The non-linear method was found to be a better way of obtaining the parameters of the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and the Ritchie pseudo-second order models were the same. Non-linear regression analysis showed that Blanchard et al. and Ho arrived at similar pseudo-second order models, but under different assumptions. Ho's pseudo-second order expression gave the best fit to the experimental data by both linear and non-linear regression, making it a better kinetic expression than the other pseudo-second order expressions.
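
    The linear regression route mentioned in this record can be sketched as follows. The sketch assumes the commonly cited pseudo-second order form q_t = k q_e^2 t / (1 + k q_e t) and its linearization t/q_t = 1/(k q_e^2) + t/q_e; the abstract does not spell out either expression, and the data below are synthetic, not the safranin measurements.

```python
def pso_uptake(t, q_e, k):
    """Pseudo-second order uptake q_t at time t (assumed model form)."""
    return k * q_e ** 2 * t / (1.0 + k * q_e * t)


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def fit_pso_linearized(times, uptakes):
    """Regress t/q_t on t: slope = 1/q_e, intercept = 1/(k * q_e**2)."""
    slope, intercept = fit_line(times, [t / q for t, q in zip(times, uptakes)])
    q_e = 1.0 / slope
    k = slope ** 2 / intercept
    return q_e, k


# Synthetic, noise-free data with hypothetical parameters q_e = 50 and k = 0.002.
times = [5.0, 10.0, 20.0, 40.0, 80.0, 160.0]
uptakes = [pso_uptake(t, 50.0, 0.002) for t in times]
print(fit_pso_linearized(times, uptakes))
```

    On noise-free data the linearized fit recovers the parameters. With measurement noise, the transformation t/q_t changes the error structure, which is one commonly given reason why linear and non-linear fits can disagree.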

  17. Interaction Models for Functional Regression.

    PubMed

    Usset, Joseph; Staicu, Ana-Maria; Maity, Arnab

    2016-02-01

    A functional regression model with a scalar response and multiple functional predictors is proposed that accommodates two-way interactions in addition to their main effects. The proposed estimation procedure models the main effects using penalized regression splines, and the interaction effect by a tensor product basis. Extensions to generalized linear models and data observed on sparse grids or with measurement error are presented. A hypothesis testing procedure for the functional interaction effect is described. The proposed method can be easily implemented through existing software. Numerical studies show that fitting an additive model in the presence of interaction leads to both poor estimation performance and lost prediction power, while fitting an interaction model where there is in fact no interaction leads to negligible losses. The methodology is illustrated on the AneuRisk65 study data.

  18. glmnetLRC f/k/a lrc package: Logistic Regression Classification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2016-06-09

    Methods for fitting and predicting logistic regression classifiers (LRC) with an arbitrary loss function using elastic net or best subsets. This package adds additional model fitting features to the existing glmnet and bestglm R packages. This package was created to perform the analyses described in Amidan BG, Orton DJ, LaMarche BL, et al. 2014. Signatures for Mass Spectrometry Data Quality. Journal of Proteome Research. 13(4), 2215-2222. It makes the model fitting available in the glmnet and bestglm packages more general by identifying optimal model parameters via cross validation with a customizable loss function. It also identifies the optimal threshold for binary classification.

  19. A quadratic regression modelling on paddy production in the area of Perlis

    NASA Astrophysics Data System (ADS)

    Goh, Aizat Hanis Annas; Ali, Zalila; Nor, Norlida Mohd; Baharum, Adam; Ahmad, Wan Muhamad Amir W.

    2017-08-01

    Polynomial regression models are useful in situations in which the relationship between a response variable and predictor variables is curvilinear. Polynomial regression fits the nonlinear relationship into a least squares linear regression model by decomposing the predictor variables into a kth order polynomial. The polynomial order determines the number of inflexions on the curvilinear fitted line. A second order polynomial forms a quadratic expression (parabolic curve) with either a single maximum or minimum; a third order polynomial forms a cubic expression with both a relative maximum and a minimum. This study used paddy data in the area of Perlis to model paddy production based on paddy cultivation characteristics and environmental characteristics. The results indicated that a quadratic regression model best fits the data and that paddy production is affected by urea fertilizer application and by the interaction between the amount of average rainfall and the percentage of area affected by pest and disease. Urea fertilizer application has a quadratic effect in the model: as the number of days of urea fertilizer application increases, paddy production is expected to decrease until it reaches a minimum value and then to increase at higher numbers of days of urea application. The decrease in paddy production with an increase in rainfall is greater the higher the percentage of area affected by pest and disease.
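
    The quadratic effect described in this record can be illustrated with a small least-squares sketch. The data are hypothetical, not the Perlis paddy data. The point is that a fitted quadratic y = b0 + b1 x + b2 x^2 has its turning point at x = -b1 / (2 b2), which is a minimum when b2 > 0.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    design = [[1.0, x, x * x] for x in xs]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def turning_point(b1, b2):
    """Location of the single maximum or minimum of the fitted parabola."""
    return -b1 / (2.0 * b2)


# Hypothetical response that falls, bottoms out, then rises again.
xs = [float(x) for x in range(9)]
ys = [10.0 - 4.0 * x + 0.5 * x * x for x in xs]
b0, b1, b2 = fit_quadratic(xs, ys)
print(b0, b1, b2, turning_point(b1, b2))
```

    The model stays linear in its coefficients even though it is curved in x, which is why ordinary least squares applies.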

  20. Revisiting Gaussian Process Regression Modeling for Localization in Wireless Sensor Networks

    PubMed Central

    Richter, Philipp; Toledano-Ayala, Manuel

    2015-01-01

    Signal strength-based positioning in wireless sensor networks is a key technology for seamless, ubiquitous localization, especially in areas where Global Navigation Satellite System (GNSS) signals propagate poorly. To enable wireless local area network (WLAN) location fingerprinting in larger areas while maintaining accuracy, methods to reduce the effort of radio map creation must be consolidated and automated. Gaussian process regression has been applied to this problem with promising results, but the fit of the model was never thoroughly assessed. Instead, most studies trained a readily available model, relying on the zero mean and squared exponential covariance function, without further scrutiny. This paper studies the Gaussian process regression model selection for WLAN fingerprinting in indoor and outdoor environments. We train several models for indoor/outdoor- and combined areas; we evaluate them quantitatively and compare them by means of adequate model measures, hence assessing the fit of these models directly. To illuminate the quality of the model fit, the residuals of the proposed model are investigated, as well. Comparative experiments on the positioning performance verify and conclude the model selection. In this way, we show that the standard model is not the most appropriate, discuss alternatives and present our best candidate. PMID:26370996

  1. [Comparison of predictive effect between the single auto regressive integrated moving average (ARIMA) model and the ARIMA-generalized regression neural network (GRNN) combination model on the incidence of scarlet fever].

    PubMed

    Zhu, Yu; Xia, Jie-lai; Wang, Jing

    2009-09-01

    This study applied a single autoregressive integrated moving average (ARIMA) model and an ARIMA-generalized regression neural network (GRNN) combination model to the incidence of scarlet fever. An ARIMA model was established from the monthly incidence of scarlet fever in one city from 2000 to 2006. The fitted values of the ARIMA model were used as the input of the GRNN, and the actual values were used as its output. After training the GRNN, the performance of the single ARIMA model and the ARIMA-GRNN combination model was compared. The mean error rates (MER) of the single ARIMA model and the ARIMA-GRNN combination model were 31.6% and 28.7%, respectively, and the coefficients of determination (R(2)) of the two models were 0.801 and 0.872, respectively. The ARIMA-GRNN combination model fitted the data better than the single ARIMA model and has practical value for research on time series data such as the incidence of scarlet fever.

  2. Background stratified Poisson regression analysis of cohort data.

    PubMed

    Richardson, David B; Langholz, Bryan

    2012-03-01

    Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.

  3. The association of health-related fitness with indicators of academic performance in Texas schools.

    PubMed

    Welk, Gregory J; Jackson, Allen W; Morrow, James R; Haskell, William H; Meredith, Marilu D; Cooper, Kenneth H

    2010-09-01

    This study examined the associations between indicators of health-related physical fitness (cardiovascular fitness and body mass index) and academic performance (Texas Assessment of Knowledge and Skills). Partial correlations were generally stronger for cardiovascular fitness than body mass index and consistently stronger in the middle school grades. Mixed-model regression analyses revealed modest associations between fitness and academic achievement after controlling for potentially confounding variables. The effects of fitness on academic achievement were positive but small. A separate logistic regression analysis indicated that higher fitness rates increased the odds of schools achieving exemplary/recognized school status within the state. School fitness attainment is an indicator of higher performing schools. Direction of causality cannot be inferred due to the cross-sectional nature of the data.

  4. A classical regression framework for mediation analysis: fitting one model to estimate mediation effects.

    PubMed

    Saunders, Christina T; Blume, Jeffrey D

    2017-10-26

    Mediation analysis explores the degree to which an exposure's effect on an outcome is diverted through a mediating variable. We describe a classical regression framework for conducting mediation analyses in which estimates of causal mediation effects and their variance are obtained from the fit of a single regression model. The vector of changes in exposure pathway coefficients, which we named the essential mediation components (EMCs), is used to estimate standard causal mediation effects. Because these effects are often simple functions of the EMCs, an analytical expression for their model-based variance follows directly. Given this formula, it is instructive to revisit the performance of routinely used variance approximations (e.g., delta method and resampling methods). Requiring the fit of only one model reduces the computation time required for complex mediation analyses and permits the use of a rich suite of regression tools that are not easily implemented on a system of three equations, as would be required in the Baron-Kenny framework. Using data from the BRAIN-ICU study, we provide examples to illustrate the advantages of this framework and compare it with the existing approaches. © The Author 2017. Published by Oxford University Press.

  5. Local polynomial estimation of heteroscedasticity in a multivariate linear regression model and its applications in economics.

    PubMed

    Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan

    2012-01-01

    Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. First, local polynomial fitting is applied to estimate the heteroscedastic function; then the coefficients of the regression model are obtained by the generalized least squares method. One noteworthy feature of our approach is that we avoid testing for heteroscedasticity by improving the traditional two-stage method. Because local polynomial estimation is a non-parametric technique, it is unnecessary to know the form of the heteroscedastic function. Therefore, we can improve the estimation precision when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients are asymptotically normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is effective in finite-sample situations.

  6. Carbon dioxide stripping in aquaculture -- part III: model verification

    USGS Publications Warehouse

    Colt, John; Watten, Barnaby; Pfeiffer, Tim

    2012-01-01

    Based on conventional mass transfer models developed for oxygen, the use of the non-linear ASCE method, 2-point method, and one parameter linear-regression method were evaluated for carbon dioxide stripping data. For values of KLaCO2 < approximately 1.5/h, the 2-point or ASCE method are a good fit to experimental data, but the fit breaks down at higher values of KLaCO2. How to correct KLaCO2 for gas phase enrichment remains to be determined. The one-parameter linear regression model was used to vary the C*CO2 over the test, but it did not result in a better fit to the experimental data when compared to the ASCE or fixed C*CO2 assumptions.

  7. Random regression models using Legendre polynomials or linear splines for test-day milk yield of dairy Gyr (Bos indicus) cattle.

    PubMed

    Pereira, R J; Bignardi, A B; El Faro, L; Verneque, R S; Vercesi Filho, A E; Albuquerque, L G

    2013-01-01

    Studies investigating the use of random regression models for genetic evaluation of milk production in Zebu cattle are scarce. In this study, 59,744 test-day milk yield records from 7,810 first lactations of purebred dairy Gyr (Bos indicus) and crossbred (dairy Gyr × Holstein) cows were used to compare random regression models in which additive genetic and permanent environmental effects were modeled using orthogonal Legendre polynomials or linear spline functions. Residual variances were modeled considering 1, 5, or 10 classes of days in milk. Five classes fitted the changes in residual variances over the lactation adequately and were used for model comparison. The model that fitted linear spline functions with 6 knots provided the lowest sum of residual variances across lactation. On the other hand, according to the deviance information criterion (DIC) and Bayesian information criterion (BIC), a model using third-order and fourth-order Legendre polynomials for additive genetic and permanent environmental effects, respectively, provided the best fit. However, the high rank correlation (0.998) between this model and that applying third-order Legendre polynomials for additive genetic and permanent environmental effects indicates that, in practice, the same bulls would be selected by both models. The last model, which is less parameterized, is a parsimonious option for fitting dairy Gyr breed test-day milk yield records. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  8. Correlation and simple linear regression.

    PubMed

    Eberly, Lynn E

    2007-01-01

    This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.

  9. Quantifying and Reducing Curve-Fitting Uncertainty in Isc

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Campanelli, Mark; Duck, Benjamin; Emery, Keith

    2015-06-14

    Current-voltage (I-V) curve measurements of photovoltaic (PV) devices are used to determine performance parameters and to establish traceable calibration chains. Measurement standards specify localized curve fitting methods, e.g., straight-line interpolation/extrapolation of the I-V curve points near short-circuit current, Isc. By considering such fits as statistical linear regressions, uncertainties in the performance parameters are readily quantified. However, the legitimacy of such a computed uncertainty requires that the model be a valid (local) representation of the I-V curve and that the noise be sufficiently well characterized. Using more data points often has the advantage of lowering the uncertainty. However, more data points can make the uncertainty in the fit arbitrarily small, and this fit uncertainty misses the dominant residual uncertainty due to so-called model discrepancy. Using objective Bayesian linear regression for straight-line fits for Isc, we investigate an evidence-based method to automatically choose data windows of I-V points with reduced model discrepancy. We also investigate noise effects. Uncertainties, aligned with the Guide to the Expression of Uncertainty in Measurement (GUM), are quantified throughout.
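
    The trade-off between window size and model discrepancy can be sketched with an ordinary least-squares straight-line fit. This is a swapped-in classical technique, not the objective Bayesian regression used by the authors. The curved I-V function, its coefficients, and the voltage spacing are hypothetical; the intercept standard error uses the textbook formula s * sqrt(1/n + v_bar^2 / S_vv).

```python
import math


def fit_isc(voltages, currents):
    """Straight-line least-squares fit; returns (Isc estimate, its standard error)."""
    n = len(voltages)
    v_bar = sum(voltages) / n
    i_bar = sum(currents) / n
    svv = sum((v - v_bar) ** 2 for v in voltages)
    svi = sum((v - v_bar) * (i - i_bar) for v, i in zip(voltages, currents))
    slope = svi / svv
    intercept = i_bar - slope * v_bar  # fitted current at V = 0, i.e. Isc
    ssr = sum((i - (intercept + slope * v)) ** 2 for v, i in zip(voltages, currents))
    residual_var = ssr / (n - 2)
    std_err = math.sqrt(residual_var * (1.0 / n + v_bar ** 2 / svv))
    return intercept, std_err


def curved_iv(v):
    """Hypothetical noise-free I-V curve with true Isc = 5.0 and slight curvature."""
    return 5.0 - 0.1 * v - 2.0 * v ** 2


narrow_v = [0.02 * i for i in range(6)]    # 6 points near short circuit
wide_v = [0.02 * i for i in range(26)]     # 26 points reaching further up the curve
narrow_isc, narrow_se = fit_isc(narrow_v, [curved_iv(v) for v in narrow_v])
wide_isc, wide_se = fit_isc(wide_v, [curved_iv(v) for v in wide_v])
print(narrow_isc, narrow_se)
print(wide_isc, wide_se)
```

    In this toy case the wide window gives the more biased Isc estimate, because a straight line is a poorer local model over a larger voltage range. The computed standard error reflects only scatter about the line, not that bias.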

  10. Quantifying and Reducing Curve-Fitting Uncertainty in Isc: Preprint

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Campanelli, Mark; Duck, Benjamin; Emery, Keith

    Current-voltage (I-V) curve measurements of photovoltaic (PV) devices are used to determine performance parameters and to establish traceable calibration chains. Measurement standards specify localized curve fitting methods, e.g., straight-line interpolation/extrapolation of the I-V curve points near short-circuit current, Isc. By considering such fits as statistical linear regressions, uncertainties in the performance parameters are readily quantified. However, the legitimacy of such a computed uncertainty requires that the model be a valid (local) representation of the I-V curve and that the noise be sufficiently well characterized. Using more data points often has the advantage of lowering the uncertainty. However, more data points can make the uncertainty in the fit arbitrarily small, and this fit uncertainty misses the dominant residual uncertainty due to so-called model discrepancy. Using objective Bayesian linear regression for straight-line fits for Isc, we investigate an evidence-based method to automatically choose data windows of I-V points with reduced model discrepancy. We also investigate noise effects. Uncertainties, aligned with the Guide to the Expression of Uncertainty in Measurement (GUM), are quantified throughout.

  11. Modeling health survey data with excessive zero and K responses.

    PubMed

    Lin, Ting Hsiang; Tsai, Min-Hsiao

    2013-04-30

    Zero-inflated Poisson regression is a popular tool used to analyze data with excessive zeros. Although much work has already been performed to fit zero-inflated data, most models heavily depend on special features of the individual data. To be specific, this means that there is a sizable group of respondents who endorse the same answers making the data have peaks. In this paper, we propose a new model with the flexibility to model excessive counts other than zero, and the model is a mixture of multinomial logistic and Poisson regression, in which the multinomial logistic component models the occurrence of excessive counts, including zeros, K (where K is a positive integer) and all other values. The Poisson regression component models the counts that are assumed to follow a Poisson distribution. Two examples are provided to illustrate our models when the data have counts containing many ones and sixes. As a result, the zero-inflated and K-inflated models exhibit a better fit than the zero-inflated Poisson and standard Poisson regressions. Copyright © 2012 John Wiley & Sons, Ltd.
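
    The mixture idea in this record can be sketched at the level of probability mass functions. The sketch uses fixed mixing probabilities and a fixed Poisson mean, whereas the paper models both through regressions on covariates; the function names and the numbers are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Standard Poisson probability of observing count k with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson: extra point mass pi_zero at zero."""
    base = (1.0 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base


def inflated_pmf(k, lam, extra_mass):
    """Poisson mixed with point masses; extra_mass maps inflated counts to probabilities."""
    poisson_weight = 1.0 - sum(extra_mass.values())
    return extra_mass.get(k, 0.0) + poisson_weight * poisson_pmf(k, lam)


# Hypothetical example: excess zeros and excess sixes on top of a Poisson(2) count.
masses = {0: 0.2, 6: 0.1}
for count in range(8):
    print(count, inflated_pmf(count, 2.0, masses))
```

    Setting `extra_mass` to a single entry at zero reproduces the ordinary zero-inflated Poisson, so the K-inflated form can be read as a direct generalization.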

  12. Orthogonal Regression: A Teaching Perspective

    ERIC Educational Resources Information Center

    Carr, James R.

    2012-01-01

    A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
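
    The two line-fitting methods named in this record have closed-form slopes in terms of the sample sums of squares and cross-products. A minimal sketch follows, using the standard textbook formulas rather than anything taken from the article. It assumes the cross-product sum is nonzero, and the data points are hypothetical.

```python
import math


def _centered_sums(xs, ys):
    """Return the means and the centered sums S_xx, S_yy, S_xy."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return x_bar, y_bar, sxx, syy, sxy


def major_axis(xs, ys):
    """Orthogonal regression: minimizes squared perpendicular distances to the line."""
    x_bar, y_bar, sxx, syy, sxy = _centered_sums(xs, ys)
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    return slope, y_bar - slope * x_bar


def reduced_major_axis(xs, ys):
    """Reduced major axis: slope is the ratio of standard deviations, signed by S_xy."""
    x_bar, y_bar, sxx, syy, sxy = _centered_sums(xs, ys)
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return slope, y_bar - slope * x_bar


# Hypothetical noisy points scattered around a line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1]
print(major_axis(xs, ys))
print(reduced_major_axis(xs, ys))
```

    Unlike ordinary least squares, the major axis treats x and y symmetrically: swapping the two variables gives the reciprocal slope.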

  13. Quantum algorithm for linear regression

    NASA Astrophysics Data System (ADS)

    Wang, Guoming

    2017-07-01

    We present a quantum algorithm for fitting a linear regression model to a given data set using the least-squares approach. Differently from previous algorithms which yield a quantum state encoding the optimal parameters, our algorithm outputs these numbers in the classical form. So by running it once, one completely determines the fitted model and then can use it to make predictions on new data at little cost. Moreover, our algorithm works in the standard oracle model, and can handle data sets with nonsparse design matrices. It runs in time poly(log2(N), d, κ, 1/ɛ), where N is the size of the data set, d is the number of adjustable parameters, κ is the condition number of the design matrix, and ɛ is the desired precision in the output. We also show that the polynomial dependence on d and κ is necessary. Thus, our algorithm cannot be significantly improved. Furthermore, we also give a quantum algorithm that estimates the quality of the least-squares fit (without computing its parameters explicitly). This algorithm runs faster than the one for finding this fit, and can be used to check whether the given data set qualifies for linear regression in the first place.

  14. Pseudo second order kinetics and pseudo isotherms for malachite green onto activated carbon: comparison of linear and non-linear regression methods.

    PubMed

    Kumar, K Vasanth; Sivanesan, S

    2006-08-25

    Pseudo second order kinetic expressions of Ho, Sobkowsk and Czerwinski, Blanchard et al., and Ritchie were fitted to the experimental kinetic data of malachite green onto activated carbon by non-linear and linear methods. The non-linear method was found to be a better way of obtaining the parameters of the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and the Ritchie pseudo second order models were the same. Non-linear regression analysis showed that Blanchard et al. and Ho arrived at similar pseudo second order models, but under different assumptions. Ho's pseudo second order expression gave the best fit to the experimental data by both linear and non-linear regression, making it a better kinetic expression than the other pseudo second order expressions. The amount of dye adsorbed at equilibrium, q(e), was predicted from Ho's pseudo second order expression and fitted to the Langmuir, Freundlich and Redlich Peterson expressions by both linear and non-linear methods to obtain the pseudo isotherms. The best fitting pseudo isotherms were found to be the Langmuir and Redlich Peterson isotherms. The Redlich Peterson isotherm reduces to the Langmuir isotherm when the constant g equals unity.

  15. Adjusted variable plots for Cox's proportional hazards regression model.

    PubMed

    Hall, C B; Zeger, S L; Bandeen-Roche, K J

    1996-01-01

    Adjusted variable plots are useful in linear regression for outlier detection and for qualitative evaluation of the fit of a model. In this paper, we extend adjusted variable plots to Cox's proportional hazards model for possibly censored survival data. We propose three different plots: a risk level adjusted variable (RLAV) plot in which each observation in each risk set appears, a subject level adjusted variable (SLAV) plot in which each subject is represented by one point, and an event level adjusted variable (ELAV) plot in which the entire risk set at each failure event is represented by a single point. The latter two plots are derived from the RLAV by combining multiple points. In each plot, the regression coefficient and standard error from a Cox proportional hazards regression are obtained by a simple linear regression through the origin fit to the coordinates of the pictured points. The plots are illustrated with a reanalysis of a dataset of 65 patients with multiple myeloma.

  16. Factors associated with parasite dominance in fishes from Brazil.

    PubMed

    Amarante, Cristina Fernandes do; Tassinari, Wagner de Souza; Luque, Jose Luis; Pereira, Maria Julia Salim

    2016-06-14

    The present study used regression models to evaluate the existence of factors that may influence the numerical parasite dominance with an epidemiological approximation. A database including 3,746 fish specimens and their respective parasites was used to evaluate the relationship between parasite dominance and biotic characteristics inherent to the studied hosts and the parasite taxa. Multivariate, classical, and mixed effects linear regression models were fitted. The calculations were performed using R software (95% CI). In the fitting of the classical multiple linear regression model, freshwater and planktivorous fish species and body length, as well as the species of the taxa Trematoda, Monogenea, and Hirudinea, were associated with parasite dominance. However, the fitting of the mixed effects model showed that the body length of the host and the species of the taxa Nematoda, Trematoda, Monogenea, Hirudinea, and Crustacea were significantly associated with parasite dominance. Studies that consider specific biological aspects of the hosts and parasites should expand the knowledge regarding factors that influence the numerical dominance of fish in Brazil. The use of a mixed model shows, once again, the importance of the appropriate use of a model correlated with the characteristics of the data to obtain consistent results.

  17. Optimization of isotherm models for pesticide sorption on biopolymer-nanoclay composite by error analysis.

    PubMed

    Narayanan, Neethu; Gupta, Suman; Gajbhiye, V T; Manjaiah, K M

    2017-04-01

    A carboxy methyl cellulose-nano organoclay (nano montmorillonite modified with 35-45 wt % dimethyl dialkyl (C14-C18) amine (DMDA)) composite was prepared by solution intercalation method. The prepared composite was characterized by infrared spectroscopy (FTIR), X-Ray diffraction spectroscopy (XRD) and scanning electron microscopy (SEM). The composite was utilized for its pesticide sorption efficiency for atrazine, imidacloprid and thiamethoxam. The sorption data was fitted into Langmuir and Freundlich isotherms using linear and non-linear methods. The linear regression method suggested best fitting of sorption data into Type II Langmuir and Freundlich isotherms. In order to avoid the bias resulting from linearization, seven different error parameters were also analyzed by non-linear regression method. The non-linear error analysis suggested that the sorption data fitted well into Langmuir model rather than in Freundlich model. The maximum sorption capacity, Q0 (μg/g) was given by imidacloprid (2000) followed by thiamethoxam (1667) and atrazine (1429). The study suggests that the degree of determination of linear regression alone cannot be used for comparing the best fitting of Langmuir and Freundlich models and non-linear error analysis needs to be done to avoid inaccurate results. Copyright © 2017 Elsevier Ltd. All rights reserved.
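
    To make the idea of comparing isotherms by an error measure rather than by the linearized fit alone more concrete, the sketch below fits the Langmuir model through the reciprocal linearization 1/q = 1/Q0 + (1/(b Q0))(1/C) (often labelled Type II in the sorption literature; that labelling is an assumption here) and the Freundlich model through its log-log form, then compares both by the sum of squared errors on the untransformed scale. It is a simplified stand-in, not the seven-parameter non-linear error analysis of the study, and all names and data values are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def langmuir(c, q0, b):
    """Langmuir isotherm: q = Q0*b*C / (1 + b*C)."""
    return q0 * b * c / (1.0 + b * c)


def freundlich(c, k_f, inv_n):
    """Freundlich isotherm: q = K_f * C**(1/n)."""
    return k_f * c ** inv_n


def fit_langmuir_type2(conc, sorbed):
    """Fit 1/q = 1/Q0 + (1/(b*Q0)) * (1/C) and return (Q0, b)."""
    intercept, slope = fit_line([1.0 / c for c in conc], [1.0 / q for q in sorbed])
    return 1.0 / intercept, intercept / slope


def fit_freundlich(conc, sorbed):
    """Fit log10 q = log10 K_f + (1/n) * log10 C and return (K_f, 1/n)."""
    intercept, slope = fit_line([math.log10(c) for c in conc],
                                [math.log10(q) for q in sorbed])
    return 10.0 ** intercept, slope


def sse(observed, predicted):
    """Sum of squared errors on the original (untransformed) scale."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Synthetic Langmuir-shaped data (hypothetical values: Q0 = 1000, b = 0.05).
conc = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
sorbed = [langmuir(c, 1000.0, 0.05) for c in conc]
q0_hat, b_hat = fit_langmuir_type2(conc, sorbed)
k_f_hat, inv_n_hat = fit_freundlich(conc, sorbed)
print(sse(sorbed, [langmuir(c, q0_hat, b_hat) for c in conc]))
print(sse(sorbed, [freundlich(c, k_f_hat, inv_n_hat) for c in conc]))
```

    Because the two linearizations transform the response differently, their coefficients of determination are not directly comparable, whereas an error measure computed in the original units of q is.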

  18. Small-Sample Adjustments for Tests of Moderators and Model Fit in Robust Variance Estimation in Meta-Regression

    ERIC Educational Resources Information Center

    Tipton, Elizabeth; Pustejovsky, James E.

    2015-01-01

    Randomized experiments are commonly used to evaluate the effectiveness of educational interventions. The goal of the present investigation is to develop small-sample corrections for multiple contrast hypothesis tests (i.e., F-tests) such as the omnibus test of meta-regression fit or a test for equality of three or more levels of a categorical…

  19. A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.

    2014-01-01

    A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separately from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind-off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.

  20. Using the PLUM procedure of SPSS to fit unequal variance and generalized signal detection models.

    PubMed

    DeCarlo, Lawrence T

    2003-02-01

    The recent addition of a procedure in SPSS for the analysis of ordinal regression models offers a simple means for researchers to fit the unequal variance normal signal detection model and other extended signal detection models. The present article shows how to implement the analysis and how to interpret the SPSS output. Examples of fitting the unequal variance normal model and other generalized signal detection models are given. The approach offers a convenient means for applying signal detection theory to a variety of research.

  1. An Investigation of the Fit of Linear Regression Models to Data from an SAT[R] Validity Study. Research Report 2011-3

    ERIC Educational Resources Information Center

    Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael

    2011-01-01

    This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…

  2. Temperature-viscosity models reassessed.

    PubMed

    Peleg, Micha

    2017-05-04

    The temperature effect on viscosity of liquid and semi-liquid foods has been traditionally described by the Arrhenius equation, a few other mathematical models, and more recently by the WLF and VTF (or VFT) equations. The essence of the Arrhenius equation is that the viscosity is proportional to the absolute temperature's reciprocal and governed by a single parameter, namely, the energy of activation. However, if the absolute temperature in K in the Arrhenius equation is replaced by T + b where both T and the adjustable b are in °C, the result is a two-parameter model, which has superior fit to experimental viscosity-temperature data. This modified version of the Arrhenius equation is also mathematically equal to the WLF and VTF equations, which are known to be equal to each other. Thus, despite their dissimilar appearances all three equations are essentially the same model, and when used to fit experimental temperature-viscosity data render exactly the same very high regression coefficient. It is shown that three new hybrid two-parameter mathematical models, whose formulation bears little resemblance to any of the conventional models, can also have excellent fit with r² ∼ 1. This is demonstrated by comparing the various models' regression coefficients to published viscosity-temperature relationships of 40% sucrose solution, soybean oil, and 70°Bx pear juice concentrate at different temperature ranges. Also compared are reconstructed temperature-viscosity curves using parameters calculated directly from 2 or 3 data points and fitted curves obtained by nonlinear regression using a larger number of experimental viscosity measurements.
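
    The equivalence claimed in this abstract can be checked with a short derivation. The sketch below assumes the modified Arrhenius model is written relative to a reference temperature T_ref (in °C) at which the viscosity is η_ref; the symbols c, C_1, C_2, A, B and T_0 are notation introduced here for illustration and may differ from the paper's, and natural logarithms are used throughout (the WLF equation is often written with log10, which only rescales C_1).

```latex
% Modified Arrhenius model, with T and b in degrees Celsius
\ln\frac{\eta}{\eta_{\mathrm{ref}}}
  = c\left(\frac{1}{T+b}-\frac{1}{T_{\mathrm{ref}}+b}\right)
  = -\,\frac{c\,(T-T_{\mathrm{ref}})}{(T+b)(T_{\mathrm{ref}}+b)}
  = -\,\frac{c}{T_{\mathrm{ref}}+b}\cdot
     \frac{T-T_{\mathrm{ref}}}{(T_{\mathrm{ref}}+b)+(T-T_{\mathrm{ref}})}

% WLF form, obtained by setting C_1 = c/(T_ref + b) and C_2 = T_ref + b
\ln\frac{\eta}{\eta_{\mathrm{ref}}}
  = -\,\frac{C_1\,(T-T_{\mathrm{ref}})}{C_2+(T-T_{\mathrm{ref}})}

% VTF form, obtained by setting B = c, T_0 = -b and
% A = \ln\eta_ref - c/(T_ref + b)
\ln\eta = A+\frac{B}{T-T_0}
```

    In each form only two quantities are adjustable once η_ref at T_ref is fixed, which is consistent with the abstract's statement that the three equations are the same two-parameter model.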

  3. Depression, stress, and intimate partner violence among Latino migrant and seasonal farmworkers in rural Southeastern North Carolina.

    PubMed

    Kim-Godwin, Yeoun Soo; Maume, Michael O; Fox, Jane A

    2014-12-01

    The purpose of the study is to identify the predictors of depression and intimate partner violence (IPV) among Latinos in rural Southeastern North Carolina. A sample of 291 migrant and seasonal farmworkers was interviewed to complete the demographic questionnaire, HITS (intimate violence tendency), Migrant Farmworker Stress Inventory, Center for Epidemiologic Studies Depression Scale (depression), and CAGE/4M (alcohol abuse). OLS regression and structural equation modeling were used to test the hypothesized relations between predictors of IPV and depression. The findings indicated that respondents reporting higher levels of stress also reported higher levels of IPV and depression. The goodness-of-fit statistics for the overall model again indicated a moderate fit of the model to the data (χ2 = 5,612, p < .001; root mean square error for approximation = 0.09; adjusted goodness-of-fit index = 0.44; comparative fit index = 0.52). Although the findings were not robust to estimation in the structural equation models, the OLS regression models indicated direct associations between IPV and depression.

  4. Age- and sex-dependent regression models for predicting the live weight of West African Dwarf goat from body measurements.

    PubMed

    Sowande, O S; Oyewale, B F; Iyasere, O S

    2010-06-01

    The relationships between live weight and eight body measurements of West African Dwarf (WAD) goats were studied using 211 animals under farm conditions. The animals were categorized based on age and sex. Data obtained on height at withers (HW), heart girth (HG), body length (BL), head length (HL), and length of hindquarter (LHQ) were fitted into simple linear, allometric, and multiple-regression models to predict live weight from the body measurements according to age group and sex. Results showed that live weight, HG, BL, LHQ, HL, and HW increased with the age of the animals. In the multiple-regression model, HG and HL best fit the model for goat kids; HG, HW, and HL for goats aged 13-24 months; while HG, LHQ, HW, and HL best fit the model for goats aged 25-36 months. Coefficient of determination (R²) values for linear and allometric models for predicting the live weight of WAD goats increased with age in all the body measurements, with HG being the most satisfactory single measurement in predicting the live weight of WAD goats. Sex had a significant influence on the model, with R² values consistently higher in females except in the models for LHQ and HW.

  5. Obtaining Predictions from Models Fit to Multiply Imputed Data

    ERIC Educational Resources Information Center

    Miles, Andrew

    2016-01-01

    Obtaining predictions from regression models fit to multiply imputed data can be challenging because treatments of multiple imputation seldom give clear guidance on how predictions can be calculated, and because available software often does not have built-in routines for performing the necessary calculations. This research note reviews how…

  6. A method for fitting regression splines with varying polynomial order in the linear mixed model.

    PubMed

    Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W

    2006-02-15

    The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.

  7. Multinomial logistic regression modelling of obesity and overweight among primary school students in a rural area of Negeri Sembilan

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd

    Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.

  8. Multinomial logistic regression modelling of obesity and overweight among primary school students in a rural area of Negeri Sembilan

    NASA Astrophysics Data System (ADS)

    Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam

    2015-10-01

    Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.
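
    The q-1 logit equations described in this abstract can be turned into category probabilities as in the minimal sketch below. It assumes a baseline-category parameterization in which each equation gives the log odds of one category against a reference category; the function name, the coefficient layout, and the example values are hypothetical and do not come from the study.

```python
import math


def multinomial_probabilities(coefficients, predictors):
    """Category probabilities from q-1 baseline-category logit equations.

    coefficients: list of q-1 rows, each [intercept, b1, ..., bp], giving
        log(P(category j) / P(reference category)) as a linear function.
    predictors: list [x1, ..., xp] for one individual.
    Returns q probabilities; the reference category comes first.
    """
    logits = [row[0] + sum(b * x for b, x in zip(row[1:], predictors))
              for row in coefficients]
    all_logits = [0.0] + logits          # the reference category has logit 0
    shift = max(all_logits)              # subtract the maximum for stability
    weights = [math.exp(value - shift) for value in all_logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical three-category example (reference, second, third category)
# with two predictors; the coefficients are made up for illustration.
example_coefficients = [[-1.0, 0.4, -0.2],
                        [-2.0, 0.7, -0.5]]
print(multinomial_probabilities(example_coefficients, [1.0, 8.0]))
```

    The returned probabilities always sum to one, and when all coefficients are zero every category receives probability 1/q.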

  9. Parameterizing sorption isotherms using a hybrid global-local fitting procedure.

    PubMed

    Matott, L Shawn; Singh, Anshuman; Rabideau, Alan J

    2017-05-01

    Predictive modeling of the transport and remediation of groundwater contaminants requires an accurate description of the sorption process, which is usually provided by fitting an isotherm model to site-specific laboratory data. Commonly used calibration procedures, listed in order of increasing sophistication, include: trial-and-error, linearization, non-linear regression, global search, and hybrid global-local search. Given the considerable variability in fitting procedures applied in published isotherm studies, we investigated the importance of algorithm selection through a series of numerical experiments involving 13 previously published sorption datasets. These datasets, considered representative of state-of-the-art for isotherm experiments, had been previously analyzed using trial-and-error, linearization, or non-linear regression methods. The isotherm expressions were re-fit using a 3-stage hybrid global-local search procedure (i.e. global search using particle swarm optimization followed by Powell's derivative free local search method and Gauss-Marquardt-Levenberg non-linear regression). The re-fitted expressions were then compared to previously published fits in terms of the optimized weighted sum of squared residuals (WSSR) fitness function, the final estimated parameters, and the influence on contaminant transport predictions - where easily computed concentration-dependent contaminant retardation factors served as a surrogate measure of likely transport behavior. Results suggest that many of the previously published calibrated isotherm parameter sets were local minima. In some cases, the updated hybrid global-local search yielded order-of-magnitude reductions in the fitness function. In particular, of the candidate isotherms, the Polanyi-type models were most likely to benefit from the use of the hybrid fitting procedure. In some cases, improvements in fitness function were associated with slight (<10%) changes in parameter values, but in other cases significant (>50%) changes in parameter values were noted. Despite these differences, the influence of isotherm misspecification on contaminant transport predictions was quite variable and difficult to predict from inspection of the isotherms. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Paleotemperature reconstruction from mammalian phosphate δ18O records - an alternative view on data processing

    NASA Astrophysics Data System (ADS)

    Skrzypek, Grzegorz; Sadler, Rohan; Wiśniewski, Andrzej

    2017-04-01

    The stable oxygen isotope composition of phosphates (δ18O) extracted from mammalian bone and teeth material is commonly used as a proxy for paleotemperature. Historically, several different analytical and statistical procedures for determining air paleotemperatures from the measured δ18O of phosphates have been applied. This inconsistency in both stable isotope data processing and the application of statistical procedures has led to large and unwanted differences between calculated results. This study presents the uncertainty associated with two of the most commonly used regression methods: least squares inverted fit and transposed fit. We assessed the performance of these methods by designing and applying calculation experiments to multiple real-life data sets, calculating in reverse temperatures, and comparing them with true recorded values. Our calculations clearly show that the mean absolute errors are always substantially higher for the inverted fit (a causal model), with the transposed fit (a predictive model) returning mean values closer to the measured values (Skrzypek et al. 2015). The predictive models always performed better than causal models, with 12-65% lower mean absolute errors. Moreover, the least-squares regression (LSM) model is more appropriate than Reduced Major Axis (RMA) regression for calculating the environmental water stable oxygen isotope composition from phosphate signatures, as well as for calculating air temperature from the δ18O value of environmental water. The transposed fit introduces a lower overall error than the inverted fit for both the δ18O of environmental water and Tair calculations; therefore, the predictive models are more statistically efficient than the causal models in this instance. The direct comparison of paleotemperature results from different laboratories and studies may only be achieved if a single method of calculation is applied. Reference: Skrzypek G., Sadler R., Wiśniewski A., 2016. Reassessment of recommendations for processing mammal phosphate δ18O data for paleotemperature reconstruction. Palaeogeography, Palaeoclimatology, Palaeoecology 446, 162-167.
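
    The difference between the two regression approaches compared in this abstract can be sketched as follows. The sketch assumes that "inverted fit" means regressing δ18O on temperature and then solving the fitted line for temperature, and that "transposed fit" means regressing temperature directly on δ18O; the function names and the data values are hypothetical and are not taken from the study.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def inverted_fit_predict(temps, deltas, new_delta):
    """Regress delta on temperature, then invert the line to get temperature."""
    intercept, slope = fit_line(temps, deltas)
    return (new_delta - intercept) / slope


def transposed_fit_predict(temps, deltas, new_delta):
    """Regress temperature directly on delta and predict temperature."""
    intercept, slope = fit_line(deltas, temps)
    return intercept + slope * new_delta


# Hypothetical calibration data: temperature (deg C) and a scattered
# isotope signal (per mil). The values are made up for illustration.
temps = [0.0, 5.0, 10.0, 15.0, 20.0]
deltas = [-12.3, -9.1, -7.4, -4.2, -2.0]
print(inverted_fit_predict(temps, deltas, -2.0))
print(transposed_fit_predict(temps, deltas, -2.0))
```

    Both lines pass through the point of means, so the two predictions coincide there and for perfectly linear data; with scattered data they diverge as the new value moves away from the mean.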

  11. Association of Fitness With Incident Dyslipidemias Over 25 Years in the Coronary Artery Risk Development in Young Adults Study.

    PubMed

    Sarzynski, Mark A; Schuna, John M; Carnethon, Mercedes R; Jacobs, David R; Lewis, Cora E; Quesenberry, Charles P; Sidney, Stephen; Schreiner, Pamela J; Sternfeld, Barbara

    2015-11-01

    Few studies have examined the longitudinal associations of fitness or changes in fitness on the risk of developing dyslipidemias. This study examined the associations of (1) baseline fitness with 25-year dyslipidemia incidence and (2) 20-year fitness change on dyslipidemia development in middle age in the Coronary Artery Risk Development in Young Adults Study (CARDIA). Multivariable Cox proportional hazards regression models were used to test the association of baseline fitness (1985-1986) with dyslipidemia incidence over 25 years (2010-2011) in CARDIA (N=4,898). Modified Poisson regression models were used to examine the association of 20-year change in fitness with dyslipidemia incidence between Years 20 and 25 (n=2,487). Data were analyzed in June 2014 and February 2015. In adjusted models, the risk of incident low high-density lipoprotein cholesterol (HDL-C); high triglycerides; and high low-density lipoprotein cholesterol (LDL-C) was significantly lower, by 9%, 16%, and 14%, respectively, for each 2.0-minute increase in baseline treadmill endurance. After additional adjustment for baseline trait level, the associations remained significant for incident high triglycerides and high LDL-C in the total population and for incident high triglycerides in both men and women. In race-stratified models, these associations appeared to be limited to whites. In adjusted models, change in fitness did not predict 5-year incidence of dyslipidemias, whereas baseline fitness significantly predicted 5-year incidence of high triglycerides. Our findings demonstrate the importance of cardiorespiratory fitness in young adulthood as a risk factor for developing dyslipidemias, particularly high triglycerides, during the transition to middle age. Copyright © 2015 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.

  12. Association of Fitness With Incident Dyslipidemias Over 25 Years in the Coronary Artery Risk Development in Young Adults Study

    PubMed Central

    Sarzynski, Mark A.; Schuna, John M.; Carnethon, Mercedes R.; Jacobs, David R.; Lewis, Cora E.; Quesenberry, Charles P.; Sidney, Stephen; Schreiner, Pamela J.; Sternfeld, Barbara

    2015-01-01

    Introduction: Few studies have examined the longitudinal associations of fitness or changes in fitness on the risk of developing dyslipidemias. This study examined the associations of: (1) baseline fitness with 25-year dyslipidemia incidence; and (2) 20-year fitness change on dyslipidemia development in middle age in the Coronary Artery Risk Development in Young Adults (CARDIA) study. Methods: Multivariable Cox proportional hazards regression models were used to test the association of baseline fitness (1985–1986) with dyslipidemia incidence over 25 years (2010–2011) in CARDIA (N=4,898). Modified Poisson regression models were used to examine the association of 20-year change in fitness with dyslipidemia incidence between Years 20 and 25 (n=2,487). Data were analyzed in June 2014 and February 2015. Results: In adjusted models, the risk of incident low high-density lipoprotein cholesterol (HDL-C), high triglycerides, and high low-density lipoprotein cholesterol (LDL-C) was significantly lower, by 9%, 16%, and 14%, respectively, for each 2.0-minute increase in baseline treadmill endurance. After additional adjustment for baseline trait level, the associations remained significant for incident high triglycerides and high LDL-C in the total population and for incident high triglycerides in both men and women. In race-stratified models, these associations appeared to be limited to whites. In adjusted models, change in fitness did not predict 5-year incidence of dyslipidemias, whereas baseline fitness significantly predicted 5-year incidence of high triglycerides. Conclusions: Our findings demonstrate the importance of cardiorespiratory fitness in young adulthood as a risk factor for developing dyslipidemias, particularly high triglycerides, during the transition to middle age. PMID: 26165197

  13. A Modified LS+AR Model to Improve the Accuracy of the Short-term Polar Motion Prediction

    NASA Astrophysics Data System (ADS)

    Wang, Z. W.; Wang, Q. X.; Ding, Y. Q.; Zhang, J. J.; Liu, S. S.

    2017-03-01

    The LS (Least Squares)+AR (AutoRegressive) model has two problems in polar motion forecasting: the inner residual of the LS fitting is reasonable, but the residual of the LS extrapolation is poor; and the LS fitting residual sequence is non-linear. It is therefore unsuitable to establish an AR model for the residual sequence to be forecast based on the residual sequence before the forecast epoch. In this paper, we address these two problems in two steps. First, restrictions are added to the two endpoints of the LS fitting data to fix them on the LS fitting curve, so that the fitting values next to the two endpoints are very close to the observation values. Second, we select the interpolation residual sequence of an inward LS fitting curve, which has a similar variation trend to the LS extrapolation residual sequence, as the modeling object of AR for the residual forecast. Calculation examples show that this solution can effectively improve the short-term polar motion prediction accuracy of the LS+AR model. In addition, comparisons with the forecast models RLS (Robustified Least Squares)+AR, RLS+ARIMA (AutoRegressive Integrated Moving Average), and LS+ANN (Artificial Neural Network) confirm the feasibility and effectiveness of the solution for polar motion forecasting. The results, especially for forecasts of 1-10 days, show that the forecast accuracy of the proposed model reaches an internationally competitive level.

  14. Forecasting the probability of future groundwater levels declining below specified low thresholds in the conterminous U.S.

    USGS Publications Warehouse

    Dudley, Robert W.; Hodgkins, Glenn A.; Dickinson, Jesse

    2017-01-01

    We present a logistic regression approach for forecasting the probability of future groundwater levels declining or maintaining below specific groundwater-level thresholds. We tested our approach on 102 groundwater wells in different climatic regions and aquifers of the United States that are part of the U.S. Geological Survey Groundwater Climate Response Network. We evaluated the importance of current groundwater levels, precipitation, streamflow, seasonal variability, Palmer Drought Severity Index, and atmosphere/ocean indices for developing the logistic regression equations. Several diagnostics of model fit were used to evaluate the regression equations, including testing of autocorrelation of residuals, goodness-of-fit metrics, and bootstrap validation testing. The probabilistic predictions were most successful at wells with high persistence (low month-to-month variability) in their groundwater records and at wells where the groundwater level remained below the defined low threshold for sustained periods (generally three months or longer). The model fit was weakest at wells with strong seasonal variability in levels and with shorter duration low-threshold events. We identified challenges in deriving probabilistic-forecasting models and possible approaches for addressing those challenges.

  15. Regression modeling and prediction of road sweeping brush load characteristics from finite element analysis and experimental results.

    PubMed

    Wang, Chong; Sun, Qun; Wahab, Magd Abdel; Zhang, Xingyu; Xu, Limin

    2015-09-01

    Rotary cup brushes mounted on each side of a road sweeper undertake heavy debris removal tasks, but their characteristics were not well known until recently. A Finite Element (FE) model that can analyze brush deformation and predict brush characteristics has been developed to investigate the sweeping efficiency and to assist the controller design. However, the FE model requires a large amount of CPU time to simulate each brush design and operating scenario, which may limit its application in a real-time system. This study develops a mathematical regression model to summarize the FE modeled results. The complex brush load characteristic curves were statistically analyzed to quantify the effects of cross-section, length, mounting angle, displacement, rotational speed, etc. The data were then fitted by a multiple variable regression model using the maximum likelihood method. The fitted results showed good agreement with the FE analysis results and experimental results, suggesting that the mathematical regression model may be directly used in a real-time system to predict characteristics of different brushes under varying operating conditions. The methodology may also be used in the design and optimization of rotary brush tools. Copyright © 2015 Elsevier Ltd. All rights reserved.

  16. Isolating the cow-specific part of residual energy intake in lactating dairy cows using random regressions.

    PubMed

    Fischer, A; Friggens, N C; Berry, D P; Faverdin, P

    2018-07-01

    The ability to properly assess and accurately phenotype true differences in feed efficiency among dairy cows is key to the development of breeding programs for improving feed efficiency. The variability among individuals in feed efficiency is commonly characterised by the residual intake approach. Residual feed intake is represented by the residuals of a linear regression of intake on the corresponding quantities of the biological functions that consume (or release) energy. However, the residuals include both model fitting and measurement errors, as well as any variability in cow efficiency. The objective of this study was to isolate the individual animal variability in feed efficiency from the residual component. Two separate models were fitted. In one, the standard residual energy intake (REI) was calculated as the residual of a multiple linear regression of lactation average net energy intake (NEI) on lactation average milk energy output, average metabolic BW, as well as lactation loss and gain of body condition score. In the other, a linear mixed model was used to simultaneously fit fixed linear regressions and random cow levels on the biological traits and intercept using fortnight repeated measures for the variables. This method split the predicted NEI in two parts: one quantifying the population mean intercept and coefficients, and one quantifying cow-specific deviations in the intercept and coefficients. The cow-specific part of predicted NEI was assumed to isolate true differences in feed efficiency among cows. NEI and associated energy expenditure phenotypes were available for the first 17 fortnights of lactation from 119 Holstein cows, all fed a constant energy-rich diet. Mixed models fitting cow-specific intercept and coefficients to different combinations of the aforementioned energy expenditure traits, calculated on a fortnightly basis, were compared. The variance of REI estimated with the lactation average model represented only 8% of the variance of measured NEI. Among all compared mixed models, the variance of the cow-specific part of predicted NEI represented between 53% and 59% of the variance of REI estimated from the lactation average model or between 4% and 5% of the variance of measured NEI. The remaining 41% to 47% of the variance of REI estimated with the lactation average model may therefore reflect model fitting errors or measurement errors. In conclusion, the use of a mixed model framework with cow-specific random regressions seems to be a promising method to isolate the cow-specific component of REI in dairy cows.

  17. Testing goodness of fit in regression: a general approach for specified alternatives.

    PubMed

    Solari, Aldo; le Cessie, Saskia; Goeman, Jelle J

    2012-12-10

    When fitting generalized linear models or the Cox proportional hazards model, it is important to have tools to test for lack of fit. Because lack of fit comes in all shapes and sizes, distinguishing among different types of lack of fit is of practical importance. We argue that an adequate diagnosis of lack of fit requires a specified alternative model. Such specification identifies the type of lack of fit the test is directed against so that if we reject the null hypothesis, we know the direction of the departure from the model. The goodness-of-fit approach of this paper allows different types of lack of fit to be treated within a unified general framework and many existing tests to be considered as special cases. Connections with penalized likelihood and random effects are discussed, and the application of the proposed approach is illustrated with medical examples. Tailored functions for goodness-of-fit testing have been implemented in the R package globaltest. Copyright © 2012 John Wiley & Sons, Ltd.

  18. Changes in Clavicle Length and Maturation in Americans: 1840-1980.

    PubMed

    Langley, Natalie R; Cridlin, Sandra

    2016-01-01

    Secular changes refer to short-term biological changes ostensibly due to environmental factors. Two well-documented secular trends in many populations are earlier age of menarche and increasing stature. This study synthesizes data on maximum clavicle length and fusion of the medial epiphysis in 1840-1980 American birth cohorts to provide a comprehensive assessment of developmental and morphological change in the clavicle. Clavicles from the Hamann-Todd Human Osteological Collection (n = 354), McKern and Stewart Korean War males (n = 341), Forensic Anthropology Data Bank (n = 1,239), and the McCormick Clavicle Collection (n = 1,137) were used in the analysis. Transition analysis was used to evaluate fusion of the medial epiphysis (scored as unfused, fusing, or fused). Several statistical treatments were used to assess fluctuations in maximum clavicle length. First, Durbin-Watson tests were used to evaluate autocorrelation, and a local regression (LOESS) was used to identify visual shifts in the regression slope. Next, piecewise regression was used to fit linear regression models before and after the estimated breakpoints. Multiple starting parameters were tested in the range determined to contain the breakpoint, and the model with the smallest mean squared error was chosen as the best fit. The parameters from the best-fit models were then used to derive the piecewise models, which were compared with the initial simple linear regression models to determine which model provided the best fit for the secular change data. The epiphyseal union data indicate a decline in the age at onset of fusion since the early twentieth century. Fusion commences approximately four years earlier in mid- to late twentieth-century birth cohorts than in late nineteenth- and early twentieth-century birth cohorts. However, fusion is completed at roughly the same age across cohorts. 
The most significant decline in age at onset of epiphyseal union appears to have occurred since the mid-twentieth century. LOESS plots show a breakpoint in the clavicle length data around the mid-twentieth century in both sexes, and piecewise regression models indicate a significant decrease in clavicle length in the American population after 1940. The piecewise model provides a slightly better fit than the simple linear model. Since the model standard error is not substantially different from the piecewise model, an argument could be made to select the less complex linear model. However, we chose the piecewise model to detect changes in clavicle length that are overfitted with a linear model. The decrease in maximum clavicle length is in line with a documented narrowing of the American skeletal form, as shown by analyses of cranial and facial breadth and bi-iliac breadth of the pelvis. Environmental influences on skeletal form include increases in body mass index, health improvements, improved socioeconomic status, and elimination of infectious diseases. Secular changes in bony dimensions and skeletal maturation stipulate that medical and forensic standards used to deduce information about growth, health, and biological traits must be derived from modern populations.
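
    The breakpoint search described in this record (fit separate lines before and after each candidate breakpoint, keep the candidate with the smallest mean squared error) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the data are synthetic, the function names are hypothetical, and the two segments are fitted independently rather than constrained to join.

```python
# Illustrative sketch only: synthetic data and hypothetical helper names.

def ols_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def sse(xs, ys, intercept, slope):
    """Sum of squared residuals of a fitted line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def best_breakpoint(xs, ys, candidates, min_points=3):
    """Try each candidate breakpoint, fit one line on each side, and
    return (breakpoint, mse) for the candidate with the smallest MSE."""
    best = None
    for bp in candidates:
        left = [(x, y) for x, y in zip(xs, ys) if x <= bp]
        right = [(x, y) for x, y in zip(xs, ys) if x > bp]
        if len(left) < min_points or len(right) < min_points:
            continue  # too few points to fit a line on one side
        total = 0.0
        for segment in (left, right):
            sx, sy = zip(*segment)
            intercept, slope = ols_line(sx, sy)
            total += sse(sx, sy, intercept, slope)
        mse = total / len(xs)
        if best is None or mse < best[1]:
            best = (bp, mse)
    return best


# Synthetic series (assumption): slow increase up to 1940, decline afterwards.
xs = list(range(1900, 1985, 5))
ys = [150.0 + 0.02 * (x - 1900) if x <= 1940 else 151.0 - 0.05 * (x - 1940)
      for x in xs]

breakpoint_year, mse = best_breakpoint(xs, ys, range(1920, 1965, 5))
print(breakpoint_year, mse)
```

    Comparing the selected piecewise fit with a single straight line fitted to all points then shows whether the extra parameters are justified.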

  19. The extended Lennard-Jones potential energy function: A simpler model for direct-potential-fit analysis

    NASA Astrophysics Data System (ADS)

    Hajigeorgiou, Photos G.

    2016-12-01

    An analytical model for the diatomic potential energy function that was recently tested as a universal function (Hajigeorgiou, 2010) has been further modified and tested as a suitable model for direct-potential-fit analysis. Applications are presented for the ground electronic states of three diatomic molecules: oxygen, carbon monoxide, and hydrogen fluoride. The adjustable parameters of the extended Lennard-Jones potential model are determined through nonlinear regression by fits to calculated rovibrational energy term values or experimental spectroscopic line positions. The model is shown to lead to reliable, compact and simple representations for the potential energy functions of these systems and could therefore be classified as a suitable and attractive model for direct-potential-fit analysis.

  20. A comparative study between nonlinear regression and artificial neural network approaches for modelling wild oat (Avena fatua) field emergence

    USDA-ARS's Scientific Manuscript database

    Non-linear regression techniques are used widely to fit weed field emergence patterns to soil microclimatic indices using S-type functions. Artificial neural networks present interesting and alternative features for such modeling purposes. In this work, a univariate hydrothermal-time based Weibull m...

  1. Stock price forecasting for companies listed on Tehran stock exchange using multivariate adaptive regression splines model and semi-parametric splines technique

    NASA Astrophysics Data System (ADS)

    Rounaghi, Mohammad Mahdi; Abbaszadeh, Mohammad Reza; Arashi, Mohammad

    2015-11-01

    One of the most important topics of interest to investors is stock price changes. Investors whose goals are long term are sensitive to stock price and its changes and react to them. In this regard, we used the multivariate adaptive regression splines (MARS) model and the semi-parametric splines technique for predicting stock price in this study. The MARS model, as a nonparametric method, is an adaptive method for regression that is suited to high-dimensional problems with several variables. The semi-parametric splines technique was also used in this study. Smoothing splines is a nonparametric regression method. In this study, we used 40 variables (30 accounting variables and 10 economic variables) for predicting stock price using the MARS model and the semi-parametric splines technique. After investigating the models, we selected 4 accounting variables (book value per share, predicted earnings per share, P/E ratio and risk) as influencing variables on predicting stock price using the MARS model. After fitting the semi-parametric splines technique, only 4 accounting variables (dividends, net EPS, EPS Forecast and P/E Ratio) were selected as variables effective in forecasting stock prices.

  2. Regularization Paths for Conditional Logistic Regression: The clogitL1 Package.

    PubMed

    Reid, Stephen; Tibshirani, Rob

    2014-07-01

    We apply the cyclic coordinate descent algorithm of Friedman, Hastie, and Tibshirani (2010) to the fitting of a conditional logistic regression model with lasso (ℓ1) and elastic net penalties. The sequential strong rules of Tibshirani, Bien, Hastie, Friedman, Taylor, Simon, and Tibshirani (2012) are also used in the algorithm and it is shown that these offer a considerable speed up over the standard coordinate descent algorithm with warm starts. Once implemented, the algorithm is used in simulation studies to compare the variable selection and prediction performance of the conditional logistic regression model against that of its unconditional (standard) counterpart. We find that the conditional model performs admirably on datasets drawn from a suitable conditional distribution, outperforming its unconditional counterpart at variable selection. The conditional model is also fit to a small real world dataset, demonstrating how we obtain regularization paths for the parameters of the model and how we apply cross validation for this method where natural unconditional prediction rules are hard to come by.

  3. Regularization Paths for Conditional Logistic Regression: The clogitL1 Package

    PubMed Central

    Reid, Stephen; Tibshirani, Rob

    2014-01-01

    We apply the cyclic coordinate descent algorithm of Friedman, Hastie, and Tibshirani (2010) to the fitting of a conditional logistic regression model with lasso (ℓ1) and elastic net penalties. The sequential strong rules of Tibshirani, Bien, Hastie, Friedman, Taylor, Simon, and Tibshirani (2012) are also used in the algorithm and it is shown that these offer a considerable speed up over the standard coordinate descent algorithm with warm starts. Once implemented, the algorithm is used in simulation studies to compare the variable selection and prediction performance of the conditional logistic regression model against that of its unconditional (standard) counterpart. We find that the conditional model performs admirably on datasets drawn from a suitable conditional distribution, outperforming its unconditional counterpart at variable selection. The conditional model is also fit to a small real world dataset, demonstrating how we obtain regularization paths for the parameters of the model and how we apply cross validation for this method where natural unconditional prediction rules are hard to come by. PMID:26257587
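
    To make the idea of cyclic coordinate descent with warm starts concrete, the sketch below applies it to the simplest case, a lasso-penalized least-squares problem, minimizing (1/2n)·||y − Xβ||² + λ·||β||₁. It is an assumption-laden illustration and not the clogitL1 implementation: it uses a squared-error loss instead of the conditional logistic likelihood, omits the elastic net term and the strong rules, and all function names and data are hypothetical.

```python
# Illustrative sketch only: lasso for least squares, not conditional logistic
# regression; names and data are hypothetical.

def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) used in each coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, beta=None, n_sweeps=200):
    """Cyclic coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1.
    `beta` is an optional warm start (e.g. the solution at a larger lam)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p if beta is None else list(beta)
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    for _ in range(n_sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the partial residual
            # (the residual with coordinate j's own contribution added back).
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new_value = soft_threshold(rho, lam) / norm
            delta = new_value - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new_value
    return beta


# Small orthogonal design (assumption) so the solution is easy to check by hand.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]

# Regularization path: decreasing penalties, each fit warm-started from the last.
beta = None
for lam in (1.0, 0.5, 0.1, 0.0):
    beta = lasso_cd(X, y, lam, beta)
    print(lam, beta)
```

    Warm starts matter because neighbouring penalty values usually have similar solutions, so each fit along the path begins close to its optimum.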

  4. How Good Are Statistical Models at Approximating Complex Fitness Landscapes?

    PubMed Central

    du Plessis, Louis; Leventhal, Gabriel E.; Bonhoeffer, Sebastian

    2016-01-01

    Fitness landscapes determine the course of adaptation by constraining and shaping evolutionary trajectories. Knowledge of the structure of a fitness landscape can thus predict evolutionary outcomes. Empirical fitness landscapes, however, have so far only offered limited insight into real-world questions, as the high dimensionality of sequence spaces makes it impossible to exhaustively measure the fitness of all variants of biologically meaningful sequences. We must therefore revert to statistical descriptions of fitness landscapes that are based on a sparse sample of fitness measurements. It remains unclear, however, how much data are required for such statistical descriptions to be useful. Here, we assess the ability of regression models accounting for single and pairwise mutations to correctly approximate a complex quasi-empirical fitness landscape. We compare approximations based on various sampling regimes of an RNA landscape and find that the sampling regime strongly influences the quality of the regression. On the one hand it is generally impossible to generate sufficient samples to achieve a good approximation of the complete fitness landscape, and on the other hand systematic sampling schemes can only provide a good description of the immediate neighborhood of a sequence of interest. Nevertheless, we obtain a remarkably good and unbiased fit to the local landscape when using sequences from a population that has evolved under strong selection. Thus, current statistical methods can provide a good approximation to the landscape of naturally evolving populations. PMID:27189564

  5. Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.

    PubMed

    Mi, Gu; Di, Yanming; Schafer, Daniel W

    2015-01-01

    This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.

  6. Item Response Theory Modeling of the Philadelphia Naming Test.

    PubMed

    Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D

    2015-06-01

    In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating explanatory variables to item difficulty. This article describes the statistical model underlying the computer adaptive PNT presented in a companion article (Hula, Kellough, & Fergadiotis, 2015). Using archival data, we evaluated the fit of the PNT to 1- and 2-parameter logistic models and examined the precision of the resulting parameter estimates. We regressed the item difficulty estimates on three predictor variables: word length, age of acquisition, and contextual diversity. The 2-parameter logistic model demonstrated marginally better fit, but the fit of the 1-parameter logistic model was adequate. Precision was excellent for both person ability and item difficulty estimates. Word length, age of acquisition, and contextual diversity all independently contributed to variance in item difficulty. Item-response-theory methods can be productively used to analyze and quantify anomia severity in aphasia. Regression of item difficulty on lexical variables supported the validity of the PNT and interpretation of anomia severity scores in the context of current word-finding models.
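
    For readers unfamiliar with the 1- and 2-parameter logistic models compared in this record, the item response function can be written in a few lines. The sketch below is generic and uses hypothetical parameter values, not estimates from the PNT: theta is person ability, b is item difficulty and a is item discrimination, with the 1-parameter model obtained by holding a at a common value (here 1.0).

```python
import math

# Illustrative sketch only: parameter values are hypothetical.

def p_correct(theta, b, a=1.0):
    """Probability of a correct response under a logistic IRT model.
    With a fixed at a common value this is the 1-parameter model;
    letting a vary by item gives the 2-parameter model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


ability = 0.5
easy_item = p_correct(ability, b=-1.0)          # difficulty below ability
hard_item = p_correct(ability, b=1.5)           # difficulty above ability
sharp_item = p_correct(ability, b=0.0, a=2.0)   # more discriminating item
print(easy_item, hard_item, sharp_item)
```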

  7. RBF kernel based support vector regression to estimate the blood volume and heart rate responses during hemodialysis.

    PubMed

    Javed, Faizan; Chan, Gregory S H; Savkin, Andrey V; Middleton, Paul M; Malouf, Philip; Steel, Elizabeth; Mackie, James; Lovell, Nigel H

    2009-01-01

    This paper uses non-linear support vector regression (SVR) to model the blood volume and heart rate (HR) responses in 9 hemodynamically stable kidney failure patients during hemodialysis. Using radial basis function (RBF) kernels, non-parametric models of relative blood volume (RBV) change with time as well as percentage change in HR with respect to RBV were obtained. The ε-insensitive loss function was used for SVR modeling. Selection of the design parameters, which include capacity (C), insensitivity region (ε) and the RBF kernel parameter (sigma), was made based on a grid search approach, and the selected models were cross-validated using the average mean square error (AMSE) calculated from testing data based on a k-fold cross-validation technique. Linear regression was also applied to fit the curves and the AMSE was calculated for comparison with SVR. For the model based on RBV with time, SVR gave a lower AMSE for both training (AMSE=1.5) as well as testing data (AMSE=1.4) compared to linear regression (AMSE=1.8 and 1.5). SVR also provided a better fit for HR with RBV for both training as well as testing data (AMSE=15.8 and 16.4) compared to linear regression (AMSE=25.2 and 20.1).

  8. Smooth Scalar-on-Image Regression via Spatial Bayesian Variable Selection

    PubMed Central

    Goldsmith, Jeff; Huang, Lei; Crainiceanu, Ciprian M.

    2013-01-01

    We develop scalar-on-image regression models when images are registered multidimensional manifolds. We propose a fast and scalable Bayes inferential procedure to estimate the image coefficient. The central idea is the combination of an Ising prior distribution, which controls a latent binary indicator map, and an intrinsic Gaussian Markov random field, which controls the smoothness of the nonzero coefficients. The model is fit using a single-site Gibbs sampler, which allows fitting within minutes for hundreds of subjects with predictor images containing thousands of locations. The code is simple and is provided in less than one page in the Appendix. We apply this method to a neuroimaging study where cognitive outcomes are regressed on measures of white matter microstructure at every voxel of the corpus callosum for hundreds of subjects. PMID:24729670

  9. Modeling Group Differences in OLS and Orthogonal Regression: Implications for Differential Validity Studies

    ERIC Educational Resources Information Center

    Kane, Michael T.; Mroch, Andrew A.

    2010-01-01

    In evaluating the relationship between two measures across different groups (i.e., in evaluating "differential validity") it is necessary to examine differences in correlation coefficients and in regression lines. Ordinary least squares (OLS) regression is the standard method for fitting lines to data, but its criterion for optimal fit…
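
    The contrast between the two fitting criteria can be illustrated numerically. OLS minimizes vertical distances to the line, whereas orthogonal regression minimizes perpendicular distances, which treats both measures as error-prone. The sketch below is a generic illustration with made-up data and hypothetical function names, assuming equal error variances in the two measures and a nonzero sample covariance.

```python
import math

# Illustrative sketch only: made-up data, hypothetical function names.

def moments(xs, ys):
    """Return centred sums of squares and cross-products (sxx, syy, sxy)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def ols_slope(xs, ys):
    """Slope minimizing squared vertical distances."""
    sxx, _, sxy = moments(xs, ys)
    return sxy / sxx


def orthogonal_slope(xs, ys):
    """Slope minimizing squared perpendicular distances (requires sxy != 0)."""
    sxx, syy, sxy = moments(xs, ys)
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]
print(ols_slope(xs, ys), orthogonal_slope(xs, ys))
```

    With noisy data the OLS slope is pulled toward zero relative to the orthogonal slope, which is why group comparisons can differ depending on which line is used.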

  10. Modeling of boldine alkaloid adsorption onto pure and propyl-sulfonic acid-modified mesoporous silicas. A comparative study.

    PubMed

    Geszke-Moritz, Małgorzata; Moritz, Michał

    2016-12-01

    The present study deals with the adsorption of boldine onto pure and propyl-sulfonic acid-functionalized SBA-15, SBA-16 and mesocellular foam (MCF) materials. Siliceous adsorbents were characterized by nitrogen sorption analysis, transmission electron microscopy (TEM), scanning electron microscopy (SEM), Fourier-transform infrared (FT-IR) spectroscopy and thermogravimetric analysis. The equilibrium adsorption data were analyzed using the Langmuir, Freundlich, Redlich-Peterson, and Temkin isotherms. Moreover, the Dubinin-Radushkevich and Dubinin-Astakhov isotherm models based on the Polanyi adsorption potential were employed. The latter was calculated using two alternative formulas including solubility-normalized (S-model) and empirical C-model. In order to find the best-fit isotherm, both linear regression and nonlinear fitting analysis were carried out. The Dubinin-Astakhov (S-model) isotherm revealed the best fit to the experimental points for adsorption of boldine onto pure mesoporous materials using both linear and nonlinear fitting analysis. Meanwhile, the process of boldine sorption onto modified silicas was described the best by the Langmuir and Temkin isotherms using linear regression and nonlinear fitting analysis, respectively. The values of adsorption energy (below 8kJ/mol) indicate the physical nature of boldine adsorption onto unmodified silicas whereas the ionic interactions seem to be the main force of alkaloid adsorption onto functionalized sorbents (energy of adsorption above 8kJ/mol). Copyright © 2016 Elsevier B.V. All rights reserved.
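
    As an illustration of the linear-regression route to isotherm fitting mentioned in this record, the Langmuir isotherm qe = qmax·KL·Ce / (1 + KL·Ce) can be rearranged to Ce/qe = Ce/qmax + 1/(qmax·KL), so that a straight-line fit of Ce/qe against Ce yields both parameters. The sketch below uses synthetic, noise-free data and hypothetical names; it does not reproduce the study's measurements, and it covers only one of several common linearizations.

```python
# Illustrative sketch only: synthetic data, hypothetical function names.

def langmuir(ce, q_max, k_l):
    """Equilibrium uptake qe predicted by the Langmuir isotherm."""
    return q_max * k_l * ce / (1.0 + k_l * ce)


def fit_langmuir_linearized(ce_values, qe_values):
    """Regress Ce/qe on Ce; slope = 1/q_max, intercept = 1/(q_max * K_L)."""
    ratios = [ce / qe for ce, qe in zip(ce_values, qe_values)]
    n = len(ce_values)
    mx = sum(ce_values) / n
    my = sum(ratios) / n
    sxx = sum((x - mx) ** 2 for x in ce_values)
    sxy = sum((x - mx) * (r - my) for x, r in zip(ce_values, ratios))
    slope = sxy / sxx
    intercept = my - slope * mx
    q_max = 1.0 / slope
    k_l = slope / intercept
    return q_max, k_l


# Synthetic equilibrium data generated from assumed parameters.
ce_values = [1.0, 2.0, 5.0, 10.0, 20.0]
qe_values = [langmuir(ce, q_max=100.0, k_l=0.5) for ce in ce_values]

print(fit_langmuir_linearized(ce_values, qe_values))
```

    On real, noisy data the linearized and nonlinear fits can rank isotherms differently, because the transformation changes how measurement errors are weighted.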

  11. Tests of a habitat suitability model for black-capped chickadees

    USGS Publications Warehouse

    Schroeder, Richard L.

    1990-01-01

    The black-capped chickadee (Parus atricapillus) Habitat Suitability Index (HSI) model provides a quantitative rating of the capability of a habitat to support breeding, based on measures related to food and nest site availability. The model assumption that tree canopy volume can be predicted from measures of tree height and canopy closure was tested using data from foliage volume studies conducted in the riparian cottonwood habitat along the South Platte River in Colorado. Least absolute deviations (LAD) regression showed that canopy cover and overstory tree height yielded volume predictions significantly lower than volume estimated by more direct methods. Revisions to these model relations resulted in improved predictions of foliage volume. The relation between the HSI and estimates of black-capped chickadee population densities was examined using LAD regression for both the original model and the model with the foliage volume revisions. Residuals from these models were compared to residuals from both a zero slope model and an ideal model. The fit model for the original HSI differed significantly from the ideal model, whereas the fit model for the revised HSI did not differ significantly from the ideal model. However, both the fit model for the original HSI and the fit model for the revised HSI did not differ significantly from a model with a zero slope. Although further testing of the revised model is needed, its use is recommended for more realistic estimates of tree canopy volume and habitat suitability.

  12. Predicting location of recurrence using FDG, FLT, and Cu-ATSM PET in canine sinonasal tumors treated with radiotherapy

    NASA Astrophysics Data System (ADS)

    Bradshaw, Tyler; Fu, Rau; Bowen, Stephen; Zhu, Jun; Forrest, Lisa; Jeraj, Robert

    2015-07-01

    Dose painting relies on the ability of functional imaging to identify resistant tumor subvolumes to be targeted for additional boosting. This work assessed the ability of FDG, FLT, and Cu-ATSM PET imaging to predict the locations of residual FDG PET in canine tumors following radiotherapy. Nineteen canines with spontaneous sinonasal tumors underwent PET/CT imaging with radiotracers FDG, FLT, and Cu-ATSM prior to hypofractionated radiotherapy. Therapy consisted of 10 fractions of 4.2 Gy to the sinonasal cavity with or without an integrated boost of 0.8 Gy to the GTV. Patients had an additional FLT PET/CT scan after fraction 2, a Cu-ATSM PET/CT scan after fraction 3, and follow-up FDG PET/CT scans after radiotherapy. Following image registration, simple and multiple linear and logistic voxel regressions were performed to assess how well pre- and mid-treatment PET imaging predicted post-treatment FDG uptake. R² and pseudo R² were used to assess the goodness of fits. For simple linear regression models, regression coefficients for all pre- and mid-treatment PET images were significantly positive across the population (P < 0.05). However, there was large variability among patients in goodness of fits: R² ranged from 0.00 to 0.85, with a median of 0.12. Results for logistic regression models were similar. Multiple linear regression models resulted in better fits (median R² = 0.31), but there was still large variability between patients in R². The R² from regression models for different predictor variables were highly correlated across patients (R ≈ 0.8), indicating tumors that were poorly predicted with one tracer were also poorly predicted by other tracers. In conclusion, the high inter-patient variability in goodness of fits indicates that PET was able to predict locations of residual tumor in some patients, but not others. This suggests not all patients would be good candidates for dose painting based on a single biological target.

  13. Predicting location of recurrence using FDG, FLT, and Cu-ATSM PET in canine sinonasal tumors treated with radiotherapy.

    PubMed

    Bradshaw, Tyler; Fu, Rau; Bowen, Stephen; Zhu, Jun; Forrest, Lisa; Jeraj, Robert

    2015-07-07

    Dose painting relies on the ability of functional imaging to identify resistant tumor subvolumes to be targeted for additional boosting. This work assessed the ability of FDG, FLT, and Cu-ATSM PET imaging to predict the locations of residual FDG PET in canine tumors following radiotherapy. Nineteen canines with spontaneous sinonasal tumors underwent PET/CT imaging with radiotracers FDG, FLT, and Cu-ATSM prior to hypofractionated radiotherapy. Therapy consisted of 10 fractions of 4.2 Gy to the sinonasal cavity with or without an integrated boost of 0.8 Gy to the GTV. Patients had an additional FLT PET/CT scan after fraction 2, a Cu-ATSM PET/CT scan after fraction 3, and follow-up FDG PET/CT scans after radiotherapy. Following image registration, simple and multiple linear and logistic voxel regressions were performed to assess how well pre- and mid-treatment PET imaging predicted post-treatment FDG uptake. R² and pseudo R² were used to assess the goodness of fits. For simple linear regression models, regression coefficients for all pre- and mid-treatment PET images were significantly positive across the population (P < 0.05). However, there was large variability among patients in goodness of fits: R² ranged from 0.00 to 0.85, with a median of 0.12. Results for logistic regression models were similar. Multiple linear regression models resulted in better fits (median R² = 0.31), but there was still large variability between patients in R². The R² from regression models for different predictor variables were highly correlated across patients (R ≈ 0.8), indicating tumors that were poorly predicted with one tracer were also poorly predicted by other tracers. In conclusion, the high inter-patient variability in goodness of fits indicates that PET was able to predict locations of residual tumor in some patients, but not others. This suggests not all patients would be good candidates for dose painting based on a single biological target.

  14. A quasi-Monte-Carlo comparison of parametric and semiparametric regression methods for heavy-tailed and non-normal data: an application to healthcare costs.

    PubMed

    Jones, Andrew M; Lomas, James; Moore, Peter T; Rice, Nigel

    2016-10-01

    We conduct a quasi-Monte-Carlo comparison of the recent developments in parametric and semiparametric regression methods for healthcare costs, both against each other and against standard practice. The population of English National Health Service hospital in-patient episodes for the financial year 2007-2008 (summed for each patient) is randomly divided into two equally sized subpopulations to form an estimation set and a validation set. Evaluating out-of-sample using the validation set, a conditional density approximation estimator shows considerable promise in forecasting conditional means, performing best for accuracy of forecasting and among the best four for bias and goodness of fit. The best performing model for bias is linear regression with square-root-transformed dependent variables, whereas a generalized linear model with square-root link function and Poisson distribution performs best in terms of goodness of fit. Commonly used models utilizing a log-link are shown to perform badly relative to other models considered in our comparison.

  15. Comparison of Survival Models for Analyzing Prognostic Factors in Gastric Cancer Patients

    PubMed

    Habibi, Danial; Rafiei, Mohammad; Chehrei, Ali; Shayan, Zahra; Tafaqodi, Soheil

    2018-03-27

    Objective: There are a number of models for determining risk factors for survival of patients with gastric cancer. This study was conducted to select the model showing the best fit with available data. Methods: Cox regression and parametric models (Exponential, Weibull, Gompertz, Log normal, Log logistic and Generalized Gamma) were utilized in unadjusted and adjusted forms to detect factors influencing mortality of patients. Comparisons were made with the Akaike Information Criterion (AIC) using STATA 13 and R 3.1.3. Results: The results of this study indicated that all parametric models outperform the Cox regression model. The Log normal, Log logistic and Generalized Gamma provided the best performance in terms of AIC values (179.2, 179.4 and 181.1, respectively). On unadjusted analysis, the results of the Cox regression and parametric models indicated stage, grade, largest diameter of metastatic nest, largest diameter of LM, number of involved lymph nodes and the largest ratio of metastatic nests to lymph nodes, to be variables influencing the survival of patients with gastric cancer. On adjusted analysis, according to the best model (log normal), grade was found as the significant variable. Conclusion: The results suggested that all parametric models outperform the Cox model. The log normal model provides the best fit and is a good substitute for Cox regression. Creative Commons Attribution License
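
    Model comparison by AIC, as used in this record, reduces to computing AIC = 2k − 2·ln(L) for each fitted model and preferring the smallest value, where k is the number of estimated parameters and ln(L) is the maximized log-likelihood. The sketch below shows the arithmetic with hypothetical log-likelihoods and parameter counts; the model labels and numbers are assumptions for illustration and are not taken from the study.

```python
# Illustrative sketch only: log-likelihoods and parameter counts are hypothetical.

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: smaller values indicate a better trade-off
    between goodness of fit and model complexity."""
    return 2 * n_params - 2 * log_likelihood


# (maximized log-likelihood, number of estimated parameters) per candidate model
candidates = {
    "model_a": (-88.0, 2),
    "model_b": (-86.5, 3),
    "model_c": (-86.4, 5),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

    Here the most heavily parameterized model has the highest likelihood but does not win, because its small gain in fit does not offset the penalty for its extra parameters.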

  16. Non-proportional odds multivariate logistic regression of ordinal family data.

    PubMed

    Zaloumis, Sophie G; Scurrah, Katrina J; Harrap, Stephen B; Ellis, Justine A; Gurrin, Lyle C

    2015-03-01

    Methods to examine whether genetic and/or environmental sources can account for the residual variation in ordinal family data usually assume proportional odds. However, standard software to fit the non-proportional odds model to ordinal family data is limited because the correlation structure of family data is more complex than for other types of clustered data. To perform these analyses we propose the non-proportional odds multivariate logistic regression model and take a simulation-based approach to model fitting using Markov chain Monte Carlo methods, such as partially collapsed Gibbs sampling and the Metropolis algorithm. We applied the proposed methodology to male pattern baldness data from the Victorian Family Heart Study. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  17. Response Surface Methodology Using a Fullest Balanced Model: A Re-Analysis of a Dataset in the Korean Journal for Food Science of Animal Resources.

    PubMed

    Rheem, Sungsue; Rheem, Insoo; Oh, Sejong

    2017-01-01

    Response surface methodology (RSM) is a useful set of statistical techniques for modeling and optimizing responses in research studies of food science. In the analysis of response surface data, a second-order polynomial regression model is usually used. However, sometimes we encounter situations where the fit of the second-order model is poor. If the model fitted to the data has a poor fit including a lack of fit, the modeling and optimization results might not be accurate. In such a case, using a fullest balanced model, which has no lack of fit, can fix such a problem, enhancing the accuracy of the response surface modeling and optimization. This article presents how to develop and use such a model for the better modeling and optimizing of the response through an illustrative re-analysis of a dataset in Park et al. (2014) published in the Korean Journal for Food Science of Animal Resources.

  18. Simulation on Poisson and negative binomial models of count road accident modeling

    NASA Astrophysics Data System (ADS)

    Sapuan, M. S.; Razali, A. M.; Zamzuri, Z. H.; Ibrahim, K.

    2016-11-01

    Accident count data have often been shown to be overdispersed. The data may also contain excess zero counts. A simulation study was conducted to create scenarios in which accidents happen at a T-junction, under the assumption that the dependent variable of the generated data follows a given distribution, namely the Poisson or the negative binomial distribution, with sample sizes from n=30 to n=500. The study objective was accomplished by fitting Poisson regression, negative binomial regression, and hurdle negative binomial models to the simulated data. The model validation results were compared, and the simulation shows that, for each sample size, not every model fits the data well even when the data are generated from that model's own distribution, especially when the sample size is larger. Furthermore, larger samples contain more zero accident counts.
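
    The overdispersion and excess-zero behaviour described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' code: the mean of 2.0, the negative binomial size of 1.0, the sample size of 5000, and all function names are assumptions chosen for illustration. It draws Poisson and negative binomial counts and compares their variance-to-mean ratios and zero shares.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method (small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_negative_binomial(rng, mean, size):
    """Draw one negative binomial variate as a gamma-Poisson mixture.

    The Poisson rate is gamma distributed with shape `size` and scale
    `mean / size`, which gives variance mean + mean**2 / size.
    """
    rate = rng.gammavariate(size, mean / size)
    return sample_poisson(rng, rate)


def dispersion_index(counts):
    """Variance-to-mean ratio: about 1 for Poisson data, above 1 if overdispersed."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


rng = random.Random(2016)  # fixed seed so the sketch is repeatable
poisson_counts = [sample_poisson(rng, 2.0) for _ in range(5000)]
negbin_counts = [sample_negative_binomial(rng, 2.0, 1.0) for _ in range(5000)]

poisson_index = dispersion_index(poisson_counts)
negbin_index = dispersion_index(negbin_counts)
zero_share_poisson = poisson_counts.count(0) / len(poisson_counts)
zero_share_negbin = negbin_counts.count(0) / len(negbin_counts)
```

    With the same mean, the negative binomial sample should show a dispersion index well above 1 and a larger share of zeros than the Poisson sample, which is the pattern that motivates negative binomial and hurdle models for accident counts.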

  19. Estimation of retinal vessel caliber using model fitting and random forests

    NASA Astrophysics Data System (ADS)

    Araújo, Teresa; Mendonça, Ana Maria; Campilho, Aurélio

    2017-03-01

    Retinal vessel caliber changes are associated with several major diseases, such as diabetes and hypertension. These caliber changes can be evaluated using eye fundus images. However, the clinical assessment is tiresome and prone to errors, motivating the development of automatic methods. An automatic method based on vessel cross-section intensity profile model fitting for the estimation of vessel caliber in retinal images is herein proposed. First, vessels are segmented from the image, vessel centerlines are detected and individual segments are extracted and smoothed. Intensity profiles are extracted perpendicularly to the vessel, and the profile lengths are determined. Then, model fitting is applied to the smoothed profiles. A novel parametric model (DoG-L7) is used, consisting of a Difference-of-Gaussians multiplied by a line which is able to describe profile asymmetry. Finally, the parameters of the best-fit model are used for determining the vessel width through regression using ensembles of bagged regression trees with random sampling of the predictors (random forests). The method is evaluated on the REVIEW public dataset. A precision close to the observers is achieved, outperforming other state-of-the-art methods. The method is robust and reliable for width estimation in images with pathologies and artifacts, with performance independent of the range of diameters.

  20. A Bayesian goodness of fit test and semiparametric generalization of logistic regression with measurement data.

    PubMed

    Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E

    2013-06-01

    Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework. 
© 2013, The International Biometric Society.

  1. Assessing Local Model Adequacy in Bayesian Hierarchical Models Using the Partitioned Deviance Information Criterion

    PubMed Central

    Wheeler, David C.; Hickson, DeMarc A.; Waller, Lance A.

    2010-01-01

    Many diagnostic tools and goodness-of-fit measures, such as the Akaike information criterion (AIC) and the Bayesian deviance information criterion (DIC), are available to evaluate the overall adequacy of linear regression models. In addition, visually assessing adequacy in models has become an essential part of any regression analysis. In this paper, we focus on a spatial consideration of the local DIC measure for model selection and goodness-of-fit evaluation. We use a partitioning of the DIC into the local DIC, leverage, and deviance residuals to assess local model fit and influence for both individual observations and groups of observations in a Bayesian framework. We use visualization of the local DIC and differences in local DIC between models to assist in model selection and to visualize the global and local impacts of adding covariates or model parameters. We demonstrate the utility of the local DIC in assessing model adequacy using HIV prevalence data from pregnant women in the Butare province of Rwanda during 1989-1993 using a range of linear model specifications, from global effects only to spatially varying coefficient models, and a set of covariates related to sexual behavior. Results of applying the diagnostic visualization approach include more refined model selection and greater understanding of the models as applied to the data. PMID:21243121
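
    For reference, the partition described in this abstract can be written out using the standard definition of the DIC. This is a sketch in our own notation, assuming the deviance is a sum of per-observation terms; the leverage and deviance residual components used by the authors are not reproduced here.

```latex
D(\theta) = -2 \log p(y \mid \theta), \qquad
\bar{D} = \mathrm{E}_{\theta \mid y}\left[ D(\theta) \right], \qquad
p_D = \bar{D} - D(\bar{\theta}), \qquad
\mathrm{DIC} = \bar{D} + p_D

D(\theta) = \sum_{i=1}^{n} D_i(\theta)
\;\Longrightarrow\;
\mathrm{DIC} = \sum_{i=1}^{n} \mathrm{DIC}_i, \qquad
\mathrm{DIC}_i = \bar{D}_i + p_{D_i}, \qquad
p_{D_i} = \bar{D}_i - D_i(\bar{\theta})
```

    Because the local terms sum to the global DIC, mapping the per-observation values shows where in the study region a model gains or loses fit.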

  2. Random regression analyses using B-splines functions to model growth from birth to adult age in Canchim cattle.

    PubMed

    Baldi, F; Alencar, M M; Albuquerque, L G

    2010-12-01

    The objective of this work was to estimate covariance functions using random regression models on B-splines functions of animal age, for weights from birth to adult age in Canchim cattle. Data comprised 49,011 records on 2435 females. The model of analysis included fixed effects of contemporary groups, age of dam as quadratic covariable and the population mean trend taken into account by a cubic regression on orthogonal polynomials of animal age. Residual variances were modelled through a step function with four classes. The direct and maternal additive genetic effects, and animal and maternal permanent environmental effects were included as random effects in the model. A total of seventeen analyses, considering linear, quadratic and cubic B-splines functions and up to seven knots, were carried out. B-spline functions of the same order were considered for all random effects. Random regression models on B-splines functions were compared to a random regression model on Legendre polynomials and with a multitrait model. Results from different models of analyses were compared using the REML form of the Akaike Information criterion and Schwarz' Bayesian Information criterion. In addition, the variance components and genetic parameters estimated for each random regression model were also used as criteria to choose the most adequate model to describe the covariance structure of the data. A model fitting quadratic B-splines, with four knots or three segments for direct additive genetic effect and animal permanent environmental effect and two knots for maternal additive genetic effect and maternal permanent environmental effect, was the most adequate to describe the covariance structure of the data. Random regression models using B-spline functions as base functions fitted the data better than Legendre polynomials, especially at mature ages, but higher number of parameters need to be estimated with B-splines functions. © 2010 Blackwell Verlag GmbH.

  3. Machine learning approaches to the social determinants of health in the health and retirement study.

    PubMed

    Seligman, Benjamin; Tuljapurkar, Shripad; Rehkopf, David

    2018-04-01

    Social and economic factors are important predictors of health and of recognized importance for health systems. However, machine learning, used elsewhere in the biomedical literature, has not been extensively applied to study relationships between society and health. We investigate how machine learning may add to our understanding of social determinants of health using data from the Health and Retirement Study. A linear regression of age and gender, and a parsimonious theory-based regression additionally incorporating income, wealth, and education, were used to predict systolic blood pressure, body mass index, waist circumference, and telomere length. Prediction, fit, and interpretability were compared across four machine learning methods: linear regression, penalized regressions, random forests, and neural networks. All models had poor out-of-sample prediction. Most machine learning models performed similarly to the simpler models. However, neural networks greatly outperformed the three other methods. Neural networks also had good fit to the data (R² between 0.4 and 0.6, versus <0.3 for all others). Across machine learning models, nine variables were frequently selected or highly weighted as predictors: dental visits, current smoking, self-rated health, serial-seven subtractions, probability of receiving an inheritance, probability of leaving an inheritance of at least $10,000, number of children ever born, African-American race, and gender. Some of the machine learning methods do not improve prediction or fit beyond simpler models; neural networks, however, performed well. The predictors identified across models suggest underlying social factors that are important predictors of biological indicators of chronic disease, and that the non-linear and interactive relationships between variables fundamental to the neural network approach may be important to consider.

  4. Accounting for measurement error in log regression models with applications to accelerated testing.

    PubMed

    Richardson, Robert; Tolley, H Dennis; Evenson, William E; Lunt, Barry M

    2018-01-01

    In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.

  5. INNOVATIVE INSTRUMENTATION AND ANALYSIS OF THE TEMPERATURE MEASUREMENT FOR HIGH TEMPERATURE GASIFICATION

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seong W. Lee

    During this reporting period, the literature survey, including the gasifier temperature measurement literature, the ultrasonic application and its background study in cleaning application, and the spray coating process, was completed. The gasifier simulator (cold model) testing has been successfully conducted. Four factors (blower voltage, ultrasonic application, injection time intervals, particle weight) were considered as significant factors that affect the temperature measurement. The Analysis of Variance (ANOVA) was applied to analyze the test data. The analysis shows that all four factors are significant to the temperature measurements in the gasifier simulator (cold model). The regression analysis for the case with the normalized room temperature shows that the linear model fits the temperature data with 82% accuracy (18% error). The regression analysis for the case without the normalized room temperature shows 72.5% accuracy (27.5% error). The nonlinear regression analysis indicates a better fit than that of the linear regression. The nonlinear regression model's accuracy is 88.7% (11.3% error) for the normalized room temperature case, which is better than the linear regression analysis. The hot model thermocouple sleeve design and fabrication are completed. The gasifier simulator (hot model) design and fabrication are completed. The system tests of the gasifier simulator (hot model) have been conducted and some modifications have been made. Based on the system tests and results analysis, the gasifier simulator (hot model) has met the proposed design requirement and is ready for system test. The ultrasonic cleaning method is under evaluation and will be further studied for the gasifier simulator (hot model) application. The progress of this project has been on schedule.

  6. Detecting sea-level hazards: Simple regression-based methods for calculating the acceleration of sea level

    USGS Publications Warehouse

    Doran, Kara S.; Howd, Peter A.; Sallenger, Asbury H.

    2016-01-04

    Recent studies, and most of their predecessors, use tide gage data to quantify SL acceleration, ASL(t). In the current study, three techniques were used to calculate acceleration from tide gage data, and of those examined, it was determined that the two techniques based on sliding a regression window through the time series are more robust compared to the technique that fits a single quadratic form to the entire time series, particularly if there is temporal variation in the magnitude of the acceleration. The single-fit quadratic regression method has been the most commonly used technique in determining acceleration in tide gage data. The inability of the single-fit method to account for time-varying acceleration may explain some of the inconsistent findings between investigators. Properly quantifying ASL(t) from field measurements is of particular importance in evaluating numerical models of past, present, and future SLR resulting from anticipated climate change.
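
    The contrast between a single quadratic fit and a sliding regression window can be sketched on synthetic data. The example below is an illustration under stated assumptions, not the report's method: the series, its units, the 30-point window, and the function names are all hypothetical. Acceleration is taken as twice the quadratic coefficient of a least-squares fit.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def quadratic_acceleration(times, values):
    """Fit values = c0 + c1*t + c2*t**2 by least squares and return 2*c2."""
    center = sum(times) / len(times)
    t = [x - center for x in times]  # centering leaves c2 unchanged
    s = [sum(x ** k for x in t) for k in range(5)]  # power sums of t
    r = [sum((x ** k) * y for x, y in zip(t, values)) for k in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    # Cramer's rule: replace the third column with the right-hand side.
    replaced = [[s[0], s[1], r[0]], [s[1], s[2], r[1]], [s[2], s[3], r[2]]]
    return 2.0 * det3(replaced) / det3(normal)


def sliding_acceleration(times, values, window):
    """Acceleration from a quadratic fit in each run of `window` consecutive points."""
    return [quadratic_acceleration(times[i:i + window], values[i:i + window])
            for i in range(len(times) - window + 1)]


# Synthetic record (hypothetical units: years, millimetres): a steady 2 mm/yr
# rise, with an acceleration of 0.2 mm/yr^2 switching on at year 50.
years = [float(t) for t in range(100)]
level = [2.0 * t + 0.1 * max(t - 50.0, 0.0) ** 2 for t in years]

single_fit = quadratic_acceleration(years, level)
windowed = sliding_acceleration(years, level, window=30)
```

    On this series the single fit returns one intermediate value, while the windowed estimates recover roughly zero acceleration early in the record and 0.2 late in it, which is the time variation a single quadratic cannot represent.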

  7. Transformation Model Choice in Nonlinear Regression Analysis of Fluorescence-based Serial Dilution Assays

    PubMed Central

    Fong, Youyi; Yu, Xuesong

    2016-01-01

    Many modern serial dilution assays are based on fluorescence intensity (FI) readouts. We study optimal transformation model choice for fitting five parameter logistic curves (5PL) to FI-based serial dilution assay data. We first develop a generalized least squares-pseudolikelihood type algorithm for fitting heteroscedastic logistic models. Next we show that the 5PL and log 5PL functions can approximate each other well. We then compare four 5PL models with different choices of log transformation and variance modeling through a Monte Carlo study and real data. Our findings are that the optimal choice depends on the intended use of the fitted curves. PMID:27642502
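
    For readers unfamiliar with the curve family, one common parameterization of the five parameter logistic function is sketched below. The parameterization, the parameter values, and the 4-fold dilution series are assumptions made for illustration; the paper's own transformation and variance models are not reproduced.

```python
def five_pl(x, a, b, c, d, g):
    """Five parameter logistic curve, f(x) = d + (a - d) / (1 + (x / c)**b)**g.

    a: response at zero concentration, d: response at infinite concentration,
    c: location, b: slope, g: asymmetry (g = 1 gives the four parameter curve).
    """
    return d + (a - d) / (1.0 + (x / c) ** b) ** g


# Hypothetical parameter values chosen only for illustration.
params = dict(a=50.0, b=1.2, c=100.0, d=30000.0, g=0.7)

# Concentrations of a 4-fold serial dilution, highest first.
dilution_series = [10000.0 / 4 ** k for k in range(8)]
curve = [five_pl(x, **params) for x in dilution_series]

# With g = 1 the curve passes through the midpoint of a and d at x = c.
symmetric_midpoint = five_pl(100.0, a=50.0, b=1.2, c=100.0, d=30000.0, g=1.0)
```

    The asymmetry parameter g is what separates the 5PL from the symmetric 4PL curve; fitting would proceed by minimizing a (possibly weighted) residual criterion over the five parameters.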

  8. In vitro differential diagnosis of clavus and verruca by a predictive model generated from electrical impedance.

    PubMed

    Hung, Chien-Ya; Sun, Pei-Lun; Chiang, Shu-Jen; Jaw, Fu-Shan

    2014-01-01

    Similar clinical appearances prevent accurate diagnosis of two common skin diseases, clavus and verruca. In this study, electrical impedance is employed as a novel tool to generate a predictive model for differentiating these two diseases. We used 29 clavus and 28 verruca lesions. To obtain impedance parameters, an LCR-meter system was applied to measure capacitance (C), resistance (Re), impedance magnitude (Z), and phase angle (θ). These values were combined with lesion thickness (d) to characterize the tissue specimens. The results from clavus and verruca were then fitted to a univariate logistic regression model with the generalized estimating equations (GEE) method. In model generation, log ZSD and θSD were formulated as predictors by fitting a multiple logistic regression model with the same GEE method. The potential nonlinear effects of covariates were detected by fitting generalized additive models (GAM). Moreover, the model was validated by goodness-of-fit (GOF) assessments. Significant mean differences in the indices d, Re, Z, and θ are found between clavus and verruca (p<0.001). A final predictive model is established with the Z and θ indices. The model fits the observed data quite well. In GOF evaluation, the area under the receiver operating characteristics (ROC) curve is 0.875 (>0.7), the adjusted generalized R² is 0.512 (>0.3), and the p value of the Hosmer-Lemeshow GOF test is 0.350 (>0.05). This technique promises to provide an approved model for differential diagnosis of clavus and verruca. It could provide a rapid, relatively low-cost, safe and non-invasive screening tool in clinical use.
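
    The area under the ROC curve quoted in this abstract has a simple rank interpretation: it is the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. A minimal sketch follows; the scores are made-up numbers, not data from the study.

```python
def roc_auc(scores_positive, scores_negative):
    """Area under the ROC curve as the share of correctly ordered pairs."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical predicted probabilities for the two diagnostic groups.
group_a_scores = [0.9, 0.8, 0.6, 0.4]
group_b_scores = [0.7, 0.5, 0.3, 0.2]

auc = roc_auc(group_a_scores, group_b_scores)  # 13 of 16 pairs ordered correctly
```

    An AUC of 0.5 corresponds to chance ordering and 1.0 to perfect separation, which is why thresholds such as 0.7 are used as rough adequacy cut-offs.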

  9. Fast function-on-scalar regression with penalized basis expansions.

    PubMed

    Reiss, Philip T; Huang, Lei; Mennes, Maarten

    2010-01-01

    Regression models for functional responses and scalar predictors are often fitted by means of basis functions, with quadratic roughness penalties applied to avoid overfitting. The fitting approach described by Ramsay and Silverman in the 1990s amounts to a penalized ordinary least squares (P-OLS) estimator of the coefficient functions. We recast this estimator as a generalized ridge regression estimator, and present a penalized generalized least squares (P-GLS) alternative. We describe algorithms by which both estimators can be implemented, with automatic selection of optimal smoothing parameters, in a more computationally efficient manner than has heretofore been available. We discuss pointwise confidence intervals for the coefficient functions, simultaneous inference by permutation tests, and model selection, including a novel notion of pointwise model selection. P-OLS and P-GLS are compared in a simulation study. Our methods are illustrated with an analysis of age effects in a functional magnetic resonance imaging data set, as well as a reanalysis of a now-classic Canadian weather data set. An R package implementing the methods is publicly available.
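
    As background for the "generalized ridge" recasting, the generic form of a generalized ridge estimator is shown below. This is the textbook scalar-response version in our own notation, with design matrix X, symmetric positive semidefinite penalty matrix P, and smoothing parameter λ; the paper's function-on-scalar formulation is more elaborate and is not reproduced here.

```latex
\hat{\beta}_{\lambda}
  = \arg\min_{\beta} \; \lVert y - X\beta \rVert^{2} + \lambda\, \beta^{\top} P \beta
  = \left( X^{\top} X + \lambda P \right)^{-1} X^{\top} y
```

    Ordinary ridge regression is the special case P = I; a quadratic roughness penalty corresponds to a P built from integrated squared derivatives of the basis functions.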

  10. New formulation feed method in tariff model of solar PV in Indonesia

    NASA Astrophysics Data System (ADS)

    Djamal, Muchlishah Hadi; Setiawan, Eko Adhi; Setiawan, Aiman

    2017-03-01

    Geographically, Indonesia spans 18 latitudes that correlate strongly with the solar radiation potential available to solar photovoltaic (PV) technologies. This is the basic assumption for developing a proportional Feed-in Tariff (FIT) model; consequently, the FIT varies with latitude across Indonesia. This paper proposes a new formulation of the solar PV FIT based on solar radiation potential and independent variables such as latitude, longitude, Levelized Cost of Electricity (LCOE), and socio-economic factors. Principal Component Regression (PCR) is used to analyze the correlation of six independent variables, C1-C6, and three FIT models are then presented. Model FIT-2 is chosen because it has a small residual value and a higher financial benefit than the other models. This study reveals that a variable FIT tied to the solar energy potential of each region can reduce the total FIT paid by the state by around 80 billion rupiahs over 10 years of 1 MW photovoltaic operation in each of Indonesia's 34 provinces.

  11. Robust ridge regression estimators for nonlinear models with applications to high throughput screening assay data.

    PubMed

    Lim, Changwon

    2015-03-30

    Nonlinear regression is often used to evaluate the toxicity of a chemical or a drug by fitting data from a dose-response study. Toxicologists and pharmacologists may draw a conclusion about whether a chemical is toxic by testing the significance of the estimated parameters. However, sometimes the null hypothesis cannot be rejected even though the fit is quite good. One possible reason for such cases is that the estimated standard errors of the parameter estimates are extremely large. In this paper, we propose robust ridge regression estimation procedures for nonlinear models to solve this problem. The asymptotic properties of the proposed estimators are investigated; in particular, their mean squared errors are derived. The performances of the proposed estimators are compared with several standard estimators using simulation studies. The proposed methodology is also illustrated using high throughput screening assay data obtained from the National Toxicology Program. Copyright © 2014 John Wiley & Sons, Ltd.

  12. Angiogenic Signaling in Living Breast Tumor Models

    DTIC Science & Technology

    2007-06-01

    Poisson distributed random noise is added in an amount relative to the desired signal to noise ratio. We fit the data using a regressive fitting... Award Number: W81XWH-05-1-0396.

  13. Conditional Poisson models: a flexible alternative to conditional logistic case cross-over analysis.

    PubMed

    Armstrong, Ben G; Gasparrini, Antonio; Tobias, Aurelio

    2014-11-24

    The time stratified case cross-over approach is a popular alternative to conventional time series regression for analysing associations between time series of environmental exposures (air pollution, weather) and counts of health outcomes. These are almost always analyzed using conditional logistic regression on data expanded to case-control (case crossover) format, but this has some limitations. In particular adjusting for overdispersion and auto-correlation in the counts is not possible. It has been established that a Poisson model for counts with stratum indicators gives identical estimates to those from conditional logistic regression and does not have these limitations, but it is little used, probably because of the overheads in estimating many stratum parameters. The conditional Poisson model avoids estimating stratum parameters by conditioning on the total event count in each stratum, thus simplifying the computing and increasing the number of strata for which fitting is feasible compared with the standard unconditional Poisson model. Unlike the conditional logistic model, the conditional Poisson model does not require expanding the data, and can adjust for overdispersion and auto-correlation. It is available in Stata, R, and other packages. By applying to some real data and using simulations, we demonstrate that conditional Poisson models were simpler to code and shorter to run than are conditional logistic analyses and can be fitted to larger data sets than possible with standard Poisson models. Allowing for overdispersion or autocorrelation was possible with the conditional Poisson model but when not required this model gave identical estimates to those from conditional logistic regression. Conditional Poisson regression models provide an alternative to case crossover analysis of stratified time series data with some advantages. 
The conditional Poisson model can also be used in other contexts in which primary control for confounding is by fine stratification.
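
    The reason the stratum parameters drop out can be shown in two lines. This is a sketch in our own notation, assuming independent Poisson counts Y_{s,i} for observation i in stratum s with a stratum intercept α_s; overdispersion and autocorrelation adjustments are not shown.

```latex
Y_{s,i} \sim \mathrm{Poisson}(\mu_{s,i}), \qquad
\log \mu_{s,i} = \alpha_s + x_{s,i}^{\top}\beta

\Pr\left( Y_{s,1} = y_{s,1}, \dots, Y_{s,n_s} = y_{s,n_s} \,\middle|\, \sum_{i} Y_{s,i} = t_s \right)
  = \frac{t_s!}{\prod_{i} y_{s,i}!} \prod_{i} \pi_{s,i}^{\,y_{s,i}},
\qquad
\pi_{s,i} = \frac{\exp\left( x_{s,i}^{\top}\beta \right)}{\sum_{j} \exp\left( x_{s,j}^{\top}\beta \right)}
```

    The factor exp(α_s) appears in both the numerator and the denominator of π_{s,i} and cancels, so only β remains to be estimated, however many strata there are.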

  14. Performance of the score systems Acute Physiology and Chronic Health Evaluation II and III at an interdisciplinary intensive care unit, after customization

    PubMed Central

    Markgraf, Rainer; Deutschinoff, Gerd; Pientka, Ludger; Scholten, Theo; Lorenz, Cristoph

    2001-01-01

    Background: Mortality predictions calculated using scoring scales are often not accurate in populations other than those in which the scales were developed because of differences in case-mix. The present study investigates the effect of first-level customization, using a logistic regression technique, on discrimination and calibration of the Acute Physiology and Chronic Health Evaluation (APACHE) II and III scales. Method: Probabilities of hospital death for patients were estimated by applying APACHE II and III and comparing these with observed outcomes. Using the split sample technique, a customized model to predict outcome was developed by logistic regression. The overall goodness-of-fit of the original and the customized models was assessed. Results: Of 3383 consecutive intensive care unit (ICU) admissions over 3 years, 2795 patients could be analyzed, and were split randomly into development and validation samples. The discriminative powers of APACHE II and III were unchanged by customization (areas under the receiver operating characteristic [ROC] curve 0.82 and 0.85, respectively). Hosmer-Lemeshow goodness-of-fit tests showed good calibration for APACHE II, but insufficient calibration for APACHE III. Customization improved calibration for both models, with a good fit for APACHE III as well. However, fit was different for various subgroups. Conclusions: The overall goodness-of-fit of APACHE III mortality prediction was improved significantly by customization, but uniformity of fit in different subgroups was not achieved. Therefore, application of the customized model provides no advantage, because differences in case-mix still limit comparisons of quality of care. PMID:11178223

  15. Predicting Performance on a Firefighter's Ability Test from Fitness Parameters

    ERIC Educational Resources Information Center

    Michaelides, Marcos A.; Parpa, Koulla M.; Thompson, Jerald; Brown, Barry

    2008-01-01

    The purpose of this project was to identify the relationships between various fitness parameters such as upper body muscular endurance, upper and lower body strength, flexibility, body composition and performance on an ability test (AT) that included simulated firefighting tasks. A second intent was to create a regression model that would predict…

  16. Validity of VO(2 max) in predicting blood volume: implications for the effect of fitness on aging

    NASA Technical Reports Server (NTRS)

    Convertino, V. A.; Ludwig, D. A.

    2000-01-01

    A multiple regression model was constructed to investigate the premise that blood volume (BV) could be predicted using several anthropometric variables, age, and maximal oxygen uptake (VO(2 max)). To test this hypothesis, age, calculated body surface area (height/weight composite), percent body fat (hydrostatic weight), and VO(2 max) were regressed on to BV using data obtained from 66 normal healthy men. Results from the evaluation of the full model indicated that the most parsimonious result was obtained when age and VO(2 max) were regressed on BV expressed per kilogram body weight. The full model accounted for 52% of the total variance in BV per kilogram body weight. Both age and VO(2 max) were related to BV in the positive direction. Percent body fat contributed <1% to the explained variance in BV when expressed in absolute BV (ml) or as BV per kilogram body weight. When the model was cross validated on 41 new subjects and BV per kilogram body weight was reexpressed as raw BV, the results indicated that the statistical model would be stable under cross validation (e.g., predictive applications) with an accuracy of +/- 1,200 ml at 95% confidence. Our results support the hypothesis that BV is an increasing function of aerobic fitness and to a lesser extent the age of the subject. The results may have implication as to a mechanism by which aerobic fitness and activity may be protective against reduced BV associated with aging.

  17. Predicted effect size of lisdexamfetamine treatment of attention deficit/hyperactivity disorder (ADHD) in European adults: Estimates based on indirect analysis using a systematic review and meta-regression analysis.

    PubMed

    Fridman, M; Hodgkins, P S; Kahle, J S; Erder, M H

    2015-06-01

    There are few approved therapies for adults with attention-deficit/hyperactivity disorder (ADHD) in Europe. Lisdexamfetamine (LDX) is an effective treatment for ADHD; however, no clinical trials examining the efficacy of LDX specifically in European adults have been conducted. Therefore, to estimate the efficacy of LDX in European adults we performed a meta-regression of existing clinical data. A systematic review identified US- and Europe-based randomized efficacy trials of LDX, atomoxetine (ATX), or osmotic-release oral system methylphenidate (OROS-MPH) in children/adolescents and adults. A meta-regression model was then fitted to the published/calculated effect sizes (Cohen's d) using medication, geographical location, and age group as predictors. The LDX effect size in European adults was extrapolated from the fitted model. Sensitivity analyses performed included using adult-only studies and adding studies with placebo designs other than a standard pill-placebo design. Twenty-two of 2832 identified articles met inclusion criteria. The model-estimated effect size of LDX for European adults was 1.070 (95% confidence interval: 0.738, 1.401), larger than the 0.8 threshold for large effect sizes. The overall model fit was adequate (80%) and stable in the sensitivity analyses. This model predicts that LDX may have a large treatment effect size in European adults with ADHD. Copyright © 2015 Elsevier Masson SAS. All rights reserved.

  18. Comparison of adsorption equilibrium models for the study of CL-, NO3- and SO4(2-) removal from aqueous solutions by an anion exchange resin.

    PubMed

    Dron, Julien; Dodi, Alain

    2011-06-15

    The removal of chloride, nitrate and sulfate ions from aqueous solutions by a macroporous resin is studied through the ion exchange systems OH(-)/Cl(-), OH(-)/NO(3)(-), OH(-)/SO(4)(2-), and HCO(3)(-)/Cl(-), Cl(-)/NO(3)(-), Cl(-)/SO(4)(2-). They are investigated by means of Langmuir, Freundlich, Dubinin-Radushkevitch (D-R) and Dubinin-Astakhov (D-A) single-component adsorption isotherms. The sorption parameters and the fitting of the models are determined by nonlinear regression and discussed. The Langmuir model provides a fair estimation of the sorption capacity whatever the system under study, on the contrary to Freundlich and D-R models. The adsorption energies deduced from Dubinin and Langmuir isotherms are in good agreement, and the surface parameter of the D-A isotherm appears consistent. All models agree on the order of affinity OH(-)
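
    Two of the isotherms named in this abstract have simple closed forms, sketched below together with the sum-of-squares criterion that a nonlinear regression would minimize. The parameter values and concentrations are hypothetical and the data are noise-free synthetic values, not measurements from the study; the D-R and D-A isotherms are omitted.

```python
def langmuir(c, q_max, k):
    """Langmuir isotherm: uptake saturates at q_max as concentration c grows."""
    return q_max * k * c / (1.0 + k * c)


def freundlich(c, k_f, n):
    """Freundlich isotherm: power-law uptake with no saturation plateau."""
    return k_f * c ** (1.0 / n)


def sum_squared_error(model, params, concentrations, uptakes):
    """Objective that a nonlinear regression routine would minimize over params."""
    return sum((model(c, *params) - q) ** 2
               for c, q in zip(concentrations, uptakes))


# Hypothetical equilibrium concentrations and synthetic Langmuir uptakes.
concentrations = [0.5, 1.0, 2.0, 5.0, 10.0, 50.0]
uptakes = [langmuir(c, 2.0, 0.8) for c in concentrations]

sse_langmuir = sum_squared_error(langmuir, (2.0, 0.8), concentrations, uptakes)
sse_freundlich = sum_squared_error(freundlich, (0.9, 3.0), concentrations, uptakes)
```

    Because only the Langmuir form has a plateau, it yields a direct estimate of sorption capacity (q_max), whereas the Freundlich curve keeps rising with concentration.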

  19. A metabolism-based whole lake eutrophication model to estimate the magnitude and time scales of the effects of restoration in Upper Klamath Lake, south-central Oregon

    USGS Publications Warehouse

    Wherry, Susan A.; Wood, Tamara M.

    2018-04-27

    A whole lake eutrophication (WLE) model approach for phosphorus and cyanobacterial biomass in Upper Klamath Lake, south-central Oregon, is presented here. The model is a successor to a previous model developed to inform a Total Maximum Daily Load (TMDL) for phosphorus in the lake, but is based on net primary production (NPP), which can be calculated from dissolved oxygen, rather than scaling up a small-scale description of cyanobacterial growth and respiration rates. This phase 3 WLE model is a refinement of the proof-of-concept developed in phase 2, which was the first attempt to use NPP to simulate cyanobacteria in the TMDL model. The calibration of the calculated NPP WLE model was successful, with performance metrics indicating a good fit to calibration data, and the calculated NPP WLE model was able to simulate mid-season bloom decreases, a feature that previous models could not reproduce. In order to use the model to simulate future scenarios based on phosphorus load reduction, a multivariate regression model was created to simulate NPP as a function of the model state variables (phosphorus and chlorophyll a) and measured meteorological and temperature model inputs. The NPP time series was split into a low- and high-frequency component using wavelet analysis, and regression models were fit to the components separately, with moderate success. The regression models for NPP were incorporated in the WLE model, referred to as the “scenario” WLE (SWLE), and the fit statistics for phosphorus during the calibration period were mostly unchanged. The fit statistics for chlorophyll a, however, were degraded. These statistics are still an improvement over prior models, and indicate that the SWLE is appropriate for long-term predictions even though it misses some of the seasonal variations in chlorophyll a. The complete whole lake SWLE model, with multivariate regression to predict NPP, was used to make long-term simulations of the response to 10-, 20-, and 40-percent reductions in tributary nutrient loads. The long-term mean water column concentration of total phosphorus was reduced by 9, 18, and 36 percent, respectively, in response to these load reductions. The long-term water column chlorophyll a concentration was reduced by 4, 13, and 44 percent, respectively. The adjustment to a new equilibrium between the water column and sediments occurred over about 30 years.

  20. Analysis of Blood Transfusion Data Using Bivariate Zero-Inflated Poisson Model: A Bayesian Approach.

    PubMed

    Mohammadi, Tayeb; Kheiri, Soleiman; Sedehi, Morteza

    2016-01-01

    Recognizing the factors affecting the number of blood donations and blood deferrals has a major impact on blood transfusion. There is a positive correlation between the variables "number of blood donation" and "number of blood deferral": as the number of returns for donation increases, so does the number of blood deferrals. On the other hand, because many donors never return to donate, there is an excess of zeros for both of the above-mentioned variables. In this study, to account for the correlation and the excess zeros, the bivariate zero-inflated Poisson regression model was used for joint modeling of the number of blood donations and the number of blood deferrals. The data were analyzed using a Bayesian approach with noninformative priors, in the presence and absence of covariates. The parameters of the model, that is, the correlation, zero-inflation parameter, and regression coefficients, were estimated through MCMC simulation. Finally, the double-Poisson, bivariate Poisson, and bivariate zero-inflated Poisson models were fitted to the data and compared using the deviance information criterion (DIC). The results showed that the bivariate zero-inflated Poisson regression model fitted the data better than the other models.
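
    As a rough illustration of what "zero inflation" means, the sketch below implements the probability mass function of a univariate zero-inflated Poisson distribution. This is a simplification chosen for illustration: the abstract describes a bivariate model fitted by MCMC, which is not reproduced here. The function names and the parameter names `lam` (Poisson rate) and `pi` (probability of a structural zero) are assumptions made for this example.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) for a univariate zero-inflated Poisson distribution.

    lam is the Poisson rate; pi is the probability of a structural zero.
    """
    if k < 0 or lam <= 0 or not 0.0 <= pi < 1.0:
        raise ValueError("need k >= 0, lam > 0 and 0 <= pi < 1")
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    # Structural zeros add probability mass at k = 0 only.
    return pi * (k == 0) + (1.0 - pi) * poisson


def zip_mean(lam, pi):
    """Mean of the zero-inflated Poisson: the Poisson mean shrunk by (1 - pi)."""
    return (1.0 - pi) * lam
```

    With `pi = 0` the function reduces to the ordinary Poisson probability, so the extra mass at zero is controlled entirely by `pi`.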

  1. Analysis of Blood Transfusion Data Using Bivariate Zero-Inflated Poisson Model: A Bayesian Approach

    PubMed Central

    Mohammadi, Tayeb; Sedehi, Morteza

    2016-01-01

    Recognizing the factors affecting the number of blood donations and blood deferrals has a major impact on blood transfusion. There is a positive correlation between the variables “number of blood donation” and “number of blood deferral”: as the number of returns for donation increases, so does the number of blood deferrals. On the other hand, because many donors never return to donate, there is an excess of zeros for both of the above-mentioned variables. In this study, to account for the correlation and the excess zeros, the bivariate zero-inflated Poisson regression model was used for joint modeling of the number of blood donations and the number of blood deferrals. The data were analyzed using a Bayesian approach with noninformative priors, in the presence and absence of covariates. The parameters of the model, that is, the correlation, zero-inflation parameter, and regression coefficients, were estimated through MCMC simulation. Finally, the double-Poisson, bivariate Poisson, and bivariate zero-inflated Poisson models were fitted to the data and compared using the deviance information criterion (DIC). The results showed that the bivariate zero-inflated Poisson regression model fitted the data better than the other models. PMID:27703493

  2. Evaluation of the Use of Zero-Augmented Regression Techniques to Model Incidence of Campylobacter Infections in FoodNet.

    PubMed

    Tremblay, Marlène; Crim, Stacy M; Cole, Dana J; Hoekstra, Robert M; Henao, Olga L; Döpfer, Dörte

    2017-10-01

    The Foodborne Diseases Active Surveillance Network (FoodNet) is currently using a negative binomial (NB) regression model to estimate temporal changes in the incidence of Campylobacter infection. FoodNet active surveillance in 483 counties collected data on 40,212 Campylobacter cases between years 2004 and 2011. We explored models that disaggregated these data to allow us to account for demographic, geographic, and seasonal factors when examining changes in incidence of Campylobacter infection. We hypothesized that modeling structural zeros and including demographic variables would increase the fit of FoodNet's Campylobacter incidence regression models. Five different models were compared: NB without demographic covariates, NB with demographic covariates, hurdle NB with covariates in the count component only, hurdle NB with covariates in both zero and count components, and zero-inflated NB with covariates in the count component only. Of the models evaluated, the nonzero-augmented NB model with demographic variables provided the best fit. Results suggest that even though zero inflation was not present at this level, individualizing the level of aggregation and using different model structures and predictors per site might be required to correctly distinguish between structural and observational zeros and account for risk factors that vary geographically.

  3. Multiple-trait structured antedependence model to study the relationship between litter size and birth weight in pigs and rabbits.

    PubMed

    David, Ingrid; Garreau, Hervé; Balmisse, Elodie; Billon, Yvon; Canario, Laurianne

    2017-01-20

    Some genetic studies need to take into account correlations between traits that are repeatedly measured over time. Multiple-trait random regression models are commonly used to analyze repeated traits but suffer from several major drawbacks. In the present study, we developed a multiple-trait extension of the structured antedependence model (SAD) to overcome these drawbacks and validated its usefulness by modeling the association between litter size (LS) and average birth weight (ABW) over parities in pigs and rabbits. The single-trait SAD model assumes that a random effect at time [Formula: see text] can be explained by the previous values of the random effect (i.e. at previous times). The proposed multiple-trait extension of the SAD model consists in adding a cross-antedependence parameter to the single-trait SAD model. This model can be easily fitted using ASReml and the OWN Fortran program that we have developed. In comparison with the random regression model, we used our multiple-trait SAD model to analyze the LS and ABW of 4345 litters from 1817 Large White sows and 8706 litters from 2286 L-1777 does over a maximum of five successive parities. For both species, the multiple-trait SAD fitted the data better than the random regression model. The differences in AIC between the two models (AIC_random regression - AIC_SAD) were equal to 7 and 227 for pigs and rabbits, respectively. A similar pattern of heritability and correlation estimates was obtained for both species. Heritabilities were lower for LS (ranging from 0.09 to 0.29) than for ABW (ranging from 0.23 to 0.39). The general trend was a decrease of the genetic correlation for a given trait between more distant parities. Estimates of genetic correlations between LS and ABW were negative and ranged from -0.03 to -0.52 across parities. No correlation was observed between the permanent environmental effects, except between the permanent environmental effects of LS and ABW of the same parity, for which the estimate of the correlation was strongly negative (ranging from -0.57 to -0.67). We demonstrated that application of our multiple-trait SAD model is feasible for studying several traits with repeated measurements and showed that it provided a better fit to the data than the random regression model.

  4. Regression Analysis as a Cost Estimation Model for Unexploded Ordnance Cleanup at Former Military Installations

    DTIC Science & Technology

    2002-06-01

    fits our actual data. To determine the goodness of fit, statisticians typically use the following four measures: R2 Statistic. The R2 statistic...reviewing instruction, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of...mathematical model is developed to better estimate cleanup costs using historical cost data that could be used by the Defense Department prior to placing

  5. The role of social capital and community belongingness for exercise adherence: An exploratory study of the CrossFit gym model.

    PubMed

    Whiteman-Sandland, Jessica; Hawkins, Jemma; Clayton, Debbie

    2016-08-01

    This is the first study to measure the 'sense of community' reportedly offered by the CrossFit gym model. A cross-sectional study adapted Social Capital and General Belongingness scales to compare perceptions of a CrossFit gym and a traditional gym. CrossFit gym members reported significantly higher levels of social capital (both bridging and bonding) and community belongingness compared with traditional gym members. However, regression analysis showed neither social capital, community belongingness, nor gym type was an independent predictor of gym attendance. Exercise and health professionals may benefit from evaluating further the 'sense of community' offered by gym-based exercise programmes.

  6. Correlation of Respirator Fit Measured on Human Subjects and a Static Advanced Headform

    PubMed Central

    Bergman, Michael S.; He, Xinjian; Joseph, Michael E.; Zhuang, Ziqing; Heimbuch, Brian K.; Shaffer, Ronald E.; Choe, Melanie; Wander, Joseph D.

    2015-01-01

    This study assessed the correlation of N95 filtering face-piece respirator (FFR) fit between a Static Advanced Headform (StAH) and 10 human test subjects. Quantitative fit evaluations were performed on test subjects who made three visits to the laboratory. On each visit, one fit evaluation was performed on eight different FFRs of various model/size variations. Additionally, subject breathing patterns were recorded. Each fit evaluation comprised three two-minute exercises: “Normal Breathing,” “Deep Breathing,” and again “Normal Breathing.” The overall test fit factors (FF) for human tests were recorded. The same respirator samples were later mounted on the StAH and the overall test manikin fit factors (MFF) were assessed utilizing the recorded human breathing patterns. Linear regression was performed on the mean log10-transformed FF and MFF values to assess the relationship between the values obtained from humans and the StAH. This is the first study to report a positive correlation of respirator fit between a headform and test subjects. The linear regression by respirator resulted in R2 = 0.95, indicating a strong linear correlation between FF and MFF. For all respirators the geometric mean (GM) FF values were consistently higher than those of the GM MFF. For 50% of respirators, GM FF and GM MFF values were significantly different between humans and the StAH. For data grouped by subject/respirator combinations, the linear regression resulted in R2 = 0.49. A weaker correlation (R2 = 0.11) was found using only data paired by subject/respirator combination where both the test subject and StAH had passed a real-time leak check before performing the fit evaluation. For six respirators, the difference in passing rates between the StAH and humans was < 20%, while two respirators showed a difference of 29% and 43%. For data by test subject, GM FF and GM MFF values were significantly different for 40% of the subjects. Overall, the advanced headform system has potential for assessing fit for some N95 FFR model/sizes. PMID:25265037

  7. Adiposity as a full mediator of the influence of cardiorespiratory fitness and inflammation in schoolchildren: The FUPRECOL Study.

    PubMed

    Garcia-Hermoso, A; Agostinis-Sobrinho, C; Mota, J; Santos, R M; Correa-Bautista, J E; Ramírez-Vélez, R

    2017-06-01

    Studies in the paediatric population have shown inconsistent associations between cardiorespiratory fitness and inflammation independently of adiposity. The purpose of this study was (i) to analyse the combined association of cardiorespiratory fitness and adiposity with high-sensitivity C-reactive protein (hs-CRP), and (ii) to determine whether adiposity acts as a mediator on the association between cardiorespiratory fitness and hs-CRP in children and adolescents. This cross-sectional study included 935 (54.7% girls) healthy children and adolescents from Bogotá, Colombia. The 20 m shuttle run test was used to estimate cardiorespiratory fitness. We assessed the following adiposity parameters: body mass index, waist circumference, fat mass index, and the sum of subscapular and triceps skinfold thickness. High sensitivity assays were used to obtain hs-CRP. Linear regression models were fitted for mediation analyses, which examined whether the association between cardiorespiratory fitness and hs-CRP was mediated by each of the adiposity parameters according to Baron and Kenny procedures. Lower levels of hs-CRP were associated with the best schoolchildren profiles (high cardiorespiratory fitness + low adiposity) (p for trend <0.001 in the four adiposity parameters), compared with unfit and overweight (low cardiorespiratory fitness + high adiposity) counterparts. Linear regression models suggest a full mediation of adiposity on the association between cardiorespiratory fitness and hs-CRP levels. Our findings seem to emphasize the importance of obesity prevention in childhood, suggesting that having high levels of cardiorespiratory fitness may not counteract the negative consequences ascribed to adiposity on hs-CRP. Copyright © 2017 The Italian Society of Diabetology, the Italian Society for the Study of Atherosclerosis, the Italian Society of Human Nutrition, and the Department of Clinical Medicine and Surgery, Federico II University. Published by Elsevier B.V. All rights reserved.

  8. The allometry of coarse root biomass: log-transformed linear regression or nonlinear regression?

    PubMed

    Lai, Jiangshan; Yang, Bo; Lin, Dunmei; Kerkhoff, Andrew J; Ma, Keping

    2013-01-01

    Precise estimation of root biomass is important for understanding carbon stocks and dynamics in forests. Traditionally, biomass estimates are based on allometric scaling relationships between stem diameter and coarse root biomass calculated using linear regression (LR) on log-transformed data. Recently, it has been suggested that nonlinear regression (NLR) is a preferable fitting method for scaling relationships. But while this claim has been contested on both theoretical and empirical grounds, and statistical methods have been developed to aid in choosing between the two methods in particular cases, few studies have examined the ramifications of erroneously applying NLR. Here, we use direct measurements of 159 trees belonging to three locally dominant species in east China to compare the LR and NLR models of diameter-root biomass allometry. We then contrast model predictions by estimating stand coarse root biomass based on census data from the nearby 24-ha Gutianshan forest plot and by testing the ability of the models to predict known root biomass values measured on multiple tropical species at the Pasoh Forest Reserve in Malaysia. Based on likelihood estimates for model error distributions, as well as the accuracy of extrapolative predictions, we find that LR on log-transformed data is superior to NLR for fitting diameter-root biomass scaling models. More importantly, inappropriately using NLR leads to grossly inaccurate stand biomass estimates, especially for stands dominated by smaller trees.
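
    To make the "LR on log-transformed data" approach concrete, the sketch below fits a power law y = a * x^b by ordinary least squares on (log x, log y). It is a minimal illustration under stated assumptions, not the authors' procedure: the function name, the toy diameters and the toy biomass values are all hypothetical, and no back-transformation bias correction is applied. The NLR alternative discussed in the abstract would instead minimize squared error on the original scale and is not shown.

```python
import math


def fit_power_law_loglog(x, y):
    """Fit y = a * x**b by least squares on log-transformed data.

    Returns (a, b). All x and y values must be strictly positive.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                 # slope on the log-log scale = scaling exponent
    a = math.exp(my - b * mx)     # intercept back-transformed to the original scale
    return a, b


# Hypothetical, noise-free toy data generated from y = 2 * x**1.5.
diameters = [5.0, 10.0, 20.0, 40.0]
biomass = [2.0 * d ** 1.5 for d in diameters]
a_hat, b_hat = fit_power_law_loglog(diameters, biomass)
```

    Fitting on the log scale implicitly assumes multiplicative error, which is the assumption the abstract evaluates against the additive error implied by NLR.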

  9. Zero-inflated Conway-Maxwell Poisson Distribution to Analyze Discrete Data.

    PubMed

    Sim, Shin Zhu; Gupta, Ramesh C; Ong, Seng Huat

    2018-01-09

    In this paper, we study the zero-inflated Conway-Maxwell Poisson (ZICMP) distribution and develop a regression model. Score and likelihood ratio tests are also implemented for testing the inflation/deflation parameter. Simulation studies are carried out to examine the performance of these tests. A data example is presented to illustrate the concepts. In this example, the proposed model is compared to the well-known zero-inflated Poisson (ZIP) and the zero-inflated generalized Poisson (ZIGP) regression models. It is shown that the fit by ZICMP is comparable to or better than that of these models.

  10. Regression estimators for generic health-related quality of life and quality-adjusted life years.

    PubMed

    Basu, Anirban; Manca, Andrea

    2012-01-01

    To develop regression models for outcomes with truncated supports, such as health-related quality of life (HRQoL) data, and account for features typical of such data such as a skewed distribution, spikes at 1 or 0, and heteroskedasticity. Regression estimators based on features of the Beta distribution. First, both a single equation and a 2-part model are presented, along with estimation algorithms based on maximum-likelihood, quasi-likelihood, and Bayesian Markov-chain Monte Carlo methods. A novel Bayesian quasi-likelihood estimator is proposed. Second, a simulation exercise is presented to assess the performance of the proposed estimators against ordinary least squares (OLS) regression for a variety of HRQoL distributions that are encountered in practice. Finally, the performance of the proposed estimators is assessed by using them to quantify the treatment effect on QALYs in the EVALUATE hysterectomy trial. Overall model fit is studied using several goodness-of-fit tests such as Pearson's correlation test, link and reset tests, and a modified Hosmer-Lemeshow test. The simulation results indicate that the proposed methods are more robust in estimating covariate effects than OLS, especially when the effects are large or the HRQoL distribution has a large spike at 1. Quasi-likelihood techniques are more robust than maximum likelihood estimators. When applied to the EVALUATE trial, all but the maximum likelihood estimators produce unbiased estimates of the treatment effect. One and 2-part Beta regression models provide flexible approaches to regress the outcomes with truncated supports, such as HRQoL, on covariates, after accounting for many idiosyncratic features of the outcomes distribution. This work will provide applied researchers with a practical set of tools to model outcomes in cost-effectiveness analysis.

  11. A generalized multivariate regression model for modelling ocean wave heights

    NASA Astrophysics Data System (ADS)

    Wang, X. L.; Feng, Y.; Swail, V. R.

    2012-04-01

    In this study, a generalized multivariate linear regression model is developed to represent the relationship between 6-hourly ocean significant wave heights (Hs) and the corresponding 6-hourly mean sea level pressure (MSLP) fields. The model is calibrated using the ERA-Interim reanalysis of Hs and MSLP fields for 1981-2000, and is validated using the ERA-Interim reanalysis for 2001-2010 and ERA40 reanalysis of Hs and MSLP for 1958-2001. The performance of the fitted model is evaluated in terms of Pierce skill score, frequency bias index, and correlation skill score. Being not normally distributed, wave heights are subjected to a data adaptive Box-Cox transformation before being used in the model fitting. Also, since 6-hourly data are being modelled, lag-1 autocorrelation must be and is accounted for. The models with and without Box-Cox transformation, and with and without accounting for autocorrelation, are inter-compared in terms of their prediction skills. The fitted MSLP-Hs relationship is then used to reconstruct historical wave height climate from the 6-hourly MSLP fields taken from the Twentieth Century Reanalysis (20CR, Compo et al. 2011), and to project possible future wave height climates using CMIP5 model simulations of MSLP fields. The reconstructed and projected wave heights, both seasonal means and maxima, are subject to a trend analysis that allows for non-linear (polynomial) trends.

  12. Mean centering, multicollinearity, and moderators in multiple regression: The reconciliation redux.

    PubMed

    Iacobucci, Dawn; Schneider, Matthew J; Popovich, Deidre L; Bakamitsos, Georgios A

    2017-02-01

    In this article, we attempt to clarify our statements regarding the effects of mean centering. In a multiple regression with predictors A, B, and A × B (where A × B serves as an interaction term), mean centering A and B prior to computing the product term can clarify the regression coefficients (which is good) and the overall model fit R2 will remain undisturbed (which is also good).
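
    The claim that mean centering leaves the overall fit unchanged can be checked numerically. The sketch below is an illustration on simulated data, not the authors' analysis: the data-generating coefficients, the sample size, and the helper names `solve`, `center`, and `r_squared` are assumptions made for this example. It fits the model with predictors A, B, and A × B twice, once with raw and once with mean-centered A and B, and compares R2.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def center(values):
    """Subtract the sample mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def r_squared(a, b, y):
    """R2 of the least-squares fit of y on an intercept, a, b, and a*b."""
    rows = [[1.0, ai, bi, ai * bi] for ai, bi in zip(a, b)]
    p = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Simulated data with arbitrary, hypothetical coefficients.
random.seed(0)
a = [random.gauss(10.0, 2.0) for _ in range(200)]
b = [random.gauss(5.0, 1.0) for _ in range(200)]
y = [1.0 + 0.5 * ai - 0.3 * bi + 0.2 * ai * bi + random.gauss(0.0, 1.0)
     for ai, bi in zip(a, b)]

r2_raw = r_squared(a, b, y)
r2_centered = r_squared(center(a), center(b), y)
```

    The two values agree up to rounding because the centered product term is a linear combination of the intercept, A, B, and A × B, so both designs span the same column space.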

  13. An analysis of input errors in precipitation-runoff models using regression with errors in the independent variables

    USGS Publications Warehouse

    Troutman, Brent M.

    1982-01-01

    Errors in runoff prediction caused by input data errors are analyzed by treating precipitation-runoff models as regression (conditional expectation) models. Independent variables of the regression consist of precipitation and other input measurements; the dependent variable is runoff. In models using erroneous input data, prediction errors are inflated and estimates of expected storm runoff for given observed input variables are biased. This bias in expected runoff estimation results in biased parameter estimates if these parameter estimates are obtained by a least squares fit of predicted to observed runoff values. The problems of error inflation and bias are examined in detail for a simple linear regression of runoff on rainfall and for a nonlinear U.S. Geological Survey precipitation-runoff model. Some implications for flood frequency analysis are considered. A case study using a set of data from Turtle Creek near Dallas, Texas illustrates the problems of model input errors.

  14. A matrix-based method of moments for fitting the multivariate random effects model for meta-analysis and meta-regression

    PubMed Central

    Jackson, Dan; White, Ian R; Riley, Richard D

    2013-01-01

    Multivariate meta-analysis is becoming more commonly used. Methods for fitting the multivariate random effects model include maximum likelihood, restricted maximum likelihood, Bayesian estimation and multivariate generalisations of the standard univariate method of moments. Here, we provide a new multivariate method of moments for estimating the between-study covariance matrix with the properties that (1) it allows for either complete or incomplete outcomes and (2) it allows for covariates through meta-regression. Further, for complete data, it is invariant to linear transformations. Our method reduces to the usual univariate method of moments, proposed by DerSimonian and Laird, in a single dimension. We illustrate our method and compare it with some of the alternatives using a simulation study and a real example. PMID:23401213

  15. Overall Preference of Running Shoes Can Be Predicted by Suitable Perception Factors Using a Multiple Regression Model.

    PubMed

    Tay, Cheryl Sihui; Sterzing, Thorsten; Lim, Chen Yen; Ding, Rui; Kong, Pui Wah

    2017-05-01

    This study examined (a) the strength of four individual footwear perception factors to influence the overall preference of running shoes and (b) whether these perception factors satisfied the nonmulticollinear assumption in a regression model. Running footwear must fulfill multiple functional criteria to satisfy its potential users. Footwear perception factors, such as fit and cushioning, are commonly used to guide shoe design and development, but it is unclear whether running-footwear users are able to differentiate one factor from another. One hundred casual runners assessed four running shoes on a 15-cm visual analogue scale for four footwear perception factors (fit, cushioning, arch support, and stability) as well as for overall preference during a treadmill running protocol. Diagnostic tests showed an absence of multicollinearity between factors, where values for tolerance ranged from .36 to .72, corresponding to variance inflation factors of 2.8 to 1.4. The multiple regression model of these four footwear perception variables accounted for 77.7% to 81.6% of variance in overall preference, with each factor explaining a unique part of the total variance. Casual runners were able to rate each footwear perception factor separately, thus assigning each factor a true potential to improve overall preference for the users. The results also support the use of a multiple regression model of footwear perception factors to predict overall running shoe preference. Regression modeling is a useful tool for running-shoe manufacturers to more precisely evaluate how individual factors contribute to the subjective assessment of running footwear.
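
    The tolerance and variance inflation factor (VIF) values quoted above are reciprocals of each other, since VIF = 1 / tolerance. The short sketch below reproduces that arithmetic for the two endpoints reported in the abstract; the function name is chosen for this example only.

```python
def vif_from_tolerance(tolerance):
    """Variance inflation factor for a predictor with the given tolerance.

    Tolerance is 1 - R2 from regressing the predictor on the other predictors,
    so it must lie in (0, 1].
    """
    if not 0.0 < tolerance <= 1.0:
        raise ValueError("tolerance must be in (0, 1]")
    return 1.0 / tolerance


# Endpoints of the tolerance range reported in the abstract.
vif_low_tolerance = round(vif_from_tolerance(0.36), 1)
vif_high_tolerance = round(vif_from_tolerance(0.72), 1)
```

    Rounded to one decimal place, these reproduce the 2.8 and 1.4 given in the abstract.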

  16. LogCauchy, log-sech and lognormal distributions of species abundances in forest communities

    USGS Publications Warehouse

    Yin, Z.-Y.; Peng, S.-L.; Ren, H.; Guo, Q.; Chen, Z.-H.

    2005-01-01

    Species-abundance (SA) pattern is one of the most fundamental aspects of biological community structure, providing important information regarding species richness, species-area relation and succession. To better describe the SA distribution (SAD) in a community, based on the widely used lognormal (LN) distribution model with exp(-x^2) roll-off on Preston's octave scale, this study proposed two additional models, logCauchy (LC) and log-sech (LS), respectively with roll-offs of simple x^-2 and e^-x. The estimation of the theoretical total number of species in the whole community, S*, including very rare species not yet collected in sample, was derived from the left-truncation of each distribution. We fitted these three models by Levenberg-Marquardt nonlinear regression and measured the model fit to the data using coefficient of determination of regression, parameters' t-test and distribution's Kolmogorov-Smirnov (KS) test. Examining the SA data from six forest communities (five in lower subtropics and one in tropics), we found that: (1) on a log scale, all three models that are bell-shaped and left-truncated statistically adequately fitted the observed SADs, and the LC and LS did better than the LN; (2) from each model and for each community the S* values estimated by the integral and summation methods were almost equal, allowing us to estimate S* using a simple integral formula and to estimate its asymptotic confidence intervals by regression of a transformed model containing it; (3) following the order of LC, LS, and LN, the fitted distributions became lower in the peak, less concave in the side, and shorter in the tail, and overall the LC tended to overestimate, the LN tended to underestimate, while the LS was intermediate but slightly tended to underestimate, the observed SADs (particularly the number of common species in the right tail); (4) the six communities had some similar structural properties such as following similar distribution models, having a common modal octave and a similar proportion of common species. We suggested that what follows the LN distribution should follow (or better follow) the LC and LS, and that the LC, LS and LN distributions represent a "sequential distribution set" in which one can find a best fit to the observed SAD. © 2004 Elsevier B.V. All rights reserved.

  17. Hyperopic photorefractive keratectomy and central islands

    NASA Astrophysics Data System (ADS)

    Gobbi, Pier Giorgio; Carones, Francesco; Morico, Alessandro; Vigo, Luca; Brancato, Rosario

    1998-06-01

    We have evaluated the refractive evolution in patients treated with hyperopic PRK to assess the extent of the initial overcorrection and the time constant of regression. To this end, the time history of the refractive error (i.e. the difference between achieved and intended refractive correction) has been fitted by means of an exponential statistical model, giving information characterizing the surgical procedure with a direct clinical meaning. Both hyperopic and myopic PRK procedures have been analyzed by this method. The analysis of the fitting model parameters shows that hyperopic PRK patients exhibit a definitely higher initial overcorrection than myopic ones, and a regression time constant which is much longer. A common mechanism is proposed to be responsible for the refractive outcomes in hyperopic treatments and in myopic patients exhibiting significant central islands. The interpretation is in terms of superhydration of the central cornea, and is based on a simple physical model evaluating the amount of centripetal compression in the apical cornea.

  18. Changes in Collegiate Ice Hockey Player Anthropometrics and Aerobic Fitness Over Three Decades.

    PubMed

    Triplett, Ashley N; Ebbing, Amy C; Green, Matthew R; Connolly, Christopher P; Carrier, David P; Pivarnik, James M

    2018-04-09

    Over the past several decades, an increased emphasis on fitness training has emerged among collegiate ice hockey teams, with the objective to improve on-ice performance. However, it is unknown if this increase in training has translated over time to changes in anthropometric and fitness profiles of collegiate ice hockey players. The purposes of this study were to describe anthropometric (height, weight, BMI, %fat) and aerobic fitness (VO2peak) characteristics of collegiate ice hockey players over 36 years, and to evaluate whether these characteristics differ between player positions. Anthropometric and physiologic data were obtained through preseason fitness testing of players (N=279) from a NCAA Division I men's ice hockey team from the years of 1980 through 2015. Changes over time in the anthropometric and physiologic variables were evaluated via regression analysis using linear and polynomial models and differences between player position were compared via ANOVA (p<0.05). Regression analysis revealed a cubic model best predicted changes in mean height (R2=0.65), weight (R2=0.77), and BMI (R2=0.57), while a quadratic model best fit change in %fat by year (R2=0.30). Little change was observed over time in the anthropometric characteristics. Defensemen were significantly taller than forwards (184.7±12.1 vs. 181.3±5.9cm)(p=0.007) and forwards had a higher relative VO2peak compared to defensemen (58.7±4.7 vs. 57.2±4.4ml/kg/min)(p=0.032). No significant differences were observed in %fat or weight by position. While average player heights and weights fluctuated over time, increased emphasis on fitness training did not affect athletes' relative aerobic fitness. Differences in height and aerobic fitness levels were observed between player position.

  19. Response Surface Modeling Tolerance and Inference Error Risk Specifications: Proposed Industry Standards

    NASA Technical Reports Server (NTRS)

    DeLoach, Richard

    2012-01-01

    This paper reviews the derivation of an equation for scaling response surface modeling experiments. The equation represents the smallest number of data points required to fit a linear regression polynomial so as to achieve certain specified model adequacy criteria. Specific criteria are proposed which simplify an otherwise rather complex equation, generating a practical rule of thumb for the minimum volume of data required to adequately fit a polynomial with a specified number of terms in the model. This equation and the simplified rule of thumb it produces can be applied to minimize the cost of wind tunnel testing.
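
    The abstract does not reproduce the scaling equation itself, so the sketch below shows only a related and much weaker fact: a full polynomial of a given order in a given number of factors has C(k + d, d) coefficients, and at least that many data points are needed to fit it at all. This is a floor on the data volume, not the paper's adequacy-based formula; the function name is an assumption made for this example.

```python
import math


def polynomial_term_count(n_factors, order):
    """Number of coefficients in a full polynomial model, intercept included.

    For k factors and order d this is the binomial coefficient C(k + d, d).
    """
    if n_factors < 0 or order < 0:
        raise ValueError("n_factors and order must be non-negative")
    return math.comb(n_factors + order, order)


# A full quadratic in two factors has the terms 1, x1, x2, x1^2, x2^2, x1*x2.
quadratic_two_factor_terms = polynomial_term_count(2, 2)
```

    Any practical design would need more points than this count to leave degrees of freedom for estimating error, which is the kind of requirement the paper's criteria address.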

  20. The importance of regional models in assessing canine cancer incidences in Switzerland

    PubMed Central

    Leyk, Stefan; Brunsdon, Christopher; Graf, Ramona; Pospischil, Andreas; Fabrikant, Sara Irina

    2018-01-01

    Fitting canine cancer incidences through a conventional regression model assumes constant statistical relationships across the study area in estimating the model coefficients. However, it is often more realistic to consider that these relationships may vary over space. Such a condition, known as spatial non-stationarity, implies that the model coefficients need to be estimated locally. In these kinds of local models, the geographic scale, or spatial extent, employed for coefficient estimation may also have a pervasive influence. This is because important variations in the local model coefficients across geographic scales may impact the understanding of local relationships. In this study, we fitted canine cancer incidences across Swiss municipal units through multiple regional models. We computed diagnostic summaries across the different regional models, and contrasted them with the diagnostics of the conventional regression model, using value-by-alpha maps and scalograms. The results of this comparative assessment enabled us to identify variations in the goodness-of-fit and coefficient estimates. We detected spatially non-stationary relationships, in particular, for the variables related to biological risk factors. These variations in the model coefficients were more important at small geographic scales, making a case for the need to model canine cancer incidences locally in contrast to more conventional global approaches. However, we contend that prior to undertaking local modeling efforts, a deeper understanding of the effects of geographic scale is needed to better characterize and identify local model relationships. PMID:29652921

  1. The importance of regional models in assessing canine cancer incidences in Switzerland.

    PubMed

    Boo, Gianluca; Leyk, Stefan; Brunsdon, Christopher; Graf, Ramona; Pospischil, Andreas; Fabrikant, Sara Irina

    2018-01-01

    Fitting canine cancer incidences through a conventional regression model assumes constant statistical relationships across the study area in estimating the model coefficients. However, it is often more realistic to consider that these relationships may vary over space. Such a condition, known as spatial non-stationarity, implies that the model coefficients need to be estimated locally. In these kinds of local models, the geographic scale, or spatial extent, employed for coefficient estimation may also have a pervasive influence. This is because important variations in the local model coefficients across geographic scales may impact the understanding of local relationships. In this study, we fitted canine cancer incidences across Swiss municipal units through multiple regional models. We computed diagnostic summaries across the different regional models, and contrasted them with the diagnostics of the conventional regression model, using value-by-alpha maps and scalograms. The results of this comparative assessment enabled us to identify variations in the goodness-of-fit and coefficient estimates. We detected spatially non-stationary relationships, in particular, for the variables related to biological risk factors. These variations in the model coefficients were more important at small geographic scales, making a case for the need to model canine cancer incidences locally in contrast to more conventional global approaches. However, we contend that prior to undertaking local modeling efforts, a deeper understanding of the effects of geographic scale is needed to better characterize and identify local model relationships.

  2. Effects of Employing Ridge Regression in Structural Equation Models.

    ERIC Educational Resources Information Center

    McQuitty, Shaun

    1997-01-01

    LISREL 8 invokes a ridge option when maximum likelihood or generalized least squares are used to estimate a structural equation model with a nonpositive definite covariance or correlation matrix. Implications of the ridge option for model fit, parameter estimates, and standard errors are explored through two examples. (SLD)

  3. Simplified large African carnivore density estimators from track indices.

    PubMed

    Winterbach, Christiaan W; Ferreira, Sam M; Funston, Paul J; Somers, Michael J

    2016-01-01

    The range, population size and trend of large carnivores are important parameters to assess their status globally and to plan conservation strategies. One can use linear models to assess population size and trends of large carnivores from track-based surveys on suitable substrates. The conventional approach of a linear model with intercept may not intercept at zero, but may fit the data better than a linear model through the origin. We assess whether a linear regression through the origin is more appropriate than a linear regression with intercept to model large African carnivore densities and track indices. We did simple linear regression with intercept analysis and simple linear regression through the origin and used the confidence interval for β in the linear model y = αx + β, Standard Error of Estimate, Mean Squares Residual and Akaike Information Criteria to evaluate the models. The Lion on Clay and Low Density on Sand models with intercept were not significant (P > 0.05). The other four models with intercept and the six models through the origin were all significant (P < 0.05). The models using linear regression with intercept all included zero in the confidence interval for β and the null hypothesis that β = 0 could not be rejected. All models showed that the linear model through the origin provided a better fit than the linear model with intercept, as indicated by the Standard Error of Estimate and Mean Square Residuals. Akaike Information Criteria showed that linear models through the origin were better and that none of the linear models with intercept had substantial support. Our results showed that linear regression through the origin is justified over the more typical linear regression with intercept for all models we tested. A general model can be used to estimate large carnivore densities from track densities across species and study areas. The formula observed track density = 3.26 × carnivore density can be used to estimate densities of large African carnivores using track counts on sandy substrates in areas where carnivore densities are 0.27 carnivores/100 km² or higher. To improve the current models, we need independent data to validate the models and data to test for a non-linear relationship between track indices and true density at low densities.
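
    The comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' analysis: the data points are hypothetical, the function names are invented here, and the AIC is the usual Gaussian least-squares form up to an additive constant. Only the slope 3.26 comes from the abstract.

```python
import math

def fit_through_origin(x, y):
    """Least-squares fit of y = a*x; returns (slope, residual sum of squares)."""
    a = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    rss = sum((yi - a * xi) ** 2 for xi, yi in zip(x, y))
    return a, rss

def fit_with_intercept(x, y):
    """Least-squares fit of y = a*x + b; returns (slope, intercept, RSS)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    rss = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss

def aic_least_squares(rss, n, k):
    """Gaussian AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k

def carnivore_density_from_tracks(track_density, slope=3.26):
    """Invert 'track density = slope * carnivore density' (slope from the abstract)."""
    return track_density / slope

# Hypothetical data (NOT from the study): carnivore density vs. track density.
density = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0]
tracks = [1.7, 3.2, 6.6, 9.7, 13.1, 16.2]

n = len(density)
slope0, rss0 = fit_through_origin(density, tracks)
slope1, intercept1, rss1 = fit_with_intercept(density, tracks)
# k counts the regression coefficients plus the error variance.
aic0 = aic_least_squares(rss0, n, k=2)
aic1 = aic_least_squares(rss1, n, k=3)
print(f"through origin: slope={slope0:.3f}, AIC={aic0:.2f}")
print(f"with intercept: slope={slope1:.3f}, intercept={intercept1:.3f}, AIC={aic1:.2f}")
print(f"density for a track density of 6.52: {carnivore_density_from_tracks(6.52):.2f}")
```

    The intercept model always has a residual sum of squares at least as small as the through-origin model, so the AIC penalty for the extra parameter is what can tip the comparison toward the simpler model.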

  4. The relationship of aerobic capacity, anaerobic peak power and experience to performance in CrossFit exercise.

    PubMed

    Bellar, D; Hatchett, A; Judge, L W; Breaux, M E; Marcus, L

    2015-11-01

    CrossFit is becoming increasingly popular as a method to increase fitness and as a competitive sport in both the United States and Europe. However, little research on this mode of exercise has been performed to date. The purpose of the present investigation involving experienced CrossFit athletes and naïve healthy young men was to investigate the relationship of aerobic capacity and anaerobic power to performance in two representative CrossFit workouts: the first workout was 12 minutes in duration, and the second was based on the total time to complete the prescribed exercise. The participants were 32 healthy adult males, who were either naïve to CrossFit exercise or had competed in CrossFit competitions. Linear regression was undertaken to predict performance on the first workout (time) with age, group (naïve or CrossFit athlete), VO2max and anaerobic power, which were all significant predictors (p < 0.05) in the model. The second workout (repetitions), when examined similarly using regression, only resulted in CrossFit experience as a significant predictor (p < 0.05). The results of the study suggest that a history of participation in CrossFit competition is a key component of performance in CrossFit workouts which are representative of those performed in CrossFit, and that, in at least one of these workouts, aerobic capacity and anaerobic power are associated with success.

  5. The relationship of aerobic capacity, anaerobic peak power and experience to performance in CrossFit exercise

    PubMed Central

    Hatchett, A; Judge, LW; Breaux, ME; Marcus, L

    2015-01-01

    CrossFit is becoming increasingly popular as a method to increase fitness and as a competitive sport in both the United States and Europe. However, little research on this mode of exercise has been performed to date. The purpose of the present investigation involving experienced CrossFit athletes and naïve healthy young men was to investigate the relationship of aerobic capacity and anaerobic power to performance in two representative CrossFit workouts: the first workout was 12 minutes in duration, and the second was based on the total time to complete the prescribed exercise. The participants were 32 healthy adult males, who were either naïve to CrossFit exercise or had competed in CrossFit competitions. Linear regression was undertaken to predict performance on the first workout (time) with age, group (naïve or CrossFit athlete), VO2max and anaerobic power, which were all significant predictors (p < 0.05) in the model. The second workout (repetitions), when examined similarly using regression, only resulted in CrossFit experience as a significant predictor (p < 0.05). The results of the study suggest that a history of participation in CrossFit competition is a key component of performance in CrossFit workouts which are representative of those performed in CrossFit, and that, in at least one of these workouts, aerobic capacity and anaerobic power are associated with success. PMID:26681834

  6. OPC modeling by genetic algorithm

    NASA Astrophysics Data System (ADS)

    Huang, W. C.; Lai, C. M.; Luo, B.; Tsai, C. K.; Tsay, C. S.; Lai, C. W.; Kuo, C. C.; Liu, R. G.; Lin, H. T.; Lin, B. J.

    2005-05-01

    Optical proximity correction (OPC) is usually used to pre-distort mask layouts to make the printed patterns as close to the desired shapes as possible. For model-based OPC, a lithographic model to predict critical dimensions after lithographic processing is needed. The model is usually obtained via a regression of parameters based on experimental data containing optical proximity effects. When the parameters involve a mix of the continuous (optical and resist models) and the discrete (kernel numbers) sets, the traditional numerical optimization method may have difficulty handling model fitting. In this study, an artificial-intelligent optimization method was used to regress the parameters of the lithographic models for OPC. The implemented phenomenological models were constant-threshold models that combine diffused aerial image models with loading effects. Optical kernels decomposed from Hopkins' equation were used to calculate aerial images on the wafer. Similarly, the numbers of optical kernels were treated as regression parameters. This way, good regression results were obtained with different sets of optical proximity effect data.

  7. Random regression models on Legendre polynomials to estimate genetic parameters for weights from birth to adult age in Canchim cattle.

    PubMed

    Baldi, F; Albuquerque, L G; Alencar, M M

    2010-08-01

    The objective of this work was to estimate covariance functions for direct and maternal genetic effects, animal and maternal permanent environmental effects, and subsequently, to derive relevant genetic parameters for growth traits in Canchim cattle. Data comprised 49,011 weight records on 2435 females from birth to adult age. The model of analysis included fixed effects of contemporary groups (year and month of birth and at weighing) and age of dam as quadratic covariable. Mean trends were taken into account by a cubic regression on orthogonal polynomials of animal age. Residual variances were allowed to vary and were modelled by a step function with 1, 4 or 11 classes based on animal's age. The model fitting four classes of residual variances was the best. A total of 12 random regression models from second to seventh order were used to model direct and maternal genetic effects, animal and maternal permanent environmental effects. The model with direct and maternal genetic effects, animal and maternal permanent environmental effects fitted by quadratic, cubic, quintic and linear Legendre polynomials, respectively, was the most adequate to describe the covariance structure of the data. Estimates of direct and maternal heritability obtained by multi-trait (seven traits) and random regression models were very similar. Selection for higher weight at any age, especially after weaning, will produce an increase in mature cow weight. The possibility to modify the growth curve in Canchim cattle to obtain animals with rapid growth at early ages and moderate to low mature cow weight is limited.

  8. Soft-sensing model of temperature for aluminum reduction cell on improved twin support vector regression

    NASA Astrophysics Data System (ADS)

    Li, Tao

    2018-06-01

    The complexity of the aluminum electrolysis process makes the temperature of aluminum reduction cells hard to measure directly. However, temperature is central to the control of aluminum production. To solve this problem, drawing on operating data from an aluminum plant, this paper presents a soft-sensing model of temperature for the aluminum electrolysis process based on Improved Twin Support Vector Regression (ITSVR). ITSVR eliminates the slow learning speed of Support Vector Regression (SVR) and the over-fitting risk of Twin Support Vector Regression (TSVR) by introducing a regularization term into the objective function of TSVR, which ensures the structural risk minimization principle and lower computational complexity. Finally, the model predicts the temperature by ITSVR, with several other parameters as auxiliary variables. The simulation results show that the soft-sensing model based on ITSVR has a short computation time and better generalization.

  9. Clinical risk stratification model for advanced colorectal neoplasia in persons with negative fecal immunochemical test results.

    PubMed

    Jung, Yoon Suk; Park, Chan Hyuk; Kim, Nam Hee; Park, Jung Ho; Park, Dong Il; Sohn, Chong Il

    2018-01-01

    The fecal immunochemical test (FIT) has low sensitivity for detecting advanced colorectal neoplasia (ACRN); thus, a considerable portion of FIT-negative persons may have ACRN. We aimed to develop a risk-scoring model for predicting ACRN in FIT-negative persons. We reviewed the records of participants aged ≥40 years who underwent a colonoscopy and FIT during a health check-up. We developed a risk-scoring model for predicting ACRN in FIT-negative persons. Of 11,873 FIT-negative participants, 255 (2.1%) had ACRN. On the basis of the multivariable logistic regression model, point scores were assigned as follows among FIT-negative persons: age (per year from 40 years old), 1 point; current smoker, 10 points; overweight, 5 points; obese, 7 points; hypertension, 6 points; old cerebrovascular attack (CVA), 15 points. Although the proportion of ACRN in FIT-negative persons increased as risk scores increased (from 0.6% in the group with 0-4 points to 8.1% in the group with 35-39 points), it was significantly lower than that in FIT-positive persons (14.9%). However, there was no statistical difference between the proportion of ACRN in FIT-negative persons with ≥40 points and in FIT-positive persons (10.5% vs. 14.9%, P = 0.321). FIT-negative persons may need to undergo screening colonoscopy if they clinically have a high risk of ACRN. The scoring model based on age, smoking habits, overweight or obesity, hypertension, and old CVA may be useful in selecting and prioritizing FIT-negative persons for screening colonoscopy.
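
    The point assignment listed in this abstract can be written out as a small scoring function. This is a sketch under stated assumptions: the function and argument names are hypothetical, "per year from 40 years old" is read as (age - 40) points, and overweight and obese are treated as mutually exclusive categories. The point values and the 40-point threshold are taken from the abstract.

```python
def acrn_risk_score(age, current_smoker=False, bmi_category="normal",
                    hypertension=False, old_cva=False):
    """Risk score for FIT-negative persons using the point values in the abstract.

    Argument names and the 'bmi_category' labels are illustrative assumptions.
    """
    if age < 40:
        raise ValueError("the abstract describes participants aged 40 years or older")
    bmi_points = {"normal": 0, "overweight": 5, "obese": 7}
    score = age - 40                      # 1 point per year from 40 years old (assumed reading)
    score += 10 if current_smoker else 0  # current smoker: 10 points
    score += bmi_points[bmi_category]     # overweight: 5 points, obese: 7 points
    score += 6 if hypertension else 0     # hypertension: 6 points
    score += 15 if old_cva else 0         # old cerebrovascular attack: 15 points
    return score

# Hypothetical person: 55 years old, current smoker, obese, hypertensive.
example = acrn_risk_score(55, current_smoker=True, bmi_category="obese", hypertension=True)
print(example, "points; at or above the 40-point group:", example >= 40)
```

    In the abstract, the group with 40 or more points had an ACRN proportion that did not differ statistically from that of FIT-positive persons, which is why the example compares the score with 40.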

  10. Inverse models: A necessary next step in ground-water modeling

    USGS Publications Warehouse

    Poeter, E.P.; Hill, M.C.

    1997-01-01

    Inverse models using, for example, nonlinear least-squares regression, provide capabilities that help modelers take full advantage of the insight available from ground-water models. However, lack of information about the requirements and benefits of inverse models is an obstacle to their widespread use. This paper presents a simple ground-water flow problem to illustrate the requirements and benefits of the nonlinear least-squares regression method of inverse modeling and discusses how these attributes apply to field problems. The benefits of inverse modeling include: (1) expedited determination of best fit parameter values; (2) quantification of the (a) quality of calibration, (b) data shortcomings and needs, and (c) confidence limits on parameter estimates and predictions; and (3) identification of issues that are easily overlooked during nonautomated calibration.

  11. Genetic Programming Transforms in Linear Regression Situations

    NASA Astrophysics Data System (ADS)

    Castillo, Flor; Kordon, Arthur; Villa, Carlos

    The chapter summarizes the use of Genetic Programming (GP) in Multiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.

  12. Modeling the pressure inactivation of Escherichia coli and Salmonella typhimurium in sapote mamey (Pouteria sapota (Jacq.) H.E. Moore & Stearn) pulp.

    PubMed

    Saucedo-Reyes, Daniela; Carrillo-Salazar, José A; Román-Padilla, Lizbeth; Saucedo-Veloz, Crescenciano; Reyes-Santamaría, María I; Ramírez-Gilly, Mariana; Tecante, Alberto

    2018-03-01

    High hydrostatic pressure inactivation kinetics of Escherichia coli ATCC 25922 and Salmonella enterica subsp. enterica serovar Typhimurium ATCC 14028 (S. typhimurium) in a low acid mamey pulp at four pressure levels (300, 350, 400, and 450 MPa), different exposure times (0-8 min), and temperature of 25 ± 2 °C were obtained. Survival curves showed deviations from linearity in the form of a tail (upward concavity). The primary models tested were the Weibull model, the modified Gompertz equation, and the biphasic model. The Weibull model gave the best goodness of fit (adjusted R² > 0.956, root mean square error < 0.290) in the modeling and the lowest Akaike information criterion value. Exponential-logistic and exponential decay models, and Bigelow-type and empirical models for the b'(P) and n(P) parameters, respectively, were tested as alternative secondary models. The process validation considered the two- and one-step nonlinear regressions for making predictions of the survival fraction; both regression types provided an adequate goodness of fit and the one-step nonlinear regression clearly reduced fitting errors. The best candidate model according to the Akaike theory information, with better accuracy and more reliable predictions was the Weibull model integrated by the exponential-logistic and exponential decay secondary models as a function of time and pressure (two-step procedure) or incorporated as one equation (one-step procedure). Both mathematical expressions were used to determine the t_d parameter, where the desired reductions (5D) (considering d = 5 (t_5) as the criterion of a 5 log10 reduction (5D)) in both microorganisms are attainable at 400 MPa for 5.487 ± 0.488 or 5.950 ± 0.329 min, respectively, for the one- or two-step nonlinear procedure.
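
    The role of the t_d parameter can be illustrated with the Weibull survival model in its commonly used form, log10(N/N0) = -b·t^n. The abstract does not give the authors' exact parameterization, so this form, the function names, and the parameter values below are assumptions made for illustration only; they are not the fitted values from the study.

```python
def weibull_log10_survival(t, b, n):
    """log10 survival fraction under the assumed Weibull form: log10(N/N0) = -b * t**n."""
    return -b * t ** n

def time_for_log_reduction(d, b, n):
    """Time t_d at which the assumed model reaches a d-log10 reduction: (d / b) ** (1 / n)."""
    return (d / b) ** (1.0 / n)

# Hypothetical parameters (NOT from the study); n < 1 produces the upward
# concavity ("tail") that the abstract reports for the survival curves.
b, n = 2.0, 0.6
t5 = time_for_log_reduction(5, b, n)
print(f"t_5 = {t5:.2f} min, log10 survival at t_5 = {weibull_log10_survival(t5, b, n):.2f}")
```

    In a two-step procedure, b and n would first be fitted at each pressure and then modeled as functions of pressure; in a one-step procedure, those secondary models are substituted into the primary equation and all parameters are fitted at once.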

  13. The Naïve Overfitting Index Selection (NOIS): A new method to optimize model complexity for hyperspectral data

    NASA Astrophysics Data System (ADS)

    Rocha, Alby D.; Groen, Thomas A.; Skidmore, Andrew K.; Darvishzadeh, Roshanak; Willemen, Louise

    2017-11-01

    The growing number of narrow spectral bands in hyperspectral remote sensing improves the capacity to describe and predict biological processes in ecosystems. But it also poses a challenge to fit empirical models based on such high dimensional data, which often contain correlated and noisy predictors. As the sample sizes available to train and validate empirical models do not seem to be increasing at the same rate, overfitting has become a serious concern. Overly complex models lead to overfitting by capturing more than the underlying relationship, and also through fitting random noise in the data. Many regression techniques claim to overcome these problems by using different strategies to constrain complexity, such as limiting the number of terms in the model, creating latent variables or shrinking parameter coefficients. This paper proposes a new method, named Naïve Overfitting Index Selection (NOIS), which makes use of artificially generated spectra to quantify the relative model overfitting and to select an optimal model complexity supported by the data. The robustness of this new method is assessed by comparing it to a traditional model selection based on cross-validation. The optimal model complexity is determined for seven different regression techniques, such as partial least squares regression, support vector machine, artificial neural network and tree-based regressions using five hyperspectral datasets. The NOIS method selects less complex models, which present accuracies similar to the cross-validation method. The NOIS method reduces the chance of overfitting, thereby avoiding models that present accurate predictions that are only valid for the data used, and too complex to make inferences about the underlying process.

  14. Order Selection for General Expression of Nonlinear Autoregressive Model Based on Multivariate Stepwise Regression

    NASA Astrophysics Data System (ADS)

    Shi, Jinfei; Zhu, Songqing; Chen, Ruwen

    2017-12-01

    An order selection method based on multivariate stepwise regression is proposed for the General Expression of Nonlinear Autoregressive (GNAR) model, which converts the model order problem into variable selection for a multiple linear regression equation. The partial autocorrelation function is adopted to define the linear term in the GNAR model. The result is set as the initial model, and then the nonlinear terms are introduced gradually. Statistics are chosen to assess how both the newly introduced and the existing variables improve the model characteristics, and these statistics determine which model variables to retain or eliminate. The optimal model is then obtained through a measure of data-fitting quality or a significance test. The simulation and classic time-series experiments show that the proposed method is simple, reliable and applicable to practical engineering.

  15. Influence plots for LASSO

    DOE PAGES

    Jang, Dae-Heung; Anderson-Cook, Christine Michaela

    2016-11-22

    With many predictors in regression, fitting the full model can induce multicollinearity problems. Least Absolute Shrinkage and Selection Operator (LASSO) is useful when the effects of many explanatory variables are sparse in a high-dimensional dataset. Influential points can have a disproportionate impact on the estimated values of model parameters. Here, this paper describes a new influence plot that can be used to increase understanding of the contributions of individual observations and the robustness of results. This can serve as a complement to other regression diagnostics techniques in the LASSO regression setting. Using this influence plot, we can find influential points and their impact on shrinkage of model parameters and model selection. Lastly, we provide two examples to illustrate the methods.

  16. Influence plots for LASSO

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jang, Dae-Heung; Anderson-Cook, Christine Michaela

    With many predictors in regression, fitting the full model can induce multicollinearity problems. Least Absolute Shrinkage and Selection Operator (LASSO) is useful when the effects of many explanatory variables are sparse in a high-dimensional dataset. Influential points can have a disproportionate impact on the estimated values of model parameters. Here, this paper describes a new influence plot that can be used to increase understanding of the contributions of individual observations and the robustness of results. This can serve as a complement to other regression diagnostics techniques in the LASSO regression setting. Using this influence plot, we can find influential points and their impact on shrinkage of model parameters and model selection. Lastly, we provide two examples to illustrate the methods.

  17. Computation of nonlinear least squares estimator and maximum likelihood using principles in matrix calculus

    NASA Astrophysics Data System (ADS)

    Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.

    2017-11-01

    This paper uses matrix calculus techniques to obtain the Nonlinear Least Squares Estimator (NLSE), the Maximum Likelihood Estimator (MLE) and the linear pseudo model for a nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE. However, the present research paper introduces an innovative method to compute the NLSE using principles in multivariate calculus. This study is concerned with very new optimization techniques used to compute MLE and NLSE. Anh [2] derived the NLSE and MLE of a heteroscedastic regression model. Lemcoff [3] discussed a procedure to get a linear pseudo model for a nonlinear regression model. In this research article a new technique is developed to get the linear pseudo model for a nonlinear regression model using multivariate calculus. The linear pseudo model of Edmond Malinvaud [4] has been explained in a very different way in this paper. David Pollard et al. used empirical process techniques to study the asymptotics of the LSE (Least-squares estimation) for the fitting of a nonlinear regression function in 2006. In Jae Myung [13] provided a go conceptual for Maximum likelihood estimation in his work “Tutorial on maximum likelihood estimation

  18. Random regression analyses using B-splines to model growth of Australian Angus cattle

    PubMed Central

    Meyer, Karin

    2005-01-01

    Regression on the basis function of B-splines has been advocated as an alternative to orthogonal polynomials in random regression analyses. Basic theory of splines in mixed model analyses is reviewed, and estimates from analyses of weights of Australian Angus cattle from birth to 820 days of age are presented. Data comprised 84 533 records on 20 731 animals in 43 herds, with a high proportion of animals with 4 or more weights recorded. Changes in weights with age were modelled through B-splines of age at recording. A total of thirteen analyses, considering different combinations of linear, quadratic and cubic B-splines and up to six knots, were carried out. Results showed good agreement for all ages with many records, but fluctuated where data were sparse. On the whole, analyses using B-splines appeared more robust against "end-of-range" problems and yielded more consistent and accurate estimates of the first eigenfunctions than previous, polynomial analyses. A model fitting quadratic B-splines, with knots at 0, 200, 400, 600 and 821 days and a total of 91 covariance components, appeared to be a good compromise between detailedness of the model, number of parameters to be estimated, plausibility of results, and fit, measured as residual mean square error. PMID:16093011

  19. Sensitivity of Chemical Shift-Encoded Fat Quantification to Calibration of Fat MR Spectrum

    PubMed Central

    Wang, Xiaoke; Hernando, Diego; Reeder, Scott B.

    2015-01-01

    Purpose: To evaluate the impact of different fat spectral models on proton density fat-fraction (PDFF) quantification using chemical shift-encoded (CSE) MRI. Material and Methods: Simulations and in vivo imaging were performed. In a simulation study, spectral models of fat were compared pairwise. Comparison of magnitude fitting and mixed fitting was performed over a range of echo times and fat fractions. In vivo acquisitions from 41 patients were reconstructed using 7 published spectral models of fat. T2-corrected STEAM-MRS was used as reference. Results: Simulations demonstrate that imperfectly calibrated spectral models of fat result in biases that depend on echo times and fat fraction. Mixed fitting is more robust against this bias than magnitude fitting. Multi-peak spectral models showed much smaller differences among themselves than when compared to the single-peak spectral model. In vivo studies show all multi-peak models agree better (for mixed fitting, slope ranged from 0.967–1.045 using linear regression) with reference standard than the single-peak model (for mixed fitting, slope=0.76). Conclusion: It is essential to use a multi-peak fat model for accurate quantification of fat with CSE-MRI. Further, fat quantification techniques using multi-peak fat models are comparable and no specific choice of spectral model is shown to be superior to the rest. PMID:25845713

  20. LIFESTYLE INDICATORS AND CARDIORESPIRATORY FITNESS IN ADOLESCENTS

    PubMed Central

    de Victo, Eduardo Rossato; Ferrari, Gerson Luis de Moraes; da Silva, João Pedro; Araújo, Timóteo Leandro; Matsudo, Victor Keihan Rodrigues

    2017-01-01

    ABSTRACT Objective: To evaluate the lifestyle indicators associated with cardiorespiratory fitness in adolescents from Ilhabela, São Paulo, Brazil. Methods: The sample consisted of 181 adolescents (53% male) from the Mixed Longitudinal Project on Growth, Development, and Physical Fitness of Ilhabela. Body composition (weight, height, and body mass index, or BMI), school transportation, time spent sitting, physical activity, sports, television time (TV), having a TV in the bedroom, sleep, health perception, diet, and economic status (ES) were analyzed. Cardiorespiratory fitness was estimated by the submaximal progressive protocol performed on a cycle ergometer. Linear regression models were used with the stepwise method. Results: The sample average age was 14.8 years, and the average cardiorespiratory fitness was 42.2 mL·kg⁻¹·min⁻¹ (42.9 for boys and 41.4 for girls; p=0.341). In the total sample, BMI (unstandardized regression coefficient [B]=-0.03), height (B=-0.01), ES (B=0.10), gender (B=0.12), and age (B=0.03) were significantly associated with cardiorespiratory fitness. In boys, BMI, height, not playing any sports, and age were significantly associated with cardiorespiratory fitness. In girls, BMI, ES, and having a TV in the bedroom were significantly associated with cardiorespiratory fitness. Conclusions: Lifestyle indicators influenced the cardiorespiratory fitness; BMI, ES, and age influenced both sexes. Not playing any sports, for boys, and having a TV in the bedroom, for girls, also influenced cardiorespiratory fitness. Public health measures to improve lifestyle indicators can help to increase cardiorespiratory fitness levels. PMID:28977318

  1. Erosion and soil displacement related to timber harvesting in northwestern California, U.S.A.

    Treesearch

    R.M. Rice; D.J. Furbish

    1984-01-01

    The relationship between measures of site disturbance and erosion resulting from timber harvest was studied by regression analyses. None of the 12 regression models developed and tested yielded a coefficient of determination (R²) greater than 0.60. The results indicated that the poor fits to the data were due, in part, to unexplained qualitative...

  2. "Erosion and soil displacement related to timber harvesting in northwestern California, U.S.A."

    Treesearch

    R. M. Rice; D. J. Furbish

    1984-01-01

    The relationship between measures of site disturbance and erosion resulting from timber harvest was studied by regression analyses. None of the 12 regression models developed and tested yielded a coefficient of determination (R²) greater than 0.60. The results indicated that the poor fits to the data were due, in part, to unexplained qualitative differences in...

  3. Investigating the correlation of the U.S. Air Force Physical Fitness Test to combat-based fitness: a women-only study.

    PubMed

    Mitchell, Tarah; White, Edward D; Ritschel, Daniel

    2014-06-01

    The primary objective of this research is to determine how well the Air Force Physical Fitness Test (AFPFT) predicts combat fitness and whether measures within the AFPFT require modification to increase this predictability further. We recruited 60 female volunteers and compared their performance on the AFPFT to the Marine Combat Fitness Test, the proxy for combat fitness. We discovered little association between the two (R² of 0.35); however, this association significantly increased (adjusted R² of 0.56) when utilizing the raw scores of the AFPFT instead of using the gender/age scoring tables. Improving on these associations, we develop and propose a simple ordinary least squares regression model that minimally impacts the AFPFT testing routine. This two-event model for predicting combat fitness incorporates the 1.5-mile run along with the number of repetitions lifting a 30-lb dumbbell from chest height to overhead with arms extended during a 2-minute time span. These two events predicted combat fitness as assessed by the Marine Combat Fitness Test with an adjusted R² of 0.82. By adopting this model, we greatly improve the Air Force's ability to assess combat fitness for women. Reprint & Copyright © 2014 Association of Military Surgeons of the U.S.
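
    The abstract reports both R² and adjusted R² values. As a reminder of how the two relate, the sketch below applies the standard adjustment formula. The function name and the example inputs are illustrative assumptions; they are not values taken from the study.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need n_obs > n_predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical inputs: a raw R^2 of 0.50 from a two-predictor model
# fitted to 60 observations (60 matches the abstract's sample size;
# the R^2 value and the predictor count are assumptions).
print(adjusted_r2(0.50, n_obs=60, n_predictors=2))
```

    The penalty grows with the number of predictors relative to the sample size, which is why adjusted R² is the more conservative figure when comparing models of different size.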

  4. Comparative evaluation of human heat stress indices on selected hospital admissions in Sydney, Australia.

    PubMed

    Goldie, James; Alexander, Lisa; Lewis, Sophie C; Sherwood, Steven

    2017-08-01

    To find appropriate regression model specifications for counts of the daily hospital admissions of a Sydney cohort and determine which human heat stress indices best improve the models' fit. We built parent models of eight daily counts of admission records using weather station observations, census population estimates and public holiday data. We added heat stress indices; models with lower Akaike Information Criterion scores were judged a better fit. Five of the eight parent models demonstrated adequate fit. Daily maximum Simplified Wet Bulb Globe Temperature (sWBGT) consistently improved fit more than most other indices; temperature and heatwave indices also modelled some health outcomes well. Humidity and heat-humidity indices better fit counts of patients who died following admission. Maximum sWBGT is an ideal measure of heat stress for these types of Sydney hospital admissions. Simple temperature indices are a good fallback where a narrower range of conditions is investigated. Implications for public health: This study confirms the importance of selecting appropriate heat stress indices for modelling. Epidemiologists projecting Sydney hospital admissions should use maximum sWBGT as a common measure of heat stress. Health organisations interested in short-range forecasting may prefer simple temperature indices. © 2017 The Authors.

  5. Spatio-temporal water quality mapping from satellite images using geographically and temporally weighted regression

    NASA Astrophysics Data System (ADS)

    Chu, Hone-Jay; Kong, Shish-Jeng; Chang, Chih-Hua

    2018-03-01

    The turbidity (TB) of a water body varies with time and space. Water quality is traditionally estimated via linear regression based on satellite images. However, estimating and mapping water quality require a spatio-temporal nonstationary model, while TB mapping necessitates the use of geographically and temporally weighted regression (GTWR) and geographically weighted regression (GWR) models, both of which are more precise than linear regression. Given the temporal nonstationary models for mapping water quality, GTWR offers the best option for estimating regional water quality. Compared with GWR, GTWR provides highly reliable information for water quality mapping, boasts a relatively high goodness of fit, improves the explanation of variance from 44% to 87%, and shows a sufficient space-time explanatory power. The seasonal patterns of TB and the main spatial patterns of TB variability can be identified using the estimated TB maps from GTWR and by conducting an empirical orthogonal function (EOF) analysis.

  6. Evolution of the Marine Officer Fitness Report: A Multivariate Analysis

    DTIC Science & Technology

    This thesis explores the evaluation behavior of United States Marine Corps (USMC) Reporting Seniors (RSs) from 2010 to 2017. Using fitness report...RSs evaluate the performance of subordinate active component unrestricted officer MROs over time. I estimate logistic regression models of the...lowest. However, these correlations indicating the effects of race matching on FITREP evaluations narrow in significance when performance-based factors

  7. Ensemble learning of inverse probability weights for marginal structural modeling in large observational datasets.

    PubMed

    Gruber, Susan; Logan, Roger W; Jarrín, Inmaculada; Monge, Susana; Hernán, Miguel A

    2015-01-15

    Inverse probability weights used to fit marginal structural models are typically estimated using logistic regression. However, a data-adaptive procedure may be able to better exploit information available in measured covariates. By combining predictions from multiple algorithms, ensemble learning offers an alternative to logistic regression modeling to further reduce bias in estimated marginal structural model parameters. We describe the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V-fold cross validation, and an ensemble learner (EL) that creates a single partition of the data into training and validation sets. Longitudinal data from two multicenter cohort studies in Spain (CoRIS and CoRIS-MD) were analyzed to estimate the mortality hazard ratio for initiation versus no initiation of combined antiretroviral therapy among HIV positive subjects. Both ensemble approaches produced hazard ratio estimates further away from the null, and with tighter confidence intervals, than logistic regression modeling. Computation time for EL was less than half that of SL. We conclude that ensemble learning using a library of diverse candidate algorithms offers an alternative to parametric modeling of inverse probability weights when fitting marginal structural models. With large datasets, EL provides a rich search over the solution space in less time than SL with comparable results. Copyright © 2014 John Wiley & Sons, Ltd.
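
    To make the term "stabilized weights" concrete, the sketch below computes them for a single time point, assuming the propensity scores P(A=1 | covariates) have already been estimated by some model (logistic regression or an ensemble). The function name and toy inputs are hypothetical, and the longitudinal, time-varying weights used in the study would multiply such terms across visits.

```python
from statistics import mean


def stabilized_weights(treatment, propensity):
    """Stabilized inverse probability weights for a binary treatment.

    treatment  : list of 0/1 indicators A_i
    propensity : list of estimated P(A_i = 1 | covariates_i), each in (0, 1)
    Weight is P(A = a_i) / P(A = a_i | covariates_i).
    """
    if any(not 0.0 < p < 1.0 for p in propensity):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    p_treated = mean(treatment)  # marginal probability of treatment (numerator)
    weights = []
    for a, p in zip(treatment, propensity):
        if a == 1:
            weights.append(p_treated / p)
        else:
            weights.append((1.0 - p_treated) / (1.0 - p))
    return weights


# Toy, made-up inputs for illustration only.
print(stabilized_weights([1, 0, 1, 0], [0.8, 0.2, 0.5, 0.5]))
```

    Putting the marginal treatment probability in the numerator keeps the weights centred near 1, which is what usually yields narrower confidence intervals than unstabilized weights.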

  8. Ensemble learning of inverse probability weights for marginal structural modeling in large observational datasets

    PubMed Central

    Gruber, Susan; Logan, Roger W.; Jarrín, Inmaculada; Monge, Susana; Hernán, Miguel A.

    2014-01-01

    Inverse probability weights used to fit marginal structural models are typically estimated using logistic regression. However, a data-adaptive procedure may be able to better exploit information available in measured covariates. By combining predictions from multiple algorithms, ensemble learning offers an alternative to logistic regression modeling to further reduce bias in estimated marginal structural model parameters. We describe the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V-fold cross validation, and an ensemble learner (EL) that creates a single partition of the data into training and validation sets. Longitudinal data from two multicenter cohort studies in Spain (CoRIS and CoRIS-MD) were analyzed to estimate the mortality hazard ratio for initiation versus no initiation of combined antiretroviral therapy among HIV positive subjects. Both ensemble approaches produced hazard ratio estimates further away from the null, and with tighter confidence intervals, than logistic regression modeling. Computation time for EL was less than half that of SL. We conclude that ensemble learning using a library of diverse candidate algorithms offers an alternative to parametric modeling of inverse probability weights when fitting marginal structural models. With large datasets, EL provides a rich search over the solution space in less time than SL with comparable results. PMID:25316152

  9. Calibration power of the Braden scale in predicting pressure ulcer development.

    PubMed

    Chen, Hong-Lin; Cao, Ying-Juan; Wang, Jing; Huai, Bao-Sha

    2016-11-02

    Calibration is the degree of correspondence between the estimated probability produced by a model and the actual observed probability. The aim of this study was to investigate the calibration power of the Braden scale in predicting pressure ulcer (PU) development. A retrospective analysis was performed among consecutive patients in 2013. The patients were separated into a training group and a validation group. The predicted incidence was calculated using a logistic regression model in the training group, and the Hosmer-Lemeshow test was used to assess goodness of fit. In the validation group, the observed and predicted incidences were compared using the chi-square (χ²) goodness-of-fit test for calibration power. We included 2585 patients in the study, of whom 78 (3.0%) developed a PU. Differences in patient characteristics between the training and validation groups were non-significant (p>0.05). In the training group, the logistic regression model for predicting pressure ulcers was Logit(P) = -0.433*Braden score+2.616. The Hosmer-Lemeshow test indicated poor goodness of fit (χ²=13.472; p=0.019). In the validation group, the predicted pressure ulcer incidence also did not fit well with the observed incidence (χ²=42.154, p=0.000 by Braden scores; and χ²=17.223, p=0.001 by Braden scale risk classification). The Braden scale has low calibration power in predicting PU formation.
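
    The fitted equation in this abstract, Logit(P) = -0.433*Braden score + 2.616, can be turned into a predicted probability with the inverse logit. The sketch below does only that arithmetic; the function name is an assumption, and the example score of 12 is an arbitrary illustration rather than a case from the study.

```python
import math

# Coefficients as reported in the abstract.
INTERCEPT = 2.616
SLOPE = -0.433


def predicted_pu_probability(braden_score: float) -> float:
    """Inverse-logit of the reported linear predictor."""
    logit = SLOPE * braden_score + INTERCEPT
    return 1.0 / (1.0 + math.exp(-logit))


# Arbitrary example score; lower Braden scores indicate higher risk.
print(round(predicted_pu_probability(12), 4))
```

    Because the slope is negative, the predicted probability falls as the Braden score rises. Calibration then asks whether these predicted probabilities match the observed incidence in each score group.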

  10. Physical fitness and academic performance: empirical evidence from the National Administrative Senior High School Student Data in Taiwan.

    PubMed

    Liao, Pei-An; Chang, Hung-Hao; Wang, Jiun-Hao; Wu, Min-Chen

    2013-06-01

    This study examined the relationship between the changes of physical fitness across the 3-year spectrum of senior high school study and academic performance measured by standardized tests in Taiwan. A unique dataset of 149 240 university-bound senior high school students from 2009 to 2011 was constructed by merging two nationwide administrative datasets of physical fitness test performance and the university entrance exam scores. Hierarchical linear regression models were used. All regressions included controls for students' baseline physical fitness status, changes of physical fitness performance over time, age and family economic status. Some notable findings were revealed. An increase of 1 SD on students' overall physical fitness from the first to third school year is associated with an increase in the university entrance exam scores by 0.007 and 0.010 SD for male and female students, respectively. An increase of 1 SD on anaerobic power (flexibility) from the first to third school year is positively associated with an increase in the university entrance exam scores by 0.018 (0.010) SD among female students. We suggest that education and school health policymakers should consider and design policies to improve physical fitness as part of their overall strategy of improving academic performance.

  11. Applicability of Monte Carlo cross validation technique for model development and validation using generalised least squares regression

    NASA Astrophysics Data System (ADS)

    Haddad, Khaled; Rahman, Ataur; A Zaman, Mohammad; Shrestha, Surendra

    2013-03-01

    Summary: In regional hydrologic regression analysis, model selection and validation are regarded as important steps. Here, the model selection is usually based on some measurements of goodness-of-fit between the model prediction and observed data. In Regional Flood Frequency Analysis (RFFA), leave-one-out (LOO) validation or a fixed percentage leave out validation (e.g., 10%) is commonly adopted to assess the predictive ability of regression-based prediction equations. This paper develops a Monte Carlo Cross Validation (MCCV) technique (which has widely been adopted in Chemometrics and Econometrics) in RFFA using Generalised Least Squares Regression (GLSR) and compares it with the most commonly adopted LOO validation approach. The study uses simulated and regional flood data from the state of New South Wales in Australia. It is found that when developing hydrologic regression models, application of the MCCV is likely to result in a more parsimonious model than the LOO. It has also been found that the MCCV can provide a more realistic estimate of a model's predictive ability when compared with the LOO.
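
    The difference between LOO and MCCV is easiest to see side by side. The sketch below is a minimal illustration that swaps in a one-predictor ordinary least squares fit for the generalised least squares regression used in the paper. All function names, the 30% hold-out fraction, the number of splits, and the toy data are assumptions.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def holdout_mse(train_idx, test_idx, xs, ys):
    """Fit on the training indices, return mean squared error on the test indices."""
    a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    return mean((ys[i] - (a + b * xs[i])) ** 2 for i in test_idx)


def loo_cv(xs, ys):
    """Leave-one-out: each observation is held out exactly once."""
    n = len(xs)
    return mean(
        holdout_mse([j for j in range(n) if j != i], [i], xs, ys) for i in range(n)
    )


def mccv(xs, ys, n_splits=200, test_fraction=0.3, seed=0):
    """Monte Carlo CV: repeated random splits with a larger hold-out set."""
    rng = random.Random(seed)
    n = len(xs)
    n_test = max(1, int(round(test_fraction * n)))
    errors = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        errors.append(holdout_mse(idx[n_test:], idx[:n_test], xs, ys))
    return mean(errors)


# Toy data (made up): a noisy linear relation.
noise = random.Random(1)
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + noise.gauss(0.0, 1.0) for x in xs]
print("LOO  MSE:", loo_cv(xs, ys))
print("MCCV MSE:", mccv(xs, ys))
```

    MCCV trains on fewer observations per split and tests on more, so it tends to penalise over-parameterised models more heavily than LOO, which is consistent with the abstract's finding of more parsimonious selections.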

  12. A Survival Model for Shortleaf Pine Trees Growing in Uneven-Aged Stands

    Treesearch

    Thomas B. Lynch; Lawrence R. Gering; Michael M. Huebschmann; Paul A. Murphy

    1999-01-01

    A survival model for shortleaf pine (Pinus echinata Mill.) trees growing in uneven-aged stands was developed using data from permanently established plots maintained by an industrial forestry company in western Arkansas. Parameters were fitted to a logistic regression model with a Bernoulli dependent variable in which "0" represented...

  13. R package PRIMsrc: Bump Hunting by Patient Rule Induction Method for Survival, Regression and Classification

    PubMed Central

    Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J. Sunil

    2015-01-01

    PRIMsrc is a novel implementation of a non-parametric bump hunting procedure, based on the Patient Rule Induction Method (PRIM), offering a unified treatment of outcome variables, including censored time-to-event (Survival), continuous (Regression) and discrete (Classification) responses. To fit the model, it uses a recursive peeling procedure with specific peeling criteria and stopping rules depending on the response. To validate the model, it provides an objective function based on prediction-error or other specific statistic, as well as two alternative cross-validation techniques, adapted to the task of decision-rule making and estimation in the three types of settings. PRIMsrc comes as an open source R package, including at this point: (i) a main function for fitting a Survival Bump Hunting model with various options allowing cross-validated model selection to control model size (#covariates) and model complexity (#peeling steps) and generation of cross-validated end-point estimates; (ii) parallel computing; (iii) various S3-generic and specific plotting functions for data visualization, diagnostic, prediction, summary and display of results. It is available on CRAN and GitHub. PMID:26798326

  14. Adsorption potential of a modified activated carbon for the removal of nitrogen containing compounds from model fuel

    NASA Astrophysics Data System (ADS)

    Anisuzzaman, S. M.; Krishnaiah, D.; Alfred, D.

    2018-02-01

    The purpose of this study is to determine the effect of modified activated carbon (MAC) on the adsorption of nitrogen-containing compounds (NCC) for their removal from model fuel. Modification of commercial activated carbon (AC) involved impregnation with different ratios of sulfuric acid solution. Pseudo-first and pseudo-second order kinetic models were applied to study the adsorption kinetics, while adsorption isotherms were used to evaluate the equilibrium data. All of the experimental data were analyzed using ultraviolet-visible spectroscopy after adsorption experiments with different concentrations and dosages of adsorbent and model fuel. The adsorption of NCC by MAC was best fit by the Langmuir isotherm for quinoline (QUI) and by the Freundlich isotherm for indole (IND), with maximum adsorption capacities of 0.13 mg/g and 0.16 mg/g, respectively. Based on the experimental data, the pseudo-first order model exhibited the best fit for QUI, with linear regression R2 values ranging from 0.9777 to 0.9935, and the pseudo-second order model exhibited the best fit for IND, with R2 values ranging from 0.9701 to 0.9962. The adsorption isotherm and kinetic results indicate that commercial AC shows great potential in removing nitrogen.
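
    For readers unfamiliar with the two isotherms named here, the sketch below writes out their standard textbook forms. The function and parameter names are assumptions. Only the 0.13 mg/g capacity comes from the abstract; the affinity constant and the Freundlich parameters are made-up illustration values.

```python
def langmuir(c_eq: float, q_max: float, k_l: float) -> float:
    """Langmuir isotherm: q = q_max * K_L * C / (1 + K_L * C)."""
    return q_max * k_l * c_eq / (1.0 + k_l * c_eq)


def freundlich(c_eq: float, k_f: float, n: float) -> float:
    """Freundlich isotherm: q = K_F * C**(1/n)."""
    return k_f * c_eq ** (1.0 / n)


# q_max = 0.13 mg/g is the capacity reported for quinoline;
# k_l, k_f and n below are hypothetical.
for c in (0.5, 2.0, 10.0):
    print(c, langmuir(c, q_max=0.13, k_l=0.5), freundlich(c, k_f=0.05, n=2.0))
```

    The Langmuir curve saturates at q_max as concentration grows, whereas the Freundlich curve keeps rising, which is why the two can fit different adsorbates better.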

  15. Use of probabilistic weights to enhance linear regression myoelectric control

    NASA Astrophysics Data System (ADS)

    Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.

    2015-12-01

    Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual DOFs during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.

  16. Polynomials to model the growth of young bulls in performance tests.

    PubMed

    Scalez, D C B; Fragomeni, B O; Passafaro, T L; Pereira, I G; Toral, F L B

    2014-03-01

    The use of polynomial functions to describe the average growth trajectory and covariance functions of Nellore and MA (21/32 Charolais+11/32 Nellore) young bulls in performance tests was studied. The average growth trajectories and additive genetic and permanent environmental covariance functions were fit with Legendre (linear through quintic) and quadratic B-spline (with two to four intervals) polynomials. In general, the Legendre and quadratic B-spline models that included more covariance parameters provided a better fit with the data. When comparing models with the same number of parameters, the quadratic B-spline provided a better fit than the Legendre polynomials. The quadratic B-spline with four intervals provided the best fit for the Nellore and MA groups. The fitting of random regression models with different types of polynomials (Legendre polynomials or B-spline) affected neither the genetic parameter estimates nor the ranking of the Nellore young bulls. However, fitting different types of polynomials affected the genetic parameter estimates and the ranking of the MA young bulls. Parsimonious Legendre or quadratic B-spline models could be used for genetic evaluation of body weight of Nellore young bulls in performance tests, whereas these parsimonious models were less efficient for animals of the MA genetic group owing to limited data at the extreme ages.
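
    Random regression models of this kind use Legendre polynomials of standardized age as covariates. The sketch below builds such a basis with the three-term recurrence. The function names and the age range are assumptions, and the polynomials are left unnormalized, whereas animal breeding software often applies a normalizing constant.

```python
def standardize_age(age: float, age_min: float, age_max: float) -> float:
    """Map an age in [age_min, age_max] onto [-1, 1], the Legendre domain."""
    return 2.0 * (age - age_min) / (age_max - age_min) - 1.0


def legendre_basis(x: float, order: int) -> list:
    """Return [P_0(x), ..., P_order(x)] via (n+1)P_{n+1} = (2n+1)xP_n - nP_{n-1}."""
    values = [1.0, x]
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]


# Hypothetical test period of 0 to 112 days; a quintic basis (order 5)
# matches the highest Legendre order mentioned in the abstract.
x = standardize_age(84.0, 0.0, 112.0)
print(x, legendre_basis(x, 5))
```

    Each animal's growth curve is then modelled as a weighted sum of these basis values, with the weights treated as random effects.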

  17. Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project.

    PubMed

    Alghamdi, Manal; Al-Mallah, Mouaz; Keteyian, Steven; Brawner, Clinton; Ehrman, Jonathan; Sakr, Sherif

    2017-01-01

    Machine learning is becoming a popular and important approach in the field of medical research. In this study, we investigate the relative performance of various machine learning methods such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree and Random Forests for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data from 32,555 patients who were free of any known coronary artery disease or heart failure, who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009, and who had a complete 5-year follow-up. At the completion of the fifth year, 5,099 of those patients had developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an ensembling-based predictive model using 13 attributes that were selected based on their clinical importance, Multiple Linear Regression, and Information Gain Ranking methods. The negative effect of class imbalance on the constructed model was handled by the Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive model classifier was improved by the Ensemble machine learning approach using the Vote method with three Decision Trees (Naïve Bayes Tree, Random Forest, and Logistic Model Tree) and achieved high accuracy of prediction (AUC = 0.92). The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data.
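
    SMOTE balances classes by creating synthetic minority samples on the line segments between a minority point and one of its nearest minority neighbours. The sketch below is a minimal standard-library illustration of that idea, not the implementation used in the study. The function name, the default k, and the toy points are assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the base point (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Toy minority class (made up): four points in two dimensions.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for point in smote(minority_points, n_synthetic=5):
    print(point)
```

    In practice the oversampling is applied to the training folds only, so that synthetic points never leak into the data used to estimate performance.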

  18. Quality Reporting of Multivariable Regression Models in Observational Studies: Review of a Representative Sample of Articles Published in Biomedical Journals.

    PubMed

    Real, Jordi; Forné, Carles; Roso-Llorach, Albert; Martínez-Sánchez, Jose M

    2016-05-01

    Controlling for confounders is a crucial step in analytical observational studies, and multivariable models are widely used as statistical adjustment techniques. However, the validation of the assumptions of the multivariable regression models (MRMs) should be made clear in scientific reporting. The objective of this study is to review the quality of statistical reporting of the most commonly used MRMs (logistic, linear, and Cox regression) that were applied in analytical observational studies published between 2003 and 2014 by journals indexed in MEDLINE. Review of a representative sample of articles indexed in MEDLINE (n = 428) with observational design and use of MRMs (logistic, linear, and Cox regression). We assessed the quality of reporting about: model assumptions and goodness-of-fit, interactions, sensitivity analysis, crude and adjusted effect estimate, and specification of more than 1 adjusted model. The tests of underlying assumptions or goodness-of-fit of the MRMs used were described in 26.2% (95% CI: 22.0-30.3) of the articles and 18.5% (95% CI: 14.8-22.1) reported the interaction analysis. Reporting of all items assessed was higher in articles published in journals with a higher impact factor. A low percentage of articles indexed in MEDLINE that used multivariable techniques provided information demonstrating rigorous application of the model selected as an adjustment method. Given the importance of these methods to the final results and conclusions of observational studies, greater rigor is required in reporting the use of MRMs in the scientific literature.

  19. A New Z Score Curve of the Coronary Arterial Internal Diameter Using the Lambda-Mu-Sigma Method in a Pediatric Population.

    PubMed

    Kobayashi, Tohru; Fuse, Shigeto; Sakamoto, Naoko; Mikami, Masashi; Ogawa, Shunichi; Hamaoka, Kenji; Arakaki, Yoshio; Nakamura, Tsuneyuki; Nagasawa, Hiroyuki; Kato, Taichi; Jibiki, Toshiaki; Iwashima, Satoru; Yamakawa, Masaru; Ohkubo, Takashi; Shimoyama, Shinya; Aso, Kentaro; Sato, Seiichi; Saji, Tsutomu

    2016-08-01

    Several coronary artery Z score models have been developed. However, a Z score model derived by the lambda-mu-sigma (LMS) method has not been established. Echocardiographic measurements of the proximal right coronary artery, left main coronary artery, proximal left anterior descending coronary artery, and proximal left circumflex artery were prospectively collected in 3,851 healthy children ≤18 years of age and divided into developmental and validation data sets. In the developmental data set, smooth curves were fitted for each coronary artery using linear, logarithmic, square-root, and LMS methods for both sexes. The relative goodness of fit of these models was compared using the Bayesian information criterion. The best-fitting model was tested for reproducibility using the validation data set. The goodness of fit of the selected model was visually compared with that of the previously reported regression models using a Q-Q plot. Because the internal diameter of each coronary artery was not similar between sexes, sex-specific Z score models were developed. The LMS model with body surface area as the independent variable showed the best goodness of fit; therefore, the internal diameter of each coronary artery was transformed into a sex-specific Z score on the basis of body surface area using the LMS method. In the validation data set, a Q-Q plot of each model indicated that the distribution of Z scores in the LMS models was closer to the normal distribution compared with previously reported regression models. Finally, the final models for each coronary artery in both sexes were developed using the developmental and validation data sets. A Microsoft Excel-based Z score calculator was also created, which is freely available online (http://raise.umin.jp/zsp/calculator/). Novel LMS models with which to estimate the sex-specific Z score of each internal coronary artery diameter were generated and validated using a large pediatric population. 
Copyright © 2016 American Society of Echocardiography. Published by Elsevier Inc. All rights reserved.
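
    The LMS method converts a measurement into a Z score using three parameters that vary with the covariate: L (Box-Cox power), M (median), and S (coefficient of variation). The sketch below applies the standard LMS transformation. The function name and the numeric inputs are hypothetical; the study's own L, M, and S values by body surface area are not reproduced here.

```python
import math


def lms_z_score(y: float, lam: float, mu: float, sigma: float) -> float:
    """Z = ((y / M)**L - 1) / (L * S), or ln(y / M) / S when L is 0."""
    if y <= 0 or mu <= 0 or sigma <= 0:
        raise ValueError("y, mu and sigma must be positive")
    if abs(lam) < 1e-12:
        return math.log(y / mu) / sigma
    return ((y / mu) ** lam - 1.0) / (lam * sigma)


# Hypothetical parameters for one body surface area: L = 0.5, M = 2.0 mm, S = 0.12.
print(lms_z_score(2.6, lam=0.5, mu=2.0, sigma=0.12))
```

    A measurement equal to the median M always maps to Z = 0, and the power L corrects for skewness so that the resulting Z scores are closer to a normal distribution.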

  20. Fitting program for linear regressions according to Mahon (1996)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Trappitsch, Reto G.

    2018-01-09

    This program takes the user's input data and fits a linear regression to it using the prescription presented by Mahon (1996). Compared to the commonly used York fit, this method has the correct prescription for measurement error propagation. This software should facilitate the proper fitting of measurements with a simple interface.

  1. Models for Estimating Genetic Parameters of Milk Production Traits Using Random Regression Models in Korean Holstein Cattle

    PubMed Central

    Cho, C. I.; Alam, M.; Choi, T. J.; Choy, Y. H.; Choi, J. G.; Lee, S. S.; Cho, K. H.

    2016-01-01

    The objectives of the study were to estimate genetic parameters for milk production traits of Holstein cattle using random regression models (RRMs), and to compare the goodness of fit of various RRMs with homogeneous and heterogeneous residual variances. A total of 126,980 test-day milk production records of the first parity Holstein cows between 2007 and 2014 from the Dairy Cattle Improvement Center of National Agricultural Cooperative Federation in South Korea were used. These records included milk yield (MILK), fat yield (FAT), protein yield (PROT), and solids-not-fat yield (SNF). The statistical models included random effects of genetic and permanent environments using Legendre polynomials (LP) of the third to fifth order (L3–L5), fixed effects of herd-test day, year-season at calving, and a fixed regression for the test-day record (third to fifth order). The residual variances in the models were either homogeneous (HOM) or heterogeneous (15 classes, HET15; 60 classes, HET60). A total of nine models (3 orders of polynomials×3 types of residual variance) including L3-HOM, L3-HET15, L3-HET60, L4-HOM, L4-HET15, L4-HET60, L5-HOM, L5-HET15, and L5-HET60 were compared using Akaike information criteria (AIC) and/or Schwarz Bayesian information criteria (BIC) statistics to identify the model(s) of best fit for their respective traits. The lowest BIC value was observed for the models L5-HET15 (MILK; PROT; SNF) and L4-HET15 (FAT), which fit the best. In general, the BIC values of HET15 models for a particular polynomial order was lower than that of the HET60 model in most cases. This implies that the orders of LP and types of residual variances affect the goodness of models. Also, the heterogeneity of residual variances should be considered for the test-day analysis. 
The heritability estimates from the best-fitting models ranged from 0.08 to 0.15 for MILK, 0.06 to 0.14 for FAT, 0.08 to 0.12 for PROT, and 0.07 to 0.13 for SNF according to days in milk of first lactation. Genetic variances for the studied traits tended to decrease during the earlier stages of lactation, which were followed by increases in the middle and further decreases at the end of lactation. With regard to the fitness of the models and the differential genetic parameters across the lactation stages, we could estimate genetic parameters more accurately from RRMs than from lactation models. Therefore, we suggest using RRMs in place of lactation models to make national dairy cattle genetic evaluations for milk production traits in Korea. PMID:26954184
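
    Model comparison in this abstract rests on AIC and BIC, where lower values indicate a better trade-off between fit and complexity. The sketch below shows the two standard formulas and how a best model would be picked. The function names, the model labels, the log-likelihoods, and the parameter counts are all hypothetical; only the record count of 126,980 comes from the abstract.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Schwarz Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def best_by(scores: dict) -> str:
    """Name of the model with the lowest criterion value."""
    return min(scores, key=scores.get)


# Made-up candidates: (log-likelihood, number of parameters).
candidates = {"model_A": (-51230.0, 25), "model_B": (-51190.0, 39)}
n_records = 126980  # number of test-day records reported in the abstract
bic_scores = {name: bic(ll, k, n_records) for name, (ll, k) in candidates.items()}
print(bic_scores, best_by(bic_scores))
```

    Because BIC multiplies the parameter count by ln(n), it penalises extra parameters more strongly than AIC on large datasets, so the two criteria can disagree about the preferred model.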

  2. Models for Estimating Genetic Parameters of Milk Production Traits Using Random Regression Models in Korean Holstein Cattle.

    PubMed

    Cho, C I; Alam, M; Choi, T J; Choy, Y H; Choi, J G; Lee, S S; Cho, K H

    2016-05-01

    The objectives of the study were to estimate genetic parameters for milk production traits of Holstein cattle using random regression models (RRMs), and to compare the goodness of fit of various RRMs with homogeneous and heterogeneous residual variances. A total of 126,980 test-day milk production records of the first parity Holstein cows between 2007 and 2014 from the Dairy Cattle Improvement Center of National Agricultural Cooperative Federation in South Korea were used. These records included milk yield (MILK), fat yield (FAT), protein yield (PROT), and solids-not-fat yield (SNF). The statistical models included random effects of genetic and permanent environments using Legendre polynomials (LP) of the third to fifth order (L3-L5), fixed effects of herd-test day, year-season at calving, and a fixed regression for the test-day record (third to fifth order). The residual variances in the models were either homogeneous (HOM) or heterogeneous (15 classes, HET15; 60 classes, HET60). A total of nine models (3 orders of polynomials×3 types of residual variance) including L3-HOM, L3-HET15, L3-HET60, L4-HOM, L4-HET15, L4-HET60, L5-HOM, L5-HET15, and L5-HET60 were compared using Akaike information criteria (AIC) and/or Schwarz Bayesian information criteria (BIC) statistics to identify the model(s) of best fit for their respective traits. The lowest BIC value was observed for the models L5-HET15 (MILK; PROT; SNF) and L4-HET15 (FAT), which fit the best. In general, the BIC values of HET15 models for a particular polynomial order was lower than that of the HET60 model in most cases. This implies that the orders of LP and types of residual variances affect the goodness of models. Also, the heterogeneity of residual variances should be considered for the test-day analysis. 
The heritability estimates from the best-fitting models ranged from 0.08 to 0.15 for MILK, 0.06 to 0.14 for FAT, 0.08 to 0.12 for PROT, and 0.07 to 0.13 for SNF according to days in milk of first lactation. Genetic variances for the studied traits tended to decrease during the earlier stages of lactation, which were followed by increases in the middle and further decreases at the end of lactation. With regard to the fitness of the models and the differential genetic parameters across the lactation stages, we could estimate genetic parameters more accurately from RRMs than from lactation models. Therefore, we suggest using RRMs in place of lactation models to make national dairy cattle genetic evaluations for milk production traits in Korea.

  3. Deletion Diagnostics for Alternating Logistic Regressions

    PubMed Central

    Preisser, John S.; By, Kunthel; Perin, Jamie; Qaqish, Bahjat F.

    2013-01-01

    Deletion diagnostics are introduced for the regression analysis of clustered binary outcomes estimated with alternating logistic regressions, an implementation of generalized estimating equations (GEE) that estimates regression coefficients in a marginal mean model and in a model for the intracluster association given by the log odds ratio. The diagnostics are developed within an estimating equations framework that recasts the estimating functions for association parameters based upon conditional residuals into equivalent functions based upon marginal residuals. Extensions of earlier work on GEE diagnostics follow directly, including computational formulae for one-step deletion diagnostics that measure the influence of a cluster of observations on the estimated regression parameters and on the overall marginal mean or association model fit. The diagnostic formulae are evaluated with simulation studies and with an application concerning an assessment of factors associated with health maintenance visits in primary care medical practices. The application and the simulations demonstrate that the proposed cluster-deletion diagnostics for alternating logistic regressions are good approximations of their exact fully iterated counterparts. PMID:22777960

  4. Reader reaction to "a robust method for estimating optimal treatment regimes" by Zhang et al. (2012).

    PubMed

    Taylor, Jeremy M G; Cheng, Wenting; Foster, Jared C

    2015-03-01

    A recent article (Zhang et al., 2012, Biometrics 168, 1010-1018) compares regression based and inverse probability based methods of estimating an optimal treatment regime and shows for a small number of covariates that inverse probability weighted methods are more robust to model misspecification than regression methods. We demonstrate that using models that fit the data better reduces the concern about non-robustness for the regression methods. We extend the simulation study of Zhang et al. (2012, Biometrics 168, 1010-1018), also considering the situation of a larger number of covariates, and show that incorporating random forests into both regression and inverse probability weighted based methods improves their properties. © 2014, The International Biometric Society.

  5. Application of a parameter-estimation technique to modeling the regional aquifer underlying the eastern Snake River plain, Idaho

    USGS Publications Warehouse

    Garabedian, Stephen P.

    1986-01-01

    A nonlinear, least-squares regression technique for the estimation of ground-water flow model parameters was applied to the regional aquifer underlying the eastern Snake River Plain, Idaho. The technique uses a computer program to simulate two-dimensional, steady-state ground-water flow. Hydrologic data for the 1980 water year were used to calculate recharge rates, boundary fluxes, and spring discharges. Ground-water use was estimated from irrigated land maps and crop consumptive-use figures. These estimates of ground-water withdrawal, recharge rates, and boundary flux, along with leakance, were used as known values in the model calibration of transmissivity. Leakance values were adjusted between regression solutions by comparing model-calculated to measured spring discharges. In other simulations, recharge and leakance also were calibrated as prior-information regression parameters, which limits the variation of these parameters using a normalized standard error of estimate. Results from a best-fit model indicate a wide areal range in transmissivity from about 0.05 to 44 feet squared per second and in leakance from about 2.2 x 10^-9 to 6.0 x 10^-8 feet per second per foot. Along with parameter values, model statistics also were calculated, including the coefficient of correlation between calculated and observed head (0.996), the standard error of the estimates for head (40 feet), and the parameter coefficients of variation (about 10-40 percent). Additional boundary flux was added in some areas during calibration to achieve proper fit to ground-water flow directions. Model fit improved significantly when areas that violated model assumptions were removed. It also improved slightly when y-direction (northwest-southeast) transmissivity values were larger than x-direction (northeast-southwest) transmissivity values.
The model was most sensitive to changes in recharge, and in some areas, to changes in transmissivity, particularly near the spring discharge area from Milner Dam to King Hill.

  6. The PX-EM algorithm for fast stable fitting of Henderson's mixed model

    PubMed Central

    Foulley, Jean-Louis; Van Dyk, David A

    2000-01-01

    This paper presents procedures for implementing the PX-EM algorithm of Liu, Rubin and Wu to compute REML estimates of variance-covariance components in Henderson's linear mixed models. The class of models considered encompasses several correlated random factors having the same vector length, e.g., as in random regression models for longitudinal data analysis and in sire-maternal grandsire models for genetic evaluation. Numerical examples are presented to illustrate the procedures. Much better results in terms of convergence characteristics (number of iterations and time required for convergence) are obtained for PX-EM relative to the basic EM algorithm in the random regression. PMID:14736399

  7. Bioinactivation: Software for modelling dynamic microbial inactivation.

    PubMed

    Garre, Alberto; Fernández, Pablo S; Lindqvist, Roland; Egea, Jose A

    2017-03-01

    This contribution presents the bioinactivation software, which implements functions for the modelling of isothermal and non-isothermal microbial inactivation. This software offers features such as user-friendliness, modelling of dynamic conditions, possibility to choose the fitting algorithm and generation of prediction intervals. The software is offered in two different formats: Bioinactivation core and Bioinactivation SE. Bioinactivation core is a package for the R programming language, which includes features for the generation of predictions and for the fitting of models to inactivation experiments using non-linear regression or a Markov Chain Monte Carlo algorithm (MCMC). The calculations are based on inactivation models common in academia and industry (Bigelow, Peleg, Mafart and Geeraerd). Bioinactivation SE supplies a user-friendly interface to selected functions of Bioinactivation core, namely the model fitting of non-isothermal experiments and the generation of prediction intervals. The capabilities of bioinactivation are presented in this paper through a case study, modelling the non-isothermal inactivation of Bacillus sporothermodurans. This study has provided a full characterization of the response of the bacteria to dynamic temperature conditions, including confidence intervals for the model parameters and a prediction interval of the survivor curve. We conclude that the MCMC algorithm produces a better characterization of the biological uncertainty and variability than non-linear regression. The bioinactivation software can be relevant to the food and pharmaceutical industry, as well as to regulatory agencies, as part of a (quantitative) microbial risk assessment. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. Relationship between long working hours and depression: a 3-year longitudinal study of clerical workers.

    PubMed

    Amagasa, Takashi; Nakayama, Takeo

    2013-08-01

    To clarify how long working hours affect the likelihood of current and future depression. Using data from four repeated measurements collected from 218 clerical workers, four models associating work-related factors to the depressive mood scale were established. The final model was constructed after comparing and testing the goodness-of-fit index using structural equation modeling. Multiple logistic regression analysis was also performed. The final model showed the best fit (normed fit index = 0.908; goodness-of-fit index = 0.936; root-mean-square error of approximation = 0.018). Its standardized total effect indicated that long working hours affected depression at the time of evaluation and 1 to 3 years later. The odds ratio for depression risk was 14.7 in employees who were not long-hours overworked according to the initial survey but who were long-hours overworked according to the second survey. Long working hours increase current and future risks of depression.

  9. Local Intrinsic Dimension Estimation by Generalized Linear Modeling.

    PubMed

    Hino, Hideitsu; Fujiki, Jun; Akaho, Shotaro; Murata, Noboru

    2017-07-01

    We propose a method for intrinsic dimension estimation. By fitting the power of distance from an inspection point and the number of samples included inside a ball with a radius equal to the distance, to a regression model, we estimate the goodness of fit. Then, by using the maximum likelihood method, we estimate the local intrinsic dimension around the inspection point. The proposed method is shown to be comparable to conventional methods in global intrinsic dimension estimation experiments. Furthermore, we experimentally show that the proposed method outperforms a conventional local dimension estimation method.
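
    The relationship this abstract builds on, that the number of samples inside a ball grows as a power of its radius, can be sketched as follows. If the count N(r) scales roughly like r^d near the inspection point, the slope of log N(r) against log r approximates the local dimension d. The sketch swaps in an ordinary least-squares line on the log-log scale as a simpler stand-in for the generalized linear model and maximum likelihood step the authors describe; the function name `local_dimension` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def local_dimension(points: np.ndarray, center: np.ndarray, radii: np.ndarray) -> float:
    """Estimate local intrinsic dimension as the slope of log N(r) versus log r."""
    dists = np.linalg.norm(points - center, axis=1)
    # N(r): number of samples inside the ball of radius r around the inspection point.
    counts = np.array([(dists <= r).sum() for r in radii])
    slope, _intercept = np.polyfit(np.log(radii), np.log(counts), 1)
    return float(slope)

rng = np.random.default_rng(0)
# Synthetic data: a flat 2-D sheet embedded in 3-D space (third coordinate is zero).
xy = rng.uniform(-1.0, 1.0, size=(20000, 2))
points = np.column_stack([xy, np.zeros(len(xy))])

radii = np.linspace(0.1, 0.5, 9)
estimate = local_dimension(points, np.zeros(3), radii)
print(f"estimated local dimension: {estimate:.2f}")
```

    Although the points live in three coordinates, the estimate should come out close to 2, the dimension of the sheet they actually occupy.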

  10. Linearity versus Nonlinearity of Offspring-Parent Regression: An Experimental Study of Drosophila Melanogaster

    PubMed Central

    Gimelfarb, A.; Willis, J. H.

    1994-01-01

    An experiment was conducted to investigate the offspring-parent regression for three quantitative traits (weight, abdominal bristles and wing length) in Drosophila melanogaster. Linear and polynomial models were fitted for the regressions of a character in offspring on both parents. It is demonstrated that responses by the characters to selection predicted by the nonlinear regressions may differ substantially from those predicted by the linear regressions. This is true even, and especially, if selection is weak. The realized heritability for a character under selection is shown to be determined not only by the offspring-parent regression but also by the distribution of the character and by the form and strength of selection. PMID:7828818
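
    The comparison of linear and polynomial offspring-parent regressions can be sketched on synthetic data. Everything below is an assumption for illustration: a single standardized parental value stands in for the two-parent regression the abstract describes, and the coefficients and noise level are invented rather than taken from the experiment.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic, standardized parental trait values and a mildly curved offspring response.
parent = rng.normal(0.0, 1.0, size=500)
offspring = 0.4 * parent + 0.15 * parent**2 + rng.normal(0.0, 0.5, size=500)

def residual_sum_of_squares(degree: int) -> float:
    """Fit a polynomial of the given degree by least squares and return its RSS."""
    coeffs = np.polyfit(parent, offspring, degree)
    residuals = offspring - np.polyval(coeffs, parent)
    return float(residuals @ residuals)

rss_linear = residual_sum_of_squares(1)
rss_quadratic = residual_sum_of_squares(2)
print(f"RSS linear: {rss_linear:.1f}, RSS quadratic: {rss_quadratic:.1f}")
```

    Because the linear model is nested in the quadratic one, the quadratic fit can never have a larger residual sum of squares; whether the reduction is large enough to matter is what a formal model comparison would have to decide.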

  11. Accounting for spatial effects in land use regression for urban air pollution modeling.

    PubMed

    Bertazzon, Stefania; Johnson, Markey; Eccles, Kristin; Kaplan, Gilaad G

    2015-01-01

    In order to accurately assess air pollution risks, health studies require spatially resolved pollution concentrations. Land-use regression (LUR) models estimate ambient concentrations at a fine spatial scale. However, spatial effects such as spatial non-stationarity and spatial autocorrelation can reduce the accuracy of LUR estimates by increasing regression errors and uncertainty; and statistical methods for resolving these effects--e.g., spatially autoregressive (SAR) and geographically weighted regression (GWR) models--may be difficult to apply simultaneously. We used an alternate approach to address spatial non-stationarity and spatial autocorrelation in LUR models for nitrogen dioxide. Traditional models were re-specified to include a variable capturing wind speed and direction, and re-fit as GWR models. Mean R^2 values for the resulting GWR-wind models (summer: 0.86, winter: 0.73) showed a 10-20% improvement over traditional LUR models. GWR-wind models effectively addressed both spatial effects and produced meaningful predictive models. These results suggest a useful method for improving spatially explicit models. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  12. Motion patterns in acupuncture needle manipulation.

    PubMed

    Seo, Yoonjeong; Lee, In-Seon; Jung, Won-Mo; Ryu, Ho-Sun; Lim, Jinwoong; Ryu, Yeon-Hee; Kang, Jung-Won; Chae, Younbyoung

    2014-10-01

    In clinical practice, acupuncture manipulation is highly individualised for each practitioner. Before we establish a standard for acupuncture manipulation, it is important to understand completely the manifestations of acupuncture manipulation in the actual clinic. To examine motion patterns during acupuncture manipulation, we generated a fitted model of practitioners' motion patterns and evaluated their consistencies in acupuncture manipulation. Using a motion sensor, we obtained real-time motion data from eight experienced practitioners while they conducted acupuncture manipulation using their own techniques. We calculated the average amplitude and duration of a sampled motion unit for each practitioner and, after normalisation, we generated a true regression curve of motion patterns for each practitioner using generalised additive mixed modelling (GAMM). We observed significant differences in rotation amplitude and duration in motion samples among practitioners. GAMM showed marked variations in average regression curves of motion patterns among practitioners but there was strong consistency in motion parameters for individual practitioners. The fitted regression model showed that the true regression curve accounted for an average of 50.2% of variance in the motion pattern for each practitioner. Our findings suggest that there is great inter-individual variability between practitioners, but remarkable intra-individual consistency within each practitioner. Published by the BMJ Publishing Group Limited.

  13. Thermophysical Property Models for Lunar Regolith

    NASA Technical Reports Server (NTRS)

    Schreiner, Samuel S.; Dominguez, Jesus A.; Sibille, Laurent; Hoffman, Jeffrey A.

    2015-01-01

    We present a set of models for a wide range of lunar regolith material properties. Data from the literature are fit with regression models for the following regolith properties: composition, density, specific heat, thermal conductivity, electrical conductivity, optical absorption length, and latent heat of melting/fusion. These models contain both temperature and composition dependencies so that they can be tailored for a range of applications. These models can enable more consistent, informed analysis and design of lunar regolith processing hardware. Furthermore, these models can be utilized to further inform lunar geological simulations. In addition to regression models for each material property, the raw data is also presented to allow for further interpretation and fitting as necessary.

  14. On Insensitivity of the Chi-Square Model Test to Nonlinear Misspecification in Structural Equation Models

    ERIC Educational Resources Information Center

    Mooijaart, Ab; Satorra, Albert

    2009-01-01

    In this paper, we show that for some structural equation models (SEM), the classical chi-square goodness-of-fit test is unable to detect the presence of nonlinear terms in the model. As an example, we consider a regression model with latent variables and interaction terms. Not only the model test has zero power against that type of…

  15. Empirical Modeling of Plant Gas Fluxes in Controlled Environments

    NASA Technical Reports Server (NTRS)

    Cornett, Jessie David

    1994-01-01

    As humans extend their reach beyond the earth, bioregenerative life support systems must replace the resupply and physical/chemical systems now used. The Controlled Ecological Life Support System (CELSS) will utilize plants to recycle the carbon dioxide (CO2) and excrement produced by humans and return oxygen (O2), purified water and food. CELSS design requires knowledge of gas flux levels for net photosynthesis (PS_n), dark respiration (R_d) and evapotranspiration (ET). Full season gas flux data regarding these processes for wheat (Triticum aestivum), soybean (Glycine max) and rice (Oryza sativa) from published sources were used to develop empirical models. Univariate models relating crop age (days after planting) and gas flux were fit by simple regression. Models are either high order (5th to 8th) or more complex polynomials whose curves describe crop development characteristics. The models provide good estimates of gas flux maxima, but are of limited utility. To broaden the applicability, data were transformed to dimensionless or correlation formats and, again, fit by regression. Polynomials, similar to those in the initial effort, were selected as the most appropriate models. These models indicate that, within a cultivar, gas flux patterns appear remarkably similar prior to maximum flux, but exhibit considerable variation beyond this point. This suggests that more broadly applicable models of plant gas flux are feasible, but univariate models defining gas flux as a function of crop age are too simplistic. Multivariate models using CO2 and crop age were fit for PS_n and R_d by multiple regression. In each case, the selected model is a subset of a full third order model with all possible interactions. These models are improvements over the univariate models because they incorporate more than the single factor, crop age, as the primary variable governing gas flux.
They are still limited, however, by their reliance on the other environmental conditions under which the original data were collected. Three-dimensional plots representing the response surface of each model are included. Suitability of using empirical models to generate engineering design estimates is discussed. Recommendations for the use of more complex multivariate models to increase versatility are included.

  16. Mathematical model of zinc absorption: effects of dietary calcium, protein and iron on zinc absorption

    PubMed Central

    Miller, Leland V.; Krebs, Nancy F.; Hambidge, K. Michael

    2013-01-01

    A previously described mathematical model of Zn absorption as a function of total daily dietary Zn and phytate was fitted to data from studies in which dietary Ca, Fe and protein were also measured. An analysis of regression residuals indicated statistically significant positive relationships between the residuals and Ca, Fe and protein, suggesting that the presence of any of these dietary components enhances Zn absorption. Based on the hypotheses that (1) Ca and Fe both promote Zn absorption by binding with phytate and thereby making it unavailable for binding Zn and (2) protein enhances the availability of Zn for transporter binding, the model was modified to incorporate these effects. The new model of Zn absorption as a function of dietary Zn, phytate, Ca, Fe and protein was then fitted to the data. The proportion of variation in absorbed Zn explained by the new model was 0·88, an increase from 0·82 with the original model. A reduced version of the model without Fe produced an equally good fit to the data and an improved value for the model selection criterion, demonstrating that when dietary Ca and protein are controlled for, there is no evidence that dietary Fe influences Zn absorption. Regression residuals and testing with additional data supported the validity of the new model. It was concluded that dietary Ca and protein modestly enhanced Zn absorption and Fe had no statistically discernable effect. Furthermore, the model provides a meaningful foundation for efforts to model nutrient interactions in mineral absorption. PMID:22617116

  17. Mathematical model of zinc absorption: effects of dietary calcium, protein and iron on zinc absorption.

    PubMed

    Miller, Leland V; Krebs, Nancy F; Hambidge, K Michael

    2013-02-28

    A previously described mathematical model of Zn absorption as a function of total daily dietary Zn and phytate was fitted to data from studies in which dietary Ca, Fe and protein were also measured. An analysis of regression residuals indicated statistically significant positive relationships between the residuals and Ca, Fe and protein, suggesting that the presence of any of these dietary components enhances Zn absorption. Based on the hypotheses that (1) Ca and Fe both promote Zn absorption by binding with phytate and thereby making it unavailable for binding Zn and (2) protein enhances the availability of Zn for transporter binding, the model was modified to incorporate these effects. The new model of Zn absorption as a function of dietary Zn, phytate, Ca, Fe and protein was then fitted to the data. The proportion of variation in absorbed Zn explained by the new model was 0·88, an increase from 0·82 with the original model. A reduced version of the model without Fe produced an equally good fit to the data and an improved value for the model selection criterion, demonstrating that when dietary Ca and protein are controlled for, there is no evidence that dietary Fe influences Zn absorption. Regression residuals and testing with additional data supported the validity of the new model. It was concluded that dietary Ca and protein modestly enhanced Zn absorption and Fe had no statistically discernable effect. Furthermore, the model provides a meaningful foundation for efforts to model nutrient interactions in mineral absorption.

  18. Development of an Agent-Based Model (ABM) to Simulate the Immune System and Integration of a Regression Method to Estimate the Key ABM Parameters by Fitting the Experimental Data

    PubMed Central

    Tong, Xuming; Chen, Jinghang; Miao, Hongyu; Li, Tingting; Zhang, Le

    2015-01-01

    Agent-based models (ABM) and differential equations (DE) are two commonly used methods for immune system simulation. However, it is difficult for ABM to estimate key parameters of the model by incorporating experimental data, whereas the differential equation model is incapable of describing the complicated immune system in detail. To overcome these problems, we developed an integrated ABM regression model (IABMR). It can combine the advantages of ABM and DE by employing ABM to mimic the multi-scale immune system with various phenotypes and types of cells as well as using the input and output of ABM to build up the Loess regression for key parameter estimation. Next, we employed the greedy algorithm to estimate the key parameters of the ABM with respect to the same experimental data set and used ABM to describe a 3D immune system similar to previous studies that employed the DE model. These results indicate that IABMR not only has the potential to simulate the immune system at various scales, phenotypes and cell types, but can also accurately infer the key parameters, as the DE model does. Therefore, this study developed a complex-system modelling mechanism that can simulate the complicated immune system in detail, like ABM, and validate the reliability and efficiency of the model, like DE, by fitting the experimental data. PMID:26535589

  19. Modeling Geodetic Processes with Levy α-Stable Distribution and FARIMA

    NASA Astrophysics Data System (ADS)

    Montillet, Jean-Philippe; Yu, Kegen

    2015-04-01

    In recent years the scientific community has been using the auto regressive moving average (ARMA) model in the modeling of the noise in global positioning system (GPS) time series (daily solution). This work starts with the investigation of the limit of the ARMA model, which is widely used in signal processing when the measurement noise is white. Since a typical GPS time series consists of geophysical signals (e.g., seasonal signal) and stochastic processes (e.g., coloured and white noise), the ARMA model may be inappropriate. Therefore, the application of the fractional auto-regressive integrated moving average (FARIMA) model is investigated. The results using simulated time series as well as real GPS time series from a few selected stations around Australia show that the FARIMA model fits the time series better than other models when the coloured noise is larger than the white noise. The second part of this work focuses on fitting the GPS time series with the family of Levy α-stable distributions. Using this distribution, a hypothesis test is developed to effectively eliminate coarse outliers from GPS time series, achieving better performance than using the rule of thumb of n standard deviations (with n chosen empirically).
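
    The baseline that this abstract compares against, the rule of thumb of n standard deviations, can be sketched in a few lines. This is only the simple rule, not the Levy α-stable hypothesis test the authors develop; the function name `n_sigma_outliers` and the toy series are assumptions for illustration.

```python
import numpy as np

def n_sigma_outliers(series: np.ndarray, n: float = 3.0) -> np.ndarray:
    """Return a boolean mask marking samples more than n standard deviations from the mean."""
    deviation = np.abs(series - series.mean())
    return deviation > n * series.std()

# Toy series: twenty well-behaved samples followed by one coarse outlier.
series = np.array([1.0, -1.0] * 10 + [100.0])
mask = n_sigma_outliers(series, n=3.0)
cleaned = series[~mask]
print("flagged samples:", int(mask.sum()), "remaining samples:", cleaned.size)
```

    One known weakness of the rule is visible even here: the outlier itself inflates the mean and standard deviation used to judge it, so the choice of n is necessarily empirical, as the abstract notes.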

  20. Preliminary results of spatial modeling of selected forest health variables in Georgia

    Treesearch

    Brock Stewart; Chris J. Cieszewski

    2009-01-01

    Variables relating to forest health monitoring, such as mortality, are difficult to predict and model. We present here the results of fitting various spatial regression models to these variables. We interpolate plot-level values compiled from the Forest Inventory and Analysis National Information Management System (FIA-NIMS) data that are related to forest health....

  1. An updated PREDICT breast cancer prognostication and treatment benefit prediction model with independent validation.

    PubMed

    Candido Dos Reis, Francisco J; Wishart, Gordon C; Dicks, Ed M; Greenberg, David; Rashbass, Jem; Schmidt, Marjanka K; van den Broek, Alexandra J; Ellis, Ian O; Green, Andrew; Rakha, Emad; Maishman, Tom; Eccles, Diana M; Pharoah, Paul D P

    2017-05-22

    PREDICT is a breast cancer prognostic and treatment benefit model implemented online. The overall fit of the model has been good in multiple independent case series, but PREDICT has been shown to underestimate breast cancer specific mortality in women diagnosed under the age of 40. Another limitation is the use of discrete categories for tumour size and node status resulting in 'step' changes in risk estimates on moving between categories. We have refitted the PREDICT prognostic model using the original cohort of cases from East Anglia with updated survival time in order to take into account age at diagnosis and to smooth out the survival function for tumour size and node status. Multivariable Cox regression models were used to fit separate models for ER negative and ER positive disease. Continuous variables were fitted using fractional polynomials and a smoothed baseline hazard was obtained by regressing the baseline cumulative hazard for each patient against time using fractional polynomials. The fit of the prognostic models was then tested in three independent data sets that had also been used to validate the original version of PREDICT. In the model fitting data, after adjusting for other prognostic variables, there is an increase in risk of breast cancer specific mortality in younger and older patients with ER positive disease, with a substantial increase in risk for women diagnosed before the age of 35. In ER negative disease the risk increases slightly with age. The association between breast cancer specific mortality and both tumour size and number of positive nodes was non-linear with a more marked increase in risk with increasing size and increasing number of nodes in ER positive disease. The overall calibration and discrimination of the new version of PREDICT (v2) was good and comparable to that of the previous version in both model development and validation data sets.
However, the calibration of v2 improved over v1 in patients diagnosed under the age of 40. The PREDICT v2 is an improved prognostication and treatment benefit model compared with v1. The online version should continue to aid clinical decision making in women with early breast cancer.

  2. Comparison of 1-step and 2-step methods of fitting microbiological models.

    PubMed

    Jewell, Keith

    2012-11-15

    Previous conclusions that a 1-step fitting method gives more precise coefficients than the traditional 2-step method are confirmed by application to three different data sets. It is also shown that, in comparison to 2-step fits, the 1-step method gives better fits to the data (often substantially) with directly interpretable regression diagnostics and standard errors. The improvement is greatest at extremes of environmental conditions and it is shown that 1-step fits can indicate inappropriate functional forms when 2-step fits do not. 1-step fits are better at estimating primary parameters (e.g. lag, growth rate) as well as concentrations, and are much more data efficient, allowing the construction of more robust models on smaller data sets. The 1-step method can be straightforwardly applied to any data set for which the 2-step method can be used and additionally to some data sets where the 2-step method fails. A 2-step approach is appropriate for visual assessment in the early stages of model development, and may be a convenient way to generate starting values for a 1-step fit, but the 1-step approach should be used for any quantitative assessment. Copyright © 2012 Elsevier B.V. All rights reserved.

  3. vFitness: a web-based computing tool for improving estimation of in vitro HIV-1 fitness experiments

    PubMed Central

    2010-01-01

    Background: The replication rate (or fitness) between viral variants has been investigated in vivo and in vitro for human immunodeficiency virus (HIV). HIV fitness plays an important role in the development and persistence of drug resistance. The accurate estimation of viral fitness relies on complicated computations based on statistical methods. This calls for tools that are easy to access and intuitive to use for various experiments of viral fitness. Results: Based on a mathematical model and several statistical methods (least-squares approach and measurement error models), a Web-based computing tool has been developed for improving estimation of virus fitness in growth competition assays of human immunodeficiency virus type 1 (HIV-1). Conclusions: Unlike the two-point calculation used in previous studies, the estimation here uses linear regression methods with all observed data in the competition experiment to more accurately estimate relative viral fitness parameters. The dilution factor is introduced for making the computational tool more flexible to accommodate various experimental conditions. This Web-based tool is implemented in C# language with Microsoft ASP.NET, and is publicly available on the Web at http://bis.urmc.rochester.edu/vFitness/. PMID:20482791

  4. vFitness: a web-based computing tool for improving estimation of in vitro HIV-1 fitness experiments.

    PubMed

    Ma, Jingming; Dykes, Carrie; Wu, Tao; Huang, Yangxin; Demeter, Lisa; Wu, Hulin

    2010-05-18

    The replication rate (or fitness) between viral variants has been investigated in vivo and in vitro for human immunodeficiency virus (HIV). HIV fitness plays an important role in the development and persistence of drug resistance. The accurate estimation of viral fitness relies on complicated computations based on statistical methods. This calls for tools that are easy to access and intuitive to use for various experiments of viral fitness. Based on a mathematical model and several statistical methods (least-squares approach and measurement error models), a Web-based computing tool has been developed for improving estimation of virus fitness in growth competition assays of human immunodeficiency virus type 1 (HIV-1). Unlike the two-point calculation used in previous studies, the estimation here uses linear regression methods with all observed data in the competition experiment to more accurately estimate relative viral fitness parameters. The dilution factor is introduced for making the computational tool more flexible to accommodate various experimental conditions. This Web-based tool is implemented in C# language with Microsoft ASP.NET, and is publicly available on the Web at http://bis.urmc.rochester.edu/vFitness/.
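
    The contrast this abstract draws between a two-point calculation and a regression over all observed time points can be sketched as follows. The sketch assumes a deliberately simplified model in which the log ratio of the two variants changes linearly with time, so that the slope plays the role of the relative fitness parameter; the dilution factor and the measurement error models mentioned in the abstract are omitted, and all numbers are invented.

```python
import numpy as np

def two_point_slope(t: np.ndarray, log_ratio: np.ndarray) -> float:
    """Slope from the first and last observations only."""
    return float((log_ratio[-1] - log_ratio[0]) / (t[-1] - t[0]))

def regression_slope(t: np.ndarray, log_ratio: np.ndarray) -> float:
    """Least-squares slope using every observation."""
    slope, _intercept = np.polyfit(t, log_ratio, 1)
    return float(slope)

# Hypothetical competition experiment: daily samples, true slope of 0.3 per day.
t = np.arange(0.0, 11.0)
true_slope = 0.3
log_ratio = -1.0 + true_slope * t

# Corrupt only the final measurement to mimic one noisy assay.
noisy = log_ratio.copy()
noisy[-1] += 1.0

err_two_point = abs(two_point_slope(t, noisy) - true_slope)
err_regression = abs(regression_slope(t, noisy) - true_slope)
print(f"two-point error: {err_two_point:.3f}, regression error: {err_regression:.3f}")
```

    With clean data the two estimates coincide; once a single end point is perturbed, the two-point estimate absorbs the whole error, while the regression spreads it across all eleven observations.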

  5. Analysis of the Environmental Management System based on ISO 14001 on the American continent.

    PubMed

    Neves, Fábio de Oliveira; Salgado, Eduardo G; Beijo, Luiz A

    2017-09-01

    The American continent is undergoing broad economic and industrial development. Consequently, a more detailed discussion of the impacts generated by such development is needed. Moreover, the number of ISO 14001 certificates issued on this continent is increasing. Nevertheless, no studies were found that identify the influence of different factors on ISO 14001 certification in the Americas. Thus, the main aim of this article is to determine which economic, environmental and cultural factors influence ISO 14001 certification on the American continent. The data were collected from the ISO Survey, the World Bank, the United Nations Development Programme and the International Energy Agency. Thirteen countries of the continent were analyzed, and only two, Brazil and the United States, did not show economic factors as influencing factors in the fitted multiple regression models. In these models, environmental factors all appeared as influencing factors. Only in Brazil did the HDI index appear as a cultural factor in the fitted multiple regression model. The economic factors (Gross Domestic Product and exports of goods and services) and the environmental factors (carbon dioxide (CO2) and fossil fuel consumption) were the most influential in ISO 14001 certification. Venezuela, Uruguay, Colombia and the United States were countries whose factors depended on each other, which characterizes environmental marketing. Briefly, this study has several implications: for academia, it proposes new concepts and guidance on the factors that assist ISO 14001 certification on the American continent; for industry, the factors serve as efficiency parameters for implementing the ISO 14001 standard; and for governments, the factors that do not fit the multiple regression models indicate where improvement is possible. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Flexible Meta-Regression to Assess the Shape of the Benzene–Leukemia Exposure–Response Curve

    PubMed Central

    Vlaanderen, Jelle; Portengen, Lützen; Rothman, Nathaniel; Lan, Qing; Kromhout, Hans; Vermeulen, Roel

    2010-01-01

    Background Previous evaluations of the shape of the benzene–leukemia exposure–response curve (ERC) were based on a single set or on small sets of human occupational studies. Integrating evidence from all available studies that are of sufficient quality combined with flexible meta-regression models is likely to provide better insight into the functional relation between benzene exposure and risk of leukemia. Objectives We used natural splines in a flexible meta-regression method to assess the shape of the benzene–leukemia ERC. Methods We fitted meta-regression models to 30 aggregated risk estimates extracted from nine human observational studies and performed sensitivity analyses to assess the impact of a priori assessed study characteristics on the predicted ERC. Results The natural spline showed a supralinear shape at cumulative exposures less than 100 ppm-years, although this model fitted the data only marginally better than a linear model (p = 0.06). Stratification based on study design and jackknifing indicated that the cohort studies had a considerable impact on the shape of the ERC at high exposure levels (> 100 ppm-years) but that predicted risks for the low exposure range (< 50 ppm-years) were robust. Conclusions Although limited by the small number of studies and the large heterogeneity between studies, the inclusion of all studies of sufficient quality combined with a flexible meta-regression method provides the most comprehensive evaluation of the benzene–leukemia ERC to date. The natural spline based on all data indicates a significantly increased risk of leukemia [relative risk (RR) = 1.14; 95% confidence interval (CI), 1.04–1.26] at an exposure level as low as 10 ppm-years. PMID:20064779

  7. Remote-sensing data processing with the multivariate regression analysis method for iron mineral resource potential mapping: a case study in the Sarvian area, central Iran

    NASA Astrophysics Data System (ADS)

    Mansouri, Edris; Feizi, Faranak; Jafari Rad, Alireza; Arian, Mehran

    2018-03-01

    This paper uses multivariate regression to create a mathematical model for iron skarn exploration in the Sarvian area, central Iran, using multivariate regression for mineral prospectivity mapping (MPM). The main target of this paper is to apply multivariate regression analysis (as an MPM method) to map iron outcrops in the northeastern part of the study area in order to discover new iron deposits in other parts of the study area. Two types of multivariate regression models using two linear equations were employed to discover new mineral deposits. This method is one of the reliable methods for processing satellite images. ASTER satellite images (14 bands) were used as unique independent variables (UIVs), and iron outcrops were mapped as dependent variables for MPM. According to the results of the probability value (p value), coefficient of determination value (R²) and adjusted determination coefficient (R²adj), the second regression model (which consisted of multiple UIVs) fitted better than other models. The accuracy of the model was confirmed by iron outcrops map and geological observation. Based on field observation, iron mineralization occurs at the contact of limestone and intrusive rocks (skarn type).

  8. Groundwater depth prediction in a shallow aquifer in north China by a quantile regression model

    NASA Astrophysics Data System (ADS)

    Li, Fawen; Wei, Wan; Zhao, Yong; Qiao, Jiale

    2017-01-01

    There is a close relationship between groundwater level in a shallow aquifer and the surface ecological environment; hence, it is important to accurately simulate and predict the groundwater level in eco-environmental construction projects. The multiple linear regression (MLR) model is one of the most useful methods to predict groundwater level (depth); however, the predicted values by this model only reflect the mean distribution of the observations and cannot effectively fit the extreme distribution data (outliers). The study reported here builds a prediction model of groundwater-depth dynamics in a shallow aquifer using the quantile regression (QR) method on the basis of the observed data of groundwater depth and related factors. The proposed approach was applied to five sites in Tianjin city, north China, and the groundwater depth was calculated in different quantiles, from which the optimal quantile was screened out according to the box plot method and compared to the values predicted by the MLR model. The results showed that the related factors in the five sites did not follow the standard normal distribution and that there were outliers in the precipitation and last-month (initial state) groundwater-depth factors because the basic assumptions of the MLR model could not be achieved, thereby causing errors. Nevertheless, these conditions had no effect on the QR model, as it could more effectively describe the distribution of original data and had a higher precision in fitting the outliers.

  9. Analysis of Learning Curve Fitting Techniques.

    DTIC Science & Technology

    1987-09-01

    1986. 15. Neter, John and others. Applied Linear Regression Models. Homewood IL: Irwin, 19-33. 16. SAS User’s Guide: Basics, Version 5 Edition. SAS... Linear Regression Techniques (15:23-52). Random errors are assumed to be normally distributed when using ordinary least-squares, according to Johnston...lot estimated by the improvement curve formula. For a more detailed explanation of the ordinary least-squares technique, see Neter, et al., Applied

  10. Methods for scalar-on-function regression.

    PubMed

    Reiss, Philip T; Goldsmith, Jeff; Shang, Han Lin; Ogden, R Todd

    2017-08-01

    Recent years have seen an explosion of activity in the field of functional data analysis (FDA), in which curves, spectra, images, etc. are considered as basic functional data units. A central problem in FDA is how to fit regression models with scalar responses and functional data points as predictors. We review some of the main approaches to this problem, categorizing the basic model types as linear, nonlinear and nonparametric. We discuss publicly available software packages, and illustrate some of the procedures by application to a functional magnetic resonance imaging dataset.

  11. A method for nonlinear exponential regression analysis

    NASA Technical Reports Server (NTRS)

    Junkin, B. G.

    1971-01-01

    A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
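
    The procedure described in this abstract (a linear fit for starting values, then repeated Taylor-series linearization and correction until a stopping criterion is met) resembles what is now usually called Gauss-Newton iteration. The sketch below is an illustration under assumptions, not the report's implementation: it assumes a single-term model y = a*exp(b*t) with strictly positive y, and the function names, tolerance and iteration cap are hypothetical.

```python
import math


def initial_estimate(t, y):
    """Starting values from a straight-line fit of ln(y) against t.

    Assumes every y is positive, so that ln(y) = ln(a) + b*t is defined.
    """
    n = len(t)
    log_y = [math.log(v) for v in y]
    t_mean = sum(t) / n
    l_mean = sum(log_y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (li - l_mean) for ti, li in zip(t, log_y))
    b = sxy / sxx
    a = math.exp(l_mean - b * t_mean)
    return a, b


def gauss_newton_exp(t, y, tol=1e-10, max_iter=50):
    """Least-squares fit of y = a*exp(b*t) by repeated linearization."""
    a, b = initial_estimate(t, y)
    for _ in range(max_iter):
        e = [math.exp(b * ti) for ti in t]
        resid = [yi - a * ei for yi, ei in zip(y, e)]
        # Partial derivatives of the model with respect to a and b.
        ja = e
        jb = [a * ti * ei for ti, ei in zip(t, e)]
        # Normal equations (J^T J) delta = J^T r for the 2x2 case.
        saa = sum(u * u for u in ja)
        sab = sum(u * v for u, v in zip(ja, jb))
        sbb = sum(v * v for v in jb)
        ga = sum(u * r for u, r in zip(ja, resid))
        gb = sum(v * r for v, r in zip(jb, resid))
        det = saa * sbb - sab * sab
        da = (ga * sbb - gb * sab) / det
        db = (saa * gb - sab * ga) / det
        a += da
        b += db
        # Stop once the correction is negligible (hypothetical criterion).
        if abs(da) + abs(db) < tol:
            break
    return a, b
```

    For decay-type data the fitted b is negative. The log-linear starting values matter because the iteration is only locally convergent.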

  12. An Update of the Bodeker Scientific Vertically Resolved, Global, Gap-Free Ozone Database

    NASA Astrophysics Data System (ADS)

    Kremser, S.; Bodeker, G. E.; Lewis, J.; Hassler, B.

    2016-12-01

    High vertical resolution ozone measurements from multiple satellite-based instruments have been merged with measurements from the global ozonesonde network to calculate monthly mean ozone values in 5° latitude zones. Ozone number densities and ozone mixing ratios are provided on 70 altitude levels (1 to 70 km) and on 70 pressure levels spaced approximately 1 km apart (878.4 hPa to 0.046 hPa). These data are sparse and do not cover the entire globe or altitude range. To provide a gap-free database, a least squares regression model is fitted to these data and then evaluated globally. By applying a single fit at each level, and using the approach of allowing the regression fits to change only slightly from one level to the next, the regression is less sensitive to measurement anomalies at individual stations or to individual satellite-based instruments. Particular attention is paid to ensuring that the low ozone abundances in the polar regions are captured. This presentation reports on updates to an earlier version of the vertically resolved ozone database, including the incorporation of new ozone measurements and new techniques for combining the data. Compared to previous versions of the database, particular attention is paid to avoiding spatial and temporal sampling biases and tracing uncertainties through to the final product. This updated database, developed within the New Zealand Deep South National Science Challenge, is suitable for assessing ozone fields from chemistry-climate model simulations or for providing the ozone boundary conditions for global climate model simulations that do not treat stratospheric chemistry interactively.

  13. An Activity-Based Non-Linear Regression Model of Sopite Syndrome and its Effects on Crew Performance in High-Speed Vessel Operations

    DTIC Science & Technology

    2009-03-01

    [Figure 26. l1fit of Mirror Tracer Model] To ensure model... teachers are unfair to students is nonsense. b. Most students don’t realize the extent to which their grades are influenced by accidental happenings...understand how teachers arrive at the grades they give. b. There is a direct connection between how hard I study and the grades I get. 24. a. A

  14. Interquantile Shrinkage in Regression Models

    PubMed Central

    Jiang, Liewen; Wang, Huixia Judy; Bondell, Howard D.

    2012-01-01

    Conventional analysis using quantile regression typically focuses on fitting the regression model at different quantiles separately. However, in situations where the quantile coefficients share some common feature, joint modeling of multiple quantiles to accommodate the commonality often leads to more efficient estimation. One example of common features is that a predictor may have a constant effect over one region of quantile levels but varying effects in other regions. To automatically perform estimation and detection of the interquantile commonality, we develop two penalization methods. When the quantile slope coefficients indeed do not change across quantile levels, the proposed methods will shrink the slopes towards constant and thus improve the estimation efficiency. We establish the oracle properties of the two proposed penalization methods. Through numerical investigations, we demonstrate that the proposed methods lead to estimations with competitive or higher efficiency than the standard quantile regression estimation in finite samples. Supplemental materials for the article are available online. PMID:24363546

  15. Predicting motor vehicle collisions using Bayesian neural network models: an empirical analysis.

    PubMed

    Xie, Yuanchang; Lord, Dominique; Zhang, Yunlong

    2007-09-01

    Statistical models have frequently been used in highway safety studies. They can be utilized for various purposes, including establishing relationships between variables, screening covariates and predicting values. Generalized linear models (GLM) and hierarchical Bayes models (HBM) have been the most common types of model favored by transportation safety analysts. Over the last few years, researchers have proposed the back-propagation neural network (BPNN) model for modeling the phenomenon under study. Compared to GLMs and HBMs, BPNNs have received much less attention in highway safety modeling. The reasons are attributed to the complexity for estimating this kind of model as well as the problem related to "over-fitting" the data. To circumvent the latter problem, some statisticians have proposed the use of Bayesian neural network (BNN) models. These models have been shown to perform better than BPNN models while at the same time reducing the difficulty associated with over-fitting the data. The objective of this study is to evaluate the application of BNN models for predicting motor vehicle crashes. To accomplish this objective, a series of models was estimated using data collected on rural frontage roads in Texas. Three types of models were compared: BPNN, BNN and the negative binomial (NB) regression models. The results of this study show that in general both types of neural network models perform better than the NB regression model in terms of data prediction. Although the BPNN model can occasionally provide better or approximately equivalent prediction performance compared to the BNN model, in most cases its prediction performance is worse than the BNN model. 
    In addition, the data fitting performance of the BPNN model is consistently worse than the BNN model, which suggests that the BNN model has better generalization abilities than the BPNN model and can effectively alleviate the over-fitting problem without significantly compromising the nonlinear approximation ability. The results also show that BNNs could be used for other useful analyses in highway safety, including the development of accident modification factors and for improving the prediction capabilities for evaluating different highway design alternatives.

  16. Body mass index and physical fitness in Brazilian adolescents.

    PubMed

    Lopes, Vitor P; Malina, Robert M; Gomez-Campos, Rossana; Cossio-Bolaños, Marco; Arruda, Miguel de; Hobold, Edilson

    2018-05-05

    Evaluate the relationship between body mass index and physical fitness in a cross-sectional sample of Brazilian youth. Participants were 3849 adolescents (2027 girls) aged 10-17 years. Weight and height were measured; body mass index was calculated. Physical fitness was evaluated with a multistage 20m shuttle run (cardiovascular endurance), standing long jump (power), and push-ups (upper body strength). Participants were grouped by sex into four age groups: 10-11, 12-13, 14-15, and 16-17 years. Sex-specific ANOVA was used to evaluate differences in each physical fitness item among weight status categories by age group. Relationships between body mass index and each physical fitness item were evaluated with quadratic regression models by age group within each sex. The physical fitness of thin and normal youth was, with few exceptions, significantly better than the physical fitness of overweight and obese youth in each age group by sex. On the other hand, physical fitness performances did not consistently differ, on average, between thin and normal weight and between overweight and obese youths. Results of the quadratic regressions indicated a curvilinear (parabolic) relationship between body mass index and each physical fitness item in most age groups. Better performances were attained by adolescents in the mid-range of the body mass index distribution, while performances of youth at the low and high ends of the body mass index distribution were lower. Relationships between the body mass index and physical fitness were generally nonlinear (parabolic) in youth 10-17 years. Copyright © 2018 Sociedade Brasileira de Pediatria. Published by Elsevier Editora Ltda. All rights reserved.
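
    A quadratic regression of the kind mentioned here fits y = c0 + c1*u + c2*u^2 by least squares, and a negative c2 gives the inverted-parabola shape the abstract reports. The sketch below is only an illustration: the data in the usage note are made up, the function names are hypothetical, and the predictor is centered (u = x - mean) purely for numerical stability.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = c0 + c1*u + c2*u**2 with u = x - mean(x).

    Returns (c0, c1, c2, x_mean).
    """
    n = len(x)
    x_mean = sum(x) / n
    u = [xi - x_mean for xi in x]
    # Normal equations for the basis [1, u, u^2], stored as an augmented matrix.
    s = [sum(ui ** k for ui in u) for k in range(5)]
    rhs = [sum(yi * ui ** k for ui, yi in zip(u, y)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [rhs[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [p - f * q for p, q in zip(a[r], a[col])]
    c0, c1, c2 = (a[i][3] / a[i][i] for i in range(3))
    return c0, c1, c2, x_mean


def parabola_vertex(c1, c2, x_mean):
    """Predictor value at which the fitted curve peaks (c2 < 0) or bottoms out."""
    return x_mean - c1 / (2.0 * c2)
```

    With hypothetical body mass index values 16, 18, ..., 30 and scores generated as 100 - 0.5*(BMI - 21)^2, the fit recovers c2 = -0.5 and a vertex at 21, i.e. the best score in the mid-range and lower scores at both ends.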

  17. Can we get some cooperation around here? The mediating role of group norms on the relationship between team personality and individual helping behaviors.

    PubMed

    Gonzalez-Mulé, Erik; DeGeest, David S; McCormick, Brian W; Seong, Jee Young; Brown, Kenneth G

    2014-09-01

    Drawing on the group-norms theory of organizational citizenship behaviors and person-environment fit theory, we introduce and test a multilevel model of the effects of additive and dispersion composition models of team members' personality characteristics on group norms and individual helping behaviors. Our model was tested using regression and random coefficients modeling on 102 research and development teams. Results indicated that high mean levels of extraversion are positively related to individual helping behaviors through the mediating effect of cooperative group norms. Further, low variance on agreeableness (supplementary fit) and high variance on extraversion (complementary fit) promote the enactment of individual helping behaviors, but only the effects of extraversion were mediated by cooperative group norms. Implications of these findings for theories of helping behaviors in teams are discussed. PsycINFO Database Record (c) 2014 APA, all rights reserved.

  18. Improving Lidar-based Aboveground Biomass Estimation with Site Productivity for Central Hardwood Forests, USA

    NASA Astrophysics Data System (ADS)

    Shao, G.; Gallion, J.; Fei, S.

    2016-12-01

    Sound forest aboveground biomass estimation is required to monitor diverse forest ecosystems and their impacts on the changing climate. Lidar-based regression models have provided promising biomass estimations in most forest ecosystems. However, considerable uncertainties of biomass estimations have been reported in the temperate hardwood and hardwood-dominated mixed forests. Varied site productivities in temperate hardwood forests largely diversified height and diameter growth rates, which significantly reduced the correlation between tree height and diameter at breast height (DBH) in mature and complex forests. It is, therefore, difficult to utilize height-based lidar metrics to predict DBH-based field-measured biomass through a simple regression model regardless of the variation of site productivity. In this study, we established a multi-dimensional nonlinear regression model incorporating lidar metrics and site productivity classes derived from soil features. In the regression model, lidar metrics provided horizontal and vertical structural information and productivity classes differentiated good and poor forest sites. The selection and combination of lidar metrics were discussed. Multiple regression models were employed and compared. Uncertainty analysis was applied to the best fit model. The effects of site productivity on the lidar-based biomass model were addressed.

  19. Spatial distribution of psychotic disorders in an urban area of France: an ecological study.

    PubMed

    Pignon, Baptiste; Schürhoff, Franck; Baudin, Grégoire; Ferchiou, Aziz; Richard, Jean-Romain; Saba, Ghassen; Leboyer, Marion; Kirkbride, James B; Szöke, Andrei

    2016-05-18

    Previous analyses of neighbourhood variations of non-affective psychotic disorders (NAPD) have focused mainly on incidence. However, prevalence studies provide important insights on factors associated with disease evolution as well as for healthcare resource allocation. This study aimed to investigate the distribution of prevalent NAPD cases in an urban area in France. The number of cases in each neighbourhood was modelled as a function of potential confounders and ecological variables, namely: migrant density, economic deprivation and social fragmentation. This was modelled using statistical models of increasing complexity: frequentist models (using Poisson and negative binomial regressions), and several Bayesian models. For each model, the validity of the assumptions was checked and the fit to the data was compared, in order to test for possible spatial variation in prevalence. Data showed significant overdispersion (invalidating the Poisson regression model) and residual autocorrelation (suggesting the need to use Bayesian models). The best Bayesian model was Leroux's model (i.e. a model with both strong correlation between neighbouring areas and weaker correlation between areas further apart), with economic deprivation as an explanatory variable (OR = 1.13, 95% CI [1.02-1.25]). In comparison with frequentist methods, the Bayesian model showed a better fit. The number of cases showed non-random spatial distribution and was linked to economic deprivation.
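
    Overdispersion, which the abstract cites as the reason for rejecting the Poisson model, means the counts vary more than a Poisson distribution allows (variance greater than mean). One common diagnostic is the Pearson dispersion statistic. The sketch below is a simplification that assumes an intercept-only model with no covariates, unlike the study's models; the function name and the counts in the usage note are hypothetical.

```python
def pearson_dispersion(counts):
    """Pearson chi-square divided by residual degrees of freedom.

    Intercept-only Poisson model: every area shares the same fitted mean.
    Values well above 1 suggest overdispersion.
    """
    n = len(counts)
    mu = sum(counts) / n
    chi2 = sum((c - mu) ** 2 / mu for c in counts)
    return chi2 / (n - 1)
```

    For made-up neighbourhood counts such as [0, 1, 0, 2, 15, 0, 1, 30, 0, 1] the statistic is far above 1, the pattern that would motivate a negative binomial or spatial model instead of a plain Poisson regression.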

  20. Weakly Informative Prior for Point Estimation of Covariance Matrices in Hierarchical Models

    ERIC Educational Resources Information Center

    Chung, Yeojin; Gelman, Andrew; Rabe-Hesketh, Sophia; Liu, Jingchen; Dorie, Vincent

    2015-01-01

    When fitting hierarchical regression models, maximum likelihood (ML) estimation has computational (and, for some users, philosophical) advantages compared to full Bayesian inference, but when the number of groups is small, estimates of the covariance matrix (S) of group-level varying coefficients are often degenerate. One can do better, even from…

  1. A Growth Model for Academic Program Life Cycle (APLC): A Theoretical and Empirical Analysis

    ERIC Educational Resources Information Center

    Acquah, Edward H. K.

    2010-01-01

    Academic program life cycle concept states each program's life flows through several stages: introduction, growth, maturity, and decline. A mixed-influence diffusion growth model is fitted to enrolment data on academic programs to analyze the factors determining progress of academic programs through their life cycles. The regression analysis yield…

  2. Endoscopic third ventriculostomy in the treatment of childhood hydrocephalus.

    PubMed

    Kulkarni, Abhaya V; Drake, James M; Mallucci, Conor L; Sgouros, Spyros; Roth, Jonathan; Constantini, Shlomi

    2009-08-01

    To develop a model to predict the probability of endoscopic third ventriculostomy (ETV) success in the treatment for hydrocephalus on the basis of a child's individual characteristics. We analyzed 618 ETVs performed consecutively on children at 12 international institutions to identify predictors of ETV success at 6 months. A multivariable logistic regression model was developed on 70% of the dataset (training set) and validated on 30% of the dataset (validation set). In the training set, 305/455 ETVs (67.0%) were successful. The regression model (containing patient age, cause of hydrocephalus, and previous cerebrospinal fluid shunt) demonstrated good fit (Hosmer-Lemeshow, P = .78) and discrimination (C statistic = 0.70). In the validation set, 105/163 ETVs (64.4%) were successful and the model maintained good fit (Hosmer-Lemeshow, P = .45), discrimination (C statistic = 0.68), and calibration (calibration slope = 0.88). A simplified ETV Success Score was devised that closely approximates the predicted probability of ETV success. Children most likely to succeed with ETV can now be accurately identified and spared the long-term complications of CSF shunting.
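
    The C statistic reported for this model measures discrimination: the probability that a randomly chosen success received a higher predicted probability than a randomly chosen failure, with ties counted as one half. The sketch below computes it by direct pair counting. It is an illustration only; the function name is hypothetical and the predicted probabilities in the usage note are invented, not taken from the study.

```python
def c_statistic(predicted, outcomes):
    """Concordance (C) statistic for binary outcomes coded 1 and 0."""
    successes = [p for p, o in zip(predicted, outcomes) if o == 1]
    failures = [p for p, o in zip(predicted, outcomes) if o == 0]
    concordant = 0.0
    for ps in successes:
        for pf in failures:
            if ps > pf:
                concordant += 1.0
            elif ps == pf:
                concordant += 0.5  # ties count as half
    return concordant / (len(successes) * len(failures))
```

    With invented predictions [0.9, 0.8, 0.3, 0.6, 0.2] and outcomes [1, 1, 0, 0, 1], four of the six success/failure pairs are concordant, so C = 4/6. A value of 0.5 corresponds to chance and 1.0 to perfect separation. The pair counting is quadratic in sample size, which is fine for a few hundred cases like the validation set described here.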

  3. Estimating standard errors in feature network models.

    PubMed

    Frank, Laurence E; Heiser, Willem J

    2007-05-01

    Feature network models are graphical structures that represent proximity data in a discrete space while using the same formalism that is the basis of least squares methods employed in multidimensional scaling. Existing methods to derive a network model from empirical data only give the best-fitting network and yield no standard errors for the parameter estimates. The additivity properties of networks make it possible to consider the model as a univariate (multiple) linear regression problem with positivity restrictions on the parameters. In the present study, both theoretical and empirical standard errors are obtained for the constrained regression parameters of a network model with known features. The performance of both types of standard error is evaluated using Monte Carlo techniques.

  4. The use of quantile regression to forecast higher than expected respiratory deaths in a daily time series: a study of New York City data 1987-2000.

    PubMed

    Soyiri, Ireneous N; Reidpath, Daniel D

    2013-01-01

    Forecasting higher than expected numbers of health events provides potentially valuable insights in its own right, and may contribute to health services management and syndromic surveillance. This study investigates the use of quantile regression to predict higher than expected respiratory deaths. Data taken from 70,830 deaths occurring in New York were used. Temporal, weather and air quality measures were fitted using quantile regression at the 90th-percentile with half the data (in-sample). Four QR models were fitted: an unconditional model predicting the 90th-percentile of deaths (Model 1), a seasonal/temporal (Model 2), a seasonal, temporal plus lags of weather and air quality (Model 3), and a seasonal, temporal model with 7-day moving averages of weather and air quality (Model 4). Models were cross-validated with the out of sample data. Performance was measured as proportionate reduction in weighted sum of absolute deviations by a conditional, over unconditional models; i.e., the coefficient of determination (R1). The coefficient of determination showed an improvement over the unconditional model between 0.16 and 0.19. The greatest improvement in predictive and forecasting accuracy of daily mortality was associated with the inclusion of seasonal and temporal predictors (Model 2). No gains were made in the predictive models with the addition of weather and air quality predictors (Models 3 and 4). However, forecasting models that included weather and air quality predictors performed slightly better than the seasonal and temporal model alone (i.e., Model 3 > Model 4 > Model 2). This study provided a new approach to predict higher than expected numbers of respiratory-related deaths. The approach, while promising, has limitations and should be treated at this stage as a proof of concept.
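
    The R1 measure described in this abstract can be written as R1 = 1 - V_conditional / V_unconditional, where V is the weighted sum of absolute deviations (the "pinball" or check loss) at the chosen quantile. The sketch below is a minimal illustration under assumptions: the function names are hypothetical, the unconditional quantile is taken as the sample value that minimizes the loss, and the numbers in the usage note are invented rather than drawn from the New York data.

```python
def pinball_loss(y, predictions, tau):
    """Weighted sum of absolute deviations at quantile level tau."""
    total = 0.0
    for yi, pi in zip(y, predictions):
        d = yi - pi
        # Under-predictions are weighted by tau, over-predictions by 1 - tau.
        total += tau * d if d >= 0 else (tau - 1.0) * d
    return total


def unconditional_quantile(y, tau):
    """A sample value minimizing the pinball loss (an empirical tau-quantile)."""
    return min(y, key=lambda q: pinball_loss(y, [q] * len(y), tau))


def r1(y, conditional_predictions, tau):
    """Proportionate reduction in pinball loss relative to the constant model."""
    q = unconditional_quantile(y, tau)
    v_uncond = pinball_loss(y, [q] * len(y), tau)
    v_cond = pinball_loss(y, conditional_predictions, tau)
    return 1.0 - v_cond / v_uncond
```

    For invented daily counts [10, 12, 14, 20, 30] at tau = 0.9, the unconditional quantile is 30. Predicting 30 every day gives R1 = 0, and a model that reproduced each count exactly would give R1 = 1. The improvements of 0.16 to 0.19 reported above sit on this scale.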

  5. The Use of Quantile Regression to Forecast Higher Than Expected Respiratory Deaths in a Daily Time Series: A Study of New York City Data 1987-2000

    PubMed Central

    Soyiri, Ireneous N.; Reidpath, Daniel D.

    2013-01-01

    Forecasting higher than expected numbers of health events provides potentially valuable insights in its own right, and may contribute to health services management and syndromic surveillance. This study investigates the use of quantile regression to predict higher than expected respiratory deaths. Data taken from 70,830 deaths occurring in New York were used. Temporal, weather and air quality measures were fitted using quantile regression at the 90th-percentile with half the data (in-sample). Four QR models were fitted: an unconditional model predicting the 90th-percentile of deaths (Model 1), a seasonal / temporal (Model 2), a seasonal, temporal plus lags of weather and air quality (Model 3), and a seasonal, temporal model with 7-day moving averages of weather and air quality (Model 4). Models were cross-validated with the out of sample data. Performance was measured as proportionate reduction in weighted sum of absolute deviations by a conditional, over unconditional models; i.e., the coefficient of determination (R1). The coefficient of determination showed an improvement over the unconditional model between 0.16 and 0.19. The greatest improvement in predictive and forecasting accuracy of daily mortality was associated with the inclusion of seasonal and temporal predictors (Model 2). No gains were made in the predictive models with the addition of weather and air quality predictors (Models 3 and 4). However, forecasting models that included weather and air quality predictors performed slightly better than the seasonal and temporal model alone (i.e., Model 3 > Model 4 > Model 2). This study provided a new approach to predict higher than expected numbers of respiratory-related deaths. The approach, while promising, has limitations and should be treated at this stage as a proof of concept. PMID:24147122

  6. Examining Preservice Science Teacher Understanding of Nature of Science: Discriminating Variables on the Aspects of Nature of Science

    NASA Astrophysics Data System (ADS)

    Jones, William I.

    This study examined the understanding of nature of science among participants in their final year of a 4-year undergraduate teacher education program at a Midwest liberal arts university. The Logic Model Process was used as an integrative framework to focus the collection, organization, analysis, and interpretation of the data for the purpose of (1) describing participant understanding of NOS and (2) identifying participant characteristics and teacher education program features related to those understandings. The Views of Nature of Science Questionnaire form C (VNOS-C) was used to survey participant understanding of 7 target aspects of Nature of Science (NOS). A rubric was developed from a review of the literature to categorize and score participant understanding of the target aspects of NOS. Participants' high school and college transcripts, planning guides for their respective teacher education program majors, and science content and science teaching methods course syllabi were examined to identify and categorize participant characteristics and teacher education program features. The R software (R Project for Statistical Computing, 2010) was used to conduct an exploratory analysis to determine correlations of the antecedent and transaction predictor variables with participants' scores on the 7 target aspects of NOS. Fourteen participant characteristics and teacher education program features were moderately and significantly (p < .01) correlated with participant scores on the target aspects of NOS. The 6 antecedent predictor variables were entered into multiple regression analyses to determine the best-fit model of antecedent predictor variables for each target NOS aspect. The transaction predictor variables were entered into separate multiple regression analyses to determine the best-fit model of transaction predictor variables for each target NOS aspect. Variables from the best-fit antecedent and best-fit transaction models for each target aspect of NOS were then combined. A regression analysis for each of the combined models was conducted to determine the relative effect of these variables on the target aspects of NOS. Findings from the multiple regression analyses revealed that each of the fourteen predictor variables was present in the best-fit model for at least 1 of the 7 target aspects of NOS. However, not all of the predictor variables were statistically significant (p < .007) in the models and their effect (beta) varied. Participants in the teacher education program who had higher ACT Math scores, completed more high school science credits, and were enrolled either in the Middle Childhood with a science concentration program major or in the Adolescent/Young Adult Science Education program major were more likely to have an informed understanding on each of the 7 target aspects of NOS. Analyses of the planning guides and the course syllabi in each teacher education program major revealed differences between the program majors that may account for the results.

  7. [Development of the lung cancer diagnostic system].

    PubMed

    Lv, You-Jiang; Yu, Shou-Yi

    2009-07-01

    To develop a lung cancer diagnosis system. A retrospective analysis was conducted in 1883 patients with primary lung cancer or benign pulmonary diseases (pneumonia, tuberculosis, or pneumonia pseudotumor). SPSS11.5 software was used for data processing. For the relevant factors, a non-factor Logistic regression analysis was used followed by establishment of the regression model. Microsoft Visual Studio 2005 system development platform and VB.Net corresponding language were used to develop the lung cancer diagnosis system. The non-factor multi-factor regression model showed a goodness-of-fit (R2) of the model of 0.806, with a diagnostic accuracy for benign lung diseases of 92.8%, a diagnostic accuracy for lung cancer of 89.0%, and an overall accuracy of 90.8%. The model system for early clinical diagnosis of lung cancer has been established.

  8. Fitting Proportional Odds Models to Educational Data in Ordinal Logistic Regression Using Stata, SAS and SPSS

    ERIC Educational Resources Information Center

    Liu, Xing

    2008-01-01

    The proportional odds (PO) model, which is also called cumulative odds model (Agresti, 1996, 2002; Armstrong & Sloan, 1989; Long, 1997; Long & Freese, 2006; McCullagh, 1980; McCullagh & Nelder, 1989; Powers & Xie, 2000; O'Connell, 2006), is one of the most commonly used models for the analysis of ordinal categorical data and comes from the class…

  9. Review and Recommendations for Zero-inflated Count Regression Modeling of Dental Caries Indices in Epidemiological Studies

    PubMed Central

    Stamm, John W.; Long, D. Leann; Kincade, Megan E.

    2012-01-01

    Over the past five to ten years, zero-inflated count regression models have been increasingly applied to the analysis of dental caries indices (e.g., DMFT, dfms, etc). The main reason for that is linked to the broad decline in children’s caries experience, such that dmf and DMF indices more frequently generate low or even zero counts. This article specifically reviews the application of zero-inflated Poisson and zero-inflated negative binomial regression models to dental caries, with emphasis on the description of the models and the interpretation of fitted model results given the study goals. The review finds that interpretations provided in the published caries research are often imprecise or inadvertently misleading, particularly with respect to failing to discriminate between inference for the class of susceptible persons defined by such models and inference for the sampled population in terms of overall exposure effects. Recommendations are provided to enhance the use as well as the interpretation and reporting of results of count regression models when applied to epidemiological studies of dental caries. PMID:22710271
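
    The distinction the review draws between inference for the susceptible class and inference for the sampled population can be illustrated with the zero-inflated Poisson probability mass function. The sketch below is illustrative only: the function names and the parameter values (`lam`, `pi_zero`) are assumptions chosen for the example and are not taken from the article.

```python
import math


def zip_pmf(k, lam, pi_zero):
    """P(Y = k) under a zero-inflated Poisson model.

    pi_zero is the probability of a structural zero (non-susceptible class);
    lam is the Poisson mean within the susceptible class.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from both the structural-zero class and the Poisson class.
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


def zip_overall_mean(lam, pi_zero):
    """Mean count over the whole sampled population."""
    return (1.0 - pi_zero) * lam


# Hypothetical values, for illustration only.
lam, pi_zero = 2.0, 0.4
susceptible_mean = lam                          # mean among susceptible persons
overall_mean = zip_overall_mean(lam, pi_zero)   # mean in the sampled population
```

    Under these assumed values the two means differ (2.0 in the susceptible class, 1.2 overall), which is why a coefficient from the count part of the model should not be read as an overall exposure effect.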

  10. Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

    PubMed Central

    Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric

    2016-01-01

    Regression clustering is a statistical learning and data mining method that mixes unsupervised and supervised learning and is found in a wide range of applications, including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with analyzing a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939
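
    The alternation between partitioning and regression described in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: it uses a fixed number of clusters, ordinary least squares on a single predictor, and a user-supplied initial labelling, and it omits the robust estimation and model selection steps the paper covers. All function names and data are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares line through (x, y) points; returns (slope, intercept)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx


def regression_clustering(points, labels, n_clusters, n_iter=20):
    """Alternate between fitting one line per cluster and relabelling points."""
    lines = []
    for _ in range(n_iter):
        # Regression step: fit a line to the current members of each cluster.
        lines = []
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Fallback for a (near-)empty cluster: a flat line at zero.
            lines.append(fit_line(members) if len(members) >= 2 else (0.0, 0.0))
        # Partition step: assign each point to the line with the smallest residual.
        new_labels = []
        for x, y in points:
            errors = [(y - (s * x + b)) ** 2 for s, b in lines]
            new_labels.append(errors.index(min(errors)))
        if new_labels == labels:
            break
        labels = new_labels
    return labels, lines


# Synthetic noise-free data from two lines: y = 2x + 1 and y = -x + 30.
points = [(x, 2 * x + 1) for x in range(4)] + [(x, -x + 30) for x in range(4)]
initial = [0, 0, 0, 1, 1, 1, 1, 1]   # one point deliberately mislabelled
labels, lines = regression_clustering(points, initial, n_clusters=2)
```

    As with k-means, the result of such an alternation depends on the initial labelling, so in practice several starts would typically be compared.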

  11. Modeling Antimicrobial Activity of Clorox(R) Using an Agar-Diffusion Test: A New Twist On an Old Experiment.

    ERIC Educational Resources Information Center

    Mitchell, James K.; Carter, William E.

    2000-01-01

    Describes using a computer statistical software package called Minitab to model the sensitivity of several microbes to the disinfectant NaOCl (Clorox(R)) using the Kirby-Bauer technique. Each group of students collects data from one microbe, conducts regression analyses, then chooses the best-fit model based on the highest r-values obtained.…

  12. Bootstrap evaluation of a young Douglas-fir height growth model for the Pacific Northwest

    Treesearch

    Nicholas R. Vaughn; Eric C. Turnblom; Martin W. Ritchie

    2010-01-01

    We evaluated the stability of a complex regression model developed to predict the annual height growth of young Douglas-fir. This model is highly nonlinear and is fit in an iterative manner for annual growth coefficients from data with multiple periodic remeasurement intervals. The traditional methods for such a sensitivity analysis either involve laborious math or...

  13. 40 CFR 80.48 - Augmentation of the complex emission model by vehicle testing.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... section, the analysis shall fit a regression model to a combined data set that includes vehicle testing... logarithm of emissions contained in this combined data set: (A) A term for each vehicle that shall reflect... nearest limit of the data core, using the unaugmented complex model. (B) “B” shall be set equal to the...

  14. 40 CFR 80.48 - Augmentation of the complex emission model by vehicle testing.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... section, the analysis shall fit a regression model to a combined data set that includes vehicle testing... logarithm of emissions contained in this combined data set: (A) A term for each vehicle that shall reflect... nearest limit of the data core, using the unaugmented complex model. (B) “B” shall be set equal to the...

  15. 40 CFR 80.48 - Augmentation of the complex emission model by vehicle testing.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... section, the analysis shall fit a regression model to a combined data set that includes vehicle testing... logarithm of emissions contained in this combined data set: (A) A term for each vehicle that shall reflect... nearest limit of the data core, using the unaugmented complex model. (B) “B” shall be set equal to the...

  16. 40 CFR 80.48 - Augmentation of the complex emission model by vehicle testing.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... section, the analysis shall fit a regression model to a combined data set that includes vehicle testing... logarithm of emissions contained in this combined data set: (A) A term for each vehicle that shall reflect... nearest limit of the data core, using the unaugmented complex model. (B) “B” shall be set equal to the...

  17. 40 CFR 80.48 - Augmentation of the complex emission model by vehicle testing.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... section, the analysis shall fit a regression model to a combined data set that includes vehicle testing... logarithm of emissions contained in this combined data set: (A) A term for each vehicle that shall reflect... nearest limit of the data core, using the unaugmented complex model. (B) “B” shall be set equal to the...

  18. Predicting recreational water quality advisories: A comparison of statistical methods

    USGS Publications Warehouse

    Brooks, Wesley R.; Corsi, Steven R.; Fienen, Michael N.; Carvin, Rebecca B.

    2016-01-01

    Epidemiological studies indicate that fecal indicator bacteria (FIB) in beach water are associated with illnesses among people having contact with the water. In order to mitigate public health impacts, many beaches are posted with an advisory when the concentration of FIB exceeds a beach action value. The most commonly used method of measuring FIB concentration takes 18–24 h before returning a result. In order to avoid the 24 h lag, it has become common to “nowcast” the FIB concentration using statistical regressions on environmental surrogate variables. Most commonly, nowcast models are estimated using ordinary least squares regression, but other regression methods from the statistical and machine learning literature are sometimes used. This study compares 14 regression methods across 7 Wisconsin beaches to identify which consistently produces the most accurate predictions. A random forest model is identified as the most accurate, followed by multiple regression fit using the adaptive LASSO.

  19. What are hierarchical models and how do we analyze them?

    USGS Publications Warehouse

    Royle, Andy

    2016-01-01

    In this chapter we provide a basic definition of hierarchical models and introduce the two canonical hierarchical models in this book: site occupancy and N-mixture models. The former is a hierarchical extension of logistic regression and the latter is a hierarchical extension of Poisson regression. We introduce basic concepts of probability modeling and statistical inference including likelihood and Bayesian perspectives. We go through the mechanics of maximizing the likelihood and characterizing the posterior distribution by Markov chain Monte Carlo (MCMC) methods. We give a general perspective on topics such as model selection and assessment of model fit, although we demonstrate these topics in practice in later chapters (especially Chapters 5, 6, 7, and 10).

  20. Analysis of volumetric response of pituitary adenomas receiving adjuvant CyberKnife stereotactic radiosurgery with the application of an exponential fitting model.

    PubMed

    Yu, Yi-Lin; Yang, Yun-Ju; Lin, Chin; Hsieh, Chih-Chuan; Li, Chiao-Zhu; Feng, Shao-Wei; Tang, Chi-Tun; Chung, Tzu-Tsao; Ma, Hsin-I; Chen, Yuan-Hao; Ju, Da-Tong; Hueng, Dueng-Yuan

    2017-01-01

    Tumor control rates of pituitary adenomas (PAs) receiving adjuvant CyberKnife stereotactic radiosurgery (CK SRS) are high. However, there is currently no uniform way to estimate the time course of the disease. The aim of this study was to analyze the volumetric responses of PAs after CK SRS and investigate the application of an exponential decay model in calculating an accurate time course and estimation of the eventual outcome. A retrospective review of 34 patients with PAs who received adjuvant CK SRS between 2006 and 2013 was performed. Tumor volume was calculated using the planimetric method. The percent change in tumor volume and tumor volume rate of change were compared at median 4-, 10-, 20-, and 36-month intervals. Tumor responses were classified as: progression for >15% volume increase, regression for ≤15% decrease, and stabilization for ±15% of the baseline volume at the time of last follow-up. For each patient, the volumetric change versus time was fitted with an exponential model. The overall tumor control rate was 94.1% in the 36-month (range 18-87 months) follow-up period (mean volume change of -43.3%). Volume regression (mean decrease of -50.5%) was demonstrated in 27 (79%) patients, tumor stabilization (mean change of -3.7%) in 5 (15%) patients, and tumor progression (mean increase of 28.1%) in 2 (6%) patients (P = 0.001). Tumors that eventually regressed or stabilized had a temporary volume increase of 1.07% and 41.5% at 4 months after CK SRS, respectively (P = 0.017). The tumor volume estimated using the exponential fitting equation demonstrated high positive correlation with the actual volume calculated by magnetic resonance imaging (MRI) as tested by Pearson correlation coefficient (0.9). Transient progression of PAs post-CK SRS was seen in 62.5% of the patients receiving CK SRS, and it was not predictive of eventual volume regression or progression. A three-point exponential model is of potential predictive value according to relative distribution. An exponential decay model can be used to calculate the time course of tumors that are ultimately controlled.
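
    A per-patient exponential fit of volume against time, as described in this abstract, might look like the sketch below. The abstract does not give the authors' parametrization, so the single-exponential form V(t) = V0 * exp(-k * t), the log-linear least-squares fit, and the synthetic volumes are all assumptions made for illustration; only the follow-up times echo the intervals mentioned in the text.

```python
import math


def fit_exponential(times, volumes):
    """Fit V(t) = V0 * exp(-k * t) by least squares on ln(V); returns (V0, k).

    Assumes all volumes are strictly positive.
    """
    logs = [math.log(v) for v in volumes]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    stt = sum((t - mean_t) ** 2 for t in times)
    stl = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    slope = stl / stt                      # slope of ln(V) against t equals -k
    v0 = math.exp(mean_l - slope * mean_t)
    return v0, -slope


# Synthetic volumes (arbitrary units) generated from V0 = 10, k = 0.05 per month.
months = [0, 4, 10, 20, 36]
volumes = [10.0 * math.exp(-0.05 * t) for t in months]

v0, k = fit_exponential(months, volumes)
half_time = math.log(2) / k                # months for the volume to halve
```

    A transient volume increase soon after treatment, which the abstract reports for many patients, would not be captured by a single decaying exponential of this form.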

  1. Prediction of Compressional Wave Velocity Using Regression and Neural Network Modeling and Estimation of Stress Orientation in Bokaro Coalfield, India

    NASA Astrophysics Data System (ADS)

    Paul, Suman; Ali, Muhammad; Chatterjee, Rima

    2018-01-01

    Velocity of compressional wave (Vp) of coal and non-coal lithology is predicted from five wells from the Bokaro coalfield (CF), India. Shear sonic travel time logs are not recorded for all wells under the study area. Shear wave velocity (Vs) is available only for two wells: one from east and other from west Bokaro CF. The major lithologies of this CF are dominated by coal and shaly coal of the Barakar formation. This paper focuses on the (a) relationship between Vp and Vs, (b) prediction of Vp using regression and neural network modeling and (c) estimation of maximum horizontal stress from image log. Coal is characterized by low acoustic impedance (AI) as compared to the overlying and underlying strata. The cross-plot between AI and Vp/Vs is able to identify coal, shaly coal, shale and sandstone from wells in Bokaro CF. The relationship between Vp and Vs is obtained with excellent goodness of fit (R2) ranging from 0.90 to 0.93. Linear multiple regression and multi-layered feed-forward neural network (MLFN) models are developed for prediction of Vp from two wells using four input log parameters: gamma ray, resistivity, bulk density and neutron porosity. Regression model predicted Vp shows poor (R2 = 0.28) to good (R2 = 0.79) fit with the observed velocity. MLFN model predicted Vp indicates satisfactory to good R2 values varying from 0.62 to 0.92 with the observed velocity. Maximum horizontal stress orientation from a well at west Bokaro CF is studied from Formation Micro-Imager (FMI) log. Breakouts and drilling-induced fractures (DIFs) are identified from the FMI log. Breakout length of 4.5 m is oriented towards N60°W whereas the orientation of DIFs for a cumulative length of 26.5 m is varying from N15°E to N35°E. The mean maximum horizontal stress in this CF is towards N28°E.

  2. Detecting outliers when fitting data with nonlinear regression – a new method based on robust nonlinear regression and the false discovery rate

    PubMed Central

    Motulsky, Harvey J; Brown, Ronald E

    2006-01-01

    Background: Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation, and lead to misleading results. However, we know of no practical method for routinely identifying outliers when fitting curves with nonlinear regression. Results: We describe a new method for identifying outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the method proceeds. To define outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the outliers, and analyze the data using ordinary least-squares regression. Because the method combines robust regression and outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method detects (falsely) one or more outliers in only about 1–3% of experiments. When analyzing data contaminated with one or several outliers, the ROUT method performs well at outlier identification, with an average False Discovery Rate less than 1%. Conclusion: Our method, which combines a new method of robust nonlinear regression with a new method of outlier identification, identifies outliers from nonlinear curve fits with reasonable power and few false positives. PMID:16526949
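
    The abstract says the outlier definition adapts the false discovery rate approach but gives no details. As a point of reference only, the sketch below shows the standard Benjamini-Hochberg step-up procedure applied to a list of per-point P-values. It is not the ROUT method itself: the function name, the threshold `q`, and the P-values are hypothetical, and the robust Lorentzian fit that would produce such P-values is not shown.

```python
def benjamini_hochberg(p_values, q=0.01):
    """Return a list of booleans marking which P-values are flagged at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices, smallest P first
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: find the largest rank whose P-value is under q * rank / m.
        if p_values[i] <= q * rank / m:
            cutoff_rank = rank
    flagged = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            flagged[i] = True
    return flagged


# Hypothetical P-values for five residuals, for illustration only.
p_values = [0.0001, 0.5, 0.8, 0.004, 0.3]
flags = benjamini_hochberg(p_values, q=0.05)
```

    In an outlier-removal workflow of the kind the abstract outlines, the flagged points would be dropped before the final ordinary least-squares fit.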

  3. A spatially explicit approach to the study of socio-demographic inequality in the spatial distribution of trees across Boston neighborhoods.

    PubMed

    Duncan, Dustin T; Kawachi, Ichiro; Kum, Susan; Aldstadt, Jared; Piras, Gianfranco; Matthews, Stephen A; Arbia, Giuseppe; Castro, Marcia C; White, Kellee; Williams, David R

    2014-04-01

    The racial/ethnic and income composition of neighborhoods often influences local amenities, including the potential spatial distribution of trees, which are important for population health and community wellbeing, particularly in urban areas. This ecological study used spatial analytical methods to assess the relationship between neighborhood socio-demographic characteristics (i.e. minority racial/ethnic composition and poverty) and tree density at the census tract level in Boston, Massachusetts (US). We examined spatial autocorrelation with the Global Moran's I for all study variables and in the ordinary least squares (OLS) regression residuals as well as computed Spearman correlations non-adjusted and adjusted for spatial autocorrelation between socio-demographic characteristics and tree density. Next, we fit traditional regressions (i.e. OLS regression models) and spatial regressions (i.e. spatial simultaneous autoregressive models), as appropriate. We found significant positive spatial autocorrelation for all neighborhood socio-demographic characteristics (Global Moran's I range from 0.24 to 0.86, all P = 0.001), for tree density (Global Moran's I = 0.452, P = 0.001), and in the OLS regression residuals (Global Moran's I range from 0.32 to 0.38, all P < 0.001). Therefore, we fit the spatial simultaneous autoregressive models. There was a negative correlation between neighborhood percent non-Hispanic Black and tree density (rS = -0.19; conventional P-value = 0.016; spatially adjusted P-value = 0.299) as well as a negative correlation between predominantly non-Hispanic Black (over 60% Black) neighborhoods and tree density (rS = -0.18; conventional P-value = 0.019; spatially adjusted P-value = 0.180). While the conventional OLS regression model found a marginally significant inverse relationship between Black neighborhoods and tree density, we found no statistically significant relationship between neighborhood socio-demographic composition and tree density in the spatial regression models. Methodologically, our study suggests the need to take into account spatial autocorrelation as findings/conclusions can change when the spatial autocorrelation is ignored. Substantively, our findings suggest no need for policy intervention vis-à-vis trees in Boston, though we hasten to add that replication studies, and more nuanced data on tree quality, age and diversity are needed.
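
    The Global Moran's I statistic used in this study can be computed directly from its definition. The sketch below is a minimal illustration on a toy chain of four areal units with binary adjacency weights; the weight matrix and the values are assumptions for the example and have nothing to do with the Boston census tracts, and no significance test is included.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n-by-n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between every pair of units.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(d * d for d in dev)


def chain_weights(n):
    """Binary adjacency weights for n units arranged in a line (hypothetical layout)."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(4)
i_trend = morans_i([1, 2, 3, 4], w)         # neighbours similar: positive I
i_alternating = morans_i([1, 4, 1, 4], w)   # neighbours dissimilar: negative I
```

    Positive values, as reported here for the OLS residuals, indicate that neighbouring units resemble each other, which violates the independence assumption behind the conventional P-values.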

  4. Evaluating abundance and trends in a Hawaiian avian community using state-space analysis

    USGS Publications Warehouse

    Camp, Richard J.; Brinck, Kevin W.; Gorresen, P.M.; Paxton, Eben H.

    2016-01-01

    Estimating population abundances and patterns of change over time are important in both ecology and conservation. Trend assessment typically entails fitting a regression to a time series of abundances to estimate population trajectory. However, changes in abundance estimates from year-to-year across time are due to both true variation in population size (process variation) and variation due to imperfect sampling and model fit. State-space models are a relatively new method that can be used to partition the error components and quantify trends based only on process variation. We compare a state-space modelling approach with a more traditional linear regression approach to assess trends in uncorrected raw counts and detection-corrected abundance estimates of forest birds at Hakalau Forest National Wildlife Refuge, Hawai‘i. Most species demonstrated similar trends using either method. In general, evidence for trends using state-space models was less strong than for linear regression, as measured by estimates of precision. However, while the state-space models may sacrifice precision, the expectation is that these estimates provide a better representation of the real world biological processes of interest because they are partitioning process variation (environmental and demographic variation) and observation variation (sampling and model variation). The state-space approach also provides annual estimates of abundance which can be used by managers to set conservation strategies, and can be linked to factors that vary by year, such as climate, to better understand processes that drive population trends.

  5. Kepler AutoRegressive Planet Search: Motivation & Methodology

    NASA Astrophysics Data System (ADS)

    Caceres, Gabriel; Feigelson, Eric; Jogesh Babu, G.; Bahamonde, Natalia; Bertin, Karine; Christen, Alejandra; Curé, Michel; Meza, Cristian

    2015-08-01

    The Kepler AutoRegressive Planet Search (KARPS) project uses statistical methodology associated with autoregressive (AR) processes to model Kepler lightcurves in order to improve exoplanet transit detection in systems with high stellar variability. We also introduce a planet-search algorithm to detect transits in time-series residuals after application of the AR models. One of the main obstacles in detecting faint planetary transits is the intrinsic stellar variability of the host star. The variability displayed by many stars may have autoregressive properties, wherein later flux values are correlated with previous ones in some manner. Auto-Regressive Moving-Average (ARMA) models, Generalized Auto-Regressive Conditional Heteroskedasticity (GARCH), and related models are flexible, phenomenological methods used with great success to model stochastic temporal behaviors in many fields of study, particularly econometrics. Powerful statistical methods are implemented in the public statistical software environment R and its many packages. Modeling involves maximum likelihood fitting, model selection, and residual analysis. These techniques provide a useful framework to model stellar variability and are used in KARPS with the objective of reducing stellar noise to enhance opportunities to find as-yet-undiscovered planets. Our analysis procedure consists of three steps: pre-processing of the data to remove discontinuities, gaps and outliers; ARMA-type model selection and fitting; and transit signal search of the residuals using a new Transit Comb Filter (TCF) that replaces traditional box-finding algorithms. We apply the procedures to simulated Kepler-like time series with known stellar and planetary signals to evaluate the effectiveness of the KARPS procedures. The ARMA-type modeling is effective at reducing stellar noise, but also reduces and transforms the transit signal into ingress/egress spikes. A periodogram based on the TCF is constructed to concentrate the signal of these periodic spikes. When a periodic transit is found, the model is displayed on a standard period-folded averaged light curve. We also illustrate the efficient coding in R.
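
    The remark that autoregressive modeling turns a transit into ingress/egress spikes can be seen in a toy case. The sketch below fits a first-order autoregressive model by conditional least squares to a noise-free, box-shaped dip and inspects the residuals. It is a deliberately simplified stand-in: KARPS uses ARMA-type models fitted by maximum likelihood in R, whereas the AR(1) form without a mean term, the function names, and the synthetic flux series here are all assumptions.

```python
def fit_ar1(series):
    """Conditional least-squares estimate of phi in x[t] = phi * x[t-1] + e[t]."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den


def ar1_residuals(series, phi):
    """Residuals e[t] = x[t] - phi * x[t-1]; the first residual is set to 0.0."""
    return [0.0] + [series[t] - phi * series[t - 1]
                    for t in range(1, len(series))]


# Synthetic flux: flat at 0 with a box-shaped dip of depth 1 at samples 10..14.
flux = [0.0] * 30
for t in range(10, 15):
    flux[t] = -1.0

phi = fit_ar1(flux)
resid = ar1_residuals(flux, phi)
# Indices of the two largest residuals in absolute value.
largest = sorted(range(len(resid)), key=lambda t: abs(resid[t]), reverse=True)[:2]
```

    In this toy series the two largest residuals fall at the first sample of the dip and the first sample after it, while the in-transit residuals are much smaller, which is the pattern a comb-style filter would then need to pick up.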

  6. Older driver fitness-to-drive evaluation using naturalistic driving data.

    PubMed

    Guo, Feng; Fang, Youjia; Antin, Jonathan F

    2015-09-01

    As our driving population continues to age, it is becoming increasingly important to find a small set of easily administered fitness metrics that can meaningfully and reliably identify at-risk seniors requiring more in-depth evaluation of their driving skills and weaknesses. Sixty driver assessment metrics related to fitness-to-drive were examined for 20 seniors who were followed for a year using the naturalistic driving paradigm. Principal component analysis and negative binomial regression modeling approaches were used to develop parsimonious models relating the most highly predictive of the driver assessment metrics to the safety-related outcomes observed in the naturalistic driving data. This study provides important confirmation using naturalistic driving methods of the relationship between contrast sensitivity and crash-related events. The results of this study provide crucial information on the continuing journey to identify metrics and protocols that could be applied to determine seniors' fitness to drive. Published by Elsevier Ltd.

  7. GWAS with longitudinal phenotypes: performance of approximate procedures

    PubMed Central

    Sikorska, Karolina; Montazeri, Nahid Mostafavi; Uitterlinden, André; Rivadeneira, Fernando; Eilers, Paul HC; Lesaffre, Emmanuel

    2015-01-01

    Analysis of genome-wide association studies with longitudinal data using standard procedures, such as linear mixed model (LMM) fitting, leads to discouragingly long computation times. There is a need to speed up the computations significantly. In our previous work (Sikorska et al: Fast linear mixed model computations for genome-wide association studies with longitudinal data. Stat Med 2012; 32.1: 165–180), we proposed the conditional two-step (CTS) approach as a fast method providing an approximation to the P-value for the longitudinal single-nucleotide polymorphism (SNP) effect. In the first step a reduced conditional LMM is fit, omitting all the SNP terms. In the second step, the estimated random slopes are regressed on SNPs. The CTS has been applied to the bone mineral density data from the Rotterdam Study and proved to work very well even in unbalanced situations. In another article (Sikorska et al: GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies. BMC Bioinformatics 2013; 14: 166), we suggested semi-parallel computations, greatly speeding up fitting many linear regressions. Combining CTS with fast linear regression reduces the computation time from several weeks to a few minutes on a single computer. Here, we explore further the properties of the CTS both analytically and by simulations. We investigate the performance of our proposal in comparison with a related but different approach, the two-step procedure. It is analytically shown that for the balanced case, under mild assumptions, the P-value provided by the CTS is the same as from the LMM. For unbalanced data and in realistic situations, simulations show that the CTS method does not inflate the type I error rate and implies only a minimal loss of power. PMID:25712081

  8. Multivariate prediction of upper limb prosthesis acceptance or rejection.

    PubMed

    Biddiss, Elaine A; Chau, Tom T

    2008-07-01

    To develop a model for prediction of upper limb prosthesis use or rejection. A questionnaire exploring factors in prosthesis acceptance was distributed internationally to individuals with upper limb absence through community-based support groups and rehabilitation hospitals. A total of 191 participants (59 prosthesis rejecters and 132 prosthesis wearers) were included in this study. A logistic regression model, a C5.0 decision tree, and a radial basis function neural network were developed and compared in terms of sensitivity (prediction of prosthesis rejecters), specificity (prediction of prosthesis wearers), and overall cross-validation accuracy. The logistic regression and neural network provided comparable overall accuracies of approximately 84 +/- 3%, specificity of 93%, and sensitivity of 61%. Fitting time-frame emerged as the predominant predictor. Individuals fitted within two years of birth (congenital) or six months of amputation (acquired) were 16 times more likely to continue prosthesis use. To increase rates of prosthesis acceptance, clinical directives should focus on timely, client-centred fitting strategies and the development of improved prostheses and healthcare for individuals with high-level or bilateral limb absence. Multivariate analyses are useful in determining the relative importance of the many factors involved in prosthesis acceptance and rejection.
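
    The reported sensitivity, specificity, and overall accuracy are linked through the class sizes (59 rejecters, 132 wearers). The sketch below checks that link with a confusion matrix whose cell counts are back-calculated from the rounded percentages. Those counts are hypothetical: the abstract reports cross-validated rates, not a confusion matrix.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)     # share of rejecters correctly predicted
    specificity = tn / (tn + fp)     # share of wearers correctly predicted
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts chosen to match the rounded rates in the abstract:
# 36 of 59 rejecters and 123 of 132 wearers classified correctly.
sens, spec, acc = classification_rates(tp=36, fn=23, tn=123, fp=9)
```

    With these assumed counts the overall accuracy comes out near 83%, inside the reported range of approximately 84 +/- 3%. The value is dominated by the larger wearer group, which is why accuracy alone would hide the much lower sensitivity.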

  9. Lead bioaccumulation in Texas Harvester Ants (Pogonomyrmex barbatus) and toxicological implications for Texas Horned Lizard (Phrynosoma cornutum) populations of Bexar County, Texas.

    PubMed

    Burgess, Robert; Davis, Robert; Edwards, Deborah

    2018-03-01

    Uptake of lead from soil was examined in order to establish a site-specific ecological protective concentration level for the Texas Horned Lizard (Phrynosoma cornutum) at the Former Humble Refinery in San Antonio, Texas. Soils, harvester ants, and rinse water from the ants were analyzed at 11 Texas Harvester Ant (Pogonomyrmex barbatus) mounds. Soil concentrations at the harvester ant mounds ranged from 13 to 7474 mg/kg of lead dry weight. Ant tissue sample concentrations ranged from < 0.82 to 21.17 mg/kg dry weight. Rinse water concentrations were below the reporting limit in the majority of samples. Two uptake models were developed for the ants. A bioaccumulation factor model did not fit the data, as there was a strong decay in the calculated value with rising soil concentrations. A univariate natural log-transformed regression model produced a significant regression (p < .0001) with a high coefficient of determination (0.82), indicating a good fit to the data. Other diagnostic regression statistics indicated that the regression model could be reliably used to predict concentrations of lead in harvester ants from soil concentrations. Estimates of protective levels for P. cornutum were developed using published sub-chronic toxicological findings for the Western Fence Lizard that were allometrically adapted and compared to the Dose oral equation, which estimated lead consumed through ants plus incidental soil ingestion. The no observed adverse effect level toxicological limit for P. cornutum was estimated to be 5500 mg/kg.
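
    The two uptake models in this abstract are related algebraically, which helps explain why a constant bioaccumulation factor (BAF) can fail where a log-log regression fits. The abstract reports no coefficients, so the intercept a and slope b below are placeholder symbols, and the assumption that both variables are natural-log transformed is ours.

```latex
\mathrm{BAF} = \frac{C_{\text{ant}}}{C_{\text{soil}}},
\qquad
\ln C_{\text{ant}} = a + b \,\ln C_{\text{soil}}
\;\Longrightarrow\;
C_{\text{ant}} = e^{a}\, C_{\text{soil}}^{\,b}
\;\Longrightarrow\;
\mathrm{BAF} = e^{a}\, C_{\text{soil}}^{\,b-1}
```

    Under this form, any slope b < 1 makes the implied BAF fall as soil concentration rises, which would be consistent with the decay in the calculated factor that the authors describe.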

  10. Cost-of-illness studies based on massive data: a prevalence-based, top-down regression approach.

    PubMed

    Stollenwerk, Björn; Welchowski, Thomas; Vogl, Matthias; Stock, Stephanie

    2016-04-01

    Despite the increasing availability of routine data, no analysis method has yet been presented for cost-of-illness (COI) studies based on massive data. We aim, first, to present such a method and, second, to assess the relevance of the associated gain in numerical efficiency. We propose a prevalence-based, top-down regression approach consisting of five steps: aggregating the data; fitting a generalized additive model (GAM); predicting costs via the fitted GAM; comparing predicted costs between prevalent and non-prevalent subjects; and quantifying the stochastic uncertainty via error propagation. To demonstrate the method, it was applied to aggregated data in the context of chronic lung disease to German sickness funds data (from 1999), covering over 7.3 million insured. To assess the gain in numerical efficiency, the computational time of the innovative approach has been compared with corresponding GAMs applied to simulated individual-level data. Furthermore, the probability of model failure was modeled via logistic regression. Applying the innovative method was reasonably fast (19 min). In contrast, regarding patient-level data, computational time increased disproportionately by sample size. Furthermore, using patient-level data was accompanied by a substantial risk of model failure (about 80 % for 6 million subjects). The gain in computational efficiency of the innovative COI method seems to be of practical relevance. Furthermore, it may yield more precise cost estimates.

  11. Body Composition of Bangladeshi Children: Comparison and Development of Leg-to-Leg Bioelectrical Impedance Equation

    PubMed Central

    Khan, I.; Hawlader, Sophie Mohammad Delwer Hossain; Arifeen, Shams El; Moore, Sophie; Hills, Andrew P.; Wells, Jonathan C.; Persson, Lars-Åke; Kabir, Iqbal

    2012-01-01

    The aim of this study was to investigate the validity of the Tanita TBF 300A leg-to-leg bioimpedance analyzer for estimating fat-free mass (FFM) in Bangladeshi children aged 4-10 years and to develop novel prediction equations for use in this population, using deuterium dilution as the reference method. Two hundred Bangladeshi children were enrolled. The isotope dilution technique with deuterium oxide was used for estimation of total body water (TBW). FFM estimated by Tanita was compared with results of deuterium oxide dilution technique. Novel prediction equations were created for estimating FFM, using linear regression models, fitting child's height and impedance as predictors. There was a significant difference in FFM and percentage of body fat (BF%) between methods (p<0.01), Tanita underestimating TBW in boys (p=0.001) and underestimating BF% in girls (p<0.001). A basic linear regression model with height and impedance explained 83% of the variance in FFM estimated by deuterium oxide dilution technique. The best-fit equation to predict FFM from linear regression modelling was achieved by adding weight, sex, and age to the basic model, bringing the adjusted R2 to 89% (standard error=0.90, p<0.001). These data suggest Tanita analyzer may be a valid field-assessment technique in Bangladeshi children when using population-specific prediction equations, such as the ones developed here. PMID:23082630

  12. Carbon emissions risk map from deforestation in the tropical Amazon

    NASA Astrophysics Data System (ADS)

    Ometto, J.; Soler, L. S.; Assis, T. D.; Oliveira, P. V.; Aguiar, A. P.

    2011-12-01

    This work aims to estimate the carbon emissions from tropical deforestation in the Brazilian Amazon associated with the risk assessment of future land use change. The emissions are estimated by incorporating temporal deforestation dynamics, accounting for the biophysical and socioeconomic heterogeneity in the region, as well as secondary forest growth dynamics in abandoned areas. The land cover change model that supported the risk assessment of deforestation was run based on linear regressions. This method takes into account the spatial heterogeneity of deforestation, as the spatial variables adopted to fit the final regression model comprise environmental aspects, economic attractiveness, accessibility and land tenure structure. After fitting a suitable regression model for each land cover category, the potential of each cell to be deforested (at 25x25 km and 5x5 km resolution) in the near future was used to calculate the risk assessment of land cover change. The carbon emissions model combines high-resolution new forest clear-cut mapping and four alternative sources of spatial information on biomass distribution for different vegetation types. The risk assessment map of CO2 emissions was obtained by crossing the simulation results of the historical land cover changes with a map of aboveground biomass contained in the remaining forest. This final map represents the risk of CO2 emissions at 25x25 km and 5x5 km until 2020, under a scenario of carbon emission reduction target.

  13. Meta-regression analysis of the effect of trans fatty acids on low-density lipoprotein cholesterol.

    PubMed

    Allen, Bruce C; Vincent, Melissa J; Liska, DeAnn; Haber, Lynne T

    2016-12-01

    We conducted a meta-regression of controlled clinical trial data to investigate quantitatively the relationship between dietary intake of industrial trans fatty acids (iTFA) and increased low-density lipoprotein cholesterol (LDL-C). Previous regression analyses included insufficient data to determine the nature of the dose response in the low-dose region and have nonetheless assumed a linear relationship between iTFA intake and LDL-C levels. This work contributes to the previous work by 1) including additional studies examining low-dose intake (identified using an evidence mapping procedure); 2) investigating a range of curve shapes, including both linear and nonlinear models; and 3) using Bayesian meta-regression to combine results across trials. We found that, contrary to previous assumptions, the linear model does not acceptably fit the data, while the nonlinear, S-shaped Hill model fits the data well. Based on a conservative estimate of the degree of intra-individual variability in LDL-C (0.1 mmol/L), as an estimate of a change in LDL-C that is not adverse, a change in iTFA intake of 2.2% of energy intake (%en) (corresponding to a total iTFA intake of 2.2-2.9%en) does not cause adverse effects on LDL-C. The iTFA intake associated with this change in LDL-C is substantially higher than the average iTFA intake (0.5%en). Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
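
    The contrast between a linear and an S-shaped Hill dose-response curve can be illustrated with a minimal sketch. The function names, the parameterization (e_max, ed50, hill_n) and all numeric values below are assumptions chosen for illustration; they are not the fitted values or the Bayesian procedure from the study.

```python
def hill_response(dose, e_max, ed50, hill_n):
    """Hill curve: e_max * dose^n / (ed50^n + dose^n).

    e_max  : plateau response at very high dose (hypothetical parameter)
    ed50   : dose giving half of e_max (hypothetical parameter)
    hill_n : steepness; values above 1 give the S-shaped form
    """
    if dose < 0:
        raise ValueError("dose must be non-negative")
    if dose == 0:
        return 0.0
    return e_max * dose ** hill_n / (ed50 ** hill_n + dose ** hill_n)


def linear_response(dose, slope):
    """Linear no-intercept alternative: response proportional to dose."""
    return slope * dose


if __name__ == "__main__":
    # Illustrative values only; at low doses the Hill curve stays near zero
    # while the linear curve already rises in proportion to dose.
    for dose in (0.0, 0.5, 1.0, 2.0, 4.0, 8.0):
        print(dose, hill_response(dose, 1.0, 2.0, 3), linear_response(dose, 0.25))
```

    With a steepness above 1 the Hill curve is nearly flat in the low-dose region, which is the qualitative feature that distinguishes it from a straight line through the origin.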

  14. A novel variational Bayes multiple locus Z-statistic for genome-wide association studies with Bayesian model averaging

    PubMed Central

    Logsdon, Benjamin A.; Carty, Cara L.; Reiner, Alexander P.; Dai, James Y.; Kooperberg, Charles

    2012-01-01

    Motivation: For many complex traits, including height, the majority of variants identified by genome-wide association studies (GWAS) have small effects, leaving a significant proportion of the heritable variation unexplained. Although many penalized multiple regression methodologies have been proposed to increase the power to detect associations for complex genetic architectures, they generally lack mechanisms for false-positive control and diagnostics for model over-fitting. Our methodology is the first penalized multiple regression approach that explicitly controls Type I error rates and provides model over-fitting diagnostics through a novel normally distributed statistic defined for every marker within the GWAS, based on results from a variational Bayes spike regression algorithm. Results: We compare the performance of our method to the lasso and single marker analysis on simulated data and demonstrate that our approach has superior performance in terms of power and Type I error control. In addition, using the Women's Health Initiative (WHI) SNP Health Association Resource (SHARe) GWAS of African-Americans, we show that our method has power to detect additional novel associations with body height. These findings replicate by reaching a stringent cutoff of marginal association in a larger cohort. Availability: An R-package, including an implementation of our variational Bayes spike regression (vBsr) algorithm, is available at http://kooperberg.fhcrc.org/soft.html. Contact: blogsdon@fhcrc.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22563072

  15. Impact of weather factors on hand, foot and mouth disease, and its role in short-term incidence trend forecast in Huainan City, Anhui Province.

    PubMed

    Zhao, Desheng; Wang, Lulu; Cheng, Jian; Xu, Jun; Xu, Zhiwei; Xie, Mingyu; Yang, Huihui; Li, Kesheng; Wen, Lingying; Wang, Xu; Zhang, Heng; Wang, Shusi; Su, Hong

    2017-03-01

    Hand, foot, and mouth disease (HFMD) is one of the most common communicable diseases in China, and current climate change has been recognized as a significant contributor. Nevertheless, no reliable models have been put forward to predict the dynamics of HFMD cases based on short-term weather variations. The present study aimed to examine the association between weather factors and HFMD, and to explore the accuracy of a seasonal auto-regressive integrated moving average (SARIMA) model with local weather conditions in forecasting HFMD. Weather and HFMD data from 2009 to 2014 in Huainan, China, were used. A Poisson regression model combined with a distributed lag non-linear model (DLNM) was applied to examine the relationship between weather factors and HFMD. Forecasting of HFMD was performed using the SARIMA model. The results showed that temperature rise was significantly associated with an elevated risk of HFMD. Yet, no correlations between relative humidity, barometric pressure and rainfall, and HFMD were observed. SARIMA models with a temperature variable fitted HFMD data better than the model without it (sR2 increased, while the BIC decreased), and the SARIMA (0, 1, 1)(0, 1, 0)52 model offered the best fit for HFMD data. In addition, compared with females and nursery children, males and scattered children may be more suitable groups for using the SARIMA model to predict the number of HFMD cases, and the model has high precision. In conclusion, high temperature could increase the risk of contracting HFMD. A SARIMA model with a temperature variable can effectively improve forecast accuracy, which can provide valuable information for policy makers and public health practitioners to construct a best-fitting model and optimize HFMD prevention.

  16. Application of seasonal auto-regressive integrated moving average model in forecasting the incidence of hand-foot-mouth disease in Wuhan, China.

    PubMed

    Peng, Ying; Yu, Bin; Wang, Peng; Kong, De-Guang; Chen, Bang-Hua; Yang, Xiao-Bing

    2017-12-01

    Outbreaks of hand-foot-mouth disease (HFMD) have occurred many times and caused serious health burden in China since 2008. Application of modern information technology to prediction and early response can be helpful for efficient HFMD prevention and control. A seasonal auto-regressive integrated moving average (ARIMA) model for time series analysis was designed in this study. Eighty-four-month (from January 2009 to December 2015) retrospective data obtained from the Chinese Information System for Disease Prevention and Control were subjected to ARIMA modeling. The coefficient of determination (R2), normalized Bayesian Information Criterion (BIC) and Q-test P value were used to evaluate the goodness-of-fit of constructed models. Subsequently, the best-fitted ARIMA model was applied to predict the expected incidence of HFMD from January 2016 to December 2016. The best-fitted seasonal ARIMA model was identified as (1,0,1)(0,1,1)12, with the largest coefficient of determination (R2 = 0.743) and lowest normalized BIC (BIC = 3.645) value. The residuals of the model also showed non-significant autocorrelations (Box-Ljung Q test, P = 0.299). The predictions by the optimum ARIMA model adequately captured the pattern in the data and exhibited two peaks of activity over the forecast interval, including a major peak during April to June, and again a light peak for September to November. The ARIMA model proposed in this study can forecast HFMD incidence trend effectively, which could provide useful support for future HFMD prevention and control in the study area. Besides, further observations should be added continually into the modeling data set, and parameters of the models should be adjusted accordingly.
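
    In the notation (p,d,q)(P,D,Q)s, the seasonal part (0,1,1)12 of the model above has D = 1 with period s = 12, meaning the monthly series is seasonally differenced once (each value minus the value 12 months earlier) before the moving-average terms are fitted. A minimal sketch of that differencing step follows; the function name and the synthetic series are illustrative assumptions, and the sketch does not estimate any ARIMA coefficients.

```python
def seasonal_difference(series, period):
    """Return y[t] - y[t - period] for every t >= period.

    The result is `period` values shorter than the input.
    """
    if period <= 0:
        raise ValueError("period must be a positive integer")
    if len(series) <= period:
        raise ValueError("series must be longer than one seasonal period")
    return [series[t] - series[t - period] for t in range(period, len(series))]


if __name__ == "__main__":
    # Synthetic monthly series (not study data): a linear trend of 2 per month
    # plus a pattern that repeats exactly every 12 months.
    monthly = [2 * t + (t % 12) for t in range(36)]
    # The repeating pattern cancels; only the 12-month trend increment remains.
    print(seasonal_difference(monthly, 12))
```

    Seasonal differencing removes a stable yearly pattern, so what remains in the differenced series is the year-over-year change that the remaining model terms have to explain.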

  17. Impact of weather factors on hand, foot and mouth disease, and its role in short-term incidence trend forecast in Huainan City, Anhui Province

    NASA Astrophysics Data System (ADS)

    Zhao, Desheng; Wang, Lulu; Cheng, Jian; Xu, Jun; Xu, Zhiwei; Xie, Mingyu; Yang, Huihui; Li, Kesheng; Wen, Lingying; Wang, Xu; Zhang, Heng; Wang, Shusi; Su, Hong

    2017-03-01

    Hand, foot, and mouth disease (HFMD) is one of the most common communicable diseases in China, and current climate change has been recognized as a significant contributor. Nevertheless, no reliable models have been put forward to predict the dynamics of HFMD cases based on short-term weather variations. The present study aimed to examine the association between weather factors and HFMD, and to explore the accuracy of a seasonal auto-regressive integrated moving average (SARIMA) model with local weather conditions in forecasting HFMD. Weather and HFMD data from 2009 to 2014 in Huainan, China, were used. A Poisson regression model combined with a distributed lag non-linear model (DLNM) was applied to examine the relationship between weather factors and HFMD. Forecasting of HFMD was performed using the SARIMA model. The results showed that temperature rise was significantly associated with an elevated risk of HFMD. Yet, no correlations between relative humidity, barometric pressure and rainfall, and HFMD were observed. SARIMA models with a temperature variable fitted HFMD data better than the model without it (sR2 increased, while the BIC decreased), and the SARIMA (0, 1, 1)(0, 1, 0)52 model offered the best fit for HFMD data. In addition, compared with females and nursery children, males and scattered children may be more suitable groups for using the SARIMA model to predict the number of HFMD cases, and the model has high precision. In conclusion, high temperature could increase the risk of contracting HFMD. A SARIMA model with a temperature variable can effectively improve forecast accuracy, which can provide valuable information for policy makers and public health practitioners to construct a best-fitting model and optimize HFMD prevention.

  18. [Prediction of schistosomiasis infection rates of population based on ARIMA-NARNN model].

    PubMed

    Ke-Wei, Wang; Yu, Wu; Jin-Ping, Li; Yu-Yu, Jiang

    2016-07-12

    To explore the effect of the autoregressive integrated moving average model-nonlinear auto-regressive neural network (ARIMA-NARNN) model on predicting schistosomiasis infection rates of population. The ARIMA model, NARNN model and ARIMA-NARNN model were established based on monthly schistosomiasis infection rates from January 2005 to February 2015 in Jiangsu Province, China. The fitting and prediction performances of the three models were compared. Compared to the ARIMA model and NARNN model, the mean square error (MSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) of the ARIMA-NARNN model were the least with the values of 0.0111, 0.0900 and 0.2824, respectively. The ARIMA-NARNN model could effectively fit and predict schistosomiasis infection rates of population, which might have a great application value for the prevention and control of schistosomiasis.
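
    The three error measures named above have standard definitions, sketched here in plain Python. The function names and sample numbers are illustrative assumptions, and MAPE is returned as a fraction rather than a percentage, which is also an assumption about how the abstract reports it.

```python
def mse(actual, predicted):
    """Mean square error: average of squared differences."""
    _check(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error: average of absolute differences."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction of the actual value.

    Undefined when any actual value is zero.
    """
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def _check(actual, predicted):
    # Both sequences must be non-empty and of equal length.
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


if __name__ == "__main__":
    observed = [1.0, 2.0, 4.0]   # made-up values, not study data
    forecast = [1.0, 1.0, 5.0]
    print(mse(observed, forecast), mae(observed, forecast), mape(observed, forecast))
```

    Comparing models on all three measures, as the study does, guards against a model that looks good only because one measure happens to favour it (for example, MAPE weights errors on small actual values more heavily than MSE does).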

  19. Fast and exact Newton and Bidirectional fitting of Active Appearance Models.

    PubMed

    Kossaifi, Jean; Tzimiropoulos, Yorgos; Pantic, Maja

    2016-12-21

    Active Appearance Models (AAMs) are generative models of shape and appearance that have proven very attractive for their ability to handle wide changes in illumination, pose and occlusion when trained in the wild, while not requiring large training datasets like regression-based or deep learning methods. The problem of fitting an AAM is usually formulated as a non-linear least squares one and the main way of solving it is a standard Gauss-Newton algorithm. In this paper we extend Active Appearance Models in two ways: we first extend the Gauss-Newton framework by formulating a bidirectional fitting method that deforms both the image and the template to fit a new instance. We then formulate a second order method by deriving an efficient Newton method for AAMs fitting. We derive both methods in a unified framework for two types of Active Appearance Models, holistic and part-based, and additionally show how to exploit the structure in the problem to derive fast yet exact solutions. We perform a thorough evaluation of all algorithms on three challenging and recently annotated in-the-wild datasets, and investigate fitting accuracy, convergence properties and the influence of noise in the initialisation. We compare our proposed methods to other algorithms and show that they yield state-of-the-art results, out-performing other methods while having superior convergence properties.

  20. Determining factors influencing survival of breast cancer by fuzzy logistic regression model.

    PubMed

    Nikbakht, Roya; Bahrampour, Abbas

    2017-01-01

    The fuzzy logistic regression model can be used for determining influential factors of disease. This study explores the important predictive factors for survival of breast cancer patients. We used breast cancer data collected by the cancer registry of Kerman University of Medical Sciences during the period 2000-2007. Variables such as morphology, grade, age, and treatments (surgery, radiotherapy, and chemotherapy) were applied in the fuzzy logistic regression model. Performance of the model was determined in terms of mean degree of membership (MDM). The study results showed that almost 41% of patients were in the neoplasm and malignant group and more than two-thirds of them were still alive after 5-year follow-up. Based on the fuzzy logistic model, the most important factors influencing survival were chemotherapy, morphology, and radiotherapy, respectively. Furthermore, the MDM criterion shows that the fuzzy logistic regression has a good fit to the data (MDM = 0.86). The fuzzy logistic regression model showed that chemotherapy is more important than radiotherapy in survival of patients with breast cancer. In addition, another ability of this model is calculating possibilistic odds of survival in cancer patients. The results of this study can be applied in clinical research. Furthermore, there are few studies which have applied fuzzy logistic models, and we recommend using this model in various research areas.

  1. High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics

    PubMed Central

    Carvalho, Carlos M.; Chang, Jeffrey; Lucas, Joseph E.; Nevins, Joseph R.; Wang, Quanli; West, Mike

    2010-01-01

    We describe studies in molecular profiling and biological pathway analysis that use sparse latent factor and regression models for microarray gene expression data. We discuss breast cancer applications and key aspects of the modeling and computational methodology. Our case studies aim to investigate and characterize heterogeneity of structure related to specific oncogenic pathways, as well as links between aggregate patterns in gene expression profiles and clinical biomarkers. Based on the metaphor of statistically derived “factors” as representing biological “subpathway” structure, we explore the decomposition of fitted sparse factor models into pathway subcomponents and investigate how these components overlay multiple aspects of known biological activity. Our methodology is based on sparsity modeling of multivariate regression, ANOVA, and latent factor models, as well as a class of models that combines all components. Hierarchical sparsity priors address questions of dimension reduction and multiple comparisons, as well as scalability of the methodology. The models include practically relevant non-Gaussian/nonparametric components for latent structure, underlying often quite complex non-Gaussianity in multivariate expression patterns. Model search and fitting are addressed through stochastic simulation and evolutionary stochastic search methods that are exemplified in the oncogenic pathway studies. Supplementary supporting material provides more details of the applications, as well as examples of the use of freely available software tools for implementing the methodology. PMID:21218139

  2. A Method of Trigonometric Modelling of Seasonal Variation Demonstrated with Multiple Sclerosis Relapse Data.

    PubMed

    Spelman, Tim; Gray, Orla; Lucas, Robyn; Butzkueven, Helmut

    2015-12-09

    This report describes a novel Stata-based application of trigonometric regression modelling to 55 years of multiple sclerosis relapse data from 46 clinical centers across 20 countries located in both hemispheres. Central to the success of this method was the strategic use of plot analysis to guide and corroborate the statistical regression modelling. Initial plot analysis was necessary for establishing realistic hypotheses regarding the presence and structural form of seasonal and latitudinal influences on relapse probability and then testing the performance of the resultant models. Trigonometric regression was then necessary to quantify these relationships, adjust for important confounders and provide a measure of certainty as to how plausible these associations were. Synchronization of graphing techniques with regression modelling permitted a systematic refinement of models until best-fit convergence was achieved, enabling novel inferences to be made regarding the independent influence of both season and latitude in predicting relapse onset timing in MS. These methods have the potential for application across other complex disease and epidemiological phenomena suspected or known to vary systematically with season and/or geographic location.
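
    Trigonometric regression of a seasonal pattern can be sketched as fitting y_t = mean + a*cos(2*pi*t/12) + b*sin(2*pi*t/12) to monthly values. The sketch below is not the authors' Stata implementation: it assumes equally spaced monthly data covering whole years, in which case the cosine and sine regressors are orthogonal and the least-squares coefficients reduce to the simple sums shown. It ignores confounders and latitude, and all names and numbers are hypothetical.

```python
import math


def fit_annual_sinusoid(monthly_values):
    """Least-squares fit of mean + a*cos(w*t) + b*sin(w*t), with w = 2*pi/12.

    Assumes t = 0, 1, 2, ... indexes consecutive months and that the series
    covers a whole number of years, so the closed-form sums below coincide
    with the ordinary least-squares solution.
    """
    n = len(monthly_values)
    if n == 0 or n % 12 != 0:
        raise ValueError("need a non-empty series covering whole years")
    w = 2.0 * math.pi / 12.0
    mean = sum(monthly_values) / n
    a = 2.0 / n * sum(y * math.cos(w * t) for t, y in enumerate(monthly_values))
    b = 2.0 / n * sum(y * math.sin(w * t) for t, y in enumerate(monthly_values))
    amplitude = math.hypot(a, b)
    # Month offset (0 <= peak < 12) at which the fitted curve is highest.
    peak_month = (math.atan2(b, a) / w) % 12.0
    return {"mean": mean, "a": a, "b": b,
            "amplitude": amplitude, "peak_month": peak_month}


if __name__ == "__main__":
    w = 2.0 * math.pi / 12.0
    # Synthetic two-year series built from known coefficients (not study data).
    series = [10 + 3 * math.cos(w * t) + 4 * math.sin(w * t) for t in range(24)]
    print(fit_annual_sinusoid(series))
```

    The amplitude and peak month summarize the strength and timing of the seasonal effect; with unequally spaced data or covariates, a general least-squares or likelihood-based fit would be needed instead of these closed-form sums.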

  3. Residential magnetic fields predicted from wiring configurations: I. Exposure model.

    PubMed

    Bowman, J D; Thomas, D C; Jiang, L; Jiang, F; Peters, J M

    1999-10-01

    A physically based model for residential magnetic fields from electric transmission and distribution wiring was developed to reanalyze the Los Angeles study of childhood leukemia by London et al. For this exposure model, magnetic field measurements were fitted to a function of wire configuration attributes that was derived from a multipole expansion of the Law of Biot and Savart. The model parameters were determined by nonlinear regression techniques, using wiring data, distances, and the geometric mean of the ELF magnetic field magnitude from 24-h bedroom measurements taken at 288 homes during the epidemiologic study. The best fit to the measurement data was obtained with separate models for the two major utilities serving Los Angeles County. This model's predictions produced a correlation of 0.40 with the measured fields, an improvement on the 0.27 correlation obtained with the Wertheimer-Leeper (WL) wire code. For the leukemia risk analysis in a companion paper, the regression model predicts exposures to the 24-h geometric mean of the ELF magnetic fields in Los Angeles homes where only wiring data and distances have been obtained. Since these input parameters for the exposure model usually do not change for many years, the predicted magnetic fields will be stable over long time periods, just like the WL code. If the geometric mean is not the exposure metric associated with cancer, this regression technique could be used to estimate long-term exposures to temporal variability metrics and other characteristics of the ELF magnetic field which may be cancer risk factors.

  4. Regression analysis of current-status data: an application to breast-feeding.

    PubMed

    Grummer-strawn, L M

    1993-09-01

    "Although techniques for calculating mean survival time from current-status data are well known, their use in multiple regression models is somewhat troublesome. Using data on current breast-feeding behavior, this article considers a number of techniques that have been suggested in the literature, including parametric, nonparametric, and semiparametric models as well as the application of standard schedules. Models are tested in both proportional-odds and proportional-hazards frameworks....I fit [the] models to current status data on breast-feeding from the Demographic and Health Survey (DHS) in six countries: two African (Mali and Ondo State, Nigeria), two Asian (Indonesia and Sri Lanka), and two Latin American (Colombia and Peru)." excerpt

  5. Multivariate logistic regression for predicting total culturable virus presence at the intake of a potable-water treatment plant: novel application of the atypical coliform/total coliform ratio.

    PubMed

    Black, L E; Brion, G M; Freitas, S J

    2007-06-01

    Predicting the presence of enteric viruses in surface waters is a complex modeling problem. Multiple water quality parameters that indicate the presence of human fecal material, the load of fecal material, and the amount of time fecal material has been in the environment are needed. This paper presents the results of a multiyear study of raw-water quality at the inlet of a potable-water plant that related 17 physical, chemical, and biological indices to the presence of enteric viruses as indicated by cytopathic changes in cell cultures. It was found that several simple, multivariate logistic regression models that could reliably identify observations of the presence or absence of total culturable virus could be fitted. The best models developed combined a fecal age indicator (the atypical coliform [AC]/total coliform [TC] ratio), the detectable presence of a human-associated sterol (epicoprostanol) to indicate the fecal source, and one of several fecal load indicators (the levels of Giardia species cysts, coliform bacteria, and coprostanol). The best fit to the data was found when the AC/TC ratio, the presence of epicoprostanol, and the density of fecal coliform bacteria were input into a simple, multivariate logistic regression equation, resulting in 84.5% and 78.6% accuracies for the identification of the presence and absence of total culturable virus, respectively. The AC/TC ratio was the most influential input variable in all of the models generated, but producing the best prediction required additional input related to the fecal source and the fecal load. The potential for replacing microbial indicators of fecal load with levels of coprostanol was proposed and evaluated by multivariate logistic regression modeling for the presence and absence of virus.

  6. Issues and Importance of "Good" Starting Points for Nonlinear Regression for Mathematical Modeling with Maple: Basic Model Fitting to Make Predictions with Oscillating Data

    ERIC Educational Resources Information Center

    Fox, William

    2012-01-01

    The purpose of our modeling effort is to predict future outcomes. We assume the data collected are both accurate and relatively precise. For our oscillating data, we examined several mathematical modeling forms for predictions. We also examined both ignoring the oscillations as an important feature and including the oscillations as an important…

  7. A Bayesian model averaging method for the derivation of reservoir operating rules

    NASA Astrophysics Data System (ADS)

    Zhang, Jingwen; Liu, Pan; Wang, Hao; Lei, Xiaohui; Zhou, Yanlai

    2015-09-01

    Because the intrinsic dynamics among optimal decision making, inflow processes and reservoir characteristics are complex, functional forms of reservoir operating rules are always determined subjectively. As a result, the uncertainty of selecting form and/or model involved in reservoir operating rules must be analyzed and evaluated. In this study, we analyze the uncertainty of reservoir operating rules using the Bayesian model averaging (BMA) model. Three popular operating rules, namely piecewise linear regression, surface fitting and a least-squares support vector machine, are established based on the optimal deterministic reservoir operation. These individual models provide three-member decisions for the BMA combination, enabling the 90% release interval to be estimated by the Markov Chain Monte Carlo simulation. A case study of China's Baise reservoir shows that: (1) the optimal deterministic reservoir operation, superior to any reservoir operating rules, is used as the samples to derive the rules; (2) the least-squares support vector machine model is more effective than both piecewise linear regression and surface fitting; (3) BMA outperforms any individual model of operating rules based on the optimal trajectories. It is revealed that the proposed model can reduce the uncertainty of operating rules, which is of great potential benefit in evaluating the confidence interval of decisions.

  8. Surface complexation modeling of zinc sorption onto ferrihydrite.

    PubMed

    Dyer, James A; Trivedi, Paras; Scrivner, Noel C; Sparks, Donald L

    2004-02-01

    A previous study involving lead(II) [Pb(II)] sorption onto ferrihydrite over a wide range of conditions highlighted the advantages of combining molecular- and macroscopic-scale investigations with surface complexation modeling to predict Pb(II) speciation and partitioning in aqueous systems. In this work, an extensive collection of new macroscopic and spectroscopic data was used to assess the ability of the modified triple-layer model (TLM) to predict single-solute zinc(II) [Zn(II)] sorption onto 2-line ferrihydrite in NaNO3 solutions as a function of pH, ionic strength, and concentration. Regression of constant-pH isotherm data, together with potentiometric titration and pH edge data, was a much more rigorous test of the modified TLM than fitting pH edge data alone. When coupled with valuable input from spectroscopic analyses, good fits of the isotherm data were obtained with a one-species, one-Zn-sorption-site model using the bidentate-mononuclear surface complex, (≡FeO)2Zn; however, surprisingly, both the density of Zn(II) sorption sites and the value of the best-fit equilibrium "constant" for the bidentate-mononuclear complex had to be adjusted with pH to adequately fit the isotherm data. Although spectroscopy provided some evidence for multinuclear surface complex formation at surface loadings approaching site saturation at pH ≥ 6.5, the assumption of a bidentate-mononuclear surface complex provided acceptable fits of the sorption data over the entire range of conditions studied. Regressing edge data in the absence of isotherm and spectroscopic data resulted in a fair number of surface-species/site-type combinations that provided acceptable fits of the edge data, but unacceptable fits of the isotherm data. A linear relationship between log K for (≡FeO)2Zn and pH was found, given by log K[(≡FeO)2Zn, at 1 g/L] = 2.058(pH) - 6.131. In addition, a surface activity coefficient term was introduced to the model to reduce the ionic strength dependence of sorption. The results of this research and previous work with Pb(II) indicate that the existing thermodynamic framework for the modified TLM is able to reproduce the metal sorption data only over a limited range of conditions. For this reason, much work still needs to be done in fine-tuning the thermodynamic framework and databases for the TLM.

  9. Handling nonnormality and variance heterogeneity for quantitative sublethal toxicity tests.

    PubMed

    Ritz, Christian; Van der Vliet, Leana

    2009-09-01

    The advantages of using regression-based techniques to derive endpoints from environmental toxicity data are clear, and slowly, this superior analytical technique is gaining acceptance. As use of regression-based analysis becomes more widespread, some of the associated nuances and potential problems come into sharper focus. Looking at data sets that cover a broad spectrum of standard test species, we noticed that some model fits to data failed to meet two key assumptions (variance homogeneity and normality) that are necessary for correct statistical analysis via regression-based techniques. Failure to meet these assumptions often is caused by reduced variance at the concentrations showing severe adverse effects. Although commonly used with linear regression analysis, transformation of the response variable only is not appropriate when fitting data using nonlinear regression techniques. Through analysis of sample data sets, including Lemna minor, Eisenia andrei (terrestrial earthworm), and algae, we show that both the so-called Box-Cox transformation and use of the Poisson distribution can help to correct variance heterogeneity and nonnormality and so allow nonlinear regression analysis to be implemented. Both the Box-Cox transformation and the Poisson distribution can be readily implemented into existing protocols for statistical analysis. By correcting for nonnormality and variance heterogeneity, these two statistical tools can be used to encourage the transition to regression-based analysis and the depreciation of less-desirable and less-flexible analytical techniques, such as linear interpolation.
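
    The Box-Cox transformation mentioned above has the standard form (y^lambda - 1) / lambda for lambda not equal to 0 and log(y) for lambda = 0, defined for positive y. The sketch below implements that formula; the helper that transforms both the observed response and the model prediction reflects one common reading of the remark that transforming the response alone is not appropriate in nonlinear regression, and is an assumption rather than something the abstract spells out. Names and values are illustrative.

```python
import math


def box_cox(value, lam):
    """Box-Cox transform of a single positive value.

    lam == 0 gives the natural log; otherwise (value**lam - 1) / lam.
    """
    if value <= 0:
        raise ValueError("Box-Cox requires a strictly positive value")
    if lam == 0:
        return math.log(value)
    return (value ** lam - 1.0) / lam


def transformed_residual(observed, predicted, lam):
    """Residual after applying the same transform to both sides.

    Hypothetical helper: assumes the model prediction is transformed with the
    same lambda as the observed response, so the fitted curve keeps its
    original interpretation while the residual scale changes.
    """
    return box_cox(observed, lam) - box_cox(predicted, lam)


if __name__ == "__main__":
    # lam = 0.5 is a square-root-type transform; numbers are made up.
    print(box_cox(4.0, 0.5))
    print(transformed_residual(4.0, 1.0, 0.5))
```

    In practice lambda is usually estimated from the data rather than fixed in advance; the fixed value here only keeps the sketch short.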

  10. Introduction to methodology of dose-response meta-analysis for binary outcome: With application on software.

    PubMed

    Zhang, Chao; Jia, Pengli; Yu, Liu; Xu, Chang

    2018-05-01

    Dose-response meta-analysis (DRMA) is widely applied to investigate the dose-specific relationship between independent and dependent variables. Such methods have been in use for over 30 years and are increasingly employed in healthcare and clinical decision-making. In this article, we give an overview of the methodology used in DRMA. We summarize the commonly used regression model and the pooled method in DRMA. We also use an example to illustrate how to employ a DRMA by these methods. Five regression models, linear regression, piecewise regression, natural polynomial regression, fractional polynomial regression, and restricted cubic spline regression, were illustrated in this article to fit the dose-response relationship. And two types of pooling approaches, that is, one-stage approach and two-stage approach are illustrated to pool the dose-response relationship across studies. The example showed similar results among these models. Several dose-response meta-analysis methods can be used for investigating the relationship between exposure level and the risk of an outcome. However the methodology of DRMA still needs to be improved. © 2018 Chinese Cochrane Center, West China Hospital of Sichuan University and John Wiley & Sons Australia, Ltd.

  11. Modeling Caribbean tree stem diameters from tree height and crown width measurements

    Treesearch

    Thomas Brandeis; KaDonna Randolph; Mike Strub

    2009-01-01

    Regression models to predict diameter at breast height (DBH) as a function of tree height and maximum crown radius were developed for Caribbean forests based on data collected by the U.S. Forest Service in the Commonwealth of Puerto Rico and Territory of the U.S. Virgin Islands. The model predicting DBH from tree height fit reasonably well (R2 = 0.7110), with...

  12. Evaluation and application of regional turbidity-sediment regression models in Virginia

    USGS Publications Warehouse

    Hyer, Kenneth; Jastram, John D.; Moyer, Douglas; Webber, James S.; Chanat, Jeffrey G.

    2015-01-01

    Conventional thinking has long held that turbidity-sediment surrogate-regression equations are site specific and that regression equations developed at a single monitoring station should not be applied to another station; however, few studies have evaluated this issue in a rigorous manner. If robust regional turbidity-sediment models can be developed successfully, their applications could greatly expand the usage of these methods. Suspended sediment load estimation could occur as soon as flow and turbidity monitoring commence at a site, suspended sediment sampling frequencies for various projects potentially could be reduced, and special-project applications (sediment monitoring following dam removal, for example) could be significantly enhanced. The objective of this effort was to investigate the turbidity-suspended sediment concentration (SSC) relations at all available USGS monitoring sites within Virginia to determine whether meaningful turbidity-sediment regression models can be developed by combining the data from multiple monitoring stations into a single model, known as a “regional” model. Following the development of the regional model, additional objectives included a comparison of predicted SSCs between the regional model and commonly used site-specific models, as well as an evaluation of why specific monitoring stations did not fit the regional model.

  13. Diagnostic efficiency of an ability-focused battery.

    PubMed

    Miller, Justin B; Fichtenberg, Norman L; Millis, Scott R

    2010-05-01

    An ability-focused battery (AFB) is a selected group of well-validated neuropsychological measures that assess the conventional range of cognitive domains. This study examined the diagnostic efficiency of an AFB for use in clinical decision making with a mixed sample composed of individuals with neurological brain dysfunction and individuals referred for cognitive assessment without evidence of neurological disorders. Using logistic regression analyses and ROC curve analysis, a five-domain model composed of attention, processing speed, visual-spatial reasoning, language/verbal reasoning, and memory domain scores was fitted that had an AUC of .89 (95% CI = .84-.95). A more parsimonious two-domain model using processing speed and memory was also fitted that had an AUC of .90 (95% CI = .84-.95). A model composed of a global ability score calculated from the mean of the individual domain scores was also fitted with an AUC of .88 (95% CI = .82-.94).
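
    For readers unfamiliar with the AUC figures reported above, the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The Python sketch below computes it by direct pairwise comparison; the function name is illustrative and the sketch is not taken from the study.

```python
def auc(positive_scores, negative_scores):
    """Probability that a random positive case outscores a random negative one."""
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positive_scores) * len(negative_scores))
```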

  14. Optimizing methods for linking cinematic features to fMRI data.

    PubMed

    Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia

    2015-04-15

    One of the challenges of naturalistic neurosciences using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than story-driven films, new methods need to be developed for analysis of less story-driven contents. To optimize the linkage between our fMRI data collected during viewing of a deliberately non-narrative silent film 'At Land' by Maya Deren (1944) and its annotated content, we combined the method of elastic-net regularization with the model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time-series were fitted with time-series of a total of 36 binary-valued and one real-valued tactile annotation of film features. The elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of regressors; the results were compared against both the partial least-squares (PLS) regression and the un-regularized full-model regression. A non-parametric permutation testing scheme was applied to evaluate the statistical significance of regression. We found statistically significant correlation between the annotation model and 9 ICs out of 40 ICs. Regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net based regression more sensitive than PLS and un-regularized regression since it detected a larger number of significant ICs and ROIs. Along with the ISC ranking methods, our regression analysis proved a feasible method for ordering the ICs based on their functional relevance to the annotated cinematic features. The novelty of our method is - in comparison to the hypothesis-driven manual pre-selection and observation of some individual regressors biased by choice - in applying a data-driven approach to all content features simultaneously. We found especially the combination of regularized regression and ICA useful when analyzing fMRI data obtained using a non-narrative movie stimulus with a large set of complex and correlated features. Copyright © 2015. Published by Elsevier Inc.
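
    To make the elastic-net idea concrete, the Python sketch below fits a linear model without intercept by cyclic coordinate descent, minimising (1/2n)·||y - Xb||² + lam·(alpha·||b||₁ + (1 - alpha)/2·||b||²). The function names, the parameterisation by lam and alpha, and the fixed iteration count are assumptions made for illustration; the study's own software and tuning procedure are not described in the abstract.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate-descent elastic net; X is a list of rows, y a list of targets.

    Assumes no all-zero column when alpha == 1 (the denominator would be zero).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of column j with the partial residual
            z = 0.0    # mean square of column j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
    return beta
```

    The L1 part can set coefficients exactly to zero, while the L2 part keeps the problem well-posed when regressors are strongly correlated, which is the situation the abstract describes.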

  15. Cox Regression Models with Functional Covariates for Survival Data.

    PubMed

    Gellar, Jonathan E; Colantuoni, Elizabeth; Needham, Dale M; Crainiceanu, Ciprian M

    2015-06-01

    We extend the Cox proportional hazards model to cases when the exposure is a densely sampled functional process, measured at baseline. The fundamental idea is to combine penalized signal regression with methods developed for mixed effects proportional hazards models. The model is fit by maximizing the penalized partial likelihood, with smoothing parameters estimated by a likelihood-based criterion such as AIC or EPIC. The model may be extended to allow for multiple functional predictors, time varying coefficients, and missing or unequally-spaced data. Methods were inspired by and applied to a study of the association between time to death after hospital discharge and daily measures of disease severity collected in the intensive care unit, among survivors of acute respiratory distress syndrome.

  16. Tree STEM and Canopy Biomass Estimates from Terrestrial Laser Scanning Data

    NASA Astrophysics Data System (ADS)

    Olofsson, K.; Holmgren, J.

    2017-10-01

    In this study an automatic method for estimating both the tree stem and the tree canopy biomass is presented. The point cloud tree extraction techniques operate on TLS data and model the biomass using the estimated stem and canopy volume as independent variables. The regression model fit error is of the order of less than 5 kg, which gives a relative model error of about 5 % for the stem estimate and 10-15 % for the spruce and pine canopy biomass estimates. The canopy biomass estimate was improved by separating the models by tree species, which indicates that the method is allometry dependent and that the regression models need to be recomputed for different areas with different climate and different vegetation.

  17. Connecting clinical and actuarial prediction with rule-based methods.

    PubMed

    Fokkema, Marjolein; Smits, Niels; Kelderman, Henk; Penninx, Brenda W J H

    2015-06-01

    Meta-analyses comparing the accuracy of clinical versus actuarial prediction have shown actuarial methods to outperform clinical methods, on average. However, actuarial methods are still not widely used in clinical practice, and there has been a call for the development of actuarial prediction methods for clinical practice. We argue that rule-based methods may be more useful than the linear main effect models usually employed in prediction studies, from a data and decision analytic as well as a practical perspective. In addition, decision rules derived with rule-based methods can be represented as fast and frugal trees, which, unlike main effects models, can be used in a sequential fashion, reducing the number of cues that have to be evaluated before making a prediction. We illustrate the usability of rule-based methods by applying RuleFit, an algorithm for deriving decision rules for classification and regression problems, to a dataset on prediction of the course of depressive and anxiety disorders from Penninx et al. (2011). The RuleFit algorithm provided a model consisting of 2 simple decision rules, requiring evaluation of only 2 to 4 cues. Predictive accuracy of the 2-rule model was very similar to that of a logistic regression model incorporating 20 predictor variables, originally applied to the dataset. In addition, the 2-rule model required, on average, evaluation of only 3 cues. Therefore, the RuleFit algorithm appears to be a promising method for creating decision tools that are less time consuming and easier to apply in psychological practice, and with accuracy comparable to traditional actuarial methods. (c) 2015 APA, all rights reserved.

  18. Using Gamma and Quantile Regressions to Explore the Association between Job Strain and Adiposity in the ELSA-Brasil Study: Does Gender Matter?

    PubMed

    Fonseca, Maria de Jesus Mendes da; Juvanhol, Leidjaira Lopes; Rotenberg, Lúcia; Nobre, Aline Araújo; Griep, Rosane Härter; Alves, Márcia Guimarães de Mello; Cardoso, Letícia de Oliveira; Giatti, Luana; Nunes, Maria Angélica; Aquino, Estela M L; Chor, Dóra

    2017-11-17

    This paper explores the association between job strain and adiposity, using two statistical analysis approaches and considering the role of gender. The research evaluated 11,960 active baseline participants (2008-2010) in the ELSA-Brasil study. Job strain was evaluated through a demand-control questionnaire, while body mass index (BMI) and waist circumference (WC) were evaluated in continuous form. The associations were estimated using gamma regression models with an identity link function. Quantile regression models were also estimated from the final set of co-variables established by gamma regression. The relationship that was found varied by analytical approach and gender. Among the women, no association was observed between job strain and adiposity in the fitted gamma models. In the quantile models, a pattern of increasing effects of high strain was observed at higher BMI and WC distribution quantiles. Among the men, high strain was associated with adiposity in the gamma regression models. However, when quantile regression was used, that association was found not to be homogeneous across outcome distributions. In addition, in the quantile models an association was observed between active jobs and BMI. Our results point to an association between job strain and adiposity, which follows a heterogeneous pattern. Modelling strategies can produce different results and should, accordingly, be used to complement one another.
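
    Quantile regression, as used above, replaces the squared-error criterion with the check (pinball) loss, which penalises under- and over-prediction asymmetrically. The Python sketch below shows the loss and, as the simplest possible case, finds the constant that minimises it over a sample; the function names and toy values are illustrative only and are not drawn from the ELSA-Brasil analysis.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss for a quantile level tau in (0, 1)."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def best_constant_quantile(values, tau):
    """Return the observed value that minimises the total pinball loss."""
    return min(values, key=lambda c: sum(pinball_loss(v - c, tau) for v in values))
```

    With tau = 0.5 the minimiser is the sample median; higher values of tau move it into the upper tail, which is how effects at high BMI or WC quantiles can differ from effects on the mean.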

  19. Graphical approach to assess the soil fertility evaluation model validity for rice (case study: southern area of Merapi Mountain, Indonesia)

    NASA Astrophysics Data System (ADS)

    Julianto, E. A.; Suntoro, W. A.; Dewi, W. S.; Partoyo

    2018-03-01

    Climate change has been reported to exacerbate land resources degradation including soil fertility decline. The appropriate validity use on soil fertility evaluation could reduce the risk of climate change effect on plant cultivation. This study aims to assess the validity of a Soil Fertility Evaluation Model using a graphical approach. The models evaluated were the Indonesian Soil Research Center (PPT) version model, the FAO Unesco version model, and the Kyuma version model. Each model was then correlated with rice production (dry grain weight/GKP). The goodness of fit of each model can be tested to evaluate the quality and validity of a model, as well as the regression coefficient (R2). This research used the Eviews 9 programme by a graphical approach. The results obtained three curves, namely actual, fitted, and residual curves. If the actual and fitted curves are widely apart or irregular, this means that the quality of the model is not good, or there are many other factors that are still not included in the model (large residual) and conversely. Indeed, if the actual and fitted curves show exactly the same shape, it means that all factors have already been included in the model. Modification of the standard soil fertility evaluation models can improve the quality and validity of a model.

  20. A quantile count model of water depth constraints on Cape Sable seaside sparrows

    USGS Publications Warehouse

    Cade, B.S.; Dong, Q.

    2008-01-01

    1. A quantile regression model for counts of breeding Cape Sable seaside sparrows Ammodramus maritimus mirabilis (L.) as a function of water depth and previous year abundance was developed based on extensive surveys, 1992-2005, in the Florida Everglades. The quantile count model extends linear quantile regression methods to discrete response variables, providing a flexible alternative to discrete parametric distributional models, e.g. Poisson, negative binomial and their zero-inflated counterparts. 2. Estimates from our multiplicative model demonstrated that negative effects of increasing water depth in breeding habitat on sparrow numbers were dependent on recent occupation history. Upper 10th percentiles of counts (one to three sparrows) decreased with increasing water depth from 0 to 30 cm when sites were not occupied in previous years. However, upper 40th percentiles of counts (one to six sparrows) decreased with increasing water depth for sites occupied in previous years. 3. Greatest decreases (-50% to -83%) in upper quantiles of sparrow counts occurred as water depths increased from 0 to 15 cm when previous year counts were 1, but a small proportion of sites (5-10%) held at least one sparrow even as water depths increased to 20 or 30 cm. 4. A zero-inflated Poisson regression model provided estimates of conditional means that also decreased with increasing water depth but rates of change were lower and decreased with increasing previous year counts compared to the quantile count model. Quantiles computed for the zero-inflated Poisson model enhanced interpretation of this model but had greater lack-of-fit for water depths > 0 cm and previous year counts 1, conditions where the negative effect of water depths were readily apparent and fitted better with the quantile count model.
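
    The zero-inflated Poisson model used for comparison above mixes a point mass at zero with an ordinary Poisson distribution. A minimal Python sketch of its probability mass function follows; the parameter names zero_prob and rate are assumptions for illustration, and in a regression setting both would depend on covariates such as water depth.

```python
import math


def zip_pmf(count, zero_prob, rate):
    """Zero-inflated Poisson probability of observing `count` (a non-negative int)."""
    poisson = math.exp(-rate) * rate ** count / math.factorial(count)
    if count == 0:
        # Zeros arise either from the inflation component or from the Poisson part.
        return zero_prob + (1.0 - zero_prob) * poisson
    return (1.0 - zero_prob) * poisson
```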

  1. Egg production forecasting: Determining efficient modeling approaches.

    PubMed

    Ahmad, H A

    2011-12-01

    Several mathematical or statistical and artificial intelligence models were developed to compare egg production forecasts in commercial layers. Initial data for these models were collected from a comparative layer trial on commercial strains conducted at the Poultry Research Farms, Auburn University. Simulated data were produced to represent new scenarios by using means and SD of egg production of the 22 commercial strains. From the simulated data, random examples were generated for neural network training and testing for the weekly egg production prediction from wk 22 to 36. Three neural network architectures-back-propagation-3, Ward-5, and the general regression neural network-were compared for their efficiency to forecast egg production, along with other traditional models. The general regression neural network gave the best-fitting line, which almost overlapped with the commercial egg production data, with an R(2) of 0.71. The general regression neural network-predicted curve was compared with original egg production data, the average curves of white-shelled and brown-shelled strains, linear regression predictions, and the Gompertz nonlinear model. The general regression neural network was superior in all these comparisons and may be the model of choice if the initial overprediction is managed efficiently. In general, neural network models are efficient, are easy to use, require fewer data, and are practical under farm management conditions to forecast egg production.

  2. A nonparametric multiple imputation approach for missing categorical data.

    PubMed

    Zhou, Muhan; He, Yulei; Yu, Mandi; Hsu, Chiu-Hsieh

    2017-06-06

    Incomplete categorical variables with more than two categories are common in public health data. However, most of the existing missing-data methods do not use the information from nonresponse (missingness) probabilities. We propose a nearest-neighbour multiple imputation approach to impute a missing at random categorical outcome and to estimate the proportion of each category. The donor set for imputation is formed by measuring distances between each missing value with other non-missing values. The distance function is calculated based on a predictive score, which is derived from two working models: one fits a multinomial logistic regression for predicting the missing categorical outcome (the outcome model) and the other fits a logistic regression for predicting missingness probabilities (the missingness model). A weighting scheme is used to accommodate contributions from two working models when generating the predictive score. A missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented. The simulation study suggests that the proposed method performs well when missingness probabilities are not extreme under some misspecifications of the working models. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, proper weights need to be chosen to balance the contributions from the two working models and achieve optimal results for the proposed method. 
    We conclude that the proposed multiple imputation method is a reasonable approach to dealing with missing categorical outcome data with more than two levels for assessing the distribution of the outcome. In terms of the choices for the working models, we suggest a multinomial logistic regression for predicting the missing outcome and a binary logistic regression for predicting the missingness probability.
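
    The donor-selection step described above can be sketched in Python as follows. Each case is summarised by two scores, one from the outcome model and one from the missingness model; a weighted distance picks the nearest complete cases, and one of them is drawn at random. The weighted Euclidean form of the distance, the use of a single scalar outcome score, and all names are assumptions made for illustration, since the abstract does not give the exact formula.

```python
import math
import random


def impute_from_nearest(target_scores, donors, weight, k, rng):
    """Draw one donor category from the k donors closest to the target.

    target_scores: (outcome_score, missingness_score) for the incomplete case.
    donors: list of ((outcome_score, missingness_score), category) pairs.
    weight: share of the distance attributed to the outcome-model score.
    """
    def distance(scores):
        return math.sqrt(
            weight * (scores[0] - target_scores[0]) ** 2
            + (1.0 - weight) * (scores[1] - target_scores[1]) ** 2
        )

    nearest = sorted(donors, key=lambda donor: distance(donor[0]))[:k]
    return rng.choice(nearest)[1]
```

    Repeating the draw several times with different random states yields the multiple imputed data sets.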

  3. [Primary branch size of Pinus koraiensis plantation: a prediction based on linear mixed effect model].

    PubMed

    Dong, Ling-Bo; Liu, Zhao-Gang; Li, Feng-Ri; Jiang, Li-Chun

    2013-09-01

    By using the branch analysis data of 955 standard branches from 60 sampled trees in 12 sampling plots of Pinus koraiensis plantation in Mengjiagang Forest Farm in Heilongjiang Province of Northeast China, and based on the linear mixed-effect model theory and methods, the models for predicting branch variables, including primary branch diameter, length, and angle, were developed. Considering tree effect, the MIXED module of SAS software was used to fit the prediction models. The results indicated that the fitting precision of the models could be improved by choosing appropriate random-effect parameters and variance-covariance structure. Then, the correlation structures including compound symmetry structure (CS), first-order autoregressive structure [AR(1)], and first-order autoregressive and moving average structure [ARMA(1,1)] were added to the optimal branch size mixed-effect model. The AR(1) structure significantly improved the fitting precision of the branch diameter and length mixed-effect models, but none of the three structures improved the precision of the branch angle mixed-effect model. In order to describe the heteroscedasticity when building the mixed-effect model, the CF1 and CF2 functions were added to the branch mixed-effect model. The CF1 function significantly improved the fit of the branch angle mixed model, whereas the CF2 function significantly improved the fit of the branch diameter and length mixed models. Model validation confirmed that the mixed-effect model could improve the precision of prediction, as compared to the traditional regression model for the branch size prediction of Pinus koraiensis plantation.

  4. Deriving the Regression Equation without Using Calculus

    ERIC Educational Resources Information Center

    Gordon, Sheldon P.; Gordon, Florence S.

    2004-01-01

    Probably the one "new" mathematical topic that is most responsible for modernizing courses in college algebra and precalculus over the last few years is the idea of fitting a function to a set of data in the sense of a least squares fit. Whether it be simple linear regression or nonlinear regression, this topic opens the door to applying the…

  5. Simulation of parametric model towards the fixed covariate of right censored lung cancer data

    NASA Astrophysics Data System (ADS)

    Afiqah Muhamad Jamil, Siti; Asrul Affendi Abdullah, M.; Kek, Sie Long; Ridwan Olaniran, Oyebayo; Enera Amran, Syahila

    2017-09-01

    In this study, a simulation procedure was applied to measure the fixed covariate of right censored data by using a parametric survival model. The scale and shape parameters were modified to differentiate the analysis of the parametric regression survival model. Statistically, the biases, mean biases and the coverage probability were used in this analysis. Different sample sizes (50, 100, 150 and 200) were employed to distinguish the impact of the parametric regression model on right censored data. R statistical software was utilised to develop the simulation code with right censored data. In addition, the final model of the right censored simulation was compared with right censored lung cancer data in Malaysia. It was found that different values of the shape and scale parameters with different sample sizes help to improve the simulation strategy for right censored data, and that the Weibull regression survival model is a suitable fit for the simulated survival of lung cancer patients in Malaysia.
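
    A simulation of this kind can be sketched as below: survival times are drawn from a Weibull distribution with chosen shape and scale, and any time beyond a cut-off is recorded as censored. The study itself used R; Python is used here only for illustration, and the fixed (type I) censoring time, the function name and the parameter values are all assumptions rather than details from the paper.

```python
import random


def simulate_right_censored(n, shape, scale, censor_time, seed=0):
    """Return (observed_time, event_indicator) pairs under fixed censoring."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        # random.weibullvariate takes the scale first, then the shape.
        true_time = rng.weibullvariate(scale, shape)
        observed = min(true_time, censor_time)
        event = 1 if true_time <= censor_time else 0  # 0 marks a censored record
        sample.append((observed, event))
    return sample


# Hypothetical settings: 50 subjects, shape 1.5, scale 10, follow-up ends at 12.
data = simulate_right_censored(50, 1.5, 10.0, 12.0)
```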

  6. Closed-form solution for static pull-in voltage of electrostatically actuated clamped-clamped micro/nano beams under the effect of fringing field and van der Waals force

    NASA Astrophysics Data System (ADS)

    Bhojawala, V. M.; Vakharia, D. P.

    2017-12-01

    This investigation provides an accurate prediction of static pull-in voltage for clamped-clamped micro/nano beams based on a distributed model. The Euler-Bernoulli beam theory is used, incorporating geometric non-linearity of the beam, internal (residual) stress, van der Waals force, distributed electrostatic force and fringing field effects, to derive the governing differential equation. The Galerkin discretisation method is used to build a reduced-order model of the governing differential equation. A regime plot is presented in the current work for determining the number of modes required in the reduced-order model to obtain a completely converged pull-in voltage for micro/nano beams. A closed-form relation is developed based on the relationship obtained from curve fitting of pull-in instability plots and subsequent non-linear regression for the proposed relation. The output of the regression analysis provides a Chi-square (χ²) tolerance value equal to 1 × 10^-9, an adjusted R-square value equal to 0.99929 and a P-value equal to zero; these statistical parameters indicate the convergence of the non-linear fit, the accuracy of the fitted data and the significance of the proposed model, respectively. The closed-form equation is validated using available data of experimental and numerical results. The relative maximum error of 4.08% in comparison to several available experimental and numerical data proves the reliability of the proposed closed-form equation.

  7. Modeling animal-vehicle collisions using diagonal inflated bivariate Poisson regression.

    PubMed

    Lao, Yunteng; Wu, Yao-Jan; Corey, Jonathan; Wang, Yinhai

    2011-01-01

    Two types of animal-vehicle collision (AVC) data are commonly adopted for AVC-related risk analysis research: reported AVC data and carcass removal data. One issue with these two data sets is that they were found to have significant discrepancies by previous studies. In order to model these two types of data together and provide a better understanding of highway AVCs, this study adopts a diagonal inflated bivariate Poisson regression method, an inflated version of bivariate Poisson regression model, to fit the reported AVC and carcass removal data sets collected in Washington State during 2002-2006. The diagonal inflated bivariate Poisson model not only can model paired data with correlation, but also handle under- or over-dispersed data sets as well. Compared with three other types of models, double Poisson, bivariate Poisson, and zero-inflated double Poisson, the diagonal inflated bivariate Poisson model demonstrates its capability of fitting two data sets with remarkable overlapping portions resulting from the same stochastic process. Therefore, the diagonal inflated bivariate Poisson model provides researchers a new approach to investigating AVCs from a different perspective involving the three distribution parameters (λ(1), λ(2) and λ(3)). The modeling results show the impacts of traffic elements, geometric design and geographic characteristics on the occurrences of both reported AVC and carcass removal data. It is found that the increase of some associated factors, such as speed limit, annual average daily traffic, and shoulder width, will increase the numbers of reported AVCs and carcass removals. Conversely, the presence of some geometric factors, such as rolling and mountainous terrain, will decrease the number of reported AVCs. Published by Elsevier Ltd.

  8. Assessment of Response Surface Models using Independent Confirmation Point Analysis

    NASA Technical Reports Server (NTRS)

    DeLoach, Richard

    2010-01-01

    This paper highlights various advantages that confirmation-point residuals have over conventional model design-point residuals in assessing the adequacy of a response surface model fitted by regression techniques to a sample of experimental data. Particular advantages are highlighted for the case of design matrices that may be ill-conditioned for a given sample of data. The impact of both aleatory and epistemological uncertainty in response model adequacy assessments is considered.

  9. Efficient occupancy model-fitting for extensive citizen-science data.

    PubMed

    Dennis, Emily B; Morgan, Byron J T; Freeman, Stephen N; Ridout, Martin S; Brereton, Tom M; Fox, Richard; Powney, Gary D; Roy, David B

    2017-01-01

    Appropriate large-scale citizen-science data present important new opportunities for biodiversity modelling, due in part to the wide spatial coverage of information. Recently proposed occupancy modelling approaches naturally incorporate random effects in order to account for annual variation in the composition of sites surveyed. In turn this leads to Bayesian analysis and model fitting, which are typically extremely time consuming. Motivated by presence-only records of occurrence from the UK Butterflies for the New Millennium data base, we present an alternative approach, in which site variation is described in a standard way through logistic regression on relevant environmental covariates. This allows efficient occupancy model-fitting using classical inference, which is easily achieved using standard computers. This is especially important when models need to be fitted each year, typically for many different species, as with British butterflies for example. Using both real and simulated data we demonstrate that the two approaches, with and without random effects, can result in similar conclusions regarding trends. There are many advantages to classical model-fitting, including the ability to compare a range of alternative models, identify appropriate covariates and assess model fit, using standard tools of maximum likelihood. In addition, modelling in terms of covariates provides opportunities for understanding the ecological processes that are in operation. We show that there is even greater potential; the classical approach allows us to construct regional indices simply, which indicate how changes in occupancy typically vary over a species' range. In addition we are also able to construct dynamic occupancy maps, which provide a novel, modern tool for examining temporal changes in species distribution. These new developments may be applied to a wide range of taxa, and are valuable at a time of climate change. 
    They also have the potential to motivate citizen scientists.

  10. Efficient occupancy model-fitting for extensive citizen-science data

    PubMed Central

    Morgan, Byron J. T.; Freeman, Stephen N.; Ridout, Martin S.; Brereton, Tom M.; Fox, Richard; Powney, Gary D.; Roy, David B.

    2017-01-01

    Appropriate large-scale citizen-science data present important new opportunities for biodiversity modelling, due in part to the wide spatial coverage of information. Recently proposed occupancy modelling approaches naturally incorporate random effects in order to account for annual variation in the composition of sites surveyed. In turn this leads to Bayesian analysis and model fitting, which are typically extremely time consuming. Motivated by presence-only records of occurrence from the UK Butterflies for the New Millennium data base, we present an alternative approach, in which site variation is described in a standard way through logistic regression on relevant environmental covariates. This allows efficient occupancy model-fitting using classical inference, which is easily achieved using standard computers. This is especially important when models need to be fitted each year, typically for many different species, as with British butterflies for example. Using both real and simulated data we demonstrate that the two approaches, with and without random effects, can result in similar conclusions regarding trends. There are many advantages to classical model-fitting, including the ability to compare a range of alternative models, identify appropriate covariates and assess model fit, using standard tools of maximum likelihood. In addition, modelling in terms of covariates provides opportunities for understanding the ecological processes that are in operation. We show that there is even greater potential; the classical approach allows us to construct regional indices simply, which indicate how changes in occupancy typically vary over a species’ range. In addition we are also able to construct dynamic occupancy maps, which provide a novel, modern tool for examining temporal changes in species distribution. These new developments may be applied to a wide range of taxa, and are valuable at a time of climate change. 
    They also have the potential to motivate citizen scientists. PMID:28328937

  11. Modeling Stationary Lithium-Ion Batteries for Optimization and Predictive Control

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baker, Kyri A; Shi, Ying; Christensen, Dane T

    Accurately modeling stationary battery storage behavior is crucial to understand and predict its limitations in demand-side management scenarios. In this paper, a lithium-ion battery model was derived to estimate lifetime and state-of-charge for building-integrated use cases. The proposed battery model aims to balance speed and accuracy when modeling battery behavior for real-time predictive control and optimization. In order to achieve these goals, a mixed modeling approach was taken, which incorporates regression fits to experimental data and an equivalent circuit to model battery behavior. A comparison of the proposed battery model output to actual data from the manufacturer validates the modeling approach taken in the paper. Additionally, a dynamic test case demonstrates the effects of using regression models to represent internal resistance and capacity fading.

  12. Modeling Stationary Lithium-Ion Batteries for Optimization and Predictive Control: Preprint

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Raszmann, Emma; Baker, Kyri; Shi, Ying

    Accurately modeling stationary battery storage behavior is crucial to understand and predict its limitations in demand-side management scenarios. In this paper, a lithium-ion battery model was derived to estimate lifetime and state-of-charge for building-integrated use cases. The proposed battery model aims to balance speed and accuracy when modeling battery behavior for real-time predictive control and optimization. In order to achieve these goals, a mixed modeling approach was taken, which incorporates regression fits to experimental data and an equivalent circuit to model battery behavior. A comparison of the proposed battery model output to actual data from the manufacturer validates the modeling approach taken in the paper. Additionally, a dynamic test case demonstrates the effects of using regression models to represent internal resistance and capacity fading.

  13. ShapeSelectForest: a new R package for modeling Landsat time series

    Treesearch

    Mary Meyer; Xiyue Liao; Gretchen Moisen; Elizabeth Freeman

    2015-01-01

    We present a new R package called ShapeSelectForest recently posted to the Comprehensive R Archive Network. The package was developed to fit nonparametric shape-restricted regression splines to time series of Landsat imagery for the purpose of modeling, mapping, and monitoring annual forest disturbance dynamics over nearly three decades. For each pixel and spectral...

  14. An appraisal of convergence failures in the application of logistic regression model in published manuscripts.

    PubMed

    Yusuf, O B; Bamgboye, E A; Afolabi, R F; Shodimu, M A

    2014-09-01

    The logistic regression model is widely used in health research for descriptive and predictive purposes. Unfortunately, researchers are sometimes not aware that the underlying principles of the technique have failed when the maximum likelihood algorithm does not converge. Young researchers, particularly postgraduate students, may not know why the separation problem, whether quasi or complete, occurs, how to identify it, and how to fix it. This study was designed to critically evaluate convergence issues in articles that employed logistic regression analysis published in an African Journal of Medicine and medical sciences between 2004 and 2013. Problems of quasi or complete separation were described and illustrated with the National Demographic and Health Survey dataset. A critical evaluation of articles that employed logistic regression was conducted. A total of 581 articles were reviewed, of which 40 (6.9%) used binary logistic regression. Twenty-four (60.0%) stated the use of the logistic regression model in the methodology, while none of the articles assessed model fit. Only 3 (12.5%) properly described the procedures. Of the 40 that used the logistic regression model, the problem of convergence occurred in 6 (15.0%) of the articles. Logistic regression tended to be poorly reported in studies published between 2004 and 2013. Our findings showed that the procedure may not be well understood by researchers, since very few described the process in their reports, and that they may be totally unaware of the problem of convergence or how to deal with it.
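
    As a rough illustration of the separation problem described in this abstract, the sketch below flags complete or quasi-complete separation for a single, non-constant numeric predictor and a 0/1 outcome. The function name `separation_status` and the toy data are hypothetical and are not taken from the study. With several predictors, separation can occur along a linear combination of them, which this simple check would not detect.

```python
def separation_status(x, y):
    """Classify separation for one numeric predictor x and a binary outcome y.

    Returns "complete" if a threshold on x splits the outcomes perfectly,
    "quasi-complete" if the split is perfect except for ties at the boundary,
    and "none" otherwise.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both outcome classes must be present")
    # Strict gap between the two groups: the likelihood has no finite maximum.
    if max(x0) < min(x1) or max(x1) < min(x0):
        return "complete"
    # Groups touch only at a single boundary value.
    if max(x0) == min(x1) or max(x1) == min(x0):
        return "quasi-complete"
    return "none"


print(separation_status([1, 2, 3, 4], [0, 0, 1, 1]))
print(separation_status([1, 2, 2, 3], [0, 0, 1, 1]))
print(separation_status([1, 3, 2, 4], [0, 0, 1, 1]))
```

    In the first two toy cases a maximum likelihood fit would be expected to diverge or to report very large standard errors, which is the kind of convergence warning the abstract says often goes unreported.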

  15. Optimization of Regression Models of Experimental Data Using Confirmation Points

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.

    2010-01-01

    A new search metric is discussed that may be used to better assess the predictive capability of different math term combinations during the optimization of a regression model of experimental data. The new search metric can be determined for each tested math term combination if the given experimental data set is split into two subsets. The first subset consists of data points that are only used to determine the coefficients of the regression model. The second subset consists of confirmation points that are exclusively used to test the regression model. The new search metric value is assigned after comparing two values that describe the quality of the fit of each subset. The first value is the standard deviation of the PRESS residuals of the data points. The second value is the standard deviation of the response residuals of the confirmation points. The greater of the two values is used as the new search metric value. This choice guarantees that both standard deviations are always less than or equal to the value that is used during the optimization. Experimental data from the calibration of a wind tunnel strain-gage balance is used to illustrate the application of the new search metric. The new search metric ultimately generates an optimized regression model that was already tested at confirmation points that are independent of the regression model before it is ever used to predict an unknown response from a set of regressors.
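
    The metric described in this abstract can be sketched for the simplest possible case, a straight-line model with one regressor. This is an illustrative simplification, not the report's implementation: the function names, the use of the sample standard deviation, and the closed-form leverage for a straight line are assumptions made here. The PRESS residual of a fitting point is its ordinary residual divided by one minus its leverage.

```python
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    xm, ym = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - xm) ** 2 for xi in x)
    b = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
    return ym - b * xm, b


def press_residuals(x, y):
    """Leave-one-out (PRESS) residuals e_i / (1 - h_ii) for a straight-line fit."""
    a, b = fit_line(x, y)
    n = len(x)
    xm = statistics.mean(x)
    sxx = sum((xi - xm) ** 2 for xi in x)
    result = []
    for xi, yi in zip(x, y):
        leverage = 1.0 / n + (xi - xm) ** 2 / sxx  # h_ii for a straight line
        result.append((yi - (a + b * xi)) / (1.0 - leverage))
    return result


def search_metric(x_fit, y_fit, x_conf, y_conf):
    """Greater of the two standard deviations named in the abstract."""
    a, b = fit_line(x_fit, y_fit)
    conf_residuals = [yi - (a + b * xi) for xi, yi in zip(x_conf, y_conf)]
    sd_press = statistics.stdev(press_residuals(x_fit, y_fit))
    sd_conf = statistics.stdev(conf_residuals)
    return max(sd_press, sd_conf)
```

    In a term-selection search, a value like this would be computed for every candidate model and the candidate with the smallest value kept. Because the metric is the maximum of the two standard deviations, neither one can exceed it.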

  16. Genetic analyses of protein yield in dairy cows applying random regression models with time-dependent and temperature x humidity-dependent covariates.

    PubMed

    Brügemann, K; Gernand, E; von Borstel, U U; König, S

    2011-08-01

    Data used in the present study included 1,095,980 first-lactation test-day records for protein yield of 154,880 Holstein cows housed on 196 large-scale dairy farms in Germany. Data were recorded between 2002 and 2009 and merged with meteorological data from public weather stations. The maximum distance between each farm and its corresponding weather station was 50 km. Hourly temperature-humidity indexes (THI) were calculated using the mean of hourly measurements of dry bulb temperature and relative humidity. On the phenotypic scale, an increase in THI was generally associated with a decrease in daily protein yield. For genetic analyses, a random regression model was applied using time-dependent (days in milk, DIM) and THI-dependent covariates. Additive genetic and permanent environmental effects were fitted with this random regression model and Legendre polynomials of order 3 for DIM and THI. In addition, the fixed curve was modeled with Legendre polynomials of order 3. Heterogeneous residuals were fitted by dividing DIM into 5 classes, and by dividing THI into 4 classes, resulting in 20 different classes. Additive genetic variances for daily protein yield decreased with increasing degrees of heat stress and were lowest at the beginning of lactation and at extreme THI. Due to higher additive genetic variances, slightly higher permanent environment variances, and similar residual variances, heritabilities were highest for low THI in combination with DIM at the end of lactation. Genetic correlations among individual values for THI were generally >0.90. These trends from the complex random regression model were verified by applying relatively simple bivariate animal models for protein yield measured in 2 THI environments; that is, defining a THI value of 60 as a threshold. These high correlations indicate the absence of any substantial genotype × environment interaction for protein yield. 
However, heritabilities and additive genetic variances from the random regression model tended to be slightly higher in the THI range corresponding to cows' comfort zone. Selecting such superior environments for progeny testing can contribute to an accurate genetic differentiation among selection candidates. Copyright © 2011 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
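
    To make the Legendre covariates in this abstract concrete, the sketch below rescales a days-in-milk value to the interval [-1, 1] and evaluates the Legendre polynomials P0 to P3 there. Several points are assumptions made for illustration only: "order 3" is read as terms up to the cubic, the polynomials are left unnormalized, and the DIM range of 5 to 305 days is hypothetical rather than taken from the study.

```python
def legendre_covariates(dim, dim_min=5, dim_max=305):
    """Return [P0, P1, P2, P3] evaluated at DIM rescaled to [-1, 1].

    dim_min and dim_max are hypothetical bounds of the lactation period.
    """
    if not dim_min <= dim <= dim_max:
        raise ValueError("dim outside the assumed range")
    # Map dim_min -> -1 and dim_max -> +1.
    x = 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0
    return [
        1.0,                              # P0(x)
        x,                                # P1(x)
        0.5 * (3.0 * x ** 2 - 1.0),       # P2(x)
        0.5 * (5.0 * x ** 3 - 3.0 * x),   # P3(x)
    ]


for day in (5, 155, 305):
    print(day, legendre_covariates(day))
```

    In a random regression model, each animal would receive its own random coefficients on covariates of this kind, so that its deviation from the fixed curve can change smoothly over the lactation. The same construction would apply to THI after rescaling its observed range.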

  17. Detection of high GS risk group prostate tumors by diffusion tensor imaging and logistic regression modelling.

    PubMed

    Ertas, Gokhan

    2018-07-01

    To assess the value of joint evaluation of diffusion tensor imaging (DTI) measures by using logistic regression modelling to detect high GS risk group prostate tumors. Fifty tumors imaged using DTI on a 3 T MRI device were analyzed. Regions of interests focusing on the center of tumor foci and noncancerous tissue on the maps of mean diffusivity (MD) and fractional anisotropy (FA) were used to extract the minimum, the maximum and the mean measures. Measure ratio was computed by dividing tumor measure by noncancerous tissue measure. Logistic regression models were fitted for all possible pair combinations of the measures using 5-fold cross validation. Systematic differences are present for all MD measures and also for all FA measures in distinguishing the high risk tumors [GS ≥ 7(4 + 3)] from the low risk tumors [GS ≤ 7(3 + 4)] (P < 0.05). Smaller value for MD measures and larger value for FA measures indicate the high risk. The models enrolling the measures achieve good fits and good classification performances (adjusted R² = 0.55-0.60, AUC = 0.88-0.91), however the models using the measure ratios perform better (adjusted R² = 0.59-0.75, AUC = 0.88-0.95). The model that employs the ratios of minimum MD and maximum FA accomplishes the highest sensitivity, specificity and accuracy (Se = 77.8%, Sp = 96.9% and Acc = 90.0%). Joint evaluation of MD and FA diffusion tensor imaging measures is valuable to detect high GS risk group peripheral zone prostate tumors. However, use of the ratios of the measures improves the accuracy of the detections substantially. Logistic regression modelling provides a favorable solution for the joint evaluations easily adoptable in clinical practice. Copyright © 2018 Elsevier Inc. All rights reserved.

  18. Genetic background in partitioning of metabolizable energy efficiency in dairy cows.

    PubMed

    Mehtiö, T; Negussie, E; Mäntysaari, P; Mäntysaari, E A; Lidauer, M H

    2018-05-01

    The main objective of this study was to assess the genetic differences in metabolizable energy efficiency and efficiency in partitioning metabolizable energy in different pathways: maintenance, milk production, and growth in primiparous dairy cows. Repeatability models for residual energy intake (REI) and metabolizable energy intake (MEI) were compared and the genetic and permanent environmental variations in MEI were partitioned into its energy sinks using random regression models. We proposed 2 new feed efficiency traits: metabolizable energy efficiency (MEE), which is formed by modeling MEI fitting regressions on energy sinks [metabolic body weight (BW^0.75), energy-corrected milk, body weight gain, and body weight loss] directly; and partial MEE (pMEE), where the model for MEE is extended with regressions on energy sinks nested within additive genetic and permanent environmental effects. The data used were collected from Luke's experimental farms Rehtijärvi and Minkiö between 1998 and 2014. There were altogether 12,350 weekly MEI records on 495 primiparous Nordic Red dairy cows from wk 2 to 40 of lactation. Heritability estimates for REI and MEE were moderate, 0.33 and 0.26, respectively. The estimate of the residual variance was smaller for MEE than for REI, indicating that analyzing weekly MEI observations simultaneously with energy sinks is preferable. Model validation based on Akaike's information criterion showed that pMEE models fitted the data even better and also resulted in smaller residual variance estimates. However, models that included random regression on BW^0.75 converged slowly. The resulting genetic standard deviation estimate from the pMEE coefficient for milk production was 0.75 MJ of MEI/kg of energy-corrected milk. 
The derived partial heritabilities for energy efficiency in maintenance, milk production, and growth were 0.02, 0.06, and 0.04, respectively, indicating that some genetic variation may exist in the efficiency of using metabolizable energy for different pathways in dairy cows. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  19. Geodesic regression on orientation distribution functions with its application to an aging study.

    PubMed

    Du, Jia; Goh, Alvina; Kushnarev, Sergey; Qiu, Anqi

    2014-02-15

    In this paper, we treat orientation distribution functions (ODFs) derived from high angular resolution diffusion imaging (HARDI) as elements of a Riemannian manifold and present a method for geodesic regression on this manifold. In order to find the optimal regression model, we pose this as a least-squares problem involving the sum-of-squared geodesic distances between observed ODFs and their model fitted data. We derive the appropriate gradient terms and employ gradient descent to find the minimizer of this least-squares optimization problem. In addition, we show how to perform statistical testing for determining the significance of the relationship between the manifold-valued regressors and the real-valued regressands. Experiments on both synthetic and real human data are presented. In particular, we examine aging effects on HARDI via geodesic regression of ODFs in normal adults aged 22 years old and above. © 2013 Elsevier Inc. All rights reserved.

  20. A modified temporal criterion to meta-optimize the extended Kalman filter for land cover classification of remotely sensed time series

    NASA Astrophysics Data System (ADS)

    Salmon, B. P.; Kleynhans, W.; Olivier, J. C.; van den Bergh, F.; Wessels, K. J.

    2018-05-01

    Humans are transforming land cover at an ever-increasing rate. Accurate geographical maps on land cover, especially rural and urban settlements are essential to planning sustainable development. Time series extracted from MODerate resolution Imaging Spectroradiometer (MODIS) land surface reflectance products have been used to differentiate land cover classes by analyzing the seasonal patterns in reflectance values. The proper fitting of a parametric model to these time series usually requires several adjustments to the regression method. To reduce the workload, a global setting of parameters is done to the regression method for a geographical area. In this work we have modified a meta-optimization approach to setting a regression method to extract the parameters on a per time series basis. The standard deviation of the model parameters and magnitude of residuals are used as scoring function. We successfully fitted a triply modulated model to the seasonal patterns of our study area using a non-linear extended Kalman filter (EKF). The approach uses temporal information which significantly reduces the processing time and storage requirements to process each time series. It also derives reliability metrics for each time series individually. The features extracted using the proposed method are classified with a support vector machine and the performance of the method is compared to the original approach on our ground truth data.

  1. Bayesian regression analyses of radiation modality effects on pericardial and pleural effusion and survival in esophageal cancer.

    PubMed

    He, Liru; Chapple, Andrew; Liao, Zhongxing; Komaki, Ritsuko; Thall, Peter F; Lin, Steven H

    2016-10-01

    To evaluate radiation modality effects on pericardial effusion (PCE), pleural effusion (PE) and survival in esophageal cancer (EC) patients. We analyzed data from 470 EC patients treated with definitive concurrent chemoradiotherapy (CRT). Bayesian semi-competing risks (SCR) regression models were fit to assess effects of radiation modality and prognostic covariates on the risks of PCE and PE, and death either with or without these preceding events. Bayesian piecewise exponential regression models were fit for overall survival, the time to PCE or death, and the time to PE or death. All models included propensity score as a covariate to correct for potential selection bias. Median times to onset of PCE and PE after RT were 7.1 and 6.1 months for IMRT, and 6.5 and 5.4 months for 3DCRT, respectively. Compared to 3DCRT, the IMRT group had significantly lower risks of PE, PCE, and death. The respective probabilities of a patient being alive without either PCE or PE at 3 years and 5 years were 0.29 and 0.21 for IMRT compared to 0.13 and 0.08 for 3DCRT. In the SCR regression analyses, IMRT was associated with significantly lower risks of PCE (HR=0.26) and PE (HR=0.49), and greater overall survival (probability of beneficial effect (pbe)>0.99), after controlling for known clinical prognostic factors. IMRT reduces the incidence and postpones the onset of PCE and PE, and increases survival probability, compared to 3DCRT. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  2. Logistic regression analysis of conventional ultrasonography, strain elastosonography, and contrast-enhanced ultrasound characteristics for the differentiation of benign and malignant thyroid nodules

    PubMed Central

    Deng, Yingyuan; Wang, Tianfu; Chen, Siping; Liu, Weixiang

    2017-01-01

    The aim of the study is to screen the significant sonographic features by logistic regression analysis and fit a model to diagnose thyroid nodules. A total of 525 pathological thyroid nodules were retrospectively analyzed. All the nodules underwent conventional ultrasonography (US), strain elastosonography (SE), and contrast-enhanced ultrasound (CEUS). Twelve suspicious sonographic features of those nodules were used to assess them. The features significant for diagnosing thyroid nodules were selected by logistic regression analysis. All variables that were statistically related to the diagnosis of thyroid nodules at a level of p < 0.05 were included in a logistic regression model. The significant features in the logistic regression model for diagnosing thyroid nodules were calcification, suspected cervical lymph node metastasis, hypoenhancement pattern, margin, shape, vascularity, posterior acoustic, echogenicity, and elastography score. According to the results of the logistic regression analysis, a formula that could predict whether or not thyroid nodules are malignant was established. The area under the receiver operating characteristic (ROC) curve was 0.930, and the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 83.77%, 89.56%, 87.05%, 86.04%, and 87.79%, respectively. PMID:29228030
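
    The five performance figures quoted in this abstract all derive from a 2x2 table of predicted versus true diagnoses. The sketch below shows the arithmetic. The abstract does not report the table itself, so the counts used here are hypothetical: they were chosen because they sum to 525 nodules and happen to reproduce the reported percentages, and they should not be read as the study's data.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic accuracy measures from a 2x2 confusion table."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),   # malignant nodules correctly flagged
        "specificity": tn / (tn + fp),   # benign nodules correctly cleared
        "accuracy": (tp + tn) / total,
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts (assumption, not reported in the abstract).
metrics = diagnostic_metrics(tp=191, fn=37, tn=266, fp=31)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

    Sensitivity and specificity depend only on the model and threshold, whereas the two predictive values also depend on how common malignancy is in the sample, so they would not transfer directly to a population with a different prevalence.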

  3. Logistic regression analysis of conventional ultrasonography, strain elastosonography, and contrast-enhanced ultrasound characteristics for the differentiation of benign and malignant thyroid nodules.

    PubMed

    Pang, Tiantian; Huang, Leidan; Deng, Yingyuan; Wang, Tianfu; Chen, Siping; Gong, Xuehao; Liu, Weixiang

    2017-01-01

    The aim of the study is to screen the significant sonographic features by logistic regression analysis and fit a model to diagnose thyroid nodules. A total of 525 pathological thyroid nodules were retrospectively analyzed. All the nodules underwent conventional ultrasonography (US), strain elastosonography (SE), and contrast-enhanced ultrasound (CEUS). Twelve suspicious sonographic features of those nodules were used to assess them. The features significant for diagnosing thyroid nodules were selected by logistic regression analysis. All variables that were statistically related to the diagnosis of thyroid nodules at a level of p < 0.05 were included in a logistic regression model. The significant features in the logistic regression model for diagnosing thyroid nodules were calcification, suspected cervical lymph node metastasis, hypoenhancement pattern, margin, shape, vascularity, posterior acoustic, echogenicity, and elastography score. According to the results of the logistic regression analysis, a formula that could predict whether or not thyroid nodules are malignant was established. The area under the receiver operating characteristic (ROC) curve was 0.930, and the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 83.77%, 89.56%, 87.05%, 86.04%, and 87.79%, respectively.

  4. Remote sensing of PM2.5 from ground-based optical measurements

    NASA Astrophysics Data System (ADS)

    Li, S.; Joseph, E.; Min, Q.

    2014-12-01

    Remote sensing of the concentration of particulate matter with aerodynamic diameter smaller than 2.5 µm (PM2.5) by using ground-based optical measurements of aerosols is investigated based on 6 years of hourly average measurements of aerosol optical properties, PM2.5, ceilometer backscatter coefficients and meteorological factors from the Howard University Beltsville Campus facility (HUBC). The accuracy of quantitative retrieval of PM2.5 using aerosol optical depth (AOD) is limited due to changes in aerosol size distribution and vertical distribution. In this study, ceilometer backscatter coefficients are used to provide vertical information of aerosol. It is found that the PM2.5-AOD ratio can vary largely for different aerosol vertical distributions. The ratio is also sensitive to mode parameters of bimodal lognormal aerosol size distribution when the geometric mean radius for the fine mode is small. Using two Angstrom exponents calculated at three wavelengths (415, 500 and 860 nm) is found to represent aerosol size distributions better than using only one Angstrom exponent. A regression model is proposed to assess the impacts of different factors on the retrieval of PM2.5. Compared to a simple linear regression model, the new model combining AOD and ceilometer backscatter can markedly improve the fitting of PM2.5. The contribution of further introducing Angstrom coefficients is apparent. Using combined measurements of AOD, ceilometer backscatter, Angstrom coefficients and meteorological parameters in the regression model yields a correlation coefficient of 0.79 between fitted and expected PM2.5.
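
    For readers unfamiliar with the Angstrom exponent used in this abstract, the sketch below computes it from AOD at two wavelengths using the usual power-law relation, AOD proportional to wavelength to the power of minus alpha. Pairing the three wavelengths as (415, 500) and (500, 860) nm is an assumption made here, since the abstract does not say how the two exponents were formed, and the AOD values are synthetic.

```python
import math


def angstrom_exponent(aod_1, aod_2, wavelength_1, wavelength_2):
    """Angstrom exponent alpha from AOD at two wavelengths (same units)."""
    return -math.log(aod_1 / aod_2) / math.log(wavelength_1 / wavelength_2)


def synthetic_aod(wavelength_nm, alpha=1.3, aod_500=0.2):
    """Synthetic AOD following an exact power law (illustration only)."""
    return aod_500 * (wavelength_nm / 500.0) ** (-alpha)


# Two exponents from three wavelengths; the pairing is an assumption.
alpha_short = angstrom_exponent(synthetic_aod(415), synthetic_aod(500), 415, 500)
alpha_long = angstrom_exponent(synthetic_aod(500), synthetic_aod(860), 500, 860)
print(alpha_short, alpha_long)
```

    For an exact power law the two exponents coincide. With real measurements they generally differ, and that difference carries information about the aerosol size distribution, which may be why two exponents are reported to work better than one.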

  5. A phenomenological biological dose model for proton therapy based on linear energy transfer spectra.

    PubMed

    Rørvik, Eivind; Thörnqvist, Sara; Stokkevåg, Camilla H; Dahle, Tordis J; Fjaera, Lars Fredrik; Ytre-Hauge, Kristian S

    2017-06-01

    The relative biological effectiveness (RBE) of protons varies with the radiation quality, quantified by the linear energy transfer (LET). Most phenomenological models employ a linear dependency of the dose-averaged LET (LET_d) to calculate the biological dose. However, several experiments have indicated a possible non-linear trend. Our aim was to investigate if biological dose models including non-linear LET dependencies should be considered, by introducing a LET spectrum based dose model. The RBE-LET relationship was investigated by fitting of polynomials from 1st to 5th degree to a database of 85 data points from aerobic in vitro experiments. We included both unweighted and weighted regression, the latter taking into account experimental uncertainties. Statistical testing was performed to decide whether higher degree polynomials provided better fits to the data as compared to lower degrees. The newly developed models were compared to three published LET_d-based models for a simulated spread out Bragg peak (SOBP) scenario. The statistical analysis of the weighted regression analysis favored a non-linear RBE-LET relationship, with the quartic polynomial found to best represent the experimental data (P = 0.010). The results of the unweighted regression analysis were on the borderline of statistical significance for non-linear functions (P = 0.053), and with the current database a linear dependency could not be rejected. For the SOBP scenario, the weighted non-linear model estimated a similar mean RBE value (1.14) compared to the three established models (1.13-1.17). The unweighted model calculated a considerably higher RBE value (1.22). The analysis indicated that non-linear models could give a better representation of the RBE-LET relationship. However, this is not decisive, as inclusion of the experimental uncertainties in the regression analysis had a significant impact on the determination and ranking of the models. 
As differences between the models were observed for the SOBP scenario, both non-linear LET spectrum- and linear LET_d-based models should be further evaluated in clinically realistic scenarios. © 2017 American Association of Physicists in Medicine.

  6. Confirmatory factor analysis of the female sexual function index.

    PubMed

    Opperman, Emily A; Benson, Lindsay E; Milhausen, Robin R

    2013-01-01

    The Female Sexual Functioning Index (Rosen et al., 2000) was designed to assess the key dimensions of female sexual functioning using six domains: desire, arousal, lubrication, orgasm, satisfaction, and pain. A full-scale score was proposed to represent women's overall sexual function. The fifth revision to the Diagnostic and Statistical Manual (DSM) is currently underway and includes a proposal to combine desire and arousal problems. The objective of this article was to evaluate and compare four models of the Female Sexual Functioning Index: (a) single-factor model, (b) six-factor model, (c) second-order factor model, and (d) five-factor model combining the desire and arousal subscales. Cross-sectional and observational data from 85 women were used to conduct a confirmatory factor analysis on the Female Sexual Functioning Index. Local and global goodness-of-fit measures, the chi-square test of differences, squared multiple correlations, and regression weights were used. The single-factor model fit was not acceptable. The original six-factor model was confirmed, and good model fit was found for the second-order and five-factor models. Delta chi-square tests of differences supported best fit for the six-factor model validating usage of the six domains. However, when revisions are made to the DSM-5, the Female Sexual Functioning Index can adapt to reflect these changes and remain a valid assessment tool for women's sexual functioning, as the five-factor structure was also supported.

  7. Analysis of Binary Adherence Data in the Setting of Polypharmacy: A Comparison of Different Approaches

    PubMed Central

    Esserman, Denise A.; Moore, Charity G.; Roth, Mary T.

    2009-01-01

    Older community dwelling adults often take multiple medications for numerous chronic diseases. Non-adherence to these medications can have a large public health impact. Therefore, the measurement and modeling of medication adherence in the setting of polypharmacy is an important area of research. We apply a variety of different modeling techniques (standard linear regression; weighted linear regression; adjusted linear regression; naïve logistic regression; beta-binomial (BB) regression; generalized estimating equations (GEE)) to binary medication adherence data from a study in a North Carolina based population of older adults, where each medication an individual was taking was classified as adherent or non-adherent. In addition, through simulation we compare these different methods based on Type I error rates, bias, power, empirical 95% coverage, and goodness of fit. We find that estimation and inference using GEE is robust to a wide variety of scenarios and we recommend using this in the setting of polypharmacy when adherence is dichotomously measured for multiple medications per person. PMID:20414358

  8. Calibrated Multivariate Regression with Application to Neural Semantic Basis Discovery.

    PubMed

    Liu, Han; Wang, Lie; Zhao, Tuo

    2015-08-01

    We propose a calibrated multivariate regression method named CMR for fitting high dimensional multivariate regression models. Compared with existing methods, CMR calibrates regularization for each regression task with respect to its noise level so that it simultaneously attains improved finite-sample performance and tuning insensitiveness. Theoretically, we provide sufficient conditions under which CMR achieves the optimal rate of convergence in parameter estimation. Computationally, we propose an efficient smoothed proximal gradient algorithm with a worst-case numerical rate of convergence O(1/ϵ), where ϵ is a pre-specified accuracy of the objective function value. We conduct thorough numerical simulations to illustrate that CMR consistently outperforms other high dimensional multivariate regression methods. We also apply CMR to solve a brain activity prediction problem and find that it is as competitive as a handcrafted model created by human experts. The R package camel implementing the proposed method is available on the Comprehensive R Archive Network http://cran.r-project.org/web/packages/camel/.

  9. A comparative study of generalized linear mixed modelling and artificial neural network approach for the joint modelling of survival and incidence of Dengue patients in Sri Lanka

    NASA Astrophysics Data System (ADS)

    Hapugoda, J. C.; Sooriyarachchi, M. R.

    2017-09-01

    Survival time of patients with a disease and the incidence of that particular disease (count) are frequently observed in medical studies with data of a clustered nature. In many cases, though, the survival times and the count can be correlated in such a way that diseases that occur rarely could have shorter survival times, or vice versa. Because of this, joint modelling of these two variables can provide interesting and improved results compared with modelling them separately. The authors have previously proposed a methodology using Generalized Linear Mixed Models (GLMM) that joins the Discrete Time Hazard model with the Poisson Regression model to jointly model survival and count. As the Artificial Neural Network (ANN) has become a powerful computational tool for modelling complex non-linear systems, it was proposed to develop a new joint model of survival and count of Dengue patients in Sri Lanka using that approach. Thus, the objective of this study is to develop a model using the ANN approach and compare the results with the previously developed GLMM model. As the response variables are continuous in nature, the Generalized Regression Neural Network (GRNN) approach was adopted to model the data. To compare the model fit, measures such as root mean square error (RMSE), absolute mean error (AME) and correlation coefficient (R) were used. The measures indicate that the GRNN model fits the data better than the GLMM model.

  10. 4D-Fingerprint Categorical QSAR Models for Skin Sensitization Based on Classification Local Lymph Node Assay Measures

    PubMed Central

    Li, Yi; Tseng, Yufeng J.; Pan, Dahua; Liu, Jianzhong; Kern, Petra S.; Gerberick, G. Frank; Hopfinger, Anton J.

    2008-01-01

    Currently, the only validated methods to identify skin sensitization effects are in vivo models, such as the Local Lymph Node Assay (LLNA) and guinea pig studies. There is a tremendous need, in particular due to novel legislation, to develop animal alternatives, e.g., Quantitative Structure-Activity Relationship (QSAR) models. Here, QSAR models for skin sensitization using LLNA data have been constructed. The descriptors used to generate these models are derived from the 4D-molecular similarity paradigm and are referred to as universal 4D-fingerprints. A training set of 132 structurally diverse compounds and a test set of 15 structurally diverse compounds were used in this study. The statistical methodologies used to build the models are logistic regression (LR), and partial least square coupled logistic regression (PLS-LR), which prove to be effective tools for studying skin sensitization measures expressed in the two categorical terms of sensitizer and non-sensitizer. QSAR models with low values of the Hosmer-Lemeshow goodness-of-fit statistic, χ²_HL, are significant and predictive. For the training set, the cross-validated prediction accuracy of the logistic regression models ranges from 77.3% to 78.0%, while that of PLS-logistic regression models ranges from 87.1% to 89.4%. For the test set, the prediction accuracy of logistic regression models ranges from 80.0% to 86.7%, while that of PLS-logistic regression models ranges from 73.3% to 80.0%. The QSAR models are made up of 4D-fingerprints related to aromatic atoms, hydrogen bond acceptors and negatively partially charged atoms. PMID:17226934

  11. Relationship between Urbanization and Cancer Incidence in Iran Using Quantile Regression.

    PubMed

    Momenyan, Somayeh; Sadeghifar, Majid; Sarvi, Fatemeh; Khodadost, Mahmoud; Mosavi-Jarrahi, Alireza; Ghaffari, Mohammad Ebrahim; Sekhavati, Eghbal

    2016-01-01

    Quantile regression is an efficient method for predicting and estimating the relationship between explanatory variables and percentile points of the response distribution, particularly for extreme percentiles of the distribution. To study the relationship between urbanization and cancer morbidity, we here applied quantile regression. This cross-sectional study was conducted for 9 cancers in 345 cities in 2007 in Iran. Data were obtained from the Ministry of Health and Medical Education and the relationship between urbanization and cancer morbidity was investigated using quantile regression and least square regression. Fitting models were compared using AIC criteria. R (3.0.1) software and the Quantreg package were used for statistical analysis. With the quantile regression model all percentiles for breast, colorectal, prostate, lung and pancreas cancers demonstrated increasing incidence rate with urbanization. The maximum increase for breast cancer was in the 90th percentile (β=0.13, p-value<0.001), for colorectal cancer was in the 75th percentile (β=0.048, p-value<0.001), for prostate cancer the 95th percentile (β=0.55, p-value<0.001), for lung cancer was in 95th percentile (β=0.52, p-value=0.006), for pancreas cancer was in 10th percentile (β=0.011, p-value<0.001). For gastric, esophageal and skin cancers, with increasing urbanization, the incidence rate was decreased. The maximum decrease for gastric cancer was in the 90th percentile(β=0.003, p-value<0.001), for esophageal cancer the 95th (β=0.04, p-value=0.4) and for skin cancer also the 95th (β=0.145, p-value=0.071). The AIC showed that for upper percentiles, the fitting of quantile regression was better than least square regression. 
According to the results of this study, the significant impact of urbanization on cancer morbidity requires more effort and planning by policymakers and administrators in order to reduce risk factors such as pollution in urban areas and ensure proper nutrition recommendations are made.
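
    Quantile regression, as used in this record, estimates a chosen percentile by minimizing an asymmetric absolute loss (often called the pinball or check loss). The sketch below shows that loss and a brute-force search for the best constant predictor; the data values and function names are made up for illustration and are not taken from the study, which used the Quantreg package in R.

```python
def pinball_loss(y, predictions, tau):
    """Mean check loss for quantile level tau (0 < tau < 1)."""
    total = 0.0
    for observed, predicted in zip(y, predictions):
        residual = observed - predicted
        # Under-predictions are weighted by tau, over-predictions by 1 - tau.
        total += tau * residual if residual >= 0 else (tau - 1.0) * residual
    return total / len(y)


def best_constant(y, tau):
    """Return the observed value that minimizes the pinball loss."""
    return min(y, key=lambda c: pinball_loss(y, [c] * len(y), tau))


# Hypothetical incidence-like values with one extreme observation.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(sample, 0.5)
```

    With tau = 0.5 the minimizer is the sample median, which is unaffected by the outlying value; a least squares fit would instead be pulled toward it.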

  12. Spatio-temporal variations of nitric acid total columns from 9 years of IASI measurements - a driver study

    NASA Astrophysics Data System (ADS)

    Ronsmans, Gaétane; Wespes, Catherine; Hurtmans, Daniel; Clerbaux, Cathy; Coheur, Pierre-François

    2018-04-01

    This study aims to understand the spatial and temporal variability of HNO3 total columns in terms of explanatory variables. To achieve this, multiple linear regressions are used to fit satellite-derived time series of HNO3 daily averaged total columns. First, an analysis of the IASI 9-year time series (2008-2016) is conducted based on various equivalent latitude bands. The strong and systematic denitrification of the southern polar stratosphere is observed very clearly. It is also possible to distinguish, within the polar vortex, three regions which are differently affected by the denitrification. Three exceptional denitrification episodes in 2011, 2014 and 2016 are also observed in the Northern Hemisphere, due to unusually low arctic temperatures. The time series are then fitted by multivariate regressions to identify what variables are responsible for HNO3 variability in global distributions and time series, and to quantify their respective influence. Out of an ensemble of proxies (annual cycle, solar flux, quasi-biennial oscillation, multivariate ENSO index, Arctic and Antarctic oscillations and volume of polar stratospheric clouds), only those defined as significant (p value < 0.05) by a selection algorithm are retained for each equivalent latitude band. Overall, the regression gives a good representation of HNO3 variability, with especially good results at high latitudes (60-80 % of the observed variability explained by the model). The regressions show the dominance of annual variability in all latitudinal bands, which is related to specific chemistry and dynamics depending on the latitudes. We find that the polar stratospheric clouds (PSCs) also have a major influence in the polar regions, and that their inclusion in the model improves the correlation coefficients and the residuals. 
However, there is still a relatively large portion of HNO3 variability that remains unexplained by the model, especially in the intertropical regions, where factors not included in the regression model (such as vegetation fires or lightning) may be at play.

  13. Predicting exposure-response associations of ambient particulate matter with mortality in 73 Chinese cities.

    PubMed

    Madaniyazi, Lina; Guo, Yuming; Chen, Renjie; Kan, Haidong; Tong, Shilu

    2016-01-01

    Estimating the burden of mortality associated with particulates requires knowledge of exposure-response associations. However, the evidence on exposure-response associations is limited in many cities, especially in developing countries. In this study, we predicted associations of particulates smaller than 10 μm in aerodynamic diameter (PM10) with mortality in 73 Chinese cities. The meta-regression model was used to test and quantify which city-specific characteristics contributed significantly to the heterogeneity of PM10-mortality associations for 16 Chinese cities. Then, those city-specific characteristics with statistically significant regression coefficients were treated as independent variables to build multivariate meta-regression models. The model with the best fitness was used to predict PM10-mortality associations in 73 Chinese cities in 2010. Mean temperature, PM10 concentration and green space per capita could best explain the heterogeneity in PM10-mortality associations. Based on city-specific characteristics, we were able to develop multivariate meta-regression models to predict associations between air pollutants and health outcomes reasonably well. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. Application of zero-inflated poisson mixed models in prognostic factors of hepatitis C.

    PubMed

    Akbarzadeh Baghban, Alireza; Pourhoseingholi, Asma; Zayeri, Farid; Jafari, Ali Akbar; Alavian, Seyed Moayed

    2013-01-01

    In recent years, hepatitis C virus (HCV) infection has represented a major public health problem. Evaluation of risk factors is one of the solutions which help protect people from the infection. This study aims to employ zero-inflated Poisson mixed models to evaluate prognostic factors of hepatitis C. The data were collected from a longitudinal study during 2005-2010. First, a mixed Poisson regression (PR) model was fitted to the data. Then, a mixed zero-inflated Poisson model was fitted with compound Poisson random effects. For evaluating the performance of the proposed mixed model, standard errors of estimators were compared. The results obtained from mixed PR showed that genotype 3 and treatment protocol were statistically significant. Results of the zero-inflated Poisson mixed model showed that age, sex, genotypes 2 and 3, the treatment protocol, and having risk factors had significant effects on viral load of HCV patients. Of these two models, the estimators of the zero-inflated Poisson mixed model had the minimum standard errors. The results showed that the mixed zero-inflated Poisson model gave almost the best fit. The proposed model can capture serial dependence, additional overdispersion, and excess zeros in the longitudinal count data.
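
    To make the "excess zeros" idea concrete, the sketch below writes out the probability mass function and log-likelihood of a plain zero-inflated Poisson distribution. It deliberately omits the random effects used in the study's mixed model, and the parameter names (lam for the Poisson mean, pi_zero for the extra-zero probability) and the toy counts are assumptions for illustration.

```python
import math


def zip_pmf(k, lam, pi_zero):
    """P(Y = k) under a zero-inflated Poisson with mean lam and
    structural-zero probability pi_zero."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


def zip_log_likelihood(counts, lam, pi_zero):
    """Log-likelihood of independent counts under the ZIP model."""
    return sum(math.log(zip_pmf(k, lam, pi_zero)) for k in counts)


# Made-up counts with many zeros: a ZIP model can explain them better
# than an ordinary Poisson (pi_zero = 0) with the same sample mean.
toy_counts = [0, 0, 0, 0, 3]
zip_fit = zip_log_likelihood(toy_counts, 3.0, 0.5)
poisson_fit = zip_log_likelihood(toy_counts, 0.6, 0.0)
```

    Setting pi_zero to 0 recovers the ordinary Poisson model, so the two log-likelihoods above are directly comparable.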

  15. Alcohol-related predictors of adolescent driving: gender differences in crashes and offenses.

    PubMed

    Shope, J T; Waller, P F; Lang, S W

    1996-11-01

    Demographic and alcohol-related data collected from eighth-grade students (age 13 years) were used in logistic regression to predict subsequent first-year driving crashes and offenses (age 17 years). For young men's crashes and offenses, good-fitting models used living situation (both parents or not), parents' attitude about teen drinking (negative or neutral), and the interaction term. Young men who lived with both parents and reported negative parental attitudes regarding teen drinking were less likely to have crashes and offenses. For young women's crashes, a good-fitting model included friends' involvement with alcohol. Young women who reported that their friends were not involved with alcohol were least likely to have crashes. No model predicting young women's offenses emerged.

  16. Analysis of Sting Balance Calibration Data Using Optimized Regression Models

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert; Bader, Jon B.

    2009-01-01

    Calibration data of a wind tunnel sting balance was processed using a search algorithm that identifies an optimized regression model for the data analysis. The selected sting balance had two moment gages that were mounted forward and aft of the balance moment center. The difference and the sum of the two gage outputs were fitted in the least squares sense using the normal force and the pitching moment at the balance moment center as independent variables. The regression model search algorithm predicted that the difference of the gage outputs should be modeled using the intercept and the normal force. The sum of the two gage outputs, on the other hand, should be modeled using the intercept, the pitching moment, and the square of the pitching moment. Equations of the deflection of a cantilever beam are used to show that the search algorithm's two recommended math models can also be obtained after performing a rigorous theoretical analysis of the deflection of the sting balance under load. The analysis of the sting balance calibration data set is a rare example of a situation when regression models of balance calibration data can directly be derived from first principles of physics and engineering. In addition, it is interesting to see that the search algorithm recommended the same regression models for the data analysis using only a set of statistical quality metrics.
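
    The second math model named in this abstract (intercept, pitching moment, and squared pitching moment) can be fitted by ordinary least squares through the normal equations, as sketched below. The synthetic moments and gage outputs are invented so that the true coefficients are known, and the helper names are hypothetical; nothing here reproduces the authors' search algorithm or calibration data.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_least_squares(design, response):
    """Least squares coefficients from the normal equations X'X b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, response)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic data: gage sum = 1.0 + 0.5 * M + 0.25 * M^2 (made-up coefficients).
moments = [-2.0, -1.0, 0.0, 1.0, 2.0]
gage_sum = [1.0 + 0.5 * m + 0.25 * m * m for m in moments]
design = [[1.0, m, m * m] for m in moments]
coefficients = fit_least_squares(design, gage_sum)
```

    For badly conditioned design matrices a QR- or SVD-based solver is usually preferred over the normal equations; the direct form is kept here only because it is short.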

  17. A spatially explicit approach to the study of socio-demographic inequality in the spatial distribution of trees across Boston neighborhoods

    PubMed Central

    Duncan, Dustin T.; Kawachi, Ichiro; Kum, Susan; Aldstadt, Jared; Piras, Gianfranco; Matthews, Stephen A.; Arbia, Giuseppe; Castro, Marcia C.; White, Kellee; Williams, David R.

    2017-01-01

    The racial/ethnic and income composition of neighborhoods often influences local amenities, including the potential spatial distribution of trees, which are important for population health and community wellbeing, particularly in urban areas. This ecological study used spatial analytical methods to assess the relationship between neighborhood socio-demographic characteristics (i.e. minority racial/ethnic composition and poverty) and tree density at the census tract level in Boston, Massachusetts (US). We examined spatial autocorrelation with the Global Moran’s I for all study variables and in the ordinary least squares (OLS) regression residuals as well as computed Spearman correlations non-adjusted and adjusted for spatial autocorrelation between socio-demographic characteristics and tree density. Next, we fit traditional regressions (i.e. OLS regression models) and spatial regressions (i.e. spatial simultaneous autoregressive models), as appropriate. We found significant positive spatial autocorrelation for all neighborhood socio-demographic characteristics (Global Moran’s I range from 0.24 to 0.86, all P=0.001), for tree density (Global Moran’s I=0.452, P=0.001), and in the OLS regression residuals (Global Moran’s I range from 0.32 to 0.38, all P<0.001). Therefore, we fit the spatial simultaneous autoregressive models. There was a negative correlation between neighborhood percent non-Hispanic Black and tree density (rS=−0.19; conventional P-value=0.016; spatially adjusted P-value=0.299) as well as a negative correlation between predominantly non-Hispanic Black (over 60% Black) neighborhoods and tree density (rS=−0.18; conventional P-value=0.019; spatially adjusted P-value=0.180). 
While the conventional OLS regression model found a marginally significant inverse relationship between Black neighborhoods and tree density, we found no statistically significant relationship between neighborhood socio-demographic composition and tree density in the spatial regression models. Methodologically, our study suggests the need to take into account spatial autocorrelation as findings/conclusions can change when the spatial autocorrelation is ignored. Substantively, our findings suggest no need for policy intervention vis-à-vis trees in Boston, though we hasten to add that replication studies, and more nuanced data on tree quality, age and diversity are needed. PMID:29354668
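
    The Global Moran's I used throughout this record measures whether neighboring units have similar values. A minimal sketch of the statistic follows, assuming a binary contiguity weight matrix for four hypothetical tracts arranged in a row; the study's actual weights, data and significance tests are not reproduced.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed on n spatial units.

    weights[i][j] is the spatial weight between units i and j
    (zero on the diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / total_weight) * cross / sum(d * d for d in deviations)


# Four hypothetical tracts in a row; each is a neighbor of the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_gradient = morans_i([1.0, 2.0, 3.0, 4.0], chain)
checkerboard = morans_i([1.0, -1.0, 1.0, -1.0], chain)
```

    The smooth gradient gives a positive value (similar neighbors) and the alternating pattern gives a negative one (dissimilar neighbors), which is the contrast the abstract relies on when it reports positive autocorrelation in the OLS residuals.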

  18. Forecast model applications of retrieved three dimensional liquid water fields

    NASA Technical Reports Server (NTRS)

    Raymond, William H.; Olson, William S.

    1990-01-01

    Forecasts are made for tropical storm Emily using heating rates derived from the SSM/I physical retrievals described in chapters 2 and 3. Average values of the latent heating rates from the convective and stratiform cloud simulations, used in the physical retrieval, are obtained for individual 1.1 km thick vertical layers. Then, the layer-mean latent heating rates are regressed against the slant path-integrated liquid and ice precipitation water contents to determine the best fit two parameter regression coefficients for each layer. The regression formulae and retrieved precipitation water contents are utilized to infer the vertical distribution of heating rates for forecast model applications. In the forecast model, diabatic temperature contributions are calculated and used in a diabatic initialization, or in a diabatic initialization combined with a diabatic forcing procedure. Our forecasts show that the time needed to spin-up precipitation processes in tropical storm Emily is greatly accelerated through the application of the data.

  19. Patterns of medicinal plant use: an examination of the Ecuadorian Shuar medicinal flora using contingency table and binomial analyses.

    PubMed

    Bennett, Bradley C; Husby, Chad E

    2008-03-28

    Botanical pharmacopoeias are non-random subsets of floras, with some taxonomic groups over- or under-represented. Moerman [Moerman, D.E., 1979. Symbols and selectivity: a statistical analysis of Native American medical ethnobotany, Journal of Ethnopharmacology 1, 111-119] introduced linear regression/residual analysis to examine these patterns. However, regression, the commonly-employed analysis, suffers from several statistical flaws. We use contingency table and binomial analyses to examine patterns of Shuar medicinal plant use (from Amazonian Ecuador). We first analyzed the Shuar data using Moerman's approach, modified to better meet requirements of linear regression analysis. Second, we assessed the exact randomization contingency table test for goodness of fit. Third, we developed a binomial model to test for non-random selection of plants in individual families. Modified regression models (which accommodated assumptions of linear regression) reduced R² from 0.59 to 0.38, but did not eliminate all problems associated with regression analyses. Contingency table analyses revealed that the entire flora departs from the null model of equal proportions of medicinal plants in all families. In the binomial analysis, only 10 angiosperm families (of 115) differed significantly from the null model. These 10 families are largely responsible for patterns seen at higher taxonomic levels. Contingency table and binomial analyses offer an easy and statistically valid alternative to the regression approach.

  20. Analysis of volumetric response of pituitary adenomas receiving adjuvant CyberKnife stereotactic radiosurgery with the application of an exponential fitting model

    PubMed Central

    Yu, Yi-Lin; Yang, Yun-Ju; Lin, Chin; Hsieh, Chih-Chuan; Li, Chiao-Zhu; Feng, Shao-Wei; Tang, Chi-Tun; Chung, Tzu-Tsao; Ma, Hsin-I; Chen, Yuan-Hao; Ju, Da-Tong; Hueng, Dueng-Yuan

    2017-01-01

    Tumor control rates of pituitary adenomas (PAs) receiving adjuvant CyberKnife stereotactic radiosurgery (CK SRS) are high. However, there is currently no uniform way to estimate the time course of the disease. The aim of this study was to analyze the volumetric responses of PAs after CK SRS and investigate the application of an exponential decay model in calculating an accurate time course and estimation of the eventual outcome. A retrospective review of 34 patients with PAs who received adjuvant CK SRS between 2006 and 2013 was performed. Tumor volume was calculated using the planimetric method. The percent change in tumor volume and tumor volume rate of change were compared at median 4-, 10-, 20-, and 36-month intervals. Tumor responses were classified as: progression for >15% volume increase, regression for ≤15% decrease, and stabilization for ±15% of the baseline volume at the time of last follow-up. For each patient, the volumetric change versus time was fitted with an exponential model. The overall tumor control rate was 94.1% in the 36-month (range 18–87 months) follow-up period (mean volume change of −43.3%). Volume regression (mean decrease of −50.5%) was demonstrated in 27 (79%) patients, tumor stabilization (mean change of −3.7%) in 5 (15%) patients, and tumor progression (mean increase of 28.1%) in 2 (6%) patients (P = 0.001). Tumors that eventually regressed or stabilized had a temporary volume increase of 1.07% and 41.5% at 4 months after CK SRS, respectively (P = 0.017). The tumor volume estimated using the exponential fitting equation demonstrated high positive correlation with the actual volume calculated by magnetic resonance imaging (MRI) as tested by Pearson correlation coefficient (0.9). Transient progression of PAs post-CK SRS was seen in 62.5% of the patients receiving CK SRS, and it was not predictive of eventual volume regression or progression. 
A three-point exponential model is of potential predictive value according to relative distribution. An exponential decay model can be used to calculate the time course of tumors that are ultimately controlled. PMID:28121913
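
    One simple way to fit an exponential decay of volume over time is a straight-line fit to the logarithm of the volumes, sketched below. The model form V(t) = V0 * exp(-k * t), the function name and the synthetic volumes are assumptions for illustration; the abstract does not give the exact equation or fitting procedure the authors used.

```python
import math


def fit_exponential_decay(times, volumes):
    """Estimate (v0, k) in V(t) = v0 * exp(-k * t) by linear regression
    of log-volume on time. Volumes must be strictly positive."""
    logs = [math.log(v) for v in volumes]
    n = len(times)
    t_mean = sum(times) / n
    log_mean = sum(logs) / n
    covariance = sum((t - t_mean) * (l - log_mean) for t, l in zip(times, logs))
    spread = sum((t - t_mean) ** 2 for t in times)
    slope = covariance / spread
    intercept = log_mean - slope * t_mean
    return math.exp(intercept), -slope


# Synthetic follow-up times in months and noise-free volumes generated
# from v0 = 10.0 and k = 0.1 (made-up values).
months = [0.0, 4.0, 10.0, 20.0, 36.0]
synthetic_volumes = [10.0 * math.exp(-0.1 * t) for t in months]
v0_hat, k_hat = fit_exponential_decay(months, synthetic_volumes)
```

    A log-linear fit of this kind cannot represent the transient volume increase described in the abstract, so it is only meaningful for the monotone part of the time course.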

  1. Nonlinear Constitutive Modeling of Piezoelectric Ceramics

    NASA Astrophysics Data System (ADS)

    Xu, Jia; Li, Chao; Wang, Haibo; Zhu, Zhiwen

    2017-12-01

    Nonlinear constitutive modeling of piezoelectric ceramics is discussed in this paper. A Van der Pol term is introduced to explain the simple hysteretic curve. Improved nonlinear difference terms are used to interpret the hysteresis phenomena of piezoelectric ceramics. The fit of the model to experimental data is verified by the partial least-squares regression method. The results show that this method can describe the real curve well. The results of this paper are helpful for constitutive modeling of piezoelectric ceramics.

  2. Spatially Explicit Estimates of Suspended Sediment and Bedload Transport Rates for Western Oregon and Northwestern California

    NASA Astrophysics Data System (ADS)

    O'Connor, J. E.; Wise, D. R.; Mangano, J.; Jones, K.

    2015-12-01

    Empirical analyses of suspended sediment and bedload transport give estimates of sediment flux for western Oregon and northwestern California. The estimates of both bedload and suspended load are from regression models relating measured annual sediment yield to geologic, physiographic, and climatic properties of contributing basins. The best models include generalized geology and either slope or precipitation. The best-fit suspended-sediment model is based on basin geology, precipitation, and area of recent wildfire. It explains 65% of the variance for 68 suspended sediment measurement sites within the model area. Predicted suspended sediment yields range from no yield from the High Cascades geologic province to 200 tonnes/km2-yr in the northern Oregon Coast Range and 1000 tonnes/km2-yr in recently burned areas of the northern Klamath terrane. Bed-material yield is similarly estimated from a regression model based on 22 sites of measured bed-material transport, mostly from reservoir accumulation analyses but also from several bedload measurement programs. The resulting best-fit regression is based on basin slope and the presence/absence of the Klamath geologic terrane. For the Klamath terrane, bed-material yield is twice that of the other geologic provinces. This model explains more than 80% of the variance of the better-quality measurements. Predicted bed-material yields range up to 350 tonnes/km2-yr in steep areas of the Klamath terrane. Applying these regressions to small individual watersheds (mean size: 66 km2 for bed-material, 3 km2 for suspended sediment) and cumulating totals down the hydrologic network (but also decreasing the bed-material flux by experimentally determined attrition rates) gives spatially explicit estimates of both bed-material and suspended sediment flux. This enables assessment of several management issues, including the effects of dams on bedload transport, instream gravel mining, habitat formation processes, and water quality. 
The combined fluxes can also be compared to long-term rock uplift and cosmogenically determined landscape erosion rates.

  3. Social Inequality and Labor Force Participation.

    ERIC Educational Resources Information Center

    King, Jonathan

    The labor force participation rates of whites, blacks, and Spanish-Americans, grouped by sex, are explained in a linear regression model fitted with 1970 U. S. Census data on Standard Metropolitan Statistical Area (SMSA). The explanatory variables are: average age, average years of education, vocational training rate, disabled rate, unemployment…

  4. Optimizing Treatment of Lung Cancer Patients with Comorbidities

    DTIC Science & Technology

    2017-10-01

    of treatment options, comorbid illness, age, sex, histology, and tumor size. We will simulate base case scenarios for stage I NSCLC for all possible...fitting adjusted logistic regression models controlling for age, sex and cancer stage. Results Overall, 5,644 (80.4%) and 1,377 (19.6%) patients

  5. Spillover in the Academy: Marriage Stability and Faculty Evaluations.

    ERIC Educational Resources Information Center

    Ludlow, Larry H.; Alvarez-Salvat, Rose M.

    2001-01-01

    Studied the spillover between family and work by examining the link between marital status and work performance across marriage, divorce, and remarriage. A polynomial regression model was fit to the data from 78 evaluations of an individual professor, and a cubic curve through the 3 periods was statistically significant. (SLD)

  6. Student and School SES, Gender, Strategy Use, and Achievement

    ERIC Educational Resources Information Center

    Callan, Gregory L.; Marchant, Gregory J.; Finch, W. Holmes; Flegge, Lindsay

    2017-01-01

    A multilevel mediated regression model was fit to Programme for International Student Assessment achievement, strategy use, gender, and family- and school-level socioeconomic status (SES). Two metacognitive strategies (i.e., understanding and summarizing) and one learning strategy (i.e., control strategies) were found to relate significantly and…

  7. COMPARING THE IMPAIRMENT PROFILES OF OLDER DRIVERS AND NON-DRIVERS: TOWARD THE DEVELOPMENT OF A FITNESS-TO-DRIVE MODEL

    PubMed Central

    Antin, Jonathan F.; Stanley, Laura M.; Guo, Feng

    2011-01-01

    The purpose of this research effort was to compare older driver and non-driver functional impairment profiles across some 60 assessment metrics in an initial effort to contribute to the development of fitness-to-drive assessment models. Of the metrics evaluated, 21 showed statistically significant differences, almost all favoring the drivers. Also, it was shown that a logistic regression model comprised of five of the assessment scores could completely and accurately separate the two groups. The results of this study imply that older drivers are far less functionally impaired than non-drivers of similar ages, and that a parsimonious model can accurately assign individuals to either group. With such models, any driver classified or diagnosed as a non-driver would be a strong candidate for further investigation and intervention. PMID:22058607

  8. [Prediction model of health workforce and beds in county hospitals of Hunan by multiple linear regression].

    PubMed

    Ling, Ru; Liu, Jiawang

    2011-12-01

    To construct a prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan with stratified random sampling using uniform questionnaires, and performed multiple linear regression analysis with 20 indicators selected by literature review. Independent variables in the multiple linear regression model on medical personnel in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the population aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in the county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory power and fit, and may be used for short- and mid-term forecasting.

  9. Data error and highly parameterized groundwater models

    USGS Publications Warehouse

    Hill, M.C.

    2008-01-01

    Strengths and weaknesses of highly parameterized models, in which the number of parameters exceeds the number of observations, are demonstrated using a synthetic test case. Results suggest that the approach can yield close matches to observations but also serious errors in system representation. It is proposed that avoiding the difficulties of highly parameterized models requires close evaluation of: (1) model fit, (2) performance of the regression, and (3) estimated parameter distributions. Comparisons to hydrogeologic information are expected to be critical to obtaining credible models. Copyright © 2008 IAHS Press.

  10. Method development estimating ambient mercury concentration from monitored mercury wet deposition

    NASA Astrophysics Data System (ADS)

    Chen, S. M.; Qiu, X.; Zhang, L.; Yang, F.; Blanchard, P.

    2013-05-01

    Speciated atmospheric mercury data have recently been monitored at multiple locations in North America; but the spatial coverage is far less than the long-established mercury wet deposition network. The present study describes a first attempt linking ambient concentration with wet deposition using Beta distribution fitting of a ratio estimate. The mean, median, mode, standard deviation, and skewness of the fitted Beta distribution parameters were generated using data collected in 2009 at 11 monitoring stations. Comparing the normalized histogram and the fitted density function, the empirical and fitted Beta distribution of the ratio shows a close fit. The estimated ambient mercury concentration was further partitioned into reactive gaseous mercury and particulate bound mercury using linear regression model developed by Amos et al. (2012). The method presented here can be used to roughly estimate mercury ambient concentration at locations and/or times where such measurement is not available but where wet deposition is monitored.
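
    A Beta distribution can be fitted to ratios lying between 0 and 1 in several ways; the sketch below uses the method of moments, which matches the sample mean and variance. This is an assumption chosen for brevity, since the abstract does not say which fitting method was used, and the example ratios are made up.

```python
def fit_beta_moments(ratios):
    """Method-of-moments estimates (alpha, beta) for data in (0, 1).

    Uses the population (divide-by-n) variance; requires
    variance < mean * (1 - mean).
    """
    n = len(ratios)
    mean = sum(ratios) / n
    variance = sum((r - mean) ** 2 for r in ratios) / n
    common = mean * (1.0 - mean) / variance - 1.0
    if common <= 0.0:
        raise ValueError("variance too large for a Beta distribution")
    return mean * common, (1.0 - mean) * common


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical ratio values, not taken from the monitoring data.
alpha_hat, beta_hat = fit_beta_moments([0.25, 0.75])
```

    Maximum likelihood is the more common choice when the sample is large enough; the moment estimates are often used as its starting values.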

  11. Associations between different components of fitness and fatness with academic performance in Chilean youths.

    PubMed

    Olivares, Pedro R; García-Rubio, Javier

    2016-01-01

    To analyze the associations between different components of fitness and fatness with academic performance, adjusting the analysis by sex, age, socio-economic status, region and school type in a Chilean sample. Data on fitness, fatness and academic performance were obtained from the Chilean System for the Assessment of Educational Quality test for eighth grade in 2011 and include a sample of 18,746 subjects (49% females). Partial correlations adjusted by confounders were computed to explore associations between fitness and fatness components, and between the academic scores. Three unadjusted and adjusted linear regression models were fitted in order to analyze the associations of variables. Fatness has a negative association with academic performance when Body Mass Index (BMI) and Waist to Height Ratio (WHR) are assessed independently. When BMI and WHR are assessed jointly and adjusted by confounders, WHR is more associated with academic performance than BMI, and only the association of WHR is positive. For fitness components, strength was the variable most associated with academic performance. Cardiorespiratory capacity was not associated with academic performance if fatness and other fitness components are included in the model. Fitness and fatness are associated with academic performance. WHR and strength are more related to academic performance than BMI and cardiorespiratory capacity.

  12. Associations between different components of fitness and fatness with academic performance in Chilean youths

    PubMed Central

    2016-01-01

    Objectives: To analyze the associations between different components of fitness and fatness with academic performance, adjusting the analysis by sex, age, socio-economic status, region and school type in a Chilean sample. Methods: Data on fitness, fatness and academic performance were obtained from the Chilean System for the Assessment of Educational Quality test for eighth grade in 2011 and include a sample of 18,746 subjects (49% females). Partial correlations adjusted by confounders were computed to explore associations between fitness and fatness components, and between the academic scores. Three unadjusted and adjusted linear regression models were fitted in order to analyze the associations of variables. Results: Fatness has a negative association with academic performance when Body Mass Index (BMI) and Waist to Height Ratio (WHR) are assessed independently. When BMI and WHR are assessed jointly and adjusted by confounders, WHR is more associated with academic performance than BMI, and only the association of WHR is positive. For fitness components, strength was the variable most associated with academic performance. Cardiorespiratory capacity was not associated with academic performance if fatness and other fitness components are included in the model. Conclusions: Fitness and fatness are associated with academic performance. WHR and strength are more related to academic performance than BMI and cardiorespiratory capacity. PMID:27761345

  13. Socio-demographic predictors of person-organization fit.

    PubMed

    Merecz-Kot, Dorota; Andysz, Aleksandra

    2017-02-21

    The aim of this study was to explore the relationship between socio-demographic characteristics and the level of complementary and supplementary person-organization fit (P-O fit). The study sample was a group of 600 Polish workers, urban residents aged 19-65. The level of P-O fit was measured using the Subjective Person-Organization Fit Questionnaire by Czarnota-Bojarska. Binomial multivariate logistic regression was applied. The analyses were performed separately for the men and women. Socio-demographic variables explained a small percentage of the outcome variability. Gender differences were found. In the men, shift work decreased complementary and supplementary fit, while long working hours decreased complementary fit. In the women, age was a stimulant of complementary fit, and involuntary job losses predicted both complementary and supplementary misfit. Additionally, relational responsibilities increased the probability of supplementary P-O fit in the men. Going beyond personality and competences as the factors affecting P-O fit will allow development of a more accurate prediction of P-O fit. Int J Occup Med Environ Health 2017;30(1):133-139. This work is available in Open Access model and licensed under a CC BY-NC 3.0 PL license.

  14. Relationship between long working hours and depression in two working populations: a structural equation model approach.

    PubMed

    Amagasa, Takashi; Nakayama, Takeo

    2012-07-01

    To test the hypothesis that the relationship reported between long working hours and depression was inconsistent in previous studies because job demand was treated as a confounder. Structural equation modeling was used to construct five models, using work-related factors and depressive mood scale obtained from 218 clerical workers, to test for goodness of fit and was externally validated with data obtained from 1160 sales workers. Multiple logistic regression analysis was also performed. The model that showed that long working hours increased depression risk when job demand was regarded as an intermediate variable was the best fitted model (goodness-of-fit index/root-mean-square error of approximation: 0.981 to 0.996/0.042 to 0.044). The odds ratio for depression risk with work that was high demand and 60 hours or more per week was estimated at 2 to 4 versus work that was low demand and less than 60 hours per week. Long working hours increased depression risk, with job demand being an intermediate variable.

  15. Improving RNA nearest neighbor parameters for helices by going beyond the two-state model.

    PubMed

    Spasic, Aleksandar; Berger, Kyle D; Chen, Jonathan L; Seetin, Matthew G; Turner, Douglas H; Mathews, David H

    2018-06-01

    RNA folding free energy change nearest neighbor parameters are widely used to predict folding stabilities of secondary structures. They were determined by linear regression to datasets of optical melting experiments on small model systems. Traditionally, the optical melting experiments are analyzed assuming a two-state model, i.e. a structure is either complete or denatured. Experimental evidence, however, shows that structures exist in an ensemble of conformations. Partition functions calculated with existing nearest neighbor parameters predict that secondary structures can be partially denatured, which also directly conflicts with the two-state model. Here, a new approach for determining RNA nearest neighbor parameters is presented. Available optical melting data for 34 Watson-Crick helices were fit directly to a partition function model that allows an ensemble of conformations. Fitting parameters were the enthalpy and entropy changes for helix initiation, terminal AU pairs, stacks of Watson-Crick pairs and disordered internal loops. The resulting set of nearest neighbor parameters shows a 38.5% improvement in the sum of residuals in fitting the experimental melting curves compared to the current literature set.

  16. A surrogate model for thermal characteristics of stratospheric airship

    NASA Astrophysics Data System (ADS)

    Zhao, Da; Liu, Dongxu; Zhu, Ming

    2018-06-01

    A simple and accurate surrogate model is extremely needed to reduce the analysis complexity of thermal characteristics for a stratospheric airship. In this paper, a surrogate model based on the Least Squares Support Vector Regression (LSSVR) is proposed. The Gravitational Search Algorithm (GSA) is used to optimize hyper parameters. A novel framework consisting of a preprocessing classifier and two regression models is designed to train the surrogate model. Various temperature datasets of the airship envelope and the internal gas are obtained by a three-dimensional transient model for thermal characteristics. Using these thermal datasets, two-factor and multi-factor surrogate models are trained and several comparison simulations are conducted. Results illustrate that the surrogate models based on LSSVR-GSA have good fitting and generalization abilities. The pre-treated classification strategy proposed in this paper plays a significant role in improving the accuracy of the surrogate model.

  17. Factoring vs linear modeling in rate estimation: a simulation study of relative accuracy.

    PubMed

    Maldonado, G; Greenland, S

    1998-07-01

    A common strategy for modeling dose-response in epidemiology is to transform ordered exposures and covariates into sets of dichotomous indicator variables (that is, to factor the variables). Factoring tends to increase estimation variance, but it also tends to decrease bias and thus may increase or decrease total accuracy. We conducted a simulation study to examine the impact of factoring on the accuracy of rate estimation. Factored and unfactored Poisson regression models were fit to follow-up study datasets that were randomly generated from 37,500 population model forms that ranged from subadditive to supramultiplicative. In the situations we examined, factoring sometimes substantially improved accuracy relative to fitting the corresponding unfactored model, sometimes substantially decreased accuracy, and sometimes made little difference. The difference in accuracy between factored and unfactored models depended in a complicated fashion on the difference between the true and fitted model forms, the strength of exposure and covariate effects in the population, and the study size. It may be difficult in practice to predict when factoring is increasing or decreasing accuracy. We recommend, therefore, that the strategy of factoring variables be supplemented with other strategies for modeling dose-response.
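
    The factoring step described in this abstract can be sketched in a few lines. This is only an illustration, not the authors' simulation code: the function name factor_exposure, the cut points and the dose values are all hypothetical.

```python
from bisect import bisect_right


def factor_exposure(value, cutpoints):
    """Return dichotomous indicator variables for an ordered exposure.

    The lowest category is the reference level, so an exposure with
    len(cutpoints) + 1 categories yields len(cutpoints) indicators.
    """
    category = bisect_right(cutpoints, value)  # 0 means the reference category
    return [1 if category == k else 0 for k in range(1, len(cutpoints) + 1)]


cutpoints = [10, 20, 30]   # hypothetical category boundaries
doses = [5, 10, 25, 40]    # hypothetical exposure values

# One row of indicators per subject; these columns would replace the
# single continuous dose term in the unfactored regression model.
design = [factor_exposure(d, cutpoints) for d in doses]
for dose, row in zip(doses, design):
    print(dose, row)
```

    The unfactored model spends one coefficient on dose, while the factored model spends one per non-reference category. That is the trade-off the abstract describes: more parameters to estimate (higher variance) in exchange for fewer assumptions about the shape of the dose-response curve (lower bias).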

  18. Creating a non-linear total sediment load formula using polynomial best subset regression model

    NASA Astrophysics Data System (ADS)

    Okcu, Davut; Pektas, Ali Osman; Uyumaz, Ali

    2016-08-01

    The aim of this study is to derive a new total sediment load formula which is more accurate and which has fewer application constraints than the well-known formulae of the literature. The five best-known stream power concept sediment formulae which are approved by ASCE are used for benchmarking on a wide range of datasets that includes both field and flume (lab) observations. The dimensionless parameters of these widely used formulae are used as inputs in a new regression approach. The new approach is called Polynomial Best Subset Regression (PBSR) analysis. The aim of the PBSR analysis is fitting and testing all possible combinations of the input variables and selecting the best subset. All the input variables with their second and third powers are included in the regression to test the possible relation between the explanatory variables and the dependent variable. While selecting the best subset, a multistep approach is used that depends on significance values and also the multicollinearity degrees of inputs. The new formula is compared to others in a holdout dataset and detailed performance investigations are conducted for field and lab datasets within this holdout data. Different goodness of fit statistics are used as they represent different perspectives of the model accuracy. After the detailed comparisons were carried out, we identified the most accurate equation that is also applicable to both flume and river data. In particular, on the field dataset the proposed formula outperformed the benchmark formulations in prediction performance.

  19. The relationship between offspring size and fitness: integrating theory and empiricism.

    PubMed

    Rollinson, Njal; Hutchings, Jeffrey A

    2013-02-01

    How parents divide the energy available for reproduction between size and number of offspring has a profound effect on parental reproductive success. Theory indicates that the relationship between offspring size and offspring fitness is of fundamental importance to the evolution of parental reproductive strategies: this relationship predicts the optimal division of resources between size and number of offspring, it describes the fitness consequences for parents that deviate from optimality, and its shape can predict the most viable type of investment strategy in a given environment (e.g., conservative vs. diversified bet-hedging). Many previous attempts to estimate this relationship and the corresponding value of optimal offspring size have been frustrated by a lack of integration between theory and empiricism. In the present study, we draw from C. Smith and S. Fretwell's classic model to explain how a sound estimate of the offspring size-fitness relationship can be derived with empirical data. We evaluate what measures of fitness can be used to model the offspring size-fitness curve and optimal size, as well as which statistical models should and should not be used to estimate offspring size-fitness relationships. To construct the fitness curve, we recommend that offspring fitness be measured as survival up to the age at which the instantaneous rate of offspring mortality becomes random with respect to initial investment. Parental fitness is then expressed in ecologically meaningful, theoretically defensible, and broadly comparable units: the number of offspring surviving to independence. Although logistic and asymptotic regression have been widely used to estimate offspring size-fitness relationships, the former provides relatively unreliable estimates of optimal size when offspring survival and sample sizes are low, and the latter is unreliable under all conditions. We recommend that the Weibull-1 model be used to estimate this curve because it provides modest improvements in prediction accuracy under experimentally relevant conditions.

  20. Reconstruction of missing daily streamflow data using dynamic regression models

    NASA Astrophysics Data System (ADS)

    Tencaliec, Patricia; Favre, Anne-Catherine; Prieur, Clémentine; Mathevet, Thibault

    2015-12-01

    River discharge is one of the most important quantities in hydrology. It provides fundamental records for water resources management and climate change monitoring. Even very short data-gaps in this information can cause extremely different analysis outputs. Therefore, reconstructing missing data of incomplete data sets is an important step regarding the performance of the environmental models, engineering, and research applications, thus it presents a great challenge. The objective of this paper is to introduce an effective technique for reconstructing missing daily discharge data when one has access to only daily streamflow data. The proposed procedure uses a combination of regression and autoregressive integrated moving average models (ARIMA) called dynamic regression model. This model uses the linear relationship between neighbor and correlated stations and then adjusts the residual term by fitting an ARIMA structure. Application of the model to eight daily streamflow data for the Durance river watershed showed that the model yields reliable estimates for the missing data in the time series. Simulation studies were also conducted to evaluate the performance of the procedure.
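
    A rough sketch of the two-stage idea in this abstract follows: a linear regression on a neighbouring station, then a time-series model on the residuals. It makes strong simplifying assumptions. The abstract does not state the ARIMA order, so a plain AR(1) term stands in for it here, and the station values and helper names (ols_simple, ar1_coefficient, reconstruct) are hypothetical.

```python
def ols_simple(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def ar1_coefficient(residuals):
    """Least-squares AR(1) coefficient of a (roughly zero-mean) series."""
    num = sum(r0 * r1 for r0, r1 in zip(residuals, residuals[1:]))
    den = sum(r0 * r0 for r0 in residuals[:-1])
    return num / den


# Hypothetical daily discharge at a neighbouring and a target station.
neighbor = [10.0, 12.0, 15.0, 14.0, 13.0, 16.0, 18.0, 17.0]
target = [21.0, 25.5, 31.0, 29.5, 27.0, 33.0, 37.5, 35.0]

# Stage 1: linear relationship between the two stations.
a, b = ols_simple(neighbor, target)
resid = [y - (a + b * x) for x, y in zip(neighbor, target)]

# Stage 2: AR(1) on the regression residuals (assumed stand-in for ARIMA).
phi = ar1_coefficient(resid)


def reconstruct(neighbor_value, previous_residual):
    """Estimate a missing target value from the neighbour and the last residual."""
    return a + b * neighbor_value + phi * previous_residual


estimate = reconstruct(19.0, resid[-1])
print(round(a, 3), round(b, 3), round(phi, 3), round(estimate, 3))
```

    For a gap longer than one day, the residual correction would be propagated forward as phi multiplied by the previous predicted residual, so its influence decays and the estimate falls back towards the pure regression prediction.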

  1. Modeling vertebrate diversity in Oregon using satellite imagery

    NASA Astrophysics Data System (ADS)

    Cablk, Mary Elizabeth

    Vertebrate diversity was modeled for the state of Oregon using a parametric approach to regression tree analysis. This exploratory data analysis effectively modeled the non-linear relationships between vertebrate richness and phenology, terrain, and climate. Phenology was derived from time-series NOAA-AVHRR satellite imagery for the year 1992 using two methods: principal component analysis and derivation of EROS data center greenness metrics. These two measures of spatial and temporal vegetation condition incorporated the critical temporal element in this analysis. The first three principal components were shown to contain spatial and temporal information about the landscape and discriminated phenologically distinct regions in Oregon. Principal components 2 and 3, 6 greenness metrics, elevation, slope, aspect, annual precipitation, and annual seasonal temperature difference were investigated as correlates to amphibians, birds, all vertebrates, reptiles, and mammals. Variation explained for each regression tree by taxa were: amphibians (91%), birds (67%), all vertebrates (66%), reptiles (57%), and mammals (55%). Spatial statistics were used to quantify the pattern of each taxa and assess validity of resulting predictions from regression tree models. Regression tree analysis was relatively robust against spatial autocorrelation in the response data and graphical results indicated models were well fit to the data.

  2. TG study of the Li0.4Fe2.4Zn0.2O4 ferrite synthesis

    NASA Astrophysics Data System (ADS)

    Lysenko, E. N.; Nikolaev, E. V.; Surzhikov, A. P.

    2016-02-01

    In this paper, the kinetic analysis of Li-Zn ferrite synthesis was studied using thermogravimetry (TG) method through the simultaneous application of non-linear regression to several measurements run at different heating rates (multivariate non-linear regression). Using TG-curves obtained for the four heating rates and Netzsch Thermokinetics software package, the kinetic models with minimal adjustable parameters were selected to quantitatively describe the reaction of Li-Zn ferrite synthesis. It was shown that the experimental TG-curves clearly suggest a two-step process for the ferrite synthesis and therefore a model-fitting kinetic analysis based on multivariate non-linear regressions was conducted. The complex reaction was described by a two-step reaction scheme consisting of sequential reaction steps. It is established that the best results were obtained using the Yander three-dimensional diffusion model at the first stage and Ginstling-Bronstein model at the second step. The kinetic parameters for lithium-zinc ferrite synthesis reaction were found and discussed.

  3. Growth characterisation of intra-thoracic organs of children on CT scans.

    PubMed

    Coulongeat, François; Jarrar, Mohamed-Salah; Thollon, Lionel; Serre, Thierry

    2013-01-01

    This paper analyses the geometry of intra-thoracic organs from computed tomography (CT) scans performed on 20 children aged from 4 months to 16 years. The aim is to find the most reliable measurements to characterise the growth of heart and lungs from CT data. Standard measurements available on chest radiographies are compared with original measurements only available on CT scans. These measurements should characterise the growth of organs as well as the changes in their position relative to the thorax. Measurements were considered as functions of age. Quadratic regression models were fitted to the data. Goodness of fit of the models was then evaluated. Positions of organs relative to the thorax have a high variability compared with their changes with age. The length and volume of the heart and lungs as well as the diameter of the thorax fit well to the models of growth. It could be interesting to study these measurements with a larger sample size in order to define growth standards.
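
    To illustrate what fitting a quadratic regression of a measurement on age and checking its goodness of fit involves, here is a small self-contained sketch. The ages and organ lengths are synthetic and the helper names are hypothetical. The abstract does not say which goodness-of-fit statistic was used, so R2 is an assumption.

```python
def solve_linear(a, b):
    """Solve a small system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_quadratic(t, y):
    """Least-squares coefficients (c0, c1, c2) of y = c0 + c1*t + c2*t**2."""
    powers = [sum(ti ** k for ti in t) for k in range(5)]
    normal = [[powers[i + j] for j in range(3)] for i in range(3)]
    rhs = [sum(yi * ti ** i for ti, yi in zip(t, y)) for i in range(3)]
    return solve_linear(normal, rhs)


def r_squared(t, y, coef):
    """Coefficient of determination of the fitted quadratic."""
    mean_y = sum(y) / len(y)
    fitted = [coef[0] + coef[1] * ti + coef[2] * ti ** 2 for ti in t]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


ages = [0.5, 2.0, 4.0, 6.0, 9.0, 12.0, 16.0]              # hypothetical ages (years)
lengths = [1.0 + 0.8 * a - 0.02 * a ** 2 for a in ages]   # synthetic, exactly quadratic

coef = fit_quadratic(ages, lengths)
print([round(c, 4) for c in coef], round(r_squared(ages, lengths, coef), 4))
```

    With real measurements, a low R2 would correspond to the situation the abstract reports for organ positions, where between-subject variability is large relative to the change with age.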

  4. Optimum extrusion-cooking conditions for improving physical properties of fish-cereal based snacks by response surface methodology.

    PubMed

    Singh, R K Ratankumar; Majumdar, Ranendra K; Venkateshwarlu, G

    2014-09-01

    To establish the effect of barrel temperature, screw speed, total moisture and fish flour content on the expansion ratio and bulk density of the fish based extrudates, response surface methodology was adopted in this study. The experiments were optimized using a five-level, four-factor central composite design. Analysis of variance was carried out to study the effects of main factors and interaction effects of various factors, and regression analysis was carried out to explain the variability. The fitting was done to a second order model with the coded variables for each response. The response surface plots were developed as a function of two independent variables while keeping the other two independent variables at optimal values. Based on the ANOVA, the fitted model confirmed the model fitness for both the dependent variables. The highest organoleptic score was obtained with the combination of temperature 110 °C, screw speed 480 rpm, moisture 18% and fish flour 20%.

  5. Variable-Domain Functional Regression for Modeling ICU Data.

    PubMed

    Gellar, Jonathan E; Colantuoni, Elizabeth; Needham, Dale M; Crainiceanu, Ciprian M

    2014-12-01

    We introduce a class of scalar-on-function regression models with subject-specific functional predictor domains. The fundamental idea is to consider a bivariate functional parameter that depends both on the functional argument and on the width of the functional predictor domain. Both parametric and nonparametric models are introduced to fit the functional coefficient. The nonparametric model is theoretically and practically invariant to functional support transformation, or support registration. Methods were motivated by and applied to a study of association between daily measures of the Intensive Care Unit (ICU) Sequential Organ Failure Assessment (SOFA) score and two outcomes: in-hospital mortality, and physical impairment at hospital discharge among survivors. Methods are generally applicable to a large number of new studies that record continuous variables over unequal domains.

  6. Physical fitness predicts technical-tactical and time-motion profile in simulated Judo and Brazilian Jiu-Jitsu matches.

    PubMed

    Coswig, Victor S; Gentil, Paulo; Bueno, João C A; Follmer, Bruno; Marques, Vitor A; Del Vecchio, Fabrício B

    2018-01-01

    Among combat sports, Judo and Brazilian Jiu-Jitsu (BJJ) present elevated physical fitness demands from the high-intensity intermittent efforts. However, information regarding how metabolic and neuromuscular physical fitness is associated with technical-tactical performance in Judo and BJJ fights is not available. This study aimed to relate indicators of physical fitness with combat performance variables in Judo and BJJ. The sample consisted of Judo (n = 16) and BJJ (n = 24) male athletes. At the first meeting, the physical tests were applied and, in the second, simulated fights were performed for later notational analysis. The main findings indicate: (i) high reproducibility of the proposed instrument and protocol used for notational analysis in a mobile device; (ii) differences in the technical-tactical and time-motion patterns between modalities; (iii) performance-related variables are different in Judo and BJJ; and (iv) regression models based on metabolic fitness variables may account for up to 53% of the variances in technical-tactical and/or time-motion variables in Judo and up to 31% in BJJ, whereas neuromuscular fitness models can reach values up to 44 and 73% of prediction in Judo and BJJ, respectively. When all components are combined, they can explain up to 90% of high intensity actions in Judo. In conclusion, performance prediction models in simulated combat indicate that anaerobic, aerobic and neuromuscular fitness variables contribute to explain time-motion variables associated with high intensity and technical-tactical variables in Judo and BJJ fights.

  7. Mixed conditional logistic regression for habitat selection studies.

    PubMed

    Duchesne, Thierry; Fortin, Daniel; Courbin, Nicolas

    2010-05-01

    1. Resource selection functions (RSFs) are becoming a dominant tool in habitat selection studies. RSF coefficients can be estimated with unconditional (standard) and conditional logistic regressions. While the advantage of mixed-effects models is recognized for standard logistic regression, mixed conditional logistic regression remains largely overlooked in ecological studies. 2. We demonstrate the significance of mixed conditional logistic regression for habitat selection studies. First, we use spatially explicit models to illustrate how mixed-effects RSFs can be useful in the presence of inter-individual heterogeneity in selection and when the assumption of independence from irrelevant alternatives (IIA) is violated. The IIA hypothesis states that the strength of preference for habitat type A over habitat type B does not depend on the other habitat types also available. Secondly, we demonstrate the significance of mixed-effects models to evaluate habitat selection of free-ranging bison Bison bison. 3. When movement rules were homogeneous among individuals and the IIA assumption was respected, fixed-effects RSFs adequately described habitat selection by simulated animals. In situations violating the inter-individual homogeneity and IIA assumptions, however, RSFs were best estimated with mixed-effects regressions, and fixed-effects models could even provide faulty conclusions. 4. Mixed-effects models indicate that bison did not select farmlands, but exhibited strong inter-individual variations in their response to farmlands. Less than half of the bison preferred farmlands over forests. Conversely, the fixed-effect model simply suggested an overall selection for farmlands. 5. Conditional logistic regression is recognized as a powerful approach to evaluate habitat selection when resource availability changes. This regression is increasingly used in ecological studies, but almost exclusively in the context of fixed-effects models. Fitness maximization can imply differences in trade-offs among individuals, which can yield inter-individual differences in selection and lead to departure from IIA. These situations are best modelled with mixed-effects models. Mixed-effects conditional logistic regression should become a valuable tool for ecological research.

  8. The association between anthropometric measures and lung function in a population-based study of Canadian adults.

    PubMed

    Rowe, A; Hernandez, P; Kuhle, S; Kirkland, S

    2017-10-01

    Decreased lung function has health impacts beyond diagnosable lung disease. It is therefore important to understand the factors that may influence even small changes in lung function including obesity, physical fitness and physical activity. The aim of this study was to determine the anthropometric measure most useful in examining the association with lung function and to determine how physical activity and physical fitness influence this association. The current study used cross-sectional data on 4662 adults aged 40-79 years from the Canadian Health Measures Survey Cycles 1 and 2. Linear regression models were used to examine the association between the anthropometric and lung function measures (forced expiratory volume in 1 s [FEV1] and forced vital capacity [FVC]); R2 values were compared among models. Physical fitness and physical activity terms were added to the models and potential confounding was assessed. Models using sum of 5 skinfolds and waist circumference consistently had the highest R2 values for FEV1 and FVC, while models using body mass index consistently had among the lowest R2 values for FEV1 and FVC and for men and women. Physical activity and physical fitness were confounders of the relationships between waist circumference and the lung function measures. Waist circumference remained a significant predictor of FVC but not FEV1 after adjustment for physical activity or physical fitness. Waist circumference is an important predictor of lung function. Physical activity and physical fitness should be considered as potential confounders of the relationship between anthropometric measures and lung function. Copyright © 2017. Published by Elsevier Ltd.

  9. Videodensitometric Methods for Cardiac Output Measurements

    NASA Astrophysics Data System (ADS)

    Mischi, Massimo; Kalker, Ton; Korsten, Erik

    2003-12-01

    Cardiac output is often measured by indicator dilution techniques, usually based on dye or cold saline injections. Developments of more stable ultrasound contrast agents (UCA) are leading to new noninvasive indicator dilution methods. However, several problems concerning the interpretation of dilution curves as detected by ultrasound transducers have arisen. This paper presents a method for blood flow measurements based on UCA dilution. Dilution curves are determined by real-time densitometric analysis of the video output of an ultrasound scanner and are automatically fitted by the Local Density Random Walk model. A new fitting algorithm based on multiple linear regression is developed. Calibration, that is, the relation between videodensity and UCA concentration, is modelled by in vitro experimentation. The flow measurement system is validated by in vitro perfusion of SonoVue contrast agent. The results show an accurate dilution curve fit and flow estimation with determination coefficient larger than 0.95 and 0.99, respectively.

  10. Chain pooling to minimize prediction error in subset regression. [Monte Carlo studies using population models]

    NASA Technical Reports Server (NTRS)

    Holms, A. G.

    1974-01-01

    Monte Carlo studies using population models intended to represent response surface applications are reported. Simulated experiments were generated by adding pseudo random normally distributed errors to population values to generate observations. Model equations were fitted to the observations and the decision procedure was used to delete terms. Comparison of values predicted by the reduced models with the true population values enabled the identification of deletion strategies that are approximately optimal for minimizing prediction errors.

  11. Aggregating the response in time series regression models, applied to weather-related cardiovascular mortality

    NASA Astrophysics Data System (ADS)

    Masselot, Pierre; Chebana, Fateh; Bélanger, Diane; St-Hilaire, André; Abdous, Belkacem; Gosselin, Pierre; Ouarda, Taha B. M. J.

    2018-07-01

    In environmental epidemiology studies, health response data (e.g. hospitalization or mortality) are often noisy because of hospital organization and other social factors. The noise in the data can hide the true signal related to the exposure. The signal can be unveiled by performing a temporal aggregation on health data and then using it as the response in regression analysis. From aggregated series, a general methodology is introduced to account for the particularities of an aggregated response in a regression setting. This methodology can be used with usually applied regression models in weather-related health studies, such as generalized additive models (GAM) and distributed lag nonlinear models (DLNM). In particular, the residuals are modelled using an autoregressive-moving average (ARMA) model to account for the temporal dependence. The proposed methodology is illustrated by modelling the influence of temperature on cardiovascular mortality in Canada. A comparison with classical DLNMs is provided and several aggregation methods are compared. Results show that there is an increase in the fit quality when the response is aggregated, and that the estimated relationship focuses more on the outcome over several days than the classical DLNM. More precisely, among various investigated aggregation schemes, it was found that an aggregation with an asymmetric Epanechnikov kernel is more suited for studying the temperature-mortality relationship.
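
    The temporal aggregation of the response can be sketched as a weighted moving sum. The abstract does not give the exact form of the asymmetric Epanechnikov kernel, so the one-sided, forward-looking version below is an assumption, as are the function names and the daily counts.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u**2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def one_sided_weights(width):
    """Normalised kernel weights for days t, t+1, ..., t+width-1.

    Only the non-negative half of the kernel is used (an assumed form of
    asymmetry), so the current day gets the largest weight.
    """
    raw = [epanechnikov(k / width) for k in range(width)]
    total = sum(raw)
    return [w / total for w in raw]


def aggregate(series, width):
    """Kernel-weighted aggregation of each day with the following days."""
    weights = one_sided_weights(width)
    return [
        sum(w * series[t + k] for k, w in enumerate(weights))
        for t in range(len(series) - width + 1)
    ]


daily_deaths = [12, 9, 15, 11, 20, 18, 10, 13, 14, 9]   # hypothetical counts
smoothed = aggregate(daily_deaths, 4)
print([round(v, 2) for v in smoothed])
```

    The aggregated series is shorter than the input by width - 1 days and would be used as the response in the regression model in place of the raw daily counts.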

  12. Analyzing hospitalization data: potential limitations of Poisson regression.

    PubMed

    Weaver, Colin G; Ravani, Pietro; Oliver, Matthew J; Austin, Peter C; Quinn, Robert R

    2015-08-01

    Poisson regression is commonly used to analyze hospitalization data when outcomes are expressed as counts (e.g. number of days in hospital). However, data often violate the assumptions on which Poisson regression is based. More appropriate extensions of this model, while available, are rarely used. We compared hospitalization data between 206 patients treated with hemodialysis (HD) and 107 treated with peritoneal dialysis (PD) using Poisson regression and compared results from standard Poisson regression with those obtained using three other approaches for modeling count data: negative binomial (NB) regression, zero-inflated Poisson (ZIP) regression and zero-inflated negative binomial (ZINB) regression. We examined the appropriateness of each model and compared the results obtained with each approach. During a mean 1.9 years of follow-up, 183 of 313 patients (58%) were never hospitalized (indicating an excess of 'zeros'). The data also displayed overdispersion (variance greater than mean), violating another assumption of the Poisson model. Using four criteria, we determined that the NB and ZINB models performed best. According to these two models, patients treated with HD experienced similar hospitalization rates as those receiving PD {NB rate ratio (RR): 1.04 [bootstrapped 95% confidence interval (CI): 0.49-2.20]; ZINB summary RR: 1.21 (bootstrapped 95% CI 0.60-2.46)}. Poisson and ZIP models fit the data poorly and had much larger point estimates than the NB and ZINB models [Poisson RR: 1.93 (bootstrapped 95% CI 0.88-4.23); ZIP summary RR: 1.84 (bootstrapped 95% CI 0.88-3.84)]. We found substantially different results when modeling hospitalization data, depending on the approach used. Our results argue strongly for a sound model selection process and improved reporting around statistical methods used for modeling count data. © The Author 2015. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
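
    The two assumption violations named in this abstract, overdispersion and an excess of zeros, can be screened for with simple descriptive checks before any model is fitted. The sketch below uses hypothetical hospital-day counts and a hypothetical helper name. It is a quick diagnostic, not a replacement for the formal model comparison the authors performed.

```python
import math
from statistics import mean, variance


def poisson_checks(counts):
    """Compare sample moments and zero frequency with Poisson expectations."""
    m = mean(counts)
    v = variance(counts)                 # sample variance
    return {
        "mean": m,
        "variance": v,
        "dispersion_ratio": v / m,       # about 1 under a Poisson model
        "observed_zero": counts.count(0) / len(counts),
        "expected_zero": math.exp(-m),   # P(Y = 0) for Poisson with this mean
    }


# Hypothetical days in hospital for ten patients.
days = [0, 0, 0, 0, 0, 0, 1, 2, 14, 30]
checks = poisson_checks(days)
for name, value in checks.items():
    print(f"{name}: {value:.3f}")
```

    A dispersion ratio well above 1 points towards negative binomial regression, and an observed share of zeros far above the Poisson expectation points towards the zero-inflated variants, which matches the model families compared in the abstract.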

  13. Calibration and Data Analysis of the MC-130 Air Balance

    NASA Technical Reports Server (NTRS)

    Booth, Dennis; Ulbrich, N.

    2012-01-01

    Design, calibration, calibration analysis, and intended use of the MC-130 air balance are discussed. The MC-130 balance is an 8.0 inch diameter force balance that has two separate internal air flow systems and one external bellows system. The manual calibration of the balance consisted of a total of 1854 data points with both unpressurized and pressurized air flowing through the balance. A subset of 1160 data points was chosen for the calibration data analysis. The regression analysis of the subset was performed using two fundamentally different analysis approaches. First, the data analysis was performed using a recently developed extension of the Iterative Method. This approach fits gage outputs as a function of both applied balance loads and bellows pressures while still allowing the application of the iteration scheme that is used with the Iterative Method. Then, for comparison, the axial force was also analyzed using the Non-Iterative Method. This alternate approach directly fits loads as a function of measured gage outputs and bellows pressures and does not require a load iteration. The regression models used by both the extended Iterative and Non-Iterative Method were constructed such that they met a set of widely accepted statistical quality requirements. These requirements lead to reliable regression models and prevent overfitting of data because they ensure that no hidden near-linear dependencies between regression model terms exist and that only statistically significant terms are included. Finally, a comparison of the axial force residuals was performed. Overall, axial force estimates obtained from both methods show excellent agreement as the differences of the standard deviation of the axial force residuals are on the order of 0.001 % of the axial force capacity.

  14. Models for forecasting hospital bed requirements in the acute sector.

    PubMed Central

    Farmer, R D; Emami, J

    1990-01-01

    STUDY OBJECTIVE--The aim was to evaluate the current approach to forecasting hospital bed requirements. DESIGN--The study was a time series and regression analysis. The time series for mean duration of stay for general surgery in the age group 15-44 years (1969-1982) was used in the evaluation of different methods of forecasting future values of mean duration of stay and its subsequent use in the formation of hospital bed requirements. RESULTS--It has been suggested that the simple trend fitting approach suffers from model specification error and imposes unjustified restrictions on the data. Time series approach (Box-Jenkins method) was shown to be a more appropriate way of modelling the data. CONCLUSION--The simple trend fitting approach is inferior to the time series approach in modelling hospital bed requirements. PMID:2277253

  15. seawaveQ: an R package providing a model and utilities for analyzing trends in chemical concentrations in streams with a seasonal wave (seawave) and adjustment for streamflow (Q) and other ancillary variables

    USGS Publications Warehouse

    Ryberg, Karen R.; Vecchia, Aldo V.

    2013-01-01

    The seawaveQ R package fits a parametric regression model (seawaveQ) to pesticide concentration data from streamwater samples to assess variability and trends. The model incorporates the strong seasonality and high degree of censoring common in pesticide data and users can incorporate numerous ancillary variables, such as streamflow anomalies. The model is fitted to pesticide data using maximum likelihood methods for censored data and is robust in terms of pesticide, stream location, and degree of censoring of the concentration data. This R package standardizes this methodology for trend analysis, documents the code, and provides help and tutorial information, as well as providing additional utility functions for plotting pesticide and other chemical concentration data.

  16. Combining multiple regression and principal component analysis for accurate predictions for column ozone in Peninsular Malaysia

    NASA Astrophysics Data System (ADS)

    Rajab, Jasim M.; MatJafri, M. Z.; Lim, H. S.

    2013-06-01

    This study models columnar ozone over Peninsular Malaysia. A data set of eight atmospheric parameters [air surface temperature (AST), carbon monoxide (CO), methane (CH4), water vapour (H2Ovapour), skin surface temperature (SSKT), atmosphere temperature (AT), relative humidity (RH), and mean surface pressure (MSP)], retrieved from NASA's Atmospheric Infrared Sounder (AIRS) for the entire period 2003-2008, was used to develop models that predict the value of columnar ozone (O3) in the study area. A combined method, based on multiple regression together with principal component analysis (PCA) modelling, was used to improve the prediction accuracy of columnar ozone. Separate analyses were carried out for the north east monsoon (NEM) and south west monsoon (SWM) seasons. O3 was negatively correlated with CH4, H2Ovapour, RH, and MSP, whereas it was positively correlated with CO, AST, SSKT, and AT during both the NEM and SWM seasons. Multiple regression analysis was used to fit the columnar ozone data using the atmospheric parameters as predictors. A variable selection method based on high loadings of varimax-rotated principal components was used to select the subsets of predictor variables included in the linear regression model. An increase in columnar O3 was associated with increases in AST, SSKT, AT, and CO and with decreases in CH4, H2Ovapour, RH, and MSP. Fitting the best models for columnar O3 using eight of the independent variables gave about the same values of R (≈0.93) and R2 (≈0.86) for both the NEM and SWM seasons. The common variables that appeared in both regression equations were SSKT, CH4 and RH, and the principal precursor of the columnar O3 value in both the NEM and SWM seasons was SSKT.

  17. Estimating peak-flow frequency statistics for selected gaged and ungaged sites in naturally flowing streams and rivers in Idaho

    USGS Publications Warehouse

    Wood, Molly S.; Fosness, Ryan L.; Skinner, Kenneth D.; Veilleux, Andrea G.

    2016-06-27

    The U.S. Geological Survey, in cooperation with the Idaho Transportation Department, updated regional regression equations to estimate peak-flow statistics at ungaged sites on Idaho streams using recent streamflow (flow) data and new statistical techniques. Peak-flow statistics with 80-, 67-, 50-, 43-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities (1.25-, 1.50-, 2.00-, 2.33-, 5.00-, 10.0-, 25.0-, 50.0-, 100-, 200-, and 500-year recurrence intervals, respectively) were estimated for 192 streamgages in Idaho and bordering States with at least 10 years of annual peak-flow record through water year 2013. The streamgages were selected from drainage basins with little or no flow diversion or regulation. The peak-flow statistics were estimated by fitting a log-Pearson type III distribution to records of annual peak flows and applying two additional statistical methods: (1) the Expected Moments Algorithm to help describe uncertainty in annual peak flows and to better represent missing and historical record; and (2) the generalized Multiple Grubbs Beck Test to screen out potentially influential low outliers and to better fit the upper end of the peak-flow distribution. Additionally, a new regional skew was estimated for the Pacific Northwest and used to weight at-station skew at most streamgages. The streamgages were grouped into six regions (numbered 1_2, 3, 4, 5, 6_8, and 7, to maintain consistency in region numbering with a previous study), and the estimated peak-flow statistics were related to basin and climatic characteristics to develop regional regression equations using a generalized least squares procedure. Four out of 24 evaluated basin and climatic characteristics were selected for use in the final regional peak-flow regression equations. Overall, the standard error of prediction for the regional peak-flow regression equations ranged from 22 to 132 percent.
Among all regions, regression model fit was best for region 4 in west-central Idaho (average standard error of prediction=46.4 percent; pseudo-R2>92 percent) and region 5 in central Idaho (average standard error of prediction=30.3 percent; pseudo-R2>95 percent). Regression model fit was poor for region 7 in southern Idaho (average standard error of prediction=103 percent; pseudo-R2<78 percent) compared to other regions because few streamgages in region 7 met the criteria for inclusion in the study, and the region’s semi-arid climate and associated variability in precipitation patterns causes substantial variability in peak flows. A drainage area ratio-adjustment method, using ratio exponents estimated using generalized least-squares regression, was presented as an alternative to the regional regression equations if peak-flow estimates are desired at an ungaged site that is close to a streamgage selected for inclusion in this study. The alternative drainage area ratio-adjustment method is appropriate for use when the drainage area ratio between the ungaged and gaged sites is between 0.5 and 1.5. The updated regional peak-flow regression equations had lower total error (standard error of prediction) than all regression equations presented in a 1982 study and in four of six regions presented in 2002 and 2003 studies in Idaho. A more extensive streamgage screening process used in the current study resulted in fewer streamgages used in the current study than in the 1982, 2002, and 2003 studies. Fewer streamgages used and the selection of different explanatory variables were likely causes of increased error in some regions compared to previous studies, but overall, regional peak-flow regression model fit was generally improved for Idaho.
The revised statistical procedures and increased streamgage screening applied in the current study most likely resulted in a more accurate representation of natural peak-flow conditions. The updated, regional peak-flow regression equations will be integrated in the U.S. Geological Survey StreamStats program to allow users to estimate basin and climatic characteristics and peak-flow statistics at ungaged locations of interest. StreamStats estimates peak-flow statistics with quantifiable certainty only when used at sites with basin and climatic characteristics within the range of input variables used to develop the regional regression equations. Both the regional regression equations and StreamStats should be used to estimate peak-flow statistics only in naturally flowing, relatively unregulated streams without substantial local influences to flow, such as large seeps, springs, or other groundwater-surface water interactions that are not widespread or characteristic of the respective region.
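
    The pairing of annual exceedance probabilities with recurrence intervals in the record above, and the drainage area ratio adjustment, can be illustrated with a minimal Python sketch. It assumes the usual reciprocal relation (recurrence interval = 100 / AEP in percent) and the common power-law form of the ratio adjustment; the function names, the gaged flow, the drainage areas, and the exponent are hypothetical and are not taken from the report. Because the listed probabilities are rounded, 67 percent gives about 1.49 years rather than the quoted 1.50.

```python
def recurrence_interval_years(aep_percent):
    """Recurrence interval in years for an annual exceedance probability (AEP) in percent."""
    if not 0.0 < aep_percent <= 100.0:
        raise ValueError("AEP must be greater than 0 and at most 100 percent")
    return 100.0 / aep_percent


def drainage_area_ratio_estimate(q_gaged, area_ungaged, area_gaged, exponent):
    """Scale a gaged peak-flow statistic to a nearby ungaged site.

    Assumed form: Q_ungaged = Q_gaged * (A_ungaged / A_gaged) ** exponent.
    The 0.5 to 1.5 ratio limit is the applicability range stated in the abstract.
    """
    ratio = area_ungaged / area_gaged
    if not 0.5 <= ratio <= 1.5:
        raise ValueError("drainage area ratio outside the 0.5 to 1.5 range")
    return q_gaged * ratio ** exponent


if __name__ == "__main__":
    # AEP values listed in the abstract, in percent.
    for aep in [80, 67, 50, 43, 20, 10, 4, 2, 1, 0.5, 0.2]:
        print(f"{aep}% AEP -> {recurrence_interval_years(aep):.2f} years")
    # Hypothetical gaged flow, drainage areas, and exponent, for illustration only.
    print(drainage_area_ratio_estimate(1000.0, 120.0, 100.0, 0.8))
```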

  18. NAT2, meat consumption and colorectal cancer incidence: an ecological study among 27 countries.

    PubMed

    Ognjanovic, Simona; Yamamoto, Jennifer; Maskarinec, Gertraud; Le Marchand, Loïc

    2006-11-01

    The polymorphic gene NAT2 is a major determinant of N-acetyltransferase activity and, thus, may be responsible for differences in one's ability to bioactivate heterocyclic amines, a class of procarcinogens in cooked meat. An unusually marked geographic variation in enzyme activity has been described for NAT2. The present study re-examines the international direct correlation reported for meat intake and colorectal cancer (CRC) incidence, and evaluates the potential modifying effects of NAT2 phenotype and other lifestyle factors on this correlation. Country-specific CRC incidence data, per capita consumption data for meat and other dietary factors, prevalence of the rapid/intermediate NAT2 phenotype, and prevalence of smoking for 27 countries were used. Multiple linear regression models were fit and partial correlation coefficients (PCCs) were computed for men and women separately. Inclusion of the rapid/intermediate NAT2 phenotype with meat consumption improved the fit of the regression model for CRC incidence in both sexes (males: R² = 0.78, compared to 0.70 for meat alone; p for difference in model fit = 0.009; females: R² = 0.76, compared to 0.69 for meat alone; p = 0.02). Vegetable consumption (inversely and in both sexes) and fish consumption (directly and in men only) were also weakly correlated with CRC, whereas smoking prevalence and alcohol consumption had no effects on the models. The PCC between NAT2 and CRC incidence was 0.46 in males and 0.48 in females when meat consumption was included in the model, compared to 0.14 and 0.15, respectively, when it was not. These data suggest that, in combination with meat intake, some proportion of the international variability in CRC incidence may be attributable to genetic susceptibility to heterocyclic amines, as determined by NAT2 genotype.
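
    The jump in the NAT2 correlation once meat consumption enters the model is the kind of behaviour a first-order partial correlation can show. The sketch below uses the standard formula for the correlation of x and y after controlling for a single variable z. The pairwise correlations fed to it are hypothetical values chosen only to show the effect; they are not the study's data, and the study's models may have included further covariates.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denominator


if __name__ == "__main__":
    # Hypothetical pairwise correlations (x: phenotype prevalence, y: incidence, z: meat intake).
    r_xy, r_xz, r_yz = 0.14, -0.30, 0.70
    print(f"zero-order r = {r_xy:.2f}")
    print(f"partial r given z = {partial_correlation(r_xy, r_xz, r_yz):.2f}")
```

    With these illustrative inputs a weak zero-order correlation becomes a moderate partial correlation (about 0.51), because z is related to x and y in opposite directions.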

  19. Identifying model error in metabolic flux analysis - a generalized least squares approach.

    PubMed

    Sokolenko, Stanislav; Quattrociocchi, Marco; Aucoin, Marc G

    2016-09-13

    The estimation of intracellular flux through traditional metabolic flux analysis (MFA) using an overdetermined system of equations is a well established practice in metabolic engineering. Despite the continued evolution of the methodology since its introduction, there has been little focus on validation and identification of poor model fit outside of identifying "gross measurement error". The growing complexity of metabolic models, which are increasingly generated from genome-level data, has necessitated robust validation that can directly assess model fit. In this work, MFA calculation is framed as a generalized least squares (GLS) problem, highlighting the applicability of the common t-test for model validation. To differentiate between measurement and model error, we simulate ideal flux profiles directly from the model, perturb them with estimated measurement error, and compare their validation to real data. Application of this strategy to an established Chinese Hamster Ovary (CHO) cell model shows how fluxes validated by traditional means may be largely non-significant due to a lack of model fit. With further simulation, we explore how t-test significance relates to calculation error and show that fluxes found to be non-significant have 2-4 fold larger error (if measurement uncertainty is in the 5-10 % range). The proposed validation method goes beyond traditional detection of "gross measurement error" to identify lack of fit between model and data. Although the focus of this work is on t-test validation and traditional MFA, the presented framework is readily applicable to other regression analysis methods and MFA formulations.

  20. A menu-driven software package of Bayesian nonparametric (and parametric) mixed models for regression analysis and density estimation.

    PubMed

    Karabatsos, George

    2017-02-01

    Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. 
This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.

  1. Product unit neural network models for predicting the growth limits of Listeria monocytogenes.

    PubMed

    Valero, A; Hervás, C; García-Gimeno, R M; Zurera, G

    2007-08-01

    A new approach to predict the growth/no growth interface of Listeria monocytogenes as a function of storage temperature, pH, citric acid (CA) and ascorbic acid (AA) is presented. A linear logistic regression procedure was performed and a non-linear model was obtained by adding new variables by means of a Neural Network model based on Product Units (PUNN). The classification efficiency of the training data set and the generalization data of the new Logistic Regression PUNN model (LRPU) were compared with Linear Logistic Regression (LLR) and Polynomial Logistic Regression (PLR) models. 92% of the total cases from the LRPU model were correctly classified, an improvement on the percentage obtained using the PLR model (90%) and significantly higher than the results obtained with the LLR model (80%). Moreover, the LRPU predictions were closer to the observed data, which permits the design of proper formulations for minimally processed foods. This novel methodology can be applied to predictive microbiology for describing the growth/no growth interface of food-borne microorganisms such as L. monocytogenes. The optimal balance lies in finding models with an acceptable interpretation capacity and a good ability to fit the data at the boundaries of the variable range. The results suggest that these kinds of models might well be a very valuable tool for mathematical modeling.

  2. [A competency model of rural general practitioners: theory construction and empirical study].

    PubMed

    Yang, Xiu-Mu; Qi, Yu-Long; Shne, Zheng-Fu; Han, Bu-Xin; Meng, Bei

    2015-04-01

    To perform theory construction and empirical study of the competency model of rural general practitioners. Through literature study, job analysis, interviews, and expert team discussion, the questionnaire of rural general practitioners competency was constructed. A total of 1458 rural general practitioners were surveyed by the questionnaire in 6 central provinces. The common factors were constructed using the principal component method of exploratory factor analysis and confirmatory factor analysis. The influence of the competency characteristics on the working performance was analyzed using regression equation analysis. The Cronbach's alpha coefficient of the questionnaire was 0.974. The model consisted of 9 dimensions and 59 items. The 9 competency dimensions included basic public health service ability, basic clinical skills, system analysis capability, information management capability, communication and cooperation ability, occupational moral ability, non-medical professional knowledge, personal traits and psychological adaptability. The rate of explained cumulative total variance was 76.855%. The model fit indices were χ²/df = 1.88, GFI = 0.94, NFI = 0.96, NNFI = 0.98, PNFI = 0.91, RMSEA = 0.068, CFI = 0.97, IFI = 0.97, RFI = 0.96, suggesting good model fit. Regression analysis showed that the competency characteristics had a significant effect on job performance. The rural general practitioners competency model provides reference for rural doctor training, rural order directional cultivation of medical students, and competency performance management of the rural general practitioners.

  3. Health Risk Behaviors in a Representative Sample of Bisexual and Heterosexual Female High School Students in Massachusetts

    ERIC Educational Resources Information Center

    White Hughto, Jaclyn M.; Biello, Katie B.; Reisner, Sari L.; Perez-Brumer, Amaya; Heflin, Katherine J.; Mimiaga, Matthew J.

    2016-01-01

    Background: Differences in sexual health-related outcomes by sexual behavior and identity remain underinvestigated among bisexual female adolescents. Methods: Data from girls (N = 875) who participated in the Massachusetts Youth Risk Behavior Surveillance survey were analyzed. Weighted logistic regression models were fit to examine sexual and…

  4. Psychological Resources as Stress Buffers: Their Relationship to University Students' Anxiety and Depression

    ERIC Educational Resources Information Center

    McCarthy, Christopher J.; Fouladi, Rachel T.; Juncker, Brian D.; Matheny, Kenneth B.

    2006-01-01

    The association of protective resources, personality variables, life events, and gender with anxiety and depression was examined with university students. Building on regression analyses, a structural equation model was generated with good fit, indicating that with respect to both anxiety and depression, negative life events and coping resources…

  5. Accumulation of nucleopolyhedrosis virus of the European pine sawfly (Hymenoptera: Diprionidae) as a function of larval weight

    Treesearch

    M.A. Mohamed; H.C. Coppel; J.D. Podgwaite; W.D. Rollinson

    1983-01-01

    Disease-free larvae of Neodiprion sertifer (Geoffroy) treated with its nucleopolyhedrosis virus in the field and under laboratory conditions showed a high correlation between virus accumulation and body weight. Simple linear regression models were found to fit viral accumulation versus body weight under either circumstance.

  6. Analysis of Radiation Exposure for Troop Observers, Exercise Desert Rock V, Operation Upshot-Knothole.

    DTIC Science & Technology

    1981-04-28

    on initial doses. Residual doses are determined through an automated procedure that utilizes raw data in regression analyses to fit space-time models...show their relationship to the observer positions. The computer-calculated doses do not reflect the presence of the human body in the radiological

  7. Equilibrium, kinetics and process design of acid yellow 132 adsorption onto red pine sawdust.

    PubMed

    Can, Mustafa

    2015-01-01

    Linear and non-linear regression procedures have been applied to the Langmuir, Freundlich, Tempkin, Dubinin-Radushkevich, and Redlich-Peterson isotherms for adsorption of acid yellow 132 (AY132) dye onto red pine (Pinus resinosa) sawdust. The effects of parameters such as particle size, stirring rate, contact time, dye concentration, adsorption dose, pH, and temperature were investigated, and interaction was characterized by Fourier transform infrared spectroscopy and field emission scanning electron microscope. The non-linear method of the Langmuir isotherm equation was found to be the best fitting model to the equilibrium data. The maximum monolayer adsorption capacity was found as 79.5 mg/g. The calculated thermodynamic results suggested that AY132 adsorption onto red pine sawdust was an exothermic, physisorption, and spontaneous process. Kinetics was analyzed by four different kinetic equations using non-linear regression analysis. The pseudo-second-order equation provides the best fit with experimental data.
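
    The record above contrasts linear and non-linear fitting of isotherms. The minimal sketch below shows the Langmuir model, q_e = q_max K_L C_e / (1 + K_L C_e), and one common linearized fit that regresses C_e/q_e on C_e. The q_max of 79.5 mg/g comes from the abstract; the K_L value, the concentrations, and the function names are assumptions, and the data are synthetic and noise-free, so this is not the paper's procedure or result.

```python
def langmuir(c_e, q_max, k_l):
    """Langmuir isotherm: equilibrium uptake q_e (mg/g) at concentration c_e (mg/L)."""
    return q_max * k_l * c_e / (1.0 + k_l * c_e)


def fit_langmuir_linearized(c_e, q_e):
    """Estimate (q_max, k_l) from the linearized form C_e/q_e = C_e/q_max + 1/(k_l * q_max)."""
    xs = list(c_e)
    ys = [c / q for c, q in zip(c_e, q_e)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least squares slope and intercept of ys against xs.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return 1.0 / slope, slope / intercept


if __name__ == "__main__":
    # Synthetic, noise-free data; k_l = 0.05 L/mg is a hypothetical value.
    concentrations = [5.0, 10.0, 25.0, 50.0, 100.0, 200.0]
    uptakes = [langmuir(c, 79.5, 0.05) for c in concentrations]
    print(fit_langmuir_linearized(concentrations, uptakes))
```

    On noisy data the linearized and non-linear fits generally give different parameter estimates, which is one reason studies such as this one compare the two.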

  8. A framework for longitudinal data analysis via shape regression

    NASA Astrophysics Data System (ADS)

    Fishbaugh, James; Durrleman, Stanley; Piven, Joseph; Gerig, Guido

    2012-02-01

    Traditional longitudinal analysis begins by extracting desired clinical measurements, such as volume or head circumference, from discrete imaging data. Typically, the continuous evolution of a scalar measurement is estimated by choosing a 1D regression model, such as kernel regression or fitting a polynomial of fixed degree. This type of analysis not only leads to separate models for each measurement, but there is no clear anatomical or biological interpretation to aid in the selection of the appropriate paradigm. In this paper, we propose a consistent framework for the analysis of longitudinal data by estimating the continuous evolution of shape over time as twice differentiable flows of deformations. In contrast to 1D regression models, one model is chosen to realistically capture the growth of anatomical structures. From the continuous evolution of shape, we can simply extract any clinical measurements of interest. We demonstrate on real anatomical surfaces that volume extracted from a continuous shape evolution is consistent with a 1D regression performed on the discrete measurements. We further show how the visualization of shape progression can aid in the search for significant measurements. Finally, we present an example on a shape complex of the brain (left hemisphere, right hemisphere, cerebellum) that demonstrates a potential clinical application for our framework.

  9. The six-minute walk test predicts cardiorespiratory fitness in individuals with aneurysmal subarachnoid hemorrhage.

    PubMed

    Harmsen, Wouter J; Ribbers, Gerard M; Slaman, Jorrit; Heijenbrok-Kal, Majanka H; Khajeh, Ladbon; van Kooten, Fop; Neggers, Sebastiaan J C M M; van den Berg-Emons, Rita J

    2017-05-01

    Peak oxygen uptake (VO2peak) established during progressive cardiopulmonary exercise testing (CPET) is the "gold-standard" for cardiorespiratory fitness. However, CPET measurements may be limited in patients with aneurysmal subarachnoid hemorrhage (a-SAH) by disease-related complaints, such as cardiovascular health-risks or anxiety. Furthermore, CPET with gas-exchange analyses require specialized knowledge and infrastructure with limited availability in most rehabilitation facilities. To determine whether an easy-to-administer six-minute walk test (6MWT) is a valid clinical alternative to progressive CPET in order to predict VO2peak in individuals with a-SAH. Twenty-seven patients performed the 6MWT and CPET with gas-exchange analyses on a cycle ergometer. Univariate and multivariate regression models were made to investigate the predictability of VO2peak from the six-minute walk distance (6MWD). Univariate regression showed that the 6MWD was strongly related to VO2peak (r = 0.75, p < 0.001), with an explained variance of 56% and a prediction error of 4.12 ml/kg/min, representing 18% of mean VO2peak. Adding age and sex to an extended multivariate regression model improved this relationship (r = 0.82, p < 0.001), with an explained variance of 67% and a prediction error of 3.67 ml/kg/min corresponding to 16% of mean VO2peak. The 6MWT is an easy-to-administer submaximal exercise test that can be selected to estimate cardiorespiratory fitness at an aggregated level, in groups of patients with a-SAH, which may help to evaluate interventions in a clinical or research setting. However, the relatively large prediction error does not allow for an accurate prediction in individual patients.
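
    The explained-variance figures in the record above follow from squaring the reported correlation coefficients, and the two prediction errors imply roughly the same mean VO2peak. A short arithmetic check, assuming the reported percentages are simply r squared and error divided by mean (function names are illustrative):

```python
def explained_variance_percent(r):
    """Percentage of variance explained by a model with correlation r."""
    return 100.0 * r * r


def implied_mean(prediction_error, error_as_percent_of_mean):
    """Mean value implied by an error that is a given percentage of the mean."""
    return prediction_error / (error_as_percent_of_mean / 100.0)


if __name__ == "__main__":
    for label, r, err, pct in [
        ("univariate", 0.75, 4.12, 18.0),
        ("multivariate", 0.82, 3.67, 16.0),
    ]:
        print(
            f"{label}: r^2 = {explained_variance_percent(r):.1f}%, "
            f"implied mean VO2peak = {implied_mean(err, pct):.1f} ml/kg/min"
        )
```

    Both rows imply a mean VO2peak of about 23 ml/kg/min, so the reported figures are mutually consistent.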

  10. Aerobic Fitness Does Not Contribute to Prediction of Orthostatic Intolerance

    NASA Technical Reports Server (NTRS)

    Convertino, Victor A.; Sather, Tom M.; Goldwater, Danielle J.; Alford, William R.

    1986-01-01

    Several investigations have suggested that orthostatic tolerance may be inversely related to aerobic fitness (VO(sub 2max)). To test this hypothesis, 18 males (age 29 to 51 yr) underwent both treadmill VO(sub 2max) determination and graded lower body negative pressure (LBNP) exposure to tolerance. VO(sub 2max) was measured during the last minute of a Bruce treadmill protocol. LBNP was terminated based on pre-syncopal symptoms and LBNP tolerance (peak LBNP) was expressed as the cumulative product of LBNP and time (torr-min). Changes in heart rate, stroke volume, cardiac output, blood pressure and impedance rheographic indices of mid-thigh-leg initial accumulation were measured at rest and during the final minute of LBNP. For all 18 subjects, mean (plus or minus SE) fluid accumulation index and leg venous compliance index at peak LBNP were 139 plus or minus 3.9 plus or minus 0.4 ml-torr-min(exp -2) x 10(exp 3), respectively. Pearson product-moment correlations and step-wise linear regression were used to investigate relationships with peak LBNP. Variables associated with endurance training, such as VO(sub 2max) and percent body fat, were not found to correlate significantly (P is less than 0.05) with peak LBNP and did not add sufficiently to the prediction of peak LBNP to be included in the step-wise regression model. The step-wise regression model included only fluid accumulation index, leg venous compliance index, and blood volume and resulted in a squared multiple correlation coefficient of 0.978. These data do not support the hypothesis that orthostatic tolerance as measured by LBNP is lower in individuals with high aerobic fitness.

  11. Adherence to physical activity in an unsupervised setting: Explanatory variables for high attrition rates among fitness center members.

    PubMed

    Sperandei, Sandro; Vieira, Marcelo C; Reis, Arianne C

    2016-11-01

    To evaluate the attrition rate of members of a fitness center in the city of Rio de Janeiro and the potential explanatory variables for the phenomenon. An exploratory, observational study using a retrospective longitudinal frame. The records of 5240 individuals, members of the fitness center between January-2005 and June-2014, were monitored for 12 months or until cancellation of membership, whichever occurred first. A Cox proportional hazard regression model was fitted to identify variables associated with a higher risk of 'abandonment' of activities. This study was approved by Southern Cross University's Human Research Ethics Committee (approval number: ECN-15-176). The general survival curve shows that 63% of new members will abandon activities before the third month, and less than 4% will remain for more than 12 months of continuous activity. The regression model showed that age, previous level of physical activity, initial body mass index and motivations related to weight loss, hypertrophy, health, and aesthetics are related to risk of abandonment. Combined, those variables represent an important difference in the probability of abandoning the gym between individuals with the best and worst combinations of variables. Even individuals presenting the best combination of variables still present a high risk of abandonment before completion of 12 months of fitness center membership. Findings can assist in the identification of high risk individuals and therefore help in the development of strategies to prevent abandonment of physical activity practice. Copyright © 2016 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.

  12. Objectively measured sedentary time and academic achievement in schoolchildren.

    PubMed

    Lopes, Luís; Santos, Rute; Mota, Jorge; Pereira, Beatriz; Lopes, Vítor

    2017-03-01

    This study aimed to evaluate the relationship between objectively measured total sedentary time and academic achievement (AA) in Portuguese children. The sample comprised of 213 children (51.6% girls) aged 9.46 ± 0.43 years, from the north of Portugal. Sedentary time was measured with accelerometry, and AA was assessed using the Portuguese Language and Mathematics National Exams results. Multilevel linear regression models were fitted to assess regression coefficients predicting AA. The results showed that objectively measured total sedentary time was not associated with AA, after adjusting for potential confounders.

  13. Data Analysis & Statistical Methods for Command File Errors

    NASA Technical Reports Server (NTRS)

    Meshkat, Leila; Waggoner, Bruce; Bryant, Larry

    2014-01-01

    This paper explains current work on modeling for managing the risk of command file errors. It is focused on analyzing actual data from a JPL spaceflight mission to build models for evaluating and predicting error rates as a function of several key variables. We constructed a rich dataset by considering the number of errors, the number of files radiated, including the number of commands and blocks in each file, as well as subjective estimates of workload and operational novelty. We have assessed these data using different curve fitting and distribution fitting techniques, such as multiple regression analysis and maximum likelihood estimation, to see how much of the variability in the error rates can be explained with these. We have also used goodness of fit testing strategies and principal component analysis to further assess our data. Finally, we constructed a model of expected error rates based on what these statistics bore out as critical drivers of the error rate. This model allows project management to evaluate the error rate against a theoretically expected rate as well as anticipate future error rates.

  14. Separation mechanism of nortriptyline and amytriptyline in RPLC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gritti, Fabrice; Guiochon, Georges A

    2005-08-01

    The single and the competitive equilibrium isotherms of nortriptyline and amytriptyline were acquired by frontal analysis (FA) on the C18-bonded discovery column, using a 28/72 (v/v) mixture of acetonitrile and water buffered with phosphate (20 mM, pH 2.70). The adsorption energy distributions (AED) of each compound were calculated from the raw adsorption data. Both the fitting of the adsorption data using multi-linear regression analysis and the AEDs are consistent with a trimodal isotherm model. The single-component isotherm data fit well to the tri-Langmuir isotherm model. The extension to a competitive two-component tri-Langmuir isotherm model based on the best parameters of the single-component isotherms does not account well for the breakthrough curves nor for the overloaded band profiles measured for mixtures of nortriptyline and amytriptyline. However, it was possible to derive adjusted parameters of a competitive tri-Langmuir model based on the fitting of the adsorption data obtained for these mixtures. A very good agreement was then found between the calculated and the experimental overloaded band profiles of all the mixtures injected.

  15. Comparison of hypertabastic survival model with other unimodal hazard rate functions using a goodness-of-fit test.

    PubMed

    Tahir, M Ramzan; Tran, Quang X; Nikulin, Mikhail S

    2017-05-30

    We studied the problem of testing a hypothesized distribution in survival regression models when the data is right censored and survival times are influenced by covariates. A modified chi-squared type test, known as Nikulin-Rao-Robson statistic, is applied for the comparison of accelerated failure time models. This statistic is used to test the goodness-of-fit for hypertabastic survival model and four other unimodal hazard rate functions. The results of simulation study showed that the hypertabastic distribution can be used as an alternative to log-logistic and log-normal distribution. In statistical modeling, because of its flexible shape of hazard functions, this distribution can also be used as a competitor of Birnbaum-Saunders and inverse Gaussian distributions. The results for the real data application are shown. Copyright © 2017 John Wiley & Sons, Ltd.

  16. Changes in relative fit of human heat stress indices to cardiovascular, respiratory, and renal hospitalizations across five Australian urban populations

    NASA Astrophysics Data System (ADS)

    Goldie, James; Alexander, Lisa; Lewis, Sophie C.; Sherwood, Steven C.; Bambrick, Hilary

    2018-03-01

    Various human heat stress indices have been developed to relate atmospheric measures of extreme heat to human health impacts, but the usefulness of different indices across various health impacts and in different populations is poorly understood. This paper determines which heat stress indices best fit hospital admissions for sets of cardiovascular, respiratory, and renal diseases across five Australian cities. We hypothesized that the best indices would be largely dependent on location. We fit parent models to these counts in the summers (November-March) between 2001 and 2013 using negative binomial regression. We then added 15 heat stress indices to these models, ranking their goodness of fit using the Akaike information criterion. Admissions for each health outcome were nearly always higher in hot or humid conditions. Contrary to our hypothesis that location would determine the best-fitting heat stress index, we found that the best indices were related largely by health outcome of interest, rather than location as hypothesized. In particular, heatwave and temperature indices had the best fit to cardiovascular admissions, humidity indices had the best fit to respiratory admissions, and combined heat-humidity indices had the best fit to renal admissions. With a few exceptions, the results were similar across all five cities. The best-fitting heat stress indices appear to be useful across several Australian cities with differing climates, but they may have varying usefulness depending on the outcome of interest. These findings suggest that future research on heat and health impacts, and in particular hospital demand modeling, could better reflect reality if it avoided "all-cause" health outcomes and used heat stress indices appropriate to specific diseases and disease groups.

  17. Changes in relative fit of human heat stress indices to cardiovascular, respiratory, and renal hospitalizations across five Australian urban populations.

    PubMed

    Goldie, James; Alexander, Lisa; Lewis, Sophie C; Sherwood, Steven C; Bambrick, Hilary

    2018-03-01

    Various human heat stress indices have been developed to relate atmospheric measures of extreme heat to human health impacts, but the usefulness of different indices across various health impacts and in different populations is poorly understood. This paper determines which heat stress indices best fit hospital admissions for sets of cardiovascular, respiratory, and renal diseases across five Australian cities. We hypothesized that the best indices would be largely dependent on location. We fit parent models to these counts in the summers (November-March) between 2001 and 2013 using negative binomial regression. We then added 15 heat stress indices to these models, ranking their goodness of fit using the Akaike information criterion. Admissions for each health outcome were nearly always higher in hot or humid conditions. Contrary to our hypothesis that location would determine the best-fitting heat stress index, we found that the best indices were related largely by health outcome of interest, rather than location as hypothesized. In particular, heatwave and temperature indices had the best fit to cardiovascular admissions, humidity indices had the best fit to respiratory admissions, and combined heat-humidity indices had the best fit to renal admissions. With a few exceptions, the results were similar across all five cities. The best-fitting heat stress indices appear to be useful across several Australian cities with differing climates, but they may have varying usefulness depending on the outcome of interest. These findings suggest that future research on heat and health impacts, and in particular hospital demand modeling, could better reflect reality if it avoided "all-cause" health outcomes and used heat stress indices appropriate to specific diseases and disease groups.

  18. An Extended Passive Motion Paradigm for Human-Like Posture and Movement Planning in Redundant Manipulators

    PubMed Central

    Tommasino, Paolo; Campolo, Domenico

    2017-01-01

    A major challenge in robotics and computational neuroscience is relative to the posture/movement problem in presence of kinematic redundancy. We recently addressed this issue using a principled approach which, in conjunction with nonlinear inverse optimization, allowed capturing postural strategies such as Donders' law. In this work, after presenting this general model specifying it as an extension of the Passive Motion Paradigm, we show how, once fitted to capture experimental postural strategies, the model is actually able to also predict movements. More specifically, the passive motion paradigm embeds two main intrinsic components: joint damping and joint stiffness. In previous work we showed that joint stiffness is responsible for static postures and, in this sense, its parameters are regressed to fit to experimental postural strategies. Here, we show how joint damping, in particular its anisotropy, directly affects task-space movements. Rather than using damping parameters to fit a posteriori task-space motions, we make the a priori hypothesis that damping is proportional to stiffness. This remarkably allows a postural-fitted model to also capture dynamic performance such as curvature and hysteresis of task-space trajectories during wrist pointing tasks, confirming and extending previous findings in literature. PMID:29249954

  19. Poisson Regression Analysis of Illness and Injury Surveillance Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Frome E.L., Watkins J.P., Ellis E.D.

    2012-12-12

    The Department of Energy (DOE) uses illness and injury surveillance to monitor morbidity and assess the overall health of the work force. Data collected from each participating site include health events and a roster file with demographic information. The source data files are maintained in a relational data base, and are used to obtain stratified tables of health event counts and person time at risk that serve as the starting point for Poisson regression analysis. The explanatory variables that define these tables are age, gender, occupational group, and time. Typical response variables of interest are the number of absences due to illness or injury, i.e., the response variable is a count. Poisson regression methods are used to describe the effect of the explanatory variables on the health event rates using a log-linear main effects model. Results of fitting the main effects model are summarized in a tabular and graphical form and interpretation of model parameters is provided. An analysis of deviance table is used to evaluate the importance of each of the explanatory variables on the event rate of interest and to determine if interaction terms should be considered in the analysis. Although Poisson regression methods are widely used in the analysis of count data, there are situations in which over-dispersion occurs. This could be due to lack-of-fit of the regression model, extra-Poisson variation, or both. A score test statistic and regression diagnostics are used to identify over-dispersion. A quasi-likelihood method of moments procedure is used to evaluate and adjust for extra-Poisson variation when necessary. Two examples are presented using respiratory disease absence rates at two DOE sites to illustrate the methods and interpretation of the results. In the first example the Poisson main effects model is adequate. In the second example the score test indicates considerable over-dispersion and a more detailed analysis attributes the over-dispersion to extra-Poisson variation. The R open source software environment for statistical computing and graphics is used for analysis. Additional details about R and the data that were used in this report are provided in an Appendix. Information on how to obtain R and utility functions that can be used to duplicate results in this report are provided.
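
    The over-dispersion check described in this record can be illustrated with a minimal sketch. It is not the report's R code: it assumes the common moment estimator of the dispersion (Pearson chi-square divided by residual degrees of freedom), and the counts, person-time values, and the single-rate model below are made up purely for illustration.

```python
def pearson_dispersion(counts, fitted, n_params):
    """Moment estimate of dispersion: Pearson chi-square / residual df.

    Values well above 1 suggest extra-Poisson variation (or lack of fit).
    """
    df = len(counts) - n_params
    if df <= 0:
        raise ValueError("need more observations than fitted parameters")
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted))
    return chi2 / df


# Hypothetical stratified table: event counts and person-time at risk.
counts = [12, 30, 7, 45, 19]
person_time = [100.0, 150.0, 80.0, 200.0, 120.0]

# Simplest possible rate model (an assumption): one common event rate,
# whose maximum likelihood estimate is total events / total person-time.
rate = sum(counts) / sum(person_time)
fitted = [rate * t for t in person_time]

phi = pearson_dispersion(counts, fitted, n_params=1)
print(f"common rate = {rate:.4f}, estimated dispersion = {phi:.2f}")
```

    Under a quasi-likelihood adjustment of this kind, standard errors from the Poisson fit would typically be multiplied by the square root of the estimated dispersion; whether the report does exactly this is not stated in the abstract.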

  20. Particle size distributions by transmission electron microscopy: an interlaboratory comparison case study

    PubMed Central

    Rice, Stephen B; Chan, Christopher; Brown, Scott C; Eschbach, Peter; Han, Li; Ensor, David S; Stefaniak, Aleksandr B; Bonevich, John; Vladár, András E; Hight Walker, Angela R; Zheng, Jiwen; Starnes, Catherine; Stromberg, Arnold; Ye, Jia; Grulke, Eric A

    2015-01-01

    This paper reports an interlaboratory comparison that evaluated a protocol for measuring and analysing the particle size distribution of discrete, metallic, spheroidal nanoparticles using transmission electron microscopy (TEM). The study was focused on automated image capture and automated particle analysis. NIST RM8012 gold nanoparticles (30 nm nominal diameter) were measured for area-equivalent diameter distributions by eight laboratories. Statistical analysis was used to (1) assess the data quality without using size distribution reference models, (2) determine reference model parameters for different size distribution reference models and non-linear regression fitting methods and (3) assess the measurement uncertainty of a size distribution parameter by using its coefficient of variation. The interlaboratory area-equivalent diameter mean, 27.6 nm ± 2.4 nm (computed based on a normal distribution), was quite similar to the area-equivalent diameter, 27.6 nm, assigned to NIST RM8012. The lognormal reference model was the preferred choice for these particle size distributions as, for all laboratories, its parameters had lower relative standard errors (RSEs) than the other size distribution reference models tested (normal, Weibull and Rosin–Rammler–Bennett). The RSEs for the fitted standard deviations were two orders of magnitude higher than those for the fitted means, suggesting that most of the parameter estimate errors were associated with estimating the breadth of the distributions. The coefficients of variation for the interlaboratory statistics also confirmed the lognormal reference model as the preferred choice. From quasi-linear plots, the typical range for good fits between the model and cumulative number-based distributions was 1.9 fitted standard deviations less than the mean to 2.3 fitted standard deviations above the mean. Automated image capture, automated particle analysis and statistical evaluation of the data and fitting coefficients provide a framework for assessing nanoparticle size distributions using TEM for image acquisition. PMID:26361398

  1. Genetic analysis of partial egg production records in Japanese quail using random regression models.

    PubMed

    Abou Khadiga, G; Mahmoud, B Y F; Farahat, G S; Emam, A M; El-Full, E A

    2017-08-01

    The main objectives of this study were to detect the most appropriate random regression model (RRM) to fit the data of monthly egg production in 2 lines (selected and control) of Japanese quail and to test the consistency of different criteria of model choice. Data from 1,200 female Japanese quails for the first 5 months of egg production from 4 consecutive generations of an egg line selected for egg production in the first month (EP1) were analyzed. Eight RRMs with different orders of Legendre polynomials were compared to determine the proper model for analysis. All criteria of model choice suggested that the adequate model included the second-order Legendre polynomials for fixed effects, and the third-order for additive genetic effects and permanent environmental effects. Predictive ability of the best model was the highest among all models (ρ = 0.987). According to the best model fitted to the data, estimates of heritability were relatively low to moderate (0.10 to 0.17) and showed a descending pattern from the first to the fifth month of production. A similar pattern was observed for permanent environmental effects with greater estimates in the first (0.36) and second (0.23) months of production than heritability estimates. Genetic correlations between separate production periods were higher (0.18 to 0.93) than their phenotypic counterparts (0.15 to 0.87). The superiority of the selected line over the control was observed through significant (P < 0.05) linear contrast estimates. Significant (P < 0.05) estimates of covariate effect (age at sexual maturity) showed a decreased pattern with greater impact on egg production in earlier ages (first and second months) than later ones. A methodology based on random regression animal models can be recommended for genetic evaluation of egg production in Japanese quail. © 2017 Poultry Science Association Inc.

  2. Dynamic prediction in functional concurrent regression with an application to child growth.

    PubMed

    Leroux, Andrew; Xiao, Luo; Crainiceanu, Ciprian; Checkley, William

    2018-04-15

    In many studies, it is of interest to predict the future trajectory of subjects based on their historical data, referred to as dynamic prediction. Mixed effects models have traditionally been used for dynamic prediction. However, the commonly used random intercept and slope model is often not sufficiently flexible for modeling subject-specific trajectories. In addition, there may be useful exposures/predictors of interest that are measured concurrently with the outcome, complicating dynamic prediction. To address these problems, we propose a dynamic functional concurrent regression model to handle the case where both the functional response and the functional predictors are irregularly measured. Currently, such a model cannot be fit by existing software. We apply the model to dynamically predict children's length conditional on prior length, weight, and baseline covariates. Inference on model parameters and subject-specific trajectories is conducted using the mixed effects representation of the proposed model. An extensive simulation study shows that the dynamic functional regression model provides more accurate estimation and inference than existing methods. Methods are supported by fast, flexible, open source software that uses heavily tested smoothing techniques. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  3. Theory Can Help Structure Regression Models for Projecting Stream Conditions Under Alternative Land Use Scenarios

    NASA Astrophysics Data System (ADS)

    van Sickle, J.; Baker, J.; Herlihy, A.

    2005-05-01

    We built multiple regression models for Ephemeroptera/Plecoptera/Trichoptera (EPT) taxon richness and other indicators of biological condition in streams of the Willamette River Basin, Oregon, USA. The models were used to project the changes in condition that would be expected in all 2-4th order streams of the 30000 sq km basin under alternative scenarios of future land use. In formulating the models, we invoked the theory of limiting factors to express the interactive effects of stream power and watershed land use on EPT richness. The resulting models were parsimonious, and they fit the data in our wedge-shaped scatterplots slightly better than did a naive additive-effects model. Just as theory helped formulate our regression models, the models in turn helped us identify a new research need for the Basin's streams. Our future scenarios project that conversions of agricultural to urban uses may dominate landscape dynamics in the basin over the next 50 years. But our models could not detect any difference between the effects of agricultural and urban development in watersheds on stream biota. This result points to an increased need for understanding how agricultural and urban land uses in the Basin differentially influence stream ecosystems.

  4. A review and comparison of Bayesian and likelihood-based inferences in beta regression and zero-or-one-inflated beta regression.

    PubMed

    Liu, Fang; Eugenio, Evercita C

    2018-04-01

    Beta regression is an increasingly popular statistical technique in medical research for modeling of outcomes that assume values in (0, 1), such as proportions and patient reported outcomes. When outcomes take values in the intervals [0,1), (0,1], or [0,1], zero-or-one-inflated beta (zoib) regression can be used. We provide a thorough review on beta regression and zoib regression in the modeling, inferential, and computational aspects via the likelihood-based and Bayesian approaches. We demonstrate the statistical and practical importance of correctly modeling the inflation at zero/one rather than ad hoc replacing them with values close to zero/one via simulation studies; the latter approach can lead to biased estimates and invalid inferences. We show via simulation studies that the likelihood-based approach is computationally faster in general than MCMC algorithms used in the Bayesian inferences, but runs the risk of non-convergence, large biases, and sensitivity to starting values in the optimization algorithm especially with clustered/correlated data, data with sparse inflation at zero and one, and data that warrant regularization of the likelihood. The disadvantages of the regular likelihood-based approach make the Bayesian approach an attractive alternative in these cases. Software packages and tools for fitting beta and zoib regressions in both the likelihood-based and Bayesian frameworks are also reviewed.

  5. A test of inflated zeros for Poisson regression models.

    PubMed

    He, Hua; Zhang, Hui; Ye, Peng; Tang, Wan

    2017-01-01

    Excessive zeros are common in practice and may cause overdispersion and invalidate inference when fitting Poisson regression models. There is a large body of literature on zero-inflated Poisson models. However, methods for testing whether there are excessive zeros are less well developed. The Vuong test comparing a Poisson and a zero-inflated Poisson model is commonly applied in practice. However, the type I error of the test often deviates seriously from the nominal level, raising serious doubts about the validity of the test in such applications. In this paper, we develop a new approach for testing inflated zeros under the Poisson model. Unlike the Vuong test for inflated zeros, our method does not require a zero-inflated Poisson model to perform the test. Simulation studies show that, when compared with the Vuong test, our approach is not only better at controlling the type I error rate, but also yields more power.
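
    To make the notion of "excessive zeros" concrete, the sketch below compares the observed number of zeros with the number expected under a Poisson model without covariates. This is only a descriptive check under that simplifying assumption, not the test proposed in the paper and not the Vuong test; the data are invented for illustration.

```python
import math


def zero_summary(counts):
    """Return (observed zeros, zeros expected under Poisson with the sample mean)."""
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for c in counts if c == 0)
    # Under Poisson(mean), P(Y = 0) = exp(-mean).
    expected = n * math.exp(-mean)
    return observed, expected


# Hypothetical count data with many zeros.
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4]
observed, expected = zero_summary(counts)
print(f"observed zeros = {observed}, expected under Poisson = {expected:.2f}")
```

    A large gap between the two numbers motivates a formal test; it does not by itself establish zero inflation, since the gap has to be judged against its sampling variability.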

  6. Measurement of pediatric regional cerebral blood flow from 6 months to 15 years of age in a clinical population.

    PubMed

    Carsin-Vu, Aline; Corouge, Isabelle; Commowick, Olivier; Bouzillé, Guillaume; Barillot, Christian; Ferré, Jean-Christophe; Proisy, Maia

    2018-04-01

    To investigate changes in cerebral blood flow (CBF) in gray matter (GM) between 6 months and 15 years of age and to provide CBF values for the brain, GM, white matter (WM), hemispheres and lobes. Between 2013 and 2016, we retrospectively included all clinical MRI examinations with arterial spin labeling (ASL). We excluded subjects with a condition potentially affecting brain perfusion. For each subject, mean values of CBF in the brain, GM, WM, hemispheres and lobes were calculated. GM CBF was fitted using linear, quadratic and cubic polynomial regression against age. Regression models were compared with Akaike's information criterion (AIC), and Likelihood Ratio tests. 84 children were included (44 females/40 males). Mean CBF values were 64.2 ± 13.8 mL/100 g/min in GM, and 29.3 ± 10.0 mL/100 g/min in WM. The best-fit model of brain perfusion was the cubic polynomial function (AIC = 672.7, versus respectively AIC = 673.9 and AIC = 674.1 with the linear negative function and the quadratic polynomial function). A statistically significant difference between the tested models demonstrating the superiority of the quadratic (p = 0.18) or cubic polynomial model (p = 0.06), over the negative linear regression model was not found. No effect of general anesthesia (p = 0.34) or of gender (p = 0.16) was found. We provided values for ASL CBF in the brain, GM, WM, hemispheres, and lobes over a wide pediatric age range, approximately showing inverted U-shaped changes in GM perfusion over the course of childhood. Copyright © 2018 Elsevier B.V. All rights reserved.
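
    The model comparison in this record (linear, quadratic and cubic polynomial fits ranked by AIC) can be sketched as follows. The sketch assumes ordinary least squares with Gaussian errors, for which AIC equals n·ln(RSS/n) + 2k up to an additive constant, with k counting the polynomial coefficients plus the error variance. The data are synthetic and unrelated to the study; the helper names are illustrative.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (constant term first) via normal equations."""
    k = degree + 1
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


def rss(xs, ys, coefs):
    """Residual sum of squares of the fitted polynomial."""
    return sum(
        (y - sum(c * x ** p for p, c in enumerate(coefs))) ** 2
        for x, y in zip(xs, ys)
    )


def gaussian_aic(rss_value, n, n_coefs):
    """AIC for a Gaussian least-squares fit, up to an additive constant."""
    k = n_coefs + 1  # regression coefficients plus the error variance
    return n * math.log(rss_value / n) + 2 * k


# Synthetic data: a quadratic trend plus small deterministic perturbations.
xs = [float(i) for i in range(10)]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
ys = [2.0 + 0.5 * x - 0.3 * x * x + e for x, e in zip(xs, noise)]

results = {}
for degree in (1, 2, 3):
    coefs = polyfit(xs, ys, degree)
    r = rss(xs, ys, coefs)
    results[degree] = (r, gaussian_aic(r, len(xs), degree + 1))
    print(f"degree {degree}: RSS = {r:.4f}, AIC = {results[degree][1]:.2f}")
```

    Lower AIC indicates the preferred model. Differences of only a unit or two, like those reported in the abstract, are generally read as weak evidence, which is consistent with the non-significant likelihood ratio tests the authors report.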

  7. Area-to-point regression kriging for pan-sharpening

    NASA Astrophysics Data System (ADS)

    Wang, Qunming; Shi, Wenzhong; Atkinson, Peter M.

    2016-04-01

    Pan-sharpening is a technique to combine the fine spatial resolution panchromatic (PAN) band with the coarse spatial resolution multispectral bands of the same satellite to create a fine spatial resolution multispectral image. In this paper, area-to-point regression kriging (ATPRK) is proposed for pan-sharpening. ATPRK considers the PAN band as the covariate. Moreover, ATPRK is extended with a local approach, called adaptive ATPRK (AATPRK), which fits a regression model using a local, non-stationary scheme such that the regression coefficients change across the image. The two geostatistical approaches, ATPRK and AATPRK, were compared to the 13 state-of-the-art pan-sharpening approaches summarized in Vivone et al. (2015) in experiments on three separate datasets. ATPRK and AATPRK produced more accurate pan-sharpened images than the 13 benchmark algorithms in all three experiments. Unlike the benchmark algorithms, the two geostatistical solutions precisely preserved the spectral properties of the original coarse data. Furthermore, ATPRK can be enhanced by a local scheme in AATPRK, in cases where the residuals from a global regression model are such that their spatial character varies locally.

  8. Effects of Inventory Bias on Landslide Susceptibility Calculations

    NASA Technical Reports Server (NTRS)

    Stanley, T. A.; Kirschbaum, D. B.

    2017-01-01

    Many landslide inventories are known to be biased, especially inventories for large regions such as Oregon's SLIDO or NASA's Global Landslide Catalog. These biases must affect the results of empirically derived susceptibility models to some degree. We evaluated the strength of the susceptibility model distortion from postulated biases by truncating an unbiased inventory. We generated a synthetic inventory from an existing landslide susceptibility map of Oregon, then removed landslides from this inventory to simulate the effects of reporting biases likely to affect inventories in this region, namely population and infrastructure effects. Logistic regression models were fitted to the modified inventories. Then the process of biasing a susceptibility model was repeated with SLIDO data. We evaluated each susceptibility model with qualitative and quantitative methods. Results suggest that the effects of landslide inventory bias on empirical models should not be ignored, even if those models are, in some cases, useful. We suggest fitting models in well-documented areas and extrapolating across the study region as a possible approach to modeling landslide susceptibility with heavily biased inventories.

  9. Effects of Inventory Bias on Landslide Susceptibility Calculations

    NASA Technical Reports Server (NTRS)

    Stanley, Thomas; Kirschbaum, Dalia B.

    2017-01-01

    Many landslide inventories are known to be biased, especially inventories for large regions such as Oregon's SLIDO or NASA's Global Landslide Catalog. These biases must affect the results of empirically derived susceptibility models to some degree. We evaluated the strength of the susceptibility model distortion from postulated biases by truncating an unbiased inventory. We generated a synthetic inventory from an existing landslide susceptibility map of Oregon, then removed landslides from this inventory to simulate the effects of reporting biases likely to affect inventories in this region, namely population and infrastructure effects. Logistic regression models were fitted to the modified inventories. Then the process of biasing a susceptibility model was repeated with SLIDO data. We evaluated each susceptibility model with qualitative and quantitative methods. Results suggest that the effects of landslide inventory bias on empirical models should not be ignored, even if those models are, in some cases, useful. We suggest fitting models in well-documented areas and extrapolating across the study region as a possible approach to modelling landslide susceptibility with heavily biased inventories.

  10. Variations in area-level disadvantage of Australian registered fitness trainers usual training locations.

    PubMed

    Bennie, Jason A; Thornton, Lukar E; van Uffelen, Jannique G Z; Banting, Lauren K; Biddle, Stuart J H

    2016-07-11

    Leisure-time physical activity and strength training participation levels are low and socioeconomically distributed. Fitness trainers (e.g. gym/group instructors) may have a role in increasing these participation levels. However, it is not known whether the training location and characteristics of Australian fitness trainers vary between areas that differ in socioeconomic status. In 2014, a sample of 1,189 Australian trainers completed an online survey with questions about personal and fitness industry-related characteristics (e.g. qualifications, setting, and experience) and postcode of their usual training location. The Australian Bureau of Statistics 'Index of Relative Socioeconomic Disadvantage' (IRSD) was matched to training location and used to assess where fitness professionals trained and whether their experience, qualification level and delivery methods differed by area-level disadvantage. Linear regression analysis was used to examine the relationship between IRSD score and selected characteristics adjusting for covariates (e.g. sex, age). Overall, 47 % of respondents worked in areas within the three least-disadvantaged deciles. In contrast, only 14.8 % worked in the three most-disadvantaged deciles. In adjusted regression models, fitness industry qualification was positively associated with a higher IRSD score (i.e. working in the least-disadvantaged areas) (Cert III: ref; Cert IV β:13.44 [95 % CI 3.86-23.02]; Diploma β:15.77 [95 % CI: 2.17-29.37]; Undergraduate β:23.14 [95 % CI: 9.41-36.86]). Fewer Australian fitness trainers work in areas with high levels of socioeconomic disadvantage than in areas with low levels of disadvantage. A higher level of fitness industry qualifications was associated with working in areas with lower levels of disadvantage. Future research should explore the effectiveness of providing incentives that encourage more fitness trainers and those with higher qualifications to work in more socioeconomically disadvantaged areas.

  11. Mapping urban environmental noise: a land use regression method.

    PubMed

    Xie, Dan; Liu, Yi; Chen, Jining

    2011-09-01

    Forecasting and preventing urban noise pollution are major challenges in urban environmental management. Most existing efforts, including experiment-based models, statistical models, and noise mapping, however, have limited capacity to explain the association between urban growth and corresponding noise change. Therefore, these conventional methods can hardly forecast urban noise at a given outlook of development layout. This paper, for the first time, introduces a land use regression method, which has been applied for simulating urban air quality for a decade, to construct an urban noise model (LUNOS) in Dalian Municipality, Northwest China. The LUNOS model describes noise as a dependent variable of surrounding various land areas via a regressive function. The results suggest that a linear model performs better in fitting monitoring data, and there is no significant difference of the LUNOS's outputs when applied to different spatial scales. As the LUNOS facilitates a better understanding of the association between land use and urban environmental noise in comparison to conventional methods, it can be regarded as a promising tool for noise prediction for planning purposes and aid smart decision-making.

  12. Evaluating penalized logistic regression models to predict Heat-Related Electric grid stress days

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bramer, L. M.; Rounds, J.; Burleyson, C. D.

    Understanding the conditions associated with stress on the electricity grid is important in the development of contingency plans for maintaining reliability during periods when the grid is stressed. In this paper, heat-related grid stress and the relationship with weather conditions is examined using data from the eastern United States. Penalized logistic regression models were developed and applied to predict stress on the electric grid using weather data. The inclusion of other weather variables, such as precipitation, in addition to temperature improved model performance. Several candidate models and datasets were examined. A penalized logistic regression model fit at the operation-zone level was found to provide predictive value and interpretability. Additionally, the importance of different weather variables observed at different time scales was examined. Maximum temperature and precipitation were identified as important across all zones while the importance of other weather variables was zone specific. The methods presented in this work are extensible to other regions and can be used to aid in planning and development of the electrical grid.

  13. Evaluating penalized logistic regression models to predict Heat-Related Electric grid stress days

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bramer, Lisa M.; Rounds, J.; Burleyson, C. D.

    Understanding the conditions associated with stress on the electricity grid is important in the development of contingency plans for maintaining reliability during periods when the grid is stressed. In this paper, heat-related grid stress and the relationship with weather conditions were examined using data from the eastern United States. Penalized logistic regression models were developed and applied to predict stress on the electric grid using weather data. The inclusion of other weather variables, such as precipitation, in addition to temperature improved model performance. Several candidate models and combinations of predictive variables were examined. A penalized logistic regression model which was fit at the operation-zone level was found to provide predictive value and interpretability. Additionally, the importance of different weather variables observed at various time scales was examined. Maximum temperature and precipitation were identified as important across all zones while the importance of other weather variables was zone specific. In conclusion, the methods presented in this work are extensible to other regions and can be used to aid in planning and development of the electrical grid.

  14. Evaluating penalized logistic regression models to predict Heat-Related Electric grid stress days

    DOE PAGES

    Bramer, Lisa M.; Rounds, J.; Burleyson, C. D.; ...

    2017-09-22

    Understanding the conditions associated with stress on the electricity grid is important in the development of contingency plans for maintaining reliability during periods when the grid is stressed. In this paper, heat-related grid stress and the relationship with weather conditions were examined using data from the eastern United States. Penalized logistic regression models were developed and applied to predict stress on the electric grid using weather data. The inclusion of other weather variables, such as precipitation, in addition to temperature improved model performance. Several candidate models and combinations of predictive variables were examined. A penalized logistic regression model that was fit at the operation-zone level was found to provide predictive value and interpretability. Additionally, the importance of different weather variables observed at various time scales was examined. Maximum temperature and precipitation were identified as important across all zones while the importance of other weather variables was zone specific. In conclusion, the methods presented in this work are extensible to other regions and can be used to aid in planning and development of the electrical grid.

  15. Correlation between adenoma detection rate in colonoscopy- and fecal immunochemical testing-based colorectal cancer screening programs.

    PubMed

    Cubiella, Joaquín; Castells, Antoni; Andreu, Montserrat; Bujanda, Luis; Carballo, Fernando; Jover, Rodrigo; Lanas, Ángel; Morillas, Juan Diego; Salas, Dolores; Quintero, Enrique

    2017-03-01

    The adenoma detection rate (ADR) is the main quality indicator of colonoscopy. The ADR recommended in fecal immunochemical testing (FIT)-based colorectal cancer screening programs is unknown. Using the COLONPREV (NCT00906997) study dataset, we performed a post-hoc analysis to determine if there was a correlation between the ADR in primary and work-up colonoscopy, and the equivalent figure to the minimal 20% ADR recommended. Colonoscopy was performed in 5722 individuals: 5059 as primary strategy and 663 after a positive FIT result (OC-Sensor™; cut-off level 15 µg/g of feces). We developed a predictive model based on a multivariable linear regression analysis including confounding variables. The median ADR was 31% (range, 14%-51%) in the colonoscopy group and 55% (range, 21%-83%) in the FIT group. There was a positive correlation in the ADR between primary and work-up colonoscopy (Pearson's coefficient 0.716; p < 0.001). ADR in the FIT group was independently related to ADR in the colonoscopy group: regression coefficient for colonoscopy ADR, 0.71 (p = 0.009); sex, 0.09 (p = 0.09); age, 0.3 (p = 0.5); and region, 0.00 (p = 0.9). The equivalent figure to the 20% ADR was 45% (95% confidence interval, 35%-56%). ADR in primary and work-up colonoscopy of a FIT-positive result are positively and significantly correlated.

  16. Correlation between adenoma detection rate in colonoscopy- and fecal immunochemical testing-based colorectal cancer screening programs

    PubMed Central

    Castells, Antoni; Andreu, Montserrat; Bujanda, Luis; Carballo, Fernando; Jover, Rodrigo; Lanas, Ángel; Morillas, Juan Diego; Salas, Dolores; Quintero, Enrique

    2016-01-01

    Background: The adenoma detection rate (ADR) is the main quality indicator of colonoscopy. The ADR recommended in fecal immunochemical testing (FIT)-based colorectal cancer screening programs is unknown. Methods: Using the COLONPREV (NCT00906997) study dataset, we performed a post-hoc analysis to determine if there was a correlation between the ADR in primary and work-up colonoscopy, and the equivalent figure to the minimal 20% ADR recommended. Colonoscopy was performed in 5722 individuals: 5059 as primary strategy and 663 after a positive FIT result (OC-Sensor™; cut-off level 15 µg/g of feces). We developed a predictive model based on a multivariable linear regression analysis including confounding variables. Results: The median ADR was 31% (range, 14%–51%) in the colonoscopy group and 55% (range, 21%–83%) in the FIT group. There was a positive correlation in the ADR between primary and work-up colonoscopy (Pearson’s coefficient 0.716; p < 0.001). ADR in the FIT group was independently related to ADR in the colonoscopy group: regression coefficient for colonoscopy ADR, 0.71 (p = 0.009); sex, 0.09 (p = 0.09); age, 0.3 (p = 0.5); and region, 0.00 (p = 0.9). The equivalent figure to the 20% ADR was 45% (95% confidence interval, 35%–56%). Conclusions: ADR in primary and work-up colonoscopy of a FIT-positive result are positively and significantly correlated. PMID:28344793

  17. Crystalline silica exposure and lung cancer mortality in diatomaceous earth industry workers: a quantitative risk assessment.

    PubMed

    Rice, F L; Park, R; Stayner, L; Smith, R; Gilbert, S; Checkoway, H

    2001-01-01

    To use various exposure-response models to estimate the risk of mortality from lung cancer due to occupational exposure to respirable crystalline silica dust. Data from a cohort mortality study of 2342 white male California diatomaceous earth mining and processing workers exposed to crystalline silica dust (mainly cristobalite) were reanalyzed with Poisson regression and Cox's proportional hazards models. Internal and external adjustments were used to control for potential confounding from the effects of time since first observation, calendar time, age, and Hispanic ethnicity. Cubic smoothing spline models were used to assess the fit of the models. Exposures were lagged by 10 years. Evaluations of the fit of the models were performed by comparing their deviances. Lifetime risks of lung cancer were estimated up to age 85 with an actuarial approach that accounted for competing causes of death. Exposure to respirable crystalline silica dust was a significant predictor (p < 0.05) in nearly all of the models evaluated and the linear relative rate model with a 10 year exposure lag seemed to give the best fit in the Poisson regression analysis. For those who died of lung cancer the linear relative rate model predicted rate ratios for mortality from lung cancer of about 1.6 for the mean cumulative exposure to respirable silica compared with no exposure. The excess lifetime risk (to age 85) of mortality from lung cancer for white men exposed for 45 years and with a 10 year lag period at the current Occupational Safety and Health Administration (OSHA) standard of about 0.05 mg/m³ for respirable cristobalite dust is 19/1000 (95% confidence interval (95% CI) 5/1000 to 46/1000). There was a significant risk of mortality from lung cancer that increased with cumulative exposure to respirable crystalline silica dust. The predicted number of deaths from lung cancer suggests that current occupational health standards may not be adequately protecting workers from the risk of lung cancer.

  18. Sensitivity analysis of respiratory parameter uncertainties: impact of criterion function form and constraints.

    PubMed

    Lutchen, K R

    1990-08-01

    A sensitivity analysis based on weighted least-squares regression is presented to evaluate alternative methods for fitting lumped-parameter models to respiratory impedance data. The goal is to maintain parameter accuracy simultaneously with practical experiment design. The analysis focuses on predicting parameter uncertainties using a linearized approximation for joint confidence regions. Applications are with four-element parallel and viscoelastic models for 0.125- to 4-Hz data and a six-element model with separate tissue and airway properties for input and transfer impedance data from 2-64 Hz. The criterion function form was evaluated by comparing parameter uncertainties when data are fit as magnitude and phase, dynamic resistance and compliance, or real and imaginary parts of input impedance. The proper choice of weighting can make all three criterion variables comparable. For the six-element model, parameter uncertainties were predicted when both input impedance and transfer impedance are acquired and fit simultaneously. A fit to both data sets from 4 to 64 Hz could reduce parameter estimate uncertainties considerably from those achievable by fitting either alone. For the four-element models, use of an independent, but noisy, measure of static compliance was assessed as a constraint on model parameters. This may allow acceptable parameter uncertainties for a minimum frequency of 0.275-0.375 Hz rather than 0.125 Hz. This reduces data acquisition requirements from a 16- to a 5.33- to 8-s breath holding period. These results are approximations, and the impact of using the linearized approximation for the confidence regions is discussed.

  19. Experimental study of water desorption isotherms and thin-layer convective drying kinetics of bay laurel leaves

    NASA Astrophysics Data System (ADS)

    Ghnimi, Thouraya; Hassini, Lamine; Bagane, Mohamed

    2016-12-01

    The aim of this work is to determine the desorption isotherms and the drying kinetics of bay laurel leaves (Laurus nobilis L.). The desorption isotherms were determined at three temperatures (50, 60 and 70 °C) and at water activities ranging from 0.057 to 0.88 using the static gravimetric method. Five sorption models were fitted to the experimental desorption isotherm data. The Kuhn model gave the best fit to the experimental moisture isotherms over the investigated ranges of temperature and water activity. The net isosteric heat of water desorption was evaluated using the Clausius-Clapeyron equation and was best correlated with equilibrium moisture content by the empirical Tsami equation. Thin-layer convective drying curves of bay laurel leaves were obtained for temperatures of 45, 50, 60 and 70 °C, relative humidities of 5, 15, 30 and 45%, and air velocities of 1, 1.5 and 2 m/s. The Levenberg-Marquardt nonlinear regression procedure was used to fit the drying curves with five semi-empirical mathematical models available in the literature. R² and χ² were used to evaluate the goodness of fit of the models to the data. Based on the experimental drying curves, the drying characteristic curve (DCC) was established and fitted with a third-degree polynomial function. The Midilli-Kucuk model was the best semi-empirical model for describing the thin-layer drying kinetics of bay laurel leaves. The effective moisture diffusivity and activation energy of bay laurel leaves were also identified.

  20. Harmonic regression of Landsat time series for modeling attributes from national forest inventory data

    NASA Astrophysics Data System (ADS)

    Wilson, Barry T.; Knight, Joseph F.; McRoberts, Ronald E.

    2018-03-01

    Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several methods have previously been developed for use with finer temporal resolution imagery (e.g. AVHRR and MODIS), including image compositing and harmonic regression using Fourier series. The manuscript presents a study, using Minnesota, USA during the years 2009-2013 as the study area and timeframe. The study examined the relative predictive power of land cover models, in particular those related to tree cover, using predictor variables based solely on composite imagery versus those using estimated harmonic regression coefficients. The study used two common non-parametric modeling approaches (i.e. k-nearest neighbors and random forests) for fitting classification and regression models of multiple attributes measured on USFS Forest Inventory and Analysis plots using all available Landsat imagery for the study area and timeframe. The estimated Fourier coefficients developed by harmonic regression of tasseled cap transformation time series data were shown to be correlated with land cover, including tree cover. Regression models using estimated Fourier coefficients as predictor variables showed a two- to threefold increase in explained variance for a small set of continuous response variables, relative to comparable models using monthly image composites. Similarly, the overall accuracies of classification models using the estimated Fourier coefficients were approximately 10-20 percentage points higher than the models using the image composites, with corresponding individual class accuracies between six and 45 percentage points higher.
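
    The harmonic regression step described in this abstract can be illustrated with a minimal sketch. It fits a Fourier series by ordinary least squares to one pixel's time series. The abstract does not give the model order, period, or software used, so the function names, the one-year period, and the single harmonic below are assumptions for illustration only, not the authors' implementation.

```python
import math


def design_row(t, period, n_harmonics):
    """Return [1, cos(w t), sin(w t), cos(2 w t), sin(2 w t), ...] for one time t."""
    row = [1.0]
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * math.pi * k * t / period
        row.extend([math.cos(angle), math.sin(angle)])
    return row


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_harmonic_regression(times, values, period=365.25, n_harmonics=1):
    """Least-squares Fourier coefficients [a0, a1, b1, a2, b2, ...].

    period (days) and n_harmonics are assumed defaults, not values from the study.
    """
    rows = [design_row(t, period, n_harmonics) for t in times]
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(p)]
    return solve_linear_system(xtx, xty)
```

    Called with acquisition days and one tasseled cap component per date, `fit_harmonic_regression(times, values)` returns the mean level followed by the cosine and sine coefficients, which could then serve as predictor variables in place of monthly composites.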

  1. Survival Regression Modeling Strategies in CVD Prediction.

    PubMed

    Barkhordari, Mahnaz; Padyab, Mojgan; Sardarinia, Mahsa; Hadaegh, Farzad; Azizi, Fereidoun; Bozorgmanesh, Mohammadreza

    2016-04-01

    A fundamental part of prevention is prediction. Potential predictors are the sine qua non of prediction models. However, whether incorporating novel predictors to prediction models could be directly translated to added predictive value remains an area of dispute. The difference between the predictive power of a predictive model with (enhanced model) and without (baseline model) a certain predictor is generally regarded as an indicator of the predictive value added by that predictor. Indices such as discrimination and calibration have long been used in this regard. Recently, the use of added predictive value has been suggested while comparing the predictive performances of the predictive models with and without novel biomarkers. User-friendly statistical software capable of implementing novel statistical procedures is conspicuously lacking. This shortcoming has restricted implementation of such novel model assessment methods. We aimed to construct Stata commands to help researchers obtain the aforementioned statistical indices. We have written Stata commands that are intended to help researchers obtain the following: (1) the Nam-D'Agostino χ² goodness-of-fit test; and (2) cut point-free and cut point-based net reclassification improvement index (NRI), relative and absolute integrated discriminatory improvement index (IDI), and survival-based regression analyses. We applied the commands to real data on women participating in the Tehran lipid and glucose study (TLGS) to examine if information relating to a family history of premature cardiovascular disease (CVD), waist circumference, and fasting plasma glucose can improve predictive performance of Framingham's general CVD risk algorithm. The command is adpredsurv for survival models. Herein we have described the Stata package "adpredsurv" for calculation of the Nam-D'Agostino χ² goodness-of-fit test as well as cut point-free and cut point-based NRI, relative and absolute IDI, and survival-based regression analyses. We hope this work encourages the use of novel methods in examining predictive capacity of the emerging plethora of novel biomarkers.

  2. An Occupational Performance Test Validation Program for Fire Fighters at the Kennedy Space Center

    NASA Technical Reports Server (NTRS)

    Schonfeld, Brian R.; Doerr, Donald F.; Convertino, Victor A.

    1990-01-01

    We evaluated performance of a modified Combat Task Test (CTT) and of standard fitness tests in 20 male subjects to assess the prediction of occupational performance standards for Kennedy Space Center fire fighters. The CTT consisted of stair-climbing, a chopping simulation, and a victim rescue simulation. Average CTT performance time was 3.61 +/- 0.25 min (SEM) and all CTT tasks required 93% to 97% maximal heart rate. By using scores from the standard fitness tests, a multiple linear regression model was fitted to each parameter: the stairclimb (r² = .905, P < .05), the chopping performance time (r² = .582, P < .05), the victim rescue time (r² = .218, P = not significant), and the total performance time (r² = .769, P < .05). Treadmill time was the predominant variable, being the major predictor in two of four models. These results indicated that standardized fitness tests can predict performance on some CTT tasks and that test predictors were amenable to exercise training.

  3. Physical Fitness and Aortic Stiffness Explain the Reduced Cognitive Performance Associated with Increasing Age in Older People.

    PubMed

    Kennedy, Greg; Meyer, Denny; Hardman, Roy J; Macpherson, Helen; Scholey, Andrew B; Pipingas, Andrew

    2018-01-01

    Greater physical fitness is associated with reduced rates of cognitive decline in older people; however, the mechanisms by which this occurs are still unclear. One potential mechanism is aortic stiffness, with increased stiffness resulting in higher pulsatile pressures reaching the brain and possibly causing progressive micro-damage. There is limited evidence that those who regularly exercise may have lower aortic stiffness. To investigate whether greater fitness and lower aortic stiffness predict better cognitive performance in older people and, if so, whether aortic stiffness mediates the relationship between fitness and cognition. Residents of independent living facilities, aged 60-90, participated in the study (N = 102). Primary measures included a computerized cognitive assessment battery, pulse wave velocity analysis to measure aortic stiffness, and the Six-Minute Walk test to assess fitness. Based on hierarchical regression analyses, structural equation modelling was used to test the mediation hypothesis. Both fitness and aortic stiffness independently predicted Spatial Working Memory (SWM) performance, however no mediating relationship was found. Additionally, the derived structural equation model shows that, in conjunction with BMI and sex, fitness and aortic stiffness explain 33% of the overall variation in SWM, with age no longer directly predicting any variation. Greater fitness and lower aortic stiffness both independently predict better SWM in older people. The strong effect of age on cognitive performance is totally mediated by fitness and aortic stiffness. This suggests that addressing both physical fitness and aortic stiffness may be important to reduce the rate of age associated cognitive decline.

  4. Modeling lactation curves and estimation of genetic parameters in Holstein cows using multiple-trait random regression models.

    PubMed

    Kheirabadi, Khabat; Rashidi, Amir; Alijani, Sadegh; Imumorin, Ikhide

    2014-11-01

    We compared the goodness of fit of three mathematical functions (including: Legendre polynomials, Lidauer-Mäntysaari function and Wilmink function) for describing the lactation curve of primiparous Iranian Holstein cows by using multiple-trait random regression models (MT-RRM). Lactational submodels provided the largest daily additive genetic (AG) and permanent environmental (PE) variance estimates at the end and at the onset of lactation, respectively, as well as low genetic correlations between peripheral test-day records. For all models, heritability estimates were highest at the end of lactation (245 to 305 days) and ranged from 0.05 to 0.26, 0.03 to 0.12 and 0.04 to 0.24 for milk, fat and protein yields, respectively. Generally, the genetic correlations between traits depend on how far apart they are or whether they are on the same day in any two traits. On average, genetic correlations between milk and fat were the lowest and those between fat and protein were intermediate, while those between milk and protein were the highest. Results from all criteria (Akaike's and Schwarz's Bayesian information criterion, and -2*logarithm of the likelihood function) suggested that a model with 2 and 5 coefficients of Legendre polynomials for AG and PE effects, respectively, was the most adequate for fitting the data. © 2014 Japanese Society of Animal Science.

  5. Use of non-linear mixed-effects modelling and regression analysis to predict the number of somatic coliphages by plaque enumeration after 3 hours of incubation.

    PubMed

    Mendez, Javier; Monleon-Getino, Antonio; Jofre, Juan; Lucena, Francisco

    2017-10-01

    The present study aimed to establish the kinetics of the appearance of coliphage plaques using the double agar layer titration technique to evaluate the feasibility of using traditional coliphage plaque forming unit (PFU) enumeration as a rapid quantification method. Repeated measurements of the appearance of plaques of coliphages titrated according to ISO 10705-2 at different times were analysed using non-linear mixed-effects regression to determine the most suitable model of their appearance kinetics. Although this model is adequate, to simplify its applicability two linear models were developed to predict the numbers of coliphages reliably, using the PFU counts as determined by the ISO after only 3 hours of incubation. One linear model, when the number of plaques detected was between 4 and 26 PFU after 3 hours, had a linear fit of: (1.48 × Counts 3 h + 1.97); and the other, values >26 PFU, had a fit of (1.18 × Counts 3 h + 2.95). If the number of plaques detected was <4 PFU after 3 hours, we recommend incubation for (18 ± 3) hours. The study indicates that the traditional coliphage plating technique has a reasonable potential to provide results in a single working day without the need to invest in additional laboratory equipment.
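
    The two linear fits and the low-count recommendation reported in this abstract can be written as a small piecewise rule. The sketch below uses the coefficients and thresholds exactly as quoted; the function name and the choice to return None for counts below 4 PFU are illustrative assumptions.

```python
from typing import Optional


def predict_final_pfu(count_3h: float) -> Optional[float]:
    """Predict the final plaque count from the count read after 3 hours.

    Coefficients come from the abstract. Returns None when the 3-hour count
    is below 4 PFU, where the abstract recommends incubating for (18 ± 3) hours
    instead of predicting.
    """
    if count_3h < 4:
        return None
    if count_3h <= 26:
        # Fit reported for 4 to 26 PFU at 3 hours
        return 1.48 * count_3h + 1.97
    # Fit reported for more than 26 PFU at 3 hours
    return 1.18 * count_3h + 2.95
```

    For instance, a 3-hour reading of 10 PFU falls in the first range and is mapped through the first fit, while a reading of 2 PFU yields no prediction.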

  6. Adding thin-ideal internalization and impulsiveness to the cognitive-behavioral model of bulimic symptoms.

    PubMed

    Schnitzler, Caroline E; von Ranson, Kristin M; Wallace, Laurel M

    2012-08-01

    This study evaluated the cognitive-behavioral (CB) model of bulimia nervosa and an extension that included two additional maintaining factors - thin-ideal internalization and impulsiveness - in 327 undergraduate women. Participants completed measures of demographics, self-esteem, concern about shape and weight, dieting, bulimic symptoms, thin-ideal internalization, and impulsiveness. Both the original CB model and the extended model provided good fits to the data. Although structural equation modeling analyses suggested that the original CB model was most parsimonious, hierarchical regression analyses indicated that the additional variables accounted for significantly more variance. Additional analyses showed that the model fit could be improved by adding a path from concern about shape and weight, and deleting the path from dieting, to bulimic symptoms. Expanding upon the factors considered in the model may better capture the scope of variables maintaining bulimic symptoms in young women with a range of severity of bulimic symptoms. Copyright © 2012 Elsevier Ltd. All rights reserved.

  7. Developing a dengue forecast model using machine learning: A case study in China.

    PubMed

    Guo, Pi; Liu, Tao; Zhang, Qin; Wang, Li; Xiao, Jianpeng; Zhang, Qingying; Luo, Ganfeng; Li, Zhihao; He, Jianfeng; Zhang, Yonghui; Ma, Wenjun

    2017-10-01

    In China, dengue remains an important public health issue with expanded areas and increased incidence recently. Accurate and timely forecasts of dengue incidence in China are still lacking. We aimed to use the state-of-the-art machine learning algorithms to develop an accurate predictive model of dengue. Weekly dengue cases, Baidu search queries and climate factors (mean temperature, relative humidity and rainfall) during 2011-2014 in Guangdong were gathered. A dengue search index was constructed for developing the predictive models in combination with climate factors. The observed year and week were also included in the models to control for the long-term trend and seasonality. Several machine learning algorithms, including the support vector regression (SVR) algorithm, step-down linear regression model, gradient boosted regression tree algorithm (GBM), negative binomial regression model (NBM), least absolute shrinkage and selection operator (LASSO) linear regression model and generalized additive model (GAM), were used as candidate models to predict dengue incidence. Performance and goodness of fit of the models were assessed using the root-mean-square error (RMSE) and R-squared measures. The residuals of the models were examined using the autocorrelation and partial autocorrelation function analyses to check the validity of the models. The models were further validated using dengue surveillance data from five other provinces. The epidemics during the last 12 weeks and the peak of the 2014 large outbreak were accurately forecasted by the SVR model selected by a cross-validation technique. Moreover, the SVR model had the consistently smallest prediction error rates for tracking the dynamics of dengue and forecasting the outbreaks in other areas in China. The proposed SVR model achieved a superior performance in comparison with other forecasting techniques assessed in this study. 
    The findings can help the government and community respond early to dengue epidemics.
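
    The two performance measures named in this abstract, RMSE and R-squared, can be sketched as follows. The abstract does not state which definition of R-squared was used; the version below is the common coefficient-of-determination form (1 minus residual over total sum of squares), which is an assumption, and the function names are illustrative.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

    Applied to weekly case counts and each candidate model's forecasts, lower RMSE and higher R-squared would indicate the better-fitting model.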

  8. The relationship between compressive strength and flexural strength of pavement geopolymer grouting material

    NASA Astrophysics Data System (ADS)

    Zhang, L.; Han, X. X.; Ge, J.; Wang, C. H.

    2018-01-01

    To determine the relationship between the compressive strength and flexural strength of pavement geopolymer grouting material, 20 groups of geopolymer grouting materials were prepared, and their compressive and flexural strengths were determined by mechanical property tests. After screening for abnormal values with boxplots, the compressive strength results were found to be normal, but there were two mild outliers in the 7-day flexural strength test. Compressive strength and flexural strength were fitted in SPSS, and six regression models were obtained. The relationship between compressive strength and flexural strength was best expressed by the cubic curve model, with a correlation coefficient of 0.842.

  9. The Future Impact of Vietnam Era Veterans on Inpatient Acute Care and Mental Health Product Lines at a Veterans Affairs Medical Center

    DTIC Science & Technology

    2000-06-20

    smoothing and regression which includes curve fitting are two principle forecasting model types utilized in the vast majority of forecasting applications ... model were compared against the VA Office of Policy and Planning forecasting study commissioned with the actuarial firm of Milliman & Robertson (M & R... Application to the Veterans Healthcare System The development of a model to forecast future VEV needs, utilization, and cost of the Acute Care and

  10. Determining a Prony Series for a Viscoelastic Material From Time Varying Strain Data

    NASA Technical Reports Server (NTRS)

    Tzikang, Chen

    2000-01-01

    In this study a method of determining the coefficients in a Prony series representation of a viscoelastic modulus from rate dependent data is presented. Load versus time test data for a sequence of different rate loading segments is least-squares fitted to a Prony series hereditary integral model of the material tested. A nonlinear least squares regression algorithm is employed. The measured data includes ramp loading, relaxation, and unloading stress-strain data. The resulting Prony series which captures strain rate loading and unloading effects, produces an excellent fit to the complex loading sequence.
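
    For orientation, a Prony series for a relaxation modulus is commonly written as E(t) = E_inf + sum_i E_i * exp(-t / tau_i). The abstract does not state the exact form or the fitted coefficients, so the sketch below only evaluates this common form; the function name and the example coefficients in the usage note are hypothetical, and the nonlinear least-squares fit itself is not shown.

```python
import math


def prony_relaxation_modulus(t, e_inf, terms):
    """Evaluate E(t) = e_inf + sum(e_i * exp(-t / tau_i)).

    terms is a list of (e_i, tau_i) pairs; all values are illustrative inputs,
    not coefficients from the study.
    """
    return e_inf + sum(e_i * math.exp(-t / tau_i) for e_i, tau_i in terms)
```

    With hypothetical terms such as `[(2.0, 0.1), (1.0, 10.0)]` and `e_inf = 0.5`, the modulus starts at the sum of all coefficients at t = 0 and relaxes toward `e_inf` at long times, which is the behavior a fitting routine would adjust to match measured load histories.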

  11. Predicting Grain Growth in Nanocrystalline Materials: A Thermodynamic and Kinetic-Based Model Informed by High Temperature X-ray Diffraction Experiments

    DTIC Science & Technology

    2014-10-01

    and d) Γb0. The scatter of the data points is due to the variation in the other parameters at 1 h. The line represents a best fit linear regression...parameters: a) Hseg, b) QL, c) γ0, and d) Γb0. The scatter of the data points is due to the variation in the other parameters at 1 h. The line represents...concentration x0 for the nanocrystalline Fe–Zr system. The white square data point shows the location of the experimental data used for fitting the

  12. Directional data analysis under the general projected normal distribution

    PubMed Central

    Wang, Fangpo; Gelfand, Alan E.

    2013-01-01

    The projected normal distribution is an under-utilized model for explaining directional data. In particular, the general version provides flexibility, e.g., asymmetry and possible bimodality along with convenient regression specification. Here, we clarify the properties of this general class. We also develop fully Bayesian hierarchical models for analyzing circular data using this class. We show how they can be fit using MCMC methods with suitable latent variables. We show how posterior inference for distributional features such as the angular mean direction and concentration can be implemented as well as how prediction within the regression setting can be handled. With regard to model comparison, we argue for an out-of-sample approach using both a predictive likelihood scoring loss criterion and a cumulative rank probability score criterion. PMID:24046539

  13. [Research of prevalence of schistosomiasis in Hunan province, 1984-2015].

    PubMed

    Li, F Y; Tan, H Z; Ren, G H; Jiang, Q; Wang, H L

    2017-03-10

    Objective: To analyze the prevalence of schistosomiasis in Hunan province, and provide scientific evidence for the control and elimination of schistosomiasis. Methods: The changes of infection rates of Schistosoma (S.) japonicum among residents and cattle in Hunan from 1984 to 2015 were analyzed by using dynamic trend diagram; and the time regression model was used to fit the infection rates of S. japonicum, and predict the recent infection rate. Results: The overall infection rates of S. japonicum in Hunan from 1984 to 2015 showed downward trend (95.29% in residents and 95.16% in cattle). By using the linear regression model, the actual values of infection rates in residents and cattle were all in the 95% confidence intervals of the value predicted; and the prediction showed that the infection rates in the residents and cattle would continue to decrease from 2016 to 2020. Conclusion: The prevalence of schistosomiasis was in decline in Hunan. The regression model has a good effect in the short-term prediction of schistosomiasis prevalence.

  14. Intermediate and advanced topics in multilevel logistic regression analysis

    PubMed Central

    Merlo, Juan

    2017-01-01

    Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher-level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within-cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population-average effect of covariates measured at the subject and cluster level, in contrast to the within-cluster or cluster-specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster-level covariates. We describe the variance partition coefficient and the median odds ratio which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28543517

  15. Intermediate and advanced topics in multilevel logistic regression analysis.

    PubMed

    Austin, Peter C; Merlo, Juan

    2017-09-10

    Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher-level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within-cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population-average effect of covariates measured at the subject and cluster level, in contrast to the within-cluster or cluster-specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster-level covariates. We describe the variance partition coefficient and the median odds ratio which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
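
    The two heterogeneity measures named in this abstract, the variance partition coefficient (VPC) and the median odds ratio (MOR), can be sketched numerically. The snippet below is a minimal illustration, assuming a two-level random-intercept logistic model and the commonly used latent-variable formulation in which the subject-level variance is fixed at π²/3. The abstract itself gives no formulas, and the function names and the `sigma2` argument (the cluster-level intercept variance on the logit scale) are illustrative choices.

```python
import math
from statistics import NormalDist


def variance_partition_coefficient(sigma2: float) -> float:
    """Latent-variable VPC: share of total variance lying between clusters.

    Assumes the subject-level (logistic) variance equals pi^2 / 3.
    """
    return sigma2 / (sigma2 + math.pi ** 2 / 3)


def median_odds_ratio(sigma2: float) -> float:
    """MOR: median odds ratio between two randomly chosen clusters,
    comparing the higher-risk cluster with the lower-risk one."""
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal
    return math.exp(math.sqrt(2.0 * sigma2) * z75)


if __name__ == "__main__":
    sigma2 = 1.0  # hypothetical cluster-level variance
    print(round(variance_partition_coefficient(sigma2), 3))
    print(round(median_odds_ratio(sigma2), 3))
```

    Under these assumptions, a cluster-level variance of 1.0 corresponds to a VPC of roughly 0.23 and an MOR of roughly 2.6; a variance of zero gives a VPC of 0 and an MOR of 1, meaning no between-cluster heterogeneity.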

  16. Physical fitness predicts technical-tactical and time-motion profile in simulated Judo and Brazilian Jiu-Jitsu matches

    PubMed Central

    Gentil, Paulo; Bueno, João C.A.; Follmer, Bruno; Marques, Vitor A.; Del Vecchio, Fabrício B.

    2018-01-01

    Background: Among combat sports, Judo and Brazilian Jiu-Jitsu (BJJ) present elevated physical fitness demands from the high-intensity intermittent efforts. However, information regarding how metabolic and neuromuscular physical fitness is associated with technical-tactical performance in Judo and BJJ fights is not available. This study aimed to relate indicators of physical fitness with combat performance variables in Judo and BJJ. Methods: The sample consisted of Judo (n = 16) and BJJ (n = 24) male athletes. At the first meeting, the physical tests were applied and, in the second, simulated fights were performed for later notational analysis. Results: The main findings indicate: (i) high reproducibility of the proposed instrument and protocol used for notational analysis in a mobile device; (ii) differences in the technical-tactical and time-motion patterns between modalities; (iii) performance-related variables are different in Judo and BJJ; and (iv) regression models based on metabolic fitness variables may account for up to 53% of the variances in technical-tactical and/or time-motion variables in Judo and up to 31% in BJJ, whereas neuromuscular fitness models can reach values up to 44 and 73% of prediction in Judo and BJJ, respectively. When all components are combined, they can explain up to 90% of high intensity actions in Judo. Discussion: In conclusion, performance prediction models in simulated combat indicate that anaerobic, aerobic and neuromuscular fitness variables contribute to explain time-motion variables associated with high intensity and technical-tactical variables in Judo and BJJ fights. PMID:29844991

  17. Artificial bias typically neglected in comparisons of uncertain atmospheric data

    NASA Astrophysics Data System (ADS)

    Pitkänen, Mikko R. A.; Mikkonen, Santtu; Lehtinen, Kari E. J.; Lipponen, Antti; Arola, Antti

    2016-09-01

    Publications in atmospheric sciences typically neglect biases caused by regression dilution (bias of the ordinary least squares line fitting) and regression to the mean (RTM) in comparisons of uncertain data. We use synthetic observations mimicking real atmospheric data to demonstrate how the biases arise from random data uncertainties of measurements, model output, or satellite retrieval products. Further, we provide examples of typical methods of data comparisons that tend to accentuate the biases. The results show that data uncertainties can significantly bias data comparisons due to regression dilution and RTM, a fact that is known in statistics but disregarded in atmospheric sciences. Thus, we argue that these biases are often regarded as measurement or modeling errors, for instance, while they are in fact artificial. It is essential that atmospheric and geoscience communities become aware of and consider these features in research.
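
    Regression dilution as described in this abstract can be reproduced with a small synthetic experiment. The sketch below is not the authors' simulation: it assumes normally distributed true values and independent normal errors, and all names and parameter values are illustrative. Under these assumptions the ordinary least squares slope is attenuated by the factor var(x) / (var(x) + var(error in x)).

```python
import numpy as np


def ols_slope(x, y):
    """Slope of the ordinary least squares line of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def simulate_dilution(n=20000, true_slope=1.0, x_sd=1.0,
                      x_err_sd=1.0, y_err_sd=0.5, seed=0):
    """Return (fitted slope, theoretically attenuated slope).

    The predictor is observed with random error, which biases the
    fitted slope towards zero even though the true relation is exact.
    """
    rng = np.random.default_rng(seed)
    x_true = rng.normal(0.0, x_sd, n)
    y_obs = true_slope * x_true + rng.normal(0.0, y_err_sd, n)
    x_obs = x_true + rng.normal(0.0, x_err_sd, n)  # uncertain predictor
    fitted = ols_slope(x_obs, y_obs)
    expected = true_slope * x_sd ** 2 / (x_sd ** 2 + x_err_sd ** 2)
    return fitted, expected


if __name__ == "__main__":
    fitted, expected = simulate_dilution()
    print(f"fitted slope {fitted:.3f}, expected attenuated slope {expected:.3f}")
```

    With equal variance in the true predictor and in its error, the fitted slope comes out near half the true slope, which could be mistaken for a genuine disagreement between the two data sets.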

  18. Predicting the risk for colorectal cancer with personal characteristics and fecal immunochemical test.

    PubMed

    Li, Wen; Zhao, Li-Zhong; Ma, Dong-Wang; Wang, De-Zheng; Shi, Lei; Wang, Hong-Lei; Dong, Mo; Zhang, Shu-Yi; Cao, Lei; Zhang, Wei-Hua; Zhang, Xi-Peng; Zhang, Qing-Huai; Yu, Lin; Qin, Hai; Wang, Xi-Mo; Chen, Sam Li-Sheng

    2018-05-01

    We aimed to predict colorectal cancer (CRC) based on the demographic features and clinical correlates of personal symptoms and signs from Tianjin community-based CRC screening data. A total of 891,199 residents who were aged 60 to 74 and were screened in 2012 were enrolled. The Lasso logistic regression model was used to identify the predictors for CRC. Predictive validity was assessed by the receiver operating characteristic (ROC) curve. A bootstrapping method was also used to validate this prediction model. CRC was best predicted by a model that included age, sex, education level, occupations, diarrhea, constipation, colon mucosa and bleeding, gallbladder disease, a stressful life event, family history of CRC, and a positive fecal immunochemical test (FIT). The area under curve (AUC) for the questionnaire with a FIT was 84% (95% CI: 82%-86%), followed by 76% (95% CI: 74%-79%) for a FIT alone, and 73% (95% CI: 71%-76%) for the questionnaire alone. With 500 bootstrap replications, the estimated optimism (<0.005) shows good discrimination in validation of the prediction model. A risk prediction model for CRC based on a series of symptoms and signs related to enteric diseases in combination with a FIT was developed from the first round of screening. The results of the current study are useful for increasing the awareness of high-risk subjects and for individual-risk-guided invitations or strategies to achieve mass screening for CRC.

  19. Misspecification in Latent Change Score Models: Consequences for Parameter Estimation, Model Evaluation, and Predicting Change.

    PubMed

    Clark, D Angus; Nuttall, Amy K; Bowles, Ryan P

    2018-01-01

    Latent change score models (LCS) are conceptually powerful tools for analyzing longitudinal data (McArdle & Hamagami, 2001). However, applications of these models typically include constraints on key parameters over time. Although practically useful, strict invariance over time in these parameters is unlikely in real data. This study investigates the robustness of LCS when invariance over time is incorrectly imposed on key change-related parameters. Monte Carlo simulation methods were used to explore the impact of misspecification on parameter estimation, predicted trajectories of change, and model fit in the dual change score model, the foundational LCS. When constraints were incorrectly applied, several parameters, most notably the slope (i.e., constant change) factor mean and autoproportion coefficient, were severely and consistently biased, as were regression paths to the slope factor when external predictors of change were included. Standard fit indices indicated that the misspecified models fit well, partly because mean level trajectories over time were accurately captured. Loosening constraint improved the accuracy of parameter estimates, but estimates were more unstable, and models frequently failed to converge. Results suggest that potentially common sources of misspecification in LCS can produce distorted impressions of developmental processes, and that identifying and rectifying the situation is a challenge.

  20. Effect Size Measure and Analysis of Single Subject Designs

    ERIC Educational Resources Information Center

    Society for Research on Educational Effectiveness, 2013

    2013-01-01

    One of the vexing problems in the analysis of SSD is in the assessment of the effect of intervention. Serial dependence notwithstanding, the linear model approach that has been advanced involves, in general, the fitting of regression lines (or curves) to the set of observations within each phase of the design and comparing the parameters of these…

  1. Shape selection in Landsat time series: A tool for monitoring forest dynamics

    Treesearch

    Gretchen G. Moisen; Mary C. Meyer; Todd A. Schroeder; Xiyue Liao; Karen G. Schleeweis; Elizabeth A. Freeman; Chris Toney

    2016-01-01

    We present a new methodology for fitting nonparametric shape-restricted regression splines to time series of Landsat imagery for the purpose of modeling, mapping, and monitoring annual forest disturbance dynamics over nearly three decades. For each pixel and spectral band or index of choice in temporal Landsat data, our method delivers a smoothed rendition of...

  2. Annual Tree Growth Predictions From Periodic Measurements

    Treesearch

    Quang V. Cao

    2004-01-01

    Data from annual measurements of a loblolly pine (Pinus taeda L.) plantation were available for this study. Regression techniques were employed to model annual changes of individual trees in terms of diameters, heights, and survival probabilities. Subsets of the data that include measurements every 2, 3, 4, 5, and 6 years were used to fit the same...

  3. Radioecological modelling of Polonium-210 and Caesium-137 in lichen-reindeer-man and top predators.

    PubMed

    Persson, Bertil R R; Gjelsvik, Runhild; Holm, Elis

    2018-06-01

    This work deals with analysis and modelling of the radionuclides ²¹⁰Pb and ²¹⁰Po in the food-chain lichen-reindeer-man in addition to ²¹⁰Po and ¹³⁷Cs in top predators. By using the methods of Partial Least Square Regression (PLSR) the atmospheric deposition of ²¹⁰Pb and ²¹⁰Po is predicted at the sample locations. Dynamic modelling of the activity concentration with differential equations is fitted to the sample data. Reindeer lichen consumption, gastrointestinal absorption, organ distribution and elimination is derived from information in the literature. Dynamic modelling of transfer of ²¹⁰Pb and ²¹⁰Po to reindeer meat, liver and bone from lichen consumption, fitted well with data from Sweden and Finland from 1966 to 1971. The activity concentration of ²¹⁰Pb in the skeleton in man is modelled by using the results of studying the kinetics of lead in skeleton and blood in lead-workers after end of occupational exposure. The result of modelling ²¹⁰Pb and ²¹⁰Po activity in skeleton matched well with concentrations of ²¹⁰Pb and ²¹⁰Po in teeth from reindeer-breeders and autopsy bone samples in Finland. The results of ²¹⁰Po and ¹³⁷Cs in different tissues of wolf, wolverine and lynx previously published, are analysed with multivariate data processing methods such as Principal Component Analysis (PCA), and modelled with the method of Projection to Latent Structures (PLS), or Partial Least Square Regression (PLSR). Copyright © 2017 Elsevier Ltd. All rights reserved.

  4. A comparison of methods to handle skew distributed cost variables in the analysis of the resource consumption in schizophrenia treatment.

    PubMed

    Kilian, Reinhold; Matschinger, Herbert; Löeffler, Walter; Roick, Christiane; Angermeyer, Matthias C

    2002-03-01

    Transformation of the dependent cost variable is often used to solve the problems of heteroscedasticity and skewness in linear ordinary least square regression of health service cost data. However, transformation may cause difficulties in the interpretation of regression coefficients and the retransformation of predicted values. The study compares the advantages and disadvantages of different methods to estimate regression based cost functions using data on the annual costs of schizophrenia treatment. Annual costs of psychiatric service use and clinical and socio-demographic characteristics of the patients were assessed for a sample of 254 patients with a diagnosis of schizophrenia (ICD-10 F 20.0) living in Leipzig. The clinical characteristics of the participants were assessed by means of the BPRS 4.0, the GAF, and the CAN for service needs. Quality of life was measured by WHOQOL-BREF. A linear OLS regression model with non-parametric standard errors, a log-transformed OLS model and a generalized linear model with a log-link and a gamma distribution were used to estimate service costs. For the estimation of robust non-parametric standard errors, the variance estimator by White and a bootstrap estimator based on 2000 replications were employed. Models were evaluated by the comparison of the R2 and the root mean squared error (RMSE). RMSE of the log-transformed OLS model was computed with three different methods of bias-correction. The 95% confidence intervals for the differences between the RMSE were computed by means of bootstrapping. A split-sample-cross-validation procedure was used to forecast the costs for the one half of the sample on the basis of a regression equation computed for the other half of the sample. All three methods showed significant positive influences of psychiatric symptoms and met psychiatric service needs on service costs. 
Only the log-transformed OLS model showed a significant negative impact of age, and only the GLM showed significant negative influences of employment status and partnership on costs. All three models provided an R2 of about 0.31. The residuals of the linear OLS model revealed significant deviations from normality and homoscedasticity. The residuals of the log-transformed model were normally distributed but still heteroscedastic. The linear OLS model provided the lowest prediction error and the best forecast of the dependent cost variable. The log-transformed model provided the lowest RMSE if the heteroscedastic bias correction was used. The RMSE of the GLM with a log link and a gamma distribution was higher than those of the linear OLS model and the log-transformed OLS model. The difference between the RMSE of the linear OLS model and that of the log-transformed OLS model without bias correction was significant at the 95% level. As a result of the cross-validation procedure, the linear OLS model provided the lowest RMSE, followed by the log-transformed OLS model with a heteroscedastic bias correction. The GLM showed the weakest model fit again. None of the differences between the RMSE resulting from the cross-validation procedure were found to be significant. The comparison of the fit indices of the different regression models revealed that the linear OLS model provided a better fit than the log-transformed model and the GLM, but the differences between the models' RMSE were not significant. Due to the small number of cases in the study, the lack of significance does not sufficiently prove that the differences between the RMSE for the different models are zero, and the superiority of the linear OLS model cannot be generalized. The lack of significant differences among the alternative estimators may reflect a lack of sample size adequate to detect important differences among the estimators employed. 
Further studies with larger case numbers are necessary to confirm the results. Specification of an adequate regression model requires a careful examination of the characteristics of the data. Estimation of standard errors and confidence intervals by nonparametric methods that are robust against deviations from the normal distribution and the homoscedasticity of residuals is a suitable alternative to the transformation of the skew-distributed dependent variable. Further studies with more adequate case numbers are needed to confirm the results.
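
    The retransformation problem mentioned in this abstract can be illustrated with a short sketch. Exponentiating predictions from a log-transformed OLS model underestimates mean costs, and one widely used correction is Duan's smearing estimator, which multiplies the naive prediction by the mean of the exponentiated residuals. The abstract does not name the three bias corrections the authors compared, so the choice of smearing here is an assumption, as are the function names and the synthetic data in the usage example.

```python
import numpy as np


def fit_log_ols(X, cost):
    """Fit OLS of log(cost) on X (with intercept).

    Returns the coefficient vector and Duan's smearing factor,
    i.e. the mean of exp(residual) on the log scale.
    """
    X1 = np.column_stack([np.ones(len(X)), X])
    log_cost = np.log(cost)
    beta, *_ = np.linalg.lstsq(X1, log_cost, rcond=None)
    resid = log_cost - X1 @ beta
    smear = float(np.mean(np.exp(resid)))
    return beta, smear


def predict_cost(X, beta, smear=1.0):
    """Predict costs on the original scale; smear=1.0 is the naive retransformation."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return np.exp(X1 @ beta) * smear


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(0.0, 1.0, 5000)  # hypothetical predictor
    cost = np.exp(1.0 + 0.5 * x + rng.normal(0.0, 0.8, 5000))  # skewed synthetic costs
    beta, smear = fit_log_ols(x, cost)
    print("observed mean cost:", cost.mean())
    print("naive prediction mean:", predict_cost(x, beta).mean())
    print("smeared prediction mean:", predict_cost(x, beta, smear).mean())
```

    The smearing factor assumes homoscedastic residuals on the log scale; the abstract reports that the log-model residuals were still heteroscedastic, which is why a heteroscedasticity-aware correction mattered in that study.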

  5. Many-level multilevel structural equation modeling: An efficient evaluation strategy.

    PubMed

    Pritikin, Joshua N; Hunter, Michael D; von Oertzen, Timo; Brick, Timothy R; Boker, Steven M

    2017-01-01

    Structural equation models are increasingly used for clustered or multilevel data in cases where mixed regression is too inflexible. However, when there are many levels of nesting, these models can become difficult to estimate. We introduce a novel evaluation strategy, Rampart, that applies an orthogonal rotation to the parts of a model that conform to commonly met requirements. This rotation dramatically simplifies fit evaluation in a way that becomes more potent as the size of the data set increases. We validate and evaluate the implementation using a 3-level latent regression simulation study. Then we analyze data from a state-wide child behavioral health measure administered by the Oklahoma Department of Human Services. We demonstrate the efficiency of Rampart compared to other similar software using a latent factor model with a 5-level decomposition of latent variance. Rampart is implemented in OpenMx, a free and open source software.

  6. Multivariate Boosting for Integrative Analysis of High-Dimensional Cancer Genomic Data

    PubMed Central

    Xiong, Lie; Kuan, Pei-Fen; Tian, Jianan; Keles, Sunduz; Wang, Sijian

    2015-01-01

    In this paper, we propose a novel multivariate component-wise boosting method for fitting multivariate response regression models under the high-dimension, low sample size setting. Our method is motivated by modeling the association among different biological molecules based on multiple types of high-dimensional genomic data. Particularly, we are interested in two applications: studying the influence of DNA copy number alterations on RNA transcript levels and investigating the association between DNA methylation and gene expression. For this purpose, we model the dependence of the RNA expression levels on DNA copy number alterations and the dependence of gene expression on DNA methylation through multivariate regression models and utilize boosting-type method to handle the high dimensionality as well as model the possible nonlinear associations. The performance of the proposed method is demonstrated through simulation studies. Finally, our multivariate boosting method is applied to two breast cancer studies. PMID:26609213

  7. Numerical scoring for the Classic BILAG index.

    PubMed

    Cresswell, Lynne; Yee, Chee-Seng; Farewell, Vernon; Rahman, Anisur; Teh, Lee-Suan; Griffiths, Bridget; Bruce, Ian N; Ahmad, Yasmeen; Prabu, Athiveeraramapandian; Akil, Mohammed; McHugh, Neil; Toescu, Veronica; D'Cruz, David; Khamashta, Munther A; Maddison, Peter; Isenberg, David A; Gordon, Caroline

    2009-12-01

    To develop an additive numerical scoring scheme for the Classic BILAG index. SLE patients were recruited into this multi-centre cross-sectional study. At every assessment, data were collected on disease activity and therapy. Logistic regression was used to model an increase in therapy, as an indicator of active disease, by the Classic BILAG score in eight systems. As both indicate inactivity, scores of D and E were set to 0 and used as the baseline in the fitted model. The coefficients from the fitted model were used to determine the numerical values for Grades A, B and C. Different scoring schemes were then compared using receiver operating characteristic (ROC) curves. Validation analysis was performed using assessments from a single centre. There were 1510 assessments from 369 SLE patients. The currently used coding scheme (A = 9, B = 3, C = 1 and D/E = 0) did not fit the data well. The regression model suggested three possible numerical scoring schemes: (i) A = 11, B = 6, C = 1 and D/E = 0; (ii) A = 12, B = 6, C = 1 and D/E = 0; and (iii) A = 11, B = 7, C = 1 and D/E = 0. These schemes produced comparable ROC curves. Based on this, A = 12, B = 6, C = 1 and D/E = 0 seemed a reasonable and practical choice. The validation analysis suggested that although the A = 12, B = 6, C = 1 and D/E = 0 coding is still reasonable, a scheme with slightly less weighting for B, such as A = 12, B = 5, C = 1 and D/E = 0, may be more appropriate. A reasonable additive numerical scoring scheme based on treatment decision for the Classic BILAG index is A = 12, B = 5, C = 1, D = 0 and E = 0.
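
    The additive scheme recommended in this abstract (A = 12, B = 5, C = 1, D = 0 and E = 0) can be applied with a few lines of code. The sketch below is only an illustration of the arithmetic: the weights come from the abstract, while the function name and the input format (one grade letter per system) are assumptions.

```python
# Weights from the abstract's recommended scheme for the Classic BILAG index.
BILAG_WEIGHTS = {"A": 12, "B": 5, "C": 1, "D": 0, "E": 0}


def bilag_numeric_score(grades):
    """Sum the per-system weights for a sequence of Classic BILAG grades.

    `grades` holds one letter (A-E) per assessed system; the name and
    input format are hypothetical, chosen for this example.
    """
    total = 0
    for grade in grades:
        key = grade.strip().upper()
        if key not in BILAG_WEIGHTS:
            raise ValueError(f"unknown BILAG grade: {grade!r}")
        total += BILAG_WEIGHTS[key]
    return total


if __name__ == "__main__":
    # Eight systems: one A, one B, one C, the rest inactive (D or E).
    print(bilag_numeric_score(["A", "B", "C", "D", "E", "D", "D", "E"]))
```

    For the eight grades in the usage example the score is 12 + 5 + 1 = 18, since D and E both contribute zero.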

  8. Antiretroviral drug diversion links social vulnerability to poor medication adherence in substance abusing populations.

    PubMed

    Tsuyuki, Kiyomi; Surratt, Hilary L

    2015-05-01

    Antiretroviral (ARV) medication diversion to the illicit market has been documented in South Florida, and linked to sub-optimal adherence in people living with HIV. ARV diversion reflects an unmet need for care in vulnerable populations that have difficulty engaging in consistent HIV care due to competing needs and co-morbidities. This study applies the Gelberg-Andersen behavioral model of health care utilization for vulnerable populations to understand how social vulnerability is linked to ARV diversion and adherence. Cross-sectional data were collected from a targeted sample of vulnerable people living with HIV in South Florida between 2010 and 2012 (n = 503). Structured interviews collected quantitative data on ARV diversion, access and utilization of care, and ARV adherence. Logistic regression was used to estimate the goodness-of-fit of additive models that test domain fit. Linear regression was used to estimate the effects of social vulnerability and ARV diversion on ARV adherence. The best fitting model to predict ARV diversion identifies having a low monthly income and unstable HIV care as salient enabling factors that promote ARV diversion. Importantly, health care need factors did not protect against ARV diversion, evidence that immediate competing needs are prioritized even in the face of poor health for this sample. We also find that ARV diversion provides a link between social vulnerability and sub-optimal ARV adherence, with ARV diversion and domains from the Behavioral Model explaining 25 % of the variation in ARV adherence. Our analyses reveal great need to improve engagement in HIV care for vulnerable populations by strengthening enabling factors (e.g. patient-provider relationship) to improve retention in HIV care and ARV adherence for vulnerable populations.

  9. Numerical scoring for the Classic BILAG index

    PubMed Central

    Cresswell, Lynne; Yee, Chee-Seng; Farewell, Vernon; Rahman, Anisur; Teh, Lee-Suan; Griffiths, Bridget; Bruce, Ian N.; Ahmad, Yasmeen; Prabu, Athiveeraramapandian; Akil, Mohammed; McHugh, Neil; Toescu, Veronica; D’Cruz, David; Khamashta, Munther A.; Maddison, Peter; Isenberg, David A.

    2009-01-01

    Objective. To develop an additive numerical scoring scheme for the Classic BILAG index. Methods. SLE patients were recruited into this multi-centre cross-sectional study. At every assessment, data were collected on disease activity and therapy. Logistic regression was used to model an increase in therapy, as an indicator of active disease, by the Classic BILAG score in eight systems. As both indicate inactivity, scores of D and E were set to 0 and used as the baseline in the fitted model. The coefficients from the fitted model were used to determine the numerical values for Grades A, B and C. Different scoring schemes were then compared using receiver operating characteristic (ROC) curves. Validation analysis was performed using assessments from a single centre. Results. There were 1510 assessments from 369 SLE patients. The currently used coding scheme (A = 9, B = 3, C = 1 and D/E = 0) did not fit the data well. The regression model suggested three possible numerical scoring schemes: (i) A = 11, B = 6, C = 1 and D/E = 0; (ii) A = 12, B = 6, C = 1 and D/E = 0; and (iii) A = 11, B = 7, C = 1 and D/E = 0. These schemes produced comparable ROC curves. Based on this, A = 12, B = 6, C = 1 and D/E = 0 seemed a reasonable and practical choice. The validation analysis suggested that although the A = 12, B = 6, C = 1 and D/E = 0 coding is still reasonable, a scheme with slightly less weighting for B, such as A = 12, B = 5, C = 1 and D/E = 0, may be more appropriate. Conclusions. A reasonable additive numerical scoring scheme based on treatment decision for the Classic BILAG index is A = 12, B = 5, C = 1, D = 0 and E = 0. PMID:19779027

  10. Landslide susceptibility mapping using decision-tree based CHi-squared automatic interaction detection (CHAID) and Logistic regression (LR) integration

    NASA Astrophysics Data System (ADS)

    Althuwaynee, Omar F.; Pradhan, Biswajeet; Ahmad, Noordin

    2014-06-01

    This article uses a methodology based on chi-squared automatic interaction detection (CHAID), a multivariate method with an automatic classification capacity, to analyse large numbers of landslide conditioning factors. This new algorithm was developed to overcome the subjectivity of the manual categorization of scale data of landslide conditioning factors, and to predict a rainfall-induced susceptibility map in Kuala Lumpur city and surrounding areas using a geographic information system (GIS). The main objective of this article is to use the CHAID method to find the best classification fit for each conditioning factor and then to combine it with logistic regression (LR). The LR model was used to find the corresponding coefficients of the best fitting function that assess the optimal terminal nodes. A cluster pattern of landslide locations was extracted in a previous study using the nearest neighbor index (NNI), which was then used to identify the range of clustered landslide locations. Clustered locations were used as model training data with 14 landslide conditioning factors such as topographically derived parameters, lithology, NDVI, and land use and land cover maps. The Pearson chi-squared value was used to find the best classification fit between the dependent variable and the conditioning factors. Finally, the relationships between conditioning factors were assessed and the landslide susceptibility map (LSM) was produced. The area under the curve (AUC) was used to test the model reliability and prediction capability with the training and validation landslide locations, respectively. This study proved the efficiency and reliability of the decision tree (DT) model in landslide susceptibility mapping. It also provided a valuable scientific basis for spatial decision making in planning and urban management studies.

  11. [ASSOCIATION BETWEEN HEALTH RELATED QUALITY OF LIFE, BODYWEIGHT STATUS (BMI) AND PHYSICAL ACTIVITY AND FITNESS LEVELS IN CHILEAN ADOLESCENTS].

    PubMed

    García-Rubio, Javier; Olivares, Pedro R; Lopez-Legarrea, Patricia; Gómez-Campos, Rossana; Cossio-Bolaños, Marco A; Merellano-Navarro, Eugenio

    2015-10-01

    The objective of this study was to analyze the potential relationships between Health Related Quality of Life (HRQoL) and weight status, physical activity (PA) and fitness in Chilean adolescents in both independent and combined analyses. A sample of 767 participants (47.5% females) aged between 12 and 18 (mean age 15.5) was employed. All measurements were carried out using self-reported instruments; Kidscreen-10, iPAQ and IFIS were used to assess HRQoL, PA and fitness, respectively. One-factor ANOVA and linear regression models were applied to analyze associations between HRQoL, weight status, PA and fitness using age and sex as confounders. Body mass index, level of PA and fitness were independently associated with HRQoL in Chilean adolescents. However, the combined analysis of these associations, adjusted for sex and age, showed that only fitness was significantly related to HRQoL. General fitness is associated with HRQoL independently of sex, age, bodyweight status and level of PA. The relationships of nutritional status and weekly PA with HRQoL are mediated by sex, age and general fitness. Copyright AULA MEDICA EDICIONES 2014. Published by AULA MEDICA. All rights reserved.

  12. Relationships between training load, injury, and fitness in sub-elite collision sport athletes.

    PubMed

    Gabbett, Tim J; Domrow, Nathan

    2007-11-01

    The purpose of this study was to develop statistical models that estimate the influence of training load on training injury and physical fitness in collision sport athletes. The incidence of training injuries was studied in 183 rugby league players over two competitive seasons. Participants were assessed for height, body mass, skinfold thickness, vertical jump, 10-m, 20-m and 40-m sprint time, agility, and estimated maximal aerobic power in the off-season, pre-season, mid-season, and end-season. Training load and injury data were summarised into pre-season, early-competition, and late-competition training phases. Individual training load, fitness, and injury data were modelled using a logistic regression model with a binomial distribution and logit link function, while team training load and injury data were modelled using a linear regression model. While physical fitness improved with training, there was no association (P=0.16-0.99) between training load and changes in physical fitness during any of the training phases. However, increases in training load during the early-competition training phase decreased (P= 0.04) agility performance. A relationship (P= 0.01-0.04) was observed between the log of training load and odds of injury during each training phase, resulting in a 1.50 - 2.85 increase in the odds of injury for each arbitrary unit increase in training load. Furthermore, during the pre-season training phase there was a relationship (P= 0.01) between training load and injury incidence within the training load range of 155 and 590 arbitrary units. During the early and late-competition training phases, increases in training load of 175-620 arbitrary units and 145-410 arbitrary units, respectively, resulted in no further increase in injury incidence. These findings demonstrate that increases in training load, particularly during the pre-season training phase, increase the odds of injury in collision sport athletes. 
However, while increases in training load from 175 to 620 arbitrary units during the early-competition training phase result in no further increase in injury incidence, marked reductions in agility performances can occur. These findings suggest that reductions in training load during the early-competition training phase can reduce the odds of injury without compromising agility performances in collision sport athletes.

  13. How to address data gaps in life cycle inventories: a case study on estimating CO2 emissions from coal-fired electricity plants on a global scale.

    PubMed

    Steinmann, Zoran J N; Venkatesh, Aranya; Hauck, Mara; Schipper, Aafke M; Karuppiah, Ramkumar; Laurenzi, Ian J; Huijbregts, Mark A J

    2014-05-06

    One of the major challenges in life cycle assessment (LCA) is the availability and quality of data used to develop models and to make appropriate recommendations. Approximations and assumptions are often made if appropriate data are not readily available. However, these proxies may introduce uncertainty into the results. A regression model framework may be employed to assess missing data in LCAs of products and processes. In this study, we develop such a regression-based framework to estimate CO2 emission factors associated with coal power plants in the absence of reported data. Our framework hypothesizes that emissions from coal power plants can be explained by plant-specific factors (predictors) that include steam pressure, total capacity, plant age, fuel type, and gross domestic product (GDP) per capita of the resident nations of those plants. Using reported emission data for 444 plants worldwide, plant level CO2 emission factors were fitted to the selected predictors by a multiple linear regression model and a local linear regression model. The validated models were then applied to 764 coal power plants worldwide, for which no reported data were available. Cumulatively, available reported data and our predictions together account for 74% of the total world's coal-fired power generation capacity.

  14. A kinetic energy model of two-vehicle crash injury severity.

    PubMed

    Sobhani, Amir; Young, William; Logan, David; Bahrololoom, Sareh

    2011-05-01

    An important part of any model of vehicle crashes is the development of a procedure to estimate crash injury severity. After reviewing existing models of crash severity, this paper outlines the development of a modelling approach aimed at measuring the injury severity of people in two-vehicle road crashes. This model can be incorporated into a discrete event traffic simulation model, using simulation model outputs as its input. The model can then serve as an integral part of a simulation model estimating the crash potential of components of the traffic system. The model is developed using Newtonian Mechanics and Generalised Linear Regression. The factors contributing to the speed change (ΔV(s)) of a subject vehicle are identified using the law of conservation of momentum. A Log-Gamma regression model is fitted to measure speed change (ΔV(s)) of the subject vehicle based on the identified crash characteristics. The kinetic energy applied to the subject vehicle is calculated by the model, which in turn uses a Log-Gamma Regression Model to estimate the Injury Severity Score of the crash from the calculated kinetic energy, crash impact type, presence of airbag and/or seat belt and occupant age. Copyright © 2010 Elsevier Ltd. All rights reserved.
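
The momentum step described above can be illustrated with a minimal sketch. It assumes a collinear, perfectly inelastic collision, which the abstract does not state, and the function names and vehicle values are hypothetical. It is not the authors' Log-Gamma model, only the underlying mechanics.

```python
def delta_v_inelastic(m_subject, v_subject, m_other, v_other):
    """Speed change of the subject vehicle in a 1-D perfectly inelastic crash."""
    # Conservation of momentum: both vehicles share one velocity after impact.
    v_common = (m_subject * v_subject + m_other * v_other) / (m_subject + m_other)
    return v_common - v_subject


def kinetic_energy(mass, speed):
    """Kinetic energy in joules for mass in kg and speed in m/s."""
    return 0.5 * mass * speed ** 2


# Illustrative values only: a 1200 kg car at 15 m/s hits a stationary 1800 kg car.
dv = delta_v_inelastic(1200.0, 15.0, 1800.0, 0.0)
# Energy dissipated in the crash = kinetic energy before minus after impact.
energy_lost = (kinetic_energy(1200.0, 15.0)
               - kinetic_energy(1200.0 + 1800.0, 15.0 + dv))
```

In a regression setting, quantities such as `dv` and `energy_lost` would serve as predictors or responses; the link function and covariates are left to the fitted model.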

  15. Predicting the aquatic toxicity mode of action using logistic regression and linear discriminant analysis.

    PubMed

    Ren, Y Y; Zhou, L C; Yang, L; Liu, P Y; Zhao, B W; Liu, H X

    2016-09-01

    The paper highlights the use of the logistic regression (LR) method in the construction of acceptable, statistically significant, robust and predictive models for the classification of chemicals according to their aquatic toxic modes of action. The essentials of a reliable model were all considered carefully. The model predictors were selected by stepwise forward linear discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated by the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The obtained models are simple and easy to interpret. In general, LR performs much better than LDA and seems to be more attractive for the prediction of the more toxic compounds, i.e. compounds that exhibit excess toxicity versus non-polar narcotic compounds and more reactive compounds versus less reactive compounds. In addition, model fit and regression diagnostics were assessed with an influence plot, which reflects the hat-values, studentized residuals, and Cook's distance of each sample. Overdispersion was also checked for the LR model. The relationships between the descriptors and the aquatic toxic behaviour of compounds are also discussed.

  16. Physical fitness reference standards in fibromyalgia: The al-Ándalus project.

    PubMed

    Álvarez-Gallardo, I C; Carbonell-Baeza, A; Segura-Jiménez, V; Soriano-Maldonado, A; Intemann, T; Aparicio, V A; Estévez-López, F; Camiletti-Moirón, D; Herrador-Colmenero, M; Ruiz, J R; Delgado-Fernández, M; Ortega, F B

    2017-11-01

    We aimed (1) to report age-specific physical fitness levels in people with fibromyalgia of a representative sample from Andalusia; and (2) to compare the fitness levels of people with fibromyalgia with non-fibromyalgia controls. This cross-sectional study included 468 (21 men) patients with fibromyalgia and 360 (55 men) controls. The fibromyalgia sample was geographically representative of southern Spain. Physical fitness was assessed with the Senior Fitness Test battery plus the handgrip test. We applied the Generalized Additive Model for Location, Scale and Shape to calculate percentile curves for women and fitted mean curves using a linear regression for men. Our results show that people with fibromyalgia performed worse than controls in all fitness tests (P < 0.001) and in all age ranges (P < 0.001). This study provides a comprehensive description of age-specific physical fitness levels among patients with fibromyalgia and controls in a large sample of patients with fibromyalgia from southern Spain. Physical fitness levels of people with fibromyalgia from Andalusia are very low in comparison with age-matched healthy controls. This information could be useful to correctly interpret physical fitness assessments and to help health care providers identify individuals at risk of losing physical independence. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  17. Simplified estimation of age-specific reference intervals for skewed data.

    PubMed

    Wright, E M; Royston, P

    1997-12-30

    Age-specific reference intervals are commonly used in medical screening and clinical practice, where interest lies in the detection of extreme values. Many different statistical approaches have been published on this topic. The advantages of parametric methods are that they necessarily produce smooth centile curves, that the entire density is estimated, and that an explicit formula is available for the centiles. The method proposed here is a simplified version of a recent approach proposed by Royston and Wright. Basic transformations of the data and multiple regression techniques are combined to model the mean, standard deviation and skewness. Using these simple tools, which are implemented in almost all statistical computer packages, age-specific reference intervals may be obtained. The scope of the method is illustrated by fitting models to several real data sets and assessing each model using goodness-of-fit techniques.

  18. Differential gene expression detection and sample classification using penalized linear regression models.

    PubMed

    Wu, Baolin

    2006-02-15

    Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple and intuitive, and have proved useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using L1-penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discuss the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using L1-penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
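
The connection between a penalty and a shrunken centroid can be seen in the simplest one-parameter case. The sketch below assumes the penalty in question is the L1 (lasso) penalty, for which minimizing 0.5 * (d - b)^2 + lam * |b| over b has the closed-form soft-thresholding solution. The gene differences and the value of `lam` are made up for illustration and do not come from the paper.

```python
def soft_threshold(value, lam):
    """Minimizer of 0.5 * (value - b) ** 2 + lam * abs(b) over b."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    # Small effects are set exactly to zero, which removes the gene.
    return 0.0


# Hypothetical standardized class-mean differences for five genes.
differences = [2.5, -0.3, 0.8, -1.9, 0.1]
shrunken = [soft_threshold(d, 1.0) for d in differences]

# Indices of genes that survive shrinkage with a nonzero effect.
selected = [i for i, b in enumerate(shrunken) if b != 0.0]
```

With these values only the first and fourth genes keep a nonzero effect, which is how shrinkage performs gene selection and noise reduction in one step.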

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Walsh, Seán, E-mail: walshsharp@gmail.com; Department of Oncology, Gray Institute for Radiation Oncology and Biology, University of Oxford, Oxford OX3 7DQ; Roelofs, Erik

    Purpose: A fully heterogeneous population averaged mechanistic tumor control probability (TCP) model is appropriate for the analysis of external beam radiotherapy (EBRT). This has been accomplished for EBRT photon treatment of intermediate-risk prostate cancer. Extending the TCP model for low and high-risk patients would be beneficial in terms of overall decision making. Furthermore, different radiation treatment modalities such as protons and carbon-ions are becoming increasingly available. Consequently, there is a need for a complete TCP model. Methods: A TCP model was fitted and validated to a primary endpoint of 5-year biological no evidence of disease clinical outcome data obtained from a review of the literature for low, intermediate, and high-risk prostate cancer patients (5218 patients fitted, 1088 patients validated), treated by photons, protons, or carbon-ions. The review followed the preferred reporting item for systematic reviews and meta-analyses statement. Treatment regimens include standard fractionation and hypofractionation treatments. Residual analysis and goodness of fit statistics were applied. Results: The TCP model achieves a good level of fit overall: linear regression results in a p-value of <0.00001 with an adjusted-weighted-R² value of 0.77 and a weighted root mean squared error (wRMSE) of 1.2%, to the fitted clinical outcome data. Validation of the model utilizing three independent datasets obtained from the literature resulted in an adjusted-weighted-R² value of 0.78 and a wRMSE of less than 1.8%, to the validation clinical outcome data. The weighted mean absolute residual across the entire dataset is found to be 5.4%. Conclusions: This TCP model, fitted and validated to clinical outcome data, appears to be an appropriate model for the inclusion of all clinical prostate cancer risk categories, and allows evaluation of current EBRT modalities with regard to tumor control prediction.

  20. Influence of physical fitness on cardio-metabolic risk factors in European children. The IDEFICS study.

    PubMed

    Zaqout, M; Michels, N; Bammann, K; Ahrens, W; Sprengeler, O; Molnar, D; Hadjigeorgiou, C; Eiben, G; Konstabel, K; Russo, P; Jiménez-Pavón, D; Moreno, L A; De Henauw, S

    2016-07-01

    The aim of the study was to assess the associations of individual and combined physical fitness components with single and clustering of cardio-metabolic risk factors in children. This 2-year longitudinal study included a total of 1635 European children aged 6-11 years. The test battery included cardio-respiratory fitness (20-m shuttle run test), upper-limb strength (handgrip test), lower-limb strength (standing long jump test), balance (flamingo test), flexibility (back-saver sit-and-reach) and speed (40-m sprint test). Metabolic risk was assessed through z-score standardization using four components: waist circumference, blood pressure (systolic and diastolic), blood lipids (triglycerides and high-density lipoprotein) and insulin resistance (homeostasis model assessment). Mixed model regression analyses were adjusted for sex, age, parental education, sugar and fat intake, and body mass index. Physical fitness was inversely associated with clustered metabolic risk (P<0.001). All coefficients showed a higher clustered metabolic risk with lower physical fitness, except for upper-limb strength (β=0.057; P=0.002) where the opposite association was found. Cardio-respiratory fitness (β=-0.124; P<0.001) and lower-limb strength (β=-0.076; P=0.002) were the most important longitudinal determinants. The effects of cardio-respiratory fitness were even independent of the amount of vigorous-to-moderate activity (β=-0.059; P=0.029). Among all the metabolic risk components, blood pressure seemed not well predicted by physical fitness, while waist circumference, blood lipids and insulin resistance all seemed significantly predicted by physical fitness. Poor physical fitness in children is associated with the development of cardio-metabolic risk factors. Based on our results, this risk might be modified by improving mainly cardio-respiratory fitness and lower-limb muscular strength.

  1. Aggregating the response in time series regression models, applied to weather-related cardiovascular mortality.

    PubMed

    Masselot, Pierre; Chebana, Fateh; Bélanger, Diane; St-Hilaire, André; Abdous, Belkacem; Gosselin, Pierre; Ouarda, Taha B M J

    2018-07-01

    In environmental epidemiology studies, health response data (e.g. hospitalization or mortality) are often noisy because of hospital organization and other social factors. The noise in the data can hide the true signal related to the exposure. The signal can be unveiled by performing a temporal aggregation on health data and then using it as the response in regression analysis. From aggregated series, a general methodology is introduced to account for the particularities of an aggregated response in a regression setting. This methodology can be used with usually applied regression models in weather-related health studies, such as generalized additive models (GAM) and distributed lag nonlinear models (DLNM). In particular, the residuals are modelled using an autoregressive-moving average (ARMA) model to account for the temporal dependence. The proposed methodology is illustrated by modelling the influence of temperature on cardiovascular mortality in Canada. A comparison with classical DLNMs is provided and several aggregation methods are compared. Results show that there is an increase in the fit quality when the response is aggregated, and that the estimated relationship focuses more on the outcome over several days than the classical DLNM. More precisely, among various investigated aggregation schemes, it was found that an aggregation with an asymmetric Epanechnikov kernel is more suited for studying the temperature-mortality relationship. Copyright © 2018. Published by Elsevier B.V.
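
A temporal aggregation of the response can be sketched as a weighted sum over the following days. The abstract does not define the asymmetric Epanechnikov kernel, so the one-sided, forward-looking form below is an assumption, as are the function names and the daily counts.

```python
def epanechnikov_weights(width):
    """Normalized one-sided Epanechnikov weights for offsets 0..width-1."""
    # Kernel K(u) = 0.75 * (1 - u^2), evaluated at u = k / width for k >= 0 only.
    raw = [0.75 * (1.0 - (k / width) ** 2) for k in range(width)]
    total = sum(raw)
    return [w / total for w in raw]


def aggregate_forward(series, width):
    """Weighted sum of each value with the following width-1 values."""
    weights = epanechnikov_weights(width)
    n_out = len(series) - width + 1
    return [sum(w * series[t + k] for k, w in enumerate(weights))
            for t in range(n_out)]


# Made-up daily death counts, for illustration only.
daily_deaths = [10, 12, 9, 15, 11, 13, 8]
smoothed = aggregate_forward(daily_deaths, 3)
```

The aggregated series `smoothed` would then replace the raw counts as the response in a GAM or DLNM, with the induced serial dependence handled separately, for example by an ARMA model on the residuals.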

  2. Comparison of Cox's Regression Model and Parametric Models in Evaluating the Prognostic Factors for Survival after Liver Transplantation in Shiraz during 2000-2012.

    PubMed

    Adelian, R; Jamali, J; Zare, N; Ayatollahi, S M T; Pooladfar, G R; Roustaei, N

    2015-01-01

    Identification of the prognostic factors for survival in patients with liver transplantation is challenging. Various methods of survival analysis have provided different, sometimes contradictory, results from the same data. The objective was to compare Cox's regression model with parametric models for determining the independent factors for predicting the survival of adult and pediatric patients after liver transplantation. This study was conducted on 183 pediatric patients and 346 adults who underwent liver transplantation in Namazi Hospital, Shiraz, southern Iran. The study population included all patients undergoing liver transplantation from 2000 to 2012. The prognostic factors sex, age, Child class, initial diagnosis of the liver disease, PELD/MELD score, and pre-operative laboratory markers were selected for survival analysis. Among 529 patients, 346 (65.4%) were adult and 183 (34.6%) were pediatric cases. Overall, the lognormal distribution was the best-fitting model for adult and pediatric patients. Age in adults (HR=1.16, p<0.05) and weight (HR=2.68, p<0.01) and Child class B (HR=2.12, p<0.05) in pediatric patients were the most important factors for prediction of survival after liver transplantation. Adult patients younger than the mean age and pediatric patients weighing above the mean and Child class A (compared to those with classes B or C) had better survival. A parametric regression model is a good alternative to Cox's regression model.

  3. Hierarchical Bayesian spatial models for predicting multiple forest variables using waveform LiDAR, hyperspectral imagery, and large inventory datasets

    USGS Publications Warehouse

    Finley, Andrew O.; Banerjee, Sudipto; Cook, Bruce D.; Bradford, John B.

    2013-01-01

    In this paper we detail a multivariate spatial regression model that couples LiDAR, hyperspectral and forest inventory data to predict forest outcome variables at a high spatial resolution. The proposed model is used to analyze forest inventory data collected on the US Forest Service Penobscot Experimental Forest (PEF), ME, USA. In addition to helping meet the regression model's assumptions, results from the PEF analysis suggest that the addition of multivariate spatial random effects improves model fit and predictive ability, compared with two commonly applied modeling approaches. This improvement results from explicitly modeling the covariation among forest outcome variables and spatial dependence among observations through the random effects. Direct application of such multivariate models to even moderately large datasets is often computationally infeasible because of cubic order matrix algorithms involved in estimation. We apply a spatial dimension reduction technique to help overcome this computational hurdle without sacrificing richness in modeling.

  4. The effect of using genealogy-based haplotypes for genomic prediction

    PubMed Central

    2013-01-01

    Background: Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. Methods: A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. Results: About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Conclusions: Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy. PMID:23496971

  5. The effect of using genealogy-based haplotypes for genomic prediction.

    PubMed

    Edriss, Vahid; Fernando, Rohan L; Su, Guosheng; Lund, Mogens S; Guldbrandtsen, Bernt

    2013-03-06

    Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy.

  6. Application of a Combined Model with Autoregressive Integrated Moving Average (ARIMA) and Generalized Regression Neural Network (GRNN) in Forecasting Hepatitis Incidence in Heng County, China

    PubMed Central

    Liang, Hao; Gao, Lian; Liang, Bingyu; Huang, Jiegang; Zang, Ning; Liao, Yanyan; Yu, Jun; Lai, Jingzhen; Qin, Fengxiang; Su, Jinming; Ye, Li; Chen, Hui

    2016-01-01

    Background: Hepatitis is a serious public health problem with increasing cases and property damage in Heng County. It is necessary to develop a model to predict the hepatitis epidemic that could be useful for preventing this disease. Methods: The autoregressive integrated moving average (ARIMA) model and the generalized regression neural network (GRNN) model were used to fit the incidence data from the Heng County CDC (Center for Disease Control and Prevention) from January 2005 to December 2012. Then, the ARIMA-GRNN hybrid model was developed. The incidence data from January 2013 to December 2013 were used to validate the models. Several parameters, including mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE) and mean square error (MSE), were used to compare the performance among the three models. Results: The morbidity of hepatitis from Jan 2005 to Dec 2012 showed seasonal variation and a slightly rising trend. The ARIMA(0,1,2)(1,1,1)₁₂ model was the most appropriate one, with the residual test showing a white noise sequence. The smoothing factor of the basic GRNN model and the combined model was 1.8 and 0.07, respectively. The four parameters of the hybrid model were lower than those of the two single models in the validation. The parameter values of the GRNN model were the lowest of the three models in the fitting. Conclusions: The hybrid ARIMA-GRNN model showed better hepatitis incidence forecasting in Heng County than the single ARIMA model and the basic GRNN model. It is a potential decision-supportive tool for controlling hepatitis in Heng County. PMID:27258555
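
The four comparison measures named above have standard definitions, sketched here for a single forecast series. The function name and the incidence values are hypothetical and are not taken from the study.

```python
import math


def forecast_errors(observed, predicted):
    """Return MAE, MSE, RMSE and MAPE (in percent) for paired series."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n
    return {
        "MAE": sum(abs(r) for r in residuals) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # MAPE is undefined when any observed value is zero.
        "MAPE": 100.0 * sum(abs(r / o) for r, o in zip(residuals, observed)) / n,
    }


# Made-up monthly incidence values and forecasts, for illustration only.
observed = [20.0, 25.0, 40.0, 50.0]
predicted = [22.0, 24.0, 36.0, 55.0]
errors = forecast_errors(observed, predicted)
```

Computing the same dictionary for each candidate model on the validation period gives a like-for-like comparison; lower values indicate a better forecast on all four measures.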

  7. Fitness cost: a bacteriological explanation for the demise of the first international methicillin-resistant Staphylococcus aureus epidemic.

    PubMed

    Nielsen, Karen L; Pedersen, Thomas M; Udekwu, Klas I; Petersen, Andreas; Skov, Robert L; Hansen, Lars H; Hughes, Diarmaid; Frimodt-Møller, Niels

    2012-06-01

    Denmark and several other countries experienced the first epidemic of methicillin-resistant Staphylococcus aureus (MRSA) during the period 1965-75, which was caused by multiresistant isolates of phage complex 83A. In Denmark these MRSA isolates disappeared almost completely, being replaced by other phage types, predominantly only penicillin resistant. We investigated whether isolates of this epidemic were associated with a fitness cost, and we employed a mathematical model to ask whether these fitness costs could have led to the observed reduction in frequency. Bacteraemia isolates of S. aureus from Denmark have been stored since 1957. We chose 40 S. aureus isolates belonging to phage complex 83A, clonal complex 8 based on spa type, ranging in time of isolation from 1957 to 1980 and with various antibiograms, including both methicillin-resistant and -susceptible isolates. The relative fitness of each isolate was determined in a growth competition assay with a reference isolate. Significant fitness costs of 2%-15% were determined for the MRSA isolates studied. There was a significant negative correlation between number of antibiotic resistances and relative fitness. Multiple regression analysis found significantly independent negative correlations between fitness and the presence of mecA or streptomycin resistance. Mathematical modelling confirmed that fitness costs of the magnitude carried by these isolates could result in the disappearance of MRSA prevalence during a time span similar to that seen in Denmark. We propose a significant fitness cost of resistance as the main bacteriological explanation for the disappearance of the multiresistant complex 83A MRSA in Denmark following a reduction in antibiotic usage.
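
The abstract does not describe the mathematical model that was used. As a rough illustration of how a fitness cost can drive a strain out of a population, the sketch below uses a textbook two-strain competition model in which the odds of the resistant strain shrink by a factor (1 - cost) per generation. The starting fraction, threshold and function names are assumptions.

```python
def resistant_fraction(p0, cost, generations):
    """Fraction of the resistant strain after repeated competition."""
    fitness = 1.0 - cost  # relative fitness of the resistant strain
    p = p0
    for _ in range(generations):
        # Standard haploid selection update against a reference with fitness 1.
        p = p * fitness / (p * fitness + (1.0 - p))
    return p


def generations_until(p0, cost, threshold):
    """Generations needed for the resistant fraction to fall below threshold."""
    if not (0.0 < p0 < 1.0 and 0.0 < cost < 1.0 and threshold > 0.0):
        raise ValueError("need 0 < p0 < 1, 0 < cost < 1 and threshold > 0")
    fitness = 1.0 - cost
    p, count = p0, 0
    while p >= threshold:
        p = p * fitness / (p * fitness + (1.0 - p))
        count += 1
    return count


# Compare the two ends of the reported 2%-15% cost range (illustrative start).
slow = generations_until(0.5, 0.02, 0.01)
fast = generations_until(0.5, 0.15, 0.01)
```

Even the smaller cost leads to eventual loss of the resistant strain in this simplified setting; the larger cost simply gets there in far fewer generations.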

  8. Improving regression-model-based streamwater constituent load estimates derived from serially correlated data

    USGS Publications Warehouse

    Aulenbach, Brent T.

    2013-01-01

    A regression-model based approach is a commonly used, efficient method for estimating streamwater constituent load when there is a relationship between streamwater constituent concentration and continuous variables such as streamwater discharge, season and time. A subsetting experiment using a 30-year dataset of daily suspended sediment observations from the Mississippi River at Thebes, Illinois, was performed to determine optimal sampling frequency, model calibration period length, and regression model methodology, as well as to determine the effect of serial correlation of model residuals on load estimate precision. Two regression-based methods were used to estimate streamwater loads, the Adjusted Maximum Likelihood Estimator (AMLE), and the composite method, a hybrid load estimation approach. While both methods accurately and precisely estimated loads at the model’s calibration period time scale, precisions were progressively worse at shorter reporting periods, from annually to monthly. Serial correlation in model residuals caused observed AMLE precision to be significantly worse than the model calculated standard errors of prediction. The composite method effectively improved upon AMLE loads for shorter reporting periods, but required a sampling interval of 15 days or shorter when the serial correlations in the observed load residuals were greater than 0.15. AMLE precision was better at shorter sampling intervals and when using the shortest model calibration periods, such that the regression models better fit the temporal changes in the concentration–discharge relationship. The models with the largest errors typically had poor high flow sampling coverage resulting in unrepresentative models. Increasing sampling frequency and/or targeted high flow sampling are more efficient approaches to ensure sufficient sampling and to avoid poorly performing models than increasing calibration period length.

  9. Eigenvector Spatial Filtering Regression Modeling of Ground PM2.5 Concentrations Using Remotely Sensed Data.

    PubMed

    Zhang, Jingyi; Li, Bin; Chen, Yumin; Chen, Meijie; Fang, Tao; Liu, Yongfeng

    2018-06-11

    This paper proposes a regression model using the Eigenvector Spatial Filtering (ESF) method to estimate ground PM2.5 concentrations. Covariates are derived from remotely sensed data including aerosol optical depth, normalized difference vegetation index, surface temperature, air pressure, relative humidity, height of planetary boundary layer and digital elevation model. In addition, cultural variables such as factory densities and road densities are also used in the model. With the Yangtze River Delta region as the study area, we constructed ESF-based Regression (ESFR) models at different time scales, using data for the period between December 2015 and November 2016. We found that the ESFR models effectively filtered spatial autocorrelation in the OLS residuals and resulted in increases in the goodness-of-fit metrics as well as reductions in residual standard errors and cross-validation errors, compared to the classic OLS models. The annual ESFR model explained 70% of the variability in PM2.5 concentrations, 16.7% more than the non-spatial OLS model. With the ESFR models, we performed detailed analyses on the spatial and temporal distributions of PM2.5 concentrations in the study area. The model predictions are lower than ground observations but match the general trend. The experiment shows that ESFR provides a promising approach to PM2.5 analysis and prediction.

  10. Mathematical modelling of temperature effect on growth kinetics of Pseudomonas spp. on sliced mushroom (Agaricus bisporus).

    PubMed

    Tarlak, Fatih; Ozdemir, Murat; Melikoglu, Mehmet

    2018-02-02

    The growth data of Pseudomonas spp. on sliced mushrooms (Agaricus bisporus) stored between 4 and 28°C were obtained and fitted to three different primary models, known as the modified Gompertz, logistic and Baranyi models. The goodness of fit of these models was compared by considering the mean squared error (MSE) and the coefficient of determination for nonlinear regression (pseudo-R²). The Baranyi model yielded the lowest MSE and highest pseudo-R² values. Therefore, the Baranyi model was selected as the best primary model. Maximum specific growth rate (r_max) and lag phase duration (λ) obtained from the Baranyi model were fitted to secondary models namely, the Ratkowsky and Arrhenius models. High pseudo-R² and low MSE values indicated that the Arrhenius model has a high goodness of fit to determine the effect of temperature on r_max. Observed number of Pseudomonas spp. on sliced mushrooms from independent experiments was compared with the predicted number of Pseudomonas spp. with the models used by considering the B_f and A_f values. The B_f and A_f values were found to be 0.974 and 1.036, respectively. The correlation between the observed and predicted number of Pseudomonas spp. was high. Mushroom spoilage was simulated as a function of temperature with the models used. The models used for Pseudomonas spp. growth can provide a fast and cost-effective alternative to traditional microbiological techniques to determine the effect of storage temperature on product shelf-life. The models can be used to evaluate the growth behaviour of Pseudomonas spp. on sliced mushroom, set limits for the quantitative detection of the microbial spoilage and assess product shelf-life. Copyright © 2017 Elsevier B.V. All rights reserved.
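
The abstract reports bias and accuracy factors without giving their formulas. The sketch below assumes the definitions commonly used in predictive microbiology, where both are geometric-mean ratios of predicted to observed values. The function name and the data are hypothetical.

```python
import math


def bias_and_accuracy_factors(predicted, observed):
    """Return (bias factor, accuracy factor) for paired positive values."""
    logs = [math.log10(p / o) for p, o in zip(predicted, observed)]
    n = len(logs)
    # Bias factor: over- and under-predictions can cancel each other out.
    bias = 10 ** (sum(logs) / n)
    # Accuracy factor: absolute log-ratios, so it is always at least 1.
    accuracy = 10 ** (sum(abs(v) for v in logs) / n)
    return bias, accuracy


# Made-up predictions and observations, for illustration only.
predicted = [2.0, 5.0, 8.0]
observed = [4.0, 5.0, 4.0]
bias, accuracy = bias_and_accuracy_factors(predicted, observed)
```

In this toy data set the halved and doubled predictions cancel in the bias factor but not in the accuracy factor, which is why the two are usually reported together. Values close to 1 for both, as in the abstract, indicate predictions that are neither systematically off nor widely scattered.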

  11. Peak-flow characteristics of Virginia streams

    USGS Publications Warehouse

    Austin, Samuel H.; Krstolic, Jennifer L.; Wiegand, Ute

    2011-01-01

    Peak-flow annual exceedance probabilities, also called probability-percent chance flow estimates, and regional regression equations are provided describing the peak-flow characteristics of Virginia streams. Statistical methods are used to evaluate peak-flow data. Analysis of Virginia peak-flow data collected from 1895 through 2007 is summarized. Methods are provided for estimating unregulated peak flow of gaged and ungaged streams. Station peak-flow characteristics identified by fitting the logarithms of annual peak flows to a Log Pearson Type III frequency distribution yield annual exceedance probabilities of 0.5, 0.4292, 0.2, 0.1, 0.04, 0.02, 0.01, 0.005, and 0.002 for 476 streamgaging stations. Stream basin characteristics computed using spatial data and a geographic information system are used as explanatory variables in regional regression model equations for six physiographic regions to estimate regional annual exceedance probabilities at gaged and ungaged sites. Weighted peak-flow values that combine annual exceedance probabilities computed from gaging station data and from regional regression equations provide improved peak-flow estimates. Text, figures, and lists are provided summarizing selected peak-flow sites, delineated physiographic regions, peak-flow estimates, basin characteristics, regional regression model equations, error estimates, definitions, data sources, and candidate regression model equations. This study supersedes previous studies of peak flows in Virginia.
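
Each annual exceedance probability corresponds to a recurrence interval equal to its reciprocal, so the list above maps to the familiar 2-year through 500-year floods. The second function sketches one common way to combine a station estimate with a regression estimate, inverse-variance weighting. The abstract does not give the report's weighting formula, so that part is an assumption, and all names and input values are illustrative.

```python
def recurrence_interval_years(aep):
    """Average return period in years for an annual exceedance probability."""
    return 1.0 / aep


def weighted_estimate(station_value, station_var, regional_value, regional_var):
    """Inverse-variance weighted average of two independent estimates."""
    # The estimate with the smaller variance receives the larger weight.
    station_weight = regional_var / (station_var + regional_var)
    return station_weight * station_value + (1.0 - station_weight) * regional_value


aeps = [0.5, 0.4292, 0.2, 0.1, 0.04, 0.02, 0.01, 0.005, 0.002]
intervals = [round(recurrence_interval_years(p), 2) for p in aeps]

# Hypothetical log10 peak flows and variances for one site.
combined = weighted_estimate(3.0, 0.01, 3.2, 0.03)
```

The probability 0.4292 corresponds to a recurrence interval of about 2.33 years. In the weighting example the station estimate has one third of the regional variance, so it receives three quarters of the weight.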

  12. Identification and quantification of ciprofloxacin in urine through excitation-emission fluorescence and three-way PARAFAC calibration.

    PubMed

    Ortiz, M C; Sarabia, L A; Sánchez, M S; Giménez, D

    2009-05-29

    Due to the second-order advantage, calibration models based on parallel factor analysis (PARAFAC) decomposition of three-way data are becoming important in routine analysis. This work studies the possibility of fitting PARAFAC models with excitation-emission fluorescence data for the determination of ciprofloxacin in human urine. The finally chosen PARAFAC decomposition is built with calibration samples spiked with ciprofloxacin, and with other series of urine samples that were also spiked. One of the series of samples also contains another drug because the patient was taking mesalazine. Mesalazine is a fluorescent substance that interferes with ciprofloxacin. Finally, the procedure is applied to samples of a patient who was being treated with ciprofloxacin. The trueness has been established by the regression "predicted concentration versus added concentration". The recovery factor is 88.3% for ciprofloxacin in urine, and the mean of the absolute value of the relative errors is 4.2% for 46 test samples. The multivariate sensitivity of the fitted calibration model is evaluated by a regression between the loadings of PARAFAC linked to ciprofloxacin versus the true concentration in spiked samples. The multivariate capability of discrimination is near 8 µg L⁻¹ when the probabilities of false non-compliance and false compliance are fixed at 5%.

  13. A New Navigation Satellite Clock Bias Prediction Method Based on Modified Clock-bias Quadratic Polynomial Model

    NASA Astrophysics Data System (ADS)

    Wang, Y. P.; Lu, Z. P.; Sun, D. S.; Wang, N.

    2016-01-01

    In order to better express the characteristics of satellite clock bias (SCB) and improve SCB prediction precision, this paper proposes a new SCB prediction model which takes the physical characteristics of the space-borne atomic clock, the cyclic variation, and the random part of SCB into consideration. First, the new model employs a quadratic polynomial model with periodic terms to fit and extract the trend term and cyclic term of SCB; then, based on the characteristics of the fitting residuals, a time series ARIMA (Auto-Regressive Integrated Moving Average) model is used to model the residuals; finally, the results from the two models are combined to obtain the final SCB prediction values. This paper uses precise SCB data from IGS (International GNSS Service) to conduct prediction tests, and the results show that the proposed model is effective and has better prediction performance compared with the quadratic polynomial model, grey model, and ARIMA model. In addition, the new method can also overcome the insufficiency of the ARIMA model in model recognition and order determination.
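
    The two-stage idea in this abstract (a deterministic trend-plus-periodic fit, then a time-series model on the residuals) can be sketched as follows. This is only an illustration under stated assumptions: a single periodic term with a known period is assumed, a plain AR(1) coefficient stands in for the full ARIMA model, and all function names and values are hypothetical rather than taken from the paper.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def design_row(t, period):
    """Quadratic trend plus one periodic term (assumed single, known period)."""
    w = 2.0 * math.pi * t / period
    return [1.0, t, t * t, math.sin(w), math.cos(w)]

def fit_trend_periodic(times, bias, period):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    rows = [design_row(t, period) for t in times]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, bias)) for i in range(k)]
    return solve_linear(xtx, xty)

def ar1_coefficient(residuals):
    """Least-squares AR(1) coefficient; a simple stand-in for an ARIMA fit."""
    num = sum(a * b for a, b in zip(residuals[1:], residuals[:-1]))
    den = sum(b * b for b in residuals[:-1])
    return num / den if den > 0.0 else 0.0

def predict(beta, phi, last_residual, t, period, steps_ahead):
    """Combine the deterministic part with the decayed AR(1) residual."""
    trend = sum(b * x for b, x in zip(beta, design_row(t, period)))
    return trend + (phi ** steps_ahead) * last_residual
```

    The final prediction is the sum of the two parts, mirroring the "combine the results from the two models" step; a real application would replace the AR(1) stand-in with a properly identified ARIMA model.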

  14. [Association between physical fitness parameters and health related quality of life in Chilean community-dwelling older adults].

    PubMed

    Guede Rojas, Francisco; Chirosa Ríos, Luis Javier; Fuentealba Urra, Sergio; Vergara Ríos, César; Ulloa Díaz, David; Campos Jara, Christian; Barbosa González, Paola; Cuevas Aburto, Jesualdo

    2017-01-01

    There is no conclusive evidence about the association between physical fitness (PF) and health related quality of life (HRQOL) in older adults. The aim was to seek an association between PF and HRQOL in non-disabled community-dwelling Chilean older adults. One hundred and sixteen subjects participated in the study. PF was assessed using the Senior Fitness Test (SFT) and hand grip strength (HGS). HRQOL was assessed using eight dimensions provided by the SF-12v2 questionnaire. Binary multivariate logistic regression models were carried out considering the potential influence of confounder variables. Non-adjusted models indicated that subjects with better performance in the arm curl test (ACT) were more likely to score higher on the vitality dimension (OR > 1) and those with higher HGS were more likely to score higher on physical functioning, bodily pain, vitality and mental health (OR > 1). The adjusted models consistently showed that ACT and HGS predicted a favorable perception of the vitality and mental health dimensions, respectively (OR > 1). HGS and ACT have a predictive value for certain dimensions of HRQOL.

  15. Process model comparison and transferability across bioreactor scales and modes of operation for a mammalian cell bioprocess.

    PubMed

    Craven, Stephen; Shirsat, Nishikant; Whelan, Jessica; Glennon, Brian

    2013-01-01

    A Monod kinetic model, logistic equation model, and statistical regression model were developed for a Chinese hamster ovary cell bioprocess operated under three different modes of operation (batch, bolus fed-batch, and continuous fed-batch) and grown on two different bioreactor scales (3 L bench-top and 15 L pilot-scale). The Monod kinetic model was developed for all modes of operation under study and predicted cell density, glucose, glutamine, lactate, and ammonia concentrations well for the bioprocess. However, it was computationally demanding due to the large number of parameters necessary to produce a good model fit. The transferability of the Monod kinetic model structure and parameter set across bioreactor scales and modes of operation was investigated and a parameter sensitivity analysis performed. The experimentally determined parameters had the greatest influence on model performance. They changed with scale and mode of operation, but were easily calculated. The remaining parameters, which were fitted using a differential evolutionary algorithm, were not as crucial. Logistic equation and statistical regression models were investigated as alternatives to the Monod kinetic model. They were less computationally intensive to develop due to the absence of a large parameter set. However, modeling of the nutrient and metabolite concentrations proved to be troublesome due to the logistic equation model structure and the inability of both models to incorporate a feed. The complexity, computational load, and effort required for model development have to be balanced with the necessary level of model sophistication when choosing which model type to develop for a particular application. Copyright © 2012 American Institute of Chemical Engineers (AIChE).

  16. Regression approaches in the test-negative study design for assessment of influenza vaccine effectiveness.

    PubMed

    Bond, H S; Sullivan, S G; Cowling, B J

    2016-06-01

    Influenza vaccination is the most practical means available for preventing influenza virus infection and is widely used in many countries. Because vaccine components and circulating strains frequently change, it is important to continually monitor vaccine effectiveness (VE). The test-negative design is frequently used to estimate VE. In this design, patients meeting the same clinical case definition are recruited and tested for influenza; those who test positive are the cases and those who test negative form the comparison group. When determining VE in these studies, the typical approach has been to use logistic regression, adjusting for potential confounders. Because vaccine coverage and influenza incidence change throughout the season, time is included among these confounders. While most studies use unconditional logistic regression, adjusting for time, an alternative approach is to use conditional logistic regression, matching on time. Here, we used simulation data to examine the potential for both regression approaches to permit accurate and robust estimates of VE. In situations where vaccine coverage changed during the influenza season, the conditional model and unconditional models adjusting for categorical week and using a spline function for week provided more accurate estimates. We illustrated the two approaches on data from a test-negative study of influenza VE against hospitalization in children in Hong Kong which resulted in the conditional logistic regression model providing the best fit to the data.
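
    As a rough illustration of how VE follows from a test-negative study, VE is commonly reported as one minus the odds ratio of vaccination among test-positive cases versus test-negative controls. The sketch below computes a crude odds ratio and a Mantel-Haenszel odds ratio stratified by week; the latter is a simpler stand-in for the time-adjusted logistic regression models discussed in the abstract, not the authors' method. Function names and all counts are hypothetical.

```python
def crude_odds_ratio(vacc_cases, unvacc_cases, vacc_controls, unvacc_controls):
    """Odds of vaccination in test-positives divided by odds in test-negatives."""
    return (vacc_cases / unvacc_cases) / (vacc_controls / unvacc_controls)

def mantel_haenszel_odds_ratio(strata):
    """Pooled odds ratio over strata (for example, calendar weeks).

    Each stratum is a tuple:
    (vacc_cases, unvacc_cases, vacc_controls, unvacc_controls).
    """
    num = 0.0
    den = 0.0
    for vc, uc, vk, uk in strata:
        n = vc + uc + vk + uk
        num += vc * uk / n
        den += uc * vk / n
    return num / den

def vaccine_effectiveness(odds_ratio):
    """VE as a proportion; multiply by 100 for a percentage."""
    return 1.0 - odds_ratio

# Made-up weekly counts, for illustration only.
weeks = [(20, 80, 50, 50), (10, 40, 100, 100)]
ve_by_week_adjusted = vaccine_effectiveness(mantel_haenszel_odds_ratio(weeks))
```

    Stratifying by week is one simple way to keep changing vaccine coverage and influenza incidence from distorting the comparison, which is the motivation the abstract gives for including time in the models.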

  17. Comparison of Logistic Regression and Artificial Neural Network in Low Back Pain Prediction: Second National Health Survey

    PubMed Central

    Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H

    2012-01-01

    Background: The purpose of this investigation was to empirically compare the predictive ability of an artificial neural network with that of a logistic regression in the prediction of low back pain. Methods: Data from the second national health survey were considered in this investigation. These data include information on low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a set of 17294 data and they were validated in a test set of 17295 data. The Hosmer and Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 input, 3 hidden and 1 output neurons was employed. The efficiency of the two models was compared by receiver operating characteristic analysis, root mean square and -2 Loglikelihood criteria. Results: The area under the ROC curve (SE), root mean square and -2Loglikelihood of the logistic regression were 0.752 (0.004), 0.3832 and 14769.2, respectively. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the artificial neural network were 0.754 (0.004), 0.3770 and 14757.6, respectively. Conclusions: Based on these three criteria, the artificial neural network would give better performance than logistic regression. Although the difference is statistically significant, it does not seem to be clinically significant. PMID:23113198
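
    Two of the three comparison criteria named above can be computed directly from predicted probabilities and observed binary outcomes. The sketch below assumes that "root mean square" refers to the root mean square error of the predicted probabilities, which the abstract does not state explicitly; the function names are hypothetical.

```python
import math

def root_mean_square_error(y_true, p_pred):
    """RMSE between 0/1 outcomes and predicted probabilities."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, p_pred)) / n)

def minus_two_log_likelihood(y_true, p_pred):
    """-2 times the Bernoulli log-likelihood; smaller means a better fit."""
    log_lik = sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, p_pred)
    )
    return -2.0 * log_lik
```

    On both criteria lower is better, so the reported values (0.3770 versus 0.3832, and 14757.6 versus 14769.2, a gap of 11.6) each favour the neural network by a small margin.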

  18. Comparison of logistic regression and artificial neural network in low back pain prediction: second national health survey.

    PubMed

    Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H

    2012-01-01

    The purpose of this investigation was to empirically compare the predictive ability of an artificial neural network with that of a logistic regression in the prediction of low back pain. Data from the second national health survey were considered in this investigation. These data include information on low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a set of 17294 data and they were validated in a test set of 17295 data. The Hosmer and Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 input, 3 hidden and 1 output neurons was employed. The efficiency of the two models was compared by receiver operating characteristic analysis, root mean square and -2 Loglikelihood criteria. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the logistic regression were 0.752 (0.004), 0.3832 and 14769.2, respectively. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the artificial neural network were 0.754 (0.004), 0.3770 and 14757.6, respectively. Based on these three criteria, the artificial neural network would give better performance than logistic regression. Although the difference is statistically significant, it does not seem to be clinically significant.

  19. Wind Tunnel Strain-Gage Balance Calibration Data Analysis Using a Weighted Least Squares Approach

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.; Volden, T.

    2017-01-01

    A new approach is presented that uses a weighted least squares fit to analyze wind tunnel strain-gage balance calibration data. The weighted least squares fit is specifically designed to increase the influence of single-component loadings during the regression analysis. The weighted least squares fit also reduces the impact of calibration load schedule asymmetries on the predicted primary sensitivities of the balance gages. A weighting factor between zero and one is assigned to each calibration data point that depends on a simple count of its intentionally loaded load components or gages. The greater the number of a data point's intentionally loaded load components or gages is, the smaller its weighting factor becomes. The proposed approach is applicable to both the Iterative and Non-Iterative Methods that are used for the analysis of strain-gage balance calibration data in the aerospace testing community. The Iterative Method uses a reasonable estimate of the tare corrected load set as input for the determination of the weighting factors. The Non-Iterative Method, on the other hand, uses gage output differences relative to the natural zeros as input for the determination of the weighting factors. Machine calibration data of a six-component force balance is used to illustrate benefits of the proposed weighted least squares fit. In addition, a detailed derivation of the PRESS residuals associated with a weighted least squares fit is given in the appendices of the paper as this information could not be found in the literature. These PRESS residuals may be needed to evaluate the predictive capabilities of the final regression models that result from a weighted least squares fit of the balance calibration data.
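
    A weighted least squares fit of the kind described can be illustrated with a single regressor. The abstract says only that the weighting factor lies between zero and one and shrinks as the number of intentionally loaded components grows; the 1/count rule below is a hypothetical choice that satisfies that description, not the formula from the paper, and the one-regressor model is a deliberate simplification of a multi-component balance calibration.

```python
def weight_from_loaded_count(count):
    """Hypothetical weighting rule: 1 for single-component loadings, less otherwise."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return 1.0 / count

def weighted_line_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Made-up calibration points: load, gage output, number of loaded components.
loads = [0.0, 1.0, 2.0, 3.0]
outputs = [0.1, 2.0, 4.2, 5.9]
loaded_counts = [1, 1, 2, 3]
weights = [weight_from_loaded_count(c) for c in loaded_counts]
intercept, slope = weighted_line_fit(loads, outputs, weights)
```

    With these weights the single-component points dominate the estimated sensitivity, which is the stated purpose of the weighting.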

  20. Developing a Novel Parameter Estimation Method for Agent-Based Model in Immune System Simulation under the Framework of History Matching: A Case Study on Influenza A Virus Infection

    PubMed Central

    Li, Tingting; Cheng, Zhengguo; Zhang, Le

    2017-01-01

    Since they can provide a natural and flexible description of the nonlinear dynamic behavior of complex systems, agent-based models (ABM) have been commonly used for immune system simulation. However, it is crucial for ABM to obtain an appropriate estimation for the key parameters of the model by incorporating experimental data. In this paper, a systematic procedure for immune system simulation by integrating the ABM and regression method under the framework of history matching is developed. A novel parameter estimation method by incorporating the experiment data for the simulator ABM during the procedure is proposed. First, we employ ABM as simulator to simulate the immune system. Then, the dimension-reduced type generalized additive model (GAM) is employed to train a statistical regression model by using the input and output data of ABM and play a role as an emulator during history matching. Next, we reduce the input space of parameters by introducing an implausible measure to discard the implausible input values. Finally, the estimation of model parameters is obtained using the particle swarm optimization algorithm (PSO) by fitting the experiment data among the non-implausible input values. The real Influenza A Virus (IAV) data set is employed to demonstrate the performance of our proposed method, and the results show that the proposed method not only has good fitting and predicting accuracy, but also has favorable computational efficiency. PMID:29194393

  1. Developing a Novel Parameter Estimation Method for Agent-Based Model in Immune System Simulation under the Framework of History Matching: A Case Study on Influenza A Virus Infection.

    PubMed

    Li, Tingting; Cheng, Zhengguo; Zhang, Le

    2017-12-01

    Since they can provide a natural and flexible description of the nonlinear dynamic behavior of complex systems, agent-based models (ABM) have been commonly used for immune system simulation. However, it is crucial for ABM to obtain an appropriate estimation for the key parameters of the model by incorporating experimental data. In this paper, a systematic procedure for immune system simulation by integrating the ABM and regression method under the framework of history matching is developed. A novel parameter estimation method by incorporating the experiment data for the simulator ABM during the procedure is proposed. First, we employ ABM as simulator to simulate the immune system. Then, the dimension-reduced type generalized additive model (GAM) is employed to train a statistical regression model by using the input and output data of ABM and play a role as an emulator during history matching. Next, we reduce the input space of parameters by introducing an implausible measure to discard the implausible input values. Finally, the estimation of model parameters is obtained using the particle swarm optimization algorithm (PSO) by fitting the experiment data among the non-implausible input values. The real Influenza A Virus (IAV) data set is employed to demonstrate the performance of our proposed method, and the results show that the proposed method not only has good fitting and predicting accuracy, but also has favorable computational efficiency.

  2. A computational approach to compare regression modelling strategies in prediction research.

    PubMed

    Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H

    2016-08-25

    It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9 to 94.9 %, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.

  3. Developing a Model for Forecasting Road Traffic Accident (RTA) Fatalities in Yemen

    NASA Astrophysics Data System (ADS)

    Karim, Fareed M. A.; Abdo Saleh, Ali; Taijoobux, Aref; Ševrović, Marko

    2017-12-01

    The aim of this paper is to develop a model for forecasting RTA fatalities in Yemen. Yearly fatalities were modeled as the dependent variable, while the independent variables included the population, number of vehicles, GNP, GDP and real GDP per capita. It was determined that all these variables are highly correlated (correlation coefficient r ≈ 0.9); in order to avoid multicollinearity in the model, a single variable with the highest r value was selected (real GDP per capita). A simple regression model was developed; the model was very good (R² = 0.916); however, the residuals were serially correlated. The Prais-Winsten procedure was used to overcome this violation of the regression assumption. The data for a 20-year period from 1991-2010 were analyzed to build the model; the model was validated by using data for the years 2011-2013; the historical fit for the period 1991 - 2011 was very good. Also, the validation for 2011-2013 proved accurate.
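
    For readers unfamiliar with the Prais-Winsten procedure, the sketch below shows one pass of it for a simple regression with AR(1) errors: estimate the autocorrelation from the ordinary least squares residuals, transform the data (rescaling the first observation instead of dropping it), and refit. The full procedure usually iterates these steps until the estimate of rho settles; the function names and the data are hypothetical, not the Yemen series.

```python
import math

def ols_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def estimate_rho(residuals):
    """Lag-1 autoregression coefficient of the residuals."""
    num = sum(a * b for a, b in zip(residuals[1:], residuals[:-1]))
    den = sum(b * b for b in residuals[:-1])
    return num / den

def prais_winsten_step(x, y, rho):
    """Refit y = b0 + b1 * x after the Prais-Winsten transformation."""
    s = math.sqrt(1.0 - rho * rho)
    n = len(x)
    # Transformed intercept column, regressor and response.
    c = [s] + [1.0 - rho] * (n - 1)
    xs = [s * x[0]] + [x[t] - rho * x[t - 1] for t in range(1, n)]
    ys = [s * y[0]] + [y[t] - rho * y[t - 1] for t in range(1, n)]
    # 2x2 normal equations for ys = b0 * c + b1 * xs.
    scc = sum(a * a for a in c)
    scx = sum(a * b for a, b in zip(c, xs))
    sxx = sum(b * b for b in xs)
    scy = sum(a * b for a, b in zip(c, ys))
    sxy = sum(a * b for a, b in zip(xs, ys))
    det = scc * sxx - scx * scx
    b0 = (scy * sxx - scx * sxy) / det
    b1 = (scc * sxy - scx * scy) / det
    return b0, b1

def prais_winsten_once(x, y):
    """OLS, then rho from the residuals, then one transformed refit."""
    b0, b1 = ols_line(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    rho = estimate_rho(residuals)
    return prais_winsten_step(x, y, rho), rho
```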

  4. Variable selection with stepwise and best subset approaches

    PubMed Central

    2016-01-01

    While purposeful selection is performed partly by software and partly by hand, the stepwise and best subset approaches are automatically performed by software. Two R functions, stepAIC() and bestglm(), are well designed for stepwise and best subset regression, respectively. The stepAIC() function begins with a full or null model, and methods for stepwise regression can be specified in the direction argument with character values “forward”, “backward” and “both”. The bestglm() function begins with a data frame containing explanatory variables and response variables. The response variable should be in the last column. Varieties of goodness-of-fit criteria can be specified in the IC argument. The Bayesian information criterion (BIC) usually results in a more parsimonious model than the Akaike information criterion. PMID:27162786
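
    The closing remark about BIC and parsimony follows from the two penalty terms: AIC charges 2 per parameter, while BIC charges ln(n) per parameter, which is larger once the sample exceeds about 7 observations. The short sketch below uses the standard definitions; it is written in Python for illustration and does not reproduce the R functions named above. The log-likelihood values are made up.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical candidates: the larger model gains 1.5 log-likelihood units
# for one extra parameter, with n = 100 observations.
small = {"loglik": -50.0, "k": 3}
large = {"loglik": -48.5, "k": 4}
n_obs = 100

aic_prefers_large = aic(large["loglik"], large["k"]) < aic(small["loglik"], small["k"])
bic_prefers_large = bic(large["loglik"], large["k"], n_obs) < bic(small["loglik"], small["k"], n_obs)
```

    In this made-up case AIC keeps the extra variable and BIC drops it, which is the behaviour the abstract summarizes.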

  5. Is the effect of person-organisation fit on turnover intention mediated by job satisfaction? A survey of community health workers in China

    PubMed Central

    Yan, Fei; Wang, Wei; Li, Guohong

    2017-01-01

    Objectives: Person-organisation fit (P-O fit) is a predictor of work attitude. However, in the area of human resource for health, the literature of P-O fit is quite limited. It is unclear whether P-O fit directly or indirectly affects turnover intention. This study aims to examine the mediation effect of job satisfaction on the relationship between P-O fit and turnover intention based on data from China. Design and methods: This is a cross-sectional survey of community health workers (CHWs) in China in 2013. A questionnaire of P-O fit, job satisfaction and turnover intention was developed, and its validity and reliability were assessed. Multiple regression and structural equation modelling were used to examine the relationship among P-O fit, job satisfaction and turnover intention. Setting and participants: Multistage sampling was applied. In total, 656 valid questionnaire responses were collected from CHWs in four provincial regions in China, namely Shanghai, Shaanxi, Shandong and Anhui. Results: P-O fit was directly related to job satisfaction (standardised β 0.246) and inversely related to turnover intention (standardised β −0.186). In the mediation model, the total effect of P-O fit on turnover intention was −0.186 (p<0.001); the direct effect of P-O fit on turnover intention was −0.094 (p<0.01); the indirect effect of job satisfaction on the relationship between P-O fit and turnover intention was −0.092 (p<0.001). Conclusions: The effect of P-O fit on turnover intention was partially mediated through job satisfaction. It is suggested that more work attitude variables and different dimensions of P-O fit be taken into account to examine the complete mechanism of person-organisation interaction. Indirect measures of P-O fit should be encouraged in practice to enhance work attitudes of health workers. PMID:28399513
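
    The reported effects are consistent with the usual mediation decomposition, in which the total effect is the sum of the direct and indirect effects. The check below uses only the three figures quoted in the abstract; the proportion mediated is a derived quantity that the abstract itself does not report.

```python
# Standardised effects of P-O fit on turnover intention, as quoted above.
direct_effect = -0.094
indirect_effect = -0.092   # via job satisfaction
reported_total = -0.186

total_effect = direct_effect + indirect_effect
proportion_mediated = indirect_effect / total_effect

print(f"total effect = {total_effect:.3f}")
print(f"proportion mediated = {proportion_mediated:.3f}")
```

    Roughly half of the total effect runs through job satisfaction, which matches the "partially mediated" wording of the conclusions.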

  6. Random Regression Models Are Suitable to Substitute the Traditional 305-Day Lactation Model in Genetic Evaluations of Holstein Cattle in Brazil

    PubMed Central

    Padilha, Alessandro Haiduck; Cobuci, Jaime Araujo; Costa, Cláudio Napolis; Neto, José Braccini

    2016-01-01

    The aim of this study was to compare two random regression models (RRM) fitted by fourth (RRM4) and fifth-order Legendre polynomials (RRM5) with a lactation model (LM) for evaluating Holstein cattle in Brazil. Two datasets with the same animals were prepared for this study. To apply test-day RRM and LMs, 262,426 test day records and 30,228 lactation records covering 305 days were prepared, respectively. The lowest values of Akaike’s information criterion, Bayesian information criterion, and estimates of the maximum of the likelihood function (−2LogL) were for RRM4. Heritability for 305-day milk yield (305MY) was 0.23 (RRM4), 0.24 (RRM5), and 0.21 (LM). Heritability, additive genetic and permanent environmental variances of test days on days in milk ranged from 0.16 to 0.27, from 3.76 to 6.88 and from 11.12 to 20.21, respectively. Additive genetic correlations between test days ranged from 0.20 to 0.99. Permanent environmental correlations between test days were between 0.07 and 0.99. Standard deviations of average estimated breeding values (EBVs) for 305MY from RRM4 and RRM5 were from 11% to 30% higher for bulls and around 28% higher for cows than those in LM. Rank correlations between RRM EBVs and LM EBVs ranged from 0.86 to 0.96 for bulls and from 0.80 to 0.87 for cows. Average percentage of gain in reliability of EBVs for 305-day yield increased from 4% to 17% for bulls and from 23% to 24% for cows when reliability of EBVs from RRM models was compared to those from the LM model. The random regression model fitted by fourth-order Legendre polynomials is recommended for genetic evaluations of Brazilian Holstein cattle because of the higher reliability in the estimation of breeding values. PMID:26954176
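
    In a random regression test-day model, each record's days in milk (DIM) is mapped to the interval [-1, 1] and expanded into Legendre polynomial covariates. The sketch below builds those covariates with Bonnet's recurrence. It assumes that "fourth-order" means polynomials up to degree 4, uses the unnormalized polynomials (many implementations rescale them), and takes a DIM range of 5 to 305 purely for illustration; none of these details are given in the abstract.

```python
def standardize_dim(dim, dim_min, dim_max):
    """Map days in milk linearly onto [-1, 1]."""
    return 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0

def legendre_covariates(x, order):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recurrence:
    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x).
    """
    values = [1.0, x]
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]

# Covariates for a hypothetical test-day record at 155 days in milk.
x = standardize_dim(155, 5, 305)
row = legendre_covariates(x, 4)
```

    Each animal's random regression coefficients multiply such a row, so a degree-4 model estimates five coefficients per animal and effect.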

  7. Random Regression Models Are Suitable to Substitute the Traditional 305-Day Lactation Model in Genetic Evaluations of Holstein Cattle in Brazil.

    PubMed

    Padilha, Alessandro Haiduck; Cobuci, Jaime Araujo; Costa, Cláudio Napolis; Neto, José Braccini

    2016-06-01

    The aim of this study was to compare two random regression models (RRM) fitted by fourth (RRM4) and fifth-order Legendre polynomials (RRM5) with a lactation model (LM) for evaluating Holstein cattle in Brazil. Two datasets with the same animals were prepared for this study. To apply test-day RRM and LMs, 262,426 test day records and 30,228 lactation records covering 305 days were prepared, respectively. The lowest values of Akaike's information criterion, Bayesian information criterion, and estimates of the maximum of the likelihood function (-2LogL) were for RRM4. Heritability for 305-day milk yield (305MY) was 0.23 (RRM4), 0.24 (RRM5), and 0.21 (LM). Heritability, additive genetic and permanent environmental variances of test days on days in milk ranged from 0.16 to 0.27, from 3.76 to 6.88 and from 11.12 to 20.21, respectively. Additive genetic correlations between test days ranged from 0.20 to 0.99. Permanent environmental correlations between test days were between 0.07 and 0.99. Standard deviations of average estimated breeding values (EBVs) for 305MY from RRM4 and RRM5 were from 11% to 30% higher for bulls and around 28% higher for cows than those in LM. Rank correlations between RRM EBVs and LM EBVs ranged from 0.86 to 0.96 for bulls and from 0.80 to 0.87 for cows. Average percentage of gain in reliability of EBVs for 305-day yield increased from 4% to 17% for bulls and from 23% to 24% for cows when reliability of EBVs from RRM models was compared to those from the LM model. The random regression model fitted by fourth-order Legendre polynomials is recommended for genetic evaluations of Brazilian Holstein cattle because of the higher reliability in the estimation of breeding values.

  8. Statistical Approaches for Spatiotemporal Prediction of Low Flows

    NASA Astrophysics Data System (ADS)

    Fangmann, A.; Haberlandt, U.

    2017-12-01

    An adequate assessment of regional climate change impacts on streamflow requires the integration of various sources of information and modeling approaches. This study proposes simple statistical tools for inclusion into model ensembles, which are fast and straightforward in their application, yet able to yield accurate streamflow predictions in time and space. Target variables for all approaches are annual low flow indices derived from a data set of 51 records of average daily discharge for northwestern Germany. The models require input of climatic data in the form of meteorological drought indices, derived from observed daily climatic variables, averaged over the streamflow gauges' catchment areas. Four different modeling approaches are analyzed. The basis for all of them is multiple linear regression models that estimate low flows as a function of a set of meteorological indices and/or physiographic and climatic catchment descriptors. For the first method, individual regression models are fitted at each station, predicting annual low flow values from a set of annual meteorological indices, which are subsequently regionalized using a set of catchment characteristics. The second method combines temporal and spatial prediction within a single panel data regression model, allowing estimation of annual low flow values from input of both annual meteorological indices and catchment descriptors. The third and fourth methods represent non-stationary low flow frequency analyses and require fitting of regional distribution functions. Method three is subject to a spatiotemporal prediction of an index value, method four to estimation of L-moments that adapt the regional frequency distribution to the at-site conditions. The results show that method two outperforms successive prediction in time and space. Method three also shows a high performance in the near future period, but since it relies on a stationary distribution, its application for prediction of far future changes may be problematic. Spatiotemporal prediction of L-moments appeared highly uncertain for higher-order moments, resulting in unrealistic future low flow values. All in all, the results promote an inclusion of simple statistical methods in climate change impact assessment.

  9. Insights into the equilibrium, kinetic and thermodynamics of nickel removal by environmental friendly Lansium domesticum peel biosorbent.

    PubMed

    Lam, Yun Fung; Lee, Lai Yee; Chua, Song Jun; Lim, Siew Shee; Gan, Suyin

    2016-05-01

    Lansium domesticum peel (LDP), a waste material generated from the fruit consumption, was evaluated as a biosorbent for nickel removal from aqueous media. The effects of dosage, contact time, initial pH, initial concentration and temperature on the biosorption process were investigated in batch experiments. Equilibrium data were fitted by the Langmuir, Freundlich, Temkin and Dubinin-Radushkevich models using nonlinear regression method with the best-fit model evaluated based on coefficient of determination (R²) and Chi-square (χ²). The best-fit isotherm was found to be the Langmuir model exhibiting R² very close to unity (0.997-0.999), smallest χ² (0.0138-0.0562) and largest biosorption capacity (10.1 mg/g) at 30°C. Kinetic studies showed that the initial nickel removal was rapid with the equilibrium state established within 30 min. Pseudo-second-order model was the best-fit kinetic model indicating the chemisorption nature of the biosorption process. Further data analysis by the intraparticle diffusion model revealed the involvement of several rate-controlling steps such as boundary layer and intraparticle diffusion. Thermodynamically, the process was exothermic, spontaneous and feasible. Regeneration studies indicated that LDP biosorbent could be regenerated using hydrochloric acid solution with up to 85% efficiency. The present investigation proved that LDP having no economic value can be used as an alternative eco-friendly biosorbent for remediation of nickel contaminated water. Copyright © 2016 Elsevier Inc. All rights reserved.
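
    A nonlinear fit of the Langmuir isotherm, q = q_max K C / (1 + K C), can be sketched without any fitting library: for a fixed K the model is linear in q_max, so q_max has a closed form and only K needs a search. The grid search below is a simple stand-in for whatever optimizer the authors used, the χ² statistic is assumed to be the sum of (observed − predicted)² / predicted, and the data are synthetic.

```python
def langmuir(c, q_max, k):
    """Langmuir isotherm: uptake at equilibrium concentration c."""
    return q_max * k * c / (1.0 + k * c)

def fit_langmuir(conc, uptake, k_grid):
    """Least-squares fit: closed-form q_max for each K on a grid."""
    best = None
    for k in k_grid:
        f = [k * c / (1.0 + k * c) for c in conc]
        q_max = sum(q * fi for q, fi in zip(uptake, f)) / sum(fi * fi for fi in f)
        sse = sum((q - q_max * fi) ** 2 for q, fi in zip(uptake, f))
        if best is None or sse < best[0]:
            best = (sse, q_max, k)
    return best[1], best[2]

def r_squared(obs, pred):
    """Coefficient of determination."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def chi_square(obs, pred):
    """Assumed form: sum of squared errors scaled by the predicted value."""
    return sum((o - p) ** 2 / p for o, p in zip(obs, pred))

# Synthetic equilibrium data generated from known parameters (not measured).
conc = [5.0, 10.0, 20.0, 40.0, 80.0, 160.0]
uptake = [langmuir(c, 10.0, 0.05) for c in conc]
k_grid = [10.0 ** (-3.0 + 4.0 * i / 2000.0) for i in range(2001)]
q_max_hat, k_hat = fit_langmuir(conc, uptake, k_grid)
fitted = [langmuir(c, q_max_hat, k_hat) for c in conc]
```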

  10. Modelling a model?!! Prediction of observed and calculated daily pan evaporation in New Mexico, U.S.A.

    NASA Astrophysics Data System (ADS)

    Beriro, D. J.; Abrahart, R. J.; Nathanail, C. P.

    2012-04-01

    Data-driven modelling is most commonly used to develop predictive models that will simulate natural processes. This paper, in contrast, uses Gene Expression Programming (GEP) to construct two alternative models of different pan evaporation estimations by means of symbolic regression: a simulator, a model of a real-world process developed on observed records, and an emulator, an imitator of some other model developed on predicted outputs calculated by that source model. The solutions are compared and contrasted for the purposes of determining whether any substantial differences exist between either option. This analysis will address recent arguments over the impact of using downloaded hydrological modelling datasets originating from different initial sources, i.e. observed or calculated. These differences can easily be overlooked by modellers, resulting in a model of a model developed on estimations derived from deterministic empirical equations and producing exceptionally high goodness-of-fit. This paper uses different lines-of-evidence to evaluate model output and in so doing paves the way for a new protocol in machine learning applications. Transparent modelling tools such as symbolic regression offer huge potential for explaining stochastic processes; however, the basic tenets of data quality and recourse to first principles with regard to problem understanding should not be trivialised. GEP is found to be an effective tool for the prediction of observed and calculated pan evaporation, with results supported by an understanding of the records, and of the natural processes concerned, evaluated using one-at-a-time response function sensitivity analysis. The results show that both architectures and response functions are very similar, implying that previously observed differences in goodness-of-fit can be explained by whether models are applied to observed or calculated data.

  11. qFeature

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2015-09-14

    This package contains statistical routines for extracting features from multivariate time-series data, which can then be used for subsequent multivariate statistical analysis to identify patterns and anomalous behavior. It calculates local linear or quadratic regression model fits to moving windows for each series and then summarizes the model coefficients across user-defined time intervals for each series. These methods are domain agnostic, and they have been successfully applied to a variety of domains, including commercial aviation and electric power grid data.
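
    The moving-window idea described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the package's own interface: the function names, the window width, the summary interval, and the sample series are all assumptions made for the example, and only the linear (not quadratic) fit is shown.

```python
from statistics import mean

def linear_fit(xs, ys):
    """Ordinary least-squares intercept and slope for one window."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def moving_window_slopes(series, width):
    """Fit a line to every window of `width` consecutive points; keep the slopes."""
    xs = list(range(width))
    return [linear_fit(xs, series[i:i + width])[1]
            for i in range(len(series) - width + 1)]

def summarize(values, interval):
    """Average the window coefficients over consecutive blocks of `interval` windows."""
    return [mean(values[i:i + interval])
            for i in range(0, len(values), interval)]

# Hypothetical series: the trend steepens halfway through.
series = [0.0, 1.0, 2.0, 3.0, 5.0, 7.0, 9.0, 11.0]
slopes = moving_window_slopes(series, width=3)
features = summarize(slopes, interval=3)
```

    Here `features` holds one averaged slope per interval, which is the kind of per-interval summary that could feed a later multivariate analysis.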

  12. Determinants of U.S. Prescription Drug Utilization using County Level Data.

    PubMed

    Nianogo, Thierry; Okunade, Albert; Fofana, Demba; Chen, Weiwei

    2016-05-01

    Prescription drugs are the third largest component of U.S. healthcare expenditures. The 2006 Medicare Part D and the 2010 Affordable Care Act are catalysts for further growth in utilization because of insurance expansion effects. This research investigating the determinants of prescription drug utilization is timely, methodologically novel, and policy relevant. Differences in population health status, access to care, socioeconomics, demographics, and variations in per capita number of scripts filled at retail pharmacies across the U.S.A. justify fitting separate econometric models to county data of the states partitioned into low, medium, and high prescription drug users. Given the skewed distribution of per capita number of filled prescriptions (response variable), we fit the variance stabilizing Box-Cox power transformation regression models to 2011 county level data for investigating the correlates of prescription drug utilization separately for low, medium, and high utilization states. Maximum likelihood regression parameter estimates, including the optimal Box-Cox λ power transformations, differ across high (λ = 0.214), medium (λ = 0.942), and low (λ = 0.302) prescription drug utilization models. The estimated income elasticities of -0.634, 0.031, and -0.532 in high, medium, and low utilization models suggest that the economic behavior of prescriptions is not invariant across different utilization levels. Copyright © 2015 John Wiley & Sons, Ltd.
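
    As a rough illustration of how a Box-Cox λ can be chosen by maximum likelihood, the sketch below grid-searches the profile log-likelihood. It is deliberately simplified: it assumes an intercept-only normal model (the study fits full regression models with covariates), and the data values and grid are invented for the example.

```python
import math

def box_cox(y, lam):
    """Box-Cox power transform of a positive value y."""
    if abs(lam) < 1e-12:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def profile_loglik(ys, lam):
    """Profile log-likelihood of lambda (up to a constant), intercept-only model."""
    n = len(ys)
    z = [box_cox(y, lam) for y in ys]
    z_bar = sum(z) / n
    var = sum((v - z_bar) ** 2 for v in z) / n
    # The second term is the log-Jacobian of the transformation.
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(y) for y in ys)

def best_lambda(ys, grid):
    """Grid value of lambda with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(ys, lam))

# Made-up right-skewed response values (not data from the study).
ys = [1.2, 1.5, 2.0, 2.4, 3.1, 4.8, 7.5, 12.0, 30.0]
grid = [i / 100 for i in range(-200, 201)]
lam_hat = best_lambda(ys, grid)
```

    With covariates, the variance term would be the residual variance of the regression on the transformed response rather than the variance around the mean.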

  13. Genetic Parameters for Milk Yield and Lactation Persistency Using Random Regression Models in Girolando Cattle

    PubMed Central

    Canaza-Cayo, Ali William; Lopes, Paulo Sávio; da Silva, Marcos Vinicius Gualberto Barbosa; de Almeida Torres, Robledo; Martins, Marta Fonseca; Arbex, Wagner Antonio; Cobuci, Jaime Araujo

    2015-01-01

    A total of 32,817 test-day milk yield (TDMY) records from the first lactation of 4,056 Girolando cows, daughters of 276 sires, collected from 118 herds between 2000 and 2011, were used to estimate the genetic parameters for TDMY via random regression models (RRM) using Legendre’s polynomial functions whose orders varied from 3 to 5. In addition, nine measures of persistency in milk yield (PSi) and the genetic trend of 305-day milk yield (305MY) were evaluated. The fit quality criteria indicated that the RRM employing Legendre’s polynomials of orders 3 and 5 for the additive genetic and permanent environment effects, respectively, was the best model. The heritability and genetic correlation for TDMY throughout the lactation, obtained with the best model, varied from 0.18 to 0.23 and from −0.03 to 1.00, respectively. The heritability and genetic correlation for persistency and 305MY varied from 0.10 to 0.33 and from −0.98 to 1.00, respectively. The use of PS7 would be the most suitable option for the evaluation of Girolando cattle. The estimated breeding values for 305MY of sires and cows showed significant and positive genetic trends. Thus, the use of selection indices would be indicated in the genetic evaluation of Girolando cattle for both traits. PMID:26323397
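
    For readers unfamiliar with random regression models, the sketch below shows how Legendre polynomial covariates are typically built from days in milk. It is an illustration under assumptions, not the authors' code: the 5 to 305 day range, the function names, and the use of normalized polynomials are choices made here, and the mapping between "order" and polynomial degree varies between papers.

```python
import math

def legendre(degree, x):
    """Values P_0(x)..P_degree(x) from Bonnet's recurrence."""
    values = [1.0, x]
    for k in range(1, degree):
        values.append(((2 * k + 1) * x * values[k] - k * values[k - 1]) / (k + 1))
    return values[:degree + 1]

def covariates(day, first_day, last_day, degree):
    """Normalized Legendre covariates for one test day."""
    # Standardize days in milk to the interval [-1, 1].
    x = -1.0 + 2.0 * (day - first_day) / (last_day - first_day)
    return [math.sqrt((2 * k + 1) / 2.0) * p
            for k, p in enumerate(legendre(degree, x))]

# Assumed lactation window of 5 to 305 days; day 155 maps to x = 0.
row = covariates(day=155, first_day=5, last_day=305, degree=2)
```

    Each record contributes one such row, and the random regression coefficients for the genetic and permanent environment effects multiply these covariates.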

  14. Multi-omics facilitated variable selection in Cox-regression model for cancer prognosis prediction.

    PubMed

    Liu, Cong; Wang, Xujun; Genchev, Georgi Z; Lu, Hui

    2017-07-15

    New developments in high-throughput genomic technologies have enabled the measurement of diverse types of omics biomarkers in a cost-efficient and clinically-feasible manner. Developing computational methods and tools for analysis and translation of such genomic data into clinically-relevant information is an ongoing and active area of investigation. For example, several studies have utilized an unsupervised learning framework to cluster patients by integrating omics data. Despite such recent advances, predicting cancer prognosis using integrated omics biomarkers remains a challenge. There is also a shortage of computational tools for predicting cancer prognosis by using supervised learning methods. The current standard approach is to fit a Cox regression model by concatenating the different types of omics data in a linear manner, while penalty could be added for feature selection. A more powerful approach, however, would be to incorporate data by considering relationships among omics datatypes. Here we developed two methods: a SKI-Cox method and a wLASSO-Cox method to incorporate the association among different types of omics data. Both methods fit the Cox proportional hazards model and predict a risk score based on mRNA expression profiles. SKI-Cox borrows the information generated by these additional types of omics data to guide variable selection, while wLASSO-Cox incorporates this information as a penalty factor during model fitting. We show that SKI-Cox and wLASSO-Cox models select more true variables than a LASSO-Cox model in simulation studies. We assess the performance of SKI-Cox and wLASSO-Cox using TCGA glioblastoma multiforme and lung adenocarcinoma data. In each case, mRNA expression, methylation, and copy number variation data are integrated to predict the overall survival time of cancer patients. Our methods achieve better performance in predicting patients' survival in glioblastoma and lung adenocarcinoma. Copyright © 2017. 
    Published by Elsevier Inc.

  15. Enhancing Hungarian Special Forces through Transformation -- The Shift to Special Operations Forces

    DTIC Science & Technology

    2010-06-01

    heteroskedasticity and the Ramsey RESET test . For the detailed regression results see Appendix B. Damodar N. Gujarati, Basic Econometrics , Third...96 Table 13. Ramsey RESET test using powers of the fitted values of DV1 (relative attitude toward HUNSF... Ramsey RESET test using powers of the fitted values of DV1 (relative attitude toward HUNSF) B. REGRESSION ANALYSIS

  16. Meteorological adjustment of yearly mean values for air pollutant concentration comparison

    NASA Technical Reports Server (NTRS)

    Sidik, S. M.; Neustadter, H. E.

    1976-01-01

    Using multiple linear regression analysis, models which estimate mean concentrations of Total Suspended Particulate (TSP), sulfur dioxide, and nitrogen dioxide as a function of several meteorologic variables, two rough economic indicators, and a simple trend in time are studied. Meteorologic data were obtained and do not include inversion heights. The goodness of fit of the estimated models is partially reflected by the squared coefficient of multiple correlation which indicates that, at the various sampling stations, the models accounted for about 23 to 47 percent of the total variance of the observed TSP concentrations. If the resulting model equations are used in place of simple overall means of the observed concentrations, there is about a 20 percent improvement in either: (1) predicting mean concentrations for specified meteorological conditions; or (2) adjusting successive yearly averages to allow for comparisons devoid of meteorological effects. An application to source identification is presented using regression coefficients of wind velocity predictor variables.

  17. Aircraft Anomaly Detection Using Performance Models Trained on Fleet Data

    NASA Technical Reports Server (NTRS)

    Gorinevsky, Dimitry; Matthews, Bryan L.; Martin, Rodney

    2012-01-01

    This paper describes an application of data mining technology called Distributed Fleet Monitoring (DFM) to Flight Operational Quality Assurance (FOQA) data collected from a fleet of commercial aircraft. DFM transforms the data into aircraft performance models, flight-to-flight trends, and individual flight anomalies by fitting a multi-level regression model to the data. The model represents aircraft flight performance and takes into account fixed effects: flight-to-flight and vehicle-to-vehicle variability. The regression parameters include aerodynamic coefficients and other aircraft performance parameters that are usually identified by aircraft manufacturers in flight tests. Using DFM, the multi-terabyte FOQA data set with half a million flights was processed in a few hours. The anomalies found include wrong values of computed variables (e.g., aircraft weight), sensor failures and biases, and failures, biases, and trends in flight actuators. These anomalies were missed by the existing airline monitoring of FOQA data exceedances.

  18. Confidence limits for data mining models of options prices

    NASA Astrophysics Data System (ADS)

    Healy, J. V.; Dixon, M.; Read, B. J.; Cai, F. F.

    2004-12-01

    Non-parametric methods such as artificial neural nets can successfully model prices of financial options, out-performing the Black-Scholes analytic model (Eur. Phys. J. B 27 (2002) 219). However, the accuracy of such approaches is usually expressed only by a global fitting/error measure. This paper describes a robust method for determining prediction intervals for models derived by non-linear regression. We have demonstrated it by application to a standard synthetic example (29th Annual Conference of the IEEE Industrial Electronics Society, Special Session on Intelligent Systems, pp. 1926-1931). The method is used here to obtain prediction intervals for option prices using market data for LIFFE “ESX” FTSE 100 index options (http://www.liffe.com/liffedata/contracts/month_onmonth.xls). We avoid special neural net architectures and use standard regression procedures to determine local error bars. The method is appropriate for target data with non-constant variance (or volatility).

  19. Estimating multilevel logistic regression models when the number of clusters is low: a comparison of different statistical software procedures.

    PubMed

    Austin, Peter C

    2010-04-22

    Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.

  20. Prognostic models for predicting posttraumatic seizures during acute hospitalization, and at 1 and 2 years following traumatic brain injury.

    PubMed

    Ritter, Anne C; Wagner, Amy K; Szaflarski, Jerzy P; Brooks, Maria M; Zafonte, Ross D; Pugh, Mary Jo V; Fabio, Anthony; Hammond, Flora M; Dreer, Laura E; Bushnik, Tamara; Walker, William C; Brown, Allen W; Johnson-Greene, Doug; Shea, Timothy; Krellman, Jason W; Rosenthal, Joseph A

    2016-09-01

    Posttraumatic seizures (PTS) are well-recognized acute and chronic complications of traumatic brain injury (TBI). Risk factors have been identified, but considerable variability in who develops PTS remains. Existing PTS prognostic models are not widely adopted for clinical use and do not reflect current trends in injury, diagnosis, or care. We aimed to develop and internally validate preliminary prognostic regression models to predict PTS during acute care hospitalization, and at year 1 and year 2 postinjury. Prognostic models predicting PTS during acute care hospitalization and year 1 and year 2 post-injury were developed using a recent (2011-2014) cohort from the TBI Model Systems National Database. Potential PTS predictors were selected based on previous literature and biologic plausibility. Bivariable logistic regression identified variables with a p-value < 0.20 that were used to fit initial prognostic models. Multivariable logistic regression modeling with backward-stepwise elimination was used to determine reduced prognostic models and to internally validate using 1,000 bootstrap samples. Fit statistics were calculated, correcting for overfitting (optimism). The prognostic models identified sex, craniotomy, contusion load, and pre-injury limitation in learning/remembering/concentrating as significant PTS predictors during acute hospitalization. Significant predictors of PTS at year 1 were subdural hematoma (SDH), contusion load, craniotomy, craniectomy, seizure during acute hospitalization, duration of posttraumatic amnesia, preinjury mental health treatment/psychiatric hospitalization, and preinjury incarceration. Year 2 significant predictors were similar to those of year 1: SDH, intraparenchymal fragment, craniotomy, craniectomy, seizure during acute hospitalization, and preinjury incarceration. Corrected concordance (C) statistics were 0.599, 0.747, and 0.716 for acute hospitalization, year 1, and year 2 models, respectively. 
The prognostic model for PTS during acute hospitalization did not discriminate well. Year 1 and year 2 models showed fair to good predictive validity for PTS. Cranial surgery, although medically necessary, requires ongoing research regarding potential benefits of increased monitoring for signs of epileptogenesis, PTS prophylaxis, and/or rehabilitation/social support. Future studies should externally validate models and determine clinical utility. Wiley Periodicals, Inc. © 2016 International League Against Epilepsy.

  1. Random regression models using Legendre orthogonal polynomials to evaluate the milk production of Alpine goats.

    PubMed

    Silva, F G; Torres, R A; Brito, L F; Euclydes, R F; Melo, A L P; Souza, N O; Ribeiro, J I; Rodrigues, M T

    2013-12-11

    The objective of this study was to identify the best random regression model using Legendre orthogonal polynomials to evaluate Alpine goats genetically and to estimate the parameters for test day milk yield. On the test day, we analyzed 20,710 records of milk yield of 667 goats from the Goat Sector of the Universidade Federal de Viçosa. The evaluated models had combinations of distinct fitting orders for polynomials (2-5), random genetic (1-7), and permanent environmental (1-7) fixed curves and a number of classes for residual variance (2, 4, 5, and 6). WOMBAT software was used for all genetic analyses. A random regression model using the best Legendre orthogonal polynomial for genetic evaluation of milk yield on the test day of Alpine goats considered a fixed curve of order 4, curve of genetic additive effects of order 2, curve of permanent environmental effects of order 7, and a minimum of 5 classes of residual variance because it was the most economical model among those that were equivalent to the complete model by the likelihood ratio test. Phenotypic variance and heritability were higher at the end of the lactation period, indicating that the length of lactation has more genetic components in relation to the production peak and persistence. It is very important that the evaluation utilizes the best combination of fixed, genetic additive and permanent environmental regressions, and number of classes of heterogeneous residual variance for genetic evaluation using random regression models, thereby enhancing the precision and accuracy of the estimates of parameters and prediction of genetic values.

  2. Estimating VO[subscript 2]max Using a Personalized Step Test

    ERIC Educational Resources Information Center

    Webb, Carrie; Vehrs, Pat R.; George, James D.; Hager, Ronald

    2014-01-01

    The purpose of this study was to develop a step test with a personalized step rate and step height to predict cardiorespiratory fitness in 80 college-aged males and females using the self-reported perceived functional ability scale and data collected during the step test. Multiple linear regression analysis yielded a model (R = 0.90, SEE = 3.43…

  3. Equations for Estimating Biomass of Herbaceous and Woody Vegetation in Early-Successional Southern Appalachian Pine-Hardwood Forests

    Treesearch

    Katherine J. Elliott; Barton D. Clinton

    1993-01-01

    Allometric equations were developed to predict aboveground dry weight of herbaceous and woody species on prescribe-burned sites in the Southern Appalachians. Best-fit least-squares regression models were developed using diameter, height, or both, as the independent variables and dry weight as the dependent variable. Coefficients of determination for the selected total...

  4. A Case for Transforming the Criterion of a Predictive Validity Study

    ERIC Educational Resources Information Center

    Patterson, Brian F.; Kobrin, Jennifer L.

    2011-01-01

    This study presents a case for applying a transformation (Box and Cox, 1964) of the criterion used in predictive validity studies. The goals of the transformation were to better meet the assumptions of the linear regression model and to reduce the residual variance of fitted (i.e., predicted) values. Using data for the 2008 cohort of first-time,…

  5. Solutions for Determining the Significance Region Using the Johnson-Neyman Type Procedure in Generalized Linear (Mixed) Models

    ERIC Educational Resources Information Center

    Lazar, Ann A.; Zerbe, Gary O.

    2011-01-01

    Researchers often compare the relationship between an outcome and covariate for two or more groups by evaluating whether the fitted regression curves differ significantly. When they do, researchers need to determine the "significance region," or the values of the covariate where the curves significantly differ. In analysis of covariance (ANCOVA),…

  6. An application of model-fitting procedures for marginal structural models.

    PubMed

    Mortimer, Kathleen M; Neugebauer, Romain; van der Laan, Mark; Tager, Ira B

    2005-08-15

    Marginal structural models (MSMs) are being used more frequently to obtain causal effect estimates in observational studies. Although the principal estimator of MSM coefficients has been the inverse probability of treatment weight (IPTW) estimator, there are few published examples that illustrate how to apply IPTW or discuss the impact of model selection on effect estimates. The authors applied IPTW estimation of an MSM to observational data from the Fresno Asthmatic Children's Environment Study (2000-2002) to evaluate the effect of asthma rescue medication use on pulmonary function and compared their results with those obtained through traditional regression methods. Akaike's Information Criterion and cross-validation methods were used to fit the MSM. In this paper, the influence of model selection and evaluation of key assumptions such as the experimental treatment assignment assumption are discussed in detail. Traditional analyses suggested that medication use was not associated with an improvement in pulmonary function--a finding that is counterintuitive and probably due to confounding by symptoms and asthma severity. The final MSM estimated that medication use was causally related to a 7% improvement in pulmonary function. The authors present examples that should encourage investigators who use IPTW estimation to undertake and discuss the impact of model-fitting procedures to justify the choice of the final weights.
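
    The weighting step behind IPTW can be illustrated with a toy point-treatment example. This is a hedged sketch, not the authors' analysis: the records are invented, treatment probabilities are estimated by simple stratification on one confounder rather than by a fitted model, and the weights are unstabilized.

```python
from collections import defaultdict

def iptw_weights(records):
    """records: (confounder_level, treated 0/1, outcome). Weight = 1 / P(A = a | L)."""
    counts = defaultdict(lambda: [0, 0])
    for level, treated, _ in records:
        counts[level][treated] += 1
    weights = []
    for level, treated, _ in records:
        p = counts[level][treated] / sum(counts[level])
        weights.append(1.0 / p)
    return weights

def weighted_mean_difference(records, weights):
    """Difference in weighted mean outcome, treated minus untreated."""
    totals = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    for (_, treated, outcome), w in zip(records, weights):
        totals[treated][0] += w * outcome
        totals[treated][1] += w
    return totals[1][0] / totals[1][1] - totals[0][0] / totals[0][1]

# Invented data: severe cases are treated more often and have worse outcomes,
# while treatment adds 2 units within each severity level.
records = ([("mild", 1, 12.0)] + [("mild", 0, 10.0)] * 3
           + [("severe", 1, 7.0)] * 3 + [("severe", 0, 5.0)])

naive = weighted_mean_difference(records, [1.0] * len(records))
adjusted = weighted_mean_difference(records, iptw_weights(records))
```

    In this toy data the unweighted comparison points the wrong way because of confounding by severity, while the weighted comparison recovers the within-stratum difference.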

  7. Work Demands-Burnout and Job Engagement-Job Satisfaction Relationships: Teamwork as a Mediator and Moderator.

    PubMed

    Mijakoski, Dragan; Karadzinska-Bislimovska, Jovanka; Basarovska, Vera; Minov, Jordan; Stoleski, Sasho; Angeleska, Nada; Atanasovska, Aneta

    2015-03-15

    Few studies have examined teamwork as a mediator and moderator of work demands-burnout and job engagement-job satisfaction relationships in healthcare workers (HCWs) in South-East Europe. This study aimed to assess the mediation and moderation effects of teamwork on the relationship between independent (work demands or job engagement) and dependent (burnout or job satisfaction) variables. Work demands, burnout, job engagement, and job satisfaction were measured with the Hospital Experience Scale, Maslach Burnout Inventory, Utrecht Work Engagement Scale, and Job Satisfaction Survey, respectively. The Hospital Survey on Patient Safety Culture was used to assess teamwork. To examine the role of teamwork as a mediating variable, we fit a series of regression models for burnout and job satisfaction. We also fit regression models predicting the outcome (burnout or job satisfaction) from the predictor (work demands or job engagement) and moderator (teamwork) variables. Teamwork was a partial mediator of the work demands-burnout relationship and a full mediator of the job engagement-job satisfaction relationship. Only the job engagement-job satisfaction relationship was moderated by teamwork. Occupational health services should target detection of burnout in HCWs and implementation of organizational interventions in hospitals, taking into account findings that teamwork predicted reduced burnout and higher job satisfaction.

  8. Feasibility of using a miniature NIR spectrometer to measure volumic mass during alcoholic fermentation.

    PubMed

    Fernández-Novales, Juan; López, María-Isabel; González-Caballero, Virginia; Ramírez, Pilar; Sánchez, María-Teresa

    2011-06-01

    Volumic mass, a key component of must quality control tests during alcoholic fermentation, is of great interest to the winemaking industry. Transmittance near-infrared (NIR) spectra of 124 must samples over the range of 200-1,100 nm were obtained using a miniature spectrometer. The performance of this instrument in predicting volumic mass was evaluated using partial least squares (PLS) regression and multiple linear regression (MLR). For the PLS and MLR equations developed to fit reference data for volumic mass and spectral data, the validation coefficients of determination (r²) were 0.98 (n = 31) and 0.96 (n = 31), and the standard errors of prediction (SEP) were 5.85 and 7.49 g/dm³, respectively. Comparison of results from MLR and PLS demonstrates that an MLR model with six significant wavelengths (P < 0.05) fit volumic mass data to transmittance (1/T) data slightly worse than a more sophisticated PLS model using the full scanning range. The results suggest that NIR spectroscopy is a suitable technique for predicting volumic mass during alcoholic fermentation, and that a low-cost NIR instrument can be used for this purpose.

  9. Bayesian semi-parametric analysis of Poisson change-point regression models: application to policy making in Cali, Colombia.

    PubMed

    Park, Taeyoung; Krafty, Robert T; Sánchez, Alvaro I

    2012-07-27

    A Poisson regression model with an offset assumes a constant baseline rate after accounting for measured covariates, which may lead to biased estimates of coefficients in an inhomogeneous Poisson process. To correctly estimate the effect of time-dependent covariates, we propose a Poisson change-point regression model with an offset that allows a time-varying baseline rate. When the nonconstant pattern of a log baseline rate is modeled with a nonparametric step function, the resulting semi-parametric model involves a model component of varying dimension and thus requires a sophisticated varying-dimensional inference to obtain correct estimates of model parameters of fixed dimension. To fit the proposed varying-dimensional model, we devise a state-of-the-art MCMC-type algorithm based on partial collapse. The proposed model and methods are used to investigate an association between daily homicide rates in Cali, Colombia and policies that restrict the hours during which the legal sale of alcoholic beverages is permitted. While simultaneously identifying the latent changes in the baseline homicide rate which correspond to the incidence of sociopolitical events, we explore the effect of policies governing the sale of alcohol on homicide rates and seek a policy that balances the economic and cultural dependencies on alcohol sales to the health of the public.

  10. A semi-nonparametric Poisson regression model for analyzing motor vehicle crash data.

    PubMed

    Ye, Xin; Wang, Ke; Zou, Yajie; Lord, Dominique

    2018-01-01

    This paper develops a semi-nonparametric Poisson regression model to analyze motor vehicle crash frequency data collected from rural multilane highway segments in California, US. Motor vehicle crash frequency on rural highway is a topic of interest in the area of transportation safety due to higher driving speeds and the resultant severity level. Unlike the traditional Negative Binomial (NB) model, the semi-nonparametric Poisson regression model can accommodate an unobserved heterogeneity following a highly flexible semi-nonparametric (SNP) distribution. Simulation experiments are conducted to demonstrate that the SNP distribution can well mimic a large family of distributions, including normal distributions, log-gamma distributions, bimodal and trimodal distributions. Empirical estimation results show that such flexibility offered by the SNP distribution can greatly improve model precision and the overall goodness-of-fit. The semi-nonparametric distribution can provide a better understanding of crash data structure through its ability to capture potential multimodality in the distribution of unobserved heterogeneity. When estimated coefficients in empirical models are compared, SNP and NB models are found to have a substantially different coefficient for the dummy variable indicating the lane width. The SNP model with better statistical performance suggests that the NB model overestimates the effect of lane width on crash frequency reduction by 83.1%.

  11. Modeling Data Containing Outliers using ARIMA Additive Outlier (ARIMA-AO)

    NASA Astrophysics Data System (ADS)

    Saleh Ahmar, Ansari; Guritno, Suryo; Abdurakhman; Rahman, Abdul; Awi; Alimuddin; Minggi, Ilham; Arif Tiro, M.; Kasim Aidid, M.; Annas, Suwardi; Utami Sutiksno, Dian; Ahmar, Dewi S.; Ahmar, Kurniawan H.; Abqary Ahmar, A.; Zaki, Ahmad; Abdullah, Dahlan; Rahim, Robbi; Nurdiyanto, Heri; Hidayat, Rahmat; Napitupulu, Darmawan; Simarmata, Janner; Kurniasih, Nuning; Andretti Abdillah, Leon; Pranolo, Andri; Haviluddin; Albra, Wahyudin; Arifin, A. Nurani M.

    2018-01-01

    This study addresses the detection and correction of data containing additive outliers (AO) in the ARIMA(p, d, q) model. Detection and correction use an iterative procedure popularized by Box, Jenkins, and Reinsel (1994). With this method, an ARIMA model is fitted to the data containing AO, and coefficients obtained from the iteration process by regression are added to the original ARIMA model. For simulated data containing AO, the initial model is ARIMA(2,0,0) with MSE = 36,780; after detection and correction by iteration, the model is ARIMA(2,0,0) with the regression-derived coefficients $Z_t = 0{,}106 + 0{,}204\,Z_{t-1} + 0{,}401\,Z_{t-2} - 329\,X_1(t) + 115\,X_2(t) + 35{,}9\,X_3(t)$ and MSE = 19,365. This indicates a reduction in the forecasting error.
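
    To make the structure of the fitted equation concrete, the sketch below evaluates a one-step prediction from an AR(2) model with three additive-outlier indicator terms. Several things here are assumptions: the coefficients are read from the abstract with its decimal commas interpreted as decimal points, each X_i(t) is taken to be 1 at the time of outlier i and 0 otherwise, and the outlier times and input values are hypothetical.

```python
# Coefficients as quoted in the abstract (decimal commas read as points).
CONST, AR1, AR2 = 0.106, 0.204, 0.401
AO_EFFECTS = [-329.0, 115.0, 35.9]

def predict(z_prev1, z_prev2, t, outlier_times):
    """One-step prediction Z_t from Z_{t-1}, Z_{t-2} and the outlier indicators."""
    # X_i(t) is 1 only at the time point where outlier i was detected.
    indicators = [1.0 if t == s else 0.0 for s in outlier_times]
    ao_term = sum(w * x for w, x in zip(AO_EFFECTS, indicators))
    return CONST + AR1 * z_prev1 + AR2 * z_prev2 + ao_term

# Hypothetical outlier positions and lagged values.
outlier_times = [20, 40, 60]
ordinary = predict(10.0, 5.0, t=3, outlier_times=outlier_times)
at_outlier = predict(10.0, 5.0, t=20, outlier_times=outlier_times)
```

    At an ordinary time point the indicator terms vanish and the prediction is the plain AR(2) value; at a flagged time point the corresponding outlier effect is added.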

  12. Statistical downscaling modeling with quantile regression using lasso to estimate extreme rainfall

    NASA Astrophysics Data System (ADS)

    Santri, Dewi; Wigena, Aji Hamim; Djuraidah, Anik

    2016-02-01

    Rainfall is a highly variable climatic element, and extreme rainfall in particular has many negative impacts. Methods are therefore required to minimize the damage that may occur. So far, global circulation models (GCM) are the best method for forecasting global climate changes, including extreme rainfall. Statistical downscaling (SD) is a technique for developing the relationship between GCM output, as global-scale independent variables, and rainfall, as a local-scale response variable. Assessing GCM output against observations is difficult because the GCM output is high-dimensional and its variables are multicollinear. The common methods used to handle this problem are principal components analysis (PCA) and partial least squares regression. A newer alternative is the lasso, which has the advantage of simultaneously controlling the variance of the fitted coefficients and performing automatic variable selection. Quantile regression is a method that can be used to detect extreme rainfall at both the dry and wet extremes. The objective of this study is to model SD using quantile regression with lasso to predict extreme rainfall in Indramayu. The results showed that extreme rainfall (extreme wet in January, February and December) in Indramayu could be predicted properly by the model at the 90th quantile.
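
    The objective that a lasso-penalized quantile regression minimizes can be written down compactly. The sketch below only evaluates that objective for given coefficients; it does not perform the optimization, and the function names, penalty value, and toy data are assumptions for illustration rather than anything taken from the study.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss for quantile level tau."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual

def lasso_quantile_objective(xs, ys, intercept, coefs, tau, penalty):
    """Mean pinball loss plus an L1 penalty on the slopes (intercept unpenalized)."""
    loss = 0.0
    for row, y in zip(xs, ys):
        fitted = intercept + sum(b * x for b, x in zip(coefs, row))
        loss += pinball_loss(y - fitted, tau)
    return loss / len(ys) + penalty * sum(abs(b) for b in coefs)

# Toy data: one predictor, two observations, evaluated at the 0.9 quantile.
value = lasso_quantile_objective(
    xs=[[1.0], [2.0]], ys=[3.0, 1.0],
    intercept=0.0, coefs=[1.0], tau=0.9, penalty=0.5,
)
```

    At tau = 0.9, under-prediction is penalized nine times more heavily than over-prediction, which is what pulls the fitted line toward the upper tail of the rainfall distribution.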

  13. Composite marginal quantile regression analysis for longitudinal adolescent body mass index data.

    PubMed

    Yang, Chi-Chuan; Chen, Yi-Hau; Chang, Hsing-Yi

    2017-09-20

    Childhood and adolescenthood overweight or obesity, which may be quantified through the body mass index (BMI), is strongly associated with adult obesity and other health problems. Motivated by the child and adolescent behaviors in long-term evolution (CABLE) study, we are interested in individual, family, and school factors associated with marginal quantiles of longitudinal adolescent BMI values. We propose a new method for composite marginal quantile regression analysis for longitudinal outcome data, which performs marginal quantile regressions at multiple quantile levels simultaneously. The proposed method extends the quantile regression coefficient modeling method introduced by Frumento and Bottai (Biometrics 2016; 72:74-84) to longitudinal data accounting suitably for the correlation structure in longitudinal observations. A goodness-of-fit test for the proposed modeling is also developed. Simulation results show that the proposed method can be much more efficient than the analysis without taking correlation into account and the analysis performing separate quantile regressions at different quantile levels. The application to the longitudinal adolescent BMI data from the CABLE study demonstrates the practical utility of our proposal. Copyright © 2017 John Wiley & Sons, Ltd.

  14. Rates and risk factors of injury in CrossFit™: a prospective cohort study.

    PubMed

    Moran, Sebastian; Booker, Harry; Staines, Jacob; Williams, Sean

    2017-09-01

    CrossFit™ is a strength and conditioning program that has gained widespread popularity since its inception approximately 15 years ago. However, at present little is known about the level of injury risk associated with this form of training. Movement competency, assessed using the Functional Movement Screen™ (FMS), has been identified as a risk factor for injury in numerous athletic populations, but its role in CrossFit participants is currently unclear. The aim of this study was to evaluate the level of injury risk associated with CrossFit training, and examine the influence of a number of potential risk factors (including movement competency). A cohort of 117 CrossFit participants were followed prospectively for 12 weeks. Participants' characteristics, previous injury history and training experience were recorded at baseline, and an FMS assessment was conducted. The overall injury incidence rate was 2.10 per 1000 training hours (90% confidence limits: 1.32-3.33). A multivariate Poisson regression model identified males (rate ratio [RR]: 4.44 ×/÷ 3.30, very likely harmful) and those with previous injuries (RR: 2.35 ×/÷ 2.37, likely harmful) as having a higher injury risk. Inferences relating to FMS variables were unclear in the multivariate model, although number of asymmetries was a clear risk factor in a univariate model (RR per two additional asymmetries: 2.62 ×/÷ 1.53, likely harmful). The injury incidence rate associated with CrossFit training was low, and comparable to other forms of recreational fitness activities. Previous injury and gender were identified as risk factors for injury, whilst the role of movement competency in this setting warrants further investigation.
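
    The "×/÷" notation above can be unpacked with a little arithmetic. The sketch below assumes the usual multiplicative reading, in which the lower limit is the estimate divided by the factor and the upper limit is the estimate multiplied by it; that reading and the function names are assumptions, not something the abstract states.

```python
import math


def limits_from_factor(estimate, factor):
    """Lower and upper confidence limits implied by a ×/÷ factor."""
    return estimate / factor, estimate * factor


def factor_from_limits(lower, upper):
    """×/÷ factor implied by a pair of confidence limits."""
    return math.sqrt(upper / lower)


# Values quoted in the abstract.
print(limits_from_factor(4.44, 3.30))  # rate ratio for males
print(limits_from_factor(2.35, 2.37))  # rate ratio for previous injury
print(factor_from_limits(1.32, 3.33))  # factor implied by the overall rate's limits
```

    Under this reading, the factor implied by the limits 1.32-3.33, applied to the point estimate of 2.10, roughly reproduces those same limits, which suggests the interval is symmetric on the log scale.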

  15. A Combined SRTM Digital Elevation Model for Zanjan State of Iran Based on the Corrective Surface Idea

    NASA Astrophysics Data System (ADS)

    Kiamehr, Ramin

    2016-04-01

    A one arc-second, high-resolution version of the SRTM model was recently published for Iran in the US Geological Survey database. Digital Elevation Models (DEMs) are widely used by geoscientists in different disciplines and applications. A DEM is essential data in the geoid computation procedure, e.g., to determine the topographic, downward continuation (DWC) and atmospheric corrections. It can also be used in road location and design in civil engineering and in hydrological analysis. However, a DEM is only a model of the elevation surface and is subject to errors. The most important part of the error could come from the bias in the height datum. On the other hand, the accuracy of a DEM is usually published in a global sense, and it is important to have an estimate of the accuracy in the area of interest before using it. One of the best ways to obtain a reasonable indication of the accuracy of a DEM is to compare its heights against precise national GPS/levelling data. This can be done by determining the Root-Mean-Square (RMS) of fit between the DEM and levelling heights. The errors in the DEM can be approximated by different kinds of functions in order to fit the DEMs to a set of GPS/levelling data using least squares adjustment. In the current study, several models, ranging from a simple linear regression to a seven-parameter similarity transformation model, are used in the fitting procedure. The seven-parameter model gives the best fit, with the minimum standard deviation, in all selected DEMs in the study area. Based on 35 precise GPS/levelling data points, we obtain an RMS of 5.5 m for the seven-parameter fit of the SRTM DEM. The corrective surface model is generated from the transformation parameters and added to the original SRTM model. The fit of the combined model is then estimated again using independent GPS/levelling data. The result shows a great improvement in the absolute accuracy of the model, with a standard deviation of 3.4 m.
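
    The fitting idea can be sketched with the simplest of the models the abstract mentions, a linear regression of the DEM-minus-levelling height differences on one coordinate, followed by the RMS of the residuals. This is a minimal illustration only: it is not the seven-parameter similarity transformation, and the checkpoint values below are hypothetical.

```python
import math


def rms(values):
    """Root-mean-square of a sequence of height differences."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def fit_line(x, d):
    """Ordinary least-squares fit d ≈ a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_d = sum(d) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxd = sum((xi - mean_x) * (di - mean_d) for xi, di in zip(x, d))
    b = sxd / sxx
    return mean_d - b * mean_x, b


# Hypothetical checkpoints: easting in km, DEM height minus GPS/levelling height in m.
easting = [0.0, 10.0, 20.0, 30.0, 40.0]
diff = [4.0, 5.5, 5.0, 7.5, 8.0]

a, b = fit_line(easting, diff)
# Residuals after subtracting the fitted corrective trend from the differences.
residuals = [di - (a + b * xi) for xi, di in zip(easting, diff)]
print(rms(diff), rms(residuals))
```

    Comparing the two RMS values shows how much of the raw discrepancy is a systematic bias and tilt that a corrective surface can absorb; a check on independent points, as in the abstract, is still needed to judge the corrected model.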

  16. Preseason Aerobic Fitness Predicts In-Season Injury and Illness in Female Youth Athletes.

    PubMed

    Watson, Andrew; Brickson, Stacey; Brooks, M Alison; Dunn, Warren

    2017-09-01

    Although preseason aerobic fitness has been suggested as a modifiable risk factor for injury in adult athletes, the relationship between aerobic fitness, injury, and illness in youth athletes is unknown. To determine whether preseason aerobic fitness predicts in-season injury and illness risk in female adolescent soccer players. Case-control study; Level of evidence, 3. Fifty-four female adolescent soccer players underwent preseason evaluation to determine years of experience, body mass index (BMI), maximal aerobic capacity (VO2max), and time to exhaustion (Tmax) during cycle ergometer testing. All injuries and illnesses during the subsequent 20-week season were recorded. Variables were compared between individuals with and without a self-reported injury and individuals with and without a self-reported illness. Separate Poisson regression models were developed to predict number of injuries and illnesses for each individual by use of age, years of experience, BMI, VO2max, and Tmax. Twenty-eight injuries and 38 illnesses in 23 individuals were recorded during the season. Although not a statistically significant finding, individuals who reported an in-season injury had lower VO2max than those who did not (54.9 ± 7.3 vs 58.3 ± 8.5 mL/kg/min, P = .13). Individuals who reported an illness had significantly lower VO2max than those who did not (54.5 ± 9.9 vs 58.8 ± 6.2 mL/kg/min, P = .014). With the Poisson regression models, VO2max was a significant predictor of both injury (odds ratio [OR], 0.95; P = .046) and illness (OR, 0.94; P = .009), while no significant relationships were identified between injury or illness and age, years of experience, Tmax, or BMI (all P > .05). Among adolescent female soccer players, greater preseason aerobic fitness is associated with a reduced risk of in-season injury and illness. Off-season intervention to promote aerobic fitness may help reduce the risk of lost time during the season due to injury and illness.

  17. Parametric correlation functions to model the structure of permanent environmental (co)variances in milk yield random regression models.

    PubMed

    Bignardi, A B; El Faro, L; Cardoso, V L; Machado, P F; Albuquerque, L G

    2009-09-01

    The objective of the present study was to estimate milk yield genetic parameters applying random regression models and parametric correlation functions combined with a variance function to model animal permanent environmental effects. A total of 152,145 test-day milk yields from 7,317 first lactations of Holstein cows belonging to herds located in the southeastern region of Brazil were analyzed. Test-day milk yields were divided into 44 weekly classes of days in milk. Contemporary groups were defined by herd-test-day comprising a total of 2,539 classes. The model included direct additive genetic, permanent environmental, and residual random effects. The following fixed effects were considered: contemporary group, age of cow at calving (linear and quadratic regressions), and the population average lactation curve modeled by fourth-order orthogonal Legendre polynomial. Additive genetic effects were modeled by random regression on orthogonal Legendre polynomials of days in milk, whereas permanent environmental effects were estimated using a stationary or nonstationary parametric correlation function combined with a variance function of different orders. The structure of residual variances was modeled using a step function containing 6 variance classes. The genetic parameter estimates obtained with the model using a stationary correlation function associated with a variance function to model permanent environmental effects were similar to those obtained with models employing orthogonal Legendre polynomials for the same effect. A model using a sixth-order polynomial for additive effects and a stationary parametric correlation function associated with a seventh-order variance function to model permanent environmental effects would be sufficient for data fitting.
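
    Random regression on Legendre polynomials needs the polynomial values at each test day. The sketch below maps days in milk onto [-1, 1] and evaluates the unnormalized polynomials by Bonnet's recurrence. The day range of 5 to 305 is a hypothetical choice, the abstract does not say whether "order" counts coefficients or the highest degree (the code treats it as the highest degree), and the normalization often applied in this literature is omitted.

```python
def standardize(t, t_min, t_max):
    """Map a day in milk t from [t_min, t_max] onto [-1, 1]."""
    return 2.0 * (t - t_min) / (t_max - t_min) - 1.0


def legendre_basis(x, order):
    """Values P_0(x) .. P_order(x) of the unnormalized Legendre polynomials."""
    values = [1.0, x]
    for n in range(1, order):
        # Bonnet's recurrence: (n + 1) P_{n+1} = (2n + 1) x P_n - n P_{n-1}
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]


# Hypothetical lactation range of 5 to 305 days in milk.
x = standardize(230.0, 5.0, 305.0)
print(x, legendre_basis(x, 4))
```

    Each animal's curve is then a linear combination of these basis values, with the combination weights treated as random effects.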

  18. APTES-modified mesoporous silicas as the carriers for poorly water-soluble drug. Modeling of diflunisal adsorption and release

    NASA Astrophysics Data System (ADS)

    Geszke-Moritz, Małgorzata; Moritz, Michał

    2016-04-01

    Four mesoporous siliceous materials such as SBA-16, SBA-15, PHTS and MCF functionalized with (3-aminopropyl)triethoxysilane were successfully prepared and applied as the carriers for poorly water-soluble drug diflunisal. Several techniques including nitrogen sorption analysis, XRD, TEM, FTIR and thermogravimetric analysis were employed to characterize mesoporous matrices. Adsorption isotherms were analyzed using Langmuir, Freundlich, Temkin and Dubinin-Radushkevich models. In order to find the best-fit isotherm for each model, both linear and nonlinear regressions were carried out. The equilibrium data were best fitted by the Langmuir isotherm model revealing maximum adsorption capacity of 217.4 mg/g for aminopropyl group-modified SBA-15. The negative values of Gibbs free energy change indicated that the adsorption of diflunisal is a spontaneous process. Weibull release model was employed to describe the dissolution profile of diflunisal. At pH 4.5 all prepared mesoporous matrices exhibited the improvement of drug dissolution kinetics as compared to the dissolution rate of pure diflunisal.
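
    The linear-regression route to a Langmuir fit can be sketched as follows. The Langmuir isotherm q_e = q_max·K_L·C_e / (1 + K_L·C_e) rearranges to C_e/q_e = 1/(q_max·K_L) + C_e/q_max, so a straight-line fit of C_e/q_e against C_e gives q_max from the slope and K_L from the slope divided by the intercept. The data below are synthetic and noise-free, generated from the q_max of 217.4 mg/g quoted in the abstract and a hypothetical K_L of 0.05 L/mg; the concentrations are hypothetical as well.

```python
def langmuir(c_e, q_max, k_l):
    """Langmuir isotherm: adsorbed amount at equilibrium concentration c_e."""
    return q_max * k_l * c_e / (1.0 + k_l * c_e)


def fit_langmuir_linearized(c_e, q_e):
    """Fit C_e/q_e = 1/(q_max*K_L) + C_e/q_max by least squares; returns (q_max, K_L)."""
    x = list(c_e)
    y = [c / q for c, q in zip(c_e, q_e)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 1.0 / slope, slope / intercept


# Synthetic equilibrium data (hypothetical concentrations in mg/L).
conc = [5.0, 10.0, 20.0, 40.0, 80.0]
adsorbed = [langmuir(c, 217.4, 0.05) for c in conc]

q_max_hat, k_l_hat = fit_langmuir_linearized(conc, adsorbed)
print(q_max_hat, k_l_hat)
```

    With noisy measurements the linearized and nonlinear fits generally give different parameter estimates, because the transformation reweights the errors; that is one reason to carry out both, as the abstract describes.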

  19. Modelling of different enzyme productions by solid-state fermentation on several agro-industrial residues.

    PubMed

    Diaz, Ana Belen; Blandino, Ana; Webb, Colin; Caro, Ildefonso

    2016-11-01

    A simple kinetic model, with only three fitting parameters, for several enzyme productions in Petri dishes by solid-state fermentation is proposed in this paper, which may be a valuable tool for simulation of this type of processes. Basically, the model is able to predict temporal fungal enzyme production by solid-state fermentation on complex substrates, maximum enzyme activity expected and time at which these maxima are reached. In this work, several fermentations in solid state were performed in Petri dishes, using four filamentous fungi grown on different agro-industrial residues, measuring xylanase, exo-polygalacturonase, cellulase and laccase activities over time. Regression coefficients after fitting experimental data to the proposed model turned out to be quite high in all cases. In fact, these results are very interesting considering, on the one hand, the simplicity of the model and, on the other hand, that enzyme activities correspond to different enzymes, produced by different fungi on different substrates.

  20. Predicting clicks of PubMed articles.

    PubMed

    Mao, Yuqing; Lu, Zhiyong

    2013-01-01

    Predicting the popularity or access usage of an article has the potential to improve the quality of PubMed searches. We can model the click trend of each article as its access changes over time by mining the PubMed query logs, which contain the previous access history for all articles. In this article, we examine the access patterns produced by PubMed users in two years (July 2009 to July 2011). We explore the time series of accesses for each article in the query logs, model the trends with regression approaches, and subsequently use the models for prediction. We show that the click trends of PubMed articles are best fitted with a log-normal regression model. This model allows the number of accesses an article receives and the time since it first becomes available in PubMed to be related via quadratic and logistic functions, with the model parameters to be estimated via maximum likelihood. Our experiments predicting the number of accesses for an article based on its past usage demonstrate that the mean absolute error and mean absolute percentage error of our model are 4.0% and 8.1% lower than the power-law regression model, respectively. The log-normal distribution is also shown to perform significantly better than a previous prediction method based on a human memory theory in cognitive science. This work warrants further investigation on the utility of such a log-normal regression approach towards improving information access in PubMed.
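
    The two error measures used for the model comparison can be computed as in the sketch below. The access counts are hypothetical and serve only to show the arithmetic; the function names are illustrative.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error relative to the observed value, in percent."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical observed and predicted access counts for four articles.
observed = [100.0, 50.0, 20.0, 10.0]
forecast = [90.0, 55.0, 25.0, 8.0]

print(mean_absolute_error(observed, forecast))
print(mean_absolute_percentage_error(observed, forecast))
```

    The percentage error weights a miss on a rarely accessed article as heavily as a proportionally equal miss on a popular one, so the two measures can rank models differently.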
