Tokunaga, Makoto; Watanabe, Susumu; Sonoda, Shigeru
2017-09-01
Multiple linear regression analysis is often used to predict the outcome of stroke rehabilitation. However, the predictive accuracy may not be satisfactory. The objective of this study was to elucidate the predictive accuracy of a method of calculating motor Functional Independence Measure (mFIM) at discharge from mFIM effectiveness predicted by multiple regression analysis. The subjects were 505 patients with stroke who were hospitalized in a convalescent rehabilitation hospital. The formula "mFIM at discharge = mFIM effectiveness × (91 points - mFIM at admission) + mFIM at admission" was used. By including the predicted mFIM effectiveness obtained through multiple regression analysis in this formula, we obtained the predicted mFIM at discharge (A). We also used multiple regression analysis to directly predict mFIM at discharge (B). The correlation between the predicted and the measured values of mFIM at discharge was compared between A and B. The correlation coefficients were .916 for A and .878 for B. Calculating mFIM at discharge from mFIM effectiveness predicted by multiple regression analysis had a higher degree of predictive accuracy of mFIM at discharge than that directly predicted. Copyright © 2017 National Stroke Association. Published by Elsevier Inc. All rights reserved.
Exact Analysis of Squared Cross-Validity Coefficient in Predictive Regression Models
ERIC Educational Resources Information Center
Shieh, Gwowen
2009-01-01
In regression analysis, the notion of population validity is of theoretical interest for describing the usefulness of the underlying regression model, whereas the presumably more important concept of population cross-validity represents the predictive effectiveness for the regression equation in future research. It appears that the inference…
Precision Efficacy Analysis for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.
When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross- validity approach to select sample sizes…
Multivariate Regression Analysis and Slaughter Livestock,
AGRICULTURE, *ECONOMICS), (*MEAT, PRODUCTION), MULTIVARIATE ANALYSIS, REGRESSION ANALYSIS , ANIMALS, WEIGHT, COSTS, PREDICTIONS, STABILITY, MATHEMATICAL MODELS, STORAGE, BEEF, PORK, FOOD, STATISTICAL DATA, ACCURACY
Poisson Mixture Regression Models for Heart Disease Prediction.
Mufudza, Chipo; Erol, Hamza
2016-01-01
Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model.
Poisson Mixture Regression Models for Heart Disease Prediction
Erol, Hamza
2016-01-01
Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model. PMID:27999611
Regression: The Apple Does Not Fall Far From the Tree.
Vetter, Thomas R; Schober, Patrick
2018-05-15
Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.
Suzuki, Hideaki; Tabata, Takahisa; Koizumi, Hiroki; Hohchi, Nobusuke; Takeuchi, Shoko; Kitamura, Takuro; Fujino, Yoshihisa; Ohbuchi, Toyoaki
2014-12-01
This study aimed to create a multiple regression model for predicting hearing outcomes of idiopathic sudden sensorineural hearing loss (ISSNHL). The participants were 205 consecutive patients (205 ears) with ISSNHL (hearing level ≥ 40 dB, interval between onset and treatment ≤ 30 days). They received systemic steroid administration combined with intratympanic steroid injection. Data were examined by simple and multiple regression analyses. Three hearing indices (percentage hearing improvement, hearing gain, and posttreatment hearing level [HLpost]) and 7 prognostic factors (age, days from onset to treatment, initial hearing level, initial hearing level at low frequencies, initial hearing level at high frequencies, presence of vertigo, and contralateral hearing level) were included in the multiple regression analysis as dependent and explanatory variables, respectively. In the simple regression analysis, the percentage hearing improvement, hearing gain, and HLpost showed significant correlation with 2, 5, and 6 of the 7 prognostic factors, respectively. The multiple correlation coefficients were 0.396, 0.503, and 0.714 for the percentage hearing improvement, hearing gain, and HLpost, respectively. Predicted values of HLpost calculated by the multiple regression equation were reliable with 70% probability with a 40-dB-width prediction interval. Prediction of HLpost by the multiple regression model may be useful to estimate the hearing prognosis of ISSNHL. © The Author(s) 2014.
Anantha M. Prasad; Louis R. Iverson; Andy Liaw; Andy Liaw
2006-01-01
We evaluated four statistical models - Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS) - for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.
Deng, Yingyuan; Wang, Tianfu; Chen, Siping; Liu, Weixiang
2017-01-01
The aim of the study is to screen the significant sonographic features by logistic regression analysis and fit a model to diagnose thyroid nodules. A total of 525 pathological thyroid nodules were retrospectively analyzed. All the nodules underwent conventional ultrasonography (US), strain elastosonography (SE), and contrast -enhanced ultrasound (CEUS). Those nodules’ 12 suspicious sonographic features were used to assess thyroid nodules. The significant features of diagnosing thyroid nodules were picked out by logistic regression analysis. All variables that were statistically related to diagnosis of thyroid nodules, at a level of p < 0.05 were embodied in a logistic regression analysis model. The significant features in the logistic regression model of diagnosing thyroid nodules were calcification, suspected cervical lymph node metastasis, hypoenhancement pattern, margin, shape, vascularity, posterior acoustic, echogenicity, and elastography score. According to the results of logistic regression analysis, the formula that could predict whether or not thyroid nodules are malignant was established. The area under the receiver operating curve (ROC) was 0.930 and the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 83.77%, 89.56%, 87.05%, 86.04%, and 87.79% respectively. PMID:29228030
Pang, Tiantian; Huang, Leidan; Deng, Yingyuan; Wang, Tianfu; Chen, Siping; Gong, Xuehao; Liu, Weixiang
2017-01-01
The aim of the study is to screen the significant sonographic features by logistic regression analysis and fit a model to diagnose thyroid nodules. A total of 525 pathological thyroid nodules were retrospectively analyzed. All the nodules underwent conventional ultrasonography (US), strain elastosonography (SE), and contrast -enhanced ultrasound (CEUS). Those nodules' 12 suspicious sonographic features were used to assess thyroid nodules. The significant features of diagnosing thyroid nodules were picked out by logistic regression analysis. All variables that were statistically related to diagnosis of thyroid nodules, at a level of p < 0.05 were embodied in a logistic regression analysis model. The significant features in the logistic regression model of diagnosing thyroid nodules were calcification, suspected cervical lymph node metastasis, hypoenhancement pattern, margin, shape, vascularity, posterior acoustic, echogenicity, and elastography score. According to the results of logistic regression analysis, the formula that could predict whether or not thyroid nodules are malignant was established. The area under the receiver operating curve (ROC) was 0.930 and the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 83.77%, 89.56%, 87.05%, 86.04%, and 87.79% respectively.
Procedures for adjusting regional regression models of urban-runoff quality using local data
Hoos, A.B.; Sisolak, J.K.
1993-01-01
Statistical operations termed model-adjustment procedures (MAP?s) can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each MAP is a form of regression analysis in which the local data base is used as a calibration data set. Regression coefficients are determined from the local data base, and the resulting `adjusted? regression models can then be used to predict storm-runoff quality at unmonitored sites. The response variable in the regression analyses is the observed load or mean concentration of a constituent in storm runoff for a single storm. The set of explanatory variables used in the regression analyses is different for each MAP, but always includes the predicted value of load or mean concentration from a regional regression model. The four MAP?s examined in this study were: single-factor regression against the regional model prediction, P, (termed MAP-lF-P), regression against P,, (termed MAP-R-P), regression against P, and additional local variables (termed MAP-R-P+nV), and a weighted combination of P, and a local-regression prediction (termed MAP-W). The procedures were tested by means of split-sample analysis, using data from three cities included in the Nationwide Urban Runoff Program: Denver, Colorado; Bellevue, Washington; and Knoxville, Tennessee. The MAP that provided the greatest predictive accuracy for the verification data set differed among the three test data bases and among model types (MAP-W for Denver and Knoxville, MAP-lF-P and MAP-R-P for Bellevue load models, and MAP-R-P+nV for Bellevue concentration models) and, in many cases, was not clearly indicated by the values of standard error of estimate for the calibration data set. A scheme to guide MAP selection, based on exploratory data analysis of the calibration data set, is presented and tested. The MAP?s were tested for sensitivity to the size of a calibration data set. As expected, predictive accuracy of all MAP?s for the verification data set decreased as the calibration data-set size decreased, but predictive accuracy was not as sensitive for the MAP?s as it was for the local regression models.
Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon
2015-01-01
Background: Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. Methods: In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. Results: The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Conclusion: Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended. PMID:26793655
Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon
2015-01-01
Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended.
Sampson, Maureen L; Gounden, Verena; van Deventer, Hendrik E; Remaley, Alan T
2016-02-01
The main drawback of the periodic analysis of quality control (QC) material is that test performance is not monitored in time periods between QC analyses, potentially leading to the reporting of faulty test results. The objective of this study was to develop a patient based QC procedure for the more timely detection of test errors. Results from a Chem-14 panel measured on the Beckman LX20 analyzer were used to develop the model. Each test result was predicted from the other 13 members of the panel by multiple regression, which resulted in correlation coefficients between the predicted and measured result of >0.7 for 8 of the 14 tests. A logistic regression model, which utilized the measured test result, the predicted test result, the day of the week and time of day, was then developed for predicting test errors. The output of the logistic regression was tallied by a daily CUSUM approach and used to predict test errors, with a fixed specificity of 90%. The mean average run length (ARL) before error detection by CUSUM-Logistic Regression (CSLR) was 20 with a mean sensitivity of 97%, which was considerably shorter than the mean ARL of 53 (sensitivity 87.5%) for a simple prediction model that only used the measured result for error detection. A CUSUM-Logistic Regression analysis of patient laboratory data can be an effective approach for the rapid and sensitive detection of clinical laboratory errors. Published by Elsevier Inc.
Vidyasagar, Mathukumalli
2015-01-01
This article reviews several techniques from machine learning that can be used to study the problem of identifying a small number of features, from among tens of thousands of measured features, that can accurately predict a drug response. Prediction problems are divided into two categories: sparse classification and sparse regression. In classification, the clinical parameter to be predicted is binary, whereas in regression, the parameter is a real number. Well-known methods for both classes of problems are briefly discussed. These include the SVM (support vector machine) for classification and various algorithms such as ridge regression, LASSO (least absolute shrinkage and selection operator), and EN (elastic net) for regression. In addition, several well-established methods that do not directly fall into machine learning theory are also reviewed, including neural networks, PAM (pattern analysis for microarrays), SAM (significance analysis for microarrays), GSEA (gene set enrichment analysis), and k-means clustering. Several references indicative of the application of these methods to cancer biology are discussed.
Regression Simulation Model. Appendix X. Users Manual,
1981-03-01
change as the prediction equations become refined. Whereas no notice will be provided when the changes are made, the programs will be modified such that...NATIONAL BUREAU Of STANDARDS 1963 A ___,_ __ _ __ _ . APPENDIX X ( R4/ EGRESSION IMULATION ’jDEL. Ape’A ’) 7 USERS MANUA submitted to The Great River...regression analysis and to establish a prediction equation (model). The prediction equation contains the partial regression coefficients (B-weights) which
Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga
2006-08-01
A quantitative-structure activity relationship was obtained by applying Multiple Linear Regression Analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio) thymine (HEPT) derivatives with significant anti-HIV activity. For the selection of the best among 37 different descriptors, the Elimination Selection Stepwise Regression Method (ES-SWR) was utilized. The resulting QSAR model (R (2) (CV) = 0.8160; S (PRESS) = 0.5680) proved to be very accurate both in training and predictive stages.
Prediction of sweetness and amino acid content in soybean crops from hyperspectral imagery
NASA Astrophysics Data System (ADS)
Monteiro, Sildomar Takahashi; Minekawa, Yohei; Kosugi, Yukio; Akazawa, Tsuneya; Oda, Kunio
Hyperspectral image data provides a powerful tool for non-destructive crop analysis. This paper investigates a hyperspectral image data-processing method to predict the sweetness and amino acid content of soybean crops. Regression models based on artificial neural networks were developed in order to calculate the level of sucrose, glucose, fructose, and nitrogen concentrations, which can be related to the sweetness and amino acid content of vegetables. A performance analysis was conducted comparing regression models obtained using different preprocessing methods, namely, raw reflectance, second derivative, and principal components analysis. This method is demonstrated using high-resolution hyperspectral data of wavelengths ranging from the visible to the near infrared acquired from an experimental field of green vegetable soybeans. The best predictions were achieved using a nonlinear regression model of the second derivative transformed dataset. Glucose could be predicted with greater accuracy, followed by sucrose, fructose and nitrogen. The proposed method provides the possibility to provide relatively accurate maps predicting the chemical content of soybean crop fields.
Chen, Wei-Hsin; Hsu, Hung-Jen; Kumar, Gopalakrishnan; Budzianowski, Wojciech M; Ong, Hwai Chyuan
2017-12-01
This study focuses on the biochar formation and torrefaction performance of sugarcane bagasse, and they are predicted using the bilinear interpolation (BLI), inverse distance weighting (IDW) interpolation, and regression analysis. It is found that the biomass torrefied at 275°C for 60min or at 300°C for 30min or longer is appropriate to produce biochar as alternative fuel to coal with low carbon footprint, but the energy yield from the torrefaction at 300°C is too low. From the biochar yield, enhancement factor of HHV, and energy yield, the results suggest that the three methods are all feasible for predicting the performance, especially for the enhancement factor. The power parameter of unity in the IDW method provides the best predictions and the error is below 5%. The second order in regression analysis gives a more reasonable approach than the first order, and is recommended for the predictions. Copyright © 2017 Elsevier Ltd. All rights reserved.
Modeling time-to-event (survival) data using classification tree analysis.
Linden, Ariel; Yarnold, Paul R
2017-12-01
Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. © 2017 John Wiley & Sons, Ltd.
Prediction models for clustered data: comparison of a random intercept and standard regression model
2013-01-01
Background When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Methods Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. Results The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. Conclusion The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters. PMID:23414436
Bouwmeester, Walter; Twisk, Jos W R; Kappen, Teus H; van Klei, Wilton A; Moons, Karel G M; Vergouwe, Yvonne
2013-02-15
When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters.
Multivariate Analysis of Seismic Field Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alam, M. Kathleen
1999-06-01
This report includes the details of the model building procedure and prediction of seismic field data. Principal Components Regression, a multivariate analysis technique, was used to model seismic data collected as two pieces of equipment were cycled on and off. Models built that included only the two pieces of equipment of interest had trouble predicting data containing signals not included in the model. Evidence for poor predictions came from the prediction curves as well as spectral F-ratio plots. Once the extraneous signals were included in the model, predictions improved dramatically. While Principal Components Regression performed well for the present datamore » sets, the present data analysis suggests further work will be needed to develop more robust modeling methods as the data become more complex.« less
NASA Astrophysics Data System (ADS)
Haddad, Khaled; Rahman, Ataur; A Zaman, Mohammad; Shrestha, Surendra
2013-03-01
SummaryIn regional hydrologic regression analysis, model selection and validation are regarded as important steps. Here, the model selection is usually based on some measurements of goodness-of-fit between the model prediction and observed data. In Regional Flood Frequency Analysis (RFFA), leave-one-out (LOO) validation or a fixed percentage leave out validation (e.g., 10%) is commonly adopted to assess the predictive ability of regression-based prediction equations. This paper develops a Monte Carlo Cross Validation (MCCV) technique (which has widely been adopted in Chemometrics and Econometrics) in RFFA using Generalised Least Squares Regression (GLSR) and compares it with the most commonly adopted LOO validation approach. The study uses simulated and regional flood data from the state of New South Wales in Australia. It is found that when developing hydrologic regression models, application of the MCCV is likely to result in a more parsimonious model than the LOO. It has also been found that the MCCV can provide a more realistic estimate of a model's predictive ability when compared with the LOO.
NASA Astrophysics Data System (ADS)
Mitra, Ashis; Majumdar, Prabal Kumar; Bannerjee, Debamalya
2013-03-01
This paper presents a comparative analysis of two modeling methodologies for the prediction of air permeability of plain woven handloom cotton fabrics. Four basic fabric constructional parameters namely ends per inch, picks per inch, warp count and weft count have been used as inputs for artificial neural network (ANN) and regression models. Out of the four regression models tried, interaction model showed very good prediction performance with a meager mean absolute error of 2.017 %. However, ANN models demonstrated superiority over the regression models both in terms of correlation coefficient and mean absolute error. The ANN model with 10 nodes in the single hidden layer showed very good correlation coefficient of 0.982 and 0.929 and mean absolute error of only 0.923 and 2.043 % for training and testing data respectively.
Population heterogeneity in the salience of multiple risk factors for adolescent delinquency.
Lanza, Stephanie T; Cooper, Brittany R; Bray, Bethany C
2014-03-01
To present mixture regression analysis as an alternative to more standard regression analysis for predicting adolescent delinquency. We demonstrate how mixture regression analysis allows for the identification of population subgroups defined by the salience of multiple risk factors. We identified population subgroups (i.e., latent classes) of individuals based on their coefficients in a regression model predicting adolescent delinquency from eight previously established risk indices drawn from the community, school, family, peer, and individual levels. The study included N = 37,763 10th-grade adolescents who participated in the Communities That Care Youth Survey. Standard, zero-inflated, and mixture Poisson and negative binomial regression models were considered. Standard and mixture negative binomial regression models were selected as optimal. The five-class regression model was interpreted based on the class-specific regression coefficients, indicating that risk factors had varying salience across classes of adolescents. Standard regression showed that all risk factors were significantly associated with delinquency. Mixture regression provided more nuanced information, suggesting a unique set of risk factors that were salient for different subgroups of adolescents. Implications for the design of subgroup-specific interventions are discussed. Copyright © 2014 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.
Prediction by regression and intrarange data scatter in surface-process studies
Toy, T.J.; Osterkamp, W.R.; Renard, K.G.
1993-01-01
Modeling is a major component of contemporary earth science, and regression analysis occupies a central position in the parameterization, calibration, and validation of geomorphic and hydrologic models. Although this methodology can be used in many ways, we are primarily concerned with the prediction of values for one variable from another variable. Examination of the literature reveals considerable inconsistency in the presentation of the results of regression analysis and the occurrence of patterns in the scatter of data points about the regression line. Both circumstances confound utilization and evaluation of the models. Statisticians are well aware of various problems associated with the use of regression analysis and offer improved practices; often, however, their guidelines are not followed. After a review of the aforementioned circumstances and until standard criteria for model evaluation become established, we recommend, as a minimum, inclusion of scatter diagrams, the standard error of the estimate, and sample size in reporting the results of regression analyses for most surface-process studies. ?? 1993 Springer-Verlag.
The microcomputer scientific software series 2: general linear model--regression.
Harold M. Rauscher
1983-01-01
The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...
Common pitfalls in statistical analysis: Linear regression analysis
Aggarwal, Rakesh; Ranganathan, Priya
2017-01-01
In a previous article in this series, we explained correlation analysis which describes the strength of relationship between two continuous variables. In this article, we deal with linear regression analysis which predicts the value of one continuous variable from another. We also discuss the assumptions and pitfalls associated with this analysis. PMID:28447022
NASA Astrophysics Data System (ADS)
Madhu, B.; Ashok, N. C.; Balasubramanian, S.
2014-11-01
Multinomial logistic regression analysis was used to develop statistical model that can predict the probability of breast cancer in Southern Karnataka using the breast cancer occurrence data during 2007-2011. Independent socio-economic variables describing the breast cancer occurrence like age, education, occupation, parity, type of family, health insurance coverage, residential locality and socioeconomic status of each case was obtained. The models were developed as follows: i) Spatial visualization of the Urban- rural distribution of breast cancer cases that were obtained from the Bharat Hospital and Institute of Oncology. ii) Socio-economic risk factors describing the breast cancer occurrences were complied for each case. These data were then analysed using multinomial logistic regression analysis in a SPSS statistical software and relations between the occurrence of breast cancer across the socio-economic status and the influence of other socio-economic variables were evaluated and multinomial logistic regression models were constructed. iii) the model that best predicted the occurrence of breast cancer were identified. This multivariate logistic regression model has been entered into a geographic information system and maps showing the predicted probability of breast cancer occurrence in Southern Karnataka was created. This study demonstrates that Multinomial logistic regression is a valuable tool for developing models that predict the probability of breast cancer Occurrence in Southern Karnataka.
Troutman, Brent M.
1982-01-01
Errors in runoff prediction caused by input data errors are analyzed by treating precipitation-runoff models as regression (conditional expectation) models. Independent variables of the regression consist of precipitation and other input measurements; the dependent variable is runoff. In models using erroneous input data, prediction errors are inflated and estimates of expected storm runoff for given observed input variables are biased. This bias in expected runoff estimation results in biased parameter estimates if these parameter estimates are obtained by a least squares fit of predicted to observed runoff values. The problems of error inflation and bias are examined in detail for a simple linear regression of runoff on rainfall and for a nonlinear U.S. Geological Survey precipitation-runoff model. Some implications for flood frequency analysis are considered. A case study using a set of data from Turtle Creek near Dallas, Texas illustrates the problems of model input errors.
Forecasting municipal solid waste generation using prognostic tools and regression analysis.
Ghinea, Cristina; Drăgoi, Elena Niculina; Comăniţă, Elena-Diana; Gavrilescu, Marius; Câmpean, Teofil; Curteanu, Silvia; Gavrilescu, Maria
2016-11-01
For an adequate planning of waste management systems the accurate forecast of waste generation is an essential step, since various factors can affect waste trends. The application of predictive and prognosis models are useful tools, as reliable support for decision making processes. In this paper some indicators such as: number of residents, population age, urban life expectancy, total municipal solid waste were used as input variables in prognostic models in order to predict the amount of solid waste fractions. We applied Waste Prognostic Tool, regression analysis and time series analysis to forecast municipal solid waste generation and composition by considering the Iasi Romania case study. Regression equations were determined for six solid waste fractions (paper, plastic, metal, glass, biodegradable and other waste). Accuracy Measures were calculated and the results showed that S-curve trend model is the most suitable for municipal solid waste (MSW) prediction. Copyright © 2016 Elsevier Ltd. All rights reserved.
Can Predictive Modeling Identify Head and Neck Oncology Patients at Risk for Readmission?
Manning, Amy M; Casper, Keith A; Peter, Kay St; Wilson, Keith M; Mark, Jonathan R; Collar, Ryan M
2018-05-01
Objective Unplanned readmission within 30 days is a contributor to health care costs in the United States. The use of predictive modeling during hospitalization to identify patients at risk for readmission offers a novel approach to quality improvement and cost reduction. Study Design Two-phase study including retrospective analysis of prospectively collected data followed by prospective longitudinal study. Setting Tertiary academic medical center. Subjects and Methods Prospectively collected data for patients undergoing surgical treatment for head and neck cancer from January 2013 to January 2015 were used to build predictive models for readmission within 30 days of discharge using logistic regression, classification and regression tree (CART) analysis, and random forests. One model (logistic regression) was then placed prospectively into the discharge workflow from March 2016 to May 2016 to determine the model's ability to predict which patients would be readmitted within 30 days. Results In total, 174 admissions had descriptive data. Thirty-two were excluded due to incomplete data. Logistic regression, CART, and random forest predictive models were constructed using the remaining 142 admissions. When applied to 106 consecutive prospective head and neck oncology patients at the time of discharge, the logistic regression model predicted readmissions with a specificity of 94%, a sensitivity of 47%, a negative predictive value of 90%, and a positive predictive value of 62% (odds ratio, 14.9; 95% confidence interval, 4.02-55.45). Conclusion Prospectively collected head and neck cancer databases can be used to develop predictive models that can accurately predict which patients will be readmitted. This offers valuable support for quality improvement initiatives and readmission-related cost reduction in head and neck cancer care.
Guo, Huey-Ming; Shyu, Yea-Ing Lotus; Chang, Her-Kun
2006-01-01
In this article, the authors provide an overview of a research method to predict quality of care in home health nursing data set. The results of this study can be visualized through classification an regression tree (CART) graphs. The analysis was more effective, and the results were more informative since the home health nursing dataset was analyzed with a combination of the logistic regression and CART, these two techniques complete each other. And the results more informative that more patients' characters were related to quality of care in home care. The results contributed to home health nurse predict patient outcome in case management. Improved prediction is needed for interventions to be appropriately targeted for improved patient outcome and quality of care.
NASA Astrophysics Data System (ADS)
Tan, C. H.; Matjafri, M. Z.; Lim, H. S.
2015-10-01
This paper presents the prediction models which analyze and compute the CO2 emission in Malaysia. Each prediction model for CO2 emission will be analyzed based on three main groups which is transportation, electricity and heat production as well as residential buildings and commercial and public services. The prediction models were generated using data obtained from World Bank Open Data. Best subset method will be used to remove irrelevant data and followed by multi linear regression to produce the prediction models. From the results, high R-square (prediction) value was obtained and this implies that the models are reliable to predict the CO2 emission by using specific data. In addition, the CO2 emissions from these three groups are forecasted using trend analysis plots for observation purpose.
Schistosomiasis Breeding Environment Situation Analysis in Dongting Lake Area
NASA Astrophysics Data System (ADS)
Li, Chuanrong; Jia, Yuanyuan; Ma, Lingling; Liu, Zhaoyan; Qian, Yonggang
2013-01-01
Monitoring environmental characteristics, such as vegetation, soil moisture et al., of Oncomelania hupensis (O. hupensis)’ spatial/temporal distribution is of vital importance to the schistosomiasis prevention and control. In this study, the relationship between environmental factors derived from remotely sensed data and the density of O. hupensis was analyzed by a multiple linear regression model. Secondly, spatial analysis of the regression residual was investigated by the semi-variogram method. Thirdly, spatial analysis of the regression residual and the multiple linear regression model were both employed to estimate the spatial variation of O. hupensis density. Finally, the approach was used to monitor and predict the spatial and temporal variations of oncomelania of Dongting Lake region, China. And the areas of potential O. hupensis habitats were predicted and the influence of Three Gorges Dam (TGB)project on the density of O. hupensis was analyzed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clegg, Samuel M; Barefield, James E; Wiens, Roger C
2008-01-01
The ChemCam instrument on the Mars Science Laboratory (MSL) will include a laser-induced breakdown spectrometer (LIBS) to quantify major and minor elemental compositions. The traditional analytical chemistry approach to calibration curves for these data regresses a single diagnostic peak area against concentration for each element. This approach contrasts with a new multivariate method in which elemental concentrations are predicted by step-wise multiple regression analysis based on areas of a specific set of diagnostic peaks for each element. The method is tested on LIBS data from igneous and metamorphosed rocks. Between 4 and 13 partial regression coefficients are needed to describemore » each elemental abundance accurately (i.e., with a regression line of R{sup 2} > 0.9995 for the relationship between predicted and measured elemental concentration) for all major and minor elements studied. Validation plots suggest that the method is limited at present by the small data set, and will work best for prediction of concentration when a wide variety of compositions and rock types has been analyzed.« less
A Model for Predicting Student Performance on High-Stakes Assessment
ERIC Educational Resources Information Center
Dammann, Matthew Walter
2010-01-01
This research study examined the use of student achievement on reading and math state assessments to predict success on the science state assessment. Multiple regression analysis was utilized to test the prediction for all students in grades 5 and 8 in a mid-Atlantic state. The prediction model developed from the analysis explored the combined…
Sando, Roy; Chase, Katherine J.
2017-03-23
A common statistical procedure for estimating streamflow statistics at ungaged locations is to develop a relational model between streamflow and drainage basin characteristics at gaged locations using least squares regression analysis; however, least squares regression methods are parametric and make constraining assumptions about the data distribution. The random forest regression method provides an alternative nonparametric method for estimating streamflow characteristics at ungaged sites and requires that the data meet fewer statistical conditions than least squares regression methods.Random forest regression analysis was used to develop predictive models for 89 streamflow characteristics using Precipitation-Runoff Modeling System simulated streamflow data and drainage basin characteristics at 179 sites in central and eastern Montana. The predictive models were developed from streamflow data simulated for current (baseline, water years 1982–99) conditions and three future periods (water years 2021–38, 2046–63, and 2071–88) under three different climate-change scenarios. These predictive models were then used to predict streamflow characteristics for baseline conditions and three future periods at 1,707 fish sampling sites in central and eastern Montana. The average root mean square error for all predictive models was about 50 percent. When streamflow predictions at 23 fish sampling sites were compared to nearby locations with simulated data, the mean relative percent difference was about 43 percent. When predictions were compared to streamflow data recorded at 21 U.S. Geological Survey streamflow-gaging stations outside of the calibration basins, the average mean absolute percent error was about 73 percent.
Using Time Series Analysis to Predict Cardiac Arrest in a PICU.
Kennedy, Curtis E; Aoki, Noriaki; Mariscalco, Michele; Turley, James P
2015-11-01
To build and test cardiac arrest prediction models in a PICU, using time series analysis as input, and to measure changes in prediction accuracy attributable to different classes of time series data. Retrospective cohort study. Thirty-one bed academic PICU that provides care for medical and general surgical (not congenital heart surgery) patients. Patients experiencing a cardiac arrest in the PICU and requiring external cardiac massage for at least 2 minutes. None. One hundred three cases of cardiac arrest and 109 control cases were used to prepare a baseline dataset that consisted of 1,025 variables in four data classes: multivariate, raw time series, clinical calculations, and time series trend analysis. We trained 20 arrest prediction models using a matrix of five feature sets (combinations of data classes) with four modeling algorithms: linear regression, decision tree, neural network, and support vector machine. The reference model (multivariate data with regression algorithm) had an accuracy of 78% and 87% area under the receiver operating characteristic curve. The best model (multivariate + trend analysis data with support vector machine algorithm) had an accuracy of 94% and 98% area under the receiver operating characteristic curve. Cardiac arrest predictions based on a traditional model built with multivariate data and a regression algorithm misclassified cases 3.7 times more frequently than predictions that included time series trend analysis and built with a support vector machine algorithm. Although the final model lacks the specificity necessary for clinical application, we have demonstrated how information from time series data can be used to increase the accuracy of clinical prediction models.
Rudy M. Schuster; Laura Sullivan; Duarte Morais; Diane Kuehn
2009-01-01
This analysis explores the differences in Affective and Cognitive Destination Image among three Hudson River Valley (New York) tourism communities. Multiple regressions were used with six dimensions of visitors' images to predict future intention to revisit. Two of the three regression models were significant. The only significantly contributing independent...
Four Major South Korea's Rivers Using Deep Learning Models.
Lee, Sangmok; Lee, Donghyun
2018-06-24
Harmful algal blooms are an annual phenomenon that cause environmental damage, economic losses, and disease outbreaks. A fundamental solution to this problem is still lacking, thus, the best option for counteracting the effects of algal blooms is to improve advance warnings (predictions). However, existing physical prediction models have difficulties setting a clear coefficient indicating the relationship between each factor when predicting algal blooms, and many variable data sources are required for the analysis. These limitations are accompanied by high time and economic costs. Meanwhile, artificial intelligence and deep learning methods have become increasingly common in scientific research; attempts to apply the long short-term memory (LSTM) model to environmental research problems are increasing because the LSTM model exhibits good performance for time-series data prediction. However, few studies have applied deep learning models or LSTM to algal bloom prediction, especially in South Korea, where algal blooms occur annually. Therefore, we employed the LSTM model for algal bloom prediction in four major rivers of South Korea. We conducted short-term (one week) predictions by employing regression analysis and deep learning techniques on a newly constructed water quality and quantity dataset drawn from 16 dammed pools on the rivers. Three deep learning models (multilayer perceptron, MLP; recurrent neural network, RNN; and long short-term memory, LSTM) were used to predict chlorophyll-a, a recognized proxy for algal activity. The results were compared to those from OLS (ordinary least square) regression analysis and actual data based on the root mean square error (RSME). The LSTM model showed the highest prediction rate for harmful algal blooms and all deep learning models out-performed the OLS regression analysis. Our results reveal the potential for predicting algal blooms using LSTM and deep learning.
Predictive equations for the estimation of body size in seals and sea lions (Carnivora: Pinnipedia)
Churchill, Morgan; Clementz, Mark T; Kohno, Naoki
2014-01-01
Body size plays an important role in pinniped ecology and life history. However, body size data is often absent for historical, archaeological, and fossil specimens. To estimate the body size of pinnipeds (seals, sea lions, and walruses) for today and the past, we used 14 commonly preserved cranial measurements to develop sets of single variable and multivariate predictive equations for pinniped body mass and total length. Principal components analysis (PCA) was used to test whether separate family specific regressions were more appropriate than single predictive equations for Pinnipedia. The influence of phylogeny was tested with phylogenetic independent contrasts (PIC). The accuracy of these regressions was then assessed using a combination of coefficient of determination, percent prediction error, and standard error of estimation. Three different methods of multivariate analysis were examined: bidirectional stepwise model selection using Akaike information criteria; all-subsets model selection using Bayesian information criteria (BIC); and partial least squares regression. The PCA showed clear discrimination between Otariidae (fur seals and sea lions) and Phocidae (earless seals) for the 14 measurements, indicating the need for family-specific regression equations. The PIC analysis found that phylogeny had a minor influence on relationship between morphological variables and body size. The regressions for total length were more accurate than those for body mass, and equations specific to Otariidae were more accurate than those for Phocidae. Of the three multivariate methods, the all-subsets approach required the fewest number of variables to estimate body size accurately. We then used the single variable predictive equations and the all-subsets approach to estimate the body size of two recently extinct pinniped taxa, the Caribbean monk seal (Monachus tropicalis) and the Japanese sea lion (Zalophus japonicus). Body size estimates using single variable regressions generally under or over-estimated body size; however, the all-subset regression produced body size estimates that were close to historically recorded body length for these two species. This indicates that the all-subset regression equations developed in this study can estimate body size accurately. PMID:24916814
Shi, K-Q; Zhou, Y-Y; Yan, H-D; Li, H; Wu, F-L; Xie, Y-Y; Braddock, M; Lin, X-Y; Zheng, M-H
2017-02-01
At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model by using the classification and regression tree (CART) analysis. A total of 1047 patients from two separate medical centres with suspected ACHBLF were screened in the study, which were recognized as derivation cohort and validation cohort, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was tested using the area under the receiver operating characteristic curve, which was compared with the model for end-stage liver disease (MELD) score and a new logistic regression model. CART analysis identified four variables as prognostic factors of ACHBLF: total bilirubin, age, serum sodium and INR, and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed with four independent factors, including age, total bilirubin, serum sodium and prothrombin activity by multivariate logistic regression analysis. The performances of the CART model (0.896), similar to the logistic regression model (0.914, P=.382), exceeded that of MELD score (0.667, P<.001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model superior to MELD for predicting three-month mortality of patients with ACHBLF. Thus, the CART model could facilitate medical decision-making and provide clinicians with a validated practical bedside tool for ACHBLF risk stratification. © 2016 John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Xu, Wenbo; Jing, Shaocai; Yu, Wenjuan; Wang, Zhaoxian; Zhang, Guoping; Huang, Jianxi
2013-11-01
In this study, the high risk areas of Sichuan Province with debris flow, Panzhihua and Liangshan Yi Autonomous Prefecture, were taken as the studied areas. By using rainfall and environmental factors as the predictors and based on the different prior probability combinations of debris flows, the prediction of debris flows was compared in the areas with statistical methods: logistic regression (LR) and Bayes discriminant analysis (BDA). The results through the comprehensive analysis show that (a) with the mid-range scale prior probability, the overall predicting accuracy of BDA is higher than those of LR; (b) with equal and extreme prior probabilities, the overall predicting accuracy of LR is higher than those of BDA; (c) the regional predicting models of debris flows with rainfall factors only have worse performance than those introduced environmental factors, and the predicting accuracies of occurrence and nonoccurrence of debris flows have been changed in the opposite direction as the supplemented information.
Meta-Analysis of the Reasoned Action Approach (RAA) to Understanding Health Behaviors.
McEachan, Rosemary; Taylor, Natalie; Harrison, Reema; Lawton, Rebecca; Gardner, Peter; Conner, Mark
2016-08-01
Reasoned action approach (RAA) includes subcomponents of attitude (experiential/instrumental), perceived norm (injunctive/descriptive), and perceived behavioral control (capacity/autonomy) to predict intention and behavior. To provide a meta-analysis of the RAA for health behaviors focusing on comparing the pairs of RAA subcomponents and differences between health protection and health-risk behaviors. The present research reports a meta-analysis of correlational tests of RAA subcomponents, examination of moderators, and combined effects of subcomponents on intention and behavior. Regressions were used to predict intention and behavior based on data from studies measuring all variables. Capacity and experiential attitude had large, and other constructs had small-medium-sized correlations with intention; all constructs except autonomy were significant independent predictors of intention in regressions. Intention, capacity, and experiential attitude had medium-large, and other constructs had small-medium-sized correlations with behavior; intention, capacity, experiential attitude, and descriptive norm were significant independent predictors of behavior in regressions. The RAA subcomponents have utility in predicting and understanding health behaviors.
Radio Propagation Prediction Software for Complex Mixed Path Physical Channels
2006-08-14
63 4.4.6. Applied Linear Regression Analysis in the Frequency Range 1-50 MHz 69 4.4.7. Projected Scaling to...4.4.6. Applied Linear Regression Analysis in the Frequency Range 1-50 MHz In order to construct a comprehensive numerical algorithm capable of
Kajbafnezhad, H; Ahadi, H; Heidarie, A; Askari, P; Enayati, M
2012-10-01
The aim of this study was to predict athletic success motivation by mental skills, emotional intelligence and its components. The research sample consisted of 153 male athletes who were selected through random multistage sampling. The subjects completed the Mental Skills Questionnaire, Bar-On Emotional Intelligence questionnaire and the perception of sport success questionnaire. Data were analyzed using Pearson correlation coefficient and multiple regressions. Regression analysis shows that between the two variables of mental skill and emotional intelligence, mental skill is the best predictor for athletic success motivation and has a better ability to predict the success rate of the participants. Regression analysis results showed that among all the components of emotional intelligence, self-respect had a significantly higher ability to predict athletic success motivation. The use of psychological skills and emotional intelligence as an mediating and regulating factor and organizer cause leads to improved performance and can not only can to help athletes in making suitable and effective decisions for reaching a desired goal.
Symplectic geometry spectrum regression for prediction of noisy time series
NASA Astrophysics Data System (ADS)
Xie, Hong-Bo; Dokos, Socrates; Sivakumar, Bellie; Mengersen, Kerrie
2016-05-01
We present the symplectic geometry spectrum regression (SGSR) technique as well as a regularized method based on SGSR for prediction of nonlinear time series. The main tool of analysis is the symplectic geometry spectrum analysis, which decomposes a time series into the sum of a small number of independent and interpretable components. The key to successful regularization is to damp higher order symplectic geometry spectrum components. The effectiveness of SGSR and its superiority over local approximation using ordinary least squares are demonstrated through prediction of two noisy synthetic chaotic time series (Lorenz and Rössler series), and then tested for prediction of three real-world data sets (Mississippi River flow data and electromyographic and mechanomyographic signal recorded from human body).
The Use of Linear Programming for Prediction.
ERIC Educational Resources Information Center
Schnittjer, Carl J.
The purpose of the study was to develop a linear programming model to be used for prediction, test the accuracy of the predictions, and compare the accuracy with that produced by curvilinear multiple regression analysis. (Author)
Weighted regression analysis and interval estimators
Donald W. Seegrist
1974-01-01
A method for deriving the weighted least squares estimators for the parameters of a multiple regression model. Confidence intervals for expected values, and prediction intervals for the means of future samples are given.
Predictive and mechanistic multivariate linear regression models for reaction development
Santiago, Celine B.; Guo, Jing-Yao
2018-01-01
Multivariate Linear Regression (MLR) models utilizing computationally-derived and empirically-derived physical organic molecular descriptors are described in this review. Several reports demonstrating the effectiveness of this methodological approach towards reaction optimization and mechanistic interrogation are discussed. A detailed protocol to access quantitative and predictive MLR models is provided as a guide for model development and parameter analysis. PMID:29719711
Louis R Iverson; Anantha M. Prasad; Mark W. Schwartz; Mark W. Schwartz
2005-01-01
We predict current distribution and abundance for tree species present in eastern North America, and subsequently estimate potential suitable habitat for those species under a changed climate with 2 x CO2. We used a series of statistical models (i.e., Regression Tree Analysis (RTA), Multivariate Adaptive Regression Splines (MARS), Bagging Trees (...
Laurens, L M L; Wolfrum, E J
2013-12-18
One of the challenges associated with microalgal biomass characterization and the comparison of microalgal strains and conversion processes is the rapid determination of the composition of algae. We have developed and applied a high-throughput screening technology based on near-infrared (NIR) spectroscopy for the rapid and accurate determination of algal biomass composition. We show that NIR spectroscopy can accurately predict the full composition using multivariate linear regression analysis of varying lipid, protein, and carbohydrate content of algal biomass samples from three strains. We also demonstrate a high quality of predictions of an independent validation set. A high-throughput 96-well configuration for spectroscopy gives equally good prediction relative to a ring-cup configuration, and thus, spectra can be obtained from as little as 10-20 mg of material. We found that lipids exhibit a dominant, distinct, and unique fingerprint in the NIR spectrum that allows for the use of single and multiple linear regression of respective wavelengths for the prediction of the biomass lipid content. This is not the case for carbohydrate and protein content, and thus, the use of multivariate statistical modeling approaches remains necessary.
Lawrence, Stephen J.
2012-01-01
Regression analyses show that E. coli density in samples was strongly related to turbidity, streamflow characteristics, and season at both sites. The regression equation chosen for the Norcross data showed that 78 percent of the variability in E. coli density (in log base 10 units) was explained by the variability in turbidity values (in log base 10 units), streamflow event (dry-weather flow or stormflow), season (cool or warm), and an interaction term that is the cross product of streamflow event and turbidity. The regression equation chosen for the Atlanta data showed that 76 percent of the variability in E. coli density (in log base 10 units) was explained by the variability in turbidity values (in log base 10 units), water temperature, streamflow event, and an interaction term that is the cross product of streamflow event and turbidity. Residual analysis and model confirmation using new data indicated the regression equations selected at both sites predicted E. coli density within the 90 percent prediction intervals of the equations and could be used to predict E. coli density in real time at both sites.
Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen
2014-01-01
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916
Cervical Vertebral Body's Volume as a New Parameter for Predicting the Skeletal Maturation Stages.
Choi, Youn-Kyung; Kim, Jinmi; Yamaguchi, Tetsutaro; Maki, Koutaro; Ko, Ching-Chang; Kim, Yong-Il
2016-01-01
This study aimed to determine the correlation between the volumetric parameters derived from the images of the second, third, and fourth cervical vertebrae by using cone beam computed tomography with skeletal maturation stages and to propose a new formula for predicting skeletal maturation by using regression analysis. We obtained the estimation of skeletal maturation levels from hand-wrist radiographs and volume parameters derived from the second, third, and fourth cervical vertebrae bodies from 102 Japanese patients (54 women and 48 men, 5-18 years of age). We performed Pearson's correlation coefficient analysis and simple regression analysis. All volume parameters derived from the second, third, and fourth cervical vertebrae exhibited statistically significant correlations (P < 0.05). The simple regression model with the greatest R-square indicated the fourth-cervical-vertebra volume as an independent variable with a variance inflation factor less than ten. The explanation power was 81.76%. Volumetric parameters of cervical vertebrae using cone beam computed tomography are useful in regression models. The derived regression model has the potential for clinical application as it enables a simple and quantitative analysis to evaluate skeletal maturation level.
Cervical Vertebral Body's Volume as a New Parameter for Predicting the Skeletal Maturation Stages
Choi, Youn-Kyung; Kim, Jinmi; Maki, Koutaro; Ko, Ching-Chang
2016-01-01
This study aimed to determine the correlation between the volumetric parameters derived from the images of the second, third, and fourth cervical vertebrae by using cone beam computed tomography with skeletal maturation stages and to propose a new formula for predicting skeletal maturation by using regression analysis. We obtained the estimation of skeletal maturation levels from hand-wrist radiographs and volume parameters derived from the second, third, and fourth cervical vertebrae bodies from 102 Japanese patients (54 women and 48 men, 5–18 years of age). We performed Pearson's correlation coefficient analysis and simple regression analysis. All volume parameters derived from the second, third, and fourth cervical vertebrae exhibited statistically significant correlations (P < 0.05). The simple regression model with the greatest R-square indicated the fourth-cervical-vertebra volume as an independent variable with a variance inflation factor less than ten. The explanation power was 81.76%. Volumetric parameters of cervical vertebrae using cone beam computed tomography are useful in regression models. The derived regression model has the potential for clinical application as it enables a simple and quantitative analysis to evaluate skeletal maturation level. PMID:27340668
Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen
2014-01-01
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.
Random forest models to predict aqueous solubility.
Palmer, David S; O'Boyle, Noel M; Glen, Robert C; Mitchell, John B O
2007-01-01
Random Forest regression (RF), Partial-Least-Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueous solubility more accurately than those created by PLS, SVM, and ANN and offered methods for automatic descriptor selection, an assessment of descriptor importance, and an in-parallel measure of predictive ability, all of which serve to recommend its use. The prediction of log molar solubility for an external test set of 330 molecules that are solid at 25 degrees C gave an r2 = 0.89 and RMSE = 0.69 log S units. For a standard data set selected from the literature, the model performed well with respect to other documented methods. Finally, the diversity of the training and test sets are compared to the chemical space occupied by molecules in the MDL drug data report, on the basis of molecular descriptors selected by the regression analysis.
On the calibration process of film dosimetry: OLS inverse regression versus WLS inverse prediction.
Crop, F; Van Rompaye, B; Paelinck, L; Vakaet, L; Thierens, H; De Wagter, C
2008-07-21
The purpose of this study was both putting forward a statistically correct model for film calibration and the optimization of this process. A reliable calibration is needed in order to perform accurate reference dosimetry with radiographic (Gafchromic) film. Sometimes, an ordinary least squares simple linear (in the parameters) regression is applied to the dose-optical-density (OD) curve with the dose as a function of OD (inverse regression) or sometimes OD as a function of dose (inverse prediction). The application of a simple linear regression fit is an invalid method because heteroscedasticity of the data is not taken into account. This could lead to erroneous results originating from the calibration process itself and thus to a lower accuracy. In this work, we compare the ordinary least squares (OLS) inverse regression method with the correct weighted least squares (WLS) inverse prediction method to create calibration curves. We found that the OLS inverse regression method could lead to a prediction bias of up to 7.3 cGy at 300 cGy and total prediction errors of 3% or more for Gafchromic EBT film. Application of the WLS inverse prediction method resulted in a maximum prediction bias of 1.4 cGy and total prediction errors below 2% in a 0-400 cGy range. We developed a Monte-Carlo-based process to optimize calibrations, depending on the needs of the experiment. This type of thorough analysis can lead to a higher accuracy for film dosimetry.
Whole-genome regression and prediction methods applied to plant and animal breeding.
de Los Campos, Gustavo; Hickey, John M; Pong-Wong, Ricardo; Daetwyler, Hans D; Calus, Mario P L
2013-02-01
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade.
Whole-Genome Regression and Prediction Methods Applied to Plant and Animal Breeding
de los Campos, Gustavo; Hickey, John M.; Pong-Wong, Ricardo; Daetwyler, Hans D.; Calus, Mario P. L.
2013-01-01
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade. PMID:22745228
Machine learning in updating predictive models of planning and scheduling transportation projects
DOT National Transportation Integrated Search
1997-01-01
A method combining machine learning and regression analysis to automatically and intelligently update predictive models used in the Kansas Department of Transportations (KDOTs) internal management system is presented. The predictive models used...
Zhou, Jinzhe; Zhou, Yanbing; Cao, Shougen; Li, Shikuan; Wang, Hao; Niu, Zhaojian; Chen, Dong; Wang, Dongsheng; Lv, Liang; Zhang, Jian; Li, Yu; Jiao, Xuelong; Tan, Xiaojie; Zhang, Jianli; Wang, Haibo; Zhang, Bingyuan; Lu, Yun; Sun, Zhenqing
2016-01-01
Reporting of surgical complications is common, but few provide information about the severity and estimate risk factors of complications. If have, but lack of specificity. We retrospectively analyzed data on 2795 gastric cancer patients underwent surgical procedure at the Affiliated Hospital of Qingdao University between June 2007 and June 2012, established multivariate logistic regression model to predictive risk factors related to the postoperative complications according to the Clavien-Dindo classification system. Twenty-four out of 86 variables were identified statistically significant in univariate logistic regression analysis, 11 significant variables entered multivariate analysis were employed to produce the risk model. Liver cirrhosis, diabetes mellitus, Child classification, invasion of neighboring organs, combined resection, introperative transfusion, Billroth II anastomosis of reconstruction, malnutrition, surgical volume of surgeons, operating time and age were independent risk factors for postoperative complications after gastrectomy. Based on logistic regression equation, p=Exp∑BiXi / (1+Exp∑BiXi), multivariate logistic regression predictive model that calculated the risk of postoperative morbidity was developed, p = 1/(1 + e((4.810-1.287X1-0.504X2-0.500X3-0.474X4-0.405X5-0.318X6-0.316X7-0.305X8-0.278X9-0.255X10-0.138X11))). The accuracy, sensitivity and specificity of the model to predict the postoperative complications were 86.7%, 76.2% and 88.6%, respectively. This risk model based on Clavien-Dindo grading severity of complications system and logistic regression analysis can predict severe morbidity specific to an individual patient's risk factors, estimate patients' risks and benefits of gastric surgery as an accurate decision-making tool and may serve as a template for the development of risk models for other surgical groups.
Fei, Yang; Hu, Jian; Gao, Kun; Tu, Jianfeng; Li, Wei-Qin; Wang, Wei
2017-06-01
To construct a radical basis function (RBF) artificial neural networks (ANNs) model to predict the incidence of acute pancreatitis (AP)-induced portal vein thrombosis. The analysis included 353 patients with AP who had admitted between January 2011 and December 2015. RBF ANNs model and logistic regression model were constructed based on eleven factors relevant to AP respectively. Statistical indexes were used to evaluate the value of the prediction in two models. The predict sensitivity, specificity, positive predictive value, negative predictive value and accuracy by RBF ANNs model for PVT were 73.3%, 91.4%, 68.8%, 93.0% and 87.7%, respectively. There were significant differences between the RBF ANNs and logistic regression models in these parameters (P<0.05). In addition, a comparison of the area under receiver operating characteristic curves of the two models showed a statistically significant difference (P<0.05). The RBF ANNs model is more likely to predict the occurrence of PVT induced by AP than logistic regression model. D-dimer, AMY, Hct and PT were important prediction factors of approval for AP-induced PVT. Copyright © 2017 Elsevier Inc. All rights reserved.
Zeng, Fangfang; Li, Zhongtao; Yu, Xiaoling; Zhou, Linuo
2013-01-01
Background This study aimed to develop the artificial neural network (ANN) and multivariable logistic regression (LR) analyses for prediction modeling of cardiovascular autonomic (CA) dysfunction in the general population, and compare the prediction models using the two approaches. Methods and Materials We analyzed a previous dataset based on a Chinese population sample consisting of 2,092 individuals aged 30–80 years. The prediction models were derived from an exploratory set using ANN and LR analysis, and were tested in the validation set. Performances of these prediction models were then compared. Results Univariate analysis indicated that 14 risk factors showed statistically significant association with the prevalence of CA dysfunction (P<0.05). The mean area under the receiver-operating curve was 0.758 (95% CI 0.724–0.793) for LR and 0.762 (95% CI 0.732–0.793) for ANN analysis, but noninferiority result was found (P<0.001). The similar results were found in comparisons of sensitivity, specificity, and predictive values in the prediction models between the LR and ANN analyses. Conclusion The prediction models for CA dysfunction were developed using ANN and LR. ANN and LR are two effective tools for developing prediction models based on our dataset. PMID:23940593
ERIC Educational Resources Information Center
Campbell, S. Duke; Greenberg, Barry
The development of a predictive equation capable of explaining a significant percentage of enrollment variability at Florida International University is described. A model utilizing trend analysis and a multiple regression approach to enrollment forecasting was adapted to investigate enrollment dynamics at the university. Four independent…
Multiple Logistic Regression Analysis of Cigarette Use among High School Students
ERIC Educational Resources Information Center
Adwere-Boamah, Joseph
2011-01-01
A binary logistic regression analysis was performed to predict high school students' cigarette smoking behavior from selected predictors from 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target student behavior of interest was frequent cigarette use. Five predictor variables included in the model were: a) race, b) frequency of…
Predicting Student Success on the Texas Chemistry STAAR Test: A Logistic Regression Analysis
ERIC Educational Resources Information Center
Johnson, William L.; Johnson, Annabel M.; Johnson, Jared
2012-01-01
Background: The context is the new Texas STAAR end-of-course testing program. Purpose: The authors developed a logistic regression model to predict who would pass-or-fail the new Texas chemistry STAAR end-of-course exam. Setting: Robert E. Lee High School (5A) with an enrollment of 2700 students, Tyler, Texas. Date of the study was the 2011-2012…
Song, Seung Yeob; Lee, Young Koung; Kim, In-Jung
2016-01-01
A high-throughput screening system for Citrus lines were established with higher sugar and acid contents using Fourier transform infrared (FT-IR) spectroscopy in combination with multivariate analysis. FT-IR spectra confirmed typical spectral differences between the frequency regions of 950-1100 cm(-1), 1300-1500 cm(-1), and 1500-1700 cm(-1). Principal component analysis (PCA) and subsequent partial least square-discriminant analysis (PLS-DA) were able to discriminate five Citrus lines into three separate clusters corresponding to their taxonomic relationships. The quantitative predictive modeling of sugar and acid contents from Citrus fruits was established using partial least square regression algorithms from FT-IR spectra. The regression coefficients (R(2)) between predicted values and estimated sugar and acid content values were 0.99. These results demonstrate that by using FT-IR spectra and applying quantitative prediction modeling to Citrus sugar and acid contents, excellent Citrus lines can be early detected with greater accuracy. Copyright © 2015 Elsevier Ltd. All rights reserved.
Feaster, Toby D.; Gotvald, Anthony J.; Weaver, J. Curtis
2014-01-01
Reliable estimates of the magnitude and frequency of floods are essential for the design of transportation and water-conveyance structures, flood-insurance studies, and flood-plain management. Such estimates are particularly important in densely populated urban areas. In order to increase the number of streamflow-gaging stations (streamgages) available for analysis, expand the geographical coverage that would allow for application of regional regression equations across State boundaries, and build on a previous flood-frequency investigation of rural U.S Geological Survey streamgages in the Southeast United States, a multistate approach was used to update methods for determining the magnitude and frequency of floods in urban and small, rural streams that are not substantially affected by regulation or tidal fluctuations in Georgia, South Carolina, and North Carolina. The at-site flood-frequency analysis of annual peak-flow data for urban and small, rural streams (through September 30, 2011) included 116 urban streamgages and 32 small, rural streamgages, defined in this report as basins draining less than 1 square mile. The regional regression analysis included annual peak-flow data from an additional 338 rural streamgages previously included in U.S. Geological Survey flood-frequency reports and 2 additional rural streamgages in North Carolina that were not included in the previous Southeast rural flood-frequency investigation for a total of 488 streamgages included in the urban and small, rural regression analysis. The at-site flood-frequency analyses for the urban and small, rural streamgages included the expected moments algorithm, which is a modification of the Bulletin 17B log-Pearson type III method for fitting the statistical distribution to the logarithms of the annual peak flows. Where applicable, the flood-frequency analysis also included low-outlier and historic information. Additionally, the application of a generalized Grubbs-Becks test allowed for the detection of multiple potentially influential low outliers. Streamgage basin characteristics were determined using geographical information system techniques. Initial ordinary least squares regression simulations reduced the number of basin characteristics on the basis of such factors as statistical significance, coefficient of determination, Mallow’s Cp statistic, and ease of measurement of the explanatory variable. Application of generalized least squares regression techniques produced final predictive (regression) equations for estimating the 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probability flows for urban and small, rural ungaged basins for three hydrologic regions (HR1, Piedmont–Ridge and Valley; HR3, Sand Hills; and HR4, Coastal Plain), which previously had been defined from exploratory regression analysis in the Southeast rural flood-frequency investigation. Because of the limited availability of urban streamgages in the Coastal Plain of Georgia, South Carolina, and North Carolina, additional urban streamgages in Florida and New Jersey were used in the regression analysis for this region. Including the urban streamgages in New Jersey allowed for the expansion of the applicability of the predictive equations in the Coastal Plain from 3.5 to 53.5 square miles. Average standard error of prediction for the predictive equations, which is a measure of the average accuracy of the regression equations when predicting flood estimates for ungaged sites, range from 25.0 percent for the 10-percent annual exceedance probability regression equation for the Piedmont–Ridge and Valley region to 73.3 percent for the 0.2-percent annual exceedance probability regression equation for the Sand Hills region.
Linear regression models for solvent accessibility prediction in proteins.
Wagner, Michael; Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław
2005-04-01
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
An Effect Size for Regression Predictors in Meta-Analysis
ERIC Educational Resources Information Center
Aloe, Ariel M.; Becker, Betsy Jane
2012-01-01
A new effect size representing the predictive power of an independent variable from a multiple regression model is presented. The index, denoted as r[subscript sp], is the semipartial correlation of the predictor with the outcome of interest. This effect size can be computed when multiple predictor variables are included in the regression model…
Held, Elizabeth; Cape, Joshua; Tintle, Nathan
2016-01-01
Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
Using regression analysis to predict emergency patient volume at the Indianapolis 500 mile race.
Bowdish, G E; Cordell, W H; Bock, H C; Vukov, L F
1992-10-01
Emergency physicians often plan and provide on-site medical care for mass gatherings. Most of the mass gathering literature is descriptive. Only a few studies have looked at factors such as crowd size, event characteristics, or weather in predicting numbers and types of patients at mass gatherings. We used regression analysis to relate patient volume on Race Day at the Indianapolis Motor Speedway to weather conditions and race characteristics. Race Day weather data for the years 1983 to 1989 were obtained from the National Oceanic and Atmospheric Administration. Data regarding patients treated on 1983 to 1989 Race Days were obtained from the facility hospital (Hannah Emergency Medical Center) data base. Regression analysis was performed using weather factors and race characteristics as independent variables and number of patients seen as the dependent variable. Data from 1990 were used to test the validity of the model. There was a significant relationship between dew point (which is calculated from temperature and humidity) and patient load (P less than .01). Dew point, however, failed to predict patient load during the 1990 race. No relationships could be established between humidity, sunshine, wind, or race characteristics and number of patients. Although higher dew point was associated with higher patient load during the 1983 to 1989 races, dew point was a poor predictor of patient load during the 1990 race. Regression analysis may be useful in identifying relationships between event characteristics and patient load but is probably inadequate to explain the complexities of crowd behavior and too simplified to use as a prediction tool.
Wang, Qingliang; Li, Xiaojie; Hu, Kunpeng; Zhao, Kun; Yang, Peisheng; Liu, Bo
2015-05-12
To explore the risk factors of portal hypertensive gastropathy (PHG) in patients with hepatitis B associated cirrhosis and establish a Logistic regression model of noninvasive prediction. The clinical data of 234 hospitalized patients with hepatitis B associated cirrhosis from March 2012 to March 2014 were analyzed retrospectively. The dependent variable was the occurrence of PHG while the independent variables were screened by binary Logistic analysis. Multivariate Logistic regression was used for further analysis of significant noninvasive independent variables. Logistic regression model was established and odds ratio was calculated for each factor. The accuracy, sensitivity and specificity of model were evaluated by the curve of receiver operating characteristic (ROC). According to univariate Logistic regression, the risk factors included hepatic dysfunction, albumin (ALB), bilirubin (TB), prothrombin time (PT), platelet (PLT), white blood cell (WBC), portal vein diameter, spleen index, splenic vein diameter, diameter ratio, PLT to spleen volume ratio, esophageal varices (EV) and gastric varices (GV). Multivariate analysis showed that hepatic dysfunction (X1), TB (X2), PLT (X3) and splenic vein diameter (X4) were the major occurring factors for PHG. The established regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4. The accuracy of model for PHG was 79.1% with a sensitivity of 77.2% and a specificity of 80.8%. Hepatic dysfunction, TB, PLT and splenic vein diameter are risk factors for PHG and the noninvasive predicted Logistic regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4.
Saliba, Christopher M; Clouthier, Allison L; Brandon, Scott C E; Rainbow, Michael J; Deluzio, Kevin J
2018-05-29
Abnormal loading of the knee joint contributes to the pathogenesis of knee osteoarthritis. Gait retraining is a non-invasive intervention that aims to reduce knee loads by providing audible, visual, or haptic feedback of gait parameters. The computational expense of joint contact force prediction has limited real-time feedback to surrogate measures of the contact force, such as the knee adduction moment. We developed a method to predict knee joint contact forces using motion analysis and a statistical regression model that can be implemented in near real-time. Gait waveform variables were deconstructed using principal component analysis and a linear regression was used to predict the principal component scores of the contact force waveforms. Knee joint contact force waveforms were reconstructed using the predicted scores. We tested our method using a heterogenous population of asymptomatic controls and subjects with knee osteoarthritis. The reconstructed contact force waveforms had mean (SD) RMS differences of 0.17 (0.05) bodyweight compared to the contact forces predicted by a musculoskeletal model. Our method successfully predicted subject-specific shape features of contact force waveforms and is a potentially powerful tool in biofeedback and clinical gait analysis.
Mocellin, Simone; Ambrosi, Alessandro; Montesco, Maria Cristina; Foletto, Mirto; Zavagno, Giorgio; Nitti, Donato; Lise, Mario; Rossi, Carlo Riccardo
2006-08-01
Currently, approximately 80% of melanoma patients undergoing sentinel node biopsy (SNB) have negative sentinel lymph nodes (SLNs), and no prediction system is reliable enough to be implemented in the clinical setting to reduce the number of SNB procedures. In this study, the predictive power of support vector machine (SVM)-based statistical analysis was tested. The clinical records of 246 patients who underwent SNB at our institution were used for this analysis. The following clinicopathologic variables were considered: the patient's age and sex and the tumor's histological subtype, Breslow thickness, Clark level, ulceration, mitotic index, lymphocyte infiltration, regression, angiolymphatic invasion, microsatellitosis, and growth phase. The results of SVM-based prediction of SLN status were compared with those achieved with logistic regression. The SLN positivity rate was 22% (52 of 234). When the accuracy was > or = 80%, the negative predictive value, positive predictive value, specificity, and sensitivity were 98%, 54%, 94%, and 77% and 82%, 41%, 69%, and 93% by using SVM and logistic regression, respectively. Moreover, SVM and logistic regression were associated with a diagnostic error and an SNB percentage reduction of (1) 1% and 60% and (2) 15% and 73%, respectively. The results from this pilot study suggest that SVM-based prediction of SLN status might be evaluated as a prognostic method to avoid the SNB procedure in 60% of patients currently eligible, with a very low error rate. If validated in larger series, this strategy would lead to obvious advantages in terms of both patient quality of life and costs for the health care system.
Ren, Y Y; Zhou, L C; Yang, L; Liu, P Y; Zhao, B W; Liu, H X
2016-09-01
The paper highlights the use of the logistic regression (LR) method in the construction of acceptable statistically significant, robust and predictive models for the classification of chemicals according to their aquatic toxic modes of action. Essentials accounting for a reliable model were all considered carefully. The model predictors were selected by stepwise forward discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated by the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The obtained models are simple and easy to interpret. In general, LR performs much better than LDA and seems to be more attractive for the prediction of the more toxic compounds, i.e. compounds that exhibit excess toxicity versus non-polar narcotic compounds and more reactive compounds versus less reactive compounds. In addition, model fit and regression diagnostics was done through the influence plot which reflects the hat-values, studentized residuals, and Cook's distance statistics of each sample. Overdispersion was also checked for the LR model. The relationships between the descriptors and the aquatic toxic behaviour of compounds are also discussed.
Ridge regression for predicting elastic moduli and hardness of calcium aluminosilicate glasses
NASA Astrophysics Data System (ADS)
Deng, Yifan; Zeng, Huidan; Jiang, Yejia; Chen, Guorong; Chen, Jianding; Sun, Luyi
2018-03-01
It is of great significance to design glasses with satisfactory mechanical properties predictively through modeling. Among various modeling methods, data-driven modeling is such a reliable approach that can dramatically shorten research duration, cut research cost and accelerate the development of glass materials. In this work, the ridge regression (RR) analysis was used to construct regression models for predicting the compositional dependence of CaO-Al2O3-SiO2 glass elastic moduli (Shear, Bulk, and Young’s moduli) and hardness based on the ternary diagram of the compositions. The property prediction over a large glass composition space was accomplished with known experimental data of various compositions in the literature, and the simulated results are in good agreement with the measured ones. This regression model can serve as a facile and effective tool for studying the relationship between the compositions and the property, enabling high-efficient design of glasses to meet the requirements for specific elasticity and hardness.
Iturriaga, H; Hirsch, S; Bunout, D; Díaz, M; Kelly, M; Silva, G; de la Maza, M P; Petermann, M; Ugarte, G
1993-04-01
Looking for a noninvasive method to predict liver histologic alterations in alcoholic patients without clinical signs of liver failure, we studied 187 chronic alcoholics recently abstinent, divided in 2 series. In the model series (n = 94) several clinical variables and results of common laboratory tests were confronted to the findings of liver biopsies. These were classified in 3 groups: 1. Normal liver; 2. Moderate alterations; 3. Marked alterations, including alcoholic hepatitis and cirrhosis. Multivariate methods used were logistic regression analysis and a classification and regression tree (CART). Both methods entered gamma-glutamyltransferase (GGT), aspartate-aminotransferase (AST), weight and age as significant and independent variables. Univariate analysis with GGT and AST at different cutoffs were also performed. To predict the presence of any kind of damage (Groups 2 and 3), CART and AST > 30 IU showed the higher sensitivity, specificity and correct prediction, both in the model and validation series. For prediction of marked liver damage, a score based on logistic regression and GGT > 110 IU had the higher efficiencies. It is concluded that GGT and AST are good markers of alcoholic liver damage and that, using sample cutoffs, histologic diagnosis can be correctly predicted in 80% of recently abstinent asymptomatic alcoholics.
Application of linear regression analysis in accuracy assessment of rolling force calculations
NASA Astrophysics Data System (ADS)
Poliak, E. I.; Shim, M. K.; Kim, G. S.; Choo, W. Y.
1998-10-01
Efficient operation of the computational models employed in process control systems require periodical assessment of the accuracy of their predictions. Linear regression is proposed as a tool which allows separate systematic and random prediction errors from those related to measurements. A quantitative characteristic of the model predictive ability is introduced in addition to standard statistical tests for model adequacy. Rolling force calculations are considered as an example for the application. However, the outlined approach can be used to assess the performance of any computational model.
Du, Qing-Yun; Wang, En-Yin; Huang, Yan; Guo, Xiao-Yi; Xiong, Yu-Jing; Yu, Yi-Ping; Yao, Gui-Dong; Shi, Sen-Lin; Sun, Ying-Pu
2016-04-01
To evaluate the independent effects of the degree of blastocoele expansion and re-expansion and the inner cell mass (ICM) and trophectoderm (TE) grades on predicting live birth after fresh and vitrified/warmed single blastocyst transfer. Retrospective study. Reproductive medical center. Women undergoing 844 fresh and 370 vitrified/warmed single blastocyst transfer cycles. None. Live-birth rate correlated with blastocyst morphology parameters by logistic regression analysis and Spearman correlations analysis. The degree of blastocoele expansion and re-expansion was the only blastocyst morphology parameter that exhibited a significant ability to predict live birth in both fresh and vitrified/warmed single blastocyst transfer cycles respectively by multivariate logistic regression and Spearman correlations analysis. Although the ICM grade was significantly related to live birth in fresh cycles according to the univariate model, its effect was not maintained in the multivariate logistic analysis. In vitrified/warmed cycles, neither ICM nor TE grade was correlated with live birth by logistic regression analysis. This study is the first to confirm that the degree of blastocoele expansion and re-expansion is a better predictor of live birth after both fresh and vitrified/warmed single blastocyst transfer cycles than ICM or TE grade. Copyright © 2016. Published by Elsevier Inc.
Does Group-Level Commitment Predict Employee Well-Being?: A Prospective Analysis.
Clausen, Thomas; Christensen, Karl Bang; Nielsen, Karina
2015-11-01
To investigate the links between group-level affective organizational commitment (AOC) and individual-level psychological well-being, self-reported sickness absence, and sleep disturbances. A total of 5085 care workers from 301 workgroups in the Danish eldercare services participated in both waves of the study (T1 [2005] and T2 [2006]). The three outcomes were analyzed using linear multilevel regression analysis, multilevel Poisson regression analysis, and multilevel logistic regression analysis, respectively. Group-level AOC (T1) significantly predicted individual-level psychological well-being, self-reported sickness absence, and sleep disturbances (T2). The association between group-level AOC (T1) and psychological well-being (T2) was fully mediated by individual-level AOC (T1), and the associations between group-level AOC (T1) and self-reported sickness absence and sleep disturbances (T2) were partially mediated by individual-level AOC (T1). Group-level AOC is an important predictor of employee well-being in contemporary health care organizations.
Liu, Jian; Gao, Yun-Hua; Li, Ding-Dong; Gao, Yan-Chun; Hou, Ling-Mi; Xie, Ting
2014-01-01
To compare the value of contrast-enhanced ultrasound (CEUS) qualitative and quantitative analysis in the identification of breast tumor lumps. Qualitative and quantitative indicators of CEUS for 73 cases of breast tumor lumps were retrospectively analyzed by univariate and multivariate approaches. Logistic regression was applied and ROC curves were drawn for evaluation and comparison. The CEUS qualitative indicator-generated regression equation contained three indicators, namely enhanced homogeneity, diameter line expansion and peak intensity grading, which demonstrated prediction accuracy for benign and malignant breast tumor lumps of 91.8%; the quantitative indicator-generated regression equation only contained one indicator, namely the relative peak intensity, and its prediction accuracy was 61.5%. The corresponding areas under the ROC curve for qualitative and quantitative analyses were 91.3% and 75.7%, respectively, which exhibited a statistically significant difference by the Z test (P<0.05). The ability of CEUS qualitative analysis to identify breast tumor lumps is better than with quantitative analysis.
Gaussian Process Regression Model in Spatial Logistic Regression
NASA Astrophysics Data System (ADS)
Sofro, A.; Oktaviarina, A.
2018-01-01
Spatial analysis has developed very quickly in the last decade. One of the favorite approaches is based on the neighbourhood of the region. Unfortunately, there are some limitations such as difficulty in prediction. Therefore, we offer Gaussian process regression (GPR) to accommodate the issue. In this paper, we will focus on spatial modeling with GPR for binomial data with logit link function. The performance of the model will be investigated. We will discuss the inference of how to estimate the parameters and hyper-parameters and to predict as well. Furthermore, simulation studies will be explained in the last section.
Guo, Jin-Cheng; Wu, Yang; Chen, Yang; Pan, Feng; Wu, Zhi-Yong; Zhang, Jia-Sheng; Wu, Jian-Yi; Xu, Xiu-E; Zhao, Jian-Mei; Li, En-Min; Zhao, Yi; Xu, Li-Yan
2018-04-09
Esophageal squamous cell carcinoma (ESCC) is the predominant subtype of esophageal carcinoma in China. This study was to develop a staging model to predict outcomes of patients with ESCC. Using Cox regression analysis, principal component analysis (PCA), partitioning clustering, Kaplan-Meier analysis, receiver operating characteristic (ROC) curve analysis, and classification and regression tree (CART) analysis, we mined the Gene Expression Omnibus database to determine the expression profiles of genes in 179 patients with ESCC from GSE63624 and GSE63622 dataset. Univariate cox regression analysis of the GSE63624 dataset revealed that 2404 protein-coding genes (PCGs) and 635 long non-coding RNAs (lncRNAs) were associated with the survival of patients with ESCC. PCA categorized these PCGs and lncRNAs into three principal components (PCs), which were used to cluster the patients into three groups. ROC analysis demonstrated that the predictive ability of PCG-lncRNA PCs when applied to new patients was better than that of the tumor-node-metastasis staging (area under ROC curve [AUC]: 0.69 vs. 0.65, P < 0.05). Accordingly, we constructed a molecular disaggregated model comprising one lncRNA and two PCGs, which we designated as the LSB staging model using CART analysis in the GSE63624 dataset. This LSB staging model classified the GSE63622 dataset of patients into three different groups, and its effectiveness was validated by analysis of another cohort of 105 patients. The LSB staging model has clinical significance for the prognosis prediction of patients with ESCC and may serve as a three-gene staging microarray.
Variable Selection for Regression Models of Percentile Flows
NASA Astrophysics Data System (ADS)
Fouad, G.
2017-12-01
Percentile flows describe the flow magnitude equaled or exceeded for a given percent of time, and are widely used in water resource management. However, these statistics are normally unavailable since most basins are ungauged. Percentile flows of ungauged basins are often predicted using regression models based on readily observable basin characteristics, such as mean elevation. The number of these independent variables is too large to evaluate all possible models. A subset of models is typically evaluated using automatic procedures, like stepwise regression. This ignores a large variety of methods from the field of feature (variable) selection and physical understanding of percentile flows. A study of 918 basins in the United States was conducted to compare an automatic regression procedure to the following variable selection methods: (1) principal component analysis, (2) correlation analysis, (3) random forests, (4) genetic programming, (5) Bayesian networks, and (6) physical understanding. The automatic regression procedure only performed better than principal component analysis. Poor performance of the regression procedure was due to a commonly used filter for multicollinearity, which rejected the strongest models because they had cross-correlated independent variables. Multicollinearity did not decrease model performance in validation because of a representative set of calibration basins. Variable selection methods based strictly on predictive power (numbers 2-5 from above) performed similarly, likely indicating a limit to the predictive power of the variables. Similar performance was also reached using variables selected based on physical understanding, a finding that substantiates recent calls to emphasize physical understanding in modeling for predictions in ungauged basins. The strongest variables highlighted the importance of geology and land cover, whereas widely used topographic variables were the weakest predictors. Variables suffered from a high degree of multicollinearity, possibly illustrating the co-evolution of climatic and physiographic conditions. Given the ineffectiveness of many variables used here, future work should develop new variables that target specific processes associated with percentile flows.
Regression analysis for LED color detection of visual-MIMO system
NASA Astrophysics Data System (ADS)
Banik, Partha Pratim; Saha, Rappy; Kim, Ki-Doo
2018-04-01
Color detection from a light emitting diode (LED) array using a smartphone camera is very difficult in a visual multiple-input multiple-output (visual-MIMO) system. In this paper, we propose a method to determine the LED color using a smartphone camera by applying regression analysis. We employ a multivariate regression model to identify the LED color. After taking a picture of an LED array, we select the LED array region, and detect the LED using an image processing algorithm. We then apply the k-means clustering algorithm to determine the number of potential colors for feature extraction of each LED. Finally, we apply the multivariate regression model to predict the color of the transmitted LEDs. In this paper, we show our results for three types of environmental light condition: room environmental light, low environmental light (560 lux), and strong environmental light (2450 lux). We compare the results of our proposed algorithm from the analysis of training and test R-Square (%) values, percentage of closeness of transmitted and predicted colors, and we also mention about the number of distorted test data points from the analysis of distortion bar graph in CIE1931 color space.
Prediction of Spirometric Forced Expiratory Volume (FEV1) Data Using Support Vector Regression
NASA Astrophysics Data System (ADS)
Kavitha, A.; Sujatha, C. M.; Ramakrishnan, S.
2010-01-01
In this work, prediction of forced expiratory volume in 1 second (FEV1) in pulmonary function test is carried out using the spirometer and support vector regression analysis. Pulmonary function data are measured with flow volume spirometer from volunteers (N=175) using a standard data acquisition protocol. The acquired data are then used to predict FEV1. Support vector machines with polynomial kernel function with four different orders were employed to predict the values of FEV1. The performance is evaluated by computing the average prediction accuracy for normal and abnormal cases. Results show that support vector machines are capable of predicting FEV1 in both normal and abnormal cases and the average prediction accuracy for normal subjects was higher than that of abnormal subjects. Accuracy in prediction was found to be high for a regularization constant of C=10. Since FEV1 is the most significant parameter in the analysis of spirometric data, it appears that this method of assessment is useful in diagnosing the pulmonary abnormalities with incomplete data and data with poor recording.
Gouvinhas, Irene; Machado, Nelson; Carvalho, Teresa; de Almeida, José M M M; Barros, Ana I R N A
2015-01-01
Extra virgin olive oils produced from three cultivars on different maturation stages were characterized using Raman spectroscopy. Chemometric methods (principal component analysis, discriminant analysis, principal component regression and partial least squares regression) applied to Raman spectral data were utilized to evaluate and quantify the statistical differences between cultivars and their ripening process. The models for predicting the peroxide value and free acidity of olive oils showed good calibration and prediction values and presented high coefficients of determination (>0.933). Both the R(2), and the correlation equations between the measured chemical parameters, and the values predicted by each approach are presented; these comprehend both PCR and PLS, used to assess SNV normalized Raman data, as well as first and second derivative of the spectra. This study demonstrates that a combination of Raman spectroscopy with multivariate analysis methods can be useful to predict rapidly olive oil chemical characteristics during the maturation process. Copyright © 2014 Elsevier B.V. All rights reserved.
Prediction of coagulation and flocculation processes using ANN models and fuzzy regression.
Zangooei, Hossein; Delnavaz, Mohammad; Asadollahfardi, Gholamreza
2016-09-01
Coagulation and flocculation are two main processes used to integrate colloidal particles into larger particles and are two main stages of primary water treatment. Coagulation and flocculation processes are only needed when colloidal particles are a significant part of the total suspended solid fraction. Our objective was to predict turbidity of water after the coagulation and flocculation process while other parameters such as types and concentrations of coagulants, pH, and influent turbidity of raw water were known. We used a multilayer perceptron (MLP), a radial basis function (RBF) of artificial neural networks (ANNs) and various kinds of fuzzy regression analysis to predict turbidity after the coagulation and flocculation processes. The coagulant used in the pilot plant, which was located in water treatment plant, was poly aluminum chloride. We used existing data, including the type and concentrations of coagulant, pH and influent turbidity, of the raw water because these types of data were available from the pilot plant for simulation and data was collected by the Tehran water authority. The results indicated that ANNs had more ability in simulating the coagulation and flocculation process and predicting turbidity removal with different experimental data than did the fuzzy regression analysis, and may have the ability to reduce the number of jar tests, which are time-consuming and expensive. The MLP neural network proved to be the best network compared to the RBF neural network and fuzzy regression analysis in this study. The MLP neural network can predict the effluent turbidity of the coagulation and the flocculation process with a coefficient of determination (R 2 ) of 0.96 and root mean square error of 0.0106.
Methods for Improving Information from ’Undesigned’ Human Factors Experiments.
Human factors engineering, Information processing, Regression analysis , Experimental design, Least squares method, Analysis of variance, Correlation techniques, Matrices(Mathematics), Multiple disciplines, Mathematical prediction
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Shujie; Kawamoto, Taisuke; Morita, Osamu
Chemical exposure often results in liver hypertrophy in animal tests, characterized by increased liver weight, hepatocellular hypertrophy, and/or cell proliferation. While most of these changes are considered adaptive responses, there is concern that they may be associated with carcinogenesis. In this study, we have employed a toxicogenomic approach using a logistic ridge regression model to identify genes responsible for liver hypertrophy and hypertrophic hepatocarcinogenesis and to develop a predictive model for assessing hypertrophy-inducing compounds. Logistic regression models have previously been used in the quantification of epidemiological risk factors. DNA microarray data from the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System weremore » used to identify hypertrophy-related genes that are expressed differently in hypertrophy induced by carcinogens and non-carcinogens. Data were collected for 134 chemicals (72 non-hypertrophy-inducing chemicals, 27 hypertrophy-inducing non-carcinogenic chemicals, and 15 hypertrophy-inducing carcinogenic compounds). After applying logistic ridge regression analysis, 35 genes for liver hypertrophy (e.g., Acot1 and Abcc3) and 13 genes for hypertrophic hepatocarcinogenesis (e.g., Asns and Gpx2) were selected. The predictive models built using these genes were 94.8% and 82.7% accurate, respectively. Pathway analysis of the genes indicates that, aside from a xenobiotic metabolism-related pathway as an adaptive response for liver hypertrophy, amino acid biosynthesis and oxidative responses appear to be involved in hypertrophic hepatocarcinogenesis. Early detection and toxicogenomic characterization of liver hypertrophy using our models may be useful for predicting carcinogenesis. In addition, the identified genes provide novel insight into discrimination between adverse hypertrophy associated with carcinogenesis and adaptive hypertrophy in risk assessment. - Highlights: • Hypertrophy (H) and hypertrophic carcinogenesis (C) were studied by toxicogenomics. • Important genes for H and C were selected by logistic ridge regression analysis. • Amino acid biosynthesis and oxidative responses may be involved in C. • Predictive models for H and C provided 94.8% and 82.7% accuracy, respectively. • The identified genes could be useful for assessment of liver hypertrophy.« less
Karabatsos, George
2017-02-01
Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
Wang, Chong; Sun, Qun; Wahab, Magd Abdel; Zhang, Xingyu; Xu, Limin
2015-09-01
Rotary cup brushes mounted on each side of a road sweeper undertake heavy debris removal tasks but the characteristics have not been well known until recently. A Finite Element (FE) model that can analyze brush deformation and predict brush characteristics have been developed to investigate the sweeping efficiency and to assist the controller design. However, the FE model requires large amount of CPU time to simulate each brush design and operating scenario, which may affect its applications in a real-time system. This study develops a mathematical regression model to summarize the FE modeled results. The complex brush load characteristic curves were statistically analyzed to quantify the effects of cross-section, length, mounting angle, displacement and rotational speed etc. The data were then fitted by a multiple variable regression model using the maximum likelihood method. The fitted results showed good agreement with the FE analysis results and experimental results, suggesting that the mathematical regression model may be directly used in a real-time system to predict characteristics of different brushes under varying operating conditions. The methodology may also be used in the design and optimization of rotary brush tools. Copyright © 2015 Elsevier Ltd. All rights reserved.
Ye, Jiang-Feng; Zhao, Yu-Xin; Ju, Jian; Wang, Wei
2017-10-01
To discuss the value of the Bedside Index for Severity in Acute Pancreatitis (BISAP), Modified Early Warning Score (MEWS), serum Ca2+, similarly hereinafter, and red cell distribution width (RDW) for predicting the severity grade of acute pancreatitis and to develop and verify a more accurate scoring system to predict the severity of AP. In 302 patients with AP, we calculated BISAP and MEWS scores and conducted regression analyses on the relationships of BISAP scoring, RDW, MEWS, and serum Ca2+ with the severity of AP using single-factor logistics. The variables with statistical significance in the single-factor logistic regression were used in a multi-factor logistic regression model; forward stepwise regression was used to screen variables and build a multi-factor prediction model. A receiver operating characteristic curve (ROC curve) was constructed, and the significance of multi- and single-factor prediction models in predicting the severity of AP using the area under the ROC curve (AUC) was evaluated. The internal validity of the model was verified through bootstrapping. Among 302 patients with AP, 209 had mild acute pancreatitis (MAP) and 93 had severe acute pancreatitis (SAP). According to single-factor logistic regression analysis, we found that BISAP, MEWS and serum Ca2+ are prediction indexes of the severity of AP (P-value<0.001), whereas RDW is not a prediction index of AP severity (P-value>0.05). The multi-factor logistic regression analysis showed that BISAP and serum Ca2+ are independent prediction indexes of AP severity (P-value<0.001), and MEWS is not an independent prediction index of AP severity (P-value>0.05); BISAP is negatively related to serum Ca2+ (r=-0.330, P-value<0.001). The constructed model is as follows: ln()=7.306+1.151*BISAP-4.516*serum Ca2+. The predictive ability of each model for SAP follows the order of the combined BISAP and serum Ca2+ prediction model>Ca2+>BISAP. There is no statistical significance for the predictive ability of BISAP and serum Ca2+ (P-value>0.05); however, there is remarkable statistical significance for the predictive ability using the newly built prediction model as well as BISAP and serum Ca2+ individually (P-value<0.01). Verification of the internal validity of the models by bootstrapping is favorable. BISAP and serum Ca2+ have high predictive value for the severity of AP. However, the model built by combining BISAP and serum Ca2+ is remarkably superior to those of BISAP and serum Ca2+ individually. Furthermore, this model is simple, practical and appropriate for clinical use. Copyright © 2016. Published by Elsevier Masson SAS.
Sheehan, Kenneth R.; Strager, Michael P.; Welsh, Stuart A.
2013-01-01
Stream habitat assessments are commonplace in fish management, and often involve nonspatial analysis methods for quantifying or predicting habitat, such as ordinary least squares regression (OLS). Spatial relationships, however, often exist among stream habitat variables. For example, water depth, water velocity, and benthic substrate sizes within streams are often spatially correlated and may exhibit spatial nonstationarity or inconsistency in geographic space. Thus, analysis methods should address spatial relationships within habitat datasets. In this study, OLS and a recently developed method, geographically weighted regression (GWR), were used to model benthic substrate from water depth and water velocity data at two stream sites within the Greater Yellowstone Ecosystem. For data collection, each site was represented by a grid of 0.1 m2 cells, where actual values of water depth, water velocity, and benthic substrate class were measured for each cell. Accuracies of regressed substrate class data by OLS and GWR methods were calculated by comparing maps, parameter estimates, and determination coefficient r 2. For analysis of data from both sites, Akaike’s Information Criterion corrected for sample size indicated the best approximating model for the data resulted from GWR and not from OLS. Adjusted r 2 values also supported GWR as a better approach than OLS for prediction of substrate. This study supports GWR (a spatial analysis approach) over nonspatial OLS methods for prediction of habitat for stream habitat assessments.
Aqil, Muhammad; Kita, Ichiro; Yano, Akira; Nishiyama, Soichi
2007-10-01
Traditionally, the multiple linear regression technique has been one of the most widely used models in simulating hydrological time series. However, when the nonlinear phenomenon is significant, the multiple linear will fail to develop an appropriate predictive model. Recently, neuro-fuzzy systems have gained much popularity for calibrating the nonlinear relationships. This study evaluated the potential of a neuro-fuzzy system as an alternative to the traditional statistical regression technique for the purpose of predicting flow from a local source in a river basin. The effectiveness of the proposed identification technique was demonstrated through a simulation study of the river flow time series of the Citarum River in Indonesia. Furthermore, in order to provide the uncertainty associated with the estimation of river flow, a Monte Carlo simulation was performed. As a comparison, a multiple linear regression analysis that was being used by the Citarum River Authority was also examined using various statistical indices. The simulation results using 95% confidence intervals indicated that the neuro-fuzzy model consistently underestimated the magnitude of high flow while the low and medium flow magnitudes were estimated closer to the observed data. The comparison of the prediction accuracy of the neuro-fuzzy and linear regression methods indicated that the neuro-fuzzy approach was more accurate in predicting river flow dynamics. The neuro-fuzzy model was able to improve the root mean square error (RMSE) and mean absolute percentage error (MAPE) values of the multiple linear regression forecasts by about 13.52% and 10.73%, respectively. Considering its simplicity and efficiency, the neuro-fuzzy model is recommended as an alternative tool for modeling of flow dynamics in the study area.
NASA Astrophysics Data System (ADS)
Stigter, T. Y.; Ribeiro, L.; Dill, A. M. M. Carvalho
2008-07-01
SummaryFactorial regression models, based on correspondence analysis, are built to explain the high nitrate concentrations in groundwater beneath an agricultural area in the south of Portugal, exceeding 300 mg/l, as a function of chemical variables, electrical conductivity (EC), land use and hydrogeological setting. Two important advantages of the proposed methodology are that qualitative parameters can be involved in the regression analysis and that multicollinearity is avoided. Regression is performed on eigenvectors extracted from the data similarity matrix, the first of which clearly reveals the impact of agricultural practices and hydrogeological setting on the groundwater chemistry of the study area. Significant correlation exists between response variable NO3- and explanatory variables Ca 2+, Cl -, SO42-, depth to water, aquifer media and land use. Substituting Cl - by the EC results in the most accurate regression model for nitrate, when disregarding the four largest outliers (model A). When built solely on land use and hydrogeological setting, the regression model (model B) is less accurate but more interesting from a practical viewpoint, as it is based on easily obtainable data and can be used to predict nitrate concentrations in groundwater in other areas with similar conditions. This is particularly useful for conservative contaminants, where risk and vulnerability assessment methods, based on assumed rather than established correlations, generally produce erroneous results. Another purpose of the models can be to predict the future evolution of nitrate concentrations under influence of changes in land use or fertilization practices, which occur in compliance with policies such as the Nitrates Directive. Model B predicts a 40% decrease in nitrate concentrations in groundwater of the study area, when horticulture is replaced by other land use with much lower fertilization and irrigation rates.
Model parameter uncertainty analysis for an annual field-scale P loss model
NASA Astrophysics Data System (ADS)
Bolster, Carl H.; Vadas, Peter A.; Boykin, Debbie
2016-08-01
Phosphorous (P) fate and transport models are important tools for developing and evaluating conservation practices aimed at reducing P losses from agricultural fields. Because all models are simplifications of complex systems, there will exist an inherent amount of uncertainty associated with their predictions. It is therefore important that efforts be directed at identifying, quantifying, and communicating the different sources of model uncertainties. In this study, we conducted an uncertainty analysis with the Annual P Loss Estimator (APLE) model. Our analysis included calculating parameter uncertainties and confidence and prediction intervals for five internal regression equations in APLE. We also estimated uncertainties of the model input variables based on values reported in the literature. We then predicted P loss for a suite of fields under different management and climatic conditions while accounting for uncertainties in the model parameters and inputs and compared the relative contributions of these two sources of uncertainty to the overall uncertainty associated with predictions of P loss. Both the overall magnitude of the prediction uncertainties and the relative contributions of the two sources of uncertainty varied depending on management practices and field characteristics. This was due to differences in the number of model input variables and the uncertainties in the regression equations associated with each P loss pathway. Inspection of the uncertainties in the five regression equations brought attention to a previously unrecognized limitation with the equation used to partition surface-applied fertilizer P between leaching and runoff losses. As a result, an alternate equation was identified that provided similar predictions with much less uncertainty. Our results demonstrate how a thorough uncertainty and model residual analysis can be used to identify limitations with a model. Such insight can then be used to guide future data collection and model development and evaluation efforts.
ERIC Educational Resources Information Center
Herreid, Charlene H.; Miller, Thomas E.
2009-01-01
This article is the fourth in a series of articles describing an attrition prediction and intervention project at the University of South Florida (USF) in Tampa. In this article, the researchers describe the updated version of the prediction model. The original model was developed from a sample of about 900 First Time in College (FTIC) students…
Burnout does not help predict depression among French school teachers.
Bianchi, Renzo; Schonfeld, Irvin Sam; Laurent, Eric
2015-11-01
Burnout has been viewed as a phase in the development of depression. However, supportive research is scarce. We examined whether burnout predicted depression among French school teachers. We conducted a 2-wave, 21-month study involving 627 teachers (73% female) working in French primary and secondary schools. Burnout was assessed with the Maslach Burnout Inventory and depression with the 9-item depression module of the Patient Health Questionnaire (PHQ-9). The PHQ-9 grades depressive symptom severity and provides a provisional diagnosis of major depression. Depression was treated both as a continuous and categorical variable using linear and logistic regression analyses. We controlled for gender, age, and length of employment. Controlling for baseline depressive symptoms, linear regression analysis showed that burnout symptoms at time 1 (T1) did not predict depressive symptoms at time 2 (T2). Baseline depressive symptoms accounted for about 88% of the association between T1 burnout and T2 depressive symptoms. Only baseline depressive symptoms predicted depressive symptoms at follow-up. Similarly, logistic regression analysis revealed that burnout symptoms at T1 did not predict incident cases of major depression at T2 when depressive symptoms at T1 were included in the predictive model. Only baseline depressive symptoms predicted cases of major depression at follow-up. This study does not support the view that burnout is a phase in the development of depression. Assessing burnout symptoms in addition to "classical" depressive symptoms may not always improve our ability to predict future depression.
Fakayode, Sayo O; Mitchell, Breanna S; Pollard, David A
2014-08-01
Accurate understanding of analyte boiling points (BP) is of critical importance in gas chromatographic (GC) separation and crude oil refinery operation in petrochemical industries. This study reported the first combined use of GC separation and partial-least-square (PLS1) multivariate regression analysis of petrochemical structural activity relationship (SAR) for accurate BP determination of two commercially available (D3710 and MA VHP) calibration gas mix samples. The results of the BP determination using PLS1 multivariate regression were further compared with the results of traditional simulated distillation method of BP determination. The developed PLS1 regression was able to correctly predict analytes BP in D3710 and MA VHP calibration gas mix samples, with a root-mean-square-%-relative-error (RMS%RE) of 6.4%, and 10.8% respectively. In contrast, the overall RMS%RE of 32.9% and 40.4%, respectively obtained for BP determination in D3710 and MA VHP using a traditional simulated distillation method were approximately four times larger than the corresponding RMS%RE of BP prediction using MRA, demonstrating the better predictive ability of MRA. The reported method is rapid, robust, and promising, and can be potentially used routinely for fast analysis, pattern recognition, and analyte BP determination in petrochemical industries. Copyright © 2014 Elsevier B.V. All rights reserved.
Prediction of the Main Engine Power of a New Container Ship at the Preliminary Design Stage
NASA Astrophysics Data System (ADS)
Cepowski, Tomasz
2017-06-01
The paper presents mathematical relationships that allow us to forecast the estimated main engine power of new container ships, based on data concerning vessels built in 2005-2015. The presented approximations allow us to estimate the engine power based on the length between perpendiculars and the number of containers the ship will carry. The approximations were developed using simple linear regression and multivariate linear regression analysis. The presented relations have practical application for estimation of container ship engine power needed in preliminary parametric design of the ship. It follows from the above that the use of multiple linear regression to predict the main engine power of a container ship brings more accurate solutions than simple linear regression.
Odontological approach to sexual dimorphism in southeastern France.
Lladeres, Emilie; Saliba-Serre, Bérengère; Sastre, Julien; Foti, Bruno; Tardivo, Delphine; Adalian, Pascal
2013-01-01
The aim of this study was to establish a prediction formula to allow for the determination of sex among the southeastern French population using dental measurements. The sample consisted of 105 individuals (57 males and 48 females, aged between 18 and 25 years). Dental measurements were calculated using Euclidean distances, in three-dimensional space, from point coordinates obtained by a Microscribe. A multiple logistic regression analysis was performed to establish the prediction formula. Among 12 selected dental distances, a stepwise logistic regression analysis highlighted the two most significant discriminate predictors of sex: one located at the mandible and the other at the maxilla. A cutpoint was proposed to prediction of true sex. The prediction formula was then tested on a validation sample (20 males and 34 females, aged between 18 and 62 years and with a history of orthodontics or restorative care) to evaluate the accuracy of the method. © 2012 American Academy of Forensic Sciences.
Multiple linear regression models are often used to predict levels of fecal indicator bacteria (FIB) in recreational swimming waters based on independent variables (IVs) such as meteorologic, hydrodynamic, and water-quality measures. The IVs used for these analyses are traditiona...
Brown, C. Erwin
1993-01-01
Correlation analysis in conjunction with principal-component and multiple-regression analyses were applied to laboratory chemical and petrographic data to assess the usefulness of these techniques in evaluating selected physical and hydraulic properties of carbonate-rock aquifers in central Pennsylvania. Correlation and principal-component analyses were used to establish relations and associations among variables, to determine dimensions of property variation of samples, and to filter the variables containing similar information. Principal-component and correlation analyses showed that porosity is related to other measured variables and that permeability is most related to porosity and grain size. Four principal components are found to be significant in explaining the variance of data. Stepwise multiple-regression analysis was used to see how well the measured variables could predict porosity and (or) permeability for this suite of rocks. The variation in permeability and porosity is not totally predicted by the other variables, but the regression is significant at the 5% significance level. ?? 1993.
Hossain, Md Golam; Saw, Aik; Alam, Rashidul; Ohtsuki, Fumio; Kamarul, Tunku
2013-09-01
Cephalic index (CI), the ratio of head breadth to head length, is widely used to categorise human populations. The aim of this study was to access the impact of anthropometric measurements on the CI of male Japanese university students. This study included 1,215 male university students from Tokyo and Kyoto, selected using convenient sampling. Multiple regression analysis was used to determine the effect of anthropometric measurements on CI. The variance inflation factor (VIF) showed no evidence of a multicollinearity problem among independent variables. The coefficients of the regression line demonstrated a significant positive relationship between CI and minimum frontal breadth (p < 0.01), bizygomatic breadth (p < 0.01) and head height (p < 0.05), and a negative relationship between CI and morphological facial height (p < 0.01) and head circumference (p < 0.01). Moreover, the coefficient and odds ratio of logistic regression analysis showed a greater likelihood for minimum frontal breadth (p < 0.01) and bizygomatic breadth (p < 0.01) to predict round-headedness, and morphological facial height (p < 0.05) and head circumference (p < 0.01) to predict long-headedness. Stepwise regression analysis revealed bizygomatic breadth, head circumference, minimum frontal breadth, head height and morphological facial height to be the best predictor craniofacial measurements with respect to CI. The results suggest that most of the variables considered in this study appear to influence the CI of adult male Japanese students.
ERIC Educational Resources Information Center
Miner, Claire Usher; Osborne, W. Larry; Jaeger, Richard M.
1997-01-01
Uses regression analysis on career development measures to examine whether career maturity indicators are predictive of interest consistency, differentiation, and score elevation. Results indicate that interest consistency and score elevation were weakly predicted by the measure; no relationship existed between the attitudinal and cognitive…
Artificial Neural Networks: A New Approach to Predicting Application Behavior.
ERIC Educational Resources Information Center
Gonzalez, Julie M. Byers; DesJardins, Stephen L.
2002-01-01
Applied the technique of artificial neural networks to predict which students were likely to apply to one research university. Compared the results to the traditional analysis tool, logistic regression modeling. Found that the addition of artificial intelligence models was a useful new tool for predicting student application behavior. (EV)
ERIC Educational Resources Information Center
Keegan, John P.; Chan, Fong; Ditchman, Nicole; Chiu, Chung-Yi
2012-01-01
The main objective of this study was to validate Pender's Health Promotion Model (HPM) as a motivational model for exercise/physical activity self-management for people with spinal cord injuries (SCIs). Quantitative descriptive research design using hierarchical regression analysis (HRA) was used. A total of 126 individuals with SCI were recruited…
Bahouth, George; Digges, Kennerly; Schulman, Carl
2012-01-01
This paper presents methods to estimate crash injury risk based on crash characteristics captured by some passenger vehicles equipped with Advanced Automatic Crash Notification technology. The resulting injury risk estimates could be used within an algorithm to optimize rescue care. Regression analysis was applied to the National Automotive Sampling System / Crashworthiness Data System (NASS/CDS) to determine how variations in a specific injury risk threshold would influence the accuracy of predicting crashes with serious injuries. The recommended thresholds for classifying crashes with severe injuries are 0.10 for frontal crashes and 0.05 for side crashes. The regression analysis of NASS/CDS indicates that these thresholds will provide sensitivity above 0.67 while maintaining a positive predictive value in the range of 0.20. PMID:23169132
Fusion of multiscale wavelet-based fractal analysis on retina image for stroke prediction.
Che Azemin, M Z; Kumar, Dinesh K; Wong, T Y; Wang, J J; Kawasaki, R; Mitchell, P; Arjunan, Sridhar P
2010-01-01
In this paper, we present a novel method of analyzing retinal vasculature using Fourier Fractal Dimension to extract the complexity of the retinal vasculature enhanced at different wavelet scales. Logistic regression was used as a fusion method to model the classifier for 5-year stroke prediction. The efficacy of this technique has been tested using standard pattern recognition performance evaluation, Receivers Operating Characteristics (ROC) analysis and medical prediction statistics, odds ratio. Stroke prediction model was developed using the proposed system.
Replica analysis of overfitting in regression models for time-to-event data
NASA Astrophysics Data System (ADS)
Coolen, A. C. C.; Barrett, J. E.; Paga, P.; Perez-Vicente, C. J.
2017-09-01
Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitting, and even for Cox’s proportional hazards model (the main tool of medical statisticians), one finds in literature only rules of thumb on the number of samples required to avoid overfitting. In this paper we present a mathematical theory of overfitting in regression models for time-to-event data, which aims to increase our quantitative understanding of the problem and provide practical tools with which to correct regression outcomes for the impact of overfitting. It is based on the replica method, a statistical mechanical technique for the analysis of heterogeneous many-variable systems that has been used successfully for several decades in physics, biology, and computer science, but not yet in medical statistics. We develop the theory initially for arbitrary regression models for time-to-event data, and verify its predictions in detail for the popular Cox model.
Test anxiety and academic performance in chiropractic students.
Zhang, Niu; Henderson, Charles N R
2014-01-01
Objective : We assessed the level of students' test anxiety, and the relationship between test anxiety and academic performance. Methods : We recruited 166 third-quarter students. The Test Anxiety Inventory (TAI) was administered to all participants. Total scores from written examinations and objective structured clinical examinations (OSCEs) were used as response variables. Results : Multiple regression analysis shows that there was a modest, but statistically significant negative correlation between TAI scores and written exam scores, but not OSCE scores. Worry and emotionality were the best predictive models for written exam scores. Mean total anxiety and emotionality scores for females were significantly higher than those for males, but not worry scores. Conclusion : Moderate-to-high test anxiety was observed in 85% of the chiropractic students examined. However, total test anxiety, as measured by the TAI score, was a very weak predictive model for written exam performance. Multiple regression analysis demonstrated that replacing total anxiety (TAI) with worry and emotionality (TAI subscales) produces a much more effective predictive model of written exam performance. Sex, age, highest current academic degree, and ethnicity contributed little additional predictive power in either regression model. Moreover, TAI scores were not found to be statistically significant predictors of physical exam skill performance, as measured by OSCEs.
A model for national outcome audit in vascular surgery.
Prytherch, D R; Ridler, B M; Beard, J D; Earnshaw, J J
2001-06-01
The aim was to model vascular surgical outcome in a national study using POSSUM scoring. One hundred and twenty-one British and Irish surgeons completed data questionnaires on patients undergoing arterial surgery under their care (mean 12 patients, range 1-49) in May/June 1998. A total of 1480 completed data records were available for logistic regression analysis using P-POSSUM methodology. Information collected included all POSSUM data items plus other factors thought to have a significant bearing on patient outcome: "extra items". The main outcome measures were death and major postoperative complications. The data were checked and inconsistent records were excluded. The remaining 1313 were divided into two sets for analysis. The first "training" set was used to obtain logistic regression models that were applied prospectively to the second "test" dataset. using POSSUM data items alone, it was possible to predict both mortality and morbidity after vascular reconstruction using P-POSSUM analysis. The addition of the "extra items" found significant in regression analysis did not significantly improve the accuracy of prediction. It was possible to predict both mortality and morbidity derived from the preoperative physiology components of the POSSUM data items alone. this study has shown that P-POSSUM methodology can be used to predict outcome after arterial surgery across a range of surgeons in different hospitals and could form the basis of a national outcome audit. It was also possible to obtain accurate models for both mortality and major morbidity from the POSSUM physiology scores alone. Copyright 2001 Harcourt Publishers Limited.
Aaron Weiskittel; Jereme Frank; David Walker; Phil Radtke; David Macfarlane; James Westfall
2015-01-01
Prediction of forest biomass and carbon is becoming important issues in the United States. However, estimating forest biomass and carbon is difficult and relies on empirically-derived regression equations. Based on recent findings from a national gap analysis and comprehensive assessment of the USDA Forest Service Forest Inventory and Analysis (USFS-FIA) component...
ESTER HYDROLYSIS RATE CONSTANT PREDICTION FROM INFRARED INTERFEROGRAMS
A method for predicting reactivity parameters of organic chemicals from spectroscopic data is being developed to assist in assessing the environmental fate of pollutants. he prototype system, which employs multiple linear regression analysis using selected points from the Fourier...
Undergraduate Student Motivation in Modularized Developmental Mathematics Courses
ERIC Educational Resources Information Center
Pachlhofer, Keith A.
2017-01-01
This study used the Motivated Strategies for Learning Questionnaire in modularized courses at three institutions across the nation (N = 189), and multiple regression was completed to investigate five categories of student motivation that predicted academic success and course completion. The overall multiple regression analysis was significant and…
Jaime-Pérez, José Carlos; Jiménez-Castillo, Raúl Alberto; Vázquez-Hernández, Karina Elizabeth; Salazar-Riojas, Rosario; Méndez-Ramírez, Nereida; Gómez-Almaguer, David
2017-10-01
Advances in automated cell separators have improved the efficiency of plateletpheresis and the possibility of obtaining double products (DP). We assessed cell processor accuracy of predicted platelet (PLT) yields with the goal of a better prediction of DP collections. This retrospective proof-of-concept study included 302 plateletpheresis procedures performed on a Trima Accel v6.0 at the apheresis unit of a hematology department. Donor variables, software predicted yield and actual PLT yield were statistically evaluated. Software prediction was optimized by linear regression analysis and its optimal cut-off to obtain a DP assessed by receiver operating characteristic curve (ROC) modeling. Three hundred and two plateletpheresis procedures were performed; in 271 (89.7%) occasions, donors were men and in 31 (10.3%) women. Pre-donation PLT count had the best direct correlation with actual PLT yield (r = 0.486. P < .001). Means of software machine-derived values differed significantly from actual PLT yield, 4.72 × 10 11 vs.6.12 × 10 11 , respectively, (P < .001). The following equation was developed to adjust these values: actual PLT yield= 0.221 + (1.254 × theoretical platelet yield). ROC curve model showed an optimal apheresis device software prediction cut-off of 4.65 × 10 11 to obtain a DP, with a sensitivity of 82.2%, specificity of 93.3%, and an area under the curve (AUC) of 0.909. Trima Accel v6.0 software consistently underestimated PLT yields. Simple correction derived from linear regression analysis accurately corrected this underestimation and ROC analysis identified a precise cut-off to reliably predict a DP. © 2016 Wiley Periodicals, Inc.
Kim, Sun Mi; Kim, Yongdai; Jeong, Kuhwan; Jeong, Heeyeong; Kim, Jiyoung
2018-01-01
The aim of this study was to compare the performance of image analysis for predicting breast cancer using two distinct regression models and to evaluate the usefulness of incorporating clinical and demographic data (CDD) into the image analysis in order to improve the diagnosis of breast cancer. This study included 139 solid masses from 139 patients who underwent a ultrasonography-guided core biopsy and had available CDD between June 2009 and April 2010. Three breast radiologists retrospectively reviewed 139 breast masses and described each lesion using the Breast Imaging Reporting and Data System (BI-RADS) lexicon. We applied and compared two regression methods-stepwise logistic (SL) regression and logistic least absolute shrinkage and selection operator (LASSO) regression-in which the BI-RADS descriptors and CDD were used as covariates. We investigated the performances of these regression methods and the agreement of radiologists in terms of test misclassification error and the area under the curve (AUC) of the tests. Logistic LASSO regression was superior (P<0.05) to SL regression, regardless of whether CDD was included in the covariates, in terms of test misclassification errors (0.234 vs. 0.253, without CDD; 0.196 vs. 0.258, with CDD) and AUC (0.785 vs. 0.759, without CDD; 0.873 vs. 0.735, with CDD). However, it was inferior (P<0.05) to the agreement of three radiologists in terms of test misclassification errors (0.234 vs. 0.168, without CDD; 0.196 vs. 0.088, with CDD) and the AUC without CDD (0.785 vs. 0.844, P<0.001), but was comparable to the AUC with CDD (0.873 vs. 0.880, P=0.141). Logistic LASSO regression based on BI-RADS descriptors and CDD showed better performance than SL in predicting the presence of breast cancer. The use of CDD as a supplement to the BI-RADS descriptors significantly improved the prediction of breast cancer using logistic LASSO regression.
Nateghi, Roshanak; Guikema, Seth D; Quiring, Steven M
2011-12-01
This article compares statistical methods for modeling power outage durations during hurricanes and examines the predictive accuracy of these methods. Being able to make accurate predictions of power outage durations is valuable because the information can be used by utility companies to plan their restoration efforts more efficiently. This information can also help inform customers and public agencies of the expected outage times, enabling better collective response planning, and coordination of restoration efforts for other critical infrastructures that depend on electricity. In the long run, outage duration estimates for future storm scenarios may help utilities and public agencies better allocate risk management resources to balance the disruption from hurricanes with the cost of hardening power systems. We compare the out-of-sample predictive accuracy of five distinct statistical models for estimating power outage duration times caused by Hurricane Ivan in 2004. The methods compared include both regression models (accelerated failure time (AFT) and Cox proportional hazard models (Cox PH)) and data mining techniques (regression trees, Bayesian additive regression trees (BART), and multivariate additive regression splines). We then validate our models against two other hurricanes. Our results indicate that BART yields the best prediction accuracy and that it is possible to predict outage durations with reasonable accuracy. © 2011 Society for Risk Analysis.
Tighe, Elizabeth L.; Schatschneider, Christopher
2015-01-01
The purpose of this study was to investigate the joint and unique contributions of morphological awareness and vocabulary knowledge at five reading comprehension levels in Adult Basic Education (ABE) students. We introduce the statistical technique of multiple quantile regression, which enabled us to assess the predictive utility of morphological awareness and vocabulary knowledge at multiple points (quantiles) along the continuous distribution of reading comprehension. To demonstrate the efficacy of our multiple quantile regression analysis, we compared and contrasted our results with a traditional multiple regression analytic approach. Our results indicated that morphological awareness and vocabulary knowledge accounted for a large portion of the variance (82-95%) in reading comprehension skills across all quantiles. Morphological awareness exhibited the greatest unique predictive ability at lower levels of reading comprehension whereas vocabulary knowledge exhibited the greatest unique predictive ability at higher levels of reading comprehension. These results indicate the utility of using multiple quantile regression to assess trajectories of component skills across multiple levels of reading comprehension. The implications of our findings for ABE programs are discussed. PMID:25351773
A Comparative Study of Pairwise Learning Methods Based on Kernel Ridge Regression.
Stock, Michiel; Pahikkala, Tapio; Airola, Antti; De Baets, Bernard; Waegeman, Willem
2018-06-12
Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction, or network inference problems. During the past decade, kernel methods have played a dominant role in pairwise learning. They still obtain a state-of-the-art predictive performance, but a theoretical analysis of their behavior has been underexplored in the machine learning literature. In this work we review and unify kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression, and a linear matrix filter arise naturally as a special case of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency, and spectral filtering properties. Our theoretical results provide valuable insights into assessing the advantages and limitations of existing pairwise learning methods.
NASA Astrophysics Data System (ADS)
Abunama, Taher; Othman, Faridah
2017-06-01
Analysing the fluctuations of wastewater inflow rates in sewage treatment plants (STPs) is essential to guarantee a sufficient treatment of wastewater before discharging it to the environment. The main objectives of this study are to statistically analyze and forecast the wastewater inflow rates into the Bandar Tun Razak STP in Kuala Lumpur, Malaysia. A time series analysis of three years’ weekly influent data (156weeks) has been conducted using the Auto-Regressive Integrated Moving Average (ARIMA) model. Various combinations of ARIMA orders (p, d, q) have been tried to select the most fitted model, which was utilized to forecast the wastewater inflow rates. The linear regression analysis was applied to testify the correlation between the observed and predicted influents. ARIMA (3, 1, 3) model was selected with the highest significance R-square and lowest normalized Bayesian Information Criterion (BIC) value, and accordingly the wastewater inflow rates were forecasted to additional 52weeks. The linear regression analysis between the observed and predicted values of the wastewater inflow rates showed a positive linear correlation with a coefficient of 0.831.
Ning, Xiaohui; Ye, Xuerui; Si, Yanhua; Yang, Zihe; Zhao, Yunzi; Sun, Qi; Chen, Ruohan; Tang, Min; Chen, Keping; Zhang, Xiaoli; Zhang, Shu
2018-03-21
We investigated the prevalence of ventricular tachycardia/ventricular fibrillation (VT/VF) in Post-infarction left ventricular aneurysm (PI-LVA) patients and analyze clinical outcomes in patients presenting with VT/VF. 575 PI-LVA patients were enrolled and investigated by logistic regression analysis. Patients with VT/VF were followed up, the composite primary endpoint was cardiac death and appropriate ICD/external shocks. The incidence of sustained VT/VF was 11%. Logistical regression analysis showed male gender, enlarged LV end diastolic diameter (LVEDD) and higher NYHA class were correlated with VT/VF development. During follow up of 46 ± 15 months, 19 out of 62(31%) patients reached study end point. Multivariate Cox regression analysis revealed that enlarged LVEDD and moderate/severe mitral regurgitation (MR) were independently predictive of clinical outcome. Male gender, enlarged LVEDD and higher NYHA class associated with risk of sustained VT/VF in PI-LVA patients. Among VT/VF positive patients, enlarged LVEDD and moderate/severe MR independently predicted poor clinical prognosis. Copyright © 2018. Published by Elsevier Inc.
Neck-focused panic attacks among Cambodian refugees; a logistic and linear regression analysis.
Hinton, Devon E; Chhean, Dara; Pich, Vuth; Um, Khin; Fama, Jeanne M; Pollack, Mark H
2006-01-01
Consecutive Cambodian refugees attending a psychiatric clinic were assessed for the presence and severity of current--i.e., at least one episode in the last month--neck-focused panic. Among the whole sample (N=130), in a logistic regression analysis, the Anxiety Sensitivity Index (ASI; odds ratio=3.70) and the Clinician-Administered PTSD Scale (CAPS; odds ratio=2.61) significantly predicted the presence of current neck panic (NP). Among the neck panic patients (N=60), in the linear regression analysis, NP severity was significantly predicted by NP-associated flashbacks (beta=.42), NP-associated catastrophic cognitions (beta=.22), and CAPS score (beta=.28). Further analysis revealed the effect of the CAPS score to be significantly mediated (Sobel test [Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182]) by both NP-associated flashbacks and catastrophic cognitions. In the care of traumatized Cambodian refugees, NP severity, as well as NP-associated flashbacks and catastrophic cognitions, should be specifically assessed and treated.
Dimensions of Intuition: First-Round Validation Studies
ERIC Educational Resources Information Center
Vrugtman, Rosanne
2009-01-01
This study utilized confirmatory factor analysis (CFA), canonical correlation analysis (CCA), regression analysis (RA), and correlation analysis (CA) for first-round validation of the researcher's Dimensions of Intuition (DOI) instrument. The DOI examined 25 personal characteristics and situations purportedly predictive of intuition. Data was…
Caballero, Daniel; Antequera, Teresa; Caro, Andrés; Ávila, María Del Mar; G Rodríguez, Pablo; Perez-Palacios, Trinidad
2017-07-01
Magnetic resonance imaging (MRI) combined with computer vision techniques have been proposed as an alternative or complementary technique to determine the quality parameters of food in a non-destructive way. The aim of this work was to analyze the sensory attributes of dry-cured loins using this technique. For that, different MRI acquisition sequences (spin echo, gradient echo and turbo 3D), algorithms for MRI analysis (GLCM, NGLDM, GLRLM and GLCM-NGLDM-GLRLM) and predictive data mining techniques (multiple linear regression and isotonic regression) were tested. The correlation coefficient (R) and mean absolute error (MAE) were used to validate the prediction results. The combination of spin echo, GLCM and isotonic regression produced the most accurate results. In addition, the MRI data from dry-cured loins seems to be more suitable than the data from fresh loins. The application of predictive data mining techniques on computational texture features from the MRI data of loins enables the determination of the sensory traits of dry-cured loins in a non-destructive way. © 2016 Society of Chemical Industry. © 2016 Society of Chemical Industry.
Kennedy, Jeffrey R.; Paretti, Nicholas V.
2014-01-01
Flooding in urban areas routinely causes severe damage to property and often results in loss of life. To investigate the effect of urbanization on the magnitude and frequency of flood peaks, a flood frequency analysis was carried out using data from urbanized streamgaging stations in Phoenix and Tucson, Arizona. Flood peaks at each station were predicted using the log-Pearson Type III distribution, fitted using the expected moments algorithm and the multiple Grubbs-Beck low outlier test. The station estimates were then compared to flood peaks estimated by rural-regression equations for Arizona, and to flood peaks adjusted for urbanization using a previously developed procedure for adjusting U.S. Geological Survey rural regression peak discharges in an urban setting. Only smaller, more common flood peaks at the 50-, 20-, 10-, and 4-percent annual exceedance probabilities (AEPs) demonstrate any increase in magnitude as a result of urbanization; the 1-, 0.5-, and 0.2-percent AEP flood estimates are predicted without bias by the rural-regression equations. Percent imperviousness was determined not to account for the difference in estimated flood peaks between stations, either when adjusting the rural-regression equations or when deriving urban-regression equations to predict flood peaks directly from basin characteristics. Comparison with urban adjustment equations indicates that flood peaks are systematically overestimated if the rural-regression-estimated flood peaks are adjusted upward to account for urbanization. At nearly every streamgaging station in the analysis, adjusted rural-regression estimates were greater than the estimates derived using station data. One likely reason for the lack of increase in flood peaks with urbanization is the presence of significant stormwater retention and detention structures within the watershed used in the study.
Wheat flour dough Alveograph characteristics predicted by Mixolab regression models.
Codină, Georgiana Gabriela; Mironeasa, Silvia; Mironeasa, Costel; Popa, Ciprian N; Tamba-Berehoiu, Radiana
2012-02-01
In Romania, the Alveograph is the most used device to evaluate the rheological properties of wheat flour dough, but lately the Mixolab device has begun to play an important role in the breadmaking industry. These two instruments are based on different principles but there are some correlations that can be found between the parameters determined by the Mixolab and the rheological properties of wheat dough measured with the Alveograph. Statistical analysis on 80 wheat flour samples using the backward stepwise multiple regression method showed that Mixolab values using the ‘Chopin S’ protocol (40 samples) and ‘Chopin + ’ protocol (40 samples) can be used to elaborate predictive models for estimating the value of the rheological properties of wheat dough: baking strength (W), dough tenacity (P) and extensibility (L). The correlation analysis confirmed significant findings (P < 0.05 and P < 0.01) between the parameters of wheat dough studied by the Mixolab and its rheological properties measured with the Alveograph. A number of six predictive linear equations were obtained. Linear regression models gave multiple regression coefficients with R²(adjusted) > 0.70 for P, R²(adjusted) > 0.70 for W and R²(adjusted) > 0.38 for L, at a 95% confidence interval. Copyright © 2011 Society of Chemical Industry.
Regression and multivariate models for predicting particulate matter concentration level.
Nazif, Amina; Mohammed, Nurul Izma; Malakahmad, Amirhossein; Abualqumboz, Motasem S
2018-01-01
The devastating health effects of particulate matter (PM 10 ) exposure by susceptible populace has made it necessary to evaluate PM 10 pollution. Meteorological parameters and seasonal variation increases PM 10 concentration levels, especially in areas that have multiple anthropogenic activities. Hence, stepwise regression (SR), multiple linear regression (MLR) and principal component regression (PCR) analyses were used to analyse daily average PM 10 concentration levels. The analyses were carried out using daily average PM 10 concentration, temperature, humidity, wind speed and wind direction data from 2006 to 2010. The data was from an industrial air quality monitoring station in Malaysia. The SR analysis established that meteorological parameters had less influence on PM 10 concentration levels having coefficient of determination (R 2 ) result from 23 to 29% based on seasoned and unseasoned analysis. While, the result of the prediction analysis showed that PCR models had a better R 2 result than MLR methods. The results for the analyses based on both seasoned and unseasoned data established that MLR models had R 2 result from 0.50 to 0.60. While, PCR models had R 2 result from 0.66 to 0.89. In addition, the validation analysis using 2016 data also recognised that the PCR model outperformed the MLR model, with the PCR model for the seasoned analysis having the best result. These analyses will aid in achieving sustainable air quality management strategies.
Lo, Benjamin W Y; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A H
2016-01-01
Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56-2.45, P < 0.01). A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.
A multilateral modelling of Youth Soccer Performance Index (YSPI)
NASA Astrophysics Data System (ADS)
Bisyri Husin Musawi Maliki, Ahmad; Razali Abdullah, Mohamad; Juahir, Hafizan; Abdullah, Farhana; Ain Shahirah Abdullah, Nurul; Muazu Musa, Rabiu; Musliha Mat-Rasid, Siti; Adnan, Aleesha; Azura Kosni, Norlaila; Muhamad, Wan Siti Amalina Wan; Afiqah Mohamad Nasir, Nur
2018-04-01
This study aims to identify the most dominant factors that influencing performance of soccer player and to predict group performance for soccer players. A total of 184 of youth soccer players from Malaysia sport school and six soccer academy encompasses as respondence of the study. Exploratory factor analysis (EFA) and Confirmatory factor analysis (CFA) were computed to identify the most dominant factors whereas reducing the initial 26 parameters with recommended >0.5 of factor loading. Meanwhile, prediction of the soccer performance was predicted by regression model. CFA revealed that sit and reach, vertical jump, VO2max, age, weight, height, sitting height, calf circumference (cc), medial upper arm circumference (muac), maturation, bicep, triceps, subscapular, suprailiac, 5M, 10M, and 20M speed were the most dominant factors. Further index analysis forming Youth Soccer Performance Index (YSPI) resulting by categorizing three groups namely, high, moderate, and low. The regression model for this study was significant set as p < 0.001 and R2 is 0.8222 which explained that the model contributed a total of 82% prediction ability to predict the whole set of the variables. The significant parameters in contributing prediction of YSPI are discussed. As a conclusion, the precision of the prediction models by integrating a multilateral factor reflecting for predicting potential soccer player and hopefully can create a competitive soccer games.
Technical note: A linear model for predicting δ13 Cprotein.
Pestle, William J; Hubbe, Mark; Smith, Erin K; Stevenson, Joseph M
2015-08-01
Development of a model for the prediction of δ(13) Cprotein from δ(13) Ccollagen and Δ(13) Cap-co . Model-generated values could, in turn, serve as "consumer" inputs for multisource mixture modeling of paleodiet. Linear regression analysis of previously published controlled diet data facilitated the development of a mathematical model for predicting δ(13) Cprotein (and an experimentally generated error term) from isotopic data routinely generated during the analysis of osseous remains (δ(13) Cco and Δ(13) Cap-co ). Regression analysis resulted in a two-term linear model (δ(13) Cprotein (%) = (0.78 × δ(13) Cco ) - (0.58× Δ(13) Cap-co ) - 4.7), possessing a high R-value of 0.93 (r(2) = 0.86, P < 0.01), and experimentally generated error terms of ±1.9% for any predicted individual value of δ(13) Cprotein . This model was tested using isotopic data from Formative Period individuals from northern Chile's Atacama Desert. The model presented here appears to hold significant potential for the prediction of the carbon isotope signature of dietary protein using only such data as is routinely generated in the course of stable isotope analysis of human osseous remains. These predicted values are ideal for use in multisource mixture modeling of dietary protein source contribution. © 2015 Wiley Periodicals, Inc.
Liu, Shujie; Kawamoto, Taisuke; Morita, Osamu; Yoshinari, Kouichi; Honda, Hiroshi
2017-03-01
Chemical exposure often results in liver hypertrophy in animal tests, characterized by increased liver weight, hepatocellular hypertrophy, and/or cell proliferation. While most of these changes are considered adaptive responses, there is concern that they may be associated with carcinogenesis. In this study, we have employed a toxicogenomic approach using a logistic ridge regression model to identify genes responsible for liver hypertrophy and hypertrophic hepatocarcinogenesis and to develop a predictive model for assessing hypertrophy-inducing compounds. Logistic regression models have previously been used in the quantification of epidemiological risk factors. DNA microarray data from the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System were used to identify hypertrophy-related genes that are expressed differently in hypertrophy induced by carcinogens and non-carcinogens. Data were collected for 134 chemicals (72 non-hypertrophy-inducing chemicals, 27 hypertrophy-inducing non-carcinogenic chemicals, and 15 hypertrophy-inducing carcinogenic compounds). After applying logistic ridge regression analysis, 35 genes for liver hypertrophy (e.g., Acot1 and Abcc3) and 13 genes for hypertrophic hepatocarcinogenesis (e.g., Asns and Gpx2) were selected. The predictive models built using these genes were 94.8% and 82.7% accurate, respectively. Pathway analysis of the genes indicates that, aside from a xenobiotic metabolism-related pathway as an adaptive response for liver hypertrophy, amino acid biosynthesis and oxidative responses appear to be involved in hypertrophic hepatocarcinogenesis. Early detection and toxicogenomic characterization of liver hypertrophy using our models may be useful for predicting carcinogenesis. In addition, the identified genes provide novel insight into discrimination between adverse hypertrophy associated with carcinogenesis and adaptive hypertrophy in risk assessment. Copyright © 2017 Elsevier Inc. All rights reserved.
Adachi, Daiki; Nishiguchi, Shu; Fukutani, Naoto; Hotta, Takayuki; Tashiro, Yuto; Morino, Saori; Shirooka, Hidehiko; Nozaki, Yuma; Hirata, Hinako; Yamaguchi, Moe; Yorozu, Ayanori; Takahashi, Masaki; Aoyama, Tomoki
2017-05-01
The purpose of this study was to investigate which spatial and temporal parameters of the Timed Up and Go (TUG) test are associated with motor function in elderly individuals. This study included 99 community-dwelling women aged 72.9 ± 6.3 years. Step length, step width, single support time, variability of the aforementioned parameters, gait velocity, cadence, reaction time from starting signal to first step, and minimum distance between the foot and a marker placed to 3 in front of the chair were measured using our analysis system. The 10-m walk test, five times sit-to-stand (FTSTS) test, and one-leg standing (OLS) test were used to assess motor function. Stepwise multivariate linear regression analysis was used to determine which TUG test parameters were associated with each motor function test. Finally, we calculated a predictive model for each motor function test using each regression coefficient. In stepwise linear regression analysis, step length and cadence were significantly associated with the 10-m walk test, FTSTS and OLS test. Reaction time was associated with the FTSTS test, and step width was associated with the OLS test. Each predictive model showed a strong correlation with the 10-m walk test and OLS test (P < 0.01), which was not significant higher correlation than TUG test time. We showed which TUG test parameters were associated with each motor function test. Moreover, the TUG test time regarded as the lower extremity function and mobility has strong predictive ability in each motor function test. Copyright © 2017 The Japanese Orthopaedic Association. Published by Elsevier B.V. All rights reserved.
Yamazaki, Takeshi; Takeda, Hisato; Hagiya, Koichi; Yamaguchi, Satoshi; Sasaki, Osamu
2018-03-13
Because lactation periods in dairy cows lengthen with increasing total milk production, it is important to predict individual productivities after 305 days in milk (DIM) to determine the optimal lactation period. We therefore examined whether the random regression (RR) coefficient from 306 to 450 DIM (M2) can be predicted from those during the first 305 DIM (M1) by using a random regression model. We analyzed test-day milk records from 85690 Holstein cows in their first lactations and 131727 cows in their later (second to fifth) lactations. Data in M1 and M2 were analyzed separately by using different single-trait RR animal models. We then performed a multiple regression analysis of the RR coefficients of M2 on those of M1 during the first and later lactations. The first-order Legendre polynomials were practical covariates of random regression for the milk yields of M2. All RR coefficients for the additive genetic (AG) effect and the intercept for the permanent environmental (PE) effect of M2 had moderate to strong correlations with the intercept for the AG effect of M1. The coefficients of determination for multiple regression of the combined intercepts for the AG and PE effects of M2 on the coefficients for the AG effect of M1 were moderate to high. The daily milk yields of M2 predicted by using the RR coefficients for the AG effect of M1 were highly correlated with those obtained by using the coefficients of M2. Milk production after 305 DIM can be predicted by using the RR coefficient estimates of the AG effect during the first 305 DIM.
Estelles-Lopez, Lucia; Ropodi, Athina; Pavlidis, Dimitris; Fotopoulou, Jenny; Gkousari, Christina; Peyrodie, Audrey; Panagou, Efstathios; Nychas, George-John; Mohareb, Fady
2017-09-01
Over the past decade, analytical approaches based on vibrational spectroscopy, hyperspectral/multispectral imagining and biomimetic sensors started gaining popularity as rapid and efficient methods for assessing food quality, safety and authentication; as a sensible alternative to the expensive and time-consuming conventional microbiological techniques. Due to the multi-dimensional nature of the data generated from such analyses, the output needs to be coupled with a suitable statistical approach or machine-learning algorithms before the results can be interpreted. Choosing the optimum pattern recognition or machine learning approach for a given analytical platform is often challenging and involves a comparative analysis between various algorithms in order to achieve the best possible prediction accuracy. In this work, "MeatReg", a web-based application is presented, able to automate the procedure of identifying the best machine learning method for comparing data from several analytical techniques, to predict the counts of microorganisms responsible of meat spoilage regardless of the packaging system applied. In particularly up to 7 regression methods were applied and these are ordinary least squares regression, stepwise linear regression, partial least square regression, principal component regression, support vector regression, random forest and k-nearest neighbours. MeatReg" was tested with minced beef samples stored under aerobic and modified atmosphere packaging and analysed with electronic nose, HPLC, FT-IR, GC-MS and Multispectral imaging instrument. Population of total viable count, lactic acid bacteria, pseudomonads, Enterobacteriaceae and B. thermosphacta, were predicted. As a result, recommendations of which analytical platforms are suitable to predict each type of bacteria and which machine learning methods to use in each case were obtained. The developed system is accessible via the link: www.sorfml.com. Copyright © 2017 Elsevier Ltd. All rights reserved.
Validation of Metrics as Error Predictors
NASA Astrophysics Data System (ADS)
Mendling, Jan
In this chapter, we test the validity of metrics that were defined in the previous chapter for predicting errors in EPC business process models. In Section 5.1, we provide an overview of how the analysis data is generated. Section 5.2 describes the sample of EPCs from practice that we use for the analysis. Here we discuss a disaggregation by the EPC model group and by error as well as a correlation analysis between metrics and error. Based on this sample, we calculate a logistic regression model for predicting error probability with the metrics as input variables in Section 5.3. In Section 5.4, we then test the regression function for an independent sample of EPC models from textbooks as a cross-validation. Section 5.5 summarizes the findings.
Kempe, P T; van Oppen, P; de Haan, E; Twisk, J W R; Sluis, A; Smit, J H; van Dyck, R; van Balkom, A J L M
2007-09-01
Two methods for predicting remissions in obsessive-compulsive disorder (OCD) treatment are evaluated. Y-BOCS measurements of 88 patients with a primary OCD (DSM-III-R) diagnosis were performed over a 16-week treatment period, and during three follow-ups. Remission at any measurement was defined as a Y-BOCS score lower than thirteen combined with a reduction of seven points when compared with baseline. Logistic regression models were compared with a Cox regression for recurrent events model. Logistic regression yielded different models at different evaluation times. The recurrent events model remained stable when fewer measurements were used. Higher baseline levels of neuroticism and more severe OCD symptoms were associated with a lower chance of remission, early age of onset and more depressive symptoms with a higher chance. Choice of outcome time affects logistic regression prediction models. Recurrent events analysis uses all information on remissions and relapses. Short- and long-term predictors for OCD remission show overlap.
Liu, Fei; Ye, Lanhan; Peng, Jiyu; Song, Kunlin; Shen, Tingting; Zhang, Chu; He, Yong
2018-02-27
Fast detection of heavy metals is very important for ensuring the quality and safety of crops. Laser-induced breakdown spectroscopy (LIBS), coupled with uni- and multivariate analysis, was applied for quantitative analysis of copper in three kinds of rice (Jiangsu rice, regular rice, and Simiao rice). For univariate analysis, three pre-processing methods were applied to reduce fluctuations, including background normalization, the internal standard method, and the standard normal variate (SNV). Linear regression models showed a strong correlation between spectral intensity and Cu content, with an R 2 more than 0.97. The limit of detection (LOD) was around 5 ppm, lower than the tolerance limit of copper in foods. For multivariate analysis, partial least squares regression (PLSR) showed its advantage in extracting effective information for prediction, and its sensitivity reached 1.95 ppm, while support vector machine regression (SVMR) performed better in both calibration and prediction sets, where R c 2 and R p 2 reached 0.9979 and 0.9879, respectively. This study showed that LIBS could be considered as a constructive tool for the quantification of copper contamination in rice.
Ye, Lanhan; Song, Kunlin; Shen, Tingting
2018-01-01
Fast detection of heavy metals is very important for ensuring the quality and safety of crops. Laser-induced breakdown spectroscopy (LIBS), coupled with uni- and multivariate analysis, was applied for quantitative analysis of copper in three kinds of rice (Jiangsu rice, regular rice, and Simiao rice). For univariate analysis, three pre-processing methods were applied to reduce fluctuations, including background normalization, the internal standard method, and the standard normal variate (SNV). Linear regression models showed a strong correlation between spectral intensity and Cu content, with an R2 more than 0.97. The limit of detection (LOD) was around 5 ppm, lower than the tolerance limit of copper in foods. For multivariate analysis, partial least squares regression (PLSR) showed its advantage in extracting effective information for prediction, and its sensitivity reached 1.95 ppm, while support vector machine regression (SVMR) performed better in both calibration and prediction sets, where Rc2 and Rp2 reached 0.9979 and 0.9879, respectively. This study showed that LIBS could be considered as a constructive tool for the quantification of copper contamination in rice. PMID:29495445
Park, Ji Hyun; Kim, Hyeon-Young; Lee, Hanna; Yun, Eun Kyoung
2015-12-01
This study compares the performance of the logistic regression and decision tree analysis methods for assessing the risk factors for infection in cancer patients undergoing chemotherapy. The subjects were 732 cancer patients who were receiving chemotherapy at K university hospital in Seoul, Korea. The data were collected between March 2011 and February 2013 and were processed for descriptive analysis, logistic regression and decision tree analysis using the IBM SPSS Statistics 19 and Modeler 15.1 programs. The most common risk factors for infection in cancer patients receiving chemotherapy were identified as alkylating agents, vinca alkaloid and underlying diabetes mellitus. The logistic regression explained 66.7% of the variation in the data in terms of sensitivity and 88.9% in terms of specificity. The decision tree analysis accounted for 55.0% of the variation in the data in terms of sensitivity and 89.0% in terms of specificity. As for the overall classification accuracy, the logistic regression explained 88.0% and the decision tree analysis explained 87.2%. The logistic regression analysis showed a higher degree of sensitivity and classification accuracy. Therefore, logistic regression analysis is concluded to be the more effective and useful method for establishing an infection prediction model for patients undergoing chemotherapy. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Rajab, Jasim M.; MatJafri, M. Z.; Lim, H. S.
2013-06-01
This study encompasses columnar ozone modelling in the peninsular Malaysia. Data of eight atmospheric parameters [air surface temperature (AST), carbon monoxide (CO), methane (CH4), water vapour (H2Ovapour), skin surface temperature (SSKT), atmosphere temperature (AT), relative humidity (RH), and mean surface pressure (MSP)] data set, retrieved from NASA's Atmospheric Infrared Sounder (AIRS), for the entire period (2003-2008) was employed to develop models to predict the value of columnar ozone (O3) in study area. The combined method, which is based on using both multiple regressions combined with principal component analysis (PCA) modelling, was used to predict columnar ozone. This combined approach was utilized to improve the prediction accuracy of columnar ozone. Separate analysis was carried out for north east monsoon (NEM) and south west monsoon (SWM) seasons. The O3 was negatively correlated with CH4, H2Ovapour, RH, and MSP, whereas it was positively correlated with CO, AST, SSKT, and AT during both the NEM and SWM season periods. Multiple regression analysis was used to fit the columnar ozone data using the atmospheric parameter's variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to acquire subsets of the predictor variables to be comprised in the linear regression model of the atmospheric parameter's variables. It was found that the increase in columnar O3 value is associated with an increase in the values of AST, SSKT, AT, and CO and with a drop in the levels of CH4, H2Ovapour, RH, and MSP. The result of fitting the best models for the columnar O3 value using eight of the independent variables gave about the same values of the R (≈0.93) and R2 (≈0.86) for both the NEM and SWM seasons. The common variables that appeared in both regression equations were SSKT, CH4 and RH, and the principal precursor of the columnar O3 value in both the NEM and SWM seasons was SSKT.
Ho, Sean Wei Loong; Tan, Teong Jin Lester; Lee, Keng Thiam
2016-03-01
To evaluate whether pre-operative anthropometric data can predict the optimal diameter and length of hamstring tendon autograft for anterior cruciate ligament (ACL) reconstruction. This was a cohort study that involved 169 patients who underwent single-bundle ACL reconstruction (single surgeon) with 4-stranded MM Gracilis and MM Semi-Tendinosus autografts. Height, weight, body mass index (BMI), gender, race, age and -smoking status were recorded pre-operatively. Intra-operatively, the diameter and functional length of the 4-stranded autograft was recorded. Multiple regression analysis was used to determine the relationship between the anthropometric measurements and the length and diameter of the implanted autografts. The strongest correlation between 4-stranded hamstring autograft diameter was height and weight. This correlation was stronger in females than males. BMI had a moderate correlation with the diameter of the graft in females. Females had a significantly smaller graft both in diameter and length when compared with males. Linear regression models did not show any significant correlation between hamstring autograft length with height and weight (p>0.05). Simple regression analysis demonstrated that height and weight can be used to predict hamstring graft diameter. The following regression equation was obtained for females: Graft diameter=0.012+0.034*Height+0.026*Weight (R2=0.358, p=0.004) The following regression equation was obtained for males: Graft diameter=5.130+0.012*Height+0.007*Weight (R2=0.086, p=0.002). Pre-operative anthropometric data has a positive correlation with the diameter of 4 stranded hamstring autografts but no significant correlation with the length. This data can be utilised to predict the autograft diameter and may be useful for pre-operative planning and patient counseling for graft selection.
Liu, Xiu-ying; Wang, Li; Chang, Qing-rui; Wang, Xiao-xing; Shang, Yan
2015-07-01
Wuqi County of Shaanxi Province, where the vegetation recovering measures have been carried out for years, was taken as the study area. A total of 100 loess samples from 24 different profiles were collected. Total nitrogen (TN) and alkali hydrolysable nitrogen (AHN) contents of the soil samples were analyzed, and the soil samples were scanned in the visible/near-infrared (VNIR) region of 350-2500 nm in the laboratory. The calibration models were developed between TN and AHN contents and VNIR values based on correlation analysis (CA) and partial least squares regression (PLS). Independent samples validated the calibration models. The results indicated that the optimum model for predicting TN of loess was established by using first derivative of reflectance. The best model for predicting AHN of loess was established by using normal derivative spectra. The optimum TN model could effectively predict TN in loess from 0 to 40 cm, but the optimum AHN model could only roughly predict AHN at the same depth. This study provided a good method for rapidly predicting TN of loess where vegetation recovering measures have been adopted, but prediction of AHN needs to be further studied.
Fu, Xia; Liang, Xinling; Song, Li; Huang, Huigen; Wang, Jing; Chen, Yuanhan; Zhang, Li; Quan, Zilin; Shi, Wei
2014-04-01
To develop a predictive model for circuit clotting in patients with continuous renal replacement therapy (CRRT). A total of 425 cases were selected. 302 cases were used to develop a predictive model of extracorporeal circuit life span during CRRT without citrate anticoagulation in 24 h, and 123 cases were used to validate the model. The prediction formula was developed using multivariate Cox proportional-hazards regression analysis, from which a risk score was assigned. The mean survival time of the circuit was 15.0 ± 1.3 h, and the rate of circuit clotting was 66.6 % during 24 h of CRRT. Five significant variables were assigned a predicting score according to the regression coefficient: insufficient blood flow, no anticoagulation, hematocrit ≥0.37, lactic acid of arterial blood gas analysis ≤3 mmol/L and APTT < 44.2 s. The Hosmer-Lemeshow test showed no significant difference between the predicted and actual circuit clotting (R (2) = 0.232; P = 0.301). A risk score that includes the five above-mentioned variables can be used to predict the likelihood of extracorporeal circuit clotting in patients undergoing CRRT.
Deckel, A W; Hesselbrock, V; Bauer, L
1995-04-01
This experiment examined the relationship between anterior brain functioning and alcohol-related expectancies. Ninety-one young men at risk for developing alcoholism were assessed on the Alcohol Expectancy Questionnaire (AEQ) and administered neuropsychological and EEG tests. Three of the scales on the AEQ, including the "Enhanced Sexual Functioning" scale, the "Increased Social Assertiveness" scale, and items from the "Global/Positive Change scale," were used, because each of these scales has been found to discriminate alcohol-based expectancies adequately by at least two separate sets of investigators. Regression analysis found that anterior neuropsychological tests (including the Wisconsin Card Sorting test, the Porteus Maze test, the Controlled Oral Word Fluency test, and the Luria-Nebraska motor functioning tests) were predictive of the AEQ scale scores on regression analysis. One of the AEQ scales, "Enhanced Sexual Functioning," was also predicted by WAIS-R-Verbal scales, whereas the "Global/Positive" AEQ scale was predicted by the WAIS-R Performance scales. Regression analysis using EEG power as predictors found that left versus right hemisphere "difference" scores obtained from frontal EEG leads were predictive of the three AEQ scales. Conversely, parietal EEG power did not significantly predict any of the expectancy scales. It is concluded that anterior brain any of the expectancy scales. It is concluded that anterior brain functioning is associated with alcohol-related expectancies. These findings suggest that alcohol-related expectancy may be, in part, biologically determined by frontal/prefrontal systems, and that dysfunctioning in these systems may serve as a risk factor for the development of alcohol-related behaviors.
Automating annotation of information-giving for analysis of clinical conversation.
Mayfield, Elijah; Laws, M Barton; Wilson, Ira B; Penstein Rosé, Carolyn
2014-02-01
Coding of clinical communication for fine-grained features such as speech acts has produced a substantial literature. However, annotation by humans is laborious and expensive, limiting application of these methods. We aimed to show that through machine learning, computers could code certain categories of speech acts with sufficient reliability to make useful distinctions among clinical encounters. The data were transcripts of 415 routine outpatient visits of HIV patients which had previously been coded for speech acts using the Generalized Medical Interaction Analysis System (GMIAS); 50 had also been coded for larger scale features using the Comprehensive Analysis of the Structure of Encounters System (CASES). We aggregated selected speech acts into information-giving and requesting, then trained the machine to automatically annotate using logistic regression classification. We evaluated reliability by per-speech act accuracy. We used multiple regression to predict patient reports of communication quality from post-visit surveys using the patient and provider information-giving to information-requesting ratio (briefly, information-giving ratio) and patient gender. Automated coding produces moderate reliability with human coding (accuracy 71.2%, κ=0.57), with high correlation between machine and human prediction of the information-giving ratio (r=0.96). The regression significantly predicted four of five patient-reported measures of communication quality (r=0.263-0.344). The information-giving ratio is a useful and intuitive measure for predicting patient perception of provider-patient communication quality. These predictions can be made with automated annotation, which is a practical option for studying large collections of clinical encounters with objectivity, consistency, and low cost, providing greater opportunity for training and reflection for care providers.
A Prediction Model for Community Colleges Using Graduation Rate as the Performance Indicator
ERIC Educational Resources Information Center
Moosai, Susan
2010-01-01
In this thesis a prediction model using graduation rate as the performance indicator is obtained for community colleges for three cohort years, 2003, 2004, and 2005 in the states of California, Florida, and Michigan. Multiple Regression analysis, using an aggregate of seven predictor variables, was employed in determining this prediction model.…
ERIC Educational Resources Information Center
Owen, Steven V.; Feldhusen, John F.
This study compares the effectiveness of three models of multivariate prediction for academic success in identifying the criterion variance of achievement in nursing education. The first model involves the use of an optimum set of predictors and one equation derived from a regression analysis on first semester grade average in predicting the…
Aghamolaei, Teamur; Sadat Tavafian, Sedigheh; Madani, Abdoulhossain
2012-09-01
This study aimed to apply the conceptual framework of the theory of planned behavior (TPB) to explain fish consumption in a sample of people who lived in Bandar Abbass, Iran. We investigated the role of three traditional constructs of TPB that included attitude, social norms, and perceived behavioral control in an effort to characterize the intention to consume fish as well as the behavioral trends that characterize fish consumption. Data were derived from a cross-sectional sample of 321 subjects. Alpha coefficient correlation and linear regression analysis were applied to test the relationships between constructs. The predictors of fish consumption frequency were also evaluated. Multiple regression analysis revealed that attitude, subjective norms, and perceived behavioral control significantly predicted intention to eat fish (R2 = 0.54, F = 128.4, P < 0.001). Multiple regression analysis for the intention to eat fish and perceived behavioral control revealed that both factors significantly predicted fish consumption frequency (R2 = 0.58, F = 223.1, P < 0.001). The results indicated that the models fit well with the data. Attitude, subjective norms, and perceived behavioral control all had significant positive impacts on behavioral intention. Moreover, both intention and perceived behavioral control could be used to predict the frequency of fish consumption.
Moro, Marilyn; Goparaju, Balaji; Castillo, Jelina; Alameddine, Yvonne; Bianchi, Matt T
2016-01-01
Introduction Periodic limb movements of sleep (PLMS) may increase cardiovascular and cerebrovascular morbidity. However, most people with PLMS are either asymptomatic or have nonspecific symptoms. Therefore, predicting elevated PLMS in the absence of restless legs syndrome remains an important clinical challenge. Methods We undertook a retrospective analysis of demographic data, subjective symptoms, and objective polysomnography (PSG) findings in a clinical cohort with or without obstructive sleep apnea (OSA) from our laboratory (n=443 with OSA, n=209 without OSA). Correlation analysis and regression modeling were performed to determine predictors of periodic limb movement index (PLMI). Markov decision analysis with TreeAge software compared strategies to detect PLMS: in-laboratory PSG, at-home testing, and a clinical prediction tool based on the regression analysis. Results Elevated PLMI values (>15 per hour) were observed in >25% of patients. PLMI values in No-OSA patients correlated with age, sex, self-reported nocturnal leg jerks, restless legs syndrome symptoms, and hypertension. In OSA patients, PLMI correlated only with age and self-reported psychiatric medications. Regression models indicated only a modest predictive value of demographics, symptoms, and clinical history. Decision modeling suggests that at-home testing is favored as the pretest probability of PLMS increases, given plausible assumptions regarding PLMS morbidity, costs, and assumed benefits of pharmacological therapy. Conclusion Although elevated PLMI values were commonly observed, routinely acquired clinical information had only weak predictive utility. As the clinical importance of elevated PLMI continues to evolve, it is likely that objective measures such as PSG or at-home PLMS monitors will prove increasingly important for clinical and research endeavors. PMID:27540316
Double Cross-Validation in Multiple Regression: A Method of Estimating the Stability of Results.
ERIC Educational Resources Information Center
Rowell, R. Kevin
In multiple regression analysis, where resulting predictive equation effectiveness is subject to shrinkage, it is especially important to evaluate result replicability. Double cross-validation is an empirical method by which an estimate of invariance or stability can be obtained from research data. A procedure for double cross-validation is…
Early Home Activities and Oral Language Skills in Middle Childhood: A Quantile Analysis
ERIC Educational Resources Information Center
Law, James; Rush, Robert; King, Tom; Westrupp, Elizabeth; Reilly, Sheena
2018-01-01
Oral language development is a key outcome of elementary school, and it is important to identify factors that predict it most effectively. Commonly researchers use ordinary least squares regression with conclusions restricted to average performance conditional on relevant covariates. Quantile regression offers a more sophisticated alternative.…
Predictors of High Profit and High Deficit Outliers under SwissDRG of a Tertiary Care Center
Mehra, Tarun; Müller, Christian Thomas Benedikt; Volbracht, Jörk; Seifert, Burkhardt; Moos, Rudolf
2015-01-01
Principles Case weights of Diagnosis Related Groups (DRGs) are determined by the average cost of cases from a previous billing period. However, a significant amount of cases are largely over- or underfunded. We therefore decided to analyze earning outliers of our hospital as to search for predictors enabling a better grouping under SwissDRG. Methods 28,893 inpatient cases without additional private insurance discharged from our hospital in 2012 were included in our analysis. Outliers were defined by the interquartile range method. Predictors for deficit and profit outliers were determined with logistic regressions. Predictors were shortlisted with the LASSO regularized logistic regression method and compared to results of Random forest analysis. 10 of these parameters were selected for quantile regression analysis as to quantify their impact on earnings. Results Psychiatric diagnosis and admission as an emergency case were significant predictors for higher deficit with negative regression coefficients for all analyzed quantiles (p<0.001). Admission from an external health care provider was a significant predictor for a higher deficit in all but the 90% quantile (p<0.001 for Q10, Q20, Q50, Q80 and p = 0.0017 for Q90). Burns predicted higher earnings for cases which were favorably remunerated (p<0.001 for the 90% quantile). Osteoporosis predicted a higher deficit in the most underfunded cases, but did not predict differences in earnings for balanced or profitable cases (Q10 and Q20: p<0.00, Q50: p = 0.10, Q80: p = 0.88 and Q90: p = 0.52). ICU stay, mechanical and patient clinical complexity level score (PCCL) predicted higher losses at the 10% quantile but also higher profits at the 90% quantile (p<0.001). Conclusion We suggest considering psychiatric diagnosis, admission as an emergencay case and admission from an external health care provider as DRG split criteria as they predict large, consistent and significant losses. PMID:26517545
Predictors of High Profit and High Deficit Outliers under SwissDRG of a Tertiary Care Center.
Mehra, Tarun; Müller, Christian Thomas Benedikt; Volbracht, Jörk; Seifert, Burkhardt; Moos, Rudolf
2015-01-01
Case weights of Diagnosis Related Groups (DRGs) are determined by the average cost of cases from a previous billing period. However, a significant amount of cases are largely over- or underfunded. We therefore decided to analyze earning outliers of our hospital as to search for predictors enabling a better grouping under SwissDRG. 28,893 inpatient cases without additional private insurance discharged from our hospital in 2012 were included in our analysis. Outliers were defined by the interquartile range method. Predictors for deficit and profit outliers were determined with logistic regressions. Predictors were shortlisted with the LASSO regularized logistic regression method and compared to results of Random forest analysis. 10 of these parameters were selected for quantile regression analysis as to quantify their impact on earnings. Psychiatric diagnosis and admission as an emergency case were significant predictors for higher deficit with negative regression coefficients for all analyzed quantiles (p<0.001). Admission from an external health care provider was a significant predictor for a higher deficit in all but the 90% quantile (p<0.001 for Q10, Q20, Q50, Q80 and p = 0.0017 for Q90). Burns predicted higher earnings for cases which were favorably remunerated (p<0.001 for the 90% quantile). Osteoporosis predicted a higher deficit in the most underfunded cases, but did not predict differences in earnings for balanced or profitable cases (Q10 and Q20: p<0.00, Q50: p = 0.10, Q80: p = 0.88 and Q90: p = 0.52). ICU stay, mechanical and patient clinical complexity level score (PCCL) predicted higher losses at the 10% quantile but also higher profits at the 90% quantile (p<0.001). We suggest considering psychiatric diagnosis, admission as an emergency case and admission from an external health care provider as DRG split criteria as they predict large, consistent and significant losses.
Classifying machinery condition using oil samples and binary logistic regression
NASA Astrophysics Data System (ADS)
Phillips, J.; Cripps, E.; Lau, John W.; Hodkiewicz, M. R.
2015-08-01
The era of big data has resulted in an explosion of condition monitoring information. The result is an increasing motivation to automate the costly and time consuming human elements involved in the classification of machine health. When working with industry it is important to build an understanding and hence some trust in the classification scheme for those who use the analysis to initiate maintenance tasks. Typically "black box" approaches such as artificial neural networks (ANN) and support vector machines (SVM) can be difficult to provide ease of interpretability. In contrast, this paper argues that logistic regression offers easy interpretability to industry experts, providing insight to the drivers of the human classification process and to the ramifications of potential misclassification. Of course, accuracy is of foremost importance in any automated classification scheme, so we also provide a comparative study based on predictive performance of logistic regression, ANN and SVM. A real world oil analysis data set from engines on mining trucks is presented and using cross-validation we demonstrate that logistic regression out-performs the ANN and SVM approaches in terms of prediction for healthy/not healthy engines.
Jansson, Bruce S; Nyamathi, Adeline; Heidemann, Gretchen; Duan, Lei; Kaplan, Charles
2015-01-01
Although literature documents the need for hospital social workers, nurses, and medical residents to engage in patient advocacy, little information exists about what predicts the extent they do so. This study aims to identify predictors of health professionals' patient advocacy engagement with respect to a broad range of patients' problems. A cross-sectional research design was employed with a sample of 94 social workers, 97 nurses, and 104 medical residents recruited from eight hospitals in Los Angeles. Bivariate correlations explored whether seven scales (Patient Advocacy Eagerness, Ethical Commitment, Skills, Tangible Support, Organizational Receptivity, Belief Other Professionals Engage, and Belief the Hospital Empowers Patients) were associated with patient advocacy engagement, measured by the validated Patient Advocacy Engagement Scale. Regression analysis examined whether these scales, when controlling for sociodemographic and setting variables, predicted patient advocacy engagement. While all seven predictor scales were significantly associated with patient advocacy engagement in correlational analyses, only Eagerness, Skills, and Belief the Hospital Empowers Patients predicted patient advocacy engagement in regression analyses. Additionally, younger professionals engaged in higher levels of patient advocacy than older professionals, and social workers engaged in greater patient advocacy than nurses. Limitations and the utility of these findings for acute-care hospitals are discussed.
Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann
2003-01-01
Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.
Fowler, Stephanie M; Ponnampalam, Eric N; Schmidt, Heinar; Wynn, Peter; Hopkins, David L
2015-12-01
A hand held Raman spectroscopic device was used to predict intramuscular fat (IMF) levels and the major fatty acid (FA) groups of fresh intact ovine M. longissimus lumborum (LL). IMF levels were determined using the Soxhlet method, while FA analysis was conducted using a rapid (KOH in water, methanol and sulphuric acid in water) extraction procedure. IMF levels and FA values were regressed against Raman spectra using partial least squares regression and against each other using linear regression. The results indicate that there is potential to predict PUFA (R(2)=0.93) and MUFA (R(2)=0.54) as well as SFA values that had been adjusted for IMF content (R(2)=0.54). However, this potential was significantly reduced when correlations between predicted and observed values were determined by cross validation (R(2)cv=0.21-0.00). Overall, the prediction of major FA groups using Raman spectra was more precise (relative reductions in error of 0.3-40.8%) compared to the null models. Copyright © 2015 Elsevier Ltd. All rights reserved.
Kondo, M; Nagao, Y; Mahbub, M H; Tanabe, T; Tanizawa, Y
2018-04-29
To identify factors predicting early postpartum glucose intolerance in Japanese women with gestational diabetes mellitus, using decision-curve analysis. A retrospective cohort study was performed. The participants were 123 Japanese women with gestational diabetes who underwent 75-g oral glucose tolerance tests at 8-12 weeks after delivery. They were divided into a glucose intolerance and a normal glucose tolerance group based on postpartum oral glucose tolerance test results. Analysis of the pregnancy oral glucose tolerance test results showed predictive factors for postpartum glucose intolerance. We also evaluated the clinical usefulness of the prediction model based on decision-curve analysis. Of 123 women, 78 (63.4%) had normoglycaemia and 45 (36.6%) had glucose intolerance. Multivariable logistic regression analysis showed insulinogenic index/fasting immunoreactive insulin and summation of glucose levels, assessed during pregnancy oral glucose tolerance tests (total glucose), to be independent risk factors for postpartum glucose intolerance. Evaluating the regression models, the best discrimination (area under the curve 0.725) was obtained using the basic model (i.e. age, family history of diabetes, BMI ≥25 kg/m 2 and use of insulin during pregnancy) plus insulinogenic index/fasting immunoreactive insulin <1.1. Decision-curve analysis showed that combining insulinogenic index/fasting immunoreactive insulin <1.1 with basic clinical information resulted in superior net benefits for prediction of postpartum glucose intolerance. Insulinogenic index/fasting immunoreactive insulin calculated using oral glucose tolerance test results during pregnancy is potentially useful for predicting early postpartum glucose intolerance in Japanese women with gestational diabetes. © 2018 Diabetes UK.
Thakar, Sumit; Sivaraju, Laxminadh; Jacob, Kuruthukulangara S; Arun, Aditya Atal; Aryan, Saritha; Mohan, Dilip; Sai Kiran, Narayanam Anantha; Hegde, Alangar S
2018-01-01
OBJECTIVE Although various predictors of postoperative outcome have been previously identified in patients with Chiari malformation Type I (CMI) with syringomyelia, there is no known algorithm for predicting a multifactorial outcome measure in this widely studied disorder. Using one of the largest preoperative variable arrays used so far in CMI research, the authors attempted to generate a formula for predicting postoperative outcome. METHODS Data from the clinical records of 82 symptomatic adult patients with CMI and altered hindbrain CSF flow who were managed with foramen magnum decompression, C-1 laminectomy, and duraplasty over an 8-year period were collected and analyzed. Various preoperative clinical and radiological variables in the 57 patients who formed the study cohort were assessed in a bivariate analysis to determine their ability to predict clinical outcome (as measured on the Chicago Chiari Outcome Scale [CCOS]) and the resolution of syrinx at the last follow-up. The variables that were significant in the bivariate analysis were further analyzed in a multiple linear regression analysis. Different regression models were tested, and the model with the best prediction of CCOS was identified and internally validated in a subcohort of 25 patients. RESULTS There was no correlation between CCOS score and syrinx resolution (p = 0.24) at a mean ± SD follow-up of 40.29 ± 10.36 months. Multiple linear regression analysis revealed that the presence of gait instability, obex position, and the M-line-fourth ventricle vertex (FVV) distance correlated with CCOS score, while the presence of motor deficits was associated with poor syrinx resolution (p ≤ 0.05). The algorithm generated from the regression model demonstrated good diagnostic accuracy (area under curve 0.81), with a score of more than 128 points demonstrating 100% specificity for clinical improvement (CCOS score of 11 or greater). The model had excellent reliability (κ = 0.85) and was validated with fair accuracy in the validation cohort (area under the curve 0.75). CONCLUSIONS The presence of gait imbalance and motor deficits independently predict worse clinical and radiological outcomes, respectively, after decompressive surgery for CMI with altered hindbrain CSF flow. Caudal displacement of the obex and a shorter M-line-FVV distance correlated with good CCOS scores, indicating that patients with a greater degree of hindbrain pathology respond better to surgery. The proposed points-based algorithm has good predictive value for postoperative multifactorial outcome in these patients.
Ramasubramanian, Viswanathan; Glasser, Adrian
2015-01-01
PURPOSE To determine whether relatively low-resolution ultrasound biomicroscopy (UBM) can predict the accommodative optical response in prepresbyopic eyes as well as in a previous study of young phakic subjects, despite lower accommodative amplitudes. SETTING College of Optometry, University of Houston, Houston, USA. DESIGN Observational cross-sectional study. METHODS Static accommodative optical response was measured with infrared photorefraction and an autorefractor (WR-5100K) in subjects aged 36 to 46 years. A 35 MHz UBM device (Vumax, Sonomed Escalon) was used to image the left eye, while the right eye viewed accommodative stimuli. Custom-developed Matlab image-analysis software was used to perform automated analysis of UBM images to measure the ocular biometry parameters. The accommodative optical response was predicted from biometry parameters using linear regression, 95% confidence intervals (CIs), and 95% prediction intervals. RESULTS The study evaluated 25 subjects. Per-diopter (D) accommodative changes in anterior chamber depth (ACD), lens thickness, anterior and posterior lens radii of curvature, and anterior segment length were similar to previous values from young subjects. The standard deviations (SDs) of accommodative optical response predicted from linear regressions for UBM-measured biometry parameters were ACD, 0.15 D; lens thickness, 0.25 D; anterior lens radii of curvature, 0.09 D; posterior lens radii of curvature, 0.37 D; and anterior segment length, 0.42 D. CONCLUSIONS Ultrasound biomicroscopy parameters can, on average, predict accommodative optical response with SDs of less than 0.55 D using linear regressions and 95% CIs. Ultrasound biomicroscopy can be used to visualize and quantify accommodative biometric changes and predict accommodative optical response in prepresbyopic eyes. PMID:26049831
Vathsangam, Harshvardhan; Emken, Adar; Schroeder, E. Todd; Spruijt-Metz, Donna; Sukhatme, Gaurav S.
2011-01-01
This paper describes an experimental study in estimating energy expenditure from treadmill walking using a single hip-mounted triaxial inertial sensor comprised of a triaxial accelerometer and a triaxial gyroscope. Typical physical activity characterization using accelerometer generated counts suffers from two drawbacks - imprecison (due to proprietary counts) and incompleteness (due to incomplete movement description). We address these problems in the context of steady state walking by directly estimating energy expenditure with data from a hip-mounted inertial sensor. We represent the cyclic nature of walking with a Fourier transform of sensor streams and show how one can map this representation to energy expenditure (as measured by V O2 consumption, mL/min) using three regression techniques - Least Squares Regression (LSR), Bayesian Linear Regression (BLR) and Gaussian Process Regression (GPR). We perform a comparative analysis of the accuracy of sensor streams in predicting energy expenditure (measured by RMS prediction accuracy). Triaxial information is more accurate than uniaxial information. LSR based approaches are prone to outlier sensitivity and overfitting. Gyroscopic information showed equivalent if not better prediction accuracy as compared to accelerometers. Combining accelerometer and gyroscopic information provided better accuracy than using either sensor alone. We also analyze the best algorithmic approach among linear and nonlinear methods as measured by RMS prediction accuracy and run time. Nonlinear regression methods showed better prediction accuracy but required an order of magnitude of run time. This paper emphasizes the role of probabilistic techniques in conjunction with joint modeling of triaxial accelerations and rotational rates to improve energy expenditure prediction for steady-state treadmill walking. PMID:21690001
A 3-Year Study of Predictive Factors for Positive and Negative Appendicectomies.
Chang, Dwayne T S; Maluda, Melissa; Lee, Lisa; Premaratne, Chandrasiri; Khamhing, Srisongham
2018-03-06
Early and accurate identification or exclusion of acute appendicitis is the key to avoid the morbidity of delayed treatment for true appendicitis or unnecessary appendicectomy, respectively. We aim (i) to identify potential predictive factors for positive and negative appendicectomies; and (ii) to analyse the use of ultrasound scans (US) and computed tomography (CT) scans for acute appendicitis. All appendicectomies that took place at our hospital from the 1st of January 2013 to the 31st of December 2015 were retrospectively recorded. Test results of potential predictive factors of acute appendicitis were recorded. Statistical analysis was performed using Fisher exact test, logistic regression analysis, sensitivity, specificity, and positive and negative predictive values calculation. 208 patients were included in this study. 184 patients had histologically proven acute appendicitis. The other 24 patients had either nonappendicitis pathology or normal appendix. Logistic regression analysis showed statistically significant associations between appendicitis and white cell count, neutrophil count, C-reactive protein, and bilirubin. Neutrophil count was the test with the highest sensitivity and negative predictive values, whereas bilirubin was the test with the highest specificity and positive predictive values (PPV). US and CT scans had high sensitivity and PPV for diagnosing appendicitis. No single test was sufficient to diagnose or exclude acute appendicitis by itself. Combining tests with high sensitivity (abnormal neutrophil count, and US and CT scans) and high specificity (raised bilirubin) may predict acute appendicitis more accurately.
NASA Technical Reports Server (NTRS)
Roth, D. J.; Swickard, S. M.; Stang, D. B.; Deguire, M. R.
1991-01-01
A review and statistical analysis of the ultrasonic velocity method for estimating the porosity fraction in polycrystalline materials is presented. Initially, a semiempirical model is developed showing the origin of the linear relationship between ultrasonic velocity and porosity fraction. Then, from a compilation of data produced by many researchers, scatter plots of velocity versus percent porosity data are shown for Al2O3, MgO, porcelain-based ceramics, PZT, SiC, Si3N4, steel, tungsten, UO2,(U0.30Pu0.70)C, and YBa2Cu3O(7-x). Linear regression analysis produces predicted slope, intercept, correlation coefficient, level of significance, and confidence interval statistics for the data. Velocity values predicted from regression analysis of fully-dense materials are in good agreement with those calculated from elastic properties.
NASA Technical Reports Server (NTRS)
Roth, D. J.; Swickard, S. M.; Stang, D. B.; Deguire, M. R.
1990-01-01
A review and statistical analysis of the ultrasonic velocity method for estimating the porosity fraction in polycrystalline materials is presented. Initially, a semi-empirical model is developed showing the origin of the linear relationship between ultrasonic velocity and porosity fraction. Then, from a compilation of data produced by many researchers, scatter plots of velocity versus percent porosity data are shown for Al2O3, MgO, porcelain-based ceramics, PZT, SiC, Si3N4, steel, tungsten, UO2,(U0.30Pu0.70)C, and YBa2Cu3O(7-x). Linear regression analysis produced predicted slope, intercept, correlation coefficient, level of significance, and confidence interval statistics for the data. Velocity values predicted from regression analysis for fully-dense materials are in good agreement with those calculated from elastic properties.
Taxi-Out Time Prediction for Departures at Charlotte Airport Using Machine Learning Techniques
NASA Technical Reports Server (NTRS)
Lee, Hanbong; Malik, Waqar; Jung, Yoon C.
2016-01-01
Predicting the taxi-out times of departures accurately is important for improving airport efficiency and takeoff time predictability. In this paper, we attempt to apply machine learning techniques to actual traffic data at Charlotte Douglas International Airport for taxi-out time prediction. To find the key factors affecting aircraft taxi times, surface surveillance data is first analyzed. From this data analysis, several variables, including terminal concourse, spot, runway, departure fix and weight class, are selected for taxi time prediction. Then, various machine learning methods such as linear regression, support vector machines, k-nearest neighbors, random forest, and neural networks model are applied to actual flight data. Different traffic flow and weather conditions at Charlotte airport are also taken into account for more accurate prediction. The taxi-out time prediction results show that linear regression and random forest techniques can provide the most accurate prediction in terms of root-mean-square errors. We also discuss the operational complexity and uncertainties that make it difficult to predict the taxi times accurately.
ERIC Educational Resources Information Center
Miller, T. E.; Herreid, C. H.
2008-01-01
This article presents a project intended to produce a model for predicting the risk of attrition of individual students enrolled at the University of South Florida. The project is premised upon the principle that college student attrition is as highly individual and personal as any other aspect of the college-going experience. Students make…
Likhvantseva, V G; Sokolov, V A; Levanova, O N; Kovelenova, I V
2018-01-01
Prediction of the clinical course of primary open-angle glaucoma (POAG) is one of the main directions in solving the problem of vision loss prevention and stabilization of the pathological process. Simple statistical methods of correlation analysis show the extent of each risk factor's impact, but do not indicate the total impact of these factors in personalized combinations. The relationships between the risk factors is subject to correlation and regression analysis. The regression equation represents the dependence of the mathematical expectation of the resulting sign on the combination of factor signs. To develop a technique for predicting the probability of development and progression of primary open-angle glaucoma based on a personalized combination of risk factors by linear multivariate regression analysis. The study included 66 patients (23 female and 43 male; 132 eyes) with newly diagnosed primary open-angle glaucoma. The control group consisted of 14 patients (8 male and 6 female). Standard ophthalmic examination was supplemented with biochemical study of lacrimal fluid. Concentration of matrix metalloproteinase MMP-2 and MMP-9 in tear fluid in both eyes was determined using 'sandwich' enzyme-linked immunosorbent assay (ELISA) method. The study resulted in the development of regression equations and step-by-step multivariate logistic models that can help calculate the risk of development and progression of POAG. Those models are based on expert evaluation of clinical and instrumental indicators of hydrodynamic disturbances (coefficient of outflow ease - C, volume of intraocular fluid secretion - F, fluctuation of intraocular pressure), as well as personalized morphometric parameters of the retina (central retinal thickness in the macular area) and concentration of MMP-2 and MMP-9 in the tear film. The newly developed regression equations are highly informative and can be a reliable tool for studying of the influence vector and assessment of pathogenic potential of the independent risk factors in specific personalized combinations.
Prediction of anthropometric foot characteristics in children.
Morrison, Stewart C; Durward, Brian R; Watt, Gordon F; Donaldson, Malcolm D C
2009-01-01
The establishment of growth reference values is needed in pediatric practice where pathologic conditions can have a detrimental effect on the growth and development of the pediatric foot. This study aims to use multiple regression to evaluate the effects of multiple predictor variables (height, age, body mass, and gender) on anthropometric characteristics of the peripubescent foot. Two hundred children aged 9 to 12 years were recruited, and three anthropometric measurements of the pediatric foot were recorded (foot length, forefoot width, and navicular height). Multiple regression analysis was conducted, and coefficients for gender, height, and body mass all had significant relationships for the prediction of forefoot width and foot length (P < or = .05, r > or = 0.7). The coefficients for gender and body mass were not significant for the prediction of navicular height (P > or = .05), whereas height was (P < or = .05). Normative growth reference values and prognostic regression equations are presented for the peripubescent foot.
Clustering performance comparison using K-means and expectation maximization algorithms.
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-11-14
Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K -means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K -means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
Speech prosody impairment predicts cognitive decline in Parkinson's disease.
Rektorova, Irena; Mekyska, Jiri; Janousova, Eva; Kostalova, Milena; Eliasova, Ilona; Mrackova, Martina; Berankova, Dagmar; Necasova, Tereza; Smekal, Zdenek; Marecek, Radek
2016-08-01
Impairment of speech prosody is characteristic for Parkinson's disease (PD) and does not respond well to dopaminergic treatment. We assessed whether baseline acoustic parameters, alone or in combination with other predominantly non-dopaminergic symptoms may predict global cognitive decline as measured by the Addenbrooke's cognitive examination (ACE-R) and/or worsening of cognitive status as assessed by a detailed neuropsychological examination. Forty-four consecutive non-depressed PD patients underwent clinical and cognitive testing, and acoustic voice analysis at baseline and at the two-year follow-up. Influence of speech and other clinical parameters on worsening of the ACE-R and of the cognitive status was analyzed using linear and logistic regression. The cognitive status (classified as normal cognition, mild cognitive impairment and dementia) deteriorated in 25% of patients during the follow-up. The multivariate linear regression model consisted of the variation in range of the fundamental voice frequency (F0VR) and the REM Sleep Behavioral Disorder Screening Questionnaire (RBDSQ). These parameters explained 37.2% of the variability of the change in ACE-R. The most significant predictors in the univariate logistic regression were the speech index of rhythmicity (SPIR; p = 0.012), disease duration (p = 0.019), and the RBDSQ (p = 0.032). The multivariate regression analysis revealed that SPIR alone led to 73.2% accuracy in predicting a change in cognitive status. Combining SPIR with RBDSQ improved the prediction accuracy of SPIR alone by 7.3%. Impairment of speech prosody together with symptoms of RBD predicted rapid cognitive decline and worsening of PD cognitive status during a two-year period. Copyright © 2016 Elsevier Ltd. All rights reserved.
A Comparison between Multiple Regression Models and CUN-BAE Equation to Predict Body Fat in Adults
Fuster-Parra, Pilar; Bennasar-Veny, Miquel; Tauler, Pedro; Yañez, Aina; López-González, Angel A.; Aguiló, Antoni
2015-01-01
Background Because the accurate measure of body fat (BF) is difficult, several prediction equations have been proposed. The aim of this study was to compare different multiple regression models to predict BF, including the recently reported CUN-BAE equation. Methods Multi regression models using body mass index (BMI) and body adiposity index (BAI) as predictors of BF will be compared. These models will be also compared with the CUN-BAE equation. For all the analysis a sample including all the participants and another one including only the overweight and obese subjects will be considered. The BF reference measure was made using Bioelectrical Impedance Analysis. Results The simplest models including only BMI or BAI as independent variables showed that BAI is a better predictor of BF. However, adding the variable sex to both models made BMI a better predictor than the BAI. For both the whole group of participants and the group of overweight and obese participants, using simple models (BMI, age and sex as variables) allowed obtaining similar correlations with BF as when the more complex CUN-BAE was used (ρ = 0:87 vs. ρ = 0:86 for the whole sample and ρ = 0:88 vs. ρ = 0:89 for overweight and obese subjects, being the second value the one for CUN-BAE). Conclusions There are simpler models than CUN-BAE equation that fits BF as well as CUN-BAE does. Therefore, it could be considered that CUN-BAE overfits. Using a simple linear regression model, the BAI, as the only variable, predicts BF better than BMI. However, when the sex variable is introduced, BMI becomes the indicator of choice to predict BF. PMID:25821960
A comparison between multiple regression models and CUN-BAE equation to predict body fat in adults.
Fuster-Parra, Pilar; Bennasar-Veny, Miquel; Tauler, Pedro; Yañez, Aina; López-González, Angel A; Aguiló, Antoni
2015-01-01
Because the accurate measure of body fat (BF) is difficult, several prediction equations have been proposed. The aim of this study was to compare different multiple regression models to predict BF, including the recently reported CUN-BAE equation. Multi regression models using body mass index (BMI) and body adiposity index (BAI) as predictors of BF will be compared. These models will be also compared with the CUN-BAE equation. For all the analysis a sample including all the participants and another one including only the overweight and obese subjects will be considered. The BF reference measure was made using Bioelectrical Impedance Analysis. The simplest models including only BMI or BAI as independent variables showed that BAI is a better predictor of BF. However, adding the variable sex to both models made BMI a better predictor than the BAI. For both the whole group of participants and the group of overweight and obese participants, using simple models (BMI, age and sex as variables) allowed obtaining similar correlations with BF as when the more complex CUN-BAE was used (ρ = 0:87 vs. ρ = 0:86 for the whole sample and ρ = 0:88 vs. ρ = 0:89 for overweight and obese subjects, being the second value the one for CUN-BAE). There are simpler models than CUN-BAE equation that fits BF as well as CUN-BAE does. Therefore, it could be considered that CUN-BAE overfits. Using a simple linear regression model, the BAI, as the only variable, predicts BF better than BMI. However, when the sex variable is introduced, BMI becomes the indicator of choice to predict BF.
Cox regression analysis with missing covariates via nonparametric multiple imputation.
Hsu, Chiu-Hsieh; Yu, Mandi
2018-01-01
We consider the situation of estimating Cox regression in which some covariates are subject to missing, and there exists additional information (including observed event time, censoring indicator and fully observed covariates) which may be predictive of the missing covariates. We propose to use two working regression models: one for predicting the missing covariates and the other for predicting the missing probabilities. For each missing covariate observation, these two working models are used to define a nearest neighbor imputing set. This set is then used to non-parametrically impute covariate values for the missing observation. Upon the completion of imputation, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into estimation of Cox regression, and the predictive mean matching imputation (PMM) method. We show that all approaches can reduce bias due to non-ignorable missing mechanism. The proposed nonparametric imputation method is robust to mis-specification of either one of the two working models and robust to mis-specification of the link function of the two working models. In contrast, the PMM method is sensitive to misspecification of the covariates included in imputation. The AIPW method is sensitive to the selection probability. We apply the approaches to a breast cancer dataset from Surveillance, Epidemiology and End Results (SEER) Program.
Tighe, Elizabeth L; Schatschneider, Christopher
2016-07-01
The purpose of this study was to investigate the joint and unique contributions of morphological awareness and vocabulary knowledge at five reading comprehension levels in adult basic education (ABE) students. We introduce the statistical technique of multiple quantile regression, which enabled us to assess the predictive utility of morphological awareness and vocabulary knowledge at multiple points (quantiles) along the continuous distribution of reading comprehension. To demonstrate the efficacy of our multiple quantile regression analysis, we compared and contrasted our results with a traditional multiple regression analytic approach. Our results indicated that morphological awareness and vocabulary knowledge accounted for a large portion of the variance (82%-95%) in reading comprehension skills across all quantiles. Morphological awareness exhibited the greatest unique predictive ability at lower levels of reading comprehension whereas vocabulary knowledge exhibited the greatest unique predictive ability at higher levels of reading comprehension. These results indicate the utility of using multiple quantile regression to assess trajectories of component skills across multiple levels of reading comprehension. The implications of our findings for ABE programs are discussed. © Hammill Institute on Disabilities 2014.
Hoos, Anne B.; Patel, Anant R.
1996-01-01
Model-adjustment procedures were applied to the combined data bases of storm-runoff quality for Chattanooga, Knoxville, and Nashville, Tennessee, to improve predictive accuracy for storm-runoff quality for urban watersheds in these three cities and throughout Middle and East Tennessee. Data for 45 storms at 15 different sites (five sites in each city) constitute the data base. Comparison of observed values of storm-runoff load and event-mean concentration to the predicted values from the regional regression models for 10 constituents shows prediction errors, as large as 806,000 percent. Model-adjustment procedures, which combine the regional model predictions with local data, are applied to improve predictive accuracy. Standard error of estimate after model adjustment ranges from 67 to 322 percent. Calibration results may be biased due to sampling error in the Tennessee data base. The relatively large values of standard error of estimate for some of the constituent models, although representing significant reduction (at least 50 percent) in prediction error compared to estimation with unadjusted regional models, may be unacceptable for some applications. The user may wish to collect additional local data for these constituents and repeat the analysis, or calibrate an independent local regression model.
Muhlestein, Whitney E; Akagi, Dallin S; Kallos, Justiss A; Morone, Peter J; Weaver, Kyle D; Thompson, Reid C; Chambless, Lola B
2018-04-01
Objective Machine learning (ML) algorithms are powerful tools for predicting patient outcomes. This study pilots a novel approach to algorithm selection and model creation using prediction of discharge disposition following meningioma resection as a proof of concept. Materials and Methods A diversity of ML algorithms were trained on a single-institution database of meningioma patients to predict discharge disposition. Algorithms were ranked by predictive power and top performers were combined to create an ensemble model. The final ensemble was internally validated on never-before-seen data to demonstrate generalizability. The predictive power of the ensemble was compared with a logistic regression. Further analyses were performed to identify how important variables impact the ensemble. Results Our ensemble model predicted disposition significantly better than a logistic regression (area under the curve of 0.78 and 0.71, respectively, p = 0.01). Tumor size, presentation at the emergency department, body mass index, convexity location, and preoperative motor deficit most strongly influence the model, though the independent impact of individual variables is nuanced. Conclusion Using a novel ML technique, we built a guided ML ensemble model that predicts discharge destination following meningioma resection with greater predictive power than a logistic regression, and that provides greater clinical insight than a univariate analysis. These techniques can be extended to predict many other patient outcomes of interest.
Predictors of Early Termination in a University Counseling Training Clinic
ERIC Educational Resources Information Center
Lampropoulos, Georgios K.; Schneider, Mercedes K.; Spengler, Paul M.
2009-01-01
Despite the existence of counseling dropout research, there are limited predictive data for counseling in training clinics. Potential predictor variables were investigated in this archival study of 380 client files in a university counseling training clinic. Multinomial logistic regression, predictive discriminant analysis, and classification and…
Bark analysis as a guide to cassava nutrition in Sierra Leone
DOE Office of Scientific and Technical Information (OSTI.GOV)
Godfrey-Sam-Aggrey, W.; Garber, M.J.
1979-01-01
Cassava main stem barks from two experiments in which similar fertilizers were applied directly in a 2/sup 5/ confounded factorial design were analyzed and the bark nutrients used as a guide to cassava nutrition. The application of multiple regression analysis to the respective root yields and bark nutrient concentrations enable nutrient levels and optimum adjusted root yields to be derived. Differences in bark nutrient concentrations reflected soil fertility levels. Bark analysis and the application of multiple regression analysis to root yields and bark nutrients appear to be useful tools for predicting fertilizer recommendations for cassava production.
Regression Analysis of Stage Variability for West-Central Florida Lakes
Sacks, Laura A.; Ellison, Donald L.; Swancar, Amy
2008-01-01
The variability in a lake's stage depends upon many factors, including surface-water flows, meteorological conditions, and hydrogeologic characteristics near the lake. An understanding of the factors controlling lake-stage variability for a population of lakes may be helpful to water managers who set regulatory levels for lakes. The goal of this study is to determine whether lake-stage variability can be predicted using multiple linear regression and readily available lake and basin characteristics defined for each lake. Regressions were evaluated for a recent 10-year period (1996-2005) and for a historical 10-year period (1954-63). Ground-water pumping is considered to have affected stage at many of the 98 lakes included in the recent period analysis, and not to have affected stage at the 20 lakes included in the historical period analysis. For the recent period, regression models had coefficients of determination (R2) values ranging from 0.60 to 0.74, and up to five explanatory variables. Standard errors ranged from 21 to 37 percent of the average stage variability. Net leakage was the most important explanatory variable in regressions describing the full range and low range in stage variability for the recent period. The most important explanatory variable in the model predicting the high range in stage variability was the height over median lake stage at which surface-water outflow would occur. Other explanatory variables in final regression models for the recent period included the range in annual rainfall for the period and several variables related to local and regional hydrogeology: (1) ground-water pumping within 1 mile of each lake, (2) the amount of ground-water inflow (by category), (3) the head gradient between the lake and the Upper Floridan aquifer, and (4) the thickness of the intermediate confining unit. Many of the variables in final regression models are related to hydrogeologic characteristics, underscoring the importance of ground-water exchange in controlling the stage of karst lakes in Florida. Regression equations were used to predict lake-stage variability for the recent period for 12 additional lakes, and the median difference between predicted and observed values ranged from 11 to 23 percent. Coefficients of determination for the historical period were considerably lower (maximum R2 of 0.28) than for the recent period. Reasons for these low R2 values are probably related to the small number of lakes (20) with stage data for an equivalent time period that were unaffected by ground-water pumping, the similarity of many of the lake types (large surface-water drainage lakes), and the greater uncertainty in defining historical basin characteristics. The lack of lake-stage data unaffected by ground-water pumping and the poor regression results obtained for that group of lakes limit the ability to predict natural lake-stage variability using this method in west-central Florida.
NASA Technical Reports Server (NTRS)
Wilson, R. M.; Reichmann, E. J.; Teuber, D. L.
1984-01-01
An empirical method is developed to predict certain parameters of future solar activity cycles. Sunspot cycle statistics are examined, and curve fitting and linear regression analysis techniques are utilized.
Ahearn, Elizabeth A.
2010-01-01
Multiple linear regression equations for determining flow-duration statistics were developed to estimate select flow exceedances ranging from 25- to 99-percent for six 'bioperiods'-Salmonid Spawning (November), Overwinter (December-February), Habitat Forming (March-April), Clupeid Spawning (May), Resident Spawning (June), and Rearing and Growth (July-October)-in Connecticut. Regression equations also were developed to estimate the 25- and 99-percent flow exceedances without reference to a bioperiod. In total, 32 equations were developed. The predictive equations were based on regression analyses relating flow statistics from streamgages to GIS-determined basin and climatic characteristics for the drainage areas of those streamgages. Thirty-nine streamgages (and an additional 6 short-term streamgages and 28 partial-record sites for the non-bioperiod 99-percent exceedance) in Connecticut and adjacent areas of neighboring States were used in the regression analysis. Weighted least squares regression analysis was used to determine the predictive equations; weights were assigned based on record length. The basin characteristics-drainage area, percentage of area with coarse-grained stratified deposits, percentage of area with wetlands, mean monthly precipitation (November), mean seasonal precipitation (December, January, and February), and mean basin elevation-are used as explanatory variables in the equations. Standard errors of estimate of the 32 equations ranged from 10.7 to 156 percent with medians of 19.2 and 55.4 percent to predict the 25- and 99-percent exceedances, respectively. Regression equations to estimate high and median flows (25- to 75-percent exceedances) are better predictors (smaller variability of the residual values around the regression line) than the equations to estimate low flows (less than 75-percent exceedance). The Habitat Forming (March-April) bioperiod had the smallest standard errors of estimate, ranging from 10.7 to 20.9 percent. In contrast, the Rearing and Growth (July-October) bioperiod had the largest standard errors, ranging from 30.9 to 156 percent. The adjusted coefficient of determination of the equations ranged from 77.5 to 99.4 percent with medians of 98.5 and 90.6 percent to predict the 25- and 99-percent exceedances, respectively. Descriptive information on the streamgages used in the regression, measured basin and climatic characteristics, and estimated flow-duration statistics are provided in this report. Flow-duration statistics and the 32 regression equations for estimating flow-duration statistics in Connecticut are stored on the U.S. Geological Survey World Wide Web application ?StreamStats? (http://water.usgs.gov/osw/streamstats/index.html). The regression equations developed in this report can be used to produce unbiased estimates of select flow exceedances statewide.
Regression Model Optimization for the Analysis of Experimental Data
NASA Technical Reports Server (NTRS)
Ulbrich, N.
2009-01-01
A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user s function class combination choice, the user s constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
ERIC Educational Resources Information Center
Fan, Xitao; Wang, Lin
The Monte Carlo study compared the performance of predictive discriminant analysis (PDA) and that of logistic regression (LR) for the two-group classification problem. Prior probabilities were used for classification, but the cost of misclassification was assumed to be equal. The study used a fully crossed three-factor experimental design (with…
ERIC Educational Resources Information Center
Luna, Andrew L.; Brennan, Kelly A.
2009-01-01
This study uses a regression model to determine if a significant difference exists between the actual budget allocation that an academic department received and the model's predicted budget allocation for that same department. Budget data from a Southeastern Master's/Comprehensive state university were used as the dependent variable, and the…
Shackelford, S D; Wheeler, T L; Koohmaraie, M
2003-01-01
The present experiment was conducted to evaluate the ability of the U.S. Meat Animal Research Center's beef carcass image analysis system to predict calculated yield grade, longissimus muscle area, preliminary yield grade, adjusted preliminary yield grade, and marbling score under commercial beef processing conditions. In two commercial beef-processing facilities, image analysis was conducted on 800 carcasses on the beef-grading chain immediately after the conventional USDA beef quality and yield grades were applied. Carcasses were blocked by plant and observed calculated yield grade. The carcasses were then separated, with 400 carcasses assigned to a calibration data set that was used to develop regression equations, and the remaining 400 carcasses assigned to a prediction data set used to validate the regression equations. Prediction equations, which included image analysis variables and hot carcass weight, accounted for 90, 88, 90, 88, and 76% of the variation in calculated yield grade, longissimus muscle area, preliminary yield grade, adjusted preliminary yield grade, and marbling score, respectively, in the prediction data set. In comparison, the official USDA yield grade as applied by online graders accounted for 73% of the variation in calculated yield grade. The technology described herein could be used by the beef industry to more accurately determine beef yield grades; however, this system does not provide an accurate enough prediction of marbling score to be used without USDA grader interaction for USDA quality grading.
NASA Astrophysics Data System (ADS)
Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.
2017-12-01
The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by Data Acquisition and Evaluation System. The authors coupled collected TBM drive data with available information on rock mass properties, cleansed, completed with secondary variables and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were performed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable and the data were screened with different computational approaches allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.
Predicting the rate of change in timber value for forest stands infested with gypsy moth
David A. Gansner; Owen W. Herrick
1982-01-01
Presents a method for estimating the potential impact of gypsy moth attacks on forest-stand value. Robust regression analysis is used to develop an equation for predicting the rate of change in timber value from easy-to-measure key characteristics of stand condition.
ERIC Educational Resources Information Center
Del Prette, Zilda Aparecida Pereira; Prette, Almir Del; De Oliveira, Lael Almeida; Gresham, Frank M.; Vance, Michael J.
2012-01-01
Social skills are specific behaviors that individuals exhibit in order to successfully complete social tasks whereas social competence represents judgments by significant others that these social tasks have been successfully accomplished. The present investigation identified the best sociobehavioral predictors obtained from different raters…
Development of a traveltime prediction equation for streams in Arkansas
Funkhouser, Jaysson E.; Barks, C. Shane
2004-01-01
During 1971 and 1981 and 2001 and 2003, traveltime measurements were made at 33 sample sites on 18 streams throughout northern and western Arkansas using fluorescent dye. Most measurements were made during steady-state base-flow conditions with the exception of three measurements made during near steady-state medium-flow conditions (for the study described in this report, medium-flow is approximately 100-150 percent of the mean monthly streamflow during the month the dye trace was conducted). These traveltime data were compared to the U.S. Geological Survey?s national traveltime prediction equation and used to develop a specific traveltime prediction equation for Arkansas streams. In general, the national traveltime prediction equation yielded results that over-predicted the velocity of the streams for 29 of the 33 sites measured. The standard error for the national traveltime prediction equation was 105 percent. The coefficient of determination was 0.78. The Arkansas prediction equation developed from a regression analysis of dye-tracing results was a significant improvement over the national prediction equation. This regression analysis yielded a standard error of 46 percent and a coefficient of determination of 0.74. The predicted velocities using this equation compared better to measured velocities. Using the variables in a regression analysis, the Arkansas prediction equation derived for the peak velocity in feet per second was: (Actual Equation Shown in report) In addition to knowing when the peak concentration will arrive at a site, it is of great interest to know when the leading edge of a contaminant plume will arrive. The traveltime of the leading edge of a contaminant plume indicates when a potential problem might first develop and also defines the overall shape of the concentration response function. Previous USGS reports have shown no significant relation between any of the variables and the time from injection to the arrival of the leading edge of the dye plume. For this report, the analysis of the dye-tracing data yielded a significant correlation between traveltime of the leading edge and traveltime of the peak concentration with an R2 value of 0.99. These data indicate that the traveltime of the leading edge can be estimated from: (Actual Equation Shown in Report)
ERIC Educational Resources Information Center
Liu, Leping; Maddux, Cleborne D.
2008-01-01
This article presents a study of Web 2.0 articles intended to (a) analyze the content of what is written and (b) develop a statistical model to predict whether authors' write about the need for new instructional design strategies and models. Eighty-eight technology articles were subjected to lexical analysis and a logistic regression model was…
Tani, Yuji; Ogasawara, Katsuhiko
2012-01-01
This study aimed to contribute to the management of a healthcare organization by providing management information using time-series analysis of business data accumulated in the hospital information system, which has not been utilized thus far. In this study, we examined the performance of the prediction method using the auto-regressive integrated moving-average (ARIMA) model, using the business data obtained at the Radiology Department. We made the model using the data used for analysis, which was the number of radiological examinations in the past 9 years, and we predicted the number of radiological examinations in the last 1 year. Then, we compared the actual value with the forecast value. We were able to establish that the performance prediction method was simple and cost-effective by using free software. In addition, we were able to build the simple model by pre-processing the removal of trend components using the data. The difference between predicted values and actual values was 10%; however, it was more important to understand the chronological change rather than the individual time-series values. Furthermore, our method was highly versatile and adaptable compared to the general time-series data. Therefore, different healthcare organizations can use our method for the analysis and forecasting of their business data.
p53 predictive value for pT1-2 N0 disease at radical cystectomy.
Shariat, Shahrokh F; Lotan, Yair; Karakiewicz, Pierre I; Ashfaq, Raheela; Isbarn, Hendrik; Fradet, Yves; Bastian, Patrick J; Nielsen, Matthew E; Capitanio, Umberto; Jeldres, Claudio; Montorsi, Francesco; Müller, Stefan C; Karam, Jose A; Heukamp, Lukas C; Netto, George; Lerner, Seth P; Sagalowsky, Arthur I; Cote, Richard J
2009-09-01
Approximately 15% to 30% of patients with pT1-2N0M0 urothelial carcinoma of the bladder experience disease progression despite radical cystectomy with curative intent. We determined whether p53 expression would improve the prediction of disease progression after radical cystectomy for pT1-2N0M0 UCB. In a multi-institutional retrospective cohort we identified 324 patients with pT1-2N0M0 urothelial carcinoma of the bladder who underwent radical cystectomy. Analysis focused on a testing cohort of 272 patients and an external validation of 52. Competing risks regression models were used to test the association of variables with cancer specific mortality after accounting for nonbladder cancer caused mortality. In the testing cohort 91 patients (33.5%) had altered p53 expression (p53alt). On multivariate competing risks regression analysis altered p53 achieved independent status for predicting disease recurrence and cancer specific mortality (each p <0.001). Adding p53 increased the accuracy of multivariate competing risks regression models predicting recurrence and cancer specific mortality by 5.7% (62.0% vs 67.7%) and 5.4% (61.6% vs 67.0%), respectively. Alterations in p53 represent a highly promising marker of disease recurrence and cancer specific mortality after radical cystectomy for urothelial carcinoma of the bladder. Analysis confirmed previous findings and showed that considering p53 can result in substantial accuracy gains relative to the use of standard predictors. The value and the level of the current evidence clearly exceed previous proof of the independent predictor status of p53 for predicting recurrence and cancer specific mortality.
Zhang, Hong-guang; Lu, Jian-gang
2016-02-01
Abstract To overcome the problems of significant difference among samples and nonlinearity between the property and spectra of samples in spectral quantitative analysis, a local regression algorithm is proposed in this paper. In this algorithm, net signal analysis method(NAS) was firstly used to obtain the net analyte signal of the calibration samples and unknown samples, then the Euclidean distance between net analyte signal of the sample and net analyte signal of calibration samples was calculated and utilized as similarity index. According to the defined similarity index, the local calibration sets were individually selected for each unknown sample. Finally, a local PLS regression model was built on each local calibration sets for each unknown sample. The proposed method was applied to a set of near infrared spectra of meat samples. The results demonstrate that the prediction precision and model complexity of the proposed method are superior to global PLS regression method and conventional local regression algorithm based on spectral Euclidean distance.
Serum Irisin Predicts Mortality Risk in Acute Heart Failure Patients.
Shen, Shutong; Gao, Rongrong; Bei, Yihua; Li, Jin; Zhang, Haifeng; Zhou, Yanli; Yao, Wenming; Xu, Dongjie; Zhou, Fang; Jin, Mengchao; Wei, Siqi; Wang, Kai; Xu, Xuejuan; Li, Yongqin; Xiao, Junjie; Li, Xinli
2017-01-01
Irisin is a peptide hormone cleaved from a plasma membrane protein fibronectin type III domain containing protein 5 (FNDC5). Emerging studies have indicated association between serum irisin and many major chronic diseases including cardiovascular diseases. However, the role of serum irisin as a predictor for mortality risk in acute heart failure (AHF) patients is not clear. AHF patients were enrolled and serum was collected at the admission and all patients were followed up for 1 year. Enzyme-linked immunosorbent assay was used to measure serum irisin levels. To explore predictors for AHF mortality, the univariate and multivariate logistic regression analysis, and receiver-operator characteristic (ROC) curve analysis were used. To determine the role of serum irisin levels in predicting survival, Kaplan-Meier survival analysis was used. In this study, 161 AHF patients were enrolled and serum irisin level was found to be significantly higher in patients deceased in 1-year follow-up. The univariate logistic regression analysis identified 18 variables associated with all-cause mortality in AHF patients, while the multivariate logistic regression analysis identified 2 variables namely blood urea nitrogen and serum irisin. ROC curve analysis indicated that blood urea nitrogen and the most commonly used biomarker, NT-pro-BNP, displayed poor prognostic value for AHF (AUCs ≤ 0.700) compared to serum irisin (AUC = 0.753). Kaplan-Meier survival analysis demonstrated that AHF patients with higher serum irisin had significantly higher mortality (P<0.001). Collectively, our study identified serum irisin as a predictive biomarker for 1-year all-cause mortality in AHF patients though large multicenter studies are highly needed. © 2017 The Author(s). Published by S. Karger AG, Basel.
NASA Astrophysics Data System (ADS)
Takayama, T.; Iwasaki, A.
2016-06-01
Above-ground biomass prediction of tropical rain forest using remote sensing data is of paramount importance to continuous large-area forest monitoring. Hyperspectral data can provide rich spectral information for the biomass prediction; however, the prediction accuracy is affected by a small-sample-size problem, which widely exists as overfitting in using high dimensional data where the number of training samples is smaller than the dimensionality of the samples due to limitation of require time, cost, and human resources for field surveys. A common approach to addressing this problem is reducing the dimensionality of dataset. Also, acquired hyperspectral data usually have low signal-to-noise ratio due to a narrow bandwidth and local or global shifts of peaks due to instrumental instability or small differences in considering practical measurement conditions. In this work, we propose a methodology based on fused lasso regression that select optimal bands for the biomass prediction model with encouraging sparsity and grouping, which solves the small-sample-size problem by the dimensionality reduction from the sparsity and the noise and peak shift problem by the grouping. The prediction model provided higher accuracy with root-mean-square error (RMSE) of 66.16 t/ha in the cross-validation than other methods; multiple linear analysis, partial least squares regression, and lasso regression. Furthermore, fusion of spectral and spatial information derived from texture index increased the prediction accuracy with RMSE of 62.62 t/ha. This analysis proves efficiency of fused lasso and image texture in biomass estimation of tropical forests.
Yang, D H; Su, Z Q; Chen, Y; Chen, Z B; Ding, Z N; Weng, Y Y; Li, J; Li, X; Tong, Q L; Han, Y X; Zhang, X
2016-03-08
To assess the predictive value of the albumin to globulin ratio (AGR) in evaluation of disease severity and prognosis in myasthenia gravis patients. A total of 135 myasthenia gravis (MG) patients were enrolled between February 2009 and March 2015. The AGR was detected on the first day of hospitalization and ranked from lowest to highest, and the patients were divided into three equal tertiles according to the AGR values, which were T1 (AGR <1.34), T2 (1.34≤AGR≤1.53) and T3 (AGR>1.53). The Kaplan-Meier curve was used to evaluate the prognostic value of AGR. Cox model analysis was used to evaluate the relevant factors. Multivariate Logistic regression analysis was used to find the predictors of myasthenia crisis during hospitalization. The median length of hospital stay for each tertile was: for the T1 21 days (15-35.5), T2 18 days (14-27.5), and T3 16 days (12-22.5) (P<0.01), and Kaplan-Meier curves showed significant difference among the three groups. In the univariate model, serum albumin, creatinine, AGR and MGFA clinical classification were related to prognosis of myasthenia gravis. At the multivariate Cox regression analysis, the AGR (P<0.001) and MGFA clinical classification (P<0.001) were independent predictive factors of disease severity and prognosis in myasthenia gravis patients. Respectively, the hazard ratio (HR) were 4.655 (95% CI: 2.355-9.202) and 0.596 (95% CI: 0.492-0.723). Multivariate Logistic regression analysis showed the AGR (P<0.001) and MGFA clinical classification were related to myasthenia crisis. The AGR may represent a simple, potentially useful predictive biomarker for evaluating the disease severity and prognosis of patients with myasthenia gravis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhong, H; Wang, J; Shen, L
Purpose: The purpose of this study is to investigate the relationship between computed tomographic (CT) texture features of primary lesions and metastasis-free survival for rectal cancer patients; and to develop a datamining prediction model using texture features. Methods: A total of 220 rectal cancer patients treated with neoadjuvant chemo-radiotherapy (CRT) were enrolled in this study. All patients underwent CT scans before CRT. The primary lesions on the CT images were delineated by two experienced oncologists. The CT images were filtered by Laplacian of Gaussian (LoG) filters with different filter values (1.0–2.5: from fine to coarse). Both filtered and unfiltered imagesmore » were analyzed using Gray-level Co-occurrence Matrix (GLCM) texture analysis with different directions (transversal, sagittal, and coronal). Totally, 270 texture features with different species, directions and filter values were extracted. Texture features were examined with Student’s t-test for selecting predictive features. Principal Component Analysis (PCA) was performed upon the selected features to reduce the feature collinearity. Artificial neural network (ANN) and logistic regression were applied to establish metastasis prediction models. Results: Forty-six of 220 patients developed metastasis with a follow-up time of more than 2 years. Sixtyseven texture features were significantly different in t-test (p<0.05) between patients with and without metastasis, and 12 of them were extremely significant (p<0.001). The Area-under-the-curve (AUC) of ANN was 0.72, and the concordance index (CI) of logistic regression was 0.71. The predictability of ANN was slightly better than logistic regression. Conclusion: CT texture features of primary lesions are related to metastasisfree survival of rectal cancer patients. Both ANN and logistic regression based models can be developed for prediction.« less
NASA Astrophysics Data System (ADS)
Uca; Toriman, Ekhwan; Jaafar, Othman; Maru, Rosmini; Arfan, Amal; Saleh Ahmar, Ansari
2018-01-01
Prediction of suspended sediment discharge in a catchments area is very important because it can be used to evaluation the erosion hazard, management of its water resources, water quality, hydrology project management (dams, reservoirs, and irrigation) and to determine the extent of the damage that occurred in the catchments. Multiple Linear Regression analysis and artificial neural network can be used to predict the amount of daily suspended sediment discharge. Regression analysis using the least square method, whereas artificial neural networks using Radial Basis Function (RBF) and feedforward multilayer perceptron with three learning algorithms namely Levenberg-Marquardt (LM), Scaled Conjugate Descent (SCD) and Broyden-Fletcher-Goldfarb-Shanno Quasi-Newton (BFGS). The number neuron of hidden layer is three to sixteen, while in output layer only one neuron because only one output target. The mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R2 ) and coefficient of efficiency (CE) of the multiple linear regression (MLRg) value Model 2 (6 input variable independent) has the lowest the value of MAE and RMSE (0.0000002 and 13.6039) and highest R2 and CE (0.9971 and 0.9971). When compared between LM, SCG and RBF, the BFGS model structure 3-7-1 is the better and more accurate to prediction suspended sediment discharge in Jenderam catchment. The performance value in testing process, MAE and RMSE (13.5769 and 17.9011) is smallest, meanwhile R2 and CE (0.9999 and 0.9998) is the highest if it compared with the another BFGS Quasi-Newton model (6-3-1, 9-10-1 and 12-12-1). Based on the performance statistics value, MLRg, LM, SCG, BFGS and RBF suitable and accurately for prediction by modeling the non-linear complex behavior of suspended sediment responses to rainfall, water depth and discharge. The comparison between artificial neural network (ANN) and MLRg, the MLRg Model 2 accurately for to prediction suspended sediment discharge (kg/day) in Jenderan catchment area.
Regression analysis for solving diagnosis problem of children's health
NASA Astrophysics Data System (ADS)
Cherkashina, Yu A.; Gerget, O. M.
2016-04-01
The paper includes results of scientific researches. These researches are devoted to the application of statistical techniques, namely, regression analysis, to assess the health status of children in the neonatal period based on medical data (hemostatic parameters, parameters of blood tests, the gestational age, vascular-endothelial growth factor) measured at 3-5 days of children's life. In this paper a detailed description of the studied medical data is given. A binary logistic regression procedure is discussed in the paper. Basic results of the research are presented. A classification table of predicted values and factual observed values is shown, the overall percentage of correct recognition is determined. Regression equation coefficients are calculated, the general regression equation is written based on them. Based on the results of logistic regression, ROC analysis was performed, sensitivity and specificity of the model are calculated and ROC curves are constructed. These mathematical techniques allow carrying out diagnostics of health of children providing a high quality of recognition. The results make a significant contribution to the development of evidence-based medicine and have a high practical importance in the professional activity of the author.
Regression analysis using dependent Polya trees.
Schörgendorfer, Angela; Branscum, Adam J
2013-11-30
Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.
Modeling the prediction of business intelligence system effectiveness.
Weng, Sung-Shun; Yang, Ming-Hsien; Koo, Tian-Lih; Hsiao, Pei-I
2016-01-01
Although business intelligence (BI) technologies are continually evolving, the capability to apply BI technologies has become an indispensable resource for enterprises running in today's complex, uncertain and dynamic business environment. This study performed pioneering work by constructing models and rules for the prediction of business intelligence system effectiveness (BISE) in relation to the implementation of BI solutions. For enterprises, effectively managing critical attributes that determine BISE to develop prediction models with a set of rules for self-evaluation of the effectiveness of BI solutions is necessary to improve BI implementation and ensure its success. The main study findings identified the critical prediction indicators of BISE that are important to forecasting BI performance and highlighted five classification and prediction rules of BISE derived from decision tree structures, as well as a refined regression prediction model with four critical prediction indicators constructed by logistic regression analysis that can enable enterprises to improve BISE while effectively managing BI solution implementation and catering to academics to whom theory is important.
Fendt, Lúcia; Amaral, Karine; D Picon, Paulo
2017-01-01
Aim: Peginterferon plus ribavirin (peg-IFN/RBV) is still the standard of care for treatment of hepatitis C virus (HCV) in many countries. Given the high toxicity of this regimen, our study aimed to develop a prediction tool that can identify which patients are unlikely to benefit from peg-IFN/RBV and could thus postpone treatment in favor of new-generation direct-acting antivirals. Materials and methods: Binary regression was performed using demographic, clinical, and laboratory covariates and sustained virological response (SVR) outcomes from a prospective cohort of individuals referred for therapy from 2003 to 2008 in a public HCV treatment center in Rio Grande do Sul, Brazil. Results: Of the 743 participants analyzed, 489 completed 48 weeks of treatment (65.8%). A total of 202 of those who completed peg-IFN/RBV therapy achieved SVR (27.2% responders), 196 did not (26.4%), and 91 had missing viral load (VL) at week 72 (12.2% loss to follow-up). The remainder discontinued therapy (n = 254 [34.2%]), 78 (30.7%) doing so due to adverse effects. Baseline covariates included in the regression model were sex, age, human immunodeficiency virus, infection status, aspartate transaminase, alanine transaminase, hemoglobin, platelets, serum creatinine, prothrombin time, pretreatment VL, cirrhosis on liver biopsy, and treatment naivety. A predicted SVR of 17.9% had 90.0% sensitivity for detecting true nonresponders. The negative likelihood ratio at a predicted SVR of 17.9% was 0.16, and the negative predictive value was 92.6%. Conclusion: Easily obtainable variables can identify patients that will likely not benefit from peg-IFN-based therapy. This prediction model might be useful to clinicians. Clinical significance: To our knowledge, this is the only prediction tool that can reliably help clinicians to postpone peg-IFN/RBV therapy for HCV genotype 1 patients. How to cite this article: Picon RV, Fendt L, Amaral K, Picon PD. Prediction of Sustained Virological Response to Peginterferon-based Therapy for Chronic Hepatitis C: Regression Analysis of a Cohort from Rio Grande do Sul, Brazil. Euroasian J Hepato-Gastroenterol 2017;7(1):27-33. PMID:29201768
V Picon, Rafael; Fendt, Lúcia; Amaral, Karine; D Picon, Paulo
2017-01-01
Peginterferon plus ribavirin (peg-IFN/RBV) is still the standard of care for treatment of hepatitis C virus (HCV) in many countries. Given the high toxicity of this regimen, our study aimed to develop a prediction tool that can identify which patients are unlikely to benefit from peg-IFN/RBV and could thus postpone treatment in favor of new-generation direct-acting antivirals. Binary regression was performed using demographic, clinical, and laboratory covariates and sustained virological response (SVR) outcomes from a prospective cohort of individuals referred for therapy from 2003 to 2008 in a public HCV treatment center in Rio Grande do Sul, Brazil. Of the 743 participants analyzed, 489 completed 48 weeks of treatment (65.8%). A total of 202 of those who completed peg-IFN/RBV therapy achieved SVR (27.2% responders), 196 did not (26.4%), and 91 had missing viral load (VL) at week 72 (12.2% loss to follow-up). The remainder discontinued therapy (n = 254 [34.2%]), 78 (30.7%) doing so due to adverse effects. Baseline covariates included in the regression model were sex, age, human immunodeficiency virus, infection status, aspartate transaminase, alanine transaminase, hemoglobin, platelets, serum creatinine, prothrombin time, pretreatment VL, cirrhosis on liver biopsy, and treatment naivety. A predicted SVR of 17.9% had 90.0% sensitivity for detecting true nonresponders. The negative likelihood ratio at a predicted SVR of 17.9% was 0.16, and the negative predictive value was 92.6%. Easily obtainable variables can identify patients that will likely not benefit from peg-IFN-based therapy. This prediction model might be useful to clinicians. To our knowledge, this is the only prediction tool that can reliably help clinicians to postpone peg-IFN/RBV therapy for HCV genotype 1 patients. How to cite this article: Picon RV, Fendt L, Amaral K, Picon PD. Prediction of Sustained Virological Response to Peginterferon-based Therapy for Chronic Hepatitis C: Regression Analysis of a Cohort from Rio Grande do Sul, Brazil. Euroasian J Hepato-Gastroenterol 2017;7(1):27-33.
Predicting acute pain after cesarean delivery using three simple questions.
Pan, Peter H; Tonidandel, Ashley M; Aschenbrenner, Carol A; Houle, Timothy T; Harris, Lynne C; Eisenach, James C
2013-05-01
Interindividual variability in postoperative pain presents a clinical challenge. Preoperative quantitative sensory testing is useful but time consuming in predicting postoperative pain intensity. The current study was conducted to develop and validate a predictive model of acute postcesarean pain using a simple three-item preoperative questionnaire. A total of 200 women scheduled for elective cesarean delivery under subarachnoid anesthesia were enrolled (192 subjects analyzed). Patients were asked to rate the intensity of loudness of audio tones, their level of anxiety and anticipated pain, and analgesic need from surgery. Postoperatively, patients reported the intensity of evoked pain. Regression analysis was performed to generate a predictive model for pain from these measures. A validation cohort of 151 women was enrolled to test the reliability of the model (131 subjects analyzed). Responses from each of the three preoperative questions correlated moderately with 24-h evoked pain intensity (r = 0.24-0.33, P < 0.001). Audio tone rating added uniquely, but minimally, to the model and was not included in the predictive model. The multiple regression analysis yielded a statistically significant model (R = 0.20, P < 0.001), whereas the validation cohort showed reliably a very similar regression line (R = 0.18). In predicting the upper 20th percentile of evoked pain scores, the optimal cut point was 46.9 (z =0.24) such that sensitivity of 0.68 and specificity of 0.67 were as balanced as possible. This simple three-item questionnaire is useful to help predict postcesarean evoked pain intensity, and could be applied to further research and clinical application to tailor analgesic therapy to those who need it most.
Quantifying female bodily attractiveness by a statistical analysis of body measurements.
Gründl, Martin; Eisenmann-Klein, Marita; Prantl, Lukas
2009-03-01
To investigate what makes a female figure attractive, an extensive experiment was conducted using high-quality photographic stimulus material and several systematically varied figure parameters. The objective was to predict female bodily attractiveness by using figure measurements. For generating stimulus material, a frontal-view photograph of a woman with normal body proportions was taken. Using morphing software, 243 variations of this photograph were produced by systematically manipulating the following features: weight, hip width, waist width, bust size, and leg length. More than 34,000 people participated in the web-based experiment and judged the attractiveness of the figures. All of the altered figures were measured (e.g., bust width, underbust width, waist width, hip width, and so on). Based on these measurements, ratios were calculated (e.g., waist-to-hip ratio). A multiple regression analysis was designed to predict the attractiveness rank of a figure by using figure measurements. The results show that the attractiveness of a woman's figure may be predicted by using her body measurements. The regression analysis explains a variance of 80 percent. Important predictors are bust-to-underbust ratio, bust-to-waist ratio, waist-to-hip ratio, and an androgyny index (an indicator of a typical female body). The study shows that the attractiveness of a female figure is the result of complex interactions of numerous factors. It affirms the importance of viewing the appearance of a bodily feature in the context of other bodily features when performing preoperative analysis. Based on the standardized beta-weights of the regression model, the relative importance of figure parameters in context of preoperative analysis is discussed.
FPGA implementation of predictive degradation model for engine oil lifetime
NASA Astrophysics Data System (ADS)
Idros, M. F. M.; Razak, A. H. A.; Junid, S. A. M. Al; Suliman, S. I.; Halim, A. K.
2018-03-01
This paper presents the implementation of linear regression model for degradation prediction on Register Transfer Logic (RTL) using QuartusII. A stationary model had been identified in the degradation trend for the engine oil in a vehicle in time series method. As for RTL implementation, the degradation model is written in Verilog HDL and the data input are taken at a certain time. Clock divider had been designed to support the timing sequence of input data. At every five data, a regression analysis is adapted for slope variation determination and prediction calculation. Here, only the negative value are taken as the consideration for the prediction purposes for less number of logic gate. Least Square Method is adapted to get the best linear model based on the mean values of time series data. The coded algorithm has been implemented on FPGA for validation purposes. The result shows the prediction time to change the engine oil.
Harvey, H Benjamin; Liu, Catherine; Ai, Jing; Jaworsky, Cristina; Guerrier, Claude Emmanuel; Flores, Efren; Pianykh, Oleg
2017-10-01
To test whether data elements available in the electronic medical record (EMR) can be effectively leveraged to predict failure to attend a scheduled radiology examination. Using data from a large academic medical center, we identified all patients with a diagnostic imaging examination scheduled from January 1, 2016, to April 1, 2016, and determined whether the patient successfully attended the examination. Demographic, clinical, and health services utilization variables available in the EMR potentially relevant to examination attendance were recorded for each patient. We used descriptive statistics and logistic regression models to test whether these data elements could predict failure to attend a scheduled radiology examination. The predictive accuracy of the regression models were determined by calculating the area under the receiver operator curve. Among the 54,652 patient appointments with radiology examinations scheduled during the study period, 6.5% were no-shows. No-show rates were highest for the modalities of mammography and CT and lowest for PET and MRI. Logistic regression indicated that 16 of the 27 demographic, clinical, and health services utilization factors were significantly associated with failure to attend a scheduled radiology examination (P ≤ .05). Stepwise logistic regression analysis demonstrated that previous no-shows, days between scheduling and appointments, modality type, and insurance type were most strongly predictive of no-show. A model considering all 16 data elements had good ability to predict radiology no-shows (area under the receiver operator curve = 0.753). The predictive ability was similar or improved when these models were analyzed by modality. Patient and examination information readily available in the EMR can be successfully used to predict radiology no-shows. Moving forward, this information can be proactively leveraged to identify patients who might benefit from additional patient engagement through appointment reminders or other targeted interventions to avoid no-shows. Copyright © 2017 American College of Radiology. Published by Elsevier Inc. All rights reserved.
Monthly streamflow forecasting in the Rhine basin
NASA Astrophysics Data System (ADS)
Schick, Simon; Rössler, Ole; Weingartner, Rolf
2017-04-01
Forecasting seasonal streamflow of the Rhine river is of societal relevance as the Rhine is an important water way and water resource in Western Europe. The present study investigates the predictability of monthly mean streamflow at lead times of zero, one, and two months with the focus on potential benefits by the integration of seasonal climate predictions. Specifically, we use seasonal predictions of precipitation and surface air temperature released by the European Centre for Medium-Range Weather Forecasts (ECMWF) for a regression analysis. In order to disentangle forecast uncertainty, the 'Reverse Ensemble Streamflow Prediction' framework is adapted here to the context of regression: By using appropriate subsets of predictors the regression model is constrained to either the initial conditions, the meteorological forcing, or both. An operational application is mimicked by equipping the model with the seasonal climate predictions provided by ECMWF. Finally, to mitigate the spatial aggregation of the meteorological fields the model is also applied at the subcatchment scale, and the resulting predictions are combined afterwards. The hindcast experiment is carried out for the period 1982-2011 in cross validation mode at two gauging stations, namely the Rhine at Lobith and Basel. The results show that monthly forecasts are skillful with respect to climatology only at zero lead time. In addition, at zero lead time the integration of seasonal climate predictions decreases the mean absolute error by 5 to 10 percentage compared to forecasts which are solely based on initial conditions. This reduction most likely is induced by the seasonal prediction of precipitation and not air temperature. The study is completed by bench marking the regression model with runoff simulations from ECMWFs seasonal forecast system. By simply using basin averages followed by a linear bias correction, these runoff simulations translate well to monthly streamflow. Though the regression model is only slightly outperformed, we argue that runoff out of the land surface component of seasonal climate forecasting systems is an interesting option when it comes to seasonal streamflow forecasting in large river basins.
2012-06-01
indices was independent of the presence or absence of hepatic steatosis on abdominal imaging. Key Research Accomplishments Abstracts presented at...transplant; 3) hepatic encephalopathy; and 4) hepatocellular carcinoma. Logistic regression analysis confirmed that the clinical predictive value of the...Gumus S, Saul,MI Bae KT. Noninvasive Hepatic Fibrosis Scores Predict Liver-Related Outcomes in Diabetic Patients [abstract]. Gastroenterology. 2012
Jose F. Negron
1998-01-01
Infested and uninfested areas within Douglas fir, Pseudotsuga menziesii Mirb.. Franco, stands affected by the Douglas-fir beetle, Dendroctonus pseudotsugae Hopk. were sampled in the Colorado Front Range, CO. Classification tree models were built to predict probabilities of infestation. Regression trees and linear regression analysis were used to model amount of tree...
Financial, Academic, and Environmental Influences on the Retention and Graduation of Students
ERIC Educational Resources Information Center
Wohlgemuth, Darin; Whalen, Don; Sullivan, Julia; Nading, Carolyn; Shelley, Mack; Wang, Yongyi (Rebecca)
2007-01-01
Regression analysis was used to study retention and graduation for the fall 1996 entering class of students at a midwestern research extensive university (n = 3,610; 44% female, 8% minority, 77% in-state). Logistic regression was used to predict the likelihood of a student being retained for each of four years, and the outcome of graduation at the…
Development of precursors recognition methods in vector signals
NASA Astrophysics Data System (ADS)
Kapralov, V. G.; Elagin, V. V.; Kaveeva, E. G.; Stankevich, L. A.; Dremin, M. M.; Krylov, S. V.; Borovov, A. E.; Harfush, H. A.; Sedov, K. S.
2017-10-01
Precursor recognition methods in vector signals of plasma diagnostics are presented. Their requirements and possible options for their development are considered. In particular, the variants of using symbolic regression for building a plasma disruption prediction system are discussed. The initial data preparation using correlation analysis and symbolic regression is discussed. Special attention is paid to the possibility of using algorithms in real time.
2011-01-01
Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook’s distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards. PMID:21966586
Sloas, Stacey B; Keith, Becky; Whitehead, Malcolm T
2013-01-01
This study investigated a pretest strategy that identified physical therapist assistant (PTA) students who were at risk of failure on the National Physical Therapy Examination (NPTE). Program assessment data from five cohorts of PTA students (2005-2009) were used to develop a stepwise multiple regression formula that predicted first-time NPTE licensure scores. Data used included the Nelson-Denny Reading Test, grades from eight core courses, grade point average upon admission to the program, and scores from three mock NPTE exams given during the program. Pearson correlation coefficients were calculated between each of the 15 variables and NPTE scores. Stepwise multiple regression analysis was performed using data collected at the ends of the first, second, and third (final) semesters of the program. Data from the class of 2010 were then used to validate the formula. The end-of-program formula accounted for the greatest variance (57%) in predicted scores. Those students scoring below a predicted scaled score of 620 were identified to be at risk of failure of the licensure exam. These students were counseled, and a remedial plan was developed based on regression predictions prior to them sitting for the licensure exam.
Yılmaz Isıkhan, Selen; Karabulut, Erdem; Alpar, Celal Reha
2016-01-01
Background/Aim . Evaluating the success of dose prediction based on genetic or clinical data has substantially advanced recently. The aim of this study is to predict various clinical dose values from DNA gene expression datasets using data mining techniques. Materials and Methods . Eleven real gene expression datasets containing dose values were included. First, important genes for dose prediction were selected using iterative sure independence screening. Then, the performances of regression trees (RTs), support vector regression (SVR), RT bagging, SVR bagging, and RT boosting were examined. Results . The results demonstrated that a regression-based feature selection method substantially reduced the number of irrelevant genes from raw datasets. Overall, the best prediction performance in nine of 11 datasets was achieved using SVR; the second most accurate performance was provided using a gradient-boosting machine (GBM). Conclusion . Analysis of various dose values based on microarray gene expression data identified common genes found in our study and the referenced studies. According to our findings, SVR and GBM can be good predictors of dose-gene datasets. Another result of the study was to identify the sample size of n = 25 as a cutoff point for RT bagging to outperform a single RT.
Keithley, Richard B; Wightman, R Mark
2011-06-07
Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook's distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards.
Neural Network and Regression Methods Demonstrated in the Design Optimization of a Subsonic Aircraft
NASA Technical Reports Server (NTRS)
Hopkins, Dale A.; Lavelle, Thomas M.; Patnaik, Surya
2003-01-01
The neural network and regression methods of NASA Glenn Research Center s COMETBOARDS design optimization testbed were used to generate approximate analysis and design models for a subsonic aircraft operating at Mach 0.85 cruise speed. The analytical model is defined by nine design variables: wing aspect ratio, engine thrust, wing area, sweep angle, chord-thickness ratio, turbine temperature, pressure ratio, bypass ratio, fan pressure; and eight response parameters: weight, landing velocity, takeoff and landing field lengths, approach thrust, overall efficiency, and compressor pressure and temperature. The variables were adjusted to optimally balance the engines to the airframe. The solution strategy included a sensitivity model and the soft analysis model. Researchers generated the sensitivity model by training the approximators to predict an optimum design. The trained neural network predicted all response variables, within 5-percent error. This was reduced to 1 percent by the regression method. The soft analysis model was developed to replace aircraft analysis as the reanalyzer in design optimization. Soft models have been generated for a neural network method, a regression method, and a hybrid method obtained by combining the approximators. The performance of the models is graphed for aircraft weight versus thrust as well as for wing area and turbine temperature. The regression method followed the analytical solution with little error. The neural network exhibited 5-percent maximum error over all parameters. Performance of the hybrid method was intermediate in comparison to the individual approximators. Error in the response variable is smaller than that shown in the figure because of a distortion scale factor. The overall performance of the approximators was considered to be satisfactory because aircraft analysis with NASA Langley Research Center s FLOPS (Flight Optimization System) code is a synthesis of diverse disciplines: weight estimation, aerodynamic analysis, engine cycle analysis, propulsion data interpolation, mission performance, airfield length for landing and takeoff, noise footprint, and others.
Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H
2012-01-01
Background: The purpose of this investigation was to compare empirically predictive ability of an artificial neural network with a logistic regression in prediction of low back pain. Methods: Data from the second national health survey were considered in this investigation. This data includes the information of low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a set of 17294 data and they were validated in a test set of 17295 data. Hosmer and Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 inputs, 3 hidden and 1 output neurons was employed. The efficiency of two models was compared by receiver operating characteristic analysis, root mean square and -2 Loglikelihood criteria. Results: The area under the ROC curve (SE), root mean square and -2Loglikelihood of the logistic regression was 0.752 (0.004), 0.3832 and 14769.2, respectively. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the artificial neural network was 0.754 (0.004), 0.3770 and 14757.6, respectively. Conclusions: Based on these three criteria, artificial neural network would give better performance than logistic regression. Although, the difference is statistically significant, it does not seem to be clinically significant. PMID:23113198
Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H
2012-01-01
The purpose of this investigation was to compare empirically predictive ability of an artificial neural network with a logistic regression in prediction of low back pain. Data from the second national health survey were considered in this investigation. This data includes the information of low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a set of 17294 data and they were validated in a test set of 17295 data. Hosmer and Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 inputs, 3 hidden and 1 output neurons was employed. The efficiency of two models was compared by receiver operating characteristic analysis, root mean square and -2 Loglikelihood criteria. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the logistic regression was 0.752 (0.004), 0.3832 and 14769.2, respectively. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the artificial neural network was 0.754 (0.004), 0.3770 and 14757.6, respectively. Based on these three criteria, artificial neural network would give better performance than logistic regression. Although, the difference is statistically significant, it does not seem to be clinically significant.
Ma, Jing; Yu, Jiong; Hao, Guangshu; Wang, Dan; Sun, Yanni; Lu, Jianxin; Cao, Hongcui; Lin, Feiyan
2017-02-20
The prevalence of high hyperlipemia is increasing around the world. Our aims are to analyze the relationship of triglyceride (TG) and cholesterol (TC) with indexes of liver function and kidney function, and to develop a prediction model of TG, TC in overweight people. A total of 302 adult healthy subjects and 273 overweight subjects were enrolled in this study. The levels of fasting indexes of TG (fs-TG), TC (fs-TC), blood glucose, liver function, and kidney function were measured and analyzed by correlation analysis and multiple linear regression (MRL). The back propagation artificial neural network (BP-ANN) was applied to develop prediction models of fs-TG and fs-TC. The results showed there was significant difference in biochemical indexes between healthy people and overweight people. The correlation analysis showed fs-TG was related to weight, height, blood glucose, and indexes of liver and kidney function; while fs-TC was correlated with age, indexes of liver function (P < 0.01). The MRL analysis indicated regression equations of fs-TG and fs-TC both had statistic significant (P < 0.01) when included independent indexes. The BP-ANN model of fs-TG reached training goal at 59 epoch, while fs-TC model achieved high prediction accuracy after training 1000 epoch. In conclusions, there was high relationship of fs-TG and fs-TC with weight, height, age, blood glucose, indexes of liver function and kidney function. Based on related variables, the indexes of fs-TG and fs-TC can be predicted by BP-ANN models in overweight people.
Lien, Tonje G; Borgan, Ørnulf; Reppe, Sjur; Gautvik, Kaare; Glad, Ingrid Kristine
2018-03-07
Using high-dimensional penalized regression we studied genome-wide DNA-methylation in bone biopsies of 80 postmenopausal women in relation to their bone mineral density (BMD). The women showed BMD varying from severely osteoporotic to normal. Global gene expression data from the same individuals was available, and since DNA-methylation often affects gene expression, the overall aim of this paper was to include both of these omics data sets into an integrated analysis. The classical penalized regression uses one penalty, but we incorporated individual penalties for each of the DNA-methylation sites. These individual penalties were guided by the strength of association between DNA-methylations and gene transcript levels. DNA-methylations that were highly associated to one or more transcripts got lower penalties and were therefore favored compared to DNA-methylations showing less association to expression. Because of the complex pathways and interactions among genes, we investigated both the association between DNA-methylations and their corresponding cis gene, as well as the association between DNA-methylations and trans-located genes. Two integrating penalized methods were used: first, an adaptive group-regularized ridge regression, and secondly, variable selection was performed through a modified version of the weighted lasso. When information from gene expressions was integrated, predictive performance was considerably improved, in terms of predictive mean square error, compared to classical penalized regression without data integration. We found a 14.7% improvement in the ridge regression case and a 17% improvement for the lasso case. Our version of the weighted lasso with data integration found a list of 22 interesting methylation sites. Several corresponded to genes that are known to be important in bone formation. Using BMD as response and these 22 methylation sites as covariates, least square regression analyses resulted in R 2 =0.726, comparable to an average R 2 =0.438 for 10000 randomly selected groups of DNA-methylations with group size 22. Two recent types of penalized regression methods were adapted to integrate DNA-methylation and their association to gene expression in the analysis of bone mineral density. In both cases predictions clearly benefit from including the additional information on gene expressions.
Psychosocial correlates of HIV protection motivation among black adolescents in Venda, South Africa.
Boer, Henk; Mashamba, M Tshilidzi
2005-12-01
We assessed the usefulness of the theory of planned behavior (TPB) and protection motivation theory (PMT) to predict intended condom use among 201 adolescents from Venda, South Africa. Results indicated that both the TPB and the PMT could significantly predict intended condom use, although the level of explained variance was limited. Hierarchical regression analysis indicated that there was considerable overlap between the TPB and the PMT in predicting condom use intention. In the regression analysis that used both the TPB and the PMT variables subjective norms and response efficacy were positively related to intended condom use. The results indicated that both the TPB and the PMT were valuable in explaining intended condom use among African adolescents. The TPB made clear that the social environment is an important contextual factor, whereas the PMT made clear that response efficacy is positively related to condom use intention. The results of this study indicated that social cognition models have some value in the analysis of condom use intention of African adolescents, but the role of other factors like myths about condoms should be further examined.
NASA Technical Reports Server (NTRS)
Ulbrich, N.; Volden, T.
2018-01-01
Analysis and use of temperature-dependent wind tunnel strain-gage balance calibration data are discussed in the paper. First, three different methods are presented and compared that may be used to process temperature-dependent strain-gage balance data. The first method uses an extended set of independent variables in order to process the data and predict balance loads. The second method applies an extended load iteration equation during the analysis of balance calibration data. The third method uses temperature-dependent sensitivities for the data analysis. Physical interpretations of the most important temperature-dependent regression model terms are provided that relate temperature compensation imperfections and the temperature-dependent nature of the gage factor to sets of regression model terms. Finally, balance calibration recommendations are listed so that temperature-dependent calibration data can be obtained and successfully processed using the reviewed analysis methods.
Advanced Statistics for Exotic Animal Practitioners.
Hodsoll, John; Hellier, Jennifer M; Ryan, Elizabeth G
2017-09-01
Correlation and regression assess the association between 2 or more variables. This article reviews the core knowledge needed to understand these analyses, moving from visual analysis in scatter plots through correlation, simple and multiple linear regression, and logistic regression. Correlation estimates the strength and direction of a relationship between 2 variables. Regression can be considered more general and quantifies the numerical relationships between an outcome and 1 or multiple variables in terms of a best-fit line, allowing predictions to be made. Each technique is discussed with examples and the statistical assumptions underlying their correct application. Copyright © 2017 Elsevier Inc. All rights reserved.
Robust neural network with applications to credit portfolio data analysis.
Feng, Yijia; Li, Runze; Sudjianto, Agus; Zhang, Yiyun
2010-01-01
In this article, we study nonparametric conditional quantile estimation via neural network structure. We proposed an estimation method that combines quantile regression and neural network (robust neural network, RNN). It provides good smoothing performance in the presence of outliers and can be used to construct prediction bands. A Majorization-Minimization (MM) algorithm was developed for optimization. Monte Carlo simulation study is conducted to assess the performance of RNN. Comparison with other nonparametric regression methods (e.g., local linear regression and regression splines) in real data application demonstrate the advantage of the newly proposed procedure.
Bahadir Kilavuzoglu, Ayse Ebru; Bozkurt, Tahir Kansu; Cosar, Cemile Banu; Sener, Asım Bozkurt
2017-06-24
To describe a sample predictive model for intraocular pressure (IOP) following laser in situ keratomileusis (LASIK) for myopia and an IOP constant. The records of patients that underwent LASIK for myopia and myopic astigmatism via WaveLight Allegretto Wave Eye-Q 400 Hz excimer laser and Hansatome XP microkeratome were retrospectively reviewed. Patients with no systemic or ocular disease other than myopia or myopic astigmatism were included in the study. Preoperative and postoperative month 1 data and intraoperative data were used to build the predictive model for IOP after LASIK. The IOP constant was calculated by subtracting the predicted IOP from preoperative IOP. The paired samples t test, Pearson's correlation analysis, curve estimation analysis, and linear regression analysis were used to evaluate the study data. The study included 425 eyes in 214 patients with a mean age of 32 ± 7.8 years. Mean spherical equivalent of the attempted correction (SE-ac) was -3.7 ± 1.7 diopters. Mean post-LASIK decrease in IOP was 4.6 ± 2.3 mmHg. The difference between preoperative and postoperative IOP was statistically significant (P < 0.001). SE-ac, preoperative IOP, and central corneal thickness had highly significant effects on postoperative IOP, based on linear regression analysis (P < 0.001 and R 2 = 0.043, P < 0.001 and R 2 = 0.370, and P < 0.001 and R 2 = 0.132, respectively). Regression model was created (F = 127.733, P < 0.001), and the adjusted R 2 value was 0.548. Evaluation of IOP after LASIK is important in myopic patients. The present study described a practical formula for predicting the true IOP with the aid of an IOP constant value in myopic eyes following LASIK.
NASA Technical Reports Server (NTRS)
Mcgwire, K.; Friedl, M.; Estes, J. E.
1993-01-01
This article describes research related to sampling techniques for establishing linear relations between land surface parameters and remotely-sensed data. Predictive relations are estimated between percentage tree cover in a savanna environment and a normalized difference vegetation index (NDVI) derived from the Thematic Mapper sensor. Spatial autocorrelation in original measurements and regression residuals is examined using semi-variogram analysis at several spatial resolutions. Sampling schemes are then tested to examine the effects of autocorrelation on predictive linear models in cases of small sample sizes. Regression models between image and ground data are affected by the spatial resolution of analysis. Reducing the influence of spatial autocorrelation by enforcing minimum distances between samples may also improve empirical models which relate ground parameters to satellite data.
Dinç, Erdal; Ertekin, Zehra Ceren
2016-01-01
An application of parallel factor analysis (PARAFAC) and three-way partial least squares (3W-PLS1) regression models to ultra-performance liquid chromatography-photodiode array detection (UPLC-PDA) data with co-eluted peaks in the same wavelength and time regions was described for the multicomponent quantitation of hydrochlorothiazide (HCT) and olmesartan medoxomil (OLM) in tablets. Three-way dataset of HCT and OLM in their binary mixtures containing telmisartan (IS) as an internal standard was recorded with a UPLC-PDA instrument. Firstly, the PARAFAC algorithm was applied for the decomposition of three-way UPLC-PDA data into the chromatographic, spectral and concentration profiles to quantify the concerned compounds. Secondly, 3W-PLS1 approach was subjected to the decomposition of a tensor consisting of three-way UPLC-PDA data into a set of triads to build 3W-PLS1 regression for the analysis of the same compounds in samples. For the proposed three-way analysis methods in the regression and prediction steps, the applicability and validity of PARAFAC and 3W-PLS1 models were checked by analyzing the synthetic mixture samples, inter-day and intra-day samples, and standard addition samples containing HCT and OLM. Two different three-way analysis methods, PARAFAC and 3W-PLS1, were successfully applied to the quantitative estimation of the solid dosage form containing HCT and OLM. Regression and prediction results provided from three-way analysis were compared with those obtained by traditional UPLC method. Copyright © 2015 Elsevier B.V. All rights reserved.
Predicting Higher Education Outcomes and Implications for a Postsecondary Institution Ratings System
ERIC Educational Resources Information Center
Walker, Eddie G., II
2016-01-01
The accountability of colleges and universities is a high priority for those making policy decisions. The purpose of this study was to determine institutional characteristics predicting retention rates, graduation rates and transfer-out rates using publicly available data from the US Department of Education. Using regression analysis, it was…
ERIC Educational Resources Information Center
Bozpolat, Ebru
2017-01-01
The purpose of this study is to determine whether Cumhuriyet University Faculty of Education students' levels of speaking anxiety are predicted by the variables of gender, department, grade, such sub-dimensions of "Speaking Self-Efficacy Scale for Pre-Service Teachers" as "public speaking," "effective speaking,"…
Modeling critical habitat for Flammulated Owls (Otus flammeolus)
David A. Christie; Astrid M. van Woudenberg
1997-01-01
Multiple logistic regression analysis was used to produce a prediction model for Flammulated Owl (Otus flammeolus) breeding habitat within the Kamloops Forest Region in south-central British Columbia. Using the model equation, a pilot habitat prediction map was created within a Geographic Information System (GIS) environment that had a 75.7 percent...
Adolescent Suicide Attempters: What Predicts Future Suicidal Acts?
ERIC Educational Resources Information Center
Groholt, Berit; Ekeberg, Oivind; Haldorsen, Tor
2006-01-01
Predictors for repetition of suicide attempts were evaluated among 92 adolescent suicide attempters 9 years after an index suicide attempt (90% females). Five were dead, two by suicide. Thirty-one (42%) of 73 had repeated a suicide attempt. In multiple Cox regression analysis, four factors had an independent predictive effect: comorbid disorders,…
The Utility of the MAPI in Predicting Urban Middle School Competence.
ERIC Educational Resources Information Center
Paulus, John A.; Perosa, Linda M.
A sample of 107 eighth graders from a large urban middle school in the Midwest was administered the Millon Adolescent Personality Inventory (MAPI) to determine its utility in predicting grades earned, attendance, and social competence. The results of hierarchical multiple regression analysis indicated that the MAPI coping patterns significantly…
ERIC Educational Resources Information Center
Montoya, Isaac D.
2008-01-01
Three classification techniques (Chi-square Automatic Interaction Detection [CHAID], Classification and Regression Tree [CART], and discriminant analysis) were tested to determine their accuracy in predicting Temporary Assistance for Needy Families program recipients' future employment. Technique evaluation was based on proportion of correctly…
ERIC Educational Resources Information Center
Hicks, Catherine
2018-01-01
Purpose: This paper aims to explore predicting employee learning activity via employee characteristics and usage for two online learning tools. Design/methodology/approach: Statistical analysis focused on observational data collected from user logs. Data are analyzed via regression models. Findings: Findings are presented for over 40,000…
ERIC Educational Resources Information Center
Saltonstall, Margot
2013-01-01
This study seeks to advance and expand research on college student success. Using multinomial logistic regression analysis, the study investigates the contribution of psychosocial variables above and beyond traditional achievement and demographic measures to predicting first-semester college grade point average (GPA). It also investigates if…
Graphical methods for the sensitivity analysis in discriminant analysis
Kim, Youngil; Anderson-Cook, Christine M.; Dae-Heung, Jang
2015-09-30
Similar to regression, many measures to detect influential data points in discriminant analysis have been developed. Many follow similar principles as the diagnostic measures used in linear regression in the context of discriminant analysis. Here we focus on the impact on the predicted classification posterior probability when a data point is omitted. The new method is intuitive and easily interpretative compared to existing methods. We also propose a graphical display to show the individual movement of the posterior probability of other data points when a specific data point is omitted. This enables the summaries to capture the overall pattern ofmore » the change.« less
A New Hybrid-Multiscale SSA Prediction of Non-Stationary Time Series
NASA Astrophysics Data System (ADS)
Ghanbarzadeh, Mitra; Aminghafari, Mina
2016-02-01
Singular spectral analysis (SSA) is a non-parametric method used in the prediction of non-stationary time series. It has two parameters, which are difficult to determine and very sensitive to their values. Since, SSA is a deterministic-based method, it does not give good results when the time series is contaminated with a high noise level and correlated noise. Therefore, we introduce a novel method to handle these problems. It is based on the prediction of non-decimated wavelet (NDW) signals by SSA and then, prediction of residuals by wavelet regression. The advantages of our method are the automatic determination of parameters and taking account of the stochastic structure of time series. As shown through the simulated and real data, we obtain better results than SSA, a non-parametric wavelet regression method and Holt-Winters method.
Ling, Ru; Liu, Jiawang
2011-12-01
To construct prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan with stratified random sampling according to uniform questionnaires,and multiple linear regression analysis with 20 quotas selected by literature view was done. Independent variables in the multiple linear regression model on medical personnels in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the the population of aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory and fitting, and may be used for short- and mid-term forecasting.
2018-01-01
Background Many studies have tried to develop predictors for return-to-work (RTW). However, since complex factors have been demonstrated to predict RTW, it is difficult to use them practically. This study investigated whether factors used in previous studies could predict whether an individual had returned to his/her original work by four years after termination of the worker's recovery period. Methods An initial logistic regression analysis of 1,567 participants of the fourth Panel Study of Worker's Compensation Insurance yielded odds ratios. The participants were divided into two subsets, a training dataset and a test dataset. Using the training dataset, logistic regression, decision tree, random forest, and support vector machine models were established, and important variables of each model were identified. The predictive abilities of the different models were compared. Results The analysis showed that only earned income and company-related factors significantly affected return-to-original-work (RTOW). The random forest model showed the best accuracy among the tested machine learning models; however, the difference was not prominent. Conclusion It is possible to predict a worker's probability of RTOW using machine learning techniques with moderate accuracy. PMID:29736160
Doyle, Frank; McGee, Hannah; Delaney, Mary; Motterlini, Nicola; Conroy, Ronán
2011-01-01
Depression is prevalent in patients hospitalized with acute coronary syndrome (ACS). We determined whether theoretical vulnerabilities for depression (interpersonal life events, reinforcing events, cognitive distortions, Type D personality) predicted depression, or depression trajectories, post-hospitalization. We followed 375 ACS patients who completed depression scales during hospital admission and at least once during three follow-up intervals over 1 year (949 observations). Questionnaires assessing vulnerabilities were completed at baseline. Logistic regression for panel/longitudinal data predicted depression status during follow-up. Latent class analysis determined depression trajectories. Multinomial logistic regression modeled the relationship between vulnerabilities and trajectories. Vulnerabilities predicted depression status over time in univariate and multivariate analysis, even when controlling for baseline depression. Proportions in each depression trajectory category were as follows: persistent (15%), subthreshold (37%), never depressed (48%). Vulnerabilities independently predicted each of these trajectories, with effect sizes significantly highest for the persistent depression group. Self-reported vulnerabilities - stressful life events, reduced reinforcing events, cognitive distortions, personality - measured during hospitalization can identify those at risk for depression post-ACS and especially those with persistent depressive episodes. Interventions should focus on these vulnerabilities. Copyright © 2011 Elsevier Inc. All rights reserved.
Singal, Amit G.; Mukherjee, Ashin; Elmunzer, B. Joseph; Higgins, Peter DR; Lok, Anna S.; Zhu, Ji; Marrero, Jorge A; Waljee, Akbar K
2015-01-01
Background Predictive models for hepatocellular carcinoma (HCC) have been limited by modest accuracy and lack of validation. Machine learning algorithms offer a novel methodology, which may improve HCC risk prognostication among patients with cirrhosis. Our study's aim was to develop and compare predictive models for HCC development among cirrhotic patients, using conventional regression analysis and machine learning algorithms. Methods We enrolled 442 patients with Child A or B cirrhosis at the University of Michigan between January 2004 and September 2006 (UM cohort) and prospectively followed them until HCC development, liver transplantation, death, or study termination. Regression analysis and machine learning algorithms were used to construct predictive models for HCC development, which were tested on an independent validation cohort from the Hepatitis C Antiviral Long-term Treatment against Cirrhosis (HALT-C) Trial. Both models were also compared to the previously published HALT-C model. Discrimination was assessed using receiver operating characteristic curve analysis and diagnostic accuracy was assessed with net reclassification improvement and integrated discrimination improvement statistics. Results After a median follow-up of 3.5 years, 41 patients developed HCC. The UM regression model had a c-statistic of 0.61 (95%CI 0.56-0.67), whereas the machine learning algorithm had a c-statistic of 0.64 (95%CI 0.60–0.69) in the validation cohort. The machine learning algorithm had significantly better diagnostic accuracy as assessed by net reclassification improvement (p<0.001) and integrated discrimination improvement (p=0.04). The HALT-C model had a c-statistic of 0.60 (95%CI 0.50-0.70) in the validation cohort and was outperformed by the machine learning algorithm (p=0.047). Conclusion Machine learning algorithms improve the accuracy of risk stratifying patients with cirrhosis and can be used to accurately identify patients at high-risk for developing HCC. PMID:24169273
Ignjatović, Aleksandra; Stojanović, Miodrag; Milošević, Zoran; Anđelković Apostolović, Marija
2017-12-02
The interest in developing risk models in medicine not only is appealing, but also associated with many obstacles in different aspects of predictive model development. Initially, the association of biomarkers or the association of more markers with the specific outcome was proven by statistical significance, but novel and demanding questions required the development of new and more complex statistical techniques. Progress of statistical analysis in biomedical research can be observed the best through the history of the Framingham study and development of the Framingham score. Evaluation of predictive models comes from a combination of the facts which are results of several metrics. Using logistic regression and Cox proportional hazards regression analysis, the calibration test, and the ROC curve analysis should be mandatory and eliminatory, and the central place should be taken by some new statistical techniques. In order to obtain complete information related to the new marker in the model, recently, there is a recommendation to use the reclassification tables by calculating the net reclassification index and the integrated discrimination improvement. Decision curve analysis is a novel method for evaluating the clinical usefulness of a predictive model. It may be noted that customizing and fine-tuning of the Framingham risk score initiated the development of statistical analysis. Clinically applicable predictive model should be a trade-off between all abovementioned statistical metrics, a trade-off between calibration and discrimination, accuracy and decision-making, costs and benefits, and quality and quantity of patient's life.
Quality of life in breast cancer patients--a quantile regression analysis.
Pourhoseingholi, Mohamad Amin; Safaee, Azadeh; Moghimi-Dehkordi, Bijan; Zeighami, Bahram; Faghihzadeh, Soghrat; Tabatabaee, Hamid Reza; Pourhoseingholi, Asma
2008-01-01
Quality of life study has an important role in health care especially in chronic diseases, in clinical judgment and in medical resources supplying. Statistical tools like linear regression are widely used to assess the predictors of quality of life. But when the response is not normal the results are misleading. The aim of this study is to determine the predictors of quality of life in breast cancer patients, using quantile regression model and compare to linear regression. A cross-sectional study conducted on 119 breast cancer patients that admitted and treated in chemotherapy ward of Namazi hospital in Shiraz. We used QLQ-C30 questionnaire to assessment quality of life in these patients. A quantile regression was employed to assess the assocciated factors and the results were compared to linear regression. All analysis carried out using SAS. The mean score for the global health status for breast cancer patients was 64.92+/-11.42. Linear regression showed that only grade of tumor, occupational status, menopausal status, financial difficulties and dyspnea were statistically significant. In spite of linear regression, financial difficulties were not significant in quantile regression analysis and dyspnea was only significant for first quartile. Also emotion functioning and duration of disease statistically predicted the QOL score in the third quartile. The results have demonstrated that using quantile regression leads to better interpretation and richer inference about predictors of the breast cancer patient quality of life.
A regression-kriging model for estimation of rainfall in the Laohahe basin
NASA Astrophysics Data System (ADS)
Wang, Hong; Ren, Li L.; Liu, Gao H.
2009-10-01
This paper presents a multivariate geostatistical algorithm called regression-kriging (RK) for predicting the spatial distribution of rainfall by incorporating five topographic/geographic factors of latitude, longitude, altitude, slope and aspect. The technique is illustrated using rainfall data collected at 52 rain gauges from the Laohahe basis in northeast China during 1986-2005 . Rainfall data from 44 stations were selected for modeling and the remaining 8 stations were used for model validation. To eliminate multicollinearity, the five explanatory factors were first transformed using factor analysis with three Principal Components (PCs) extracted. The rainfall data were then fitted using step-wise regression and residuals interpolated using SK. The regression coefficients were estimated by generalized least squares (GLS), which takes the spatial heteroskedasticity between rainfall and PCs into account. Finally, the rainfall prediction based on RK was compared with that predicted from ordinary kriging (OK) and ordinary least squares (OLS) multiple regression (MR). For correlated topographic factors are taken into account, RK improves the efficiency of predictions. RK achieved a lower relative root mean square error (RMSE) (44.67%) than MR (49.23%) and OK (73.60%) and a lower bias than MR and OK (23.82 versus 30.89 and 32.15 mm) for annual rainfall. It is much more effective for the wet season than for the dry season. RK is suitable for estimation of rainfall in areas where there are no stations nearby and where topography has a major influence on rainfall.
2013-01-01
Background This study aims to improve accuracy of Bioelectrical Impedance Analysis (BIA) prediction equations for estimating fat free mass (FFM) of the elderly by using non-linear Back Propagation Artificial Neural Network (BP-ANN) model and to compare the predictive accuracy with the linear regression model by using energy dual X-ray absorptiometry (DXA) as reference method. Methods A total of 88 Taiwanese elderly adults were recruited in this study as subjects. Linear regression equations and BP-ANN prediction equation were developed using impedances and other anthropometrics for predicting the reference FFM measured by DXA (FFMDXA) in 36 male and 26 female Taiwanese elderly adults. The FFM estimated by BIA prediction equations using traditional linear regression model (FFMLR) and BP-ANN model (FFMANN) were compared to the FFMDXA. The measuring results of an additional 26 elderly adults were used to validate than accuracy of the predictive models. Results The results showed the significant predictors were impedance, gender, age, height and weight in developed FFMLR linear model (LR) for predicting FFM (coefficient of determination, r2 = 0.940; standard error of estimate (SEE) = 2.729 kg; root mean square error (RMSE) = 2.571kg, P < 0.001). The above predictors were set as the variables of the input layer by using five neurons in the BP-ANN model (r2 = 0.987 with a SD = 1.192 kg and relatively lower RMSE = 1.183 kg), which had greater (improved) accuracy for estimating FFM when compared with linear model. The results showed a better agreement existed between FFMANN and FFMDXA than that between FFMLR and FFMDXA. Conclusion When compared the performance of developed prediction equations for estimating reference FFMDXA, the linear model has lower r2 with a larger SD in predictive results than that of BP-ANN model, which indicated ANN model is more suitable for estimating FFM. PMID:23388042
[Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].
Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao
2016-03-01
Leaf area index (LAI) is the dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively. It can be provide a reference for monitoring the tree growing and yield estimation. The Red Fuji apple trees of full bearing fruit are the researching objects. Ninety apple trees canopies spectral reflectance and LAI values were measured by the ASD Fieldspec3 spectrometer and LAI-2200 in thirty orchards in constant two years in Qixia research area of Shandong Province. The optimal vegetation indices were selected by the method of correlation analysis of the original spectral reflectance and vegetation indices. The models of predicting the LAI were built with the multivariate regression analysis method of support vector machine (SVM) and random forest (RF). The new vegetation indices, GNDVI527, ND-VI676, RVI682, FD-NVI656 and GRVI517 and the previous two main vegetation indices, NDVI670 and NDVI705, are in accordance with LAI. In the RF regression model, the calibration set decision coefficient C-R2 of 0.920 and validation set decision coefficient V-R2 of 0.889 are higher than the SVM regression model by 0.045 and 0.033 respectively. The root mean square error of calibration set C-RMSE of 0.249, the root mean square error validation set V-RMSE of 0.236 are lower than that of the SVM regression model by 0.054 and 0.058 respectively. Relative analysis of calibrating error C-RPD and relative analysis of validation set V-RPD reached 3.363 and 2.520, 0.598 and 0.262, respectively, which were higher than the SVM regression model. The measured and predicted the scatterplot trend line slope of the calibration set and validation set C-S and V-S are close to 1. The estimation result of RF regression model is better than that of the SVM. RF regression model can be used to estimate the LAI of red Fuji apple trees in full fruit period.
Cui, Zaixu; Gong, Gaolang
2018-06-02
Individualized behavioral/cognitive prediction using machine learning (ML) regression approaches is becoming increasingly applied. The specific ML regression algorithm and sample size are two key factors that non-trivially influence prediction accuracies. However, the effects of the ML regression algorithm and sample size on individualized behavioral/cognitive prediction performance have not been comprehensively assessed. To address this issue, the present study included six commonly used ML regression algorithms: ordinary least squares (OLS) regression, least absolute shrinkage and selection operator (LASSO) regression, ridge regression, elastic-net regression, linear support vector regression (LSVR), and relevance vector regression (RVR), to perform specific behavioral/cognitive predictions based on different sample sizes. Specifically, the publicly available resting-state functional MRI (rs-fMRI) dataset from the Human Connectome Project (HCP) was used, and whole-brain resting-state functional connectivity (rsFC) or rsFC strength (rsFCS) were extracted as prediction features. Twenty-five sample sizes (ranged from 20 to 700) were studied by sub-sampling from the entire HCP cohort. The analyses showed that rsFC-based LASSO regression performed remarkably worse than the other algorithms, and rsFCS-based OLS regression performed markedly worse than the other algorithms. Regardless of the algorithm and feature type, both the prediction accuracy and its stability exponentially increased with increasing sample size. The specific patterns of the observed algorithm and sample size effects were well replicated in the prediction using re-testing fMRI data, data processed by different imaging preprocessing schemes, and different behavioral/cognitive scores, thus indicating excellent robustness/generalization of the effects. The current findings provide critical insight into how the selected ML regression algorithm and sample size influence individualized predictions of behavior/cognition and offer important guidance for choosing the ML regression algorithm or sample size in relevant investigations. Copyright © 2018 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hooman, A.; Mohammadzadeh, M
Some medical and epidemiological surveys have been designed to predict a nominal response variable with several levels. With regard to the type of pregnancy there are four possible states: wanted, unwanted by wife, unwanted by husband and unwanted by couple. In this paper, we have predicted the type of pregnancy, as well as the factors influencing it using three different models and comparing them. Regarding the type of pregnancy with several levels, we developed a multinomial logistic regression, a neural network and a flexible discrimination based on the data and compared their results using tow statistical indices: Surface under curvemore » (ROC) and kappa coefficient. Based on these tow indices, flexible discrimination proved to be a better fit for prediction on data in comparison to other methods. When the relations among variables are complex, one can use flexible discrimination instead of multinomial logistic regression and neural network to predict the nominal response variables with several levels in order to gain more accurate predictions.« less
Assessment of parametric uncertainty for groundwater reactive transport modeling,
Shi, Xiaoqing; Ye, Ming; Curtis, Gary P.; Miller, Geoffery L.; Meyer, Philip D.; Kohler, Matthias; Yabusaki, Steve; Wu, Jichun
2014-01-01
The validity of using Gaussian assumptions for model residuals in uncertainty quantification of a groundwater reactive transport model was evaluated in this study. Least squares regression methods explicitly assume Gaussian residuals, and the assumption leads to Gaussian likelihood functions, model parameters, and model predictions. While the Bayesian methods do not explicitly require the Gaussian assumption, Gaussian residuals are widely used. This paper shows that the residuals of the reactive transport model are non-Gaussian, heteroscedastic, and correlated in time; characterizing them requires using a generalized likelihood function such as the formal generalized likelihood function developed by Schoups and Vrugt (2010). For the surface complexation model considered in this study for simulating uranium reactive transport in groundwater, parametric uncertainty is quantified using the least squares regression methods and Bayesian methods with both Gaussian and formal generalized likelihood functions. While the least squares methods and Bayesian methods with Gaussian likelihood function produce similar Gaussian parameter distributions, the parameter distributions of Bayesian uncertainty quantification using the formal generalized likelihood function are non-Gaussian. In addition, predictive performance of formal generalized likelihood function is superior to that of least squares regression and Bayesian methods with Gaussian likelihood function. The Bayesian uncertainty quantification is conducted using the differential evolution adaptive metropolis (DREAM(zs)) algorithm; as a Markov chain Monte Carlo (MCMC) method, it is a robust tool for quantifying uncertainty in groundwater reactive transport models. For the surface complexation model, the regression-based local sensitivity analysis and Morris- and DREAM(ZS)-based global sensitivity analysis yield almost identical ranking of parameter importance. The uncertainty analysis may help select appropriate likelihood functions, improve model calibration, and reduce predictive uncertainty in other groundwater reactive transport and environmental modeling.
Adjustment of regional regression equations for urban storm-runoff quality using at-site data
Barks, C.S.
1996-01-01
Regional regression equations have been developed to estimate urban storm-runoff loads and mean concentrations using a national data base. Four statistical methods using at-site data to adjust the regional equation predictions were developed to provide better local estimates. The four adjustment procedures are a single-factor adjustment, a regression of the observed data against the predicted values, a regression of the observed values against the predicted values and additional local independent variables, and a weighted combination of a local regression with the regional prediction. Data collected at five representative storm-runoff sites during 22 storms in Little Rock, Arkansas, were used to verify, and, when appropriate, adjust the regional regression equation predictions. Comparison of observed values of stormrunoff loads and mean concentrations to the predicted values from the regional regression equations for nine constituents (chemical oxygen demand, suspended solids, total nitrogen as N, total ammonia plus organic nitrogen as N, total phosphorus as P, dissolved phosphorus as P, total recoverable copper, total recoverable lead, and total recoverable zinc) showed large prediction errors ranging from 63 percent to more than several thousand percent. Prediction errors for 6 of the 18 regional regression equations were less than 100 percent and could be considered reasonable for water-quality prediction equations. The regression adjustment procedure was used to adjust five of the regional equation predictions to improve the predictive accuracy. For seven of the regional equations the observed and the predicted values are not significantly correlated. Thus neither the unadjusted regional equations nor any of the adjustments were appropriate. The mean of the observed values was used as a simple estimator when the regional equation predictions and adjusted predictions were not appropriate.
NASA Astrophysics Data System (ADS)
Cai, Jun; Wang, Kuaishe; Shi, Jiamin; Wang, Wen; Liu, Yingying
2018-01-01
Constitutive analysis for hot working of BFe10-1-2 alloy was carried out by using experimental stress-strain data from isothermal hot compression tests, in a wide range of temperature of 1,023 1,273 K, and strain rate range of 0.001 10 s-1. A constitutive equation based on modified double multiple nonlinear regression was proposed considering the independent effects of strain, strain rate, temperature and their interrelation. The predicted flow stress data calculated from the developed equation was compared with the experimental data. Correlation coefficient (R), average absolute relative error (AARE) and relative errors were introduced to verify the validity of the developed constitutive equation. Subsequently, a comparative study was made on the capability of strain-compensated Arrhenius-type constitutive model. The results showed that the developed constitutive equation based on modified double multiple nonlinear regression could predict flow stress of BFe10-1-2 alloy with good correlation and generalization.
Successful Indicators Study (SIS) Methodology Report: Deviant Case Analysis Pilot.
ERIC Educational Resources Information Center
Bailey, Jerry; Hafner, Anne
A deviant case analysis pilot study analyzed California local education agency data to determine the usefulness of regression analysis in predicting change in achievement from 1984 to 1989 and identified outliers or districts that show greater achievement changes than would be expected given changed demographic conditions. This report on the…
Nomogram Prediction of Overall Survival After Curative Irradiation for Uterine Cervical Cancer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seo, YoungSeok; Yoo, Seong Yul; Kim, Mi-Sook
Purpose: The purpose of this study was to develop a nomogram capable of predicting the probability of 5-year survival after radical radiotherapy (RT) without chemotherapy for uterine cervical cancer. Methods and Materials: We retrospectively analyzed 549 patients that underwent radical RT for uterine cervical cancer between March 1994 and April 2002 at our institution. Multivariate analysis using Cox proportional hazards regression was performed and this Cox model was used as the basis for the devised nomogram. The model was internally validated for discrimination and calibration by bootstrap resampling. Results: By multivariate regression analysis, the model showed that age, hemoglobin levelmore » before RT, Federation Internationale de Gynecologie Obstetrique (FIGO) stage, maximal tumor diameter, lymph node status, and RT dose at Point A significantly predicted overall survival. The survival prediction model demonstrated good calibration and discrimination. The bootstrap-corrected concordance index was 0.67. The predictive ability of the nomogram proved to be superior to FIGO stage (p = 0.01). Conclusions: The devised nomogram offers a significantly better level of discrimination than the FIGO staging system. In particular, it improves predictions of survival probability and could be useful for counseling patients, choosing treatment modalities and schedules, and designing clinical trials. However, before this nomogram is used clinically, it should be externally validated.« less
Bian, Xihui; Li, Shujuan; Lin, Ligang; Tan, Xiaoyao; Fan, Qingjie; Li, Ming
2016-06-21
Accurate prediction of the model is fundamental to the successful analysis of complex samples. To utilize abundant information embedded over frequency and time domains, a novel regression model is presented for quantitative analysis of hydrocarbon contents in the fuel oil samples. The proposed method named as high and low frequency unfolded PLSR (HLUPLSR), which integrates empirical mode decomposition (EMD) and unfolded strategy with partial least squares regression (PLSR). In the proposed method, the original signals are firstly decomposed into a finite number of intrinsic mode functions (IMFs) and a residue by EMD. Secondly, the former high frequency IMFs are summed as a high frequency matrix and the latter IMFs and residue are summed as a low frequency matrix. Finally, the two matrices are unfolded to an extended matrix in variable dimension, and then the PLSR model is built between the extended matrix and the target values. Coupled with Ultraviolet (UV) spectroscopy, HLUPLSR has been applied to determine hydrocarbon contents of light gas oil and diesel fuels samples. Comparing with single PLSR and other signal processing techniques, the proposed method shows superiority in prediction ability and better model interpretation. Therefore, HLUPLSR method provides a promising tool for quantitative analysis of complex samples. Copyright © 2016 Elsevier B.V. All rights reserved.
QSAR Analysis of 2-Amino or 2-Methyl-1-Substituted Benzimidazoles Against Pseudomonas aeruginosa
Podunavac-Kuzmanović, Sanja O.; Cvetković, Dragoljub D.; Barna, Dijana J.
2009-01-01
A set of benzimidazole derivatives were tested for their inhibitory activities against the Gram-negative bacterium Pseudomonas aeruginosa and minimum inhibitory concentrations were determined for all the compounds. Quantitative structure activity relationship (QSAR) analysis was applied to fourteen of the abovementioned derivatives using a combination of various physicochemical, steric, electronic, and structural molecular descriptors. A multiple linear regression (MLR) procedure was used to model the relationships between molecular descriptors and the antibacterial activity of the benzimidazole derivatives. The stepwise regression method was used to derive the most significant models as a calibration model for predicting the inhibitory activity of this class of molecules. The best QSAR models were further validated by a leave one out technique as well as by the calculation of statistical parameters for the established theoretical models. To confirm the predictive power of the models, an external set of molecules was used. High agreement between experimental and predicted inhibitory values, obtained in the validation procedure, indicated the good quality of the derived QSAR models. PMID:19468332
NASA Astrophysics Data System (ADS)
de Andrés, Javier; Landajo, Manuel; Lorca, Pedro; Labra, Jose; Ordóñez, Patricia
Artificial neural networks have proven to be useful tools for solving financial analysis problems such as financial distress prediction and audit risk assessment. In this paper we focus on the performance of robust (least absolute deviation-based) neural networks on measuring liquidity of firms. The problem of learning the bivariate relationship between the components (namely, current liabilities and current assets) of the so-called current ratio is analyzed, and the predictive performance of several modelling paradigms (namely, linear and log-linear regressions, classical ratios and neural networks) is compared. An empirical analysis is conducted on a representative data base from the Spanish economy. Results indicate that classical ratio models are largely inadequate as a realistic description of the studied relationship, especially when used for predictive purposes. In a number of cases, especially when the analyzed firms are microenterprises, the linear specification is improved by considering the flexible non-linear structures provided by neural networks.
Relationship of physical activity to fundamental movement skills among adolescents.
Okely, A D; Booth, M L; Patterson, J W
2001-11-01
To determine the relationship of participation in organized and nonorganized physical activity with fundamental movement skills among adolescents. Male and female children in Grade 8 (mean age, 13.3 yr) and Grade 10 (mean age, 15.3 yr) were assessed on six fundamental movement skills (run, vertical jump, catch, overhand throw, forehand strike, and kick). Physical activity was assessed using a self-report recall measure where students reported the type, duration, and frequency of participation in organized physical activity and nonorganized physical activity during a usual week. Multiple regression analysis indicated that fundamental movement skills significantly predicted time in organized physical activity, although the percentage of variance it could explain was small. This prediction was stronger for girls than for boys. Multiple regression analysis showed no relationship between time in nonorganized physical activity and fundamental movement skills. Fundamental movement skills are significantly associated with adolescents' participation in organized physical activity, but predict only a small portion of it.
Applicability of the Tanaka-Johnston and Moyers mixed dentition analyses in Northeast Han Chinese.
Sherpa, Jangbu; Sah, Gopal; Rong, Zeng; Wu, Lipeng
2015-06-01
To assess applicability of the Tanaka-Johnston and Moyers prediction methods in a Han ethnic group from Northeast China and to develop prediction equations for this same population. Cross-sectional study. Department of Orthodontics, School of Stomatology, Jiamusi University, Heilongjiang, China. A total of 130 subjects (65 male and 65 female) aged 16-21 years from a Han ethnic group of Northeast China were recruited from dental students and patients seeking orthodontic treatment. Ethnicity was verified by questionnaire. Mesio-distal tooth width was measured using Digital Vernier calipers. Predicted values were obtained from the Tanaka-Johnston and Moyers methods in both arches were compared with the actual measured widths. Based on regression analysis, prediction equations were developed. Tanaka-Johnston equations were not precise, except for the upper arch in males. However, the Moyers 85th percentile in the upper arch and 75th percentile in the lower arch predicted the sum precisely in males. For females, the Moyers 75th percentile predicted the sum precisely for the upper arch, but none of the Moyers percentiles predicted in the lower arch. Both the Tanaka-Johnston and Moyers method may not be applied universally without question. Hence, it may be safer to develop regression equations for specific populations. Validating studies must be conducted to confirm the precision of these newly developed regression equations.
Glass, Lisa M; Dickson, Rolland C; Anderson, Joseph C; Suriawinata, Arief A; Putra, Juan; Berk, Brian S; Toor, Arifa
2015-04-01
Given the rising epidemics of obesity and metabolic syndrome, nonalcoholic steatohepatitis (NASH) is now the most common cause of liver disease in the developed world. Effective treatment for NASH, either to reverse or prevent the progression of hepatic fibrosis, is currently lacking. To define the predictors associated with improved hepatic fibrosis in NASH patients undergoing serial liver biopsies at prolonged biopsy interval. This is a cohort study of 45 NASH patients undergoing serial liver biopsies for clinical monitoring in a tertiary care setting. Biopsies were scored using the NASH Clinical Research Network guidelines. Fibrosis regression was defined as improvement in fibrosis score ≥1 stage. Univariate analysis utilized Fisher's exact or Student's t test. Multivariate regression models determined independent predictors for regression of fibrosis. Forty-five NASH patients with biopsies collected at a mean interval of 4.6 years (±1.4) were included. The mean initial fibrosis stage was 1.96, two patients had cirrhosis and 12 patients (26.7 %) underwent bariatric surgery. There was a significantly higher rate of fibrosis regression among patients who lost ≥10 % total body weight (TBW) (63.2 vs. 9.1 %; p = 0.001) and who underwent bariatric surgery (47.4 vs. 4.5 %; p = 0.003). Factors such as age, gender, glucose intolerance, elevated ferritin, and A1AT heterozygosity did not influence fibrosis regression. On multivariate analysis, only weight loss of ≥10 % TBW predicted fibrosis regression [OR 8.14 (CI 1.08-61.17)]. Results indicate that regression of fibrosis in NASH is possible, even in advanced stages. Weight loss of ≥10 % TBW predicts fibrosis regression.
Mercedes Berterretche; Andrew T. Hudak; Warren B. Cohen; Thomas K. Maiersperger; Stith T. Gower; Jennifer Dungan
2005-01-01
This study compared aspatial and spatial methods of using remote sensing and field data to predict maximum growing season leaf area index (LAI) maps in a boreal forest in Manitoba, Canada. The methods tested were orthogonal regression analysis (reduced major axis, RMA) and two geostatistical techniques: kriging with an external drift (KED) and sequential Gaussian...
L.R. Iverson; A.M. Prasad; A. Liaw
2004-01-01
More and better machine learning tools are becoming available for landscape ecologists to aid in understanding species-environment relationships and to map probable species occurrence now and potentially into the future. To thal end, we evaluated three statistical models: Regression Tree Analybib (RTA), Bagging Trees (BT) and Random Forest (RF) for their utility in...
Equations for predicting biomass in 2- to 6-year-old Eucalyptus saligna in Hawaii
Craig D. Whitesell; Susan C. Miyasaka; Robert F. Strand; Thomas H. Schubert; Katharine E. McDuffie
1988-01-01
Eucalyptus saligna trees grown in short-rotation plantations on the island of Hawaii were measured, harvested, and weighed to provide data for developing regression equations using non-destructive stand measurements. Regression analysis of the data from 190 trees in the 2.0- to 3.5-year range and 96 trees in the 4- to 6-year range related stem-only...
ERIC Educational Resources Information Center
Kitsantas, Anastasia; Kitsantas, Panagiota; Kitsantas, Thomas
2012-01-01
The purpose of this exploratory study was to assess the relative importance of a number of variables in predicting students' interest in math and/or computer science. Classification and regression trees (CART) were employed in the analysis of survey data collected from 276 college students enrolled in two U.S. and Greek universities. The results…
Stapel, Sandra N; Looijaard, Wilhelmus G P M; Dekker, Ingeborg M; Girbes, Armand R J; Weijs, Peter J M; Oudemans-van Straaten, Heleen M
2018-05-11
A low bioelectrical impedance analysis (BIA)-derived phase angle (PA) predicts morbidity and mortality in different patient groups. An association between PA and long-term mortality in ICU patients has not been demonstrated before. The purpose of the present study was to determine whether PA on ICU admission independently predicts 90-day mortality. This prospective observational study was performed in a mixed university ICU. BIA was performed in 196 patients within 24 h of ICU admission. To test the independent association between PA and 90-day mortality, logistic regression analysis was performed using the APACHE IV predicted mortality as confounder. The optimal cutoff value of PA for mortality prediction was determined by ROC curve analysis. Using this cutoff value, patients were categorized into low or normal PA group and the association with 90-day mortality was tested again. The PA of survivors was higher than of the non-survivors (5.0° ± 1.3° vs. 4.1° ± 1.2°, p < 0.001). The area under the ROC curve of PA for 90-day mortality was 0.70 (CI 0.59-0.80). PA was associated with 90-day mortality (OR = 0.56, CI: 0.38-0.77, p = 0.001) on univariate logistic regression analysis and also after adjusting for BMI, gender, age, and APACHE IV on multivariable logistic regression (OR = 0.65, CI: 0.44-0.96, p = 0.031). A PA < 4.8° was an independent predictor of 90-day mortality (adjusted OR = 3.65, CI: 1.34-9.93, p = 0.011). Phase angle at ICU admission is an independent predictor of 90-day mortality. This biological marker can aid in long-term mortality risk assessment of critically ill patients.
Eken, Cenker; Bilge, Ugur; Kartal, Mutlu; Eray, Oktay
2009-06-03
Logistic regression is the most common statistical model for processing multivariate data in the medical literature. Artificial intelligence models like an artificial neural network (ANN) and genetic algorithm (GA) may also be useful to interpret medical data. The purpose of this study was to perform artificial intelligence models on a medical data sheet and compare to logistic regression. ANN, GA, and logistic regression analysis were carried out on a data sheet of a previously published article regarding patients presenting to an emergency department with flank pain suspicious for renal colic. The study population was composed of 227 patients: 176 patients had a diagnosis of urinary stone, while 51 ultimately had no calculus. The GA found two decision rules in predicting urinary stones. Rule 1 consisted of being male, pain not spreading to back, and no fever. In rule 2, pelvicaliceal dilatation on bedside ultrasonography replaced no fever. ANN, GA rule 1, GA rule 2, and logistic regression had a sensitivity of 94.9, 67.6, 56.8, and 95.5%, a specificity of 78.4, 76.47, 86.3, and 47.1%, a positive likelihood ratio of 4.4, 2.9, 4.1, and 1.8, and a negative likelihood ratio of 0.06, 0.42, 0.5, and 0.09, respectively. The area under the curve was found to be 0.867, 0.720, 0.715, and 0.713 for all applications, respectively. Data mining techniques such as ANN and GA can be used for predicting renal colic in emergency settings and to constitute clinical decision rules. They may be an alternative to conventional multivariate analysis applications used in biostatistics.
Prediction equations for maximal respiratory pressures of Brazilian adolescents.
Mendes, Raquel E F; Campos, Tania F; Macêdo, Thalita M F; Borja, Raíssa O; Parreira, Verônica F; Mendonça, Karla M P P
2013-01-01
The literature emphasizes the need for studies to provide reference values and equations able to predict respiratory muscle strength of Brazilian subjects at different ages and from different regions of Brazil. To develop prediction equations for maximal respiratory pressures (MRP) of Brazilian adolescents. In total, 182 healthy adolescents (98 boys and 84 girls) aged between 12 and 18 years, enrolled in public and private schools in the city of Natal-RN, were evaluated using an MVD300 digital manometer (Globalmed®) according to a standardized protocol. Statistical analysis was performed using SPSS Statistics 17.0 software, with a significance level of 5%. Data normality was verified using the Kolmogorov-Smirnov test, and descriptive analysis results were expressed as the mean and standard deviation. To verify the correlation between the MRP and the independent variables (age, weight, height and sex), the Pearson correlation test was used. To obtain the prediction equations, stepwise multiple linear regression was used. The variables height, weight and sex were correlated to MRP. However, weight and sex explained part of the variability of MRP, and the regression analysis in this study indicated that these variables contributed significantly in predicting maximal inspiratory pressure, and only sex contributed significantly to maximal expiratory pressure. This study provides reference values and two models of prediction equations for maximal inspiratory and expiratory pressures and sets the necessary normal lower limits for the assessment of the respiratory muscle strength of Brazilian adolescents.
Allyn, Jérôme; Allou, Nicolas; Augustin, Pascal; Philip, Ivan; Martinet, Olivier; Belghiti, Myriem; Provenchere, Sophie; Montravers, Philippe; Ferdynus, Cyril
2017-01-01
The benefits of cardiac surgery are sometimes difficult to predict and the decision to operate on a given individual is complex. Machine Learning and Decision Curve Analysis (DCA) are recent methods developed to create and evaluate prediction models. We conducted a retrospective cohort study using a prospective collected database from December 2005 to December 2012, from a cardiac surgical center at University Hospital. The different models of prediction of mortality in-hospital after elective cardiac surgery, including EuroSCORE II, a logistic regression model and a machine learning model, were compared by ROC and DCA. Of the 6,520 patients having elective cardiac surgery with cardiopulmonary bypass, 6.3% died. Mean age was 63.4 years old (standard deviation 14.4), and mean EuroSCORE II was 3.7 (4.8) %. The area under ROC curve (IC95%) for the machine learning model (0.795 (0.755-0.834)) was significantly higher than EuroSCORE II or the logistic regression model (respectively, 0.737 (0.691-0.783) and 0.742 (0.698-0.785), p < 0.0001). Decision Curve Analysis showed that the machine learning model, in this monocentric study, has a greater benefit whatever the probability threshold. According to ROC and DCA, machine learning model is more accurate in predicting mortality after elective cardiac surgery than EuroSCORE II. These results confirm the use of machine learning methods in the field of medical prediction.
NASA Astrophysics Data System (ADS)
Wang, Yunzhi; Qiu, Yuchen; Thai, Theresa; More, Kathleen; Ding, Kai; Liu, Hong; Zheng, Bin
2016-03-01
How to rationally identify epithelial ovarian cancer (EOC) patients who will benefit from bevacizumab or other antiangiogenic therapies is a critical issue in EOC treatments. The motivation of this study is to quantitatively measure adiposity features from CT images and investigate the feasibility of predicting potential benefit of EOC patients with or without receiving bevacizumab-based chemotherapy treatment using multivariate statistical models built based on quantitative adiposity image features. A dataset involving CT images from 59 advanced EOC patients were included. Among them, 32 patients received maintenance bevacizumab after primary chemotherapy and the remaining 27 patients did not. We developed a computer-aided detection (CAD) scheme to automatically segment subcutaneous fat areas (VFA) and visceral fat areas (SFA) and then extracted 7 adiposity-related quantitative features. Three multivariate data analysis models (linear regression, logistic regression and Cox proportional hazards regression) were performed respectively to investigate the potential association between the model-generated prediction results and the patients' progression-free survival (PFS) and overall survival (OS). The results show that using all 3 statistical models, a statistically significant association was detected between the model-generated results and both of the two clinical outcomes in the group of patients receiving maintenance bevacizumab (p<0.01), while there were no significant association for both PFS and OS in the group of patients without receiving maintenance bevacizumab. Therefore, this study demonstrated the feasibility of using quantitative adiposity-related CT image features based statistical prediction models to generate a new clinical marker and predict the clinical outcome of EOC patients receiving maintenance bevacizumab-based chemotherapy.
Simms, Laura E.; Engebretson, Mark J.; Pilipenko, Viacheslav; ...
2016-04-07
The daily maximum relativistic electron flux at geostationary orbit can be predicted well with a set of daily averaged predictor variables including previous day's flux, seed electron flux, solar wind velocity and number density, AE index, IMF Bz, Dst, and ULF and VLF wave power. As predictor variables are intercorrelated, we used multiple regression analyses to determine which are the most predictive of flux when other variables are controlled. Empirical models produced from regressions of flux on measured predictors from 1 day previous were reasonably effective at predicting novel observations. Adding previous flux to the parameter set improves the predictionmore » of the peak of the increases but delays its anticipation of an event. Previous day's solar wind number density and velocity, AE index, and ULF wave activity are the most significant explanatory variables; however, the AE index, measuring substorm processes, shows a negative correlation with flux when other parameters are controlled. This may be due to the triggering of electromagnetic ion cyclotron waves by substorms that cause electron precipitation. VLF waves show lower, but significant, influence. The combined effect of ULF and VLF waves shows a synergistic interaction, where each increases the influence of the other on flux enhancement. Correlations between observations and predictions for this 1 day lag model ranged from 0.71 to 0.89 (average: 0.78). Furthermore, a path analysis of correlations between predictors suggests that solar wind and IMF parameters affect flux through intermediate processes such as ring current ( Dst), AE, and wave activity.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simms, Laura E.; Engebretson, Mark J.; Pilipenko, Viacheslav
The daily maximum relativistic electron flux at geostationary orbit can be predicted well with a set of daily averaged predictor variables including previous day's flux, seed electron flux, solar wind velocity and number density, AE index, IMF Bz, Dst, and ULF and VLF wave power. As predictor variables are intercorrelated, we used multiple regression analyses to determine which are the most predictive of flux when other variables are controlled. Empirical models produced from regressions of flux on measured predictors from 1 day previous were reasonably effective at predicting novel observations. Adding previous flux to the parameter set improves the predictionmore » of the peak of the increases but delays its anticipation of an event. Previous day's solar wind number density and velocity, AE index, and ULF wave activity are the most significant explanatory variables; however, the AE index, measuring substorm processes, shows a negative correlation with flux when other parameters are controlled. This may be due to the triggering of electromagnetic ion cyclotron waves by substorms that cause electron precipitation. VLF waves show lower, but significant, influence. The combined effect of ULF and VLF waves shows a synergistic interaction, where each increases the influence of the other on flux enhancement. Correlations between observations and predictions for this 1 day lag model ranged from 0.71 to 0.89 (average: 0.78). Furthermore, a path analysis of correlations between predictors suggests that solar wind and IMF parameters affect flux through intermediate processes such as ring current ( Dst), AE, and wave activity.« less
Managing Salary Equity. AIR Forum 1981 Paper.
ERIC Educational Resources Information Center
Prather, James E.; Posey, Ellen I.
Technical considerations in the development of a salary equity model based upon regression analysis are reviewed, and a simplified salary prediction equation is examined. Application and communication of the results of the analysis within the existing operational context of a postsecondary institution are also addressed. The literature is…
Amit, N; Ibrahim, N; Aga Mohd Jaladin, R; Che Din, N
2017-10-01
This research examined the predicting roles of reasons for living and social support on depression, anxiety and stress in Malaysia. This research was carried out on a sample of 263 participants (age range 12-24 years old), from Klang Valley, Selangor. The survey package comprises demographic information, a measure of reasons for living, social support, depression, anxiety and stress. To analyse the data, correlation analysis and a series of linear multiple regression analysis were carried out. Findings showed that there were low negative relationships between all subdomains and the total score of reasons for living and depression. There were also low negative relationships between domain-specific of social support (family and friends) and total social support and depression. In terms of the family alliance, self-acceptance and total score of reasons for living, they were negatively associated with anxiety, whereas family social support was negatively associated with stress. The linear regression analysis showed that only future optimism and family social support found to be the significant predictors for depression. Family alliance and total reasons for living were significant in predicting anxiety, whereas family social support was significant in predicting stress. These findings have the potential to promote awareness related to depression, anxiety, and stress among youth in Malaysia.
Sumiyoshi, Chika; Uetsuki, Miki; Suga, Motomu; Kasai, Kiyoto; Sumiyoshi, Tomiki
2013-12-30
Short forms (SF) of the Wechsler Intelligence Scale have been developed to enhance its practicality. However, only a few studies have addressed the Wechsler Intelligence Scale Revised (WAIS-R) SFs based on data from patients with schizophrenia. The current study was conducted to develop the WAIS-R SFs for these patients based on the intelligence structure and predictability of the Full IQ (FIQ). Relations to demographic and clinical variables were also examined on selecting plausible subtests. The WAIS-R was administered to 90 Japanese patients with schizophrenia. Exploratory factor analysis (EFA) and multiple regression analysis were conducted to find potential subtests. EFA extracted two dominant factors corresponding to Verbal IQ and Performance IQ measures. Subtests with higher factor loadings on those factors were initially nominated. Regression analysis was carried out to reach the model containing all the nominated subtests. The optimality of the potential subtests included in that model was evaluated from the perspectives of the representativeness of intelligence structure, FIQ predictability, and the relation with demographic and clinical variables. Taken together, the dyad of Vocabulary and Block Design was considered to be the most optimal WAIS-R SF for patients with schizophrenia, reflecting both intelligence structure and FIQ predictability. © 2013 Elsevier Ireland Ltd. All rights reserved.
Machine learning of swimming data via wisdom of crowd and regression analysis.
Xie, Jiang; Xu, Junfu; Nie, Celine; Nie, Qing
2017-04-01
Every performance, in an officially sanctioned meet, by a registered USA swimmer is recorded into an online database with times dating back to 1980. For the first time, statistical analysis and machine learning methods are systematically applied to 4,022,631 swim records. In this study, we investigate performance features for all strokes as a function of age and gender. The variances in performance of males and females for different ages and strokes were studied, and the correlations of performances for different ages were estimated using the Pearson correlation. Regression analysis show the performance trends for both males and females at different ages and suggest critical ages for peak training. Moreover, we assess twelve popular machine learning methods to predict or classify swimmer performance. Each method exhibited different strengths or weaknesses in different cases, indicating no one method could predict well for all strokes. To address this problem, we propose a new method by combining multiple inference methods to derive Wisdom of Crowd Classifier (WoCC). Our simulation experiments demonstrate that the WoCC is a consistent method with better overall prediction accuracy. Our study reveals several new age-dependent trends in swimming and provides an accurate method for classifying and predicting swimming times.
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
Functional mixture regression.
Yao, Fang; Fu, Yuejiao; Lee, Thomas C M
2011-04-01
In functional linear models (FLMs), the relationship between the scalar response and the functional predictor process is often assumed to be identical for all subjects. Motivated by both practical and methodological considerations, we relax this assumption and propose a new class of functional regression models that allow the regression structure to vary for different groups of subjects. By projecting the predictor process onto its eigenspace, the new functional regression model is simplified to a framework that is similar to classical mixture regression models. This leads to the proposed approach named as functional mixture regression (FMR). The estimation of FMR can be readily carried out using existing software implemented for functional principal component analysis and mixture regression. The practical necessity and performance of FMR are illustrated through applications to a longevity analysis of female medflies and a human growth study. Theoretical investigations concerning the consistent estimation and prediction properties of FMR along with simulation experiments illustrating its empirical properties are presented in the supplementary material available at Biostatistics online. Corresponding results demonstrate that the proposed approach could potentially achieve substantial gains over traditional FLMs.
Stuart P. Cottrell; Alan R. Graefe
1995-01-01
This paper examines predictors of boater behavior in a specific behavior situation, namely the percentage of raw sewage discharged from recreational vessels in a sanitation pumpout facility on the Chesapeake Bay. Results of a multiple regression analysis show knowledge predicts behavior in specific issue situations. In addition, the more specific the...
ERIC Educational Resources Information Center
Pryor, Brandt W.
1990-01-01
To test the predictive utility of the theory of reasoned action, 110 oral surgeons completed a questionnaire regarding participation in continuing education. Multiple regression analysis showed that the theory accounted for over 41 percent of variance in intention to participate. Intention appeared controlled by attitude, determined by strength of…
ERIC Educational Resources Information Center
Greer, Wil
2013-01-01
This study identified the variables associated with data-driven instruction (DDI) that are perceived to best predict student achievement. Of the DDI variables discussed in the literature, 51 of them had a sufficient enough research base to warrant statistical analysis. Of them, 26 were statistically significant. Multiple regression and an…
ERIC Educational Resources Information Center
Ilhan, Tahsin
2012-01-01
This study examined the predictive power of sex roles and attachment styles on loneliness. A total of 188 undergraduate students (114 female, and 74 male) from Gazi University completed the Bem Sex Role Inventory, UCLA Loneliness Scale, and Relationship Scales Questionnaire. Hierarchic Multiple Regression analysis and t-test were used to test…
Accounting for Slipping and Other False Negatives in Logistic Models of Student Learning
ERIC Educational Resources Information Center
MacLellan, Christopher J.; Liu, Ran; Koedinger, Kenneth R.
2015-01-01
Additive Factors Model (AFM) and Performance Factors Analysis (PFA) are two popular models of student learning that employ logistic regression to estimate parameters and predict performance. This is in contrast to Bayesian Knowledge Tracing (BKT) which uses a Hidden Markov Model formalism. While all three models tend to make similar predictions,…
Hoffman, Haydn; Lee, Sunghoon I; Garst, Jordan H; Lu, Derek S; Li, Charles H; Nagasawa, Daniel T; Ghalehsari, Nima; Jahanforouz, Nima; Razaghy, Mehrdad; Espinal, Marie; Ghavamrezaii, Amir; Paak, Brian H; Wu, Irene; Sarrafzadeh, Majid; Lu, Daniel C
2015-09-01
This study introduces the use of multivariate linear regression (MLR) and support vector regression (SVR) models to predict postoperative outcomes in a cohort of patients who underwent surgery for cervical spondylotic myelopathy (CSM). Currently, predicting outcomes after surgery for CSM remains a challenge. We recruited patients who had a diagnosis of CSM and required decompressive surgery with or without fusion. Fine motor function was tested preoperatively and postoperatively with a handgrip-based tracking device that has been previously validated, yielding mean absolute accuracy (MAA) results for two tracking tasks (sinusoidal and step). All patients completed Oswestry disability index (ODI) and modified Japanese Orthopaedic Association questionnaires preoperatively and postoperatively. Preoperative data was utilized in MLR and SVR models to predict postoperative ODI. Predictions were compared to the actual ODI scores with the coefficient of determination (R(2)) and mean absolute difference (MAD). From this, 20 patients met the inclusion criteria and completed follow-up at least 3 months after surgery. With the MLR model, a combination of the preoperative ODI score, preoperative MAA (step function), and symptom duration yielded the best prediction of postoperative ODI (R(2)=0.452; MAD=0.0887; p=1.17 × 10(-3)). With the SVR model, a combination of preoperative ODI score, preoperative MAA (sinusoidal function), and symptom duration yielded the best prediction of postoperative ODI (R(2)=0.932; MAD=0.0283; p=5.73 × 10(-12)). The SVR model was more accurate than the MLR model. The SVR can be used preoperatively in risk/benefit analysis and the decision to operate. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Keat, Sim Chong; Chun, Beh Boon; San, Lim Hwee; Jafri, Mohd Zubir Mat
2015-04-01
Climate change due to carbon dioxide (CO2) emissions is one of the most complex challenges threatening our planet. This issue considered as a great and international concern that primary attributed from different fossil fuels. In this paper, regression model is used for analyzing the causal relationship among CO2 emissions based on the energy consumption in Malaysia using time series data for the period of 1980-2010. The equations were developed using regression model based on the eight major sources that contribute to the CO2 emissions such as non energy, Liquefied Petroleum Gas (LPG), diesel, kerosene, refinery gas, Aviation Turbine Fuel (ATF) and Aviation Gasoline (AV Gas), fuel oil and motor petrol. The related data partly used for predict the regression model (1980-2000) and partly used for validate the regression model (2001-2010). The results of the prediction model with the measured data showed a high correlation coefficient (R2=0.9544), indicating the model's accuracy and efficiency. These results are accurate and can be used in early warning of the population to comply with air quality standards.
Fayed, Nirmeen; Mourad, Wessam; Yassen, Khaled; Görlinger, Klaus
2015-03-01
The ability to predict transfusion requirements may improve perioperative bleeding management as an integral part of a patient blood management program. Therefore, the aim of our study was to evaluate preoperative thromboelastometry as a predictor of transfusion requirements for adult living donor liver transplant recipients. The correlation between preoperative thromboelastometry variables in 100 adult living donor liver transplant recipients and intraoperative blood transfusion requirements was examined by univariate and multivariate linear regression analysis. Thresholds of thromboelastometric parameters for prediction of packed red blood cells (PRBCs), fresh frozen plasma (FFP), platelets, and cryoprecipitate transfusion requirements were determined with receiver operating characteristics analysis. The attending anesthetists were blinded to the preoperative thromboelastometric analysis. However, a thromboelastometry-guided transfusion algorithm with predefined trigger values was used intraoperatively. The transfusion triggers in this algorithm did not change during the study period. Univariate analysis confirmed significant correlations between PRBCs, FFP, platelets or cryoprecipitate transfusion requirements and most thromboelastometric variables. Backward stepwise logistic regression indicated that EXTEM coagulation time (CT), maximum clot firmness (MCF) and INTEM CT, clot formation time (CFT) and MCF are independent predictors for PRBC transfusion. EXTEM CT, CFT and FIBTEM MCF are independent predictors for FFP transfusion. Only EXTEM and INTEM MCF were independent predictors of platelet transfusion. EXTEM CFT and MCF, INTEM CT, CFT and MCF as well as FIBTEM MCF are independent predictors for cryoprecipitate transfusion. Thromboelastometry-based regression equation accounted for 63% of PRBC, 83% of FFP, 61% of cryoprecipitate, and 44% of platelet transfusion requirements. Preoperative thromboelastometric analysis is helpful to predict transfusion requirements in adult living donor liver transplant recipients. This may allow for better preparation and less cross-matching prior to surgery. The findings of our study need to be re-validated in a second prospective patient population.
Stewart, James A.; Kohnert, Aaron A.; Capolungo, Laurent; ...
2018-03-06
The complexity of radiation effects in a material’s microstructure makes developing predictive models a difficult task. In principle, a complete list of all possible reactions between defect species being considered can be used to elucidate damage evolution mechanisms and its associated impact on microstructure evolution. However, a central limitation is that many models use a limited and incomplete catalog of defect energetics and associated reactions. Even for a given model, estimating its input parameters remains a challenge, especially for complex material systems. Here, we present a computational analysis to identify the extent to which defect accumulation, energetics, and irradiation conditionsmore » can be determined via forward and reverse regression models constructed and trained from large data sets produced by cluster dynamics simulations. A global sensitivity analysis, via Sobol’ indices, concisely characterizes parameter sensitivity and demonstrates how this can be connected to variability in defect evolution. Based on this analysis and depending on the definition of what constitutes the input and output spaces, forward and reverse regression models are constructed and allow for the direct calculation of defect accumulation, defect energetics, and irradiation conditions. Here, this computational analysis, exercised on a simplified cluster dynamics model, demonstrates the ability to design predictive surrogate and reduced-order models, and provides guidelines for improving model predictions within the context of forward and reverse engineering of mathematical models for radiation effects in a materials’ microstructure.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stewart, James A.; Kohnert, Aaron A.; Capolungo, Laurent
The complexity of radiation effects in a material’s microstructure makes developing predictive models a difficult task. In principle, a complete list of all possible reactions between defect species being considered can be used to elucidate damage evolution mechanisms and its associated impact on microstructure evolution. However, a central limitation is that many models use a limited and incomplete catalog of defect energetics and associated reactions. Even for a given model, estimating its input parameters remains a challenge, especially for complex material systems. Here, we present a computational analysis to identify the extent to which defect accumulation, energetics, and irradiation conditionsmore » can be determined via forward and reverse regression models constructed and trained from large data sets produced by cluster dynamics simulations. A global sensitivity analysis, via Sobol’ indices, concisely characterizes parameter sensitivity and demonstrates how this can be connected to variability in defect evolution. Based on this analysis and depending on the definition of what constitutes the input and output spaces, forward and reverse regression models are constructed and allow for the direct calculation of defect accumulation, defect energetics, and irradiation conditions. Here, this computational analysis, exercised on a simplified cluster dynamics model, demonstrates the ability to design predictive surrogate and reduced-order models, and provides guidelines for improving model predictions within the context of forward and reverse engineering of mathematical models for radiation effects in a materials’ microstructure.« less
IL-8 predicts pediatric oncology patients with febrile neutropenia at low risk for bacteremia.
Cost, Carrye R; Stegner, Martha M; Leonard, David; Leavey, Patrick
2013-04-01
Despite a low bacteremia rate, pediatric oncology patients are frequently admitted for febrile neutropenia. A pediatric risk prediction model with high sensitivity to identify patients at low risk for bacteremia is not available. We performed a single-institution prospective cohort study of pediatric oncology patients with febrile neutropenia to create a risk prediction model using clinical factors, respiratory viral infection, and cytokine expression. Pediatric oncology patients with febrile neutropenia were enrolled between March 30, 2010 and April 1, 2011 and managed per institutional protocol. Blood samples for C-reactive protein and cytokine expression and nasopharyngeal swabs for respiratory viral testing were obtained. Medical records were reviewed for clinical data. Statistical analysis utilized mixed multiple logistic regression modeling. During the 12-month period, 195 febrile neutropenia episodes were enrolled. There were 24 (12%) episodes of bacteremia. Univariate analysis revealed several factors predictive for bacteremia, and interleukin (IL)-8 was the most predictive variable in the multivariate stepwise logistic regression. Low serum IL-8 predicted patients at low risk for bacteremia with a sensitivity of 0.9 and negative predictive value of 0.98. IL-8 is a highly sensitive predictor for patients at low risk for bacteremia. IL-8 should be utilized in a multi-institution prospective trial to assign risk stratification to pediatric patients admitted with febrile neutropenia.
Comparative decision models for anticipating shortage of food grain production in India
NASA Astrophysics Data System (ADS)
Chattopadhyay, Manojit; Mitra, Subrata Kumar
2018-01-01
This paper attempts to predict food shortages in advance from the analysis of rainfall during the monsoon months along with other inputs used for crop production, such as land used for cereal production, percentage of area covered under irrigation and fertiliser use. We used six binary classification data mining models viz., logistic regression, Multilayer Perceptron, kernel lab-Support Vector Machines, linear discriminant analysis, quadratic discriminant analysis and k-Nearest Neighbors Network, and found that linear discriminant analysis and kernel lab-Support Vector Machines are equally suitable for predicting per capita food shortage with 89.69 % accuracy in overall prediction and 92.06 % accuracy in predicting food shortage ( true negative rate). Advance information of food shortage can help policy makers to take remedial measures in order to prevent devastating consequences arising out of food non-availability.
Hirai, Toshinori; Itoh, Toshimasa; Kimura, Toshimi; Echizen, Hirotoshi
2018-06-06
Febuxostat is an active xanthine oxidase (XO) inhibitor that is widely used in the hyperuricemia treatment. We aimed to evaluate the predictive performance of a pharmacokinetic-pharmacodynamic (PK-PD) model for hypouricemic effects of febuxostat. Previously, we have formulated a PK--PD model for predicting hypouricemic effects of febuxostat as a function of baseline serum urate levels, body weight, renal function, and drug dose using datasets reported in preapproval studies (Hirai T et al., Biol Pharm Bull 2016; 39: 1013-21). Using an updated model with sensitivity analysis, we examined the predictive performance of the PK-PD model using datasets obtained from the medical records of patients who received febuxostat from March 2011 to December 2015 at Tokyo Women's Medical University Hospital. Multivariate regression analysis was performed to explore clinical variables to improve the predictive performance of the model. A total of 1,199 serum urate data were retrieved from 168 patients (age: 60.5 ±17.7 years, 71.4% males) who received febuxostat as hyperuricemia treatment. There was a significant correlation (r=0.68, p<0.01) between serum urate levels observed and those predicted by the modified PK-PD model. A multivariate regression analysis revealed that the predictive performance of the model may be improved further by considering comorbidities, such as diabetes mellitus, estimated glomerular filtration rate (eGFR), and co-administration of loop diuretics (r = 0.77, p<0.01). The PK-PD model may be useful for predicting individualized maintenance doses of febuxostat in real-world patients. This article is protected by copyright. All rights reserved.
Nakamura, Kengo; Yasutaka, Tetsuo; Kuwatani, Tatsu; Komai, Takeshi
2017-11-01
In this study, we applied sparse multiple linear regression (SMLR) analysis to clarify the relationships between soil properties and adsorption characteristics for a range of soils across Japan and identify easily-obtained physical and chemical soil properties that could be used to predict K and n values of cadmium, lead and fluorine. A model was first constructed that can easily predict the K and n values from nine soil parameters (pH, cation exchange capacity, specific surface area, total carbon, soil organic matter from loss on ignition and water holding capacity, the ratio of sand, silt and clay). The K and n values of cadmium, lead and fluorine of 17 soil samples were used to verify the SMLR models by the root mean square error values obtained from 512 combinations of soil parameters. The SMLR analysis indicated that fluorine adsorption to soil may be associated with organic matter, whereas cadmium or lead adsorption to soil is more likely to be influenced by soil pH, IL. We found that an accurate K value can be predicted from more than three soil parameters for most soils. Approximately 65% of the predicted values were between 33 and 300% of their measured values for the K value; 76% of the predicted values were within ±30% of their measured values for the n value. Our findings suggest that adsorption properties of lead, cadmium and fluorine to soil can be predicted from the soil physical and chemical properties using the presented models. Copyright © 2017 Elsevier Ltd. All rights reserved.
Linear and nonlinear models for predicting fish bioconcentration factors for pesticides.
Yuan, Jintao; Xie, Chun; Zhang, Ting; Sun, Jinfang; Yuan, Xuejie; Yu, Shuling; Zhang, Yingbiao; Cao, Yunyuan; Yu, Xingchen; Yang, Xuan; Yao, Wu
2016-08-01
This work is devoted to the applications of the multiple linear regression (MLR), multilayer perceptron neural network (MLP NN) and projection pursuit regression (PPR) to quantitative structure-property relationship analysis of bioconcentration factors (BCFs) of pesticides tested on Bluegill (Lepomis macrochirus). Molecular descriptors of a total of 107 pesticides were calculated with the DRAGON Software and selected by inverse enhanced replacement method. Based on the selected DRAGON descriptors, a linear model was built by MLR, nonlinear models were developed using MLP NN and PPR. The robustness of the obtained models was assessed by cross-validation and external validation using test set. Outliers were also examined and deleted to improve predictive power. Comparative results revealed that PPR achieved the most accurate predictions. This study offers useful models and information for BCF prediction, risk assessment, and pesticide formulation. Copyright © 2016 Elsevier Ltd. All rights reserved.
Assessment of traffic noise levels in urban areas using different soft computing techniques.
Tomić, J; Bogojević, N; Pljakić, M; Šumarac-Pavlović, D
2016-10-01
Available traffic noise prediction models are usually based on regression analysis of experimental data, and this paper presents the application of soft computing techniques in traffic noise prediction. Two mathematical models are proposed and their predictions are compared to data collected by traffic noise monitoring in urban areas, as well as to predictions of commonly used traffic noise models. The results show that application of evolutionary algorithms and neural networks may improve process of development, as well as accuracy of traffic noise prediction.
Rutten, I J G; Ubachs, J; Kruitwagen, R F P M; van Dijk, D P J; Beets-Tan, R G H; Massuger, L F A G; Olde Damink, S W M; Van Gorp, T
2017-04-01
Sarcopenia, severe skeletal muscle loss, has been identified as a prognostic factor in various malignancies. This study aims to investigate whether sarcopenia is associated with overall survival (OS) and surgical complications in patients with advanced ovarian cancer undergoing primary debulking surgery (PDS). Ovarian cancer patients (n = 216) treated with PDS were enrolled retrospectively. Total skeletal muscle surface area was measured on axial computed tomography at the level of the third lumbar vertebra. Optimum stratification was used to find the optimal skeletal muscle index cut-off to define sarcopenia (≤38.73 cm 2 /m 2 ). Cox-regression and Kaplan-Meier analysis were used to analyse the relationship between sarcopenia and OS. The effect of sarcopenia on the development of major surgical complications was studied with logistic regression. Kaplan-Meier analysis showed a significant survival disadvantage for patients with sarcopenia compared to patients without sarcopenia (p = 0.010). Sarcopenia univariably predicted OS (HR 1.536 (95% CI 1.105-2.134), p = 0.011) but was not significant in multivariable Cox-regression analysis (HR 1.362 (95% CI 0.968-1.916), p = 0.076). Significant predictors for OS in multivariable Cox-regression analysis were complete PDS, treatment in a specialised centre and the development of major complications. Sarcopenia was not predictive of major complications. Sarcopenia was not predictive of OS or major complications in ovarian cancer patients undergoing primary debulking surgery. However a strong trend towards a survival disadvantage for patients with sarcopenia was seen. Future prospective studies should focus on interventions to prevent or reverse sarcopenia and possibly increase ovarian cancer survival. Complete cytoreduction remains the strongest predictor of ovarian cancer survival. Copyright © 2017 Elsevier Ltd, BASO ~ The Association for Cancer Surgery, and the European Society of Surgical Oncology. All rights reserved.
Optical scatterometry of quarter-micron patterns using neural regression
NASA Astrophysics Data System (ADS)
Bischoff, Joerg; Bauer, Joachim J.; Haak, Ulrich; Hutschenreuther, Lutz; Truckenbrodt, Horst
1998-06-01
With shrinking dimensions and increasing chip areas, a rapid and non-destructive full wafer characterization after every patterning cycle is an inevitable necessity. In former publications it was shown that Optical Scatterometry (OS) has the potential to push the attainable feature limits of optical techniques from 0.8 . . . 0.5 microns for imaging methods down to 0.1 micron and below. Thus the demands of future metrology can be met. Basically being a nonimaging method, OS combines light scatter (or diffraction) measurements with modern data analysis schemes to solve the inverse scatter issue. For very fine patterns with lambda-to-pitch ratios grater than one, the specular reflected light versus the incidence angle is recorded. Usually, the data analysis comprises two steps -- a training cycle connected the a rigorous forward modeling and the prediction itself. Until now, two data analysis schemes are usually applied -- the multivariate regression based Partial Least Squares method (PLS) and a look-up-table technique which is also referred to as Minimum Mean Square Error approach (MMSE). Both methods are afflicted with serious drawbacks. On the one hand, the prediction accuracy of multivariate regression schemes degrades with larger parameter ranges due to the linearization properties of the method. On the other hand, look-up-table methods are rather time consuming during prediction thus prolonging the processing time and reducing the throughput. An alternate method is an Artificial Neural Network (ANN) based regression which combines the advantages of multivariate regression and MMSE. Due to the versatility of a neural network, not only can its structure be adapted more properly to the scatter problem, but also the nonlinearity of the neuronal transfer functions mimic the nonlinear behavior of optical diffraction processes more adequately. In spite of these pleasant properties, the prediction speed of ANN regression is comparable with that of the PLS-method. In this paper, the viability and performance of ANN-regression will be demonstrated with the example of sub-quarter-micron resist metrology. To this end, 0.25 micrometer line/space patterns have been printed in positive photoresist by means of DUV projection lithography. In order to evaluate the total metrology chain from light scatter measurement through data analysis, a thorough modeling has been performed. Assuming a trapezoidal shape of the developed resist profile, a training data set was generated by means of the Rigorous Coupled Wave Approach (RCWA). After training the model, a second data set was computed and deteriorated by Gaussian noise to imitate real measuring conditions. Then, these data have been fed into the models established before resulting in a Standard Error of Prediction (SEP) which corresponds to the measuring accuracy. Even with putting only little effort in the design of a back-propagation network, the ANN is clearly superior to the PLS-method. Depending on whether a network with one or two hidden layers was used, accuracy gains between 2 and 5 can be achieved compared with PLS regression. Furthermore, the ANN is less noise sensitive, for there is only a doubling of the SEP at 5% noise for ANN whereas for PLS the accuracy degrades rapidly with increasing noise. The accuracy gain also depends on the light polarization and on the measured parameters. Finally, these results have been proven experimentally, where the OS-results are in good accordance with the profiles obtained from cross- sectioning micrographs.
Wanvarie, Samkaew; Sathapatayavongs, Boonmee
2007-09-01
The aim of this paper was to assess factors that predict students' performance in the Medical Licensing Examination of Thailand (MLET) Step1 examination. The hypothesis was that demographic factors and academic records would predict the students' performance in the Step1 Licensing Examination. A logistic regression analysis of demographic factors (age, sex and residence) and academic records [high school grade point average (GPA), National University Entrance Examination Score and GPAs of the pre-clinical years] with the MLET Step1 outcome was accomplished using the data of 117 third-year Ramathibodi medical students. Twenty-three (19.7%) students failed the MLET Step1 examination. Stepwise logistic regression analysis showed that the significant predictors of MLET Step1 success/failure were residence background and GPAs of the second and third preclinical years. For students whose sophomore and third-year GPAs increased by an average of 1 point, the odds of passing the MLET Step1 examination increased by a factor of 16.3 and 12.8 respectively. The minimum GPAs for students from urban and rural backgrounds to pass the examination were estimated from the equation (2.35 vs 2.65 from 4.00 scale). Students from rural backgrounds and/or low-grade point averages in their second and third preclinical years of medical school are at risk of failing the MLET Step1 examination. They should be given intensive tutorials during the second and third pre-clinical years.
Staley, Dennis M.; Negri, Jacquelyn A.; Kean, Jason W.; Laber, Jayme L.; Tillery, Anne C.; Youberg, Ann M.
2016-06-30
Wildfire can significantly alter the hydrologic response of a watershed to the extent that even modest rainstorms can generate dangerous flash floods and debris flows. To reduce public exposure to hazard, the U.S. Geological Survey produces post-fire debris-flow hazard assessments for select fires in the western United States. We use publicly available geospatial data describing basin morphology, burn severity, soil properties, and rainfall characteristics to estimate the statistical likelihood that debris flows will occur in response to a storm of a given rainfall intensity. Using an empirical database and refined geospatial analysis methods, we defined new equations for the prediction of debris-flow likelihood using logistic regression methods. We showed that the new logistic regression model outperformed previous models used to predict debris-flow likelihood.
Tse, Samson; Davidson, Larry; Chung, Ka-Fai; Yu, Chong Ho; Ng, King Lam; Tsoi, Emily
2015-02-01
More mental health services are adopting the recovery paradigm. This study adds to prior research by (a) using measures of stages of recovery and elements of recovery that were designed and validated in a non-Western, Chinese culture and (b) testing which demographic factors predict advanced recovery and whether placing importance on certain elements predicts advanced recovery. We examined recovery and factors associated with recovery among 75 Hong Kong adults who were diagnosed with schizophrenia and assessed to be in clinical remission. Data were collected on socio-demographic factors, recovery stages and elements associated with recovery. Logistic regression analysis was used to identify variables that could best predict stages of recovery. Receiver operating characteristic curves were used to detect the classification accuracy of the model (i.e. rates of correct classification of stages of recovery). Logistic regression results indicated that stages of recovery could be distinguished with reasonable accuracy for Stage 3 ('living with disability', classification accuracy = 75.45%) and Stage 4 ('living beyond disability', classification accuracy = 75.50%). However, there was no sufficient information to predict Combined Stages 1 and 2 ('overwhelmed by disability' and 'struggling with disability'). It was found that having a meaningful role and age were the most important differentiators of recovery stage. Preliminary findings suggest that adopting salient life roles personally is important to recovery and that this component should be incorporated into mental health services. © The Author(s) 2014.
Improvement of Storm Forecasts Using Gridded Bayesian Linear Regression for Northeast United States
NASA Astrophysics Data System (ADS)
Yang, J.; Astitha, M.; Schwartz, C. S.
2017-12-01
Bayesian linear regression (BLR) is a post-processing technique in which regression coefficients are derived and used to correct raw forecasts based on pairs of observation-model values. This study presents the development and application of a gridded Bayesian linear regression (GBLR) as a new post-processing technique to improve numerical weather prediction (NWP) of rain and wind storm forecasts over northeast United States. Ten controlled variables produced from ten ensemble members of the National Center for Atmospheric Research (NCAR) real-time prediction system are used for a GBLR model. In the GBLR framework, leave-one-storm-out cross-validation is utilized to study the performances of the post-processing technique in a database composed of 92 storms. To estimate the regression coefficients of the GBLR, optimization procedures that minimize the systematic and random error of predicted atmospheric variables (wind speed, precipitation, etc.) are implemented for the modeled-observed pairs of training storms. The regression coefficients calculated for meteorological stations of the National Weather Service are interpolated back to the model domain. An analysis of forecast improvements based on error reductions during the storms will demonstrate the value of GBLR approach. This presentation will also illustrate how the variances are optimized for the training partition in GBLR and discuss the verification strategy for grid points where no observations are available. The new post-processing technique is successful in improving wind speed and precipitation storm forecasts using past event-based data and has the potential to be implemented in real-time.
Height and Weight Estimation From Anthropometric Measurements Using Machine Learning Regressions
Fernandes, Bruno J. T.; Roque, Alexandre
2018-01-01
Height and weight are measurements explored to tracking nutritional diseases, energy expenditure, clinical conditions, drug dosages, and infusion rates. Many patients are not ambulant or may be unable to communicate, and a sequence of these factors may not allow accurate estimation or measurements; in those cases, it can be estimated approximately by anthropometric means. Different groups have proposed different linear or non-linear equations which coefficients are obtained by using single or multiple linear regressions. In this paper, we present a complete study of the application of different learning models to estimate height and weight from anthropometric measurements: support vector regression, Gaussian process, and artificial neural networks. The predicted values are significantly more accurate than that obtained with conventional linear regressions. In all the cases, the predictions are non-sensitive to ethnicity, and to gender, if more than two anthropometric parameters are analyzed. The learning model analysis creates new opportunities for anthropometric applications in industry, textile technology, security, and health care. PMID:29651366
A model to predict progression in brain-injured patients.
Tommasino, N; Forteza, D; Godino, M; Mizraji, R; Alvarez, I
2014-11-01
The study of brain death (BD) epidemiology and the acute brain injury (ABI) progression profile is important to improve public health programs, organ procurement strategies, and intensive care unit (ICU) protocols. The purpose of this study was to analyze the ABI progression profile among patients admitted to ICUs with a Glasgow Coma Score (GCS) ≤8, as well as establishing a prediction model of probability of death and BD. This was a retrospective analysis of prospective data that included all brain-injured patients with GCS ≤8 admitted to a total of four public and private ICUs in Uruguay (N = 1447). The independent predictor factors of death and BD were studied using logistic regression analysis. A hierarchical model consisting of 2 nested logit regression models was then created. With these models, the probabilities of death, BD, and death by cardiorespiratory arrest were analyzed. In the first regression, we observed that as the GCS decreased and age increased, the probability of death rose. Each additional year of age increased the probability of death by 0.014. In the second model, however, BD risk decreased with each year of age. The presence of swelling, mass effect, and/or space-occupying lesion increased BD risk for the same given GCS. In the presence of injuries compatible with intracranial hypertension, age behaved as a protective factor that reduced the probability of BD. Based on the analysis of the local epidemiology, a model to predict the probability of death and BD can be developed. The organ potential donation of a country, region, or hospital can be predicted on the basis of this model, customizing it to each specific situation.
Lin, Cheng-Yi; Lin, Ching-Yih; Chang, I-Wei; Sheu, Ming-Jen; Li, Chien-Feng; Lee, Sung-Wei; Lin, Li-Ching; Lee, Ying-En; He, Hong-Lin
2015-01-01
Neoadjuvant concurrent chemoradiotherapy (CCRT) followed by surgery is the mainstay of treatment for locally advanced rectal cancer. Several heparin-binding associated proteins have been reported to play a critical role in cancer progression. However, the clinical relevancies of such proteins and their associations with CCRT response in rectal cancer have not yet to be fully elucidated. The analysis of a public transcriptome of rectal cancer indicated that thrombospondin 2 (THBS2) is a predictive factor for CCRT response. Immunohistochemical analyses were conducted to evaluate the expression of THBS2 in pretreatment biopsy specimens from rectal cancer patients without distant metastasis. Furthermore, the relationships between THBS2 expression and various clinicopathological factors or survival were analyzed. Low expression of THBS2 was significantly associated with advanced pretreatment tumor (P<0.001) and nodal status (P=0.004), post-treatment tumor (P<0.001) and nodal status (P<0.001), increased vascular invasion (P=0.003), increased perineural invasion (P=0.023) and inferior tumor regression grade (P=0.015). In univariate analysis, low THBS2 expression predicted worse outcomes for disease-free survival, local recurrence-free survival and metastasis-free survival (all P<0.001). In multivariate analysis, low expression of THBS2 still served as a negative prognostic factor for disease-free survival (Hazard ratio=3.057, P=0.002) and metastasis-free survival (Hazard ratio=3.362, P=0.012). Low THBS2 expression was correlated with advanced disease status and low tumor regression after preoperative CCRT and that it acted as an independent negative prognostic factor in rectal cancer. THBS2 may represent a predictive biomarker for CCRT response in rectal cancer.
Landslide Hazard Mapping in Rwanda Using Logistic Regression
NASA Astrophysics Data System (ADS)
Piller, A.; Anderson, E.; Ballard, H.
2015-12-01
Landslides in the United States cause more than $1 billion in damages and 50 deaths per year (USGS 2014). Globally, figures are much more grave, yet monitoring, mapping and forecasting of these hazards are less than adequate. Seventy-five percent of the population of Rwanda earns a living from farming, mostly subsistence. Loss of farmland, housing, or life, to landslides is a very real hazard. Landslides in Rwanda have an impact at the economic, social, and environmental level. In a developing nation that faces challenges in tracking, cataloging, and predicting the numerous landslides that occur each year, satellite imagery and spatial analysis allow for remote study. We have focused on the development of a landslide inventory and a statistical methodology for assessing landslide hazards. Using logistic regression on approximately 30 test variables (i.e. slope, soil type, land cover, etc.) and a sample of over 200 landslides, we determine which variables are statistically most relevant to landslide occurrence in Rwanda. A preliminary predictive hazard map for Rwanda has been produced, using the variables selected from the logistic regression analysis.
Prediction of strontium bromide laser efficiency using cluster and decision tree analysis
NASA Astrophysics Data System (ADS)
Iliev, Iliycho; Gocheva-Ilieva, Snezhana; Kulin, Chavdar
2018-01-01
Subject of investigation is a new high-powered strontium bromide (SrBr2) vapor laser emitting in multiline region of wavelengths. The laser is an alternative to the atom strontium lasers and electron free lasers, especially at the line 6.45 μm which line is used in surgery for medical processing of biological tissues and bones with minimal damage. In this paper the experimental data from measurements of operational and output characteristics of the laser are statistically processed by means of cluster analysis and tree-based regression techniques. The aim is to extract the more important relationships and dependences from the available data which influence the increase of the overall laser efficiency. There are constructed and analyzed a set of cluster models. It is shown by using different cluster methods that the seven investigated operational characteristics (laser tube diameter, length, supplied electrical power, and others) and laser efficiency are combined in 2 clusters. By the built regression tree models using Classification and Regression Trees (CART) technique there are obtained dependences to predict the values of efficiency, and especially the maximum efficiency with over 95% accuracy.
Study on power grid characteristics in summer based on Linear regression analysis
NASA Astrophysics Data System (ADS)
Tang, Jin-hui; Liu, You-fei; Liu, Juan; Liu, Qiang; Liu, Zhuan; Xu, Xi
2018-05-01
The correlation analysis of power load and temperature is the precondition and foundation for accurate load prediction, and a great deal of research has been made. This paper constructed the linear correlation model between temperature and power load, then the correlation of fault maintenance work orders with the power load is researched. Data details of Jiangxi province in 2017 summer such as temperature, power load, fault maintenance work orders were adopted in this paper to develop data analysis and mining. Linear regression models established in this paper will promote electricity load growth forecast, fault repair work order review, distribution network operation weakness analysis and other work to further deepen the refinement.
Advanced GIS Exercise: Predicting Rainfall Erosivity Index Using Regression Analysis
ERIC Educational Resources Information Center
Post, Christopher J.; Goddard, Megan A.; Mikhailova, Elena A.; Hall, Steven T.
2006-01-01
Graduate students from a variety of agricultural and natural resource fields are incorporating geographic information systems (GIS) analysis into their graduate research, creating a need for teaching methodologies that help students understand advanced GIS topics for use in their own research. Graduate-level GIS exercises help students understand…
Adolescent Domain Screening Inventory-Short Form: Development and Initial Validation
ERIC Educational Resources Information Center
Corrigan, Matthew J.
2017-01-01
This study sought to develop a short version of the ADSI, and investigate its psychometric properties. Methods: This is a secondary analysis. Analysis to determine the Cronbach's Alpha, correlations to determine concurrent criterion validity and known instrument validity and a logistic regression to determine predictive validity were conducted.…
NASA Astrophysics Data System (ADS)
Uzunova, Yordanka; Prodanova, Krasimira; Spassov, Lubomir
2016-12-01
Orthotopic liver transplantation (OLT) is the only curative treatment for end-stage liver disease. Early diagnosis and treatment of infections after OLT are usually associated with improved outcomes. This study's objective is to identify reliable factors that can predict postoperative infectious morbidity. 27 children were included in the analysis. They underwent liver transplantation in our department. The correlation between two parameters (the level of blood glucose at 5th postoperative day and the duration of the anhepatic phase) and postoperative infections was analyzed, using univariate analysis. In this analysis, an independent predictive factor was derived which adequately identifies patients at risk of infectious complications after a liver transplantation.
Khalil, Mohamed H.; Shebl, Mostafa K.; Kosba, Mohamed A.; El-Sabrout, Karim; Zaki, Nesma
2016-01-01
Aim: This research was conducted to determine the most affecting parameters on hatchability of indigenous and improved local chickens’ eggs. Materials and Methods: Five parameters were studied (fertility, early and late embryonic mortalities, shape index, egg weight, and egg weight loss) on four strains, namely Fayoumi, Alexandria, Matrouh, and Montazah. Multiple linear regression was performed on the studied parameters to determine the most influencing one on hatchability. Results: The results showed significant differences in commercial and scientific hatchability among strains. Alexandria strain has the highest significant commercial hatchability (80.70%). Regarding the studied strains, highly significant differences in hatching chick weight among strains were observed. Using multiple linear regression analysis, fertility made the greatest percent contribution (71.31%) to hatchability, and the lowest percent contributions were made by shape index and egg weight loss. Conclusion: A prediction of hatchability using multiple regression analysis could be a good tool to improve hatchability percentage in chickens. PMID:27651666
Mager, P P; Rothe, H
1990-10-01
Multicollinearity of physicochemical descriptors leads to serious consequences in quantitative structure-activity relationship (QSAR) analysis, such as incorrect estimators and test statistics of regression coefficients of the ordinary least-squares (OLS) model applied usually to QSARs. Beside the diagnosis of the known simple collinearity, principal component regression analysis (PCRA) also allows the diagnosis of various types of multicollinearity. Only if the absolute values of PCRA estimators are order statistics that decrease monotonically, the effects of multicollinearity can be circumvented. Otherwise, obscure phenomena may be observed, such as good data recognition but low predictive model power of a QSAR model.
Bootstrap investigation of the stability of a Cox regression model.
Altman, D G; Andersen, P K
1989-07-01
We describe a bootstrap investigation of the stability of a Cox proportional hazards regression model resulting from the analysis of a clinical trial of azathioprine versus placebo in patients with primary biliary cirrhosis. We have considered stability to refer both to the choice of variables included in the model and, more importantly, to the predictive ability of the model. In stepwise Cox regression analyses of 100 bootstrap samples using 17 candidate variables, the most frequently selected variables were those selected in the original analysis, and no other important variable was identified. Thus there was no reason to doubt the model obtained in the original analysis. For each patient in the trial, bootstrap confidence intervals were constructed for the estimated probability of surviving two years. It is shown graphically that these intervals are markedly wider than those obtained from the original model.
Arsenyev, P A; Trezvov, V V; Saratovskaya, N V
1997-01-01
This work represents a method, which allows to determine phase composition of calcium hydroxylapatite basing on its infrared spectrum. The method uses factor analysis of the spectral data of calibration set of samples to determine minimal number of factors required to reproduce the spectra within experimental error. Multiple linear regression is applied to establish correlation between factor scores of calibration standards and their properties. The regression equations can be used to predict the property value of unknown sample. The regression model was built for determination of beta-tricalcium phosphate content in hydroxylapatite. Statistical estimation of quality of the model was carried out. Application of the factor analysis on spectral data allows to increase accuracy of beta-tricalcium phosphate determination and expand the range of determination towards its less concentration. Reproducibility of results is retained.
Nagwani, Naresh Kumar; Deo, Shirish V
2014-01-01
Understanding of the compressive strength of concrete is important for activities like construction arrangement, prestressing operations, and proportioning new mixtures and for the quality assurance. Regression techniques are most widely used for prediction tasks where relationship between the independent variables and dependent (prediction) variable is identified. The accuracy of the regression techniques for prediction can be improved if clustering can be used along with regression. Clustering along with regression will ensure the more accurate curve fitting between the dependent and independent variables. In this work cluster regression technique is applied for estimating the compressive strength of the concrete and a novel state of the art is proposed for predicting the concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression ensures less prediction errors for estimating the concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group the similar characteristics concrete data and then in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. It is found from experiments that clustering along with regression techniques gives minimum errors for predicting compressive strength of concrete; also fuzzy clustering algorithm C-means performs better than K-means algorithm.
Nagwani, Naresh Kumar; Deo, Shirish V.
2014-01-01
Understanding of the compressive strength of concrete is important for activities like construction arrangement, prestressing operations, and proportioning new mixtures and for the quality assurance. Regression techniques are most widely used for prediction tasks where relationship between the independent variables and dependent (prediction) variable is identified. The accuracy of the regression techniques for prediction can be improved if clustering can be used along with regression. Clustering along with regression will ensure the more accurate curve fitting between the dependent and independent variables. In this work cluster regression technique is applied for estimating the compressive strength of the concrete and a novel state of the art is proposed for predicting the concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression ensures less prediction errors for estimating the concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group the similar characteristics concrete data and then in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. It is found from experiments that clustering along with regression techniques gives minimum errors for predicting compressive strength of concrete; also fuzzy clustering algorithm C-means performs better than K-means algorithm. PMID:25374939
Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V
2012-01-01
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
Effects of Barometric Fluctuations on Well Water-Level Measurements and Aquifer Test Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Spane, Frank A.
1999-12-16
This report examines the effects of barometric fluctuations on well water-level measurements and evaluates adjustment and removal methods for determining areal aquifer head conditions and aquifer test analysis. Two examples of Hanford Site unconfined aquifer tests are examined that demonstrate baro-metric response analysis and illustrate the predictive/removal capabilities of various methods for well water-level and aquifer total head values. Good predictive/removal characteristics were demonstrated with best corrective results provided by multiple-regression deconvolution methods.
NASA Astrophysics Data System (ADS)
Wang, Xuntao; Feng, Jianhu; Wang, Hu; Hong, Shidi; Zheng, Supei
2018-03-01
A three-dimensional finite element box girder bridge and its asphalt concrete deck pavement were established by ANSYS software, and the interlayer bonding condition of asphalt concrete deck pavement was assumed to be contact bonding condition. Orthogonal experimental design is used to arrange the testing plans of material parameters, and an evaluation of the effect of different material parameters in the mechanical response of asphalt concrete surface layer was conducted by multiple linear regression model and using the results from the finite element analysis. Results indicated that stress regression equations can well predict the stress of the asphalt concrete surface layer, and elastic modulus of waterproof layer has a significant influence on stress values of asphalt concrete surface layer.
Very-short-term wind power prediction by a hybrid model with single- and multi-step approaches
NASA Astrophysics Data System (ADS)
Mohammed, E.; Wang, S.; Yu, J.
2017-05-01
Very-short-term wind power prediction (VSTWPP) has played an essential role for the operation of electric power systems. This paper aims at improving and applying a hybrid method of VSTWPP based on historical data. The hybrid method is combined by multiple linear regressions and least square (MLR&LS), which is intended for reducing prediction errors. The predicted values are obtained through two sub-processes:1) transform the time-series data of actual wind power into the power ratio, and then predict the power ratio;2) use the predicted power ratio to predict the wind power. Besides, the proposed method can include two prediction approaches: single-step prediction (SSP) and multi-step prediction (MSP). WPP is tested comparatively by auto-regressive moving average (ARMA) model from the predicted values and errors. The validity of the proposed hybrid method is confirmed in terms of error analysis by using probability density function (PDF), mean absolute percent error (MAPE) and means square error (MSE). Meanwhile, comparison of the correlation coefficients between the actual values and the predicted values for different prediction times and window has confirmed that MSP approach by using the hybrid model is the most accurate while comparing to SSP approach and ARMA. The MLR&LS is accurate and promising for solving problems in WPP.
Sargolzaie, Narjes; Miri-Moghaddam, Ebrahim
2014-01-01
The most common differential diagnosis of β-thalassemia (β-thal) trait is iron deficiency anemia. Several red blood cell equations were introduced during different studies for differential diagnosis between β-thal trait and iron deficiency anemia. Due to genetic variations in different regions, these equations cannot be useful in all population. The aim of this study was to determine a native equation with high accuracy for differential diagnosis of β-thal trait and iron deficiency anemia for the Sistan and Baluchestan population by logistic regression analysis. We selected 77 iron deficiency anemia and 100 β-thal trait cases. We used binary logistic regression analysis and determined best equations for probability prediction of β-thal trait against iron deficiency anemia in our population. We compared diagnostic values and receiver operative characteristic (ROC) curve related to this equation and another 10 published equations in discriminating β-thal trait and iron deficiency anemia. The binary logistic regression analysis determined the best equation for best probability prediction of β-thal trait against iron deficiency anemia with area under curve (AUC) 0.998. Based on ROC curves and AUC, Green & King, England & Frazer, and then Sirdah indices, respectively, had the most accuracy after our equation. We suggest that to get the best equation and cut-off in each region, one needs to evaluate specific information of each region, specifically in areas where populations are homogeneous, to provide a specific formula for differentiating between β-thal trait and iron deficiency anemia.
Parental education predicts change in intelligence quotient after childhood epilepsy surgery.
Meekes, Joost; van Schooneveld, Monique M J; Braams, Olga B; Jennekens-Schinkel, Aag; van Rijen, Peter C; Hendriks, Marc P H; Braun, Kees P J; van Nieuwenhuizen, Onno
2015-04-01
To know whether change in the intelligence quotient (IQ) of children who undergo epilepsy surgery is associated with the educational level of their parents. Retrospective analysis of data obtained from a cohort of children who underwent epilepsy surgery between January 1996 and September 2010. We performed simple and multiple regression analyses to identify predictors associated with IQ change after surgery. In addition to parental education, six variables previously demonstrated to be associated with IQ change after surgery were included as predictors: age at surgery, duration of epilepsy, etiology, presurgical IQ, reduction of antiepileptic drugs, and seizure freedom. We used delta IQ (IQ 2 years after surgery minus IQ shortly before surgery) as the primary outcome variable, but also performed analyses with pre- and postsurgical IQ as outcome variables to support our findings. To validate the results we performed simple regression analysis with parental education as the predictor in specific subgroups. The sample for regression analysis included 118 children (60 male; median age at surgery 9.73 years). Parental education was significantly associated with delta IQ in simple regression analysis (p = 0.004), and also contributed significantly to postsurgical IQ in multiple regression analysis (p = 0.008). Additional analyses demonstrated that parental education made a unique contribution to prediction of delta IQ, that is, it could not be replaced by the illness-related variables. Subgroup analyses confirmed the association of parental education with IQ change after surgery for most groups. Children whose parents had higher education demonstrate on average a greater increase in IQ after surgery and a higher postsurgical--but not presurgical--IQ than children whose parents completed at most lower secondary education. Parental education--and perhaps other environmental variables--should be considered in the prognosis of cognitive function after childhood epilepsy surgery. Wiley Periodicals, Inc. © 2015 International League Against Epilepsy.
Newman, J; Egan, T; Harbourne, N; O'Riordan, D; Jacquier, J C; O'Sullivan, M
2014-08-01
Sensory evaluation can be problematic for ingredients with a bitter taste during research and development phase of new food products. In this study, 19 dairy protein hydrolysates (DPH) were analysed by an electronic tongue and their physicochemical characteristics, the data obtained from these methods were correlated with their bitterness intensity as scored by a trained sensory panel and each model was also assessed by its predictive capabilities. The physiochemical characteristics of the DPHs investigated were degree of hydrolysis (DH%), and data relating to peptide size and relative hydrophobicity from size exclusion chromatography (SEC) and reverse phase (RP) HPLC. Partial least square regression (PLS) was used to construct the prediction models. All PLS regressions had good correlations (0.78 to 0.93) with the strongest being the combination of data obtained from SEC and RP HPLC. However, the PLS with the strongest predictive power was based on the e-tongue which had the PLS regression with the lowest root mean predicted residual error sum of squares (PRESS) in the study. The results show that the PLS models constructed with the e-tongue and the combination of SEC and RP-HPLC has potential to be used for prediction of bitterness and thus reducing the reliance on sensory analysis in DPHs for future food research. Copyright © 2014 Elsevier B.V. All rights reserved.
2014-01-01
This paper examined the efficiency of multivariate linear regression (MLR) and artificial neural network (ANN) models in prediction of two major water quality parameters in a wastewater treatment plant. Biochemical oxygen demand (BOD) and chemical oxygen demand (COD) as well as indirect indicators of organic matters are representative parameters for sewer water quality. Performance of the ANN models was evaluated using coefficient of correlation (r), root mean square error (RMSE) and bias values. The computed values of BOD and COD by model, ANN method and regression analysis were in close agreement with their respective measured values. Results showed that the ANN performance model was better than the MLR model. Comparative indices of the optimized ANN with input values of temperature (T), pH, total suspended solid (TSS) and total suspended (TS) for prediction of BOD was RMSE = 25.1 mg/L, r = 0.83 and for prediction of COD was RMSE = 49.4 mg/L, r = 0.81. It was found that the ANN model could be employed successfully in estimating the BOD and COD in the inlet of wastewater biochemical treatment plants. Moreover, sensitive examination results showed that pH parameter have more effect on BOD and COD predicting to another parameters. Also, both implemented models have predicted BOD better than COD. PMID:24456676
Pathan, Sameer A; Bhutta, Zain A; Moinudheen, Jibin; Jenkins, Dominic; Silva, Ashwin D; Sharma, Yogdutt; Saleh, Warda A; Khudabakhsh, Zeenat; Irfan, Furqan B; Thomas, Stephen H
2016-01-01
Background: Standard Emergency Department (ED) operations goals include minimization of the time interval (tMD) between patients' initial ED presentation and initial physician evaluation. This study assessed factors known (or suspected) to influence tMD with a two-step goal. The first step was generation of a multivariate model identifying parameters associated with prolongation of tMD at a single study center. The second step was the use of a study center-specific multivariate tMD model as a basis for predictive marginal probability analysis; the marginal model allowed for prediction of the degree of ED operations benefit that would be affected with specific ED operations improvements. Methods: The study was conducted using one month (May 2015) of data obtained from an ED administrative database (EDAD) in an urban academic tertiary ED with an annual census of approximately 500,000; during the study month, the ED saw 39,593 cases. The EDAD data were used to generate a multivariate linear regression model assessing the various demographic and operational covariates' effects on the dependent variable tMD. Predictive marginal probability analysis was used to calculate the relative contributions of key covariates as well as demonstrate the likely tMD impact on modifying those covariates with operational improvements. Analyses were conducted with Stata 14MP, with significance defined at p < 0.05 and confidence intervals (CIs) reported at the 95% level. Results: In an acceptable linear regression model that accounted for just over half of the overall variance in tMD (adjusted r 2 0.51), important contributors to tMD included shift census ( p = 0.008), shift time of day ( p = 0.002), and physician coverage n ( p = 0.004). These strong associations remained even after adjusting for each other and other covariates. Marginal predictive probability analysis was used to predict the overall tMD impact (improvement from 50 to 43 minutes, p < 0.001) of consistent staffing with 22 physicians. Conclusions: The analysis identified expected variables contributing to tMD with regression demonstrating significance and effect magnitude of alterations in covariates including patient census, shift time of day, and number of physicians. Marginal analysis provided operationally useful demonstration of the need to adjust physician coverage numbers, prompting changes at the study ED. The methods used in this analysis may prove useful in other EDs wishing to analyze operations information with the goal of predicting which interventions may have the most benefit.
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.
2003-01-01
Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity in each basin, particle size sorting, average storm intensity (millimeters per hour), soil organic matter content, soil permeability, and soil drainage. The results of this study demonstrate that logistic regression is a valuable tool for predicting the probability of debris flows occurring in recently-burned landscapes.
Uddin, Jamal; Zwisler, Ann-Dorthe; Lewinter, Christian; Moniruzzaman, Mohammad; Lund, Ken; Tang, Lars H; Taylor, Rod S
2016-05-01
The aim of this study was to undertake a comprehensive assessment of the patient, intervention and trial-level factors that may predict exercise capacity following exercise-based rehabilitation in patients with coronary heart disease and heart failure. Meta-analysis and meta-regression analysis. Randomized controlled trials of exercise-based rehabilitation were identified from three published systematic reviews. Exercise capacity was pooled across trials using random effects meta-analysis, and meta-regression used to examine the association between exercise capacity and a range of patient (e.g. age), intervention (e.g. exercise frequency) and trial (e.g. risk of bias) factors. 55 trials (61 exercise-control comparisons, 7553 patients) were included. Following exercise-based rehabilitation compared to control, overall exercise capacity was on average 0.95 (95% CI: 0.76-1.41) standard deviation units higher, and in trials reporting maximum oxygen uptake (VO2max) was 3.3 ml/kg.min(-1) (95% CI: 2.6-4.0) higher. There was evidence of a high level of statistical heterogeneity across trials (I(2) statistic > 50%). In multivariable meta-regression analysis, only exercise intervention intensity was found to be significantly associated with VO2max (P = 0.04); those trials with the highest average exercise intensity had the largest mean post-rehabilitation VO2max compared to control. We found considerable heterogeneity across randomized controlled trials in the magnitude of improvement in exercise capacity following exercise-based rehabilitation compared to control among patients with coronary heart disease or heart failure. Whilst higher exercise intensities were associated with a greater level of post-rehabilitation exercise capacity, there was no strong evidence to support other intervention, patient or trial factors to be predictive. © The European Society of Cardiology 2015.
Chen, Jin-hong; Wu, Hai-yun; He, Kun-lun; He, Yao; Qin, Yin-he
2010-10-01
To establish and verify the prediction model for ischemic cardiovascular disease (ICVD) among the elderly population who were under the current health care programs. Statistical analysis on data from physical examination, hospitalization of the past years, from questionnaire and telephone interview was carried out in May, 2003. Data was from a hospital which implementing a health care program. Baseline population with a proportion of 4:1 was randomly selected to generate both module group and verification group. Baseline data was induced to make the verification group into regression model of module group and to generate the predictive value. Distinguished ability with area under ROC curve and the predictive veracity were verified through comparing the predictive incidence rate and actual incidence rate of every deciles group by Hosmer-Lemeshow test. Predictive veracity of the prediction model at population level was verified through comparing the predictive 6-year incidence rates of ICVD with actual 6-year accumulative incidence rates of ICVD with error rate calculated. The samples included 2271 males over the age of 65 with 1817 people for modeling population and 454 for verified population. All of the samples were stratified into two layers to establish hierarchical Cox proportional hazard regression model, including one advanced age group (greater than or equal to 75 years old), and another elderly group (less than 75 years old). Data from the statically analysis showed that the risk factors in aged group were age, systolic blood pressure, serum creatinine level, fasting blood glucose level, while protective factor was high density lipoprotein;in advanced age group, the risk factors were body weight index, systolic blood pressure, serum total cholesterol level, serum creatinine level, fasting blood glucose level, while protective factor was HDL-C. The area under the ROC curve (AUC) and 95%CI were 0.723 and 0.687 - 0.759 respectively. Discriminating power was good. All individual predictive ICVD cumulative incidence and actual incidence were analyzed using Hosmer-Lemeshow test, χ(2) = 1.43, P = 0.786, showing that the predictive veracity was good. The stratified Cox Hazards Regression model was used to establish prediction model of the aged male population under a certain health care program. The common prediction factor of the two age groups were: systolic blood pressure, serum creatinine level, fasting blood glucose level and HDL-C. The area under the ROC curve of the verification group was 0.723, showing that the distinguished ability was good and the predict ability at the individual level and at the group level were also satisfactory. It was feasible to using Cox Proportional Hazards Regression Model for predicting the population groups.
Fu, Fan; Sun, Shengjun; Liu, Liping; Li, Jianying; Su, Yaping; Li, Yingying
2018-04-19
The computed tomography angiography (CTA) spot sign is a validated predictor of haematoma expansion (HE) in spontaneous intracerebral haemorrhage (SICH). We investigated whether defining the iodine concentration (IC) inside the spot sign and the haematoma on Gemstone spectral imaging (GSI) would improve its sensitivity and specificity for predicting HE. From 2014 to 2016, we prospectively enrolled 65 SICH patients who underwent single-phase spectral CTA within 6 h. Logistic regression was performed to assess the risk factors for HE. The predictive performance of individual spot sign characteristics was examined via receiver operating characteristic (ROC) analysis. The spot sign was detected in 46.1% (30/65) of patients. ROC analysis indicated that IC inside the spot sign had the greatest area under the ROC curve for HE (0.858; 95% confidence interval, 0.727-0.989; p = 0.003). Multivariate analysis found that spot sign with higher IC (i.e. IC > 7.82 100 μg/ml) was an independent predictor of HE (odds ratio = 34.27; 95% confidence interval, 5.608-209.41; p < 0.001) with sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of 0.81, 0.75, 0.90 and 0.60, respectively; while the spot sign showed sensitivity, specificity, PPV and NPV of 0.81, 0.79, 0.73 and 0.86. Logistic regression analysis indicated that the IC in haematomas was independently associated with HE (odds ratio = 1.525; 95% confidence interval, 1.041-2.235; p = 0.030). ICs in haematoma and in spot sign were all independently associated with HE. IC analysis in spectral imaging may help to identify SICH patients for targeted haemostatic therapy. • Iodine concentration in spot sign and haematoma can predict haematoma expansion • Spectral imaging could measure the IC inside the spot sign and haematoma • IC in spot sign improved the positive predictive value (PPV) cf. CTA.
D'Archivio, Angelo Antonio; Incani, Angela; Ruggieri, Fabrizio
2011-01-01
In this paper, we use a quantitative structure-retention relationship (QSRR) method to predict the retention times of polychlorinated biphenyls (PCBs) in comprehensive two-dimensional gas chromatography (GC×GC). We analyse the GC×GC retention data taken from the literature by comparing predictive capability of different regression methods. The various models are generated using 70 out of 209 PCB congeners in the calibration stage, while their predictive performance is evaluated on the remaining 139 compounds. The two-dimensional chromatogram is initially estimated by separately modelling retention times of PCBs in the first and in the second column ((1) t (R) and (2) t (R), respectively). In particular, multilinear regression (MLR) combined with genetic algorithm (GA) variable selection is performed to extract two small subsets of predictors for (1) t (R) and (2) t (R) from a large set of theoretical molecular descriptors provided by the popular software Dragon, which after removal of highly correlated or almost constant variables consists of 237 structure-related quantities. Based on GA-MLR analysis, a four-dimensional and a five-dimensional relationship modelling (1) t (R) and (2) t (R), respectively, are identified. Single-response partial least square (PLS-1) regression is alternatively applied to independently model (1) t (R) and (2) t (R) without the need for preliminary GA variable selection. Further, we explore the possibility of predicting the two-dimensional chromatogram of PCBs in a single calibration procedure by using a two-response PLS (PLS-2) model or a feed-forward artificial neural network (ANN) with two output neurons. In the first case, regression is carried out on the full set of 237 descriptors, while the variables previously selected by GA-MLR are initially considered as ANN inputs and subjected to a sensitivity analysis to remove the redundant ones. Results show PLS-1 regression exhibits a noticeably better descriptive and predictive performance than the other investigated approaches. The observed values of determination coefficients for (1) t (R) and (2) t (R) in calibration (0.9999 and 0.9993, respectively) and prediction (0.9987 and 0.9793, respectively) provided by PLS-1 demonstrate that GC×GC behaviour of PCBs is properly modelled. In particular, the predicted two-dimensional GC×GC chromatogram of 139 PCBs not involved in the calibration stage closely resembles the experimental one. Based on the above lines of evidence, the proposed approach ensures accurate simulation of the whole GC×GC chromatogram of PCBs using experimental determination of only 1/3 retention data of representative congeners.
Christiansen, Cory L; Bade, Michael J; Weitzenkamp, David A; Stevens-Lapsley, Jennifer E
2013-03-01
Factors predicting weight-bearing asymmetry (WBA) after unilateral total knee arthroplasty (TKA) are not known. However, identifying modifiable and non-modifiable predictors of WBA is needed to optimize rehabilitation, especially since WBA is negatively correlated to poor functional performance. The purpose of this study was to identify factors predictive of WBA during sit-stand transitions for people 1month following unilateral TKA. Fifty-nine people were tested preoperatively and 1month following unilateral TKA for WBA using average vertical ground reaction force under each foot during the Five Times Sit-to-Stand Test. Candidate variables tested in the regression analysis represented physical impairments (strength, muscle activation, pain, and motion), demographics, anthropometrics, and movement compensations. WBA, measured as the ratio of surgical/non-surgical limb vertical ground reaction force, was 0.69 (0.18) (mean (SD)) 1month after TKA. Regression analysis identified preoperative WBA (β=0.40), quadriceps strength ratio (β=0.31), and hamstrings strength ratio (β=0.19) as factors predictive of WBA 1month after TKA (R(2)=0.30). Greater amounts of WBA 1month after TKA are predicted by modifiable factors including habitual movement pattern and asymmetry in quadriceps and hamstrings strength. Copyright © 2012 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Bahadir, Elif
2016-01-01
The ability to predict the success of students when they enter a graduate program is critical for educational institutions because it allows them to develop strategic programs that will help improve students' performances during their stay at an institution. In this study, we present the results of an experimental comparison study of Logistic…
Toyabe, Shin-ichi
2014-01-01
Inpatient falls are the most common adverse events that occur in a hospital, and about 3 to 10% of falls result in serious injuries such as bone fractures and intracranial haemorrhages. We previously reported that bone fractures and intracranial haemorrhages were two major fall-related injuries and that risk assessment score for osteoporotic bone fracture was significantly associated not only with bone fractures after falls but also with intracranial haemorrhage after falls. Based on the results, we tried to establish a risk assessment tool for predicting fall-related severe injuries in a hospital. Possible risk factors related to fall-related serious injuries were extracted from data on inpatients that were admitted to a tertiary-care university hospital by using multivariate Cox’ s regression analysis and multiple logistic regression analysis. We found that fall risk score and fracture risk score were the two significant factors, and we constructed models to predict fall-related severe injuries incorporating these factors. When the prediction model was applied to another independent dataset, the constructed model could detect patients with fall-related severe injuries efficiently. The new assessment system could identify patients prone to severe injuries after falls in a reproducible fashion. PMID:25168984
Sparse partial least squares regression for simultaneous dimension reduction and variable selection
Chun, Hyonho; Keleş, Sündüz
2010-01-01
Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high dimensional genomic data. We show that known asymptotic consistency of the partial least squares estimator for a univariate response does not hold with the very large p and small n paradigm. We derive a similar result for a multivariate response regression with partial least squares. We then propose a sparse partial least squares formulation which aims simultaneously to achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of sparse partial least squares regression and compare it with well-known variable selection and dimension reduction approaches via simulation experiments. We illustrate the practical utility of sparse partial least squares regression in a joint analysis of gene expression and genomewide binding data. PMID:20107611
Villarrasa-Sapiña, Israel; Álvarez-Pitti, Julio; Cabeza-Ruiz, Ruth; Redón, Pau; Lurbe, Empar; García-Massó, Xavier
2018-02-01
Excess body weight during childhood causes reduced motor functionality and problems in postural control, a negative influence which has been reported in the literature. Nevertheless, no information regarding the effect of body composition on the postural control of overweight and obese children is available. The objective of this study was therefore to establish these relationships. A cross-sectional design was used to establish relationships between body composition and postural control variables obtained in bipedal eyes-open and eyes-closed conditions in twenty-two children. Centre of pressure signals were analysed in the temporal and frequency domains. Pearson correlations were applied to establish relationships between variables. Principal component analysis was applied to the body composition variables to avoid potential multicollinearity in the regression models. These principal components were used to perform a multiple linear regression analysis, from which regression models were obtained to predict postural control. Height and leg mass were the body composition variables that showed the highest correlation with postural control. Multiple regression models were also obtained and several of these models showed a higher correlation coefficient in predicting postural control than simple correlations. These models revealed that leg and trunk mass were good predictors of postural control. More equations were found in the eyes-open than eyes-closed condition. Body weight and height are negatively correlated with postural control. However, leg and trunk mass are better postural control predictors than arm or body mass. Finally, body composition variables are more useful in predicting postural control when the eyes are open. Copyright © 2017 Elsevier Ltd. All rights reserved.
Women, Physical Activity, and Quality of Life: Self-concept as a Mediator.
Gonzalo Silvestre, Tamara; Ubillos Landa, Silvia
2016-02-22
The objectives of this research are: (a) analyze the incremental validity of physical activity's (PA) influence on perceived quality of life (PQL); (b) determine if PA's predictive power is mediated by self-concept; and (c) study if results vary according to a unidimensional or multidimensional approach to self-concept measurement. The sample comprised 160 women from Burgos, Spain aged 18 to 45 years old. Non-probability sampling was used. Two three-step hierarchical regression analyses were applied to forecast PQL. The hedonic quality-of-life indicators, self-concept, self-esteem, and PA were included as independent variables. The first regression analysis included global self-concept as predictor variable, while the second included its five dimensions. Two mediation analyses were conducted to see if PA's ability to predict PQL was mediated by global and physical self-concept. Results from the first regression shows that self-concept, satisfaction with life, and PA were significant predictors. PA slightly but significantly increased explained variance in PQL (2.1%). In the second regression, substituting global self-concept with its five constituent factors, only the physical dimension and satisfaction with life predicted PQL, while PA ceased to be a significant predictor. Mediation analysis revealed that only physical self-concept mediates the relationship between PA and PQL (z = 1.97, p < .050), and not global self-concept. Physical self-concept was the strongest predictor and approximately 32.45 % of PA's effect on PQL was mediated by it. This study's findings support a multidimensional view of self-concept, and represent a more accurate image of the relationship between PQL, PA, and self-concept.
Low-flow, base-flow, and mean-flow regression equations for Pennsylvania streams
Stuckey, Marla H.
2006-01-01
Low-flow, base-flow, and mean-flow characteristics are an important part of assessing water resources in a watershed. These streamflow characteristics can be used by watershed planners and regulators to determine water availability, water-use allocations, assimilative capacities of streams, and aquatic-habitat needs. Streamflow characteristics are commonly predicted by use of regression equations when a nearby streamflow-gaging station is not available. Regression equations for predicting low-flow, base-flow, and mean-flow characteristics for Pennsylvania streams were developed from data collected at 293 continuous- and partial-record streamflow-gaging stations with flow unaffected by upstream regulation, diversion, or mining. Continuous-record stations used in the regression analysis had 9 years or more of data, and partial-record stations used had seven or more measurements collected during base-flow conditions. The state was divided into five low-flow regions and regional regression equations were developed for the 7-day, 10-year; 7-day, 2-year; 30-day, 10-year; 30-day, 2-year; and 90-day, 10-year low flows using generalized least-squares regression. Statewide regression equations were developed for the 10-year, 25-year, and 50-year base flows using generalized least-squares regression. Statewide regression equations were developed for harmonic mean and mean annual flow using weighted least-squares regression. Basin characteristics found to be significant explanatory variables at the 95-percent confidence level for one or more regression equations were drainage area, basin slope, thickness of soil, stream density, mean annual precipitation, mean elevation, and the percentage of glaciation, carbonate bedrock, forested area, and urban area within a basin. Standard errors of prediction ranged from 33 to 66 percent for the n-day, T-year low flows; 21 to 23 percent for the base flows; and 12 to 38 percent for the mean annual flow and harmonic mean, respectively. The regression equations are not valid in watersheds with upstream regulation, diversions, or mining activities. Watersheds with karst features need close examination as to the applicability of the regression-equation results.
Yao, Hong; Zhuang, Wei; Qian, Yu; Xia, Bisheng; Yang, Yang; Qian, Xin
2016-01-01
Turbidity (T) has been widely used to detect the occurrence of pollutants in surface water. Using data collected from January 2013 to June 2014 at eleven sites along two rivers feeding the Taihu Basin, China, the relationship between the concentration of five metals (aluminum (Al), titanium (Ti), nickel (Ni), vanadium (V), lead (Pb)) and turbidity was investigated. Metal concentration was determined using inductively coupled plasma mass spectrometry (ICP-MS). The linear regression of metal concentration and turbidity provided a good fit, with R2 = 0.86–0.93 for 72 data sets collected in the industrial river and R2 = 0.60–0.85 for 60 data sets collected in the cleaner river. All the regression presented good linear relationship, leading to the conclusion that the occurrence of the five metals are directly related to suspended solids, and these metal concentration could be approximated using these regression equations. Thus, the linear regression equations were applied to estimate the metal concentration using online turbidity data from January 1 to June 30 in 2014. In the prediction, the WASP 7.5.2 (Water Quality Analysis Simulation Program) model was introduced to interpret the transport and fates of total suspended solids; in addition, metal concentration downstream of the two rivers was predicted. All the relative errors between the estimated and measured metal concentration were within 30%, and those between the predicted and measured values were within 40%. The estimation and prediction process of metals’ concentration indicated that exploring the relationship between metals and turbidity values might be one effective technique for efficient estimation and prediction of metal concentration to facilitate better long-term monitoring with high temporal and spatial density. PMID:27028017
Yao, Hong; Zhuang, Wei; Qian, Yu; Xia, Bisheng; Yang, Yang; Qian, Xin
2016-01-01
Turbidity (T) has been widely used to detect the occurrence of pollutants in surface water. Using data collected from January 2013 to June 2014 at eleven sites along two rivers feeding the Taihu Basin, China, the relationship between the concentration of five metals (aluminum (Al), titanium (Ti), nickel (Ni), vanadium (V), lead (Pb)) and turbidity was investigated. Metal concentration was determined using inductively coupled plasma mass spectrometry (ICP-MS). The linear regression of metal concentration and turbidity provided a good fit, with R(2) = 0.86-0.93 for 72 data sets collected in the industrial river and R(2) = 0.60-0.85 for 60 data sets collected in the cleaner river. All the regression presented good linear relationship, leading to the conclusion that the occurrence of the five metals are directly related to suspended solids, and these metal concentration could be approximated using these regression equations. Thus, the linear regression equations were applied to estimate the metal concentration using online turbidity data from January 1 to June 30 in 2014. In the prediction, the WASP 7.5.2 (Water Quality Analysis Simulation Program) model was introduced to interpret the transport and fates of total suspended solids; in addition, metal concentration downstream of the two rivers was predicted. All the relative errors between the estimated and measured metal concentration were within 30%, and those between the predicted and measured values were within 40%. The estimation and prediction process of metals' concentration indicated that exploring the relationship between metals and turbidity values might be one effective technique for efficient estimation and prediction of metal concentration to facilitate better long-term monitoring with high temporal and spatial density.
NASA Astrophysics Data System (ADS)
Isingizwe Nturambirwe, J. Frédéric; Perold, Willem J.; Opara, Umezuruike L.
2016-02-01
Near infrared (NIR) spectroscopy has gained extensive use in quality evaluation. It is arguably one of the most advanced spectroscopic tools in non-destructive quality testing of food stuff, from measurement to data analysis and interpretation. NIR spectral data are interpreted through means often involving multivariate statistical analysis, sometimes associated with optimisation techniques for model improvement. The objective of this research was to explore the extent to which genetic algorithms (GA) can be used to enhance model development, for predicting fruit quality. Apple fruits were used, and NIR spectra in the range from 12000 to 4000 cm-1 were acquired on both bruised and healthy tissues, with different degrees of mechanical damage. GAs were used in combination with partial least squares regression methods to develop bruise severity prediction models, and compared to PLS models developed using the full NIR spectrum. A classification model was developed, which clearly separated bruised from unbruised apple tissue. GAs helped improve prediction models by over 10%, in comparison with full spectrum-based models, as evaluated in terms of error of prediction (Root Mean Square Error of Cross-validation). PLS models to predict internal quality, such as sugar content and acidity were developed and compared to the versions optimized by genetic algorithm. Overall, the results highlighted the potential use of GA method to improve speed and accuracy of fruit quality prediction.
Creasy, John M; Midya, Abhishek; Chakraborty, Jayasree; Adams, Lauryn B; Gomes, Camilla; Gonen, Mithat; Seastedt, Kenneth P; Sutton, Elizabeth J; Cercek, Andrea; Kemeny, Nancy E; Shia, Jinru; Balachandran, Vinod P; Kingham, T Peter; Allen, Peter J; DeMatteo, Ronald P; Jarnagin, William R; D'Angelica, Michael I; Do, Richard K G; Simpson, Amber L
2018-06-19
This study investigates whether quantitative image analysis of pretreatment CT scans can predict volumetric response to chemotherapy for patients with colorectal liver metastases (CRLM). Patients treated with chemotherapy for CRLM (hepatic artery infusion (HAI) combined with systemic or systemic alone) were included in the study. Patients were imaged at baseline and approximately 8 weeks after treatment. Response was measured as the percentage change in tumour volume from baseline. Quantitative imaging features were derived from the index hepatic tumour on pretreatment CT, and features statistically significant on univariate analysis were included in a linear regression model to predict volumetric response. The regression model was constructed from 70% of data, while 30% were reserved for testing. Test data were input into the trained model. Model performance was evaluated with mean absolute prediction error (MAPE) and R 2 . Clinicopatholologic factors were assessed for correlation with response. 157 patients were included, split into training (n = 110) and validation (n = 47) sets. MAPE from the multivariate linear regression model was 16.5% (R 2 = 0.774) and 21.5% in the training and validation sets, respectively. Stratified by HAI utilisation, MAPE in the validation set was 19.6% for HAI and 25.1% for systemic chemotherapy alone. Clinical factors associated with differences in median tumour response were treatment strategy, systemic chemotherapy regimen, age and KRAS mutation status (p < 0.05). Quantitative imaging features extracted from pretreatment CT are promising predictors of volumetric response to chemotherapy in patients with CRLM. Pretreatment predictors of response have the potential to better select patients for specific therapies. • Colorectal liver metastases (CRLM) are downsized with chemotherapy but predicting the patients that will respond to chemotherapy is currently not possible. • Heterogeneity and enhancement patterns of CRLM can be measured with quantitative imaging. • Prediction model constructed that predicts volumetric response with 20% error suggesting that quantitative imaging holds promise to better select patients for specific treatments.
NASA Astrophysics Data System (ADS)
Tao, Yulong; Miao, Yunshui; Han, Jiaqi; Yan, Feiyun
2018-05-01
Aiming at the low accuracy of traditional forecasting methods such as linear regression method, this paper presents a prediction method for predicting the relationship between bridge steel box girder and its displacement with wavelet neural network. Compared with traditional forecasting methods, this scheme has better local characteristics and learning ability, which greatly improves the prediction ability of deformation. Through analysis of the instance and found that after compared with the traditional prediction method based on wavelet neural network, the rigid beam deformation prediction accuracy is higher, and is superior to the BP neural network prediction results, conform to the actual demand of engineering design.
Kendrick, Sarah K; Zheng, Qi; Garbett, Nichola C; Brock, Guy N
2017-01-01
DSC is used to determine thermally-induced conformational changes of biomolecules within a blood plasma sample. Recent research has indicated that DSC curves (or thermograms) may have different characteristics based on disease status and, thus, may be useful as a monitoring and diagnostic tool for some diseases. Since thermograms are curves measured over a range of temperature values, they are considered functional data. In this paper we apply functional data analysis techniques to analyze differential scanning calorimetry (DSC) data from individuals from the Lupus Family Registry and Repository (LFRR). The aim was to assess the effect of lupus disease status as well as additional covariates on the thermogram profiles, and use FD analysis methods to create models for classifying lupus vs. control patients on the basis of the thermogram curves. Thermograms were collected for 300 lupus patients and 300 controls without lupus who were matched with diseased individuals based on sex, race, and age. First, functional regression with a functional response (DSC) and categorical predictor (disease status) was used to determine how thermogram curve structure varied according to disease status and other covariates including sex, race, and year of birth. Next, functional logistic regression with disease status as the response and functional principal component analysis (FPCA) scores as the predictors was used to model the effect of thermogram structure on disease status prediction. The prediction accuracy for patients with Osteoarthritis and Rheumatoid Arthritis but without Lupus was also calculated to determine the ability of the classifier to differentiate between Lupus and other diseases. Data were divided 1000 times into separate 2/3 training and 1/3 test data for evaluation of predictions. Finally, derivatives of thermogram curves were included in the models to determine whether they aided in prediction of disease status. Functional regression with thermogram as a functional response and disease status as predictor showed a clear separation in thermogram curve structure between cases and controls. The logistic regression model with FPCA scores as the predictors gave the most accurate results with a mean 79.22% correct classification rate with a mean sensitivity = 79.70%, and specificity = 81.48%. The model correctly classified OA and RA patients without Lupus as controls at a rate of 75.92% on average with a mean sensitivity = 79.70% and specificity = 77.6%. Regression models including FPCA scores for derivative curves did not perform as well, nor did regression models including covariates. Changes in thermograms observed in the disease state likely reflect covalent modifications of plasma proteins or changes in large protein-protein interacting networks resulting in the stabilization of plasma proteins towards thermal denaturation. By relating functional principal components from thermograms to disease status, our Functional Principal Component Analysis model provides results that are more easily interpretable compared to prior studies. Further, the model could also potentially be coupled with other biomarkers to improve diagnostic classification for lupus.
Khazraee, S Hadi; Johnson, Valen; Lord, Dominique
2018-08-01
The Poisson-gamma (PG) and Poisson-lognormal (PLN) regression models are among the most popular means for motor vehicle crash data analysis. Both models belong to the Poisson-hierarchical family of models. While numerous studies have compared the overall performance of alternative Bayesian Poisson-hierarchical models, little research has addressed the impact of model choice on the expected crash frequency prediction at individual sites. This paper sought to examine whether there are any trends among candidate models predictions e.g., that an alternative model's prediction for sites with certain conditions tends to be higher (or lower) than that from another model. In addition to the PG and PLN models, this research formulated a new member of the Poisson-hierarchical family of models: the Poisson-inverse gamma (PIGam). Three field datasets (from Texas, Michigan and Indiana) covering a wide range of over-dispersion characteristics were selected for analysis. This study demonstrated that the model choice can be critical when the calibrated models are used for prediction at new sites, especially when the data are highly over-dispersed. For all three datasets, the PIGam model would predict higher expected crash frequencies than would the PLN and PG models, in order, indicating a clear link between the models predictions and the shape of their mixing distributions (i.e., gamma, lognormal, and inverse gamma, respectively). The thicker tail of the PIGam and PLN models (in order) may provide an advantage when the data are highly over-dispersed. The analysis results also illustrated a major deficiency of the Deviance Information Criterion (DIC) in comparing the goodness-of-fit of hierarchical models; models with drastically different set of coefficients (and thus predictions for new sites) may yield similar DIC values, because the DIC only accounts for the parameters in the lowest (observation) level of the hierarchy and ignores the higher levels (regression coefficients). Copyright © 2018. Published by Elsevier Ltd.
Prediction model of critical weight loss in cancer patients during particle therapy.
Zhang, Zhihong; Zhu, Yu; Zhang, Lijuan; Wang, Ziying; Wan, Hongwei
2018-01-01
The objective of this study is to investigate the predictors of critical weight loss in cancer patients receiving particle therapy, and build a prediction model based on its predictive factors. Patients receiving particle therapy were enroled between June 2015 and June 2016. Body weight was measured at the start and end of particle therapy. Association between critical weight loss (defined as >5%) during particle therapy and patients' demographic, clinical characteristic, pre-therapeutic nutrition risk screening (NRS 2002) and BMI were evaluated by logistic regression and decision tree analysis. Finally, 375 cancer patients receiving particle therapy were included. Mean weight loss was 0.55 kg, and 11.5% of patients experienced critical weight loss during particle therapy. The main predictors of critical weight loss during particle therapy were head and neck tumour location, total radiation dose ≥70 Gy on the primary tumour, and without post-surgery, as indicated by both logistic regression and decision tree analysis. Prediction model that includes tumour locations, total radiation dose and post-surgery had a good predictive ability, with the area under receiver operating characteristic curve 0.79 (95% CI: 0.71-0.88) and 0.78 (95% CI: 0.69-0.86) for decision tree and logistic regression model, respectively. Cancer patients with head and neck tumour location, total radiation dose ≥70 Gy and without post-surgery were at higher risk of critical weight loss during particle therapy, and early intensive nutrition counselling or intervention should be target at this population. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Kamruzzaman, Mohammed; Sun, Da-Wen; ElMasry, Gamal; Allen, Paul
2013-01-15
Many studies have been carried out in developing non-destructive technologies for predicting meat adulteration, but there is still no endeavor for non-destructive detection and quantification of adulteration in minced lamb meat. The main goal of this study was to develop and optimize a rapid analytical technique based on near-infrared (NIR) hyperspectral imaging to detect the level of adulteration in minced lamb. Initial investigation was carried out using principal component analysis (PCA) to identify the most potential adulterate in minced lamb. Minced lamb meat samples were then adulterated with minced pork in the range 2-40% (w/w) at approximately 2% increments. Spectral data were used to develop a partial least squares regression (PLSR) model to predict the level of adulteration in minced lamb. Good prediction model was obtained using the whole spectral range (910-1700 nm) with a coefficient of determination (R(2)(cv)) of 0.99 and root-mean-square errors estimated by cross validation (RMSECV) of 1.37%. Four important wavelengths (940, 1067, 1144 and 1217 nm) were selected using weighted regression coefficients (Bw) and a multiple linear regression (MLR) model was then established using these important wavelengths to predict adulteration. The MLR model resulted in a coefficient of determination (R(2)(cv)) of 0.98 and RMSECV of 1.45%. The developed MLR model was then applied to each pixel in the image to obtain prediction maps to visualize the distribution of adulteration of the tested samples. The results demonstrated that the laborious and time-consuming tradition analytical techniques could be replaced by spectral data in order to provide rapid, low cost and non-destructive testing technique for adulterate detection in minced lamb meat. Copyright © 2012 Elsevier B.V. All rights reserved.
Li, Min; Zhang, Lu; Yao, Xiaolong; Jiang, Xingyu
2017-01-01
The emerging membrane introduction mass spectrometry technique has been successfully used to detect benzene, toluene, ethyl benzene and xylene (BTEX), while overlapped spectra have unfortunately hindered its further application to the analysis of mixtures. Multivariate calibration, an efficient method to analyze mixtures, has been widely applied. In this paper, we compared univariate and multivariate analyses for quantification of the individual components of mixture samples. The results showed that the univariate analysis creates poor models with regression coefficients of 0.912, 0.867, 0.440 and 0.351 for BTEX, respectively. For multivariate analysis, a comparison to the partial-least squares (PLS) model shows that the orthogonal partial-least squares (OPLS) regression exhibits an optimal performance with regression coefficients of 0.995, 0.999, 0.980 and 0.976, favorable calibration parameters (RMSEC and RMSECV) and a favorable validation parameter (RMSEP). Furthermore, the OPLS exhibits a good recovery of 73.86 - 122.20% and relative standard deviation (RSD) of the repeatability of 1.14 - 4.87%. Thus, MIMS coupled with the OPLS regression provides an optimal approach for a quantitative BTEX mixture analysis in monitoring and predicting water pollution.
Modeling vertebrate diversity in Oregon using satellite imagery
NASA Astrophysics Data System (ADS)
Cablk, Mary Elizabeth
Vertebrate diversity was modeled for the state of Oregon using a parametric approach to regression tree analysis. This exploratory data analysis effectively modeled the non-linear relationships between vertebrate richness and phenology, terrain, and climate. Phenology was derived from time-series NOAA-AVHRR satellite imagery for the year 1992 using two methods: principal component analysis and derivation of EROS data center greenness metrics. These two measures of spatial and temporal vegetation condition incorporated the critical temporal element in this analysis. The first three principal components were shown to contain spatial and temporal information about the landscape and discriminated phenologically distinct regions in Oregon. Principal components 2 and 3, 6 greenness metrics, elevation, slope, aspect, annual precipitation, and annual seasonal temperature difference were investigated as correlates to amphibians, birds, all vertebrates, reptiles, and mammals. Variation explained for each regression tree by taxa were: amphibians (91%), birds (67%), all vertebrates (66%), reptiles (57%), and mammals (55%). Spatial statistics were used to quantify the pattern of each taxa and assess validity of resulting predictions from regression tree models. Regression tree analysis was relatively robust against spatial autocorrelation in the response data and graphical results indicated models were well fit to the data.
Allyn, Jérôme; Allou, Nicolas; Augustin, Pascal; Philip, Ivan; Martinet, Olivier; Belghiti, Myriem; Provenchere, Sophie; Montravers, Philippe; Ferdynus, Cyril
2017-01-01
Background The benefits of cardiac surgery are sometimes difficult to predict and the decision to operate on a given individual is complex. Machine Learning and Decision Curve Analysis (DCA) are recent methods developed to create and evaluate prediction models. Methods and finding We conducted a retrospective cohort study using a prospective collected database from December 2005 to December 2012, from a cardiac surgical center at University Hospital. The different models of prediction of mortality in-hospital after elective cardiac surgery, including EuroSCORE II, a logistic regression model and a machine learning model, were compared by ROC and DCA. Of the 6,520 patients having elective cardiac surgery with cardiopulmonary bypass, 6.3% died. Mean age was 63.4 years old (standard deviation 14.4), and mean EuroSCORE II was 3.7 (4.8) %. The area under ROC curve (IC95%) for the machine learning model (0.795 (0.755–0.834)) was significantly higher than EuroSCORE II or the logistic regression model (respectively, 0.737 (0.691–0.783) and 0.742 (0.698–0.785), p < 0.0001). Decision Curve Analysis showed that the machine learning model, in this monocentric study, has a greater benefit whatever the probability threshold. Conclusions According to ROC and DCA, machine learning model is more accurate in predicting mortality after elective cardiac surgery than EuroSCORE II. These results confirm the use of machine learning methods in the field of medical prediction. PMID:28060903
Hamilton, C A; Miller, A; Casablanca, Y; Horowitz, N S; Rungruang, B; Krivak, T C; Richard, S D; Rodriguez, N; Birrer, M J; Backes, F J; Geller, M A; Quinn, M; Goodheart, M J; Mutch, D G; Kavanagh, J J; Maxwell, G L; Bookman, M A
2018-02-01
To identify clinicopathologic factors associated with 10-year overall survival in epithelial ovarian cancer (EOC) and primary peritoneal cancer (PPC), and to develop a predictive model identifying long-term survivors. Demographic, surgical, and clinicopathologic data were abstracted from GOG 182 records. The association between clinical variables and long-term survival (LTS) (>10years) was assessed using multivariable regression analysis. Bootstrap methods were used to develop predictive models from known prognostic clinical factors and predictive accuracy was quantified using optimism-adjusted area under the receiver operating characteristic curve (AUC). The analysis dataset included 3010 evaluable patients, of whom 195 survived greater than ten years. These patients were more likely to have better performance status, endometrioid histology, stage III (rather than stage IV) disease, absence of ascites, less extensive preoperative disease distribution, microscopic disease residual following cyoreduction (R0), and decreased complexity of surgery (p<0.01). Multivariable regression analysis revealed that lower CA-125 levels, absence of ascites, stage, and R0 were significant independent predictors of LTS. A predictive model created using these variables had an AUC=0.729, which outperformed any of the individual predictors. The absence of ascites, a low CA-125, stage, and R0 at the time of cytoreduction are factors associated with LTS when controlling for other confounders. An extensively annotated clinicopathologic prediction model for LTS fell short of clinical utility suggesting that prognostic molecular profiles are needed to better predict which patients are likely to be long-term survivors. Published by Elsevier Inc.
Hamilton, C. A.; Miller, A.; Casablanca, Y.; Horowitz, N. S.; Rungruang, B.; Krivak, T. C.; Richard, S. D.; Rodriguez, N.; Birrer, M.J.; Backes, F.J.; Geller, M.A.; Quinn, M.; Goodheart, M.J.; Mutch, D.G.; Kavanagh, J.J.; Maxwell, G. L.; Bookman, M. A.
2018-01-01
Objective To identify clinicopathologic factors associated with 10-year overall survival in epithelial ovarian cancer (EOC) and primary peritoneal cancer (PPC), and to develop a predictive model identifying long-term survivors. Methods Demographic, surgical, and clinicopathologic data were abstracted from GOG 182 records. The association between clinical variables and long-term survival (LTS) (>10 years) was assessed using multivariable regression analysis. Bootstrap methods were used to develop predictive models from known prognostic clinical factors and predictive accuracy was quantified using optimism-adjusted area under the receiver operating characteristic curve (AUC). Results The analysis dataset included 3,010 evaluable patients, of whom 195 survived greater than ten years. These patients were more likely to have better performance status, endometrioid histology, stage III (rather than stage IV) disease, absence of ascites, less extensive preoperative disease distribution, microscopic disease residual following cyoreduction (R0), and decreased complexity of surgery (p<0.01). Multivariable regression analysis revealed that lower CA-125 levels, absence of ascites, stage, and R0 were significant independent predictors of LTS. A predictive model created using these variables had an AUC=0.729, which outperformed any of the individual predictors. Conclusions The absence of ascites, a low CA-125, stage, and R0 at the time of cytoreduction are factors associated with LTS when controlling for other confounders. An extensively annotated clinicopathologic prediction model for LTS fell short of clinical utility suggesting that prognostic molecular profiles are needed to better predict which patients are likely to be long-term survivors. PMID:29195926
NASA Astrophysics Data System (ADS)
Li, Xiao Ju; Yao, Kun; Dai, Jun Yu; Song, Yun Long
2018-05-01
The underground space, also known as the “fourth dimension” of the city, reflects the efficient use of urban development intensive. Urban traffic link tunnel is a typical underground limited-length space. Due to the geographical location, the special structure of space and the curvature of the tunnel, high-temperature smoke can easily form the phenomenon of “smoke turning” and the fire risk is extremely high. This paper takes an urban traffic link tunnel as an example to focus on the relationship between curvature and the temperature near the fire source, and use the pyrosim built different curvature fire model to analyze the influence of curvature on the temperature of the fire, then using SPSS Multivariate regression analysis simulate curvature of the tunnel and fire temperature data. Finally, a prediction model of urban traffic link tunnel curvature on fire temperature was proposed. The regression model analysis and test show that the curvature is negatively correlated with the tunnel temperature. This model is feasible and can provide a theoretical reference for the urban traffic link tunnel fire protection design and the preparation of the evacuation plan. And also, it provides some reference for other related curved tunnel curvature design and smoke control measures.
Loftin, Mark; Waddell, Dwight E; Robinson, James H; Owens, Scott G
2010-10-01
We compared the energy expenditure to walk or run a mile in adult normal weight walkers (NWW), overweight walkers (OW), and marathon runners (MR). The sample consisted of 19 NWW, 11 OW, and 20 MR adults. Energy expenditure was measured at preferred walking speed (NWW and OW) and running speed of a recently completed marathon. Body composition was assessed via dual-energy x-ray absorptiometry. Analysis of variance was used to compare groups with the Scheffe's procedure used for post hoc analysis. Multiple regression analysis was used to predict energy expenditure. Results that indicated OW exhibited significantly higher (p < 0.05) mass and fat weight than NWW or MR. Similar values were found between NWW and MR. Absolute energy expenditure to walk or run a mile was similar between groups (NWW 93.9 ± 15.0, OW 98.4 ± 29.9, MR 99.3 ± 10.8 kcal); however, significant differences were noted when energy expenditure was expressed relative to mass (MR > NWW > OW). When energy expenditure was expressed per kilogram of fat-free mass, similar values were found across groups. Multiple regression analysis yielded mass and gender as significant predictors of energy expenditure (R = 0.795, SEE = 10.9 kcal). We suggest that walking is an excellent physical activity for energy expenditure in overweight individuals that are capable of walking without predisposed conditions such as osteoarthritis or cardiovascular risk factors. Moreover, from a practical perspective, our regression equation (kcal = mass (kg) × 0.789 - gender (men = 1, women = 2) × 7.634 + 51.109) allows for the prediction of energy expenditure for a given distance (mile) rather than predicting energy expenditure for a given time (minutes).
School Progress Among Children of Same-Sex Couples.
Watkins, Caleb S
2018-06-01
This study uses logit regressions on a pooled sample of children from the 2012, 2013, and 2014 American Community Survey to perform a nationally representative analysis of school progress for a large sample of 4,430 children who reside with same-sex couples. Odds ratios from regressions that compare children between different-sex married couples and same-sex couples fail to show significant differences in normal school progress between households across a variety of sample compositions. Likewise, marginal effects from regressions that compare children with similar family dynamics between different-sex married couples and same-sex couples fail to predict significantly higher probabilities of grade retention for children of same-sex couples. Significantly lower grade retention rates are sometimes predicted for children of same-sex couples than for different-sex married couples, but these differences are sensitive to sample exclusions and do not indicate causal benefits to same-sex parenting.
Pekala, Ronald J; Baglio, Francesca; Cabinio, Monia; Lipari, Susanna; Baglio, Gisella; Mendozzi, Laura; Cecconi, Pietro; Pugnetti, Luigi; Sciaky, Riccardo
2017-01-01
Previous research using stepwise regression analyses found self-reported hypnotic depth (srHD) to be a function of suggestibility, trance state effects, and expectancy. This study sought to replicate and expand that research using a general state measure of hypnotic responsivity, the Phenomenology of Consciousness Inventory: Hypnotic Assessment Procedure (PCI-HAP). Ninety-five participants completed an Italian translation of the PCI-HAP, with srHD scores predicted from the PCI-HAP assessment items. The regression analysis replicated the previous research results. Additionally, stepwise regression analyses were able to predict the srHD score equally well using only the PCI dimension scores. These results not only replicated prior research but suggest how this methodology to assess hypnotic responsivity, when combined with more traditional neurophysiological and cognitive-behavioral methodologies, may allow for a more comprehensive understanding of that enigma called hypnosis.
The use of bulk collectors in monitoring wet deposition at high-altitude sites in winter
Ranalli, A.J.; Turk, J.T.; Campbell, D.H.
1997-01-01
Concentrations of dissolved ions from samples collected by wet/dry collectors were compared to those collected by bulk collectors at Halfmoon Creek and Ned Wilson Lake in western Colorado to determine if bulk collectors can be used to monitor wet deposition chemistry in remote, high-altitude regions in winter. Hydrogen-ion concentration was significantly lower (p 0.05) at Halfmoon Creek. Wet deposition concentrations were predicated from bulk deposition concentrations through linear regression analysis. Results indicate that anions (chloride, nitrate and sulfate) can be predicted with a high degree of confidence. Lack of significant differences between seasonal (winter and summer) ratios of bulk to wet deposition concentrations indicates that at sites where operation of a wet/dry collector during the winter is not practical, wet deposition concentrations can be predicted from bulk collector samples through regression analysis of wet and bulk deposition data collected during the summer.
Tung, Heng-Hsin; Jan, Ming-Shan; Huang, Chiu-Mieh; Shih, Chun-Che; Chang, Chung-Yi; Liau, Cheu-Ye
2011-01-01
The use of incentive spirometry (IS) is reported to prevent and treat postoperative pulmonary complications. This study sought to use the theory of planned behavior to predict the use of IS in this population. The study used a prospective design, with convenience sampling, to recruit a total of 116 postcardiac-surgery patients from 2 medical centers in Taipei, Taiwan, from November 2008 to May 2009. Data were collected through 2 instruments: a demographic questionnaire, and an IS questionnaire. Descriptive analysis, independent t test, one-way analysis of variance, binary regression, and liner regression were used to analyze the data. Perceived behavioral control, but not intention, was a predictor of the use of IS. Our findings provide partial support for the utility of the theory of planned behavior in explaining the use of IS behavior for cardiac surgery patients. Copyright © 2011. Published by Mosby, Inc.
Li, Zhongwei; Xin, Yuezhen; Wang, Xun; Sun, Beibei; Xia, Shengyu; Li, Hui
2016-01-01
Phellinus is a kind of fungus and is known as one of the elemental components in drugs to avoid cancers. With the purpose of finding optimized culture conditions for Phellinus production in the laboratory, plenty of experiments focusing on single factor were operated and large scale of experimental data were generated. In this work, we use the data collected from experiments for regression analysis, and then a mathematical model of predicting Phellinus production is achieved. Subsequently, a gene-set based genetic algorithm is developed to optimize the values of parameters involved in culture conditions, including inoculum size, PH value, initial liquid volume, temperature, seed age, fermentation time, and rotation speed. These optimized values of the parameters have accordance with biological experimental results, which indicate that our method has a good predictability for culture conditions optimization. PMID:27610365
Impact of job characteristics on psychological health of Chinese single working women.
Yeung, D Y; Tang, C S
2001-01-01
This study aims at investigating the impact of individual and contextual job characteristics of control, psychological and physical demand, and security on psychological distress of 193 Chinese single working women in Hong Kong. The mediating role of job satisfaction in the job characteristics-distress relation is also assessed. Multiple regression analysis results show that job satisfaction mediates the effects of job control and security in predicting psychological distress; whereas psychological job demand has an independent effect on mental distress after considering the effect of job satisfaction. This main effect model indicates that psychological distress is best predicted by small company size, high psychological job demand, and low job satisfaction. Results from a separate regression analysis fails to support the overall combined effect of job demand-control on psychological distress. However, a significant physical job demand-control interaction effect on mental distress is noted, which reduces slightly after controlling the effect of job satisfaction.
Dembo, Richard; Belenko, Steven; Childs, Kristina; Wareham, Jennifer; Schmeidler, James
2009-08-01
High rates of infection for chlamydia and gonorrhea have been noted among youths involved in the juvenile justice system. Although both individual and community-level factors have been found to be associated with sexually transmitted disease (STD) risk, their relative importance has not been tested in this population. A two-level logistic regression analysis was completed to assess the influence of individual-level and community-level predictors on STD test results among arrested youths processed at a centralized intake facility. Results from weighted two level logistic regression analyses (n = 1,368) indicated individual-level factors of gender (being female), age, race (being African American), and criminal history predicted the youths' positive STD status. For the community-level predictors, concentrated disadvantage significantly and positively predicted the youths' STD status. Implications of these findings for future research and public health policy are discussed.
Estimating top-of-atmosphere thermal infrared radiance using MERRA-2 atmospheric data
NASA Astrophysics Data System (ADS)
Kleynhans, Tania; Montanaro, Matthew; Gerace, Aaron; Kanan, Christopher
2017-05-01
Thermal infrared satellite images have been widely used in environmental studies. However, satellites have limited temporal resolution, e.g., 16 day Landsat or 1 to 2 day Terra MODIS. This paper investigates the use of the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) reanalysis data product, produced by NASA's Global Modeling and Assimilation Office (GMAO) to predict global topof-atmosphere (TOA) thermal infrared radiance. The high temporal resolution of the MERRA-2 data product presents opportunities for novel research and applications. Various methods were applied to estimate TOA radiance from MERRA-2 variables namely (1) a parameterized physics based method, (2) Linear regression models and (3) non-linear Support Vector Regression. Model prediction accuracy was evaluated using temporally and spatially coincident Moderate Resolution Imaging Spectroradiometer (MODIS) thermal infrared data as reference data. This research found that Support Vector Regression with a radial basis function kernel produced the lowest error rates. Sources of errors are discussed and defined. Further research is currently being conducted to train deep learning models to predict TOA thermal radiance
A statistical method for predicting seizure onset zones from human single-neuron recordings
NASA Astrophysics Data System (ADS)
Valdez, André B.; Hickman, Erin N.; Treiman, David M.; Smith, Kris A.; Steinmetz, Peter N.
2013-02-01
Objective. Clinicians often use depth-electrode recordings to localize human epileptogenic foci. To advance the diagnostic value of these recordings, we applied logistic regression models to single-neuron recordings from depth-electrode microwires to predict seizure onset zones (SOZs). Approach. We collected data from 17 epilepsy patients at the Barrow Neurological Institute and developed logistic regression models to calculate the odds of observing SOZs in the hippocampus, amygdala and ventromedial prefrontal cortex, based on statistics such as the burst interspike interval (ISI). Main results. Analysis of these models showed that, for a single-unit increase in burst ISI ratio, the left hippocampus was approximately 12 times more likely to contain a SOZ; and the right amygdala, 14.5 times more likely. Our models were most accurate for the hippocampus bilaterally (at 85% average sensitivity), and performance was comparable with current diagnostics such as electroencephalography. Significance. Logistic regression models can be combined with single-neuron recording to predict likely SOZs in epilepsy patients being evaluated for resective surgery, providing an automated source of clinically useful information.
Perceived Organizational Support for Enhancing Welfare at Work: A Regression Tree Model
Giorgi, Gabriele; Dubin, David; Perez, Javier Fiz
2016-01-01
When trying to examine outcomes such as welfare and well-being, research tends to focus on main effects and take into account limited numbers of variables at a time. There are a number of techniques that may help address this problem. For example, many statistical packages available in R provide easy-to-use methods of modeling complicated analysis such as classification and tree regression (i.e., recursive partitioning). The present research illustrates the value of recursive partitioning in the prediction of perceived organizational support in a sample of more than 6000 Italian bankers. Utilizing the tree function party package in R, we estimated a regression tree model predicting perceived organizational support from a multitude of job characteristics including job demand, lack of job control, lack of supervisor support, training, etc. The resulting model appears particularly helpful in pointing out several interactions in the prediction of perceived organizational support. In particular, training is the dominant factor. Another dimension that seems to influence organizational support is reporting (perceived communication about safety and stress concerns). Results are discussed from a theoretical and methodological point of view. PMID:28082924
Barimani, Shirin; Kleinebudde, Peter
2017-10-01
A multivariate analysis method, Science-Based Calibration (SBC), was used for the first time for endpoint determination of a tablet coating process using Raman data. Two types of tablet cores, placebo and caffeine cores, received a coating suspension comprising a polyvinyl alcohol-polyethylene glycol graft-copolymer and titanium dioxide to a maximum coating thickness of 80µm. Raman spectroscopy was used as in-line PAT tool. The spectra were acquired every minute and correlated to the amount of applied aqueous coating suspension. SBC was compared to another well-known multivariate analysis method, Partial Least Squares-regression (PLS) and a simpler approach, Univariate Data Analysis (UVDA). All developed calibration models had coefficient of determination values (R 2 ) higher than 0.99. The coating endpoints could be predicted with root mean square errors (RMSEP) less than 3.1% of the applied coating suspensions. Compared to PLS and UVDA, SBC proved to be an alternative multivariate calibration method with high predictive power. Copyright © 2017 Elsevier B.V. All rights reserved.
Li, Yankun; Shao, Xueguang; Cai, Wensheng
2007-04-15
Consensus modeling of combining the results of multiple independent models to produce a single prediction avoids the instability of single model. Based on the principle of consensus modeling, a consensus least squares support vector regression (LS-SVR) method for calibrating the near-infrared (NIR) spectra was proposed. In the proposed approach, NIR spectra of plant samples were firstly preprocessed using discrete wavelet transform (DWT) for filtering the spectral background and noise, then, consensus LS-SVR technique was used for building the calibration model. With an optimization of the parameters involved in the modeling, a satisfied model was achieved for predicting the content of reducing sugar in plant samples. The predicted results show that consensus LS-SVR model is more robust and reliable than the conventional partial least squares (PLS) and LS-SVR methods.
Prediction model for the return to work of workers with injuries in Hong Kong.
Xu, Yanwen; Chan, Chetwyn C H; Lo, Karen Hui Yu-Ling; Tang, Dan
2008-01-01
This study attempts to formulate a prediction model of return to work for a group of workers who have been suffering from chronic pain and physical injury while also being out of work in Hong Kong. The study used Case-based Reasoning (CBR) method, and compared the result with the statistical method of logistic regression model. The database of the algorithm of CBR was composed of 67 cases who were also used in the logistic regression model. The testing cases were 32 participants who had a similar background and characteristics to those in the database. The methods of setting constraints and Euclidean distance metric were used in CBR to search the closest cases to the trial case based on the matrix. The usefulness of the algorithm was tested on 32 new participants, and the accuracy of predicting return to work outcomes was 62.5%, which was no better than the 71.2% accuracy derived from the logistic regression model. The results of the study would enable us to have a better understanding of the CBR applied in the field of occupational rehabilitation by comparing with the conventional regression analysis. The findings would also shed light on the development of relevant interventions for the return-to-work process of these workers.
Usefulness of nutritional indices and classifications in predicting death of malnourished children.
Briend, A; Dykewicz, C; Graven, K; Mazumder, R N; Wojtyniak, B; Bennish, M
1986-01-01
The usefulness of nutritional indices and classifications in predicting the death of children under 5 years old was evaluated by comparing measurements of 34 children with diarrhoea who died in a Dhaka hospital with those of 318 patients who were discharged in a satisfactory condition. In a logistic regression analysis mid-upper arm circumference was found to be as effective as other nutritional indices in predicting death. Combinations of different indices did not improve the prediction. Arm circumference might be preferable to more complex criteria for predicting the death of malnourished children. PMID:3089529
Determining degree of optic nerve edema from color fundus photography
NASA Astrophysics Data System (ADS)
Agne, Jason; Wang, Jui-Kai; Kardon, Randy H.; Garvin, Mona K.
2015-03-01
Swelling of the optic nerve head (ONH) is subjectively assessed by clinicians using the Frisén scale. It is believed that a direct measurement of the ONH volume would serve as a better representation of the swelling. However, a direct measurement requires optic nerve imaging with spectral domain optical coherence tomography (SD-OCT) and 3D segmentation of the resulting images, which is not always available during clinical evaluation. Furthermore, telemedical imaging of the eye at remote locations is more feasible with non-mydriatic fundus cameras which are less costly than OCT imagers. Therefore, there is a critical need to develop a more quantitative analysis of optic nerve swelling on a continuous scale, similar to SD-OCT. Here, we select features from more commonly available 2D fundus images and use them to predict ONH volume. Twenty-six features were extracted from each of 48 color fundus images. The features include attributes of the blood vessels, optic nerve head, and peripapillary retina areas. These features were used in a regression analysis to predict ONH volume, as computed by a segmentation of the SD-OCT image. The results of the regression analysis yielded a mean square error of 2.43 mm3 and a correlation coefficient between computed and predicted volumes of R = 0:771, which suggests that ONH volume may be predicted from fundus features alone.
Knowledge of heart disease risk in a multicultural community sample of people with diabetes.
Wagner, Julie; Lacey, Kimberly; Abbott, Gina; de Groot, Mary; Chyun, Deborah
2006-06-01
Prevention of coronary heart disease (CHD) is a primary goal of diabetes management. Unfortunately, CHD risk knowledge is poor among people with diabetes. The objective is to determine predictors of CHD risk knowledge in a community sample of people with diabetes. A total of 678 people with diabetes completed the Heart Disease Facts Questionnaire (HDFQ), a valid and reliable measure of knowledge about the relationship between diabetes and heart disease. In regression analysis with demographics predicting HDFQ scores, sex, annual income, education, and health insurance status predicted HDFQ scores. In a separate regression analysis, having CHD risk factors did not predict HDFQ scores, however, taking medication for CHD risk factors did predict higher HDFQ scores. An analysis of variance showed significant differences between ethnic groups for HDFQ scores; Whites (M = 20.9) showed more CHD risk knowledge than African Americans (M = 19.6), who in turn showed more than Latinos (M = 18.2). Asians scored near Whites (M = 20.4) but did not differ significantly from any other group. Controlling for numerous demographic, socioeconomic, health care, diabetes, and cardiovascular health variables, the magnitude of ethnic differences was attenuated, but persisted. Education regarding modifiable risk factors must be delivered in a timely fashion so that lifestyle modification can be implemented and evaluated before pharmacotherapy is deemed necessary. African Americans and Latinos with diabetes are in the greatest need of education regarding CHD risk.
Fingeret, Abbey L; Martinez, Rebecca H; Hsieh, Christine; Downey, Peter; Nowygrod, Roman
2016-02-01
We aim to determine whether observed operations or internet-based video review predict improved performance in the surgery clerkship. A retrospective review of students' usage of surgical videos, observed operations, evaluations, and examination scores were used to construct an exploratory principal component analysis. Multivariate regression was used to determine factors predictive of clerkship performance. Case log data for 231 students revealed a median of 25 observed cases. Students accessed the web-based video platform a median of 15 times. Principal component analysis yielded 4 factors contributing 74% of the variability with a Kaiser-Meyer-Olkin coefficient of .83. Multivariate regression predicted shelf score (P < .0001), internal clinical skills examination score (P < .0001), subjective evaluations (P < .001), and video website utilization (P < .001) but not observed cases to be significantly associated with overall performance. Utilization of a web-based operative video platform during a surgical clerkship is an independently associated with improved clinical reasoning, fund of knowledge, and overall evaluation. Thus, this modality can serve as a useful adjunct to live observation. Copyright © 2016 Elsevier Inc. All rights reserved.
Can biomechanical variables predict improvement in crouch gait?
Hicks, Jennifer L.; Delp, Scott L.; Schwartz, Michael H.
2011-01-01
Many patients respond positively to treatments for crouch gait, yet surgical outcomes are inconsistent and unpredictable. In this study, we developed a multivariable regression model to determine if biomechanical variables and other subject characteristics measured during a physical exam and gait analysis can predict which subjects with crouch gait will demonstrate improved knee kinematics on a follow-up gait analysis. We formulated the model and tested its performance by retrospectively analyzing 353 limbs of subjects who walked with crouch gait. The regression model was able to predict which subjects would demonstrate ‘improved’ and ‘unimproved’ knee kinematics with over 70% accuracy, and was able to explain approximately 49% of the variance in subjects’ change in knee flexion between gait analyses. We found that improvement in stance phase knee flexion was positively associated with three variables that were drawn from knowledge about the biomechanical contributors to crouch gait: i) adequate hamstrings lengths and velocities, possibly achieved via hamstrings lengthening surgery, ii) normal tibial torsion, possibly achieved via tibial derotation osteotomy, and iii) sufficient muscle strength. PMID:21616666
Fadzillah, Nurrulhidayah Ahmad; Man, Yaakob bin Che; Rohman, Abdul; Rosman, Arieff Salleh; Ismail, Amin; Mustafa, Shuhaimi; Khatib, Alfi
2015-01-01
The authentication of food products from the presence of non-allowed components for certain religion like lard is very important. In this study, we used proton Nuclear Magnetic Resonance ((1)H-NMR) spectroscopy for the analysis of butter adulterated with lard by simultaneously quantification of all proton bearing compounds, and consequently all relevant sample classes. Since the spectra obtained were too complex to be analyzed visually by the naked eyes, the classification of spectra was carried out.The multivariate calibration of partial least square (PLS) regression was used for modelling the relationship between actual value of lard and predicted value. The model yielded a highest regression coefficient (R(2)) of 0.998 and the lowest root mean square error calibration (RMSEC) of 0.0091% and root mean square error prediction (RMSEP) of 0.0090, respectively. Cross validation testing evaluates the predictive power of the model. PLS model was shown as good models as the intercept of R(2)Y and Q(2)Y were 0.0853 and -0.309, respectively.
Huang, An-Min; Fei, Ben-Hua; Jiang, Ze-Hui; Hse, Chung-Yun
2007-09-01
Near infrared spectroscopy is widely used as a quantitative method, and the main multivariate techniques consist of regression methods used to build prediction models, however, the accuracy of analysis results will be affected by many factors. In the present paper, the influence of different sample roughness on the mathematical model of NIR quantitative analysis of wood density was studied. The result of experiments showed that if the roughness of predicted samples was consistent with that of calibrated samples, the result was good, otherwise the error would be much higher. The roughness-mixed model was more flexible and adaptable to different sample roughness. The prediction ability of the roughness-mixed model was much better than that of the single-roughness model.
NASA Technical Reports Server (NTRS)
Beck, L. R.; Rodriguez, M. H.; Dister, S. W.; Rodriguez, A. D.; Washino, R. K.; Roberts, D. R.; Spanner, M. A.
1997-01-01
A blind test of two remote sensing-based models for predicting adult populations of Anopheles albimanus in villages, an indicator of malaria transmission risk, was conducted in southern Chiapas, Mexico. One model was developed using a discriminant analysis approach, while the other was based on regression analysis. The models were developed in 1992 for an area around Tapachula, Chiapas, using Landsat Thematic Mapper (TM) satellite data and geographic information system functions. Using two remotely sensed landscape elements, the discriminant model was able to successfully distinguish between villages with high and low An. albimanus abundance with an overall accuracy of 90%. To test the predictive capability of the models, multitemporal TM data were used to generate a landscape map of the Huixtla area, northwest of Tapachula, where the models were used to predict risk for 40 villages. The resulting predictions were not disclosed until the end of the test. Independently, An. albimanus abundance data were collected in the 40 randomly selected villages for which the predictions had been made. These data were subsequently used to assess the models' accuracies. The discriminant model accurately predicted 79% of the high-abundance villages and 50% of the low-abundance villages, for an overall accuracy of 70%. The regression model correctly identified seven of the 10 villages with the highest mosquito abundance. This test demonstrated that remote sensing-based models generated for one area can be used successfully in another, comparable area.
Shiino, A; Nishida, Y; Yasuda, H; Suzuki, M; Matsuda, M; Inubushi, T
2004-01-01
Background: Normal pressure hydrocephalus (NPH) is considered to be a treatable form of dementia, because cerebrospinal fluid (CSF) shunting can lessen symptoms. However, neuroimaging has failed to predict when shunting will be effective. Objective: To investigate whether 1H (proton) magnetic resonance (MR) spectroscopy could predict functional outcome in patients after shunting. Methods: Neurological state including Hasegawa's dementia scale, gait, continence, and the modified Rankin scale were evaluated in 21 patients with secondary NPH who underwent ventriculo-peritoneal shunting. Outcomes were measured postoperatively at one and 12 months and were classified as excellent, fair, or poor. MR spectra were obtained from left hemispheric white matter. Results: Significant preoperative differences in N-acetyl aspartate (NAA)/creatine (Cr) and NAA/choline (Cho) were noted between patients with excellent and poor outcome at one month (p = 0.0014 and 0.0036, respectively). Multiple regression analysis linked higher preoperative NAA/Cr ratio, gait score, and modified Rankin scale to better one month outcome. Predictive value, sensitivity, and specificity for excellent outcome following shunting were 95.2%, 100%, and 87.5%. Multiple regression analysis indicated that NAA/Cho had the best predictive value for one year outcome (p = 0.0032); predictive value, sensitivity, and specificity were 89.5%, 90.0%, and 88.9%. Conclusions: MR spectroscopy predicted long term post-shunting outcomes in patients with secondary NPH, and it would be a useful assessment tool before lumbar drainage. PMID:15258216
NASA Technical Reports Server (NTRS)
Gentry, R. C.; Rodgers, E.; Steranka, J.; Shenk, W. E.
1978-01-01
A regression technique was developed to forecast 24 hour changes of the maximum winds for weak (maximum winds less than or equal to 65 Kt) and strong (maximum winds greater than 65 Kt) tropical cyclones by utilizing satellite measured equivalent blackbody temperatures around the storm alone and together with the changes in maximum winds during the preceding 24 hours and the current maximum winds. Independent testing of these regression equations shows that the mean errors made by the equations are lower than the errors in forecasts made by the peristence techniques.
Role of subdural electrocorticography in prediction of long-term seizure outcome in epilepsy surgery
Juhász, Csaba; Shah, Aashit; Sood, Sandeep; Chugani, Harry T.
2009-01-01
Since prediction of long-term seizure outcome using preoperative diagnostic modalities remains suboptimal in epilepsy surgery, we evaluated whether interictal spike frequency measures obtained from extraoperative subdural electrocorticography (ECoG) recording could predict long-term seizure outcome. This study included 61 young patients (age 0.4–23.0 years), who underwent extraoperative ECoG recording prior to cortical resection for alleviation of uncontrolled focal seizures. Patient age, frequency of preoperative seizures, neuroimaging findings, ictal and interictal ECoG measures were preoperatively obtained. The seizure outcome was prospectively measured [follow-up period: 2.5–6.4 years (mean 4.6 years)]. Univariate and multivariate logistic regression analyses determined how well preoperative demographic and diagnostic measures predicted long-term seizure outcome. Following the initial cortical resection, Engel Class I, II, III and IV outcomes were noted in 35, 6, 12 and 7 patients, respectively. One child died due to disseminated intravascular coagulation associated with pseudomonas sepsis 2 days after surgery. Univariate regression analyses revealed that incomplete removal of seizure onset zone, higher interictal spike-frequency in the preserved cortex and incomplete removal of cortical abnormalities on neuroimaging were associated with a greater risk of failing to obtain Class I outcome. Multivariate logistic regression analysis revealed that incomplete removal of seizure onset zone was the only independent predictor of failure to obtain Class I outcome. The goodness of regression model fit and the predictive ability of regression model were greatest in the full regression model incorporating both ictal and interictal measures [R2 0.44; Area under the receiver operating characteristic (ROC) curve: 0.81], slightly smaller in the reduced model incorporating ictal but not interictal measures (R2 0.40; Area under the ROC curve: 0.79) and slightly smaller again in the reduced model incorporating interictal but not ictal measures (R2 0.27; Area under the ROC curve: 0.77). Seizure onset zone and interictal spike frequency measures on subdural ECoG recording may both be useful in predicting the long-term seizure outcome of epilepsy surgery. Yet, the additive clinical impact of interictal spike frequency measures to predict long-term surgical outcome may be modest in the presence of ictal ECoG and neuroimaging data. PMID:19286694
Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert M.
2013-01-01
A new regression model search algorithm was developed that may be applied to both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The algorithm is a simplified version of a more complex algorithm that was originally developed for the NASA Ames Balance Calibration Laboratory. The new algorithm performs regression model term reduction to prevent overfitting of data. It has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a regression model search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression model. Therefore, the simplified algorithm is not intended to replace the original algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new search algorithm.
Saqr, Mohammed; Fors, Uno; Tedre, Matti
2018-02-06
Collaborative learning facilitates reflection, diversifies understanding and stimulates skills of critical and higher-order thinking. Although the benefits of collaborative learning have long been recognized, it is still rarely studied by social network analysis (SNA) in medical education, and the relationship of parameters that can be obtained via SNA with students' performance remains largely unknown. The aim of this work was to assess the potential of SNA for studying online collaborative clinical case discussions in a medical course and to find out which activities correlate with better performance and help predict final grade or explain variance in performance. Interaction data were extracted from the learning management system (LMS) forum module of the Surgery course in Qassim University, College of Medicine. The data were analyzed using social network analysis. The analysis included visual as well as a statistical analysis. Correlation with students' performance was calculated, and automatic linear regression was used to predict students' performance. By using social network analysis, we were able to analyze a large number of interactions in online collaborative discussions and gain an overall insight of the course social structure, track the knowledge flow and the interaction patterns, as well as identify the active participants and the prominent discussion moderators. When augmented with calculated network parameters, SNA offered an accurate view of the course network, each user's position, and level of connectedness. Results from correlation coefficients, linear regression, and logistic regression indicated that a student's position and role in information relay in online case discussions, combined with the strength of that student's network (social capital), can be used as predictors of performance in relevant settings. By using social network analysis, researchers can analyze the social structure of an online course and reveal important information about students' and teachers' interactions that can be valuable in guiding teachers, improve students' engagement, and contribute to learning analytics insights.
Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W
2015-08-01
Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.
Decreased levels of sRAGE in follicular fluid from patients with PCOS.
Wang, BiJun; Li, Jing; Yang, QingLing; Zhang, FuLi; Hao, MengMeng; Guo, YiHong
2017-03-01
This study aimed to explore the association between soluble receptor for advanced glycation end products (sRAGE) levels in follicular fluid and the number of oocytes retrieved and to evaluate the effect of sRAGE on vascular endothelial growth factor (VEGF) in granulosa cells in patients with polycystic ovarian syndrome (PCOS). Two sets of experiments were performed in this study. In part one, sRAGE and VEGF protein levels in follicular fluid samples from 39 patients with PCOS and 35 non-PCOS patients were measured by ELISA. In part two, ovarian granulosa cells were isolated from an additional 10 patients with PCOS and cultured. VEGF and SP1 mRNA and protein levels, as well as pAKT levels, were detected by real-time PCR and Western blotting after cultured cells were treated with different concentrations of sRAGE. Compared with the non-PCOS patients, patients with PCOS had lower sRAGE levels in follicular fluid. Multi-adjusted regression analysis showed that high sRAGE levels in follicular fluid predicted a lower Gn dose, more oocytes retrieved, and a better IVF outcome in the non-PCOS group. Logistic regression analysis showed that higher sRAGE levels predicted favorably IVF outcomes in the non-PCOS group. Multi-adjusted regression analysis also showed that high sRAGE levels in follicular fluid predicted a lower Gn dose in the PCOS group. Treating granulosa cells isolated from patients with PCOS with recombinant sRAGE decreased VEGF and SP1 mRNA and protein expression and pAKT levels in a dose-dependent manner. © 2017 Society for Reproduction and Fertility.
Analysis of regression methods for solar activity forecasting
NASA Technical Reports Server (NTRS)
Lundquist, C. A.; Vaughan, W. W.
1979-01-01
The paper deals with the potential use of the most recent solar data to project trends in the next few years. Assuming that a mode of solar influence on weather can be identified, advantageous use of that knowledge presumably depends on estimating future solar activity. A frequently used technique for solar cycle predictions is a linear regression procedure along the lines formulated by McNish and Lincoln (1949). The paper presents a sensitivity analysis of the behavior of such regression methods relative to the following aspects: cycle minimum, time into cycle, composition of historical data base, and unnormalized vs. normalized solar cycle data. Comparative solar cycle forecasts for several past cycles are presented as to these aspects of the input data. Implications for the current cycle, No. 21, are also given.
ERIC Educational Resources Information Center
Ying, Yu-Wen; Han, Meekyung
2008-01-01
The study examined variation in the prediction of adjustment in Taiwanese students by ethnic density. A total of 155 Taiwanese students were assessed via survey pre-departure and three times post-arrival in the United States. Hierarchical regression analysis showed students on campuses with fewer other Taiwanese peers formed more friendships with…
NASA Astrophysics Data System (ADS)
Baydaroğlu, Özlem; Koçak, Kasım; Duran, Kemal
2018-06-01
Prediction of water amount that will enter the reservoirs in the following month is of vital importance especially for semi-arid countries like Turkey. Climate projections emphasize that water scarcity will be one of the serious problems in the future. This study presents a methodology for predicting river flow for the subsequent month based on the time series of observed monthly river flow with hybrid models of support vector regression (SVR). Monthly river flow over the period 1940-2012 observed for the Kızılırmak River in Turkey has been used for training the method, which then has been applied for predictions over a period of 3 years. SVR is a specific implementation of support vector machines (SVMs), which transforms the observed input data time series into a high-dimensional feature space (input matrix) by way of a kernel function and performs a linear regression in this space. SVR requires a special input matrix. The input matrix was produced by wavelet transforms (WT), singular spectrum analysis (SSA), and a chaotic approach (CA) applied to the input time series. WT convolutes the original time series into a series of wavelets, and SSA decomposes the time series into a trend, an oscillatory and a noise component by singular value decomposition. CA uses a phase space formed by trajectories, which represent the dynamics producing the time series. These three methods for producing the input matrix for the SVR proved successful, while the SVR-WT combination resulted in the highest coefficient of determination and the lowest mean absolute error.
Malomane, Dorcus Kholofelo; Norris, David; Banga, Cuthbert B; Ngambi, Jones W
2014-02-01
Body weight and weight of body parts are of economic importance. It is difficult to directly predict body weight from highly correlated morphological traits through multiple regression. Factor analysis was carried out to examine the relationship between body weight and five linear body measurements (body length, body girth, wing length, shank thickness, and shank length) in South African Venda (VN), Naked neck (NN), and Potchefstroom koekoek (PK) indigenous chicken breeds, with a view to identify those factors that define body conformation. Multiple regression was subsequently performed to predict body weight, using orthogonal traits derived from the factor analysis. Measurements were obtained from 210 chickens, 22 weeks of age, 70 chickens per breed. High correlations were obtained between body weight and all body measurements except for wing length in PK. Two factors extracted after varimax rotation explained 91, 95, and 83% of total variation in VN, NN, and PK, respectively. Factor 1 explained 73, 90, and 64% in VN, NN, and PK, respectively, and was loaded on all body measurements except for wing length in VN and PK. In a multiple regression, these two factors accounted for 72% variation in body weight in VN, while only factor 1 accounted for 83 and 74% variation in body weight in NN and PK, respectively. The two factors could be used to define body size and conformation of these breeds. Factor 1 could predict body weight in all three breeds. Body measurements can be better selected jointly to improve body weight in these breeds.
Visentin, G; McDermott, A; McParland, S; Berry, D P; Kenny, O A; Brodkorb, A; Fenelon, M A; De Marchi, M
2015-09-01
Rapid, cost-effective monitoring of milk technological traits is a significant challenge for dairy industries specialized in cheese manufacturing. The objective of the present study was to investigate the ability of mid-infrared spectroscopy to predict rennet coagulation time, curd-firming time, curd firmness at 30 and 60min after rennet addition, heat coagulation time, casein micelle size, and pH in cow milk samples, and to quantify associations between these milk technological traits and conventional milk quality traits. Samples (n=713) were collected from 605 cows from multiple herds; the samples represented multiple breeds, stages of lactation, parities, and milking times. Reference analyses were undertaken in accordance with standardized methods, and mid-infrared spectra in the range of 900 to 5,000cm(-1) were available for all samples. Prediction models were developed using partial least squares regression, and prediction accuracy was based on both cross and external validation. The proportion of variance explained by the prediction models in external validation was greatest for pH (71%), followed by rennet coagulation time (55%) and milk heat coagulation time (46%). Models to predict curd firmness 60min from rennet addition and casein micelle size, however, were poor, explaining only 25 and 13%, respectively, of the total variance in each trait within external validation. On average, all prediction models tended to be unbiased. The linear regression coefficient of the reference value on the predicted value varied from 0.17 (casein micelle size regression model) to 0.83 (pH regression model) but all differed from 1. The ratio performance deviation of 1.07 (casein micelle size prediction model) to 1.79 (pH prediction model) for all prediction models in the external validation was <2, suggesting that none of the prediction models could be used for analytical purposes. With the exception of casein micelle size and curd firmness at 60min after rennet addition, the developed prediction models may be useful as a screening method, because the concordance correlation coefficient ranged from 0.63 (heat coagulation time prediction model) to 0.84 (pH prediction model) in the external validation. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert; Bader, Jon B.
2009-01-01
Calibration data of a wind tunnel sting balance was processed using a search algorithm that identifies an optimized regression model for the data analysis. The selected sting balance had two moment gages that were mounted forward and aft of the balance moment center. The difference and the sum of the two gage outputs were fitted in the least squares sense using the normal force and the pitching moment at the balance moment center as independent variables. The regression model search algorithm predicted that the difference of the gage outputs should be modeled using the intercept and the normal force. The sum of the two gage outputs, on the other hand, should be modeled using the intercept, the pitching moment, and the square of the pitching moment. Equations of the deflection of a cantilever beam are used to show that the search algorithm s two recommended math models can also be obtained after performing a rigorous theoretical analysis of the deflection of the sting balance under load. The analysis of the sting balance calibration data set is a rare example of a situation when regression models of balance calibration data can directly be derived from first principles of physics and engineering. In addition, it is interesting to see that the search algorithm recommended the same regression models for the data analysis using only a set of statistical quality metrics.
Meta-regression analysis of commensal and pathogenic Escherichia coli survival in soil and water.
Franz, Eelco; Schijven, Jack; de Roda Husman, Ana Maria; Blaak, Hetty
2014-06-17
The extent to which pathogenic and commensal E. coli (respectively PEC and CEC) can survive, and which factors predominantly determine the rate of decline, are crucial issues from a public health point of view. The goal of this study was to provide a quantitative summary of the variability in E. coli survival in soil and water over a broad range of individual studies and to identify the most important sources of variability. To that end, a meta-regression analysis on available literature data was conducted. The considerable variation in reported decline rates indicated that the persistence of E. coli is not easily predictable. The meta-analysis demonstrated that for soil and water, the type of experiment (laboratory or field), the matrix subtype (type of water and soil), and temperature were the main factors included in the regression analysis. A higher average decline rate in soil of PEC compared with CEC was observed. The regression models explained at best 57% of the variation in decline rate in soil and 41% of the variation in decline rate in water. This indicates that additional factors, not included in the current meta-regression analysis, are of importance but rarely reported. More complete reporting of experimental conditions may allow future inference on the global effects of these variables on the decline rate of E. coli.
Determination of riverbank erosion probability using Locally Weighted Logistic Regression
NASA Astrophysics Data System (ADS)
Ioannidou, Elena; Flori, Aikaterini; Varouchakis, Emmanouil A.; Giannakis, Georgios; Vozinaki, Anthi Eirini K.; Karatzas, George P.; Nikolaidis, Nikolaos
2015-04-01
Riverbank erosion is a natural geomorphologic process that affects the fluvial environment. The most important issue concerning riverbank erosion is the identification of the vulnerable locations. An alternative to the usual hydrodynamic models to predict vulnerable locations is to quantify the probability of erosion occurrence. This can be achieved by identifying the underlying relations between riverbank erosion and the geomorphological or hydrological variables that prevent or stimulate erosion. Thus, riverbank erosion can be determined by a regression model using independent variables that are considered to affect the erosion process. The impact of such variables may vary spatially, therefore, a non-stationary regression model is preferred instead of a stationary equivalent. Locally Weighted Regression (LWR) is proposed as a suitable choice. This method can be extended to predict the binary presence or absence of erosion based on a series of independent local variables by using the logistic regression model. It is referred to as Locally Weighted Logistic Regression (LWLR). Logistic regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable (e.g. binary response) based on one or more predictor variables. The method can be combined with LWR to assign weights to local independent variables of the dependent one. LWR allows model parameters to vary over space in order to reflect spatial heterogeneity. The probabilities of the possible outcomes are modelled as a function of the independent variables using a logistic function. Logistic regression measures the relationship between a categorical dependent variable and, usually, one or several continuous independent variables by converting the dependent variable to probability scores. Then, a logistic regression is formed, which predicts success or failure of a given binary variable (e.g. erosion presence or absence) for any value of the independent variables. The erosion occurrence probability can be calculated in conjunction with the model deviance regarding the independent variables tested. The most straightforward measure for goodness of fit is the G statistic. It is a simple and effective way to study and evaluate the Logistic Regression model efficiency and the reliability of each independent variable. The developed statistical model is applied to the Koiliaris River Basin on the island of Crete, Greece. Two datasets of river bank slope, river cross-section width and indications of erosion were available for the analysis (12 and 8 locations). Two different types of spatial dependence functions, exponential and tricubic, were examined to determine the local spatial dependence of the independent variables at the measurement locations. The results show a significant improvement when the tricubic function is applied as the erosion probability is accurately predicted at all eight validation locations. Results for the model deviance show that cross-section width is more important than bank slope in the estimation of erosion probability along the Koiliaris riverbanks. The proposed statistical model is a useful tool that quantifies the erosion probability along the riverbanks and can be used to assist managing erosion and flooding events. Acknowledgements This work is part of an on-going THALES project (CYBERSENSORS - High Frequency Monitoring System for Integrated Water Resources Management of Rivers). The project has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES. Investing in knowledge society through the European Social Fund.
Archfield, Stacey A.; Pugliese, Alessio; Castellarin, Attilio; Skøien, Jon O.; Kiang, Julie E.
2013-01-01
In the United States, estimation of flood frequency quantiles at ungauged locations has been largely based on regional regression techniques that relate measurable catchment descriptors to flood quantiles. More recently, spatial interpolation techniques of point data have been shown to be effective for predicting streamflow statistics (i.e., flood flows and low-flow indices) in ungauged catchments. Literature reports successful applications of two techniques, canonical kriging, CK (or physiographical-space-based interpolation, PSBI), and topological kriging, TK (or top-kriging). CK performs the spatial interpolation of the streamflow statistic of interest in the two-dimensional space of catchment descriptors. TK predicts the streamflow statistic along river networks taking both the catchment area and nested nature of catchments into account. It is of interest to understand how these spatial interpolation methods compare with generalized least squares (GLS) regression, one of the most common approaches to estimate flood quantiles at ungauged locations. By means of a leave-one-out cross-validation procedure, the performance of CK and TK was compared to GLS regression equations developed for the prediction of 10, 50, 100 and 500 yr floods for 61 streamgauges in the southeast United States. TK substantially outperforms GLS and CK for the study area, particularly for large catchments. The performance of TK over GLS highlights an important distinction between the treatments of spatial correlation when using regression-based or spatial interpolation methods to estimate flood quantiles at ungauged locations. The analysis also shows that coupling TK with CK slightly improves the performance of TK; however, the improvement is marginal when compared to the improvement in performance over GLS.
Ren, Yilong; Wang, Yunpeng; Wu, Xinkai; Yu, Guizhen; Ding, Chuan
2016-10-01
Red light running (RLR) has become a major safety concern at signalized intersection. To prevent RLR related crashes, it is critical to identify the factors that significantly impact the drivers' behaviors of RLR, and to predict potential RLR in real time. In this research, 9-month's RLR events extracted from high-resolution traffic data collected by loop detectors from three signalized intersections were applied to identify the factors that significantly affect RLR behaviors. The data analysis indicated that occupancy time, time gap, used yellow time, time left to yellow start, whether the preceding vehicle runs through the intersection during yellow, and whether there is a vehicle passing through the intersection on the adjacent lane were significantly factors for RLR behaviors. Furthermore, due to the rare events nature of RLR, a modified rare events logistic regression model was developed for RLR prediction. The rare events logistic regression method has been applied in many fields for rare events studies and shows impressive performance, but so far none of previous research has applied this method to study RLR. The results showed that the rare events logistic regression model performed significantly better than the standard logistic regression model. More importantly, the proposed RLR prediction method is purely based on loop detector data collected from a single advance loop detector located 400 feet away from stop-bar. This brings great potential for future field applications of the proposed method since loops have been widely implemented in many intersections and can collect data in real time. This research is expected to contribute to the improvement of intersection safety significantly. Copyright © 2016 Elsevier Ltd. All rights reserved.
Aziz, Shamsul Akmar Ab; Nuawi, Mohd Zaki; Nor, Mohd Jailani Mohd
2015-01-01
The objective of this study was to present a new method for determination of hand-arm vibration (HAV) in Malaysian Army (MA) three-tonne truck steering wheels based on changes in vehicle speed using regression model and the statistical analysis method known as Integrated Kurtosis-Based Algorithm for Z-Notch Filter Technique Vibro (I-kaz Vibro). The test was conducted for two different road conditions, tarmac and dirt roads. HAV exposure was measured using a Brüel & Kjær Type 3649 vibration analyzer, which is capable of recording HAV exposures from steering wheels. The data was analyzed using I-kaz Vibro to determine the HAV values in relation to varying speeds of a truck and to determine the degree of data scattering for HAV data signals. Based on the results obtained, HAV experienced by drivers can be determined using the daily vibration exposure A(8), I-kaz Vibro coefficient (Ƶ(v)(∞)), and the I-kaz Vibro display. The I-kaz Vibro displays also showed greater scatterings, indicating that the values of Ƶ(v)(∞) and A(8) were increasing. Prediction of HAV exposure was done using the developed regression model and graphical representations of Ƶ(v)(∞). The results of the regression model showed that Ƶ(v)(∞) increased when the vehicle speed and HAV exposure increased. For model validation, predicted and measured noise exposures were compared, and high coefficient of correlation (R(2)) values were obtained, indicating that good agreement was obtained between them. By using the developed regression model, we can easily predict HAV exposure from steering wheels for HAV exposure monitoring.
NASA Astrophysics Data System (ADS)
Arantes Camargo, Livia; Marques, José, Jr.
2015-04-01
The prediction of erodibility using indirect methods such as diffuse reflectance spectroscopy could facilitate the characterization of the spatial variability in large areas and optimize implementation of conservation practices. The aim of this study was to evaluate the prediction of interrill erodibility (Ki) and rill erodibility (Kr) by means of iron oxides content and soil color using multiple linear regression and diffuse reflectance spectroscopy (DRS) using regression analysis by least squares partial (PLSR). The soils were collected from three geomorphic surfaces and analyzed for chemical, physical and mineralogical properties, plus scanned in the spectral range from the visible and infrared. Maps of spatial distribution of Ki and Kr were built with the values calculated by the calibrated models that obtained the best accuracy using geostatistics. Interrill-rill erodibility presented negative correlation with iron extracted by dithionite-citrate-bicarbonate, hematite, and chroma, confirming the influence of iron oxides in soil structural stability. Hematite and hue were the attributes that most contributed in calibration models by multiple linear regression for the prediction of Ki (R2 = 0.55) and Kr (R2 = 0.53). The diffuse reflectance spectroscopy via PLSR allowed to predict Interrill-rill erodibility with high accuracy (R2adj = 0.76, 0.81 respectively and RPD> 2.0) in the range of the visible spectrum (380-800 nm) and the characterization of the spatial variability of these attributes by geostatistics.
Cakir, Ebru; Kucuk, Ulku; Pala, Emel Ebru; Sezer, Ozlem; Ekin, Rahmi Gokhan; Cakmak, Ozgur
2017-05-01
Conventional cytomorphologic assessment is the first step to establish an accurate diagnosis in urinary cytology. In cytologic preparations, the separation of low-grade urothelial carcinoma (LGUC) from reactive urothelial proliferation (RUP) can be exceedingly difficult. The bladder washing cytologies of 32 LGUC and 29 RUP were reviewed. The cytologic slides were examined for the presence or absence of the 28 cytologic features. The cytologic criteria showing statistical significance in LGUC were increased numbers of monotonous single (non-umbrella) cells, three-dimensional cellular papillary clusters without fibrovascular cores, irregular bordered clusters, atypical single cells, irregular nuclear overlap, cytoplasmic homogeneity, increased N/C ratio, pleomorphism, nuclear border irregularity, nuclear eccentricity, elongated nuclei, and hyperchromasia (p ˂ 0.05), and the cytologic criteria showing statistical significance in RUP were inflammatory background, mixture of small and large urothelial cells, loose monolayer aggregates, and vacuolated cytoplasm (p ˂ 0.05). When these variables were subjected to a stepwise logistic regression analysis, four features were selected to distinguish LGUC from RUP: increased numbers of monotonous single (non-umbrella) cells, increased nuclear cytoplasmic ratio, hyperchromasia, and presence of small and large urothelial cells (p = 0.0001). By this logistic model of the 32 cases with proven LGUC, the stepwise logistic regression analysis correctly predicted 31 (96.9%) patients with this diagnosis, and of the 29 patients with RUP, the logistic model correctly predicted 26 (89.7%) patients as having this disease. There are several cytologic features to separate LGUC from RUP. Stepwise logistic regression analysis is a valuable tool for determining the most useful cytologic criteria to distinguish these entities. © 2017 APMIS. Published by John Wiley & Sons Ltd.
Binary logistic regression-Instrument for assessing museum indoor air impact on exhibits.
Bucur, Elena; Danet, Andrei Florin; Lehr, Carol Blaziu; Lehr, Elena; Nita-Lazar, Mihai
2017-04-01
This paper presents a new way to assess the environmental impact on historical artifacts using binary logistic regression. The prediction of the impact on the exhibits during certain pollution scenarios (environmental impact) was calculated by a mathematical model based on the binary logistic regression; it allows the identification of those environmental parameters from a multitude of possible parameters with a significant impact on exhibitions and ranks them according to their severity effect. Air quality (NO 2 , SO 2 , O 3 and PM 2.5 ) and microclimate parameters (temperature, humidity) monitoring data from a case study conducted within exhibition and storage spaces of the Romanian National Aviation Museum Bucharest have been used for developing and validating the binary logistic regression method and the mathematical model. The logistic regression analysis was used on 794 data combinations (715 to develop of the model and 79 to validate it) by a Statistical Package for Social Sciences (SPSS 20.0). The results from the binary logistic regression analysis demonstrated that from six parameters taken into consideration, four of them present a significant effect upon exhibits in the following order: O 3 >PM 2.5 >NO 2 >humidity followed at a significant distance by the effects of SO 2 and temperature. The mathematical model, developed in this study, correctly predicted 95.1 % of the cumulated effect of the environmental parameters upon the exhibits. Moreover, this model could also be used in the decisional process regarding the preventive preservation measures that should be implemented within the exhibition space. The paper presents a new way to assess the environmental impact on historical artifacts using binary logistic regression. The mathematical model developed on the environmental parameters analyzed by the binary logistic regression method could be useful in a decision-making process establishing the best measures for pollution reduction and preventive preservation of exhibits.
Boyne, Pierce; Buhr, Sarah; Rockwell, Bradley; Khoury, Jane; Carl, Daniel; Gerson, Myron; Kissela, Brett; Dunning, Kari
2015-10-01
Treadmill aerobic exercise improves gait, aerobic capacity, and cardiovascular health after stroke, but a lack of specificity in current guidelines could lead to underdosing or overdosing of aerobic intensity. The ventilatory threshold (VT) has been recommended as an optimal, specific starting point for continuous aerobic exercise. However, VT measurement is not available in clinical stroke settings. Therefore, the purpose of this study was to identify an accurate method to predict heart rate at the VT (HRVT) for use as a surrogate for VT. A cross-sectional design was employed. Using symptom-limited graded exercise test (GXT) data from 17 subjects more than 6 months poststroke, prediction methods for HRVT were derived by traditional target HR calculations (percentage of HRpeak achieved during GXT, percentage of peak HR reserve [HRRpeak], percentage of age-predicted maximal HR, and percentage of age-predicted maximal HR reserve) and by regression analysis. The validity of the prediction methods was then tested among 8 additional subjects. All prediction methods were validated by the second sample, so data were pooled to calculate refined prediction equations. HRVT was accurately predicted by 80% HRpeak (R, 0.62; standard deviation of error [SDerror], 7 bpm), 62% HRRpeak (R, 0.66; SDerror, 7 bpm), and regression models that included HRpeak (R, 0.62-0.75; SDerror, 5-6 bpm). Derived regression equations, 80% HRpeak and 62% HRRpeak, provide a specific target intensity for initial aerobic exercise prescription that should minimize underdosing and overdosing for persons with chronic stroke. The specificity of these methods may lead to more efficient and effective treatment for poststroke deconditioning.Video Abstract available for more insights from the authors (see Supplemental Digital Content 1, http://links.lww.com/JNPT/A114).
Cuesta-Vargas, Antonio I; González-Sánchez, Manuel
2014-03-01
Currently, there are no studies combining electromyography (EMG) and sonography to estimate the absolute and relative strength values of erector spinae (ES) muscles in healthy individuals. The purpose of this study was to establish whether the maximum voluntary contraction (MVC) of the ES during isometric contractions could be predicted from the changes in surface EMG as well as in fiber pennation and thickness as measured by sonography. Thirty healthy adults performed 3 isometric extensions at 45° from the vertical to calculate the MVC force. Contractions at 33% and 100% of the MVC force were then used during sonographic and EMG recordings. These measurements were used to observe the architecture and function of the muscles during contraction. Statistical analysis was performed using bivariate regression and regression equations. The slope for each regression equation was statistically significant (P < .001) with R(2) values of 0.837 and 0.986 for the right and left ES, respectively. The standard error estimate between the sonographic measurements and the regression-estimated pennation angles for the right and left ES were 0.10 and 0.02, respectively. Erector spinae muscle activation can be predicted from the changes in fiber pennation during isometric contractions at 33% and 100% of the MVC force. These findings could be essential for developing a regression equation that could estimate the level of muscle activation from changes in the muscle architecture.
Predicting the risk of patients with biopsy Gleason score 6 to harbor a higher grade cancer.
Gofrit, Ofer N; Zorn, Kevin C; Taxy, Jerome B; Lin, Shang; Zagaja, Gregory P; Steinberg, Gary D; Shalhav, Arieh L
2007-11-01
Prostate cancer Gleason score 3 + 3 = 6 is currently the most common score assigned on prostatic biopsies. We analyzed the clinical variables that predict the likelihood of a patient with biopsy Gleason score 6 to harbor a higher grade tumor. The study population consisted of 448 patients with a mean age of 59.1 years who underwent radical prostatectomy between February 2003 to October 2006 for Gleason score 6 adenocarcinoma. The effect of preoperative variables on the probability of a Gleason score upgrade on final pathological evaluation was evaluated using logistic regression, and classification and regression tree analysis. Gleason score upgrade was found in 91 of 448 patients (20.3%). Logistic regression showed that only serum prostate specific antigen and the greatest percent of cancer in a core were significantly associated with a score upgrade (p = 0.0014 and 0.023, respectively). Classification and regression tree analysis showed that the risk of a Gleason score upgrade was 62% when serum prostate specific antigen was higher than 12 ng/ml and 18% when serum prostate specific antigen was 12 ng/ml or less. In patients with serum prostate specific antigen lower than 12 ng/ml the risk of a score upgrade could be dichotomized at a greatest percent of cancer in a core of 5%. The risk was 22.6% and 10.5% when the greatest percent of cancer in a core was higher than 5% and 5% or lower, respectively. The probability of patients with a prostate biopsy Gleason score of 6 to conceal a Gleason score of 7 or higher can be predicted using serum prostate specific antigen and the greatest percent of cancer in a core. With these parameters it is possible to predict upgrade rates as high as 62% and as low as 10.5%.
Lin, Ching-Yih; Lee, Ying-En; Tian, Yu-Feng; Sun, Ding-Ping; Sheu, Ming-Jen; Lin, Chen-Yi; Li, Chien-Feng; Lee, Sung-Wei; Lin, Li-Ching; Chang, I-Wei; Wang, Chieh-Tien; He, Hong-Lin
2017-01-01
Background: Numerous transmembrane receptor tyrosine kinase pathways have been found to play an important role in tumor progression in some cancers. This study was aimed to evaluate the clinical impact of Eph receptor A4 (EphA4) in patients with rectal cancer treated with neoadjuvant concurrent chemoradiotherapy (CCRT) combined with mesorectal excision, with special emphasis on tumor regression. Methods: Analysis of the publicly available expression profiling dataset of rectal cancer disclosed that EphA4 was the top-ranking, significantly upregulated, transmembrane receptor tyrosine kinase pathway-associated gene in the non-responders to CCRT, compared with the responders. Immunohistochemical study was conducted to assess the EphA4 expression in pre-treatment biopsy specimens from 172 rectal cancer patients without distant metastasis. The relationships between EphA4 expression and various clinicopathological factors or survival were statistically analyzed. Results: EphA4 expression was significantly associated with vascular invasion ( P =0.015), post-treatment depth of tumor invasion ( P =0.006), pre-treatment and post-treatment lymph node metastasis ( P =0.004 and P =0.011, respectively). More importantly, high EphA4 expression was significantly predictive for lesser degree of tumor regression after CCRT ( P =0.031). At univariate analysis, high EphA4 expression was a negative prognosticator for disease-specific survival ( P =0.0009) and metastasis-free survival ( P =0.0001). At multivariate analysis, high expression of EphA4 still served as an independent adverse prognostic factor for disease-specific survival (HR, 2.528; 95% CI, 1.131-5.651; P =0.024) and metastasis-free survival (HR, 3.908; 95% CI, 1.590-9.601; P =0.003). Conclusion: High expression of EphA4 predicted lesser degree of tumor regression after CCRT and served as an independent negative prognostic factor in patients with rectal cancer.
Automated particle identification through regression analysis of size, shape and colour
NASA Astrophysics Data System (ADS)
Rodriguez Luna, J. C.; Cooper, J. M.; Neale, S. L.
2016-04-01
Rapid point of care diagnostic tests and tests to provide therapeutic information are now available for a range of specific conditions from the measurement of blood glucose levels for diabetes to card agglutination tests for parasitic infections. Due to a lack of specificity these test are often then backed up by more conventional lab based diagnostic methods for example a card agglutination test may be carried out for a suspected parasitic infection in the field and if positive a blood sample can then be sent to a lab for confirmation. The eventual diagnosis is often achieved by microscopic examination of the sample. In this paper we propose a computerized vision system for aiding in the diagnostic process; this system used a novel particle recognition algorithm to improve specificity and speed during the diagnostic process. We will show the detection and classification of different types of cells in a diluted blood sample using regression analysis of their size, shape and colour. The first step is to define the objects to be tracked by a Gaussian Mixture Model for background subtraction and binary opening and closing for noise suppression. After subtracting the objects of interest from the background the next challenge is to predict if a given object belongs to a certain category or not. This is a classification problem, and the output of the algorithm is a Boolean value (true/false). As such the computer program should be able to "predict" with reasonable level of confidence if a given particle belongs to the kind we are looking for or not. We show the use of a binary logistic regression analysis with three continuous predictors: size, shape and color histogram. The results suggest this variables could be very useful in a logistic regression equation as they proved to have a relatively high predictive value on their own.
Prediction of Maximal Oxygen Uptake by Six-Minute Walk Test and Body Mass Index in Healthy Boys.
Jalili, Majid; Nazem, Farzad; Sazvar, Akbar; Ranjbar, Kamal
2018-05-14
To develop an equation to predict maximal oxygen uptake (VO2max) based on the 6-minute walk test (6MWT) and body composition in healthy boys. Direct VO2max, 6-minute walk distance, and anthropometric characteristics were measured in 349 healthy boys (12.49 ± 2.72 years). Multiple regression analysis was used to generate VO2max prediction equations. Cross-validation of the VO2max prediction equations was assessed with predicted residual sum of squares statistics. Pearson correlation was used to assess the correlation between measured and predicted VO2max. Objectively measured VO2max had a significant correlation with demographic and 6MWT characteristics (R = 0.11-0.723, P < .01). Multiple regression analysis revealed the following VO2max prediction equation: VO2max (mL/kg/min) = 12.701 + (0.06 × 6-minute walk distance m ) - (0.732 × body mass index kg/m2 ) (R 2 = 0.79, standard error of the estimate [SEE] = 2.91 mL/kg/min, %SEE = 6.9%). There was strong correlation between measured and predicted VO2max (r = 0.875, P < .001). Cross-validation revealed minimal shrinkage (R 2 p = 0.78 and predicted residual sum of squares SEE = 2.99 mL/kg/min). This study provides a relatively accurate and convenient VO2max prediction equation based on the 6MWT and body mass index in healthy boys. This model can be used for evaluation of cardiorespiratory fitness of boys in different settings. Copyright © 2018 Elsevier Inc. All rights reserved.
Chakraborty, Somsubhra; Weindorf, David C; Li, Bin; Ali Aldabaa, Abdalsamad Abdalsatar; Ghosh, Rakesh Kumar; Paul, Sathi; Nasim Ali, Md
2015-05-01
Using 108 petroleum contaminated soil samples, this pilot study proposed a new analytical approach of combining visible near-infrared diffuse reflectance spectroscopy (VisNIR DRS) and portable X-ray fluorescence spectrometry (PXRF) for rapid and improved quantification of soil petroleum contamination. Results indicated that an advanced fused model where VisNIR DRS spectra-based penalized spline regression (PSR) was used to predict total petroleum hydrocarbon followed by PXRF elemental data-based random forest regression was used to model the PSR residuals, it outperformed (R(2)=0.78, residual prediction deviation (RPD)=2.19) all other models tested, even producing better generalization than using VisNIR DRS alone (RPD's of 1.64, 1.86, and 1.96 for random forest, penalized spline regression, and partial least squares regression, respectively). Additionally, unsupervised principal component analysis using the PXRF+VisNIR DRS system qualitatively separated contaminated soils from control samples. Fusion of PXRF elemental data and VisNIR derivative spectra produced an optimized model for total petroleum hydrocarbon quantification in soils. Copyright © 2015 Elsevier B.V. All rights reserved.
Prediction of Advisability of Returning Home Using the Home Care Score
Matsugi, Akiyoshi; Tani, Keisuke; Tamaru, Yoshiki; Yoshioka, Nami; Yamashita, Akira; Mori, Nobuhiko; Oku, Kosuke; Ikeda, Masashi; Nagano, Kiyoshi
2015-01-01
Purpose. The aim of this study was to assess whether the home care score (HCS), which was developed by the Ministry of Health and Welfare in Japan in 1992, is useful for the prediction of advisability of home care. Methods. Subjects living at home and in assisted-living facilities were analyzed. Binominal logistic regression analyses, using age, sex, the functional independence measure score, and the HCS, along with receiver operating characteristic curve analyses, were conducted. Findings/Conclusions. Only HCS was selected for the regression equation. Receiver operating characteristic curve analysis revealed that the area under the curve (0.9), sensitivity (0.82), specificity (0.83), and positive predictive value (0.84) for HCS were higher than those for the functional independence measure, indicating that the HCS is a powerful predictor for advisability of home care. Clinical Relevance. Comprehensive measurements of the condition of provided care and the activities of daily living of the subjects, which are included in the HCS, are required for the prediction of advisability of home care. PMID:26491568
Adelian, R; Jamali, J; Zare, N; Ayatollahi, S M T; Pooladfar, G R; Roustaei, N
2015-01-01
Identification of the prognostic factors for survival in patients with liver transplantation is challengeable. Various methods of survival analysis have provided different, sometimes contradictory, results from the same data. To compare Cox's regression model with parametric models for determining the independent factors for predicting adults' and pediatrics' survival after liver transplantation. This study was conducted on 183 pediatric patients and 346 adults underwent liver transplantation in Namazi Hospital, Shiraz, southern Iran. The study population included all patients undergoing liver transplantation from 2000 to 2012. The prognostic factors sex, age, Child class, initial diagnosis of the liver disease, PELD/MELD score, and pre-operative laboratory markers were selected for survival analysis. Among 529 patients, 346 (64.5%) were adult and 183 (34.6%) were pediatric cases. Overall, the lognormal distribution was the best-fitting model for adult and pediatric patients. Age in adults (HR=1.16, p<0.05) and weight (HR=2.68, p<0.01) and Child class B (HR=2.12, p<0.05) in pediatric patients were the most important factors for prediction of survival after liver transplantation. Adult patients younger than the mean age and pediatric patients weighing above the mean and Child class A (compared to those with classes B or C) had better survival. Parametric regression model is a good alternative for the Cox's regression model.
Wardenaar, K J; van Loo, H M; Cai, T; Fava, M; Gruber, M J; Li, J; de Jonge, P; Nierenberg, A A; Petukhova, M V; Rose, S; Sampson, N A; Schoevers, R A; Wilcox, M A; Alonso, J; Bromet, E J; Bunting, B; Florescu, S E; Fukao, A; Gureje, O; Hu, C; Huang, Y Q; Karam, A N; Levinson, D; Medina Mora, M E; Posada-Villa, J; Scott, K M; Taib, N I; Viana, M C; Xavier, M; Zarkov, Z; Kessler, R C
2014-11-01
Although variation in the long-term course of major depressive disorder (MDD) is not strongly predicted by existing symptom subtype distinctions, recent research suggests that prediction can be improved by using machine learning methods. However, it is not known whether these distinctions can be refined by added information about co-morbid conditions. The current report presents results on this question. Data came from 8261 respondents with lifetime DSM-IV MDD in the World Health Organization (WHO) World Mental Health (WMH) Surveys. Outcomes included four retrospectively reported measures of persistence/severity of course (years in episode; years in chronic episodes; hospitalization for MDD; disability due to MDD). Machine learning methods (regression tree analysis; lasso, ridge and elastic net penalized regression) followed by k-means cluster analysis were used to augment previously detected subtypes with information about prior co-morbidity to predict these outcomes. Predicted values were strongly correlated across outcomes. Cluster analysis of predicted values found three clusters with consistently high, intermediate or low values. The high-risk cluster (32.4% of cases) accounted for 56.6-72.9% of high persistence, high chronicity, hospitalization and disability. This high-risk cluster had both higher sensitivity and likelihood ratio positive (LR+; relative proportions of cases in the high-risk cluster versus other clusters having the adverse outcomes) than in a parallel analysis that excluded measures of co-morbidity as predictors. Although the results using the retrospective data reported here suggest that useful MDD subtyping distinctions can be made with machine learning and clustering across multiple indicators of illness persistence/severity, replication with prospective data is needed to confirm this preliminary conclusion.
Aliabadi, Mohsen; Golmohammadi, Rostam; Khotanlou, Hassan; Mansoorizadeh, Muharram; Salarpour, Amir
2014-01-01
Noise prediction is considered to be the best method for evaluating cost-preventative noise controls in industrial workrooms. One of the most important issues is the development of accurate models for analysis of the complex relationships among acoustic features affecting noise level in workrooms. In this study, advanced fuzzy approaches were employed to develop relatively accurate models for predicting noise in noisy industrial workrooms. The data were collected from 60 industrial embroidery workrooms in the Khorasan Province, East of Iran. The main acoustic and embroidery process features that influence the noise were used to develop prediction models using MATLAB software. Multiple regression technique was also employed and its results were compared with those of fuzzy approaches. Prediction errors of all prediction models based on fuzzy approaches were within the acceptable level (lower than one dB). However, Neuro-fuzzy model (RMSE=0.53dB and R2=0.88) could slightly improve the accuracy of noise prediction compared with generate fuzzy model. Moreover, fuzzy approaches provided more accurate predictions than did regression technique. The developed models based on fuzzy approaches as useful prediction tools give professionals the opportunity to have an optimum decision about the effectiveness of acoustic treatment scenarios in embroidery workrooms.
Goldstein, Benjamin A.; Navar, Ann Marie; Carter, Rickey E.
2017-01-01
Abstract Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider trying to predicting mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. PMID:27436868
Mohd Yusof, Mohd Yusmiaidil Putera; Cauwels, Rita; Deschepper, Ellen; Martens, Luc
2015-08-01
The third molar development (TMD) has been widely utilized as one of the radiographic method for dental age estimation. By using the same radiograph of the same individual, third molar eruption (TME) information can be incorporated to the TMD regression model. This study aims to evaluate the performance of dental age estimation in individual method models and the combined model (TMD and TME) based on the classic regressions of multiple linear and principal component analysis. A sample of 705 digital panoramic radiographs of Malay sub-adults aged between 14.1 and 23.8 years was collected. The techniques described by Gleiser and Hunt (modified by Kohler) and Olze were employed to stage the TMD and TME, respectively. The data was divided to develop three respective models based on the two regressions of multiple linear and principal component analysis. The trained models were then validated on the test sample and the accuracy of age prediction was compared between each model. The coefficient of determination (R²) and root mean square error (RMSE) were calculated. In both genders, adjusted R² yielded an increment in the linear regressions of combined model as compared to the individual models. The overall decrease in RMSE was detected in combined model as compared to TMD (0.03-0.06) and TME (0.2-0.8). In principal component regression, low value of adjusted R(2) and high RMSE except in male were exhibited in combined model. Dental age estimation is better predicted using combined model in multiple linear regression models. Copyright © 2015 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
ERIC Educational Resources Information Center
Hess, Brian; Olejnik, Stephen; Huberty, Carl J.
2001-01-01
Studied the efficacy of two improvement-over-chance or "I" effect sizes derived from predictive discriminant analysis and logistic regression analysis for two-group univariate mean comparisons through simulation. Discusses the ways in which the usefulness of each of the indices depends on the population characteristics. (SLD)
Prediction of Battery Life and Behavior from Analysis of Voltage Data
NASA Technical Reports Server (NTRS)
Mcdermott, P. P.
1984-01-01
A method for simulating charge and discharge characteristics of secondary batteries is discussed. The analysis utilizes a nonlinear regression technique where empirical data is computer fitted with a five coefficient nonlinear equation. The equations for charge and discharge voltage are identical except for a change of sign before the second and third terms.
Shah, Chirayu; Miller, Todd W.; Wyatt, Shelby K.; McKinley, Eliot T.; Olivares, Maria Graciela; Sanchez, Violeta; Nolting, Donald D.; Buck, Jason R.; Zhao, Ping; Ansari, M. Sib; Baldwin, Ronald M.; Gore, John C.; Schiff, Rachel; Arteaga, Carlos L.; Manning, H. Charles
2010-01-01
Purpose To evaluate non-invasive imaging methods as predictive biomarkers of response to trastuzumab in mouse models of HER2-overexpressing breast cancer. The correlation between tumor regression and molecular imaging of apoptosis, glucose metabolism, and cellular proliferation was evaluated longitudinally in responding and non-responding tumor-bearing cohorts. Experimental Design Mammary tumors from MMTV/HER2 transgenic female mice were transplanted into syngeneic female mice. BT474 human breast carcinoma cell line xenografts were grown in athymic nude mice. Tumor cell apoptosis (NIR700-Annexin-V accumulation), glucose metabolism ([18F]FDG-PET), and proliferation ([18F]FLT-PET) were evaluated throughout a bi-weekly trastuzumab regimen. Imaging metrics were validated by direct measurement of tumor size and immunohistochemical (IHC) analysis of cleaved caspase-3, phosphorylated AKT (p-AKT) and Ki67. Results NIR700-Annexin-V accumulated significantly in trastuzumab-treated MMTV/HER2 and BT474 tumors that ultimately regressed, but not in non-responding or vehicle-treated tumors. Uptake of [18F]FDG was not affected by trastuzumab treatment in MMTV/HER2 or BT474 tumors. [18F]FLT PET imaging predicted trastuzumab response in BT474 tumors but not in MMTV/HER2 tumors, which exhibited modest uptake of [18F]FLT. Close agreement was observed between imaging metrics and IHC analysis. Conclusions Molecular imaging of apoptosis accurately predicts trastuzumab-induced regression of HER2(+) tumors and may warrant clinical exploration to predict early response to neoadjuvant trastuzumab. Trastuzumab does not appear to alter glucose metabolism substantially enough to afford [18F]FDG-PET significant predictive value in this setting. Although promising in one preclinical model, further studies are required to determine the overall value of [18F]FLT-PET as a biomarker of response to trastuzumab in HER2+ breast cancer. PMID:19584166
Hordge, LaQuana N; McDaniel, Kiara L; Jones, Derick D; Fakayode, Sayo O
2016-05-15
The endocrine disruption property of estrogens necessitates the immediate need for effective monitoring and development of analytical protocols for their analyses in biological and human specimens. This study explores the first combined utility of a steady-state fluorescence spectroscopy and multivariate partial-least-square (PLS) regression analysis for the simultaneous determination of two estrogens (17α-ethinylestradiol (EE) and norgestimate (NOR)) concentrations in bovine serum albumin (BSA) and human serum albumin (HSA) samples. The influence of EE and NOR concentrations and temperature on the emission spectra of EE-HSA EE-BSA, NOR-HSA, and NOR-BSA complexes was also investigated. The binding of EE with HSA and BSA resulted in increase in emission characteristics of HSA and BSA and a significant blue spectra shift. In contrast, the interaction of NOR with HSA and BSA quenched the emission characteristics of HSA and BSA. The observed emission spectral shifts preclude the effective use of traditional univariate regression analysis of fluorescent data for the determination of EE and NOR concentrations in HSA and BSA samples. Multivariate partial-least-squares (PLS) regression analysis was utilized to correlate the changes in emission spectra with EE and NOR concentrations in HSA and BSA samples. The figures-of-merit of the developed PLS regression models were excellent, with limits of detection as low as 1.6×10(-8) M for EE and 2.4×10(-7) M for NOR and good linearity (R(2)>0.994985). The PLS models correctly predicted EE and NOR concentrations in independent validation HSA and BSA samples with a root-mean-square-percent-relative-error (RMS%RE) of less than 6.0% at physiological condition. On the contrary, the use of univariate regression resulted in poor predictions of EE and NOR in HSA and BSA samples, with RMS%RE larger than 40% at physiological conditions. High accuracy, low sensitivity, simplicity, low-cost with no prior analyte extraction or separation required makes this method promising, compelling, and attractive alternative for the rapid determination of estrogen concentrations in biomedical and biological specimens, pharmaceuticals, or environmental samples. Published by Elsevier B.V.
Torija, Antonio J; Ruiz, Diego P
2015-02-01
The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, the inclusion of the great spatial heterogeneity characteristic of urban environments seems to be essential in order to achieve an accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied to this environmental problem. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq). These three methods are: (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because of the high number of input variables involved in environmental-noise modelling and estimation in urban environments, which make LAeq prediction models quite complex and costly in terms of time and resources for application to real situations, three different techniques are used to approach feature selection or data reduction. The feature-selection techniques used are: (i) correlation-based feature-subset selection (CFS), (ii) wrapper for feature-subset selection (WFS), and the data reduction technique is principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with the implementation of SMO or GPR as regression algorithm provides the best LAeq estimation (R(2)=0.94 and mean absolute error (MAE)=1.14-1.16 dB(A)). Copyright © 2014 Elsevier B.V. All rights reserved.
Soil sail content estimation in the yellow river delta with satellite hyperspectral data
Weng, Yongling; Gong, Peng; Zhu, Zhi-Liang
2008-01-01
Soil salinization is one of the most common land degradation processes and is a severe environmental hazard. The primary objective of this study is to investigate the potential of predicting salt content in soils with hyperspectral data acquired with EO-1 Hyperion. Both partial least-squares regression (PLSR) and conventional multiple linear regression (MLR), such as stepwise regression (SWR), were tested as the prediction model. PLSR is commonly used to overcome the problem caused by high-dimensional and correlated predictors. Chemical analysis of 95 samples collected from the top layer of soils in the Yellow River delta area shows that salt content was high on average, and the dominant chemicals in the saline soil were NaCl and MgCl2. Multivariate models were established between soil contents and hyperspectral data. Our results indicate that the PLSR technique with laboratory spectral data has a strong prediction capacity. Spectral bands at 1487-1527, 1971-1991, 2032-2092, and 2163-2355 nm possessed large absolute values of regression coefficients, with the largest coefficient at 2203 nm. We obtained a root mean squared error (RMSE) for calibration (with 61 samples) of RMSEC = 0.753 (R2 = 0.893) and a root mean squared error for validation (with 30 samples) of RMSEV = 0.574. The prediction model was applied on a pixel-by-pixel basis to a Hyperion reflectance image to yield a quantitative surface distribution map of soil salt content. The result was validated successfully from 38 sampling points. We obtained an RMSE estimate of 1.037 (R2 = 0.784) for the soil salt content map derived by the PLSR model. The salinity map derived from the SWR model shows that the predicted value is higher than the true value. These results demonstrate that the PLSR method is a more suitable technique than stepwise regression for quantitative estimation of soil salt content in a large area. ?? 2008 CASI.
Ultrasound-enhanced bioscouring of greige cotton: regression analysis of process factors
USDA-ARS?s Scientific Manuscript database
Ultrasound-enhanced bioscouring process factors for greige cotton fabric are examined using custom experimental design utilizing statistical principles. An equation is presented which predicts bioscouring performance based upon percent reflectance values obtained from UV-Vis measurements of rutheniu...
Flood-frequency prediction methods for unregulated streams of Tennessee, 2000
Law, George S.; Tasker, Gary D.
2003-01-01
Up-to-date flood-frequency prediction methods for unregulated, ungaged rivers and streams of Tennessee have been developed. Prediction methods include the regional-regression method and the newer region-of-influence method. The prediction methods were developed using stream-gage records from unregulated streams draining basins having from 1 percent to about 30 percent total impervious area. These methods, however, should not be used in heavily developed or storm-sewered basins with impervious areas greater than 10 percent. The methods can be used to estimate 2-, 5-, 10-, 25-, 50-, 100-, and 500-year recurrence-interval floods of most unregulated rural streams in Tennessee. A computer application was developed that automates the calculation of flood frequency for unregulated, ungaged rivers and streams of Tennessee. Regional-regression equations were derived by using both single-variable and multivariable regional-regression analysis. Contributing drainage area is the explanatory variable used in the single-variable equations. Contributing drainage area, main-channel slope, and a climate factor are the explanatory variables used in the multivariable equations. Deleted-residual standard error for the single-variable equations ranged from 32 to 65 percent. Deleted-residual standard error for the multivariable equations ranged from 31 to 63 percent. These equations are included in the computer application to allow easy comparison of results produced by the different methods. The region-of-influence method calculates multivariable regression equations for each ungaged site and recurrence interval using basin characteristics from 60 similar sites selected from the study area. Explanatory variables that may be used in regression equations computed by the region-of-influence method include contributing drainage area, main-channel slope, a climate factor, and a physiographic-region factor. Deleted-residual standard error for the region-of-influence method tended to be only slightly smaller than those for the regional-regression method and ranged from 27 to 62 percent.
Direct Breakthrough Curve Prediction From Statistics of Heterogeneous Conductivity Fields
NASA Astrophysics Data System (ADS)
Hansen, Scott K.; Haslauer, Claus P.; Cirpka, Olaf A.; Vesselinov, Velimir V.
2018-01-01
This paper presents a methodology to predict the shape of solute breakthrough curves in heterogeneous aquifers at early times and/or under high degrees of heterogeneity, both cases in which the classical macrodispersion theory may not be applicable. The methodology relies on the observation that breakthrough curves in heterogeneous media are generally well described by lognormal distributions, and mean breakthrough times can be predicted analytically. The log-variance of solute arrival is thus sufficient to completely specify the breakthrough curves, and this is calibrated as a function of aquifer heterogeneity and dimensionless distance from a source plane by means of Monte Carlo analysis and statistical regression. Using the ensemble of simulated groundwater flow and solute transport realizations employed to calibrate the predictive regression, reliability estimates for the prediction are also developed. Additional theoretical contributions include heuristics for the time until an effective macrodispersion coefficient becomes applicable, and also an expression for its magnitude that applies in highly heterogeneous systems. It is seen that the results here represent a way to derive continuous time random walk transition distributions from physical considerations rather than from empirical field calibration.
NASA Astrophysics Data System (ADS)
Snedden, Gregg A.; Steyer, Gregory D.
2013-02-01
Understanding plant community zonation along estuarine stress gradients is critical for effective conservation and restoration of coastal wetland ecosystems. We related the presence of plant community types to estuarine hydrology at 173 sites across coastal Louisiana. Percent relative cover by species was assessed at each site near the end of the growing season in 2008, and hourly water level and salinity were recorded at each site Oct 2007-Sep 2008. Nine plant community types were delineated with k-means clustering, and indicator species were identified for each of the community types with indicator species analysis. An inverse relation between salinity and species diversity was observed. Canonical correspondence analysis (CCA) effectively segregated the sites across ordination space by community type, and indicated that salinity and tidal amplitude were both important drivers of vegetation composition. Multinomial logistic regression (MLR) and Akaike's Information Criterion (AIC) were used to predict the probability of occurrence of the nine vegetation communities as a function of salinity and tidal amplitude, and probability surfaces obtained from the MLR model corroborated the CCA results. The weighted kappa statistic, calculated from the confusion matrix of predicted versus actual community types, was 0.7 and indicated good agreement between observed community types and model predictions. Our results suggest that models based on a few key hydrologic variables can be valuable tools for predicting vegetation community development when restoring and managing coastal wetlands.
Job compensable factors and factor weights derived from job analysis data.
Chi, Chia-Fen; Chang, Tin-Chang; Hsia, Ping-Ling; Song, Jen-Chieh
2007-06-01
Government data on 1,039 job titles in Taiwan were analyzed to assess possible relationships between job attributes and compensation. For each job title, 79 specific variables in six major classes (required education and experience, aptitude, interest, work temperament, physical demands, task environment) were coded to derive the statistical predictors of wage for managers, professionals, technical, clerical, service, farm, craft, operatives, and other workers. Of the 79 variables, only 23 significantly related to pay rate were subjected to a factor and multiple regression analysis for predicting monthly wages. Given the heterogeneous nature of collected job titles, a 4-factor solution (occupational knowledge and skills, human relations skills, work schedule hardships, physical hardships) explaining 43.8% of the total variance but predicting only 23.7% of the monthly pay rate was derived. On the other hand, multiple regression with 9 job analysis items (required education, professional training, professional certificate, professional experience, coordinating, leadership and directing, demand on hearing, proportion of shift working indoors, outdoors and others, rotating shift) better predicted pay and explained 32.5% of the variance. A direct comparison of factors and subfactors of job evaluation plans indicated mental effort and responsibility (accountability) had not been measured with the current job analysis data. Cross-validation of job evaluation factors and ratings with the wage rates is required to calibrate both.
NASA Astrophysics Data System (ADS)
Asencio-Cortés, G.; Morales-Esteban, A.; Shang, X.; Martínez-Álvarez, F.
2018-06-01
Earthquake magnitude prediction is a challenging problem that has been widely studied during the last decades. Statistical, geophysical and machine learning approaches can be found in literature, with no particularly satisfactory results. In recent years, powerful computational techniques to analyze big data have emerged, making possible the analysis of massive datasets. These new methods make use of physical resources like cloud based architectures. California is known for being one of the regions with highest seismic activity in the world and many data are available. In this work, the use of several regression algorithms combined with ensemble learning is explored in the context of big data (1 GB catalog is used), in order to predict earthquakes magnitude within the next seven days. Apache Spark framework, H2 O library in R language and Amazon cloud infrastructure were been used, reporting very promising results.
Beef quality grading using machine vision
NASA Astrophysics Data System (ADS)
Jeyamkondan, S.; Ray, N.; Kranzler, Glenn A.; Biju, Nisha
2000-12-01
A video image analysis system was developed to support automation of beef quality grading. Forty images of ribeye steaks were acquired. Fat and lean meat were differentiated using a fuzzy c-means clustering algorithm. Muscle longissimus dorsi (l.d.) was segmented from the ribeye using morphological operations. At the end of each iteration of erosion and dilation, a convex hull was fitted to the image and compactness was measured. The number of iterations was selected to yield the most compact l.d. Match between the l.d. muscle traced by an expert grader and that segmented by the program was 95.9%. Marbling and color features were extracted from the l.d. muscle and were used to build regression models to predict marbling and color scores. Quality grade was predicted using another regression model incorporating all features. Grades predicted by the model were statistically equivalent to the grades assigned by expert graders.
Prediction of elemental creep. [steady state and cyclic data from regression analysis
NASA Technical Reports Server (NTRS)
Davis, J. W.; Rummler, D. R.
1975-01-01
Cyclic and steady-state creep tests were performed to provide data which were used to develop predictive equations. These equations, describing creep as a function of stress, temperature, and time, were developed through the use of a least squares regression analyses computer program for both the steady-state and cyclic data sets. Comparison of the data from the two types of tests, revealed that there was no significant difference between the cyclic and steady-state creep strains for the L-605 sheet under the experimental conditions investigated (for the same total time at load). Attempts to develop a single linear equation describing the combined steady-state and cyclic creep data resulted in standard errors of estimates higher than obtained for the individual data sets. A proposed approach to predict elemental creep in metals uses the cyclic creep equation and a computer program which applies strain and time hardening theories of creep accumulation.
A Model for Oil-Gas Pipelines Cost Prediction Based on a Data Mining Process
NASA Astrophysics Data System (ADS)
Batzias, Fragiskos A.; Spanidis, Phillip-Mark P.
2009-08-01
This paper addresses the problems associated with the cost estimation of oil/gas pipelines during the elaboration of feasibility assessments. Techno-economic parameters, i.e., cost, length and diameter, are critical for such studies at the preliminary design stage. A methodology for the development of a cost prediction model based on Data Mining (DM) process is proposed. The design and implementation of a Knowledge Base (KB), maintaining data collected from various disciplines of the pipeline industry, are presented. The formulation of a cost prediction equation is demonstrated by applying multiple regression analysis using data sets extracted from the KB. Following the methodology proposed, a learning context is inductively developed as background pipeline data are acquired, grouped and stored in the KB, and through a linear regression model provide statistically substantial results, useful for project managers or decision makers.
Choo, Min Soo; Yoo, Changwon; Cho, Sung Yong; Jeong, Seong Jin; Jeong, Chang Wook; Ku, Ja Hyeon; Oh, Seung-June
2017-04-01
As the elderly population increases, a growing number of patients have lower urinary tract symptom (LUTS)/benign prostatic hyperplasia (BPH). The aim of this study was to develop decision support formulas and nomograms for the prediction of bladder outlet obstruction (BOO) and for BOO-related surgical decision-making, and to validate them in patients with LUTS/BPH. Patient with LUTS/BPH between October 2004 and May 2014 were enrolled as a development cohort. The available variables included age, International Prostate Symptom Score, free uroflowmetry, postvoid residual volume, total prostate volume, and the results of a pressure-flow study. A causal Bayesian network analysis was used to identify relevant parameters. Using multivariate logistic regression analysis, formulas were developed to calculate the probabilities of having BOO and requiring prostatic surgery. Patients between June 2014 and December 2015 were prospectively enrolled for internal validation. Receiver operating characteristic curve analysis, calibration plots, and decision curve analysis were performed. A total of 1,179 male patients with LUTS/BPH, with a mean age of 66.1 years, were included as a development cohort. Another 253 patients were enrolled as an internal validation cohort. Using multivariate logistic regression analysis, 2 and 4 formulas were established to estimate the probabilities of having BOO and requiring prostatic surgery, respectively. Our analysis of the predictive accuracy of the model revealed area under the curve values of 0.82 for BOO and 0.87 for prostatic surgery. The sensitivity and specificity were 53.6% and 87.0% for BOO, and 91.6% and 50.0% for prostatic surgery, respectively. The calibration plot indicated that these prediction models showed a good correspondence. In addition, the decision curve analysis showed a high net benefit across the entire spectrum of probability thresholds. We established nomograms for the prediction of BOO and BOO-related prostatic surgery in patients with LUTS/BPH. Internal validation of the nomograms demonstrated that they predicted both having BOO and requiring prostatic surgery very well.
Kumar, K Vasanth; Porkodi, K; Rocha, F
2008-01-15
A comparison of linear and non-linear regression method in selecting the optimum isotherm was made to the experimental equilibrium data of basic red 9 sorption by activated carbon. The r(2) was used to select the best fit linear theoretical isotherm. In the case of non-linear regression method, six error functions namely coefficient of determination (r(2)), hybrid fractional error function (HYBRID), Marquardt's percent standard deviation (MPSD), the average relative error (ARE), sum of the errors squared (ERRSQ) and sum of the absolute errors (EABS) were used to predict the parameters involved in the two and three parameter isotherms and also to predict the optimum isotherm. Non-linear regression was found to be a better way to obtain the parameters involved in the isotherms and also the optimum isotherm. For two parameter isotherm, MPSD was found to be the best error function in minimizing the error distribution between the experimental equilibrium data and predicted isotherms. In the case of three parameter isotherm, r(2) was found to be the best error function to minimize the error distribution structure between experimental equilibrium data and theoretical isotherms. The present study showed that the size of the error function alone is not a deciding factor to choose the optimum isotherm. In addition to the size of error function, the theory behind the predicted isotherm should be verified with the help of experimental data while selecting the optimum isotherm. A coefficient of non-determination, K(2) was explained and was found to be very useful in identifying the best error function while selecting the optimum isotherm.
Ashrafi, Mahnaz; Bahmanabadi, Akram; Akhond, Mohammad Reza; Arabipoor, Arezoo
2015-11-01
To evaluate demographic, medical history and clinical cycle characteristics of infertile non-polycystic ovary syndrome (NPCOS) women with the purpose of investigating their associations with the prevalence of moderate-to-severe OHSS. In this retrospective study, among 7073 in vitro fertilization and/or intracytoplasmic sperm injection (IVF/ICSI) cycles, 86 cases of NPCO patients who developed moderate-to-severe OHSS while being treated with IVF/ICSI cycles were analyzed during the period of January 2008 to December 2010 at Royan Institute. To review the OHSS risk factors, 172 NPCOS patients without developing OHSS, treated at the same period of time, were selected randomly by computer as control group. We used multiple logistic regression in a backward manner to build a prediction model. The regression analysis revealed that the variables, including age [odds ratio (OR) 0.9, confidence interval (CI) 0.81-0.99], antral follicles count (OR 4.3, CI 2.7-6.9), infertility cause (tubal factor, OR 11.5, CI 1.1-51.3), hypothyroidism (OR 3.8, CI 1.5-9.4) and positive history of ovarian surgery (OR 0.2, CI 0.05-0.9) were the most important predictors of OHSS. The regression model had an area under curve of 0.94, presenting an allowable discriminative performance that was equal with two strong predictive variables, including the number of follicles and serum estradiol level on human chorionic gonadotropin day. The predictive regression model based on primary characteristics of NPCOS patients had equal specificity in comparison with two mentioned strong predictive variables. Therefore, it may be beneficial to apply this model before the beginning of ovarian stimulation protocol.
Predicting heavy metal concentrations in soils and plants using field spectrophotometry
NASA Astrophysics Data System (ADS)
Muradyan, V.; Tepanosyan, G.; Asmaryan, Sh.; Sahakyan, L.; Saghatelyan, A.; Warner, T. A.
2017-09-01
Aim of this study is to predict heavy metal (HM) concentrations in soils and plants using field remote sensing methods. The studied sites were an industrial town of Kajaran and city of Yerevan. The research also included sampling of soils and leaves of two tree species exposed to different pollution levels and determination of contents of HM in lab conditions. The obtained spectral values were then collated with contents of HM in Kajaran soils and the tree leaves sampled in Yerevan, and statistical analysis was done. Consequently, Zn and Pb have a negative correlation coefficient (p <0.01) in a 2498 nm spectral range for soils. Pb has a significantly higher correlation at red edge for plants. A regression models and artificial neural network (ANN) for HM prediction were developed. Good results were obtained for the best stress sensitive spectral band ANN (R2 0.9, RPD 2.0), Simple Linear Regression (SLR) and Partial Least Squares Regression (PLSR) (R2 0.7, RPD 1.4) models. Multiple Linear Regression (MLR) model was not applicable to predict Pb and Zn concentrations in soils in this research. Almost all full spectrum PLS models provide good calibration and validation results (RPD>1.4). Full spectrum ANN models are characterized by excellent calibration R2, rRMSE and RPD (0.9; 0.1 and >2.5 respectively). For prediction of Pb and Ni contents in plants SLR and PLS models were used. The latter provide almost the same results. Our findings indicate that it is possible to make coarse direct estimation of HM content in soils and plants using rapid and economic reflectance spectroscopy.
Starke, A; Haudum, A; Weijers, G; Herzog, K; Wohlsein, P; Beyerbach, M; de Korte, C L; Thijssen, J M; Rehage, J
2010-07-01
The aim was to test the accuracy of calibrated digital analysis of ultrasonographic hepatic images for diagnosing fatty liver in dairy cows. Digital analysis was performed by means of a novel method, computer-aided ultrasound diagnosis (CAUS), previously published by the authors. This method implies a set of pre- and postprocessing steps to normalize and correct the transcutaneous ultrasonographic images. Transcutaneous hepatic ultrasonography was performed before surgical correction on 151 German Holstein dairy cows (mean +/- standard error of the means; body weight: 571+/-7 kg; age: 4.9+/-0.2 yr; DIM: 35+/-5) with left-sided abomasal displacement. Concentration of triacylglycerol (TAG) was biochemically determined in liver samples collected via biopsy and values were considered the gold standard to which ultrasound estimates were compared. According to histopathologic examination of biopsies, none of the cows suffered from hepatic disorders other than hepatic lipidosis. Hepatic TAG concentrations ranged from 4.6 to 292.4 mg/g of liver fresh weight (FW). High correlations were found between the hepatic TAG and mean echo level (r=0.59) and residual attenuation (ResAtt; r=0.80) obtained in ultrasonographic imaging. High correlation existed between ResAtt and mean echo level (r=0.76). The 151 studied cows were split randomly into a training set of 76 cows and a test set of 75 cows. Based on the data from the training set, ResAtt was statistically selected by means of stepwise multiple regression analysis for hepatic TAG prediction (R(2)=0.69). Then, using the predicted TAG data of the test set, receiver operating characteristic analysis was performed to summarize the accuracy and predictive potential of the differentiation between various measured hepatic TAG values, based on TAG predicted from the regression formula. The area under the curve values of the receiver operating characteristic based on the regression equation were 0.94 (<50 vs. >or=50mg of TAG/g of FW), 0.83 (<100 vs. >or=100mg of TAG/g of FW), and 0.97 (<50 vs. >or=100mg of TAG/g of FW). The CAUS methodology and software for digitally analyzing liver ultrasonographic images is considered feasible for noninvasive screening of fatty liver in dairy herd health programs. Using the single parameter linear regression equation might be ideal for practical applications. Copyright (c) 2010 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
McLoughlin, M. Padraig M. M.; Bluford, Dontrell A.
2004-01-01
This study investigated the predictive validity of the Descriptive Tests of Mathematical Skills (DTMS) and the SAT-Mathematics (SAT-M) tests as placement tools for entering students in a small, liberal arts, historically black institution (HBI) using regression analysis. The placement schema is four-tiered: for a remedial algebra course, college…
Surrogate Analysis and Index Developer (SAID) tool
Domanski, Marian M.; Straub, Timothy D.; Landers, Mark N.
2015-10-01
The regression models created in SAID can be used in utilities that have been developed to work with the USGS National Water Information System (NWIS) and for the USGS National Real-Time Water Quality (NRTWQ) Web site. The real-time dissemination of predicted SSC and prediction intervals for each time step has substantial potential to improve understanding of sediment-related water quality and associated engineering and ecological management decisions.
Dankers, Frank; Wijsman, Robin; Troost, Esther G C; Monshouwer, René; Bussink, Johan; Hoffmann, Aswin L
2017-05-07
In our previous work, a multivariable normal-tissue complication probability (NTCP) model for acute esophageal toxicity (AET) Grade ⩾2 after highly conformal (chemo-)radiotherapy for non-small cell lung cancer (NSCLC) was developed using multivariable logistic regression analysis incorporating clinical parameters and mean esophageal dose (MED). Since the esophagus is a tubular organ, spatial information of the esophageal wall dose distribution may be important in predicting AET. We investigated whether the incorporation of esophageal wall dose-surface data with spatial information improves the predictive power of our established NTCP model. For 149 NSCLC patients treated with highly conformal radiation therapy esophageal wall dose-surface histograms (DSHs) and polar dose-surface maps (DSMs) were generated. DSMs were used to generate new DSHs and dose-length-histograms that incorporate spatial information of the dose-surface distribution. From these histograms dose parameters were derived and univariate logistic regression analysis showed that they correlated significantly with AET. Following our previous work, new multivariable NTCP models were developed using the most significant dose histogram parameters based on univariate analysis (19 in total). However, the 19 new models incorporating esophageal wall dose-surface data with spatial information did not show improved predictive performance (area under the curve, AUC range 0.79-0.84) over the established multivariable NTCP model based on conventional dose-volume data (AUC = 0.84). For prediction of AET, based on the proposed multivariable statistical approach, spatial information of the esophageal wall dose distribution is of no added value and it is sufficient to only consider MED as a predictive dosimetric parameter.
NASA Astrophysics Data System (ADS)
Dankers, Frank; Wijsman, Robin; Troost, Esther G. C.; Monshouwer, René; Bussink, Johan; Hoffmann, Aswin L.
2017-05-01
In our previous work, a multivariable normal-tissue complication probability (NTCP) model for acute esophageal toxicity (AET) Grade ⩾2 after highly conformal (chemo-)radiotherapy for non-small cell lung cancer (NSCLC) was developed using multivariable logistic regression analysis incorporating clinical parameters and mean esophageal dose (MED). Since the esophagus is a tubular organ, spatial information of the esophageal wall dose distribution may be important in predicting AET. We investigated whether the incorporation of esophageal wall dose-surface data with spatial information improves the predictive power of our established NTCP model. For 149 NSCLC patients treated with highly conformal radiation therapy esophageal wall dose-surface histograms (DSHs) and polar dose-surface maps (DSMs) were generated. DSMs were used to generate new DSHs and dose-length-histograms that incorporate spatial information of the dose-surface distribution. From these histograms dose parameters were derived and univariate logistic regression analysis showed that they correlated significantly with AET. Following our previous work, new multivariable NTCP models were developed using the most significant dose histogram parameters based on univariate analysis (19 in total). However, the 19 new models incorporating esophageal wall dose-surface data with spatial information did not show improved predictive performance (area under the curve, AUC range 0.79-0.84) over the established multivariable NTCP model based on conventional dose-volume data (AUC = 0.84). For prediction of AET, based on the proposed multivariable statistical approach, spatial information of the esophageal wall dose distribution is of no added value and it is sufficient to only consider MED as a predictive dosimetric parameter.
Ghanem, Eman; Hopfer, Helene; Navarro, Andrea; Ritzer, Maxwell S; Mahmood, Lina; Fredell, Morgan; Cubley, Ashley; Bolen, Jessica; Fattah, Rabia; Teasdale, Katherine; Lieu, Linh; Chua, Tedmund; Marini, Federico; Heymann, Hildegarde; Anslyn, Eric V
2015-05-20
Differential sensing using synthetic receptors as mimics of the mammalian senses of taste and smell is a powerful approach for the analysis of complex mixtures. Herein, we report on the effectiveness of a cross-reactive, supramolecular, peptide-based sensing array in differentiating and predicting the composition of red wine blends. Fifteen blends of Cabernet Sauvignon, Merlot and Cabernet Franc, in addition to the mono varietals, were used in this investigation. Linear Discriminant Analysis (LDA) showed a clear differentiation of blends based on tannin concentration and composition where certain mono varietals like Cabernet Sauvignon seemed to contribute less to the overall characteristics of the blend. Partial Least Squares (PLS) Regression and cross validation were used to build a predictive model for the responses of the receptors to eleven binary blends and the three mono varietals. The optimized model was later used to predict the percentage of each mono varietal in an independent test set composted of four tri-blends with a 15% average error. A partial least square regression model using the mouth-feel and taste descriptive sensory attributes of the wine blends revealed a strong correlation of the receptors to perceived astringency, which is indicative of selective binding to polyphenols in wine.
Prediction of cadmium enrichment in reclaimed coastal soils by classification and regression tree
NASA Astrophysics Data System (ADS)
Ru, Feng; Yin, Aijing; Jin, Jiaxin; Zhang, Xiuying; Yang, Xiaohui; Zhang, Ming; Gao, Chao
2016-08-01
Reclamation of coastal land is one of the most common ways to obtain land resources in China. However, it has long been acknowledged that the artificial interference with coastal land has disadvantageous effects, such as heavy metal contamination. This study aimed to develop a prediction model for cadmium enrichment levels and assess the importance of affecting factors in typical reclaimed land in Eastern China (DFCL: Dafeng Coastal Land). Two hundred and twenty seven surficial soil/sediment samples were collected and analyzed to identify the enrichment levels of cadmium and the possible affecting factors in soils and sediments. The classification and regression tree (CART) model was applied in this study to predict cadmium enrichment levels. The prediction results showed that cadmium enrichment levels assessed by the CART model had an accuracy of 78.0%. The CART model could extract more information on factors affecting the environmental behavior of cadmium than correlation analysis. The integration of correlation analysis and the CART model showed that fertilizer application and organic carbon accumulation were the most important factors affecting soil/sediment cadmium enrichment levels, followed by particle size effects (Al2O3, TFe2O3 and SiO2), contents of Cl and S, surrounding construction areas and reclamation history.
Lorenz, Alyson; Dhingra, Radhika; Chang, Howard H; Bisanzio, Donal; Liu, Yang; Remais, Justin V
2014-01-01
Extrapolating landscape regression models for use in assessing vector-borne disease risk and other applications requires thoughtful evaluation of fundamental model choice issues. To examine implications of such choices, an analysis was conducted to explore the extent to which disparate landscape models agree in their epidemiological and entomological risk predictions when extrapolated to new regions. Agreement between six literature-drawn landscape models was examined by comparing predicted county-level distributions of either Lyme disease or Ixodes scapularis vector using Spearman ranked correlation. AUC analyses and multinomial logistic regression were used to assess the ability of these extrapolated landscape models to predict observed national data. Three models based on measures of vegetation, habitat patch characteristics, and herbaceous landcover emerged as effective predictors of observed disease and vector distribution. An ensemble model containing these three models improved precision and predictive ability over individual models. A priori assessment of qualitative model characteristics effectively identified models that subsequently emerged as better predictors in quantitative analysis. Both a methodology for quantitative model comparison and a checklist for qualitative assessment of candidate models for extrapolation are provided; both tools aim to improve collaboration between those producing models and those interested in applying them to new areas and research questions.
Minority stress and sexual problems among African-American gay and bisexual men.
Zamboni, Brian D; Crawford, Isiaah
2007-08-01
Minority stress, such as racism and gay bashing, may be associated with sexual problems, but this notion has not been examined in the literature. African-American gay/bisexual men face a unique challenge in managing a double minority status, putting them at high risk for stress and sexual problems. This investigation examined ten predictors of sexual problems among 174 African-American gay/bisexual men. Covarying for age, a forward multiple regression analysis showed that the measures of self-esteem, male gender role stress, HIV prevention self-efficacy, and lifetime experiences with racial discrimination significantly added to the prediction of sexual problems. Gay bashing, psychiatric symptoms, low life satisfaction, and low social support were significantly correlated with sexual problems, but did not add to the prediction of sexual problems in the regression analysis. Mediation analyses showed that stress predicted psychiatric symptoms, which then predicted sexual problems. Sexual problems were not significantly related to HIV status, racial/ethnic identity, or gay identity. The findings from this study showed a relationship between experiences with racial and sexual discrimination and sexual problems while also providing support for mediation to illustrate how stress might cause sexual problems. Addressing minority stress in therapy may help minimize and treat sexual difficulties among minority gay/bisexual men.
NASA Astrophysics Data System (ADS)
Alekseenko, M. A.; Gendrina, I. Yu.
2017-11-01
Recently, due to the abundance of various types of observational data in the systems of vision through the atmosphere and the need for their processing, the use of various methods of statistical research in the study of such systems as correlation-regression analysis, dynamic series, variance analysis, etc. is actual. We have attempted to apply elements of correlation-regression analysis for the study and subsequent prediction of the patterns of radiation transfer in these systems same as in the construction of radiation models of the atmosphere. In this paper, we present some results of statistical processing of the results of numerical simulation of the characteristics of vision systems through the atmosphere obtained with the help of a special software package.1
Experimental and computational prediction of glass transition temperature of drugs.
Alzghoul, Ahmad; Alhalaweh, Amjad; Mahlin, Denny; Bergström, Christel A S
2014-12-22
Glass transition temperature (Tg) is an important inherent property of an amorphous solid material which is usually determined experimentally. In this study, the relation between Tg and melting temperature (Tm) was evaluated using a data set of 71 structurally diverse druglike compounds. Further, in silico models for prediction of Tg were developed based on calculated molecular descriptors and linear (multilinear regression, partial least-squares, principal component regression) and nonlinear (neural network, support vector regression) modeling techniques. The models based on Tm predicted Tg with an RMSE of 19.5 K for the test set. Among the five computational models developed herein the support vector regression gave the best result with RMSE of 18.7 K for the test set using only four chemical descriptors. Hence, two different models that predict Tg of drug-like molecules with high accuracy were developed. If Tm is available, a simple linear regression can be used to predict Tg. However, the results also suggest that support vector regression and calculated molecular descriptors can predict Tg with equal accuracy, already before compound synthesis.
du Toit, Lisa; Pillay, Viness; Choonara, Yahya
2010-01-01
Dissolution testing with subsequent analysis is considered as an imperative tool for quality evaluation of the combination rifampicin-isoniazid (RIF-INH) combination. Partial least squares (PLS) regression has been successfully undertaken to select suitable predictor variables and to identify outliers for the generation of equations for RIF and INH determination in fixed-dose combinations (FDCs). The aim of this investigation was to ascertain the applicability of the described technique in testing a novel oral FDC anti-TB drug delivery system and currently available two-drug FDCs, in comparison to the United States Pharmacopeial method for analysis of RIF and INH Capsules with chromatographic determination of INH and colorimetric RIF determination. Regression equations generated employing the statistical coefficients satisfactorily predicted RIF release at each sampling point (R(2)>or=0.9350). There was an acceptable degree of correlation between the drug release data, as predicted by regressional analysis of UV spectrophotometric data, and chromatographic and colorimetric determination of INH (R(2)=0.9793 and R(2)=0.9739) and RIF (R(2)= 0.9976 and R(2)=0.9996) for the two-drug FDC and the novel oral anti-TB drug delivery system, respectively. Regressional analysis of UV spectrophotometric data for simultaneous RIF and INH prediction thus provides a simplified methodology for use in diverse research settings for the assurance of RIF bioavailability from FDC formulations, specifically modified-release forms.
Gordon, Evan M.; Stollstorff, Melanie; Vaidya, Chandan J.
2012-01-01
Many researchers have noted that the functional architecture of the human brain is relatively invariant during task performance and the resting state. Indeed, intrinsic connectivity networks (ICNs) revealed by resting-state functional connectivity analyses are spatially similar to regions activated during cognitive tasks. This suggests that patterns of task-related activation in individual subjects may result from the engagement of one or more of these ICNs; however, this has not been tested. We used a novel analysis, spatial multiple regression, to test whether the patterns of activation during an N-back working memory task could be well described by a linear combination of ICNs delineated using Independent Components Analysis at rest. We found that across subjects, the cingulo-opercular Set Maintenance ICN, as well as right and left Frontoparietal Control ICNs, were reliably activated during working memory, while Default Mode and Visual ICNs were reliably deactivated. Further, involvement of Set Maintenance, Frontoparietal Control, and Dorsal Attention ICNs was sensitive to varying working memory load. Finally, the degree of left Frontoparietal Control network activation predicted response speed, while activation in both left Frontoparietal Control and Dorsal Attention networks predicted task accuracy. These results suggest that a close relationship between resting-state networks and task-evoked activation is functionally relevant for behavior, and that spatial multiple regression analysis is a suitable method for revealing that relationship. PMID:21761505
Analysis of Market Opportunities for Chinese Private Express Delivery Industry
NASA Astrophysics Data System (ADS)
Jiang, Changbing; Bai, Lijun; Tong, Xiaoqing
China's express delivery market has become the arena in which each express enterprise struggles to chase due to the huge potential demand and high profitable prospects. So certain qualitative and quantitative forecast for the future changes of China's express delivery market will help enterprises understand various types of market conditions and social changes in demand and adjust business activities to enhance their competitiveness timely. The development of China's express delivery industry is first introduced in this chapter. Then the theoretical basis of the regression model is overviewed. We also predict the demand trends of China's express delivery market by using Pearson correlation analysis and regression analysis from qualitative and quantitative aspects, respectively. Finally, we draw some conclusions and recommendations for China's express delivery industry.
Unemployment rate and price of gasoline predict the fuel economy of purchased new vehicles.
DOT National Transportation Integrated Search
2011-03-01
This study examined the relationship between two economic indicatorsthe : unemployment rate and the price of gasolineand the fuel economy of purchased new : vehicles. A regression analysis was performed on U.S. monthly data from October 2007 : ...
NASA Technical Reports Server (NTRS)
Mcdermott, P. P.
1980-01-01
The design of an accelerated life test program for electric batteries is discussed. A number of observations and suggestions on the procedures and objectives for conducting an accelerated life test program are presented. Equations based on nonlinear regression analysis for predicting the accelerated life test parameters are discussed.
Virtual Beach version 2.2 (VB 2.2) is a decision support tool. It is designed to construct site-specific Multi-Linear Regression (MLR) models to predict pathogen indicator levels (or fecal indicator bacteria, FIB) at recreational beaches. MLR analysis has outperformed persisten...
Tradespace Exploration for the Engineering of Resilient Systems
2015-05-01
world scenarios. The types of tools within the SAE set include visualization, decision analysis, and M&S, so it is difficult to categorize this toolset... overpopulated , or questionable. ERS Tradespace Workshop Create predictive models using multiple techniques (e.g., regression, Kriging, neural nets
NASA Astrophysics Data System (ADS)
Cheng, Jun-Hu; Jin, Huali; Liu, Zhiwei
2018-01-01
The feasibility of developing a multispectral imaging method using important wavelengths from hyperspectral images selected by genetic algorithm (GA), successive projection algorithm (SPA) and regression coefficient (RC) methods for modeling and predicting protein content in peanut kernel was investigated for the first time. Partial least squares regression (PLSR) calibration model was established between the spectral data from the selected optimal wavelengths and the reference measured protein content ranged from 23.46% to 28.43%. The RC-PLSR model established using eight key wavelengths (1153, 1567, 1972, 2143, 2288, 2339, 2389 and 2446 nm) showed the best predictive results with the coefficient of determination of prediction (R2P) of 0.901, and root mean square error of prediction (RMSEP) of 0.108 and residual predictive deviation (RPD) of 2.32. Based on the obtained best model and image processing algorithms, the distribution maps of protein content were generated. The overall results of this study indicated that developing a rapid and online multispectral imaging system using the feature wavelengths and PLSR analysis is potential and feasible for determination of the protein content in peanut kernels.
Gaussian Process Regression (GPR) Representation in Predictive Model Markup Language (PMML)
Lechevalier, D.; Ak, R.; Ferguson, M.; Law, K. H.; Lee, Y.-T. T.; Rachuri, S.
2017-01-01
This paper describes Gaussian process regression (GPR) models presented in predictive model markup language (PMML). PMML is an extensible-markup-language (XML) -based standard language used to represent data-mining and predictive analytic models, as well as pre- and post-processed data. The previous PMML version, PMML 4.2, did not provide capabilities for representing probabilistic (stochastic) machine-learning algorithms that are widely used for constructing predictive models taking the associated uncertainties into consideration. The newly released PMML version 4.3, which includes the GPR model, provides new features: confidence bounds and distribution for the predictive estimations. Both features are needed to establish the foundation for uncertainty quantification analysis. Among various probabilistic machine-learning algorithms, GPR has been widely used for approximating a target function because of its capability of representing complex input and output relationships without predefining a set of basis functions, and predicting a target output with uncertainty quantification. GPR is being employed to various manufacturing data-analytics applications, which necessitates representing this model in a standardized form for easy and rapid employment. In this paper, we present a GPR model and its representation in PMML. Furthermore, we demonstrate a prototype using a real data set in the manufacturing domain. PMID:29202125
Gaussian Process Regression (GPR) Representation in Predictive Model Markup Language (PMML).
Park, J; Lechevalier, D; Ak, R; Ferguson, M; Law, K H; Lee, Y-T T; Rachuri, S
2017-01-01
This paper describes Gaussian process regression (GPR) models presented in predictive model markup language (PMML). PMML is an extensible-markup-language (XML) -based standard language used to represent data-mining and predictive analytic models, as well as pre- and post-processed data. The previous PMML version, PMML 4.2, did not provide capabilities for representing probabilistic (stochastic) machine-learning algorithms that are widely used for constructing predictive models taking the associated uncertainties into consideration. The newly released PMML version 4.3, which includes the GPR model, provides new features: confidence bounds and distribution for the predictive estimations. Both features are needed to establish the foundation for uncertainty quantification analysis. Among various probabilistic machine-learning algorithms, GPR has been widely used for approximating a target function because of its capability of representing complex input and output relationships without predefining a set of basis functions, and predicting a target output with uncertainty quantification. GPR is being employed to various manufacturing data-analytics applications, which necessitates representing this model in a standardized form for easy and rapid employment. In this paper, we present a GPR model and its representation in PMML. Furthermore, we demonstrate a prototype using a real data set in the manufacturing domain.
de Vries, Durk R; van Herwaarden, Margot A; Smout, André J P M; Samsom, Melvin
2008-06-01
The roles of intragastric pressure (IGP), intraesophageal pressure (IEP), gastroesophageal pressure gradient (GEPG), and body mass index (BMI) in the pathophysiology of gastroesophageal reflux disease (GERD) and hiatal hernia (HH) are only partly understood. In total, 149 GERD patients underwent stationary esophageal manometry, 24-h pH-metry, and endoscopy. One hundred three patients had HH. Linear regression analysis showed that each kilogram per square meter of BMI caused a 0.047-kPa increase in inspiratory IGP (95% confidence interval [CI] 0.026-0.067) and a 0.031-kPa increase in inspiratory GEPG (95% CI 0.007-0.055). Each kilogram per square meter of BMI caused expiratory IGP to increase with 0.043 kPa (95% CI 0.025-0.060) and expiratory IEP with 0.052 kPa (95% CI 0.027-0.077). Each added year of age caused inspiratory IEP to decrease by 0.008 kPa (95% CI -0.015-0.001) and inspiratory GEPG to increase by 0.008 kPa (95% CI 0.000-0.015). In binary logistic regression analysis, HH was predicted by inspiratory and expiratory IGP (odds ratio [OR] 2.93 and 2.62, respectively), inspiratory and expiratory GEPG (OR 3.19 and 2.68, respectively), and BMI (OR 1.72/5 kg/m(2)). In linear regression analysis, HH caused an average 5.09% increase in supine acid exposure (95% CI 0.96-9.22) and an average 3.46% increase in total acid exposure (95% CI 0.82-6.09). Each added year of age caused an average 0.10% increase in upright acid exposure and a 0.09% increase in total acid exposure (95% CI 0.00-0.20 and 0.00-0.18). BMI predicts IGP, inspiratory GEPG, and expiratory IEP. Age predicts inspiratory IEP and GEPG. Presence of HH is predicted by IGP, GEPG, and BMI. GEPG is not associated with acid exposure.
NASA Technical Reports Server (NTRS)
Gaston, S.; Wertheim, M.; Orourke, J. A.
1973-01-01
Summary, consolidation and analysis of specifications, manufacturing process and test controls, and performance results for OAO-2 and OAO-3 lot 20 Amp-Hr sealed nickel cadmium cells and batteries are reported. Correlation of improvements in control requirements with performance is a key feature. Updates for a cell/battery computer model to improve performance prediction capability are included. Applicability of regression analysis computer techniques to relate process controls to performance is checked.
Zhang, Xingyu; Kim, Joyce; Patzer, Rachel E; Pitts, Stephen R; Patzer, Aaron; Schrager, Justin D
2017-10-26
To describe and compare logistic regression and neural network modeling strategies to predict hospital admission or transfer following initial presentation to Emergency Department (ED) triage with and without the addition of natural language processing elements. Using data from the National Hospital Ambulatory Medical Care Survey (NHAMCS), a cross-sectional probability sample of United States EDs from 2012 and 2013 survey years, we developed several predictive models with the outcome being admission to the hospital or transfer vs. discharge home. We included patient characteristics immediately available after the patient has presented to the ED and undergone a triage process. We used this information to construct logistic regression (LR) and multilayer neural network models (MLNN) which included natural language processing (NLP) and principal component analysis from the patient's reason for visit. Ten-fold cross validation was used to test the predictive capacity of each model and receiver operating curves (AUC) were then calculated for each model. Of the 47,200 ED visits from 642 hospitals, 6,335 (13.42%) resulted in hospital admission (or transfer). A total of 48 principal components were extracted by NLP from the reason for visit fields, which explained 75% of the overall variance for hospitalization. In the model including only structured variables, the AUC was 0.824 (95% CI 0.818-0.830) for logistic regression and 0.823 (95% CI 0.817-0.829) for MLNN. Models including only free-text information generated AUC of 0.742 (95% CI 0.731- 0.753) for logistic regression and 0.753 (95% CI 0.742-0.764) for MLNN. When both structured variables and free text variables were included, the AUC reached 0.846 (95% CI 0.839-0.853) for logistic regression and 0.844 (95% CI 0.836-0.852) for MLNN. The predictive accuracy of hospital admission or transfer for patients who presented to ED triage overall was good, and was improved with the inclusion of free text data from a patient's reason for visit regardless of modeling approach. Natural language processing and neural networks that incorporate patient-reported outcome free text may increase predictive accuracy for hospital admission.
Li, Feiming; Gimpel, John R; Arenson, Ethan; Song, Hao; Bates, Bruce P; Ludwin, Fredric
2014-04-01
Few studies have investigated how well scores from the Comprehensive Osteopathic Medical Licensing Examination-USA (COMLEX-USA) series predict resident outcomes, such as performance on board certification examinations. To determine how well COMLEX-USA predicts performance on the American Osteopathic Board of Emergency Medicine (AOBEM) Part I certification examination. The target study population was first-time examinees who took AOBEM Part I in 2011 and 2012 with matched performances on COMLEX-USA Level 1, Level 2-Cognitive Evaluation (CE), and Level 3. Pearson correlations were computed between AOBEM Part I first-attempt scores and COMLEX-USA performances to measure the association between these examinations. Stepwise linear regression analysis was conducted to predict AOBEM Part I scores by the 3 COMLEX-USA scores. An independent t test was conducted to compare mean COMLEX-USA performances between candidates who passed and who failed AOBEM Part I, and a stepwise logistic regression analysis was used to predict the log-odds of passing AOBEM Part I on the basis of COMLEX-USA scores. Scores from AOBEM Part I had the highest correlation with COMLEX-USA Level 3 scores (.57) and slightly lower correlation with COMLEX-USA Level 2-CE scores (.53). The lowest correlation was between AOBEM Part I and COMLEX-USA Level 1 scores (.47). According to the stepwise regression model, COMLEX-USA Level 1 and Level 2-CE scores, which residency programs often use as selection criteria, together explained 30% of variance in AOBEM Part I scores. Adding Level 3 scores explained 37% of variance. The independent t test indicated that the 397 examinees passing AOBEM Part I performed significantly better than the 54 examinees failing AOBEM Part I in all 3 COMLEX-USA levels (P<.001 for all 3 levels). The logistic regression model showed that COMLEX-USA Level 1 and Level 3 scores predicted the log-odds of passing AOBEM Part I (P=.03 and P<.001, respectively). The present study empirically supported the predictive and discriminant validities of the COMLEX-USA series in relation to the AOBEM Part I certification examination. Although residency programs may use COMLEX-USA Level 1 and Level 2-CE scores as partial criteria in selecting residents, Level 3 scores, though typically not available at the time of application, are actually the most statistically related to performances on AOBEM Part I.
Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi
2012-01-01
The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of trans cranial doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation on the first week of admission and again six months later. All data was primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
2012-01-01
Background Cognitive deficits and multiple psychoactive drug regimens are both common in patients treated for opioid-dependence. Therefore, we examined whether the cognitive performance of patients in opioid-substitution treatment (OST) is associated with their drug treatment variables. Methods Opioid-dependent patients (N = 104) who were treated either with buprenorphine or methadone (n = 52 in both groups) were given attention, working memory, verbal, and visual memory tests after they had been a minimum of six months in treatment. Group-wise results were analysed by analysis of variance. Predictors of cognitive performance were examined by hierarchical regression analysis. Results Buprenorphine-treated patients performed statistically significantly better in a simple reaction time test than methadone-treated ones. No other significant differences between groups in cognitive performance were found. In each OST drug group, approximately 10% of the attention performance could be predicted by drug treatment variables. Use of benzodiazepine medication predicted about 10% of performance variance in working memory. Treatment with more than one other psychoactive drug (than opioid or BZD) and frequent substance abuse during the past month predicted about 20% of verbal memory performance. Conclusions Although this study does not prove a causal relationship between multiple prescription drug use and poor cognitive functioning, the results are relevant for psychosocial recovery, vocational rehabilitation, and psychological treatment of OST patients. Especially for patients with BZD treatment, other treatment options should be actively sought. PMID:23121989
Quantitative analysis of bayberry juice acidity based on visible and near-infrared spectroscopy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shao Yongni; He Yong; Mao Jingyuan
Visible and near-infrared (Vis/NIR) reflectance spectroscopy has been investigated for its ability to nondestructively detect acidity in bayberry juice. What we believe to be a new, better mathematic model is put forward, which we have named principal component analysis-stepwise regression analysis-backpropagation neural network (PCA-SRA-BPNN), to build a correlation between the spectral reflectivity data and the acidity of bayberry juice. In this model, the optimum network parameters,such as the number of input nodes, hidden nodes, learning rate, and momentum, are chosen by the value of root-mean-square (rms) error. The results show that its prediction statistical parameters are correlation coefficient (r) ofmore » 0.9451 and root-mean-square error of prediction(RMSEP) of 0.1168. Partial least-squares (PLS) regression is also established to compare with this model. Before doing this, the influences of various spectral pretreatments (standard normal variate, multiplicative scatter correction, S. Golay first derivative, and wavelet package transform) are compared. The PLS approach with wavelet package transform preprocessing spectra is found to provide the best results, and its prediction statistical parameters are correlation coefficient (r) of 0.9061 and RMSEP of 0.1564. Hence, these two models are both desirable to analyze the data from Vis/NIR spectroscopy and to solve the problem of the acidity prediction of bayberry juice. This supplies basal research to ultimately realize the online measurements of the juice's internal quality through this Vis/NIR spectroscopy technique.« less
Ishino, Takashi; Ragaee, Mahmoud Ali; Maruhashi, Tatsuya; Kajikawa, Masato; Higashi, Yukihito; Sonoyama, Toru; Takeno, Sachio; Hirakawa, Katsuhiro
Cochlear implantation (CI) has been the most successful procedure for restoring hearing in a patient with severe and profound hearing loss. However, possibly owing to the variable brain functions of each patient, its performance and the associated patient satisfaction are widely variable. The authors hypothesize that peripheral and cerebral circulation can be assessed by noninvasive and globally available methods, yielding superior presurgical predictive factors of the performance of CI in adult patients with postlingual hearing loss who are scheduled to undergo CI. Twenty-two adult patients with cochlear implants for postlingual hearing loss were evaluated using Doppler sonography measurement of the cervical arteries (reflecting cerebral blood flow), flow-mediated dilation (FMD; reflecting the condition of cerebral arteries), and their pre-/post-CI best score on a monosyllabic discrimination test (pre-/post-CI best monosyllabic discrimination [BMD] score). Correlations between post-CI BMD score and the other factors were examined using univariate analysis and stepwise multiple linear regression analysis. The prediction factors were calculated by examining the receiver-operating characteristic curve between post-CI BMD score and the significantly positively correlated factors. Age and duration of deafness had a moderately negative correlation. The mean velocity of the internal carotid arteries and FMD had a moderate-to-strong positive correlation with the post-CI BMD score in univariate analysis. Stepwise multiple linear regression analysis revealed that only FMD was significantly positively correlated with post-CI BMD score. Analysis of the receiver-operating characteristic curve showed that a FMD cutoff score of 1.8 significantly predicted post-CI BMD score. These data suggest that FMD is a convenient, noninvasive, and widely available tool for predicting the efficacy of cochlear implants. An FMD cutoff score of 1.8 could be a good index for determining whether patients will hear well with cochlear implants. It could also be used to predict whether cochlear implants will provide good speech recognition benefits to candidates, even if their speech discrimination is poor. This FMD index could become a useful predictive tool for candidates with poor speech discrimination to determine the efficacy of CI before surgery.
No rationale for 1 variable per 10 events criterion for binary logistic regression analysis.
van Smeden, Maarten; de Groot, Joris A H; Moons, Karel G M; Collins, Gary S; Altman, Douglas G; Eijkemans, Marinus J C; Reitsma, Johannes B
2016-11-24
Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth's correction, are compared. The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect ('separation'). We reveal that different approaches for identifying and handling separation leads to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
Sound attenuation of fiberglass lined ventilation ducts
NASA Astrophysics Data System (ADS)
Albright, Jacob
Sound attenuation is a crucial part of designing any HVAC system. Most ventilation systems are designed to be in areas occupied by one or more persons. If these systems do not adequately attenuate the sound of the supply fan, compressor, or any other source of sound, the affected area could be subject to an array of problems ranging from an annoying hum to a deafening howl. The goals of this project are to quantify the sound attenuation properties of fiberglass duct liner and to perform a regression analysis to develop equations to predict insertion loss values for both rectangular and round duct liners. The first goal was accomplished via insertion loss testing. The tests performed conformed to the ASTM E477 standard. Using the insertion loss test data, regression equations were developed to predict insertion loss values for rectangular ducts ranging in size from 12-in x 18-in to 48-in x 48-in in lengths ranging from 3ft to 30ft. Regression equations were also developed to predict insertion loss values for round ducts ranging in diameters from 12-in to 48-in in lengths ranging from 3ft to 30ft.
NASA Astrophysics Data System (ADS)
Xin, Pei; Wang, Shen S. J.; Shen, Chengji; Zhang, Zeyu; Lu, Chunhui; Li, Ling
2018-03-01
Shallow groundwater interacts strongly with surface water across a quarter of global land area, affecting significantly the terrestrial eco-hydrology and biogeochemistry. We examined groundwater behavior subjected to unimodal impulse and irregular surface water fluctuations, combining physical experiments, numerical simulations, and functional data analysis. Both the experiments and numerical simulations demonstrated a damped and delayed response of groundwater table to surface water fluctuations. To quantify this hysteretic shallow groundwater behavior, we developed a regression model with the Gamma distribution functions adopted to account for the dependence of groundwater behavior on antecedent surface water conditions. The regression model fits and predicts well the groundwater table oscillations resulting from propagation of irregular surface water fluctuations in both laboratory and large-scale aquifers. The coefficients of the Gamma distribution function vary spatially, reflecting the hysteresis effect associated with increased amplitude damping and delay as the fluctuation propagates. The regression model, in a relatively simple functional form, has demonstrated its capacity of reproducing high-order nonlinear effects that underpin the surface water and groundwater interactions. The finding has important implications for understanding and predicting shallow groundwater behavior and associated biogeochemical processes, and will contribute broadly to studies of groundwater-dependent ecology and biogeochemistry.
Takagi, Yukinori; Sumi, Misa; Nakamura, Hideki; Sato, Shuntaro; Kawakami, Atsushi; Nakamura, Takashi
2016-02-01
To evaluate ultrasonography (US) grading of salivary gland disease as a predictor of treatment efficacy for impaired salivary function in xerostomia patients with or without Sjögren's syndrome (SS). We retrospectively analysed the prognostic importance of salivary US grading in 317 patients (168 with SS and 149 without SS). US images of the parotid and submandibular glands in each patient were individually categorized into grades 0-4 based on the extent of damage to the gland; and the sum total grade of the two gland types on either side was assigned a US score of 0-8 for each patient. The relative importance of US score and demographic and clinical variables was assessed using stepwise multiple regression analysis after various durations of xerostomia treatment. Multiple regression analysis indicated that the baseline US score before treatment was the most important factor [standardized regression coefficient (β) = -0.523, t-statistic (t) = -7.967, P < 0.001] in predicting negative outcomes in SS patients. Treatment duration (β = 0.277, t = 4.225, P < 0.001) was also a significant but less important positive variable. On the other hand, US grading did not effectively predict treatment outcomes in non-SS patients, with treatment duration (β = 0.199, t = 2.486, P = 0.014) and baseline salivary flow rate before treatment (β = -0.172, t = -2.159, P = 0.032) being significant but weak predictors of positive and negative outcome, respectively. Salivary gland US grading may help to predict outcomes of treatment for impaired salivary function in patients with SS. © The Author 2015. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Tilburg, Charles E.; Jordan, Linda M.; Carlson, Amy E.; Zeeman, Stephan I.; Yund, Philip O.
2015-01-01
Faecal pollution in stormwater, wastewater and direct run-off can carry zoonotic pathogens to streams, rivers and the ocean, reduce water quality, and affect both recreational and commercial fishing areas of the coastal ocean. Typically, the closure of beaches and commercial fishing areas is governed by the testing for the presence of faecal bacteria, which requires an 18–24 h period for sample incubation. As water quality can change during this testing period, the need for accurate and timely predictions of coastal water quality has become acute. In this study, we: (i) examine the relationship between water quality, precipitation and river discharge at several locations within the Gulf of Maine, and (ii) use multiple linear regression models based on readily obtainable hydrometeorological measurements to predict water quality events at five coastal locations. Analysis of a 12 year dataset revealed that high river discharge and/or precipitation events can lead to reduced water quality; however, the use of only these two parameters to predict water quality can result in a number of errors. Analysis of a higher frequency, 2 year study using multiple linear regression models revealed that precipitation, salinity, river discharge, winds, seasonality and coastal circulation correlate with variations in water quality. Although there has been extensive development of regression models for freshwater, this is one of the first attempts to create a mechanistic model to predict water quality in coastal marine waters. Model performance is similar to that of efforts in other regions, which have incorporated models into water resource managers' decisions, indicating that the use of a mechanistic model in coastal Maine is feasible. PMID:26587258
Relationships of Measurement Error and Prediction Error in Observed-Score Regression
ERIC Educational Resources Information Center
Moses, Tim
2012-01-01
The focus of this paper is assessing the impact of measurement errors on the prediction error of an observed-score regression. Measures are presented and described for decomposing the linear regression's prediction error variance into parts attributable to the true score variance and the error variances of the dependent variable and the predictor…
NASA Technical Reports Server (NTRS)
Parker, Peter A.; Geoffrey, Vining G.; Wilson, Sara R.; Szarka, John L., III; Johnson, Nels G.
2010-01-01
The calibration of measurement systems is a fundamental but under-studied problem within industrial statistics. The origins of this problem go back to basic chemical analysis based on NIST standards. In today's world these issues extend to mechanical, electrical, and materials engineering. Often, these new scenarios do not provide "gold standards" such as the standard weights provided by NIST. This paper considers the classic "forward regression followed by inverse regression" approach. In this approach the initial experiment treats the "standards" as the regressor and the observed values as the response to calibrate the instrument. The analyst then must invert the resulting regression model in order to use the instrument to make actual measurements in practice. This paper compares this classical approach to "reverse regression," which treats the standards as the response and the observed measurements as the regressor in the calibration experiment. Such an approach is intuitively appealing because it avoids the need for the inverse regression. However, it also violates some of the basic regression assumptions.
NASA Astrophysics Data System (ADS)
Shan, X.; Zhang, K.; Zhuang, Y.; Fu, R.; Hong, Y.
2017-12-01
Seasonal prediction of rainfall during the dry-to-wet transition season in austral spring (September-November) over southern Amazonia is central for improving planting crops and fire mitigation in that region. Previous studies have identified the key large-scale atmospheric dynamic and thermodynamics pre-conditions during the dry season (June-August) that influence the rainfall anomalies during the dry to wet transition season over Southern Amazonia. Based on these key pre-conditions during dry season, we have evaluated several statistical models and developed a Neural Network based statistical prediction system to predict rainfall during the dry to wet transition for Southern Amazonia (5-15°S, 50-70°W). Multivariate Empirical Orthogonal Function (EOF) Analysis is applied to the following four fields during JJA from the ECMWF Reanalysis (ERA-Interim) spanning from year 1979 to 2015: geopotential height at 200 hPa, surface relative humidity, convective inhibition energy (CIN) index and convective available potential energy (CAPE), to filter out noise and highlight the most coherent spatial and temporal variations. The first 10 EOF modes are retained for inputs to the statistical models, accounting for at least 70% of the total variance in the predictor fields. We have tested several linear and non-linear statistical methods. While the regularized Ridge Regression and Lasso Regression can generally capture the spatial pattern and magnitude of rainfall anomalies, we found that that Neural Network performs best with an accuracy greater than 80%, as expected from the non-linear dependence of the rainfall on the large-scale atmospheric thermodynamic conditions and circulation. Further tests of various prediction skill metrics and hindcasts also suggest this Neural Network prediction approach can significantly improve seasonal prediction skill than the dynamic predictions and regression based statistical predictions. Thus, this statistical prediction system could have shown potential to improve real-time seasonal rainfall predictions in the future.
Stevens, Adam; Murray, Philip; Wojcik, Jerome; Raelson, John; Koledova, Ekaterina; Chatelain, Pierre
2016-01-01
Objective Single-nucleotide polymorphisms (SNPs) associated with the response to recombinant human growth hormone (r-hGH) have previously been identified in growth hormone deficiency (GHD) and Turner syndrome (TS) children in the PREDICT long-term follow-up (LTFU) study (Nbib699855). Here, we describe the PREDICT validation (VAL) study (Nbib1419249), which aimed to confirm these genetic associations. Design and methods Children with GHD (n = 293) or TS (n = 132) were recruited retrospectively from 29 sites in nine countries. All children had completed 1 year of r-hGH therapy. 48 SNPs previously identified as associated with first year growth response to r-hGH were genotyped. Regression analysis was used to assess the association between genotype and growth response using clinical/auxological variables as covariates. Further analysis was undertaken using random forest classification. Results The children were younger, and the growth response was higher in VAL study. Direct genotype analysis did not replicate what was found in the LTFU study. However, using exploratory regression models with covariates, a consistent relationship with growth response in both VAL and LTFU was shown for four genes – SOS1 and INPPL1 in GHD and ESR1 and PTPN1 in TS. The random forest analysis demonstrated that only clinical covariates were important in the prediction of growth response in mild GHD (>4 to <10 μg/L on GH stimulation test), however, in severe GHD (≤4 μg/L) several SNPs contributed (in IGF2, GRB10, FOS, IGFBP3 and GHRHR). Conclusions The PREDICT validation study supports, in an independent cohort, the association of four of 48 genetic markers with growth response to r-hGH treatment in both pre-pubertal GHD and TS children after controlling for clinical/auxological covariates. However, the contribution of these SNPs in a prediction model of first-year response is not sufficient for routine clinical use. PMID:27651465
Stevens, Adam; Murray, Philip; Wojcik, Jerome; Raelson, John; Koledova, Ekaterina; Chatelain, Pierre; Clayton, Peter
2016-12-01
Single-nucleotide polymorphisms (SNPs) associated with the response to recombinant human growth hormone (r-hGH) have previously been identified in growth hormone deficiency (GHD) and Turner syndrome (TS) children in the PREDICT long-term follow-up (LTFU) study (Nbib699855). Here, we describe the PREDICT validation (VAL) study (Nbib1419249), which aimed to confirm these genetic associations. Children with GHD (n = 293) or TS (n = 132) were recruited retrospectively from 29 sites in nine countries. All children had completed 1 year of r-hGH therapy. 48 SNPs previously identified as associated with first year growth response to r-hGH were genotyped. Regression analysis was used to assess the association between genotype and growth response using clinical/auxological variables as covariates. Further analysis was undertaken using random forest classification. The children were younger, and the growth response was higher in VAL study. Direct genotype analysis did not replicate what was found in the LTFU study. However, using exploratory regression models with covariates, a consistent relationship with growth response in both VAL and LTFU was shown for four genes - SOS1 and INPPL1 in GHD and ESR1 and PTPN1 in TS. The random forest analysis demonstrated that only clinical covariates were important in the prediction of growth response in mild GHD (>4 to <10 μg/L on GH stimulation test), however, in severe GHD (≤4 μg/L) several SNPs contributed (in IGF2, GRB10, FOS, IGFBP3 and GHRHR). The PREDICT validation study supports, in an independent cohort, the association of four of 48 genetic markers with growth response to r-hGH treatment in both pre-pubertal GHD and TS children after controlling for clinical/auxological covariates. However, the contribution of these SNPs in a prediction model of first-year response is not sufficient for routine clinical use. © 2016 European Society of Endocrinology.
Wan, Cai-Feng; Liu, Xue-Song; Wang, Lin; Zhang, Jie; Lu, Jin-Song; Li, Feng-Hua
2018-06-01
To clarify whether the quantitative parameters of contrast-enhanced ultrasound (CEUS) can be used to predict pathological complete response (pCR) in patients with locally advanced breast cancer receiving neoadjuvant chemotherapy (NAC). Fifty-one patients with histologically proved locally advanced breast cancer scheduled for NAC were enrolled. The quantitative data for CEUS and the tumor diameter were collected at baseline and before surgery, and compared with the pathological response. Multiple logistic regression analysis was performed to examine quantitative parameters at CEUS and the tumor diameter to predict the pCR, and receiver operating characteristic (ROC) curve analysis was used as a summary statistic. Multiple logistic regression analysis revealed that PEAK (the maximum intensity of the time-intensity curve during bolus transit), PEAK%, TTP% (time to peak), and diameter% were significant independent predictors of pCR, and the area under the ROC curve was 0.932(Az 1 ), and the sensitivity and specificity to predict pCR were 93.7% and 80.0%. The area under the ROC curve for the quantitative parameters was 0.927(Az 2 ), and the sensitivity and specificity to predict pCR were 81.2% and 94.3%. For diameter%, the area under the ROC curve was 0.786 (Az 3 ), and the sensitivity and specificity to predict pCR were 93.8% and 54.3%. The values of Az 1 and Az 2 were significantly higher than that of Az 3 (P = 0.027 and P = 0.034, respectively). However, there was no significant difference between the values of Az 1 and Az 2 (P = 0.825). Quantitative analysis of tumor blood perfusion with CEUS is superior to diameter% to predict pCR, and can be used as a functional technique to evaluate tumor response to NAC. Copyright © 2018. Published by Elsevier B.V.
Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred
2013-01-01
A new regression model search algorithm was developed in 2011 that may be used to analyze both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The new algorithm is a simplified version of a more complex search algorithm that was originally developed at the NASA Ames Balance Calibration Laboratory. The new algorithm has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression models. Therefore, the simplified search algorithm is not intended to replace the original search algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm either fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new regression model search algorithm.
Li, Zhenghua; Cheng, Fansheng; Xia, Zhining
2011-01-01
The chemical structures of 114 polycyclic aromatic sulfur heterocycles (PASHs) have been studied by molecular electronegativity-distance vector (MEDV). The linear relationships between gas chromatographic retention index and the MEDV have been established by a multiple linear regression (MLR) model. The results of variable selection by stepwise multiple regression (SMR) and the powerful predictive abilities of the optimization model appraised by leave-one-out cross-validation showed that the optimization model with the correlation coefficient (R) of 0.994 7 and the cross-validated correlation coefficient (Rcv) of 0.994 0 possessed the best statistical quality. Furthermore, when the 114 PASHs compounds were divided into calibration and test sets in the ratio of 2:1, the statistical analysis showed our models possesses almost equal statistical quality, the very similar regression coefficients and the good robustness. The quantitative structure-retention relationship (QSRR) model established may provide a convenient and powerful method for predicting the gas chromatographic retention of PASHs.
Prediction of monthly rainfall in Victoria, Australia: Clusterwise linear regression approach
NASA Astrophysics Data System (ADS)
Bagirov, Adil M.; Mahmood, Arshad; Barton, Andrew
2017-05-01
This paper develops the Clusterwise Linear Regression (CLR) technique for prediction of monthly rainfall. The CLR is a combination of clustering and regression techniques. It is formulated as an optimization problem and an incremental algorithm is designed to solve it. The algorithm is applied to predict monthly rainfall in Victoria, Australia using rainfall data with five input meteorological variables over the period of 1889-2014 from eight geographically diverse weather stations. The prediction performance of the CLR method is evaluated by comparing observed and predicted rainfall values using four measures of forecast accuracy. The proposed method is also compared with the CLR using the maximum likelihood framework by the expectation-maximization algorithm, multiple linear regression, artificial neural networks and the support vector machines for regression models using computational results. The results demonstrate that the proposed algorithm outperforms other methods in most locations.
NASA Astrophysics Data System (ADS)
Sarkar, Arnab; Karki, Vijay; Aggarwal, Suresh K.; Maurya, Gulab S.; Kumar, Rohit; Rai, Awadhesh K.; Mao, Xianglei; Russo, Richard E.
2015-06-01
Laser induced breakdown spectroscopy (LIBS) was applied for elemental characterization of high alloy steel using partial least squares regression (PLSR) with an objective to evaluate the analytical performance of this multivariate approach. The optimization of the number of principle components for minimizing error in PLSR algorithm was investigated. The effect of different pre-treatment procedures on the raw spectral data before PLSR analysis was evaluated based on several statistical (standard error of prediction, percentage relative error of prediction etc.) parameters. The pre-treatment with "NORM" parameter gave the optimum statistical results. The analytical performance of PLSR model improved by increasing the number of laser pulses accumulated per spectrum as well as by truncating the spectrum to appropriate wavelength region. It was found that the statistical benefit of truncating the spectrum can also be accomplished by increasing the number of laser pulses per accumulation without spectral truncation. The constituents (Co and Mo) present in hundreds of ppm were determined with relative precision of 4-9% (2σ), whereas the major constituents Cr and Ni (present at a few percent levels) were determined with a relative precision of ~ 2%(2σ).
A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.
2014-01-01
A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separate from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind- off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.
Belilovsky, Eugene; Gkirtzou, Katerina; Misyrlis, Michail; Konova, Anna B; Honorio, Jean; Alia-Klein, Nelly; Goldstein, Rita Z; Samaras, Dimitris; Blaschko, Matthew B
2015-12-01
We explore various sparse regularization techniques for analyzing fMRI data, such as the ℓ1 norm (often called LASSO in the context of a squared loss function), elastic net, and the recently introduced k-support norm. Employing sparsity regularization allows us to handle the curse of dimensionality, a problem commonly found in fMRI analysis. In this work we consider sparse regularization in both the regression and classification settings. We perform experiments on fMRI scans from cocaine-addicted as well as healthy control subjects. We show that in many cases, use of the k-support norm leads to better predictive performance, solution stability, and interpretability as compared to other standard approaches. We additionally analyze the advantages of using the absolute loss function versus the standard squared loss which leads to significantly better predictive performance for the regularization methods tested in almost all cases. Our results support the use of the k-support norm for fMRI analysis and on the clinical side, the generalizability of the I-RISA model of cocaine addiction. Copyright © 2015 Elsevier Ltd. All rights reserved.
Wei, Zhenbo; Wang, Jun; Ye, Linshuang
2011-08-15
A voltammetric electronic tongue (VE-tongue) was developed to discriminate the difference between Chinese rice wines in this research. Three types of Chinese rice wine with different marked ages (1, 3, and 5 years) were classified by the VE-tongue by principal component analysis (PCA) and cluster analysis (CA). The VE-tongue consisted of six working electrodes (gold, silver, platinum, palladium, tungsten, and titanium) in a standard three-electrode configuration. The multi-frequency large amplitude pulse voltammetry (MLAPV), which consisted of four segments of 1 Hz, 10 Hz, 100 Hz, and 1000 Hz, was applied as the potential waveform. The three types of Chinese rice wine could be classified accurately by PCA and CA, and some interesting regularity is shown in the score plots with the help of PCA. Two regression models, partial least squares (PLS) and back-error propagation-artificial neural network (BP-ANN), were used for wine age prediction. The regression results showed that the marked ages of the three types of Chinese rice wine were successfully predicted using PLS and BP-ANN. Copyright © 2011 Elsevier B.V. All rights reserved.
Multiple-Instance Regression with Structured Data
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri L.; Lane, Terran; Roper, Alex
2008-01-01
We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels, each containing a set of unlabeled items, in which the relevance of each item to its bag label is unknown. The goal is to predict the labels of new bags from their contents. Unlike previous MIR methods, MI-ClusterRegress can operate on bags that are structured in that they contain items drawn from a number of distinct (but unknown) distributions. MI-ClusterRegress simultaneously learns a model of the bag's internal structure, the relevance of each item, and a regression model that accurately predicts labels for new bags. We evaluated this approach on the challenging MIR problem of crop yield prediction from remote sensing data. MI-ClusterRegress provided predictions that were more accurate than those obtained with non-multiple-instance approaches or MIR methods that do not model the bag structure.
Real estate value prediction using multivariate regression models
NASA Astrophysics Data System (ADS)
Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav
2017-11-01
The real estate market is one of the most competitive in terms of pricing and the same tends to vary significantly based on a lot of factors, hence it becomes one of the prime fields to apply the concepts of machine learning to optimize and predict the prices with high accuracy. Therefore in this paper, we present various important features to use while predicting housing prices with good accuracy. We have described regression models, using various features to have lower Residual Sum of Squares error. While using features in a regression model some feature engineering is required for better prediction. Often a set of features (multiple regressions) or polynomial regression (applying a various set of powers in the features) is used for making better model fit. For these models are expected to be susceptible towards over fitting ridge regression is used to reduce it. This paper thus directs to the best application of regression models in addition to other techniques to optimize the result.
Sumithran, P; Purcell, K; Kuyruk, S; Proietto, J; Prendergast, L A
2018-02-01
Consistent, strong predictors of obesity treatment outcomes have not been identified. It has been suggested that broadening the range of predictor variables examined may be valuable. We explored methods to predict outcomes of a very-low-energy diet (VLED)-based programme in a clinically comparable setting, using a wide array of pre-intervention biological and psychosocial participant data. A total of 61 women and 39 men (mean ± standard deviation [SD] body mass index: 39.8 ± 7.3 kg/m 2 ) underwent an 8-week VLED and 12-month follow-up. At baseline, participants underwent a blood test and assessment of psychological, social and behavioural factors previously associated with treatment outcomes. Logistic regression, linear discriminant analysis, decision trees and random forests were used to model outcomes from baseline variables. Of the 100 participants, 88 completed the VLED and 42 attended the Week 60 visit. Overall prediction rates for weight loss of ≥10% at weeks 8 and 60, and attrition at Week 60, using combined data were between 77.8 and 87.6% for logistic regression, and lower for other methods. When logistic regression analyses included only baseline demographic and anthropometric variables, prediction rates were 76.2-86.1%. In this population, considering a wide range of biological and psychosocial data did not improve outcome prediction compared to simply-obtained baseline characteristics. © 2017 World Obesity Federation.
GPA, GMAT, and Scale: A Method of Quantification of Admissions Criteria.
ERIC Educational Resources Information Center
Sobol, Marion G.
1984-01-01
Multiple regression analysis was used to establish a scale, measuring college student involvement in campus activities, work experience, technical background, references, and goals. This scale was tested to see whether it improved the prediction of success in graduate school. (Author/MLW)
Linear regression crash prediction models : issues and proposed solutions.
DOT National Transportation Integrated Search
2010-05-01
The paper develops a linear regression model approach that can be applied to : crash data to predict vehicle crashes. The proposed approach involves novice data aggregation : to satisfy linear regression assumptions; namely error structure normality ...
Gerst, Kerstin; Al-Ghatrif, Majd; Beard, Holly A; Samper-Ternent, Rafael; Markides, Kyriakos S
2010-04-01
This analysis explores nativity differences in depressive symptoms among very old (75+) community-dwelling Mexican Americans. Cross-sectional analysis using the fifth wave (2004-2005) of the Hispanic Established Population for the Epidemiological Study of the Elderly (Hispanic EPESE). The sample consisted of 1699 non-institutionalized Mexican American men and women aged 75 years and above. Depressive symptoms were measured by the Center for Epidemiological Studies Depression Scale (CES-D). Logistic regression was used to predict high depressive symptoms (CES-D score 16 or higher) and multinomial logistic regression was used to predict sub-threshold, moderate, and high depressive symptoms. Results showed that elders born in Mexico had higher odds of more depressive symptoms compared to otherwise similar Mexican Americans born in the US. Age of arrival, gender, and other covariates did not modify that risk. The findings suggest that older Mexican American immigrants are at higher risk of depressive symptomatology compared to persons born in the US, which has significant implications for research, policy, and clinical practice.
Hypomagnesemia predicts postoperative biochemical hypocalcemia after thyroidectomy.
Luo, Han; Yang, Hongliu; Zhao, Wanjun; Wei, Tao; Su, Anping; Wang, Bin; Zhu, Jingqiang
2017-05-25
To investigate the role of magnesium in biochemical and symptomatic hypocalcemia, a retrospective study was conducted. Less-than-total thyroidectomy patients were excluded from the final analysis. Identified the risk factors of biochemical and symptomatic hypocalcemia, and investigated the correlation by logistic regression and correlation test respectively. A total of 304 patients were included in the final analysis. General incidence of hypomagnesemia was 23.36%. Logistic regression showed that gender (female) (OR = 2.238, p = 0.015) and postoperative hypomagnesemia (OR = 2.010, p = 0.017) were independent risk factors for biochemical hypocalcemia. Both Pearson and partial correlation tests indicated there was indeed significant relation between calcium and magnesium. However, relative decreasing of iPTH (>70%) (6.691, p < 0.001) and hypocalcemia (2.222, p = 0.046) were identified as risk factors of symptomatic hypocalcemia. The difference remained significant even in normoparathyroidism patients. Postoperative hypomagnesemia was independent risk factor of biochemical hypocalcemia. Relative decline of iPTH was predominating in predicting symptomatic hypocalcemia.
Tewary, Sweta; Farber, Naomi
2014-01-01
Individuals with rheumatoid arthritis (RA) struggle to maintain improved functional ability and reduced pain levels. Health education emphasizing self-efficacy helps individuals to adjust with the disease outcome and progression. As a basis to develop comprehensive evidence-based patient education programs, the aim of the study was to examine the role of marriage as a predictor of pain and functional self-efficacy among individuals with RA. Review of the regression analysis did not provide support for the relationships between marital quality and self-efficacy. Relationships were not observed between marital quality, length of marriage, and self-efficacy as predicted by the first hypothesis. Additional regression analysis examination found that marital quality, length of marriage, pain, and health assessment together reported significant variance in self-efficacy. However, only health assessment significantly predicted self-efficacy. Other nonexamined variables could have influenced the independent marital quality effects. Future longitudinal studies with larger sample sizes can further validate the current findings.
Quantitative analysis of microbial contamination in private drinking water supply systems.
Allevi, Richard P; Krometis, Leigh-Anne H; Hagedorn, Charles; Benham, Brian; Lawrence, Annie H; Ling, Erin J; Ziegler, Peter E
2013-06-01
Over one million households rely on private water supplies (e.g. well, spring, cistern) in the Commonwealth of Virginia, USA. The present study tested 538 private wells and springs in 20 Virginia counties for total coliforms (TCs) and Escherichia coli along with a suite of chemical contaminants. A logistic regression analysis was used to investigate potential correlations between TC contamination and chemical parameters (e.g. NO3(-), turbidity), as well as homeowner-provided survey data describing system characteristics and perceived water quality. Of the 538 samples collected, 41% (n = 221) were positive for TCs and 10% (n = 53) for E. coli. Chemical parameters were not statistically predictive of microbial contamination. Well depth, water treatment, and farm location proximate to the water supply were factors in a regression model that predicted presence/absence of TCs with 74% accuracy. Microbial and chemical source tracking techniques (Bacteroides gene Bac32F and HF183 detection via polymerase chain reaction and optical brightener detection via fluorometry) identified four samples as likely contaminated with human wastewater.
Ataque de nervios: relationship to anxiety sensitivity and dissociation predisposition.
Hinton, Devon E; Chong, Roberto; Pollack, Mark H; Barlow, David H; McNally, Richard J
2008-01-01
We investigated the relative importance of "fear of arousal symptoms" (i.e., anxiety sensitivity) and "dissociation tendency" in generating ataque de nervios. Puerto Rican patients attending an outpatient psychiatric clinic were assessed for ataque de nervios frequency in the previous month, and they completed the Anxiety Sensitivity Index (ASI) and the Dissociation Experiences Scale (DES). ASI scores were especially high in the ataque-positive group (M=41.6, SD=12.8) as compared with the ataque-negative group (M=27.2, SD=11.7), t(2, 68)=4.6, P<.001. Among the whole sample (N=70), in a logistic regression analysis, the ASI significantly predicted (odds ratio=2.6) the presence of ataque de nervios, but the DES did not. In a linear regression analysis, ataque severity was significantly predicted by both the ASI (beta=.46) and the DES (beta=.29). The theoretical and clinical implications of the strong relationship of the ASI to ataque severity are discussed.
Big Data Toolsets to Pharmacometrics: Application of Machine Learning for Time‐to‐Event Analysis
Gong, Xiajing; Hu, Meng
2018-01-01
Abstract Additional value can be potentially created by applying big data tools to address pharmacometric problems. The performances of machine learning (ML) methods and the Cox regression model were evaluated based on simulated time‐to‐event data synthesized under various preset scenarios, i.e., with linear vs. nonlinear and dependent vs. independent predictors in the proportional hazard function, or with high‐dimensional data featured by a large number of predictor variables. Our results showed that ML‐based methods outperformed the Cox model in prediction performance as assessed by concordance index and in identifying the preset influential variables for high‐dimensional data. The prediction performances of ML‐based methods are also less sensitive to data size and censoring rates than the Cox regression model. In conclusion, ML‐based methods provide a powerful tool for time‐to‐event analysis, with a built‐in capacity for high‐dimensional data and better performance when the predictor variables assume nonlinear relationships in the hazard function. PMID:29536640
Genotype-phenotype association study via new multi-task learning model
Huo, Zhouyuan; Shen, Dinggang
2018-01-01
Research on the associations between genetic variations and imaging phenotypes is developing with the advance in high-throughput genotype and brain image techniques. Regression analysis of single nucleotide polymorphisms (SNPs) and imaging measures as quantitative traits (QTs) has been proposed to identify the quantitative trait loci (QTL) via multi-task learning models. Recent studies consider the interlinked structures within SNPs and imaging QTs through group lasso, e.g. ℓ2,1-norm, leading to better predictive results and insights of SNPs. However, group sparsity is not enough for representing the correlation between multiple tasks and ℓ2,1-norm regularization is not robust either. In this paper, we propose a new multi-task learning model to analyze the associations between SNPs and QTs. We suppose that low-rank structure is also beneficial to uncover the correlation between genetic variations and imaging phenotypes. Finally, we conduct regression analysis of SNPs and QTs. Experimental results show that our model is more accurate in prediction than compared methods and presents new insights of SNPs. PMID:29218896
Characterizing multivariate decoding models based on correlated EEG spectral features
McFarland, Dennis J.
2013-01-01
Objective Multivariate decoding methods are popular techniques for analysis of neurophysiological data. The present study explored potential interpretative problems with these techniques when predictors are correlated. Methods Data from sensorimotor rhythm-based cursor control experiments was analyzed offline with linear univariate and multivariate models. Features were derived from autoregressive (AR) spectral analysis of varying model order which produced predictors that varied in their degree of correlation (i.e., multicollinearity). Results The use of multivariate regression models resulted in much better prediction of target position as compared to univariate regression models. However, with lower order AR features interpretation of the spectral patterns of the weights was difficult. This is likely to be due to the high degree of multicollinearity present with lower order AR features. Conclusions Care should be exercised when interpreting the pattern of weights of multivariate models with correlated predictors. Comparison with univariate statistics is advisable. Significance While multivariate decoding algorithms are very useful for prediction their utility for interpretation may be limited when predictors are correlated. PMID:23466267
Desire thinking: A risk factor for binge eating?
Spada, Marcantonio M; Caselli, Gabriele; Fernie, Bruce A; Manfredi, Chiara; Boccaletti, Fabio; Dallari, Giulia; Gandini, Federica; Pinna, Eleonora; Ruggiero, Giovanni M; Sassaroli, Sandra
2015-08-01
In the current study we explored the role of desire thinking in predicting binge eating independently of Body Mass Index, negative affect and irrational food beliefs. A sample of binge eaters (n=77) and a sample of non-binge eaters (n=185) completed the following self-report instruments: Hospital Anxiety and Depression Scale, Irrational Food Beliefs Scale, Desire Thinking Questionnaire, and Binge Eating Scale. Mann-Whitney U tests revealed that all variable scores were significantly higher for binge eaters than non-binge eaters. A logistic regression analysis indicated that verbal perseveration was a predictor of classification as a binge eater over and above Body Mass Index, negative affect and irrational food beliefs. A hierarchical regression analysis, on the combined sample, indicated that verbal perseveration predicted levels of binge eating independently of Body Mass Index, negative affect and irrational food beliefs. These results highlight the possible role of desire thinking as a risk factor for binge eating. Copyright © 2015 Elsevier Ltd. All rights reserved.
Comparative analysis of used car price evaluation models
NASA Astrophysics Data System (ADS)
Chen, Chuancan; Hao, Lulu; Xu, Cong
2017-05-01
An accurate used car price evaluation is a catalyst for the healthy development of used car market. Data mining has been applied to predict used car price in several articles. However, little is studied on the comparison of using different algorithms in used car price estimation. This paper collects more than 100,000 used car dealing records throughout China to do empirical analysis on a thorough comparison of two algorithms: linear regression and random forest. These two algorithms are used to predict used car price in three different models: model for a certain car make, model for a certain car series and universal model. Results show that random forest has a stable but not ideal effect in price evaluation model for a certain car make, but it shows great advantage in the universal model compared with linear regression. This indicates that random forest is an optimal algorithm when handling complex models with a large number of variables and samples, yet it shows no obvious advantage when coping with simple models with less variables.
Developing a predictive tropospheric ozone model for Tabriz
NASA Astrophysics Data System (ADS)
Khatibi, Rahman; Naghipour, Leila; Ghorbani, Mohammad A.; Smith, Michael S.; Karimi, Vahid; Farhoudi, Reza; Delafrouz, Hadi; Arvanaghi, Hadi
2013-04-01
Predictive ozone models are becoming indispensable tools by providing a capability for pollution alerts to serve people who are vulnerable to the risks. We have developed a tropospheric ozone prediction capability for Tabriz, Iran, by using the following five modeling strategies: three regression-type methods: Multiple Linear Regression (MLR), Artificial Neural Networks (ANNs), and Gene Expression Programming (GEP); and two auto-regression-type models: Nonlinear Local Prediction (NLP) to implement chaos theory and Auto-Regressive Integrated Moving Average (ARIMA) models. The regression-type modeling strategies explain the data in terms of: temperature, solar radiation, dew point temperature, and wind speed, by regressing present ozone values to their past values. The ozone time series are available at various time intervals, including hourly intervals, from August 2010 to March 2011. The results for MLR, ANN and GEP models are not overly good but those produced by NLP and ARIMA are promising for the establishing a forecasting capability.
Asselbergs, Joost; Ruwaard, Jeroen; Ejdys, Michal; Schrader, Niels; Sijbrandij, Marit; Riper, Heleen
2016-03-29
Ecological momentary assessment (EMA) is a useful method to tap the dynamics of psychological and behavioral phenomena in real-world contexts. However, the response burden of (self-report) EMA limits its clinical utility. The aim was to explore mobile phone-based unobtrusive EMA, in which mobile phone usage logs are considered as proxy measures of clinically relevant user states and contexts. This was an uncontrolled explorative pilot study. Our study consisted of 6 weeks of EMA/unobtrusive EMA data collection in a Dutch student population (N=33), followed by a regression modeling analysis. Participants self-monitored their mood on their mobile phone (EMA) with a one-dimensional mood measure (1 to 10) and a two-dimensional circumplex measure (arousal/valence, -2 to 2). Meanwhile, with participants' consent, a mobile phone app unobtrusively collected (meta) data from six smartphone sensor logs (unobtrusive EMA: calls/short message service (SMS) text messages, screen time, application usage, accelerometer, and phone camera events). Through forward stepwise regression (FSR), we built personalized regression models from the unobtrusive EMA variables to predict day-to-day variation in EMA mood ratings. The predictive performance of these models (ie, cross-validated mean squared error and percentage of correct predictions) was compared to naive benchmark regression models (the mean model and a lag-2 history model). A total of 27 participants (81%) provided a mean 35.5 days (SD 3.8) of valid EMA/unobtrusive EMA data. The FSR models accurately predicted 55% to 76% of EMA mood scores. However, the predictive performance of these models was significantly inferior to that of naive benchmark models. Mobile phone-based unobtrusive EMA is a technically feasible and potentially powerful EMA variant. The method is young and positive findings may not replicate. At present, we do not recommend the application of FSR-based mood prediction in real-world clinical settings. Further psychometric studies and more advanced data mining techniques are needed to unlock unobtrusive EMA's true potential.
Brady, Amie M.G.; Plona, Meg B.
2012-01-01
The Cuyahoga River within Cuyahoga Valley National Park (CVNP) is at times impaired for recreational use due to elevated concentrations of Escherichia coli (E. coli), a fecal-indicator bacterium. During the recreational seasons of mid-May through September during 2009–11, samples were collected 4 days per week and analyzed for E. coli concentrations at two sites within CVNP. Other water-quality and environ-mental data, including turbidity, rainfall, and streamflow, were measured and (or) tabulated for analysis. Regression models developed to predict recreational water quality in the river were implemented during the recreational seasons of 2009–11 for one site within CVNP–Jaite. For the 2009 and 2010 seasons, the regression models were better at predicting exceedances of Ohio's single-sample standard for primary-contact recreation compared to the traditional method of using the previous day's E. coli concentration. During 2009, the regression model was based on data collected during 2005 through 2008, excluding available 2004 data. The resulting model for 2009 did not perform as well as expected (based on the calibration data set) and tended to overestimate concentrations (correct responses at 69 percent). During 2010, the regression model was based on data collected during 2004 through 2009, including all of the available data. The 2010 model performed well, correctly predicting 89 percent of the samples above or below the single-sample standard, even though the predictions tended to be lower than actual sample concentrations. During 2011, the regression model was based on data collected during 2004 through 2010 and tended to overestimate concentrations. The 2011 model did not perform as well as the traditional method or as expected, based on the calibration dataset (correct responses at 56 percent). At a second site—Lock 29, approximately 5 river miles upstream from Jaite, a regression model based on data collected at the site during the recreational seasons of 2008–10 also did not perform as well as the traditional method or as well as expected (correct responses at 60 percent). Above normal precipitation in the region and a delayed start to the 2011 sampling season (sampling began mid-June) may have affected how well the 2011 models performed. With these new data, however, updated regression models may be better able to predict recreational water quality conditions due to the increased amount of diverse water quality conditions included in the calibration data. Daily recreational water-quality predictions for Jaite were made available on the Ohio Nowcast Web site at www.ohionowcast.info. Other public outreach included signage at trailheads in the park, articles in the park's quarterly-published schedule of events and volunteer newsletters. A U.S. Geological Survey Fact Sheet was also published to bring attention to water-quality issues in the park.
Zhao, Lei; Li, Weizheng; Su, Zhihong; Liu, Yong; Zhu, Liyong; Zhu, Shaihong
2018-05-29
This study investigated the role of preoperative fasting C-peptide (FCP) levels in predicting diabetic outcomes in low-BMI Chinese patients following Roux-en-Y gastric bypass (RYGB) by comparing the metabolic outcomes of patients with FCP > 1 ng/ml versus FCP ≤ 1 ng/ml. The study sample included 78 type 2 diabetes mellitus patients with an average BMI < 30 kg/m 2 at baseline. Patients' parameters were analyzed before and after surgery, with a 2-year follow-up. A univariate logistic regression analysis and multivariate analysis of variance between the remission and improvement group were performed to determine factors that were associated with type 2 diabetes remission after RYGB. Linear correlation analyses between FCP and metabolic parameters were performed. Patients were divided into two groups: FCP > 1 ng/ml and FCP ≤ 1 ng/ml, with measured parameters compared between the groups. Patients' fasting plasma glucose, 2-h postprandial plasma glucose, FCP, and HbA1c improved significantly after surgery (p < 0.05). Factors associated with type 2 diabetes remission were BMI, 2hINS, and FCP at the univariate logistic regression analysis (p < 0.05). Multivariate logistic regression analysis was performed then showed the results were more related to FCP (OR = 2.39). FCP showed a significant linear correlation with fasting insulin and BMI (p < 0.05). There was a significant difference in remission rate between the FCP > 1 ng/ml and FCP ≤ 1 ng/ml groups (p = 0.01). The parameters of patients with FCP > 1 ng/ml, including BMI, plasma glucose, HbA1c, and plasma insulin, decreased markedly after surgery (p < 0.05). FCP level is a significant predictor of diabetes outcomes after RYGB in low-BMI Chinese patients. An FCP level of 1 ng/ml may be a useful threshold for predicting surgical prognosis, with FCP > 1 ng/ml predicting better clinical outcomes following RYGB.
Wang, Huan; Lei, Leix; Zhang, Han-Qing; Gu, Zheng-Tian; Xing, Fang-Lan; Yan, Fu-Ling
2018-01-01
The triglyceride (TG)-to-high-density lipoprotein cholesterol (HDL-C) ratio (TG/HDL-C) is a simple approach to predicting unfavorable outcomes in cardiovascular disease. The influence of TG/HDL-C on acute ischemic stroke remains elusive. The purpose of this study was to investigate the precise effect of TG/HDL-C on 3-month mortality after acute ischemic stroke (AIS). Patients with AIS were enrolled in the present study from 2011 to 2017. A total of 1459 participants from a single city in China were divided into retrospective training and prospective test cohorts. Medical records were collected periodically to determine the incidence of fatal events. All participants were followed for 3 months. Optimal cutoff values were determined using X-tile software to separate the training cohort patients into higher and lower survival groups based on their lipid levels. A survival analysis was conducted using Kaplan-Meier curves and a Cox proportional hazards regression model. A total of 1459 patients with AIS (median age 68.5 years, 58.5% male) were analyzed. Univariate Cox regression analysis confirmed that TG/HDL-C was a significant prognostic factor for 3-month survival. X-tile identified 0.9 as an optimal cutoff for TG/HDL-C. In the univariate analysis, the prognosis of the TG/HDL-C >0.9 group was markedly superior to that of TG/HDL-C ≤0.9 group (P<0.001). A multivariate Cox regression analysis showed that TG/HDL-C was independently correlated with a reduced risk of mortality (hazard ratio [HR], 0.39; 95% confidence interval [CI], 0.24-0.62; P<0.001). These results were confirmed in the 453 patients in the test cohort. A nomogram was constructed to predict 3-month case-fatality, and the c-indexes of predictive accuracy were 0.684 and 0.670 in the training and test cohorts, respectively (P<0.01). The serum TG/HDL-C ratio may be useful for predicting short-term mortality after AIS. PMID:29896437