Sample records for final regression model

  1. An empirical study using permutation-based resampling in meta-regression

    PubMed Central

    2012-01-01

    Background: In meta-regression, as the number of trials in the analyses decreases, the risk of false positives or false negatives increases. This is partly due to the assumption of normality that may not hold in small samples. Creation of a distribution from the observed trials using permutation methods to calculate P values may allow for less spurious findings. Permutation has not been empirically tested in meta-regression. The objective of this study was to perform an empirical investigation to explore the differences in results for meta-analyses on a small number of trials using standard large sample approaches versus permutation-based methods for meta-regression. Methods: We isolated a sample of randomized controlled clinical trials (RCTs) for interventions that have a small number of trials (herbal medicine trials). Trials were then grouped by herbal species and condition and assessed for methodological quality using the Jadad scale, and data were extracted for each outcome. Finally, we performed meta-analyses on the primary outcome of each group of trials and meta-regression for methodological quality subgroups within each meta-analysis. We used large sample methods and permutation methods in our meta-regression modeling. We then compared final models and final P values between methods. Results: We collected 110 trials across 5 intervention/outcome pairings and 5 to 10 trials per covariate. When applying large sample methods and permutation-based methods in our backwards stepwise regression, the covariates in the final models were identical in all cases. The P values for the covariates in the final model were larger in 78% (7/9) of the cases for permutation and identical for 22% (2/9) of the cases. Conclusions: We present empirical evidence that permutation-based resampling may not change final models when using backwards stepwise regression, but may increase P values in meta-regression of multiple covariates for a relatively small number of trials. PMID:22587815
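
    An illustrative sketch of the permutation idea described above (not the authors' code; the trial effects, variances, and Jadad-style covariate below are invented): the covariate labels are shuffled and the inverse-variance-weighted slope is refit many times to build a null distribution for its P value.

      import numpy as np

      rng = np.random.default_rng(0)
      effect = np.array([0.20, 0.35, 0.10, 0.50, 0.05, 0.40, 0.25])  # trial effect sizes (toy)
      var = np.array([0.04, 0.09, 0.03, 0.10, 0.02, 0.08, 0.05])     # within-trial variances (toy)
      quality = np.array([3, 1, 4, 1, 5, 2, 3], dtype=float)         # Jadad-style covariate (toy)
      w = 1.0 / var                                                  # inverse-variance weights

      def wls_slope(x, y, w):
          # weighted least-squares slope of y on x
          X = np.column_stack([np.ones_like(x), x])
          XtW = X.T * w
          return np.linalg.solve(XtW @ X, XtW @ y)[1]

      obs = wls_slope(quality, effect, w)
      null = np.array([wls_slope(rng.permutation(quality), effect, w)
                       for _ in range(5000)])                        # permutation null distribution
      p_perm = np.mean(np.abs(null) >= abs(obs))                     # two-sided permutation P value
      print(f"slope = {obs:.3f}, permutation P = {p_perm:.3f}")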

  2. Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert Manfred; Volden, Thomas R.

    2010-01-01

    The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance are used to illustrate the application of some of the metrics and tests to a realistic calibration data set.

  3. Spatial Double Generalized Beta Regression Models: Extensions and Application to Study Quality of Education in Colombia

    ERIC Educational Resources Information Center

    Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente

    2013-01-01

    In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we motivate…

  4. Army College Fund Cost-Effectiveness Study

    DTIC Science & Technology

    1990-11-01

    Section A.2 presents a theory of enlistment supply to provide a basis for specifying the regression model. The model is specified in Section A.3, which... Supplementary materials are included in the final four sections. Section A.6 provides annual trends in the regression model variables. Estimates of the model... millions. A.5. ESTIMATION OF A YOUTH EARNINGS FORECASTING MODEL. Civilian pay is an important explanatory variable in the regression model. Previous...

  5. External Tank Liquid Hydrogen (LH2) Prepress Regression Analysis Independent Review Technical Consultation Report

    NASA Technical Reports Server (NTRS)

    Parsons, Vickie S.

    2009-01-01

    The request to conduct an independent review of regression models developed for determining the expected Launch Commit Criteria (LCC) External Tank (ET)-04 cycle count for the Space Shuttle ET tanking process was submitted to the NASA Engineering and Safety Center (NESC) on September 20, 2005. The NESC team performed an independent review of regression models documented in Prepress Regression Analysis, Tom Clark and Angela Krenn, 10/27/05. This consultation consisted of a peer review by statistical experts of the proposed regression models provided in the Prepress Regression Analysis. This document is the consultation's final report.

  6. 77 FR 3121 - Program Integrity: Gainful Employment-Debt Measures; Correction

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-01-23

    ...On June 13, 2011, the Secretary of Education (Secretary) published a notice of final regulations in the Federal Register for Program Integrity: Gainful Employment--Debt Measures (Gainful Employment--Debt Measures) (76 FR 34386). In the preamble of the final regulations, we used the wrong data to calculate the percent of total variance in institutions' repayment rates that may be explained by race/ethnicity. Our intent was to use the data that included all minority students per institution. However, we mistakenly used the data for a subset of minority students per institution. We have now recalculated the total variance using the data that includes all minority students. Through this document, we correct, in the preamble of the Gainful Employment--Debt Measures final regulations, the errors resulting from this misapplication. We do not change the regression analysis model itself; we are using the same model with the appropriate data. Through this notice we also correct, in the preamble of the Gainful Employment--Debt Measures final regulations, our description of one component of the regression analysis. The preamble referred to use of an institutional variable measuring acceptance rates. This description was incorrect; in fact we used an institutional variable measuring retention rates. Correcting this language does not change the regression analysis model itself or the variance explained by the model. The text of the final regulations remains unchanged.

  7. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

    PubMed

    Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

    2016-04-01

    Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1 µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R² = 0.58 vs. 0.55) or a cross-validation procedure (R² = 0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.
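
    A minimal sketch of the two modeling approaches compared above, on synthetic data (scikit-learn's RBF kernel ridge stands in for KRLS, a closely related kernel-regularized least-squares method; the predictors and their effects are invented, so the printed numbers are not the paper's):

      import numpy as np
      from sklearn.kernel_ridge import KernelRidge
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(1)
      # hypothetical predictors: population density, temperature, wind speed, road length
      X = rng.normal(size=(414, 4))
      y = 20 + 3 * X[:, 0] - 2 * np.tanh(X[:, 1]) + X[:, 2] * X[:, 3] + rng.normal(0, 1, 414)

      lin = LinearRegression()
      krls = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1)  # RBF kernel ridge as a KRLS stand-in

      for name, model in [("linear", lin), ("kernel ridge", krls)]:
          r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
          print(f"{name}: cross-validated R^2 = {r2:.2f}")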

  8. Developing global regression models for metabolite concentration prediction regardless of cell line.

    PubMed

    André, Silvère; Lagresle, Sylvain; Da Sliva, Anthony; Heimendinger, Pierre; Hannas, Zahia; Calvosa, Éric; Duponchel, Ludovic

    2017-11-01

    Following the Process Analytical Technology (PAT) initiative of the Food and Drug Administration (FDA), drug manufacturers are encouraged to develop innovative techniques in order to monitor and understand their processes in a better way. Within this framework, it has been demonstrated that Raman spectroscopy coupled with chemometric tools allows critical parameters of mammalian cell cultures to be predicted in-line and in real time. However, the development of robust and predictive regression models clearly requires many batches in order to take into account inter-batch variability and enhance model accuracy. Nevertheless, this heavy procedure has to be repeated for every new cell culture line, consuming many resources. This is why we propose in this paper to develop global regression models taking into account different cell lines. Such models are finally transferred to any culture of the cells involved. This article first demonstrates the feasibility of developing regression models, not only for mammalian cell lines (CHO and HeLa cell cultures), but also for insect cell lines (Sf9 cell cultures). Then global regression models are generated, based on CHO cells, HeLa cells, and Sf9 cells. Finally, these models are evaluated considering a fourth cell line (HEK cells). In addition to suitable predictions of glucose and lactate concentration in HEK cell cultures, we show that by adding a single HEK-cell culture to the calibration set, the predictive ability of the regression models is substantially increased. In this way, we demonstrate that using global models, it is not necessary to consider many cultures of a new cell line in order to obtain accurate models. Biotechnol. Bioeng. 2017;114: 2550-2559. © 2017 Wiley Periodicals, Inc.

  9. Model selection for logistic regression models

    NASA Astrophysics Data System (ADS)

    Duller, Christine

    2012-09-01

    Model selection for logistic regression models decides which of some given potential regressors have an effect and hence should be included in the final model. A second interesting question is whether a certain factor is heterogeneous among some subsets, i.e. whether the model should include a random intercept or not. In this paper these questions will be answered with classical as well as with Bayesian methods. The applications show some results of recent research projects in medicine and business administration.
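
    A toy sketch of one classical answer to the regressor-selection question posed above: exhaustive subset search by AIC for a logistic regression, using statsmodels on invented data (the paper's random-intercept and Bayesian comparisons are not reproduced here).

      from itertools import combinations
      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(2)
      n = 300
      X = rng.normal(size=(n, 3))                       # three candidate regressors
      logit_p = 0.8 * X[:, 0] - 1.1 * X[:, 2]           # only x0 and x2 truly matter
      y = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

      best = None
      for k in range(4):                                # all subsets of the regressors
          for subset in combinations(range(3), k):
              Xs = sm.add_constant(X[:, subset]) if subset else np.ones((n, 1))
              aic = sm.Logit(y, Xs).fit(disp=0).aic
              if best is None or aic < best[0]:
                  best = (aic, subset)
      print(f"best subset by AIC: {best[1]}, AIC = {best[0]:.1f}")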

  10. Convergent Time-Varying Regression Models for Data Streams: Tracking Concept Drift by the Recursive Parzen-Based Generalized Regression Neural Networks.

    PubMed

    Duda, Piotr; Jaworski, Maciej; Rutkowski, Leszek

    2018-03-01

    One of the greatest challenges in data mining is related to the processing and analysis of massive data streams. Contrary to traditional static data mining problems, data streams require that each element is processed only once, the amount of allocated memory is constant, and the models incorporate changes of the investigated streams. A vast majority of available methods have been developed for data stream classification and only a few of them attempt to solve regression problems, using various heuristic approaches. In this paper, we develop mathematically justified regression models working in a time-varying environment. More specifically, we study incremental versions of generalized regression neural networks, called IGRNNs, and we prove their tracking properties - weak (in probability) and strong (with probability one) convergence - assuming various concept drift scenarios. First, we present the IGRNNs, based on the Parzen kernels, for modeling stationary systems under nonstationary noise. Next, we extend our approach to modeling time-varying systems under nonstationary noise. We present several types of concept drift to be handled by our approach in such a way that weak and strong convergence holds under certain conditions. In a series of simulations, we compare our method with commonly used heuristic approaches, based on a forgetting mechanism or sliding windows, to deal with concept drift. Finally, we apply our concept in a real-life scenario, solving the problem of currency exchange rate prediction.
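
    A minimal sketch of the Parzen-kernel (Nadaraya-Watson) regression underlying the IGRNNs described above, on an invented stream with one abrupt drift. The convergence-guaranteeing weight sequences and drift-tracking mechanisms of the paper are omitted, so the sketch deliberately shows how an unmodified estimator is biased by pre-drift data.

      import numpy as np

      class GRNNSketch:
          """One-pass Parzen-kernel regression: each stream element is stored once;
          predictions are Gaussian-kernel-weighted averages of past targets."""
          def __init__(self, bandwidth=0.3):
              self.h = bandwidth
              self.xs, self.ys = [], []
          def update(self, x, y):                        # process each element only once
              self.xs.append(x); self.ys.append(y)
          def predict(self, x):
              xs, ys = np.array(self.xs), np.array(self.ys)
              w = np.exp(-0.5 * ((xs - x) / self.h) ** 2)
              return np.sum(w * ys) / np.sum(w)

      rng = np.random.default_rng(15)
      model = GRNNSketch()
      for t in range(2000):                              # stream with an abrupt concept drift
          x = rng.uniform(0, 1)
          f = np.sin(2 * np.pi * x) if t < 1000 else np.cos(2 * np.pi * x)
          model.update(x, f + rng.normal(0, 0.1))
      # without forgetting, pre-drift data bias the estimate (about 0.5 instead of about 0.0)
      print(round(model.predict(0.25), 2))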

  11. Evaluation of Regression Models of Balance Calibration Data Using an Empirical Criterion

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert; Volden, Thomas R.

    2012-01-01

    An empirical criterion for assessing the significance of individual terms of regression models of wind tunnel strain-gage balance outputs is evaluated. The criterion is based on the percent contribution of a regression model term. It considers a term to be significant if its percent contribution exceeds the empirical threshold of 0.05%. The criterion has the advantage that it can easily be computed using the regression coefficients of the gage outputs and the load capacities of the balance. First, a definition of the empirical criterion is provided. Then, it is compared with an alternate statistical criterion that is widely used in regression analysis. Finally, calibration data sets from a variety of balances are used to illustrate the connection between the empirical and the statistical criterion. A review of these results indicated that the empirical criterion is suitable only for a crude assessment of the significance of a regression model term, because the boundary between a significant and an insignificant term cannot be defined precisely. Therefore, regression model term reduction should only be performed using the more universally applicable statistical criterion.
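
    A toy sketch of the percent-contribution idea described above, with invented coefficients and load capacities: each term is evaluated at the load capacities, expressed as a percentage of a full-scale output, and compared against the 0.05% threshold. Normalizing by the sum of absolute term contributions is a simplifying assumption of this sketch, not the paper's exact definition.

      # hypothetical model of one gage output: r = b0 + sum_i b_i * f_i(loads)
      coeffs = {"N": 1.2e-3, "S": 4.0e-6, "N*S": 8.0e-9, "N^2": 3.0e-10}   # made-up values
      capacity = {"N": 2500.0, "S": 1200.0}                                # load capacities (toy)

      term_value = {
          "N": capacity["N"],
          "S": capacity["S"],
          "N*S": capacity["N"] * capacity["S"],
          "N^2": capacity["N"] ** 2,
      }
      # full-scale output approximated here as the sum of absolute term contributions
      full_scale = sum(abs(coeffs[t] * term_value[t]) for t in coeffs)

      for t in coeffs:
          pct = 100 * abs(coeffs[t] * term_value[t]) / full_scale
          flag = "keep" if pct > 0.05 else "drop"        # empirical 0.05 % threshold
          print(f"{t:>4}: {pct:8.4f} %  -> {flag}")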

  12. Easy and low-cost identification of metabolic syndrome in patients treated with second-generation antipsychotics: artificial neural network and logistic regression models.

    PubMed

    Lin, Chao-Cheng; Bai, Ya-Mei; Chen, Jen-Yeu; Hwang, Tzung-Jeng; Chen, Tzu-Ting; Chiu, Hung-Wen; Li, Yu-Chuan

    2010-03-01

    Metabolic syndrome (MetS) is an important side effect of second-generation antipsychotics (SGAs). However, many SGA-treated patients with MetS remain undetected. In this study, we trained and validated artificial neural network (ANN) and multiple logistic regression models without biochemical parameters to rapidly identify MetS in patients with SGA treatment. A total of 383 patients with a diagnosis of schizophrenia or schizoaffective disorder (DSM-IV criteria) with SGA treatment for more than 6 months were investigated to determine whether they met the MetS criteria according to the International Diabetes Federation. The data for these patients were collected between March 2005 and September 2005. The input variables of ANN and logistic regression were limited to demographic and anthropometric data only. All models were trained by randomly selecting two-thirds of the patient data and were internally validated with the remaining one-third of the data. The models were then externally validated with data from 69 patients from another hospital, collected between March 2008 and June 2008. The area under the receiver operating characteristic curve (AUC) was used to measure the performance of all models. Both the final ANN and logistic regression models had high accuracy (88.3% vs 83.6%), sensitivity (93.1% vs 86.2%), and specificity (86.9% vs 83.8%) to identify MetS in the internal validation set. The mean ± SD AUC was high for both the ANN and logistic regression models (0.934 ± 0.033 vs 0.922 ± 0.035, P = .63). During external validation, high AUC was still obtained for both models. Waist circumference and diastolic blood pressure were the common variables that were left in the final ANN and logistic regression models. Our study developed accurate ANN and logistic regression models to detect MetS in patients with SGA treatment. The models are likely to provide a noninvasive tool for large-scale screening of MetS in this group of patients. © 2010 Physicians Postgraduate Press, Inc.
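
    A minimal sketch of the model comparison described above, with synthetic stand-ins for the anthropometric inputs (waist circumference, diastolic blood pressure) and MetS labels; scikit-learn's MLPClassifier plays the role of the ANN, and the printed AUCs are illustrative only.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import train_test_split
      from sklearn.neural_network import MLPClassifier

      rng = np.random.default_rng(3)
      n = 383
      waist = rng.normal(95, 12, n)                     # hypothetical waist circumference (cm)
      dbp = rng.normal(82, 10, n)                       # hypothetical diastolic BP (mmHg)
      X = np.column_stack([waist, dbp])
      p = 1 / (1 + np.exp(-(0.08 * (waist - 95) + 0.05 * (dbp - 82) - 0.5)))
      y = rng.binomial(1, p)                            # synthetic MetS labels

      # two-thirds training, one-third internal validation, as in the study design
      Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=1 / 3, random_state=0)
      for name, model in [("logistic", LogisticRegression()),
                          ("ANN", MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                                                random_state=0))]:
          model.fit(Xtr, ytr)
          auc = roc_auc_score(yte, model.predict_proba(Xte)[:, 1])
          print(f"{name}: validation AUC = {auc:.3f}")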

  13. Science of Test Research Consortium: Year Two Final Report

    DTIC Science & Technology

    2012-10-02

    July 2012. Analysis of an Intervention for Small Unmanned Aerial System (SUAS) Accidents, submitted to Quality Engineering, LQEN-2012-0056. Stone... Systems Engineering. Wolf, S. E., R. R. Hill, and J. J. Pignatiello. June 2012. Using Neural Networks and Logistic Regression to Model Small Unmanned... Human Retina. 6. Wolf, S. E. March 2012. Modeling Small Unmanned Aerial System Mishaps using Logistic Regression and Artificial Neural Networks. 7

  14. Regression Models for Identifying Noise Sources in Magnetic Resonance Images

    PubMed Central

    Zhu, Hongtu; Li, Yimei; Ibrahim, Joseph G.; Shi, Xiaoyan; An, Hongyu; Chen, Yashen; Gao, Wei; Lin, Weili; Rowe, Daniel B.; Peterson, Bradley S.

    2009-01-01

    Stochastic noise, susceptibility artifacts, magnetic field and radiofrequency inhomogeneities, and other noise components in magnetic resonance images (MRIs) can introduce serious bias into any measurements made with those images. We formally introduce three regression models including a Rician regression model and two associated normal models to characterize stochastic noise in various magnetic resonance imaging modalities, including diffusion-weighted imaging (DWI) and functional MRI (fMRI). Estimation algorithms are introduced to maximize the likelihood function of the three regression models. We also develop a diagnostic procedure for systematically exploring MR images to identify noise components other than simple stochastic noise, and to detect discrepancies between the fitted regression models and MRI data. The diagnostic procedure includes goodness-of-fit statistics, measures of influence, and tools for graphical display. The goodness-of-fit statistics can assess the key assumptions of the three regression models, whereas measures of influence can isolate outliers caused by certain noise components, including motion artifacts. The tools for graphical display permit graphical visualization of the values for the goodness-of-fit statistic and influence measures. Finally, we conduct simulation studies to evaluate performance of these methods, and we analyze a real dataset to illustrate how our diagnostic procedure localizes subtle image artifacts by detecting intravoxel variability that is not captured by the regression models. PMID:19890478

  15. Soft-sensing model of temperature for aluminum reduction cell on improved twin support vector regression

    NASA Astrophysics Data System (ADS)

    Li, Tao

    2018-06-01

    The complexity of the aluminum electrolysis process makes the temperature of aluminum reduction cells hard to measure directly, yet temperature is central to controlling aluminum production. To solve this problem, drawing on practice data from an aluminum plant, this paper presents a soft-sensing model of temperature for the aluminum electrolysis process based on Improved Twin Support Vector Regression (ITSVR). ITSVR eliminates the slow learning speed of Support Vector Regression (SVR) and the over-fitting risk of Twin Support Vector Regression (TSVR) by introducing a regularization term into the objective function of TSVR, which ensures the structural risk minimization principle and lower computational complexity. Finally, the model, with some other process parameters as auxiliary variables, predicts the temperature by ITSVR. Simulation results show that the ITSVR-based soft-sensing model is less time-consuming and generalizes better.

  16. Utility of an Abbreviated Dizziness Questionnaire to Differentiate between Causes of Vertigo and Guide Appropriate Referral: A Multicenter Prospective Blinded Study

    PubMed Central

    Roland, Lauren T.; Kallogjeri, Dorina; Sinks, Belinda C.; Rauch, Steven D.; Shepard, Neil T.; White, Judith A.; Goebel, Joel A.

    2015-01-01

    Objective: Test performance of a focused dizziness questionnaire's ability to discriminate between peripheral and non-peripheral causes of vertigo. Study Design: Prospective multi-center. Setting: Four academic centers with experienced balance specialists. Patients: New dizzy patients. Interventions: A 32-question survey was given to participants. Balance specialists were blinded and a diagnosis was established for all participating patients within 6 months. Main Outcomes: Multinomial logistic regression was used to evaluate questionnaire performance in predicting final diagnosis and differentiating between peripheral and non-peripheral vertigo. Univariate and multivariable stepwise logistic regression were used to identify questions as significant predictors of the ultimate diagnosis. C-index was used to evaluate performance and discriminative power of the multivariable models. Results: 437 patients participated in the study. Eight participants without confirmed diagnoses were excluded and 429 were included in the analysis. Multinomial regression revealed that the model had good overall predictive accuracy of 78.5% for the final diagnosis and 75.5% for differentiating between peripheral and non-peripheral vertigo. Univariate logistic regression identified significant predictors of three main categories of vertigo: peripheral, central and other. Predictors were entered into forward stepwise multivariable logistic regression. The discriminative power of the final models for peripheral, central and other causes was considered good as measured by c-indices of 0.75, 0.7 and 0.78, respectively. Conclusions: This multicenter study demonstrates a focused dizziness questionnaire can accurately predict diagnosis for patients with chronic/relapsing dizziness referred to outpatient clinics. Additionally, this survey has significant capability to differentiate peripheral from non-peripheral causes of vertigo and may, in the future, serve as a screening tool for specialty referral. Clinical utility of this questionnaire to guide specialty referral is discussed. PMID:26485598

  17. Utility of an Abbreviated Dizziness Questionnaire to Differentiate Between Causes of Vertigo and Guide Appropriate Referral: A Multicenter Prospective Blinded Study.

    PubMed

    Roland, Lauren T; Kallogjeri, Dorina; Sinks, Belinda C; Rauch, Steven D; Shepard, Neil T; White, Judith A; Goebel, Joel A

    2015-12-01

    Test performance of a focused dizziness questionnaire's ability to discriminate between peripheral and nonperipheral causes of vertigo. Prospective multicenter. Four academic centers with experienced balance specialists. New dizzy patients. A 32-question survey was given to participants. Balance specialists were blinded and a diagnosis was established for all participating patients within 6 months. Multinomial logistic regression was used to evaluate questionnaire performance in predicting final diagnosis and differentiating between peripheral and nonperipheral vertigo. Univariate and multivariable stepwise logistic regression were used to identify questions as significant predictors of the ultimate diagnosis. C-index was used to evaluate performance and discriminative power of the multivariable models. In total, 437 patients participated in the study. Eight participants without confirmed diagnoses were excluded and 429 were included in the analysis. Multinomial regression revealed that the model had good overall predictive accuracy of 78.5% for the final diagnosis and 75.5% for differentiating between peripheral and nonperipheral vertigo. Univariate logistic regression identified significant predictors of three main categories of vertigo: peripheral, central, and other. Predictors were entered into forward stepwise multivariable logistic regression. The discriminative power of the final models for peripheral, central, and other causes was considered good as measured by c-indices of 0.75, 0.7, and 0.78, respectively. This multicenter study demonstrates a focused dizziness questionnaire can accurately predict diagnosis for patients with chronic/relapsing dizziness referred to outpatient clinics. Additionally, this survey has significant capability to differentiate peripheral from nonperipheral causes of vertigo and may, in the future, serve as a screening tool for specialty referral. Clinical utility of this questionnaire to guide specialty referral is discussed.

  18. A Pre-Screening Questionnaire to Predict Non-24-Hour Sleep-Wake Rhythm Disorder (N24HSWD) among the Blind

    PubMed Central

    Flynn-Evans, Erin E.; Lockley, Steven W.

    2016-01-01

    Study Objectives: There is currently no questionnaire-based pre-screening tool available to detect non-24-hour sleep-wake rhythm disorder (N24HSWD) among blind patients. Our goal was to develop such a tool, derived from gold standard, objective hormonal measures of circadian entrainment status, for the detection of N24HSWD among those with visual impairment. Methods: We evaluated the contribution of 40 variables in their ability to predict N24HSWD among 127 blind women, classified using urinary 6-sulfatoxymelatonin period, an objective marker of circadian entrainment status in this population. We subjected the 40 candidate predictors to 1,000 bootstrapped iterations of a logistic regression forward selection model to predict N24HSWD, with model inclusion set at the p < 0.05 level. We removed any predictors that were not selected at least 1% of the time in the 1,000 bootstrapped models and applied a second round of 1,000 bootstrapped logistic regression forward selection models to the remaining 23 candidate predictors. We included all questions that were selected at least 10% of the time in the final model. We subjected the selected predictors to a final logistic regression model to predict N24HSWD over 1,000 bootstrapped models to calculate the concordance statistic and adjusted optimism of the final model. We used this information to generate a predictive model and determined the sensitivity and specificity of the model. Finally, we applied the model to a cohort of 1,262 blind women who completed the survey, but did not collect urine samples. Results: The final model consisted of eight questions. The concordance statistic, adjusted for bootstrapping, was 0.85. The positive predictive value was 88%, and the negative predictive value was 79%. Applying this model to our larger dataset of women, we found that 61% of those without light perception, and 27% with some degree of light perception, would be referred for further screening for N24HSWD. Conclusions: Our model has predictive utility sufficient to serve as a pre-screening questionnaire for N24HSWD among the blind. Citation: Flynn-Evans EE, Lockley SW. A pre-screening questionnaire to predict non-24-hour sleep-wake rhythm disorder (N24HSWD) among the blind. J Clin Sleep Med 2016;12(5):703–710. PMID:26951421
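
    A toy sketch of the bootstrapped forward-selection idea described above (invented binary items and outcome; 200 rather than 1,000 resamples, and a simple p-value entry rule rather than the authors' exact procedure): questions are ranked by how often they enter the model across bootstrap resamples.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(4)
      n, p = 127, 10                                    # toy stand-ins for 127 subjects, 10 items
      X = rng.binomial(1, 0.5, size=(n, p)).astype(float)
      y = rng.binomial(1, 1 / (1 + np.exp(-(1.5 * X[:, 0] - 1.2 * X[:, 3]))))

      def forward_select(X, y, alpha=0.05):
          chosen = []
          while True:
              best = None
              for j in range(X.shape[1]):
                  if j in chosen:
                      continue
                  fit = sm.Logit(y, sm.add_constant(X[:, chosen + [j]])).fit(disp=0)
                  pval = fit.pvalues[-1]                 # p value of the newest term
                  if pval < alpha and (best is None or pval < best[0]):
                      best = (pval, j)
              if best is None:
                  return chosen
              chosen.append(best[1])

      counts = np.zeros(p)
      for _ in range(200):                               # 200 resamples to keep the sketch quick
          idx = rng.integers(0, n, n)                    # bootstrap resample
          try:
              for j in forward_select(X[idx], y[idx]):
                  counts[j] += 1
          except Exception:                              # separation can break a bootstrap fit
              continue
      print("selection frequency (%):", np.round(100 * counts / 200, 1))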

  19. Regression Model Optimization for the Analysis of Experimental Data

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.

    2009-01-01

    A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user's function class combination choice, the user's constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
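
    A small sketch of the search metric described above: the standard deviation of the PRESS residuals, computed via the hat-matrix identity so no per-point refitting is needed, used here to compare candidate math models on invented data.

      import numpy as np

      def press_residual_std(X, y):
          # PRESS residuals via the hat matrix: e_press_i = e_i / (1 - h_ii)
          X = np.column_stack([np.ones(len(y)), X])
          H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix
          e = y - H @ y                                  # ordinary residuals
          e_press = e / (1 - np.diag(H))
          return e_press.std(ddof=1)

      rng = np.random.default_rng(5)
      x = rng.uniform(-1, 1, 40)
      y = 1 + 2 * x + 0.5 * x**2 + rng.normal(0, 0.1, 40)

      # compare candidate math models by the PRESS search metric (smaller is better)
      for cols, label in [((x,), "linear"), ((x, x**2), "quadratic"),
                          ((x, x**2, x**3), "cubic")]:
          metric = press_residual_std(np.column_stack(cols), y)
          print(f"{label:>9}: std(PRESS) = {metric:.4f}")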

  20. Time series modeling by a regression approach based on a latent process.

    PubMed

    Chamroukhi, Faicel; Samé, Allou; Govaert, Gérard; Aknin, Patrice

    2009-01-01

    Time series are used in many domains including finance, engineering, economics and bioinformatics generally to represent the change of a measurement over time. Modeling techniques may then be used to give a synthetic representation of such data. A new approach for time series modeling is proposed in this paper. It consists of a regression model incorporating a discrete hidden logistic process allowing for activating smoothly or abruptly different polynomial regression models. The model parameters are estimated by the maximum likelihood method performed by a dedicated Expectation Maximization (EM) algorithm. The M step of the EM algorithm uses a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm to estimate the hidden process parameters. To evaluate the proposed approach, an experimental study on simulated data and real world data was performed using two alternative approaches: a heteroskedastic piecewise regression model using a global optimization algorithm based on dynamic programming, and a Hidden Markov Regression Model whose parameters are estimated by the Baum-Welch algorithm. Finally, in the context of the remote monitoring of components of the French railway infrastructure, and more particularly the switch mechanism, the proposed approach has been applied to modeling and classifying time series representing the condition measurements acquired during switch operations.

  21. Using social cognitive theory to explain discretionary, "leisure-time" physical exercise among high school students.

    PubMed

    Winters, Eric R; Petosa, Rick L; Charlton, Thomas E

    2003-06-01

    To examine whether high school students' self-regulation behaviors and perceptions of self-efficacy to overcome exercise barriers, social situation, and outcome expectations predict non-school-related moderate and vigorous physical exercise. High school students enrolled in introductory Physical Education courses completed questionnaires that targeted selected Social Cognitive Theory variables. They also self-reported their typical "leisure-time" exercise participation using a standardized questionnaire. Bivariate correlation statistics and hierarchical regression analyses were conducted on reports of moderate and vigorous exercise frequency. Each predictor variable was significantly associated with measures of moderate and vigorous exercise frequency. All predictor variables were significant in the final regression model used to explain vigorous exercise. After controlling for the effects of gender, the psychosocial variables explained 29% of the variance in vigorous exercise frequency. Three of four predictor variables were significant in the final regression equation used to explain moderate exercise. The final regression equation accounted for 11% of the variance in moderate exercise frequency. Professionals who attempt to increase the prevalence of physical exercise through educational methods should focus on the psychosocial variables utilized in this study.
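
    A toy sketch of the hierarchical regression step reported above, with invented data and variable names: gender is entered first, the psychosocial predictors second, and their contribution is read off as the change in R².

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(16)
      n = 250
      gender = rng.binomial(1, 0.5, n).astype(float)
      selfreg = rng.normal(size=n)                       # toy psychosocial predictors
      efficacy = rng.normal(size=n)
      vigorous = 1 + 0.3 * gender + 0.8 * selfreg + 0.5 * efficacy + rng.normal(0, 1, n)

      step1 = sm.OLS(vigorous, sm.add_constant(gender)).fit()            # block 1: gender only
      X2 = sm.add_constant(np.column_stack([gender, selfreg, efficacy])) # block 2: add psychosocial
      step2 = sm.OLS(vigorous, X2).fit()
      print(f"R2 step 1 = {step1.rsquared:.2f}, step 2 = {step2.rsquared:.2f}, "
            f"delta R2 = {step2.rsquared - step1.rsquared:.2f}")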

  22. Markovian prediction of future values for food grains in the economic survey

    NASA Astrophysics Data System (ADS)

    Sathish, S.; Khadar Babu, S. K.

    2017-11-01

    Nowadays, prediction and forecasting play a vital role in research. For prediction, regression is useful for estimating future and current values of a production process. In this paper, we assume that food grain production exhibits Markov chain dependency and time homogeneity. The economic generative performance evaluates the balance time of artificial fertilization at different levels in estrus detection using a daily Markov chain model. Finally, Markov process prediction gives better performance compared with the regression model.

  23. Predicting Final GPA of Graduate School Students: Comparing Artificial Neural Networking and Simultaneous Multiple Regression

    ERIC Educational Resources Information Center

    Anderson, Joan L.

    2006-01-01

    Data from graduate student applications at a large Western university were used to determine which factors were the best predictors of success in graduate school, as defined by cumulative graduate grade point average. Two statistical models were employed and compared: artificial neural networking and simultaneous multiple regression. Both models…

  24. Local polynomial estimation of heteroscedasticity in a multivariate linear regression model and its applications in economics.

    PubMed

    Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan

    2012-01-01

    Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. First, local polynomial fitting is applied to estimate the heteroscedastic function, and then the coefficients of the regression model are obtained using the generalized least squares method. One noteworthy feature of our approach is that we avoid testing for heteroscedasticity by improving the traditional two-stage method. Due to the non-parametric technique of local polynomial estimation, it is unnecessary to know the form of the heteroscedastic function, so we can improve the estimation precision when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients are asymptotically normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is effective in finite-sample situations.
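
    A simplified two-stage sketch of the approach described above on synthetic data; a one-dimensional Gaussian-kernel (local constant) smoother of the squared residuals stands in for the multivariate local polynomial fit of the heteroscedastic function.

      import numpy as np

      rng = np.random.default_rng(6)
      n = 200
      x = rng.uniform(0, 1, n)
      sigma = 0.2 + 0.8 * x                              # heteroscedastic noise level
      y = 1 + 2 * x + rng.normal(0, sigma)

      X = np.column_stack([np.ones(n), x])
      beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]    # stage 1: OLS fit
      e2 = (y - X @ beta_ols) ** 2                       # squared residuals

      def kernel_smooth(x0, x, v, h=0.1):                # local constant smoother
          w = np.exp(-0.5 * ((x - x0) / h) ** 2)         # (stand-in for local polynomial)
          return np.sum(w * v) / np.sum(w)

      var_hat = np.array([kernel_smooth(xi, x, e2) for xi in x])

      W = 1 / var_hat                                    # stage 2: generalized least squares
      XtW = X.T * W
      beta_gls = np.linalg.solve(XtW @ X, XtW @ y)
      print("OLS:", np.round(beta_ols, 3), " GLS:", np.round(beta_gls, 3))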

  25. An overall strategy based on regression models to estimate relative survival and model the effects of prognostic factors in cancer survival studies.

    PubMed

    Remontet, L; Bossard, N; Belot, A; Estève, J

    2007-05-10

    Relative survival provides a measure of the proportion of patients dying from the disease under study without requiring knowledge of the cause of death. We propose an overall strategy based on regression models to estimate relative survival and model the effects of potential prognostic factors. The baseline hazard was modelled up to 10 years of follow-up using parametric continuous functions. Six models including cubic regression splines were considered, and the Akaike Information Criterion was used to select the final model. This approach yielded smooth and reliable estimates of the mortality hazard and allowed us to deal with sparse data while taking into account all the available information. Splines were also used to model simultaneously non-linear effects of continuous covariates and time-dependent hazard ratios. This led to a graphical representation of the hazard ratio that can be useful for clinical interpretation. Estimates of these models were obtained by likelihood maximization. We showed that these estimates could also be obtained using standard algorithms for Poisson regression. Copyright 2006 John Wiley & Sons, Ltd.

  26. An introduction to using Bayesian linear regression with clinical data.

    PubMed

    Baldwin, Scott A; Larson, Michael J

    2017-11-01

    Statistical training in psychology focuses on frequentist methods. Bayesian methods are an alternative to standard frequentist methods. This article provides researchers with an introduction to fundamental ideas in Bayesian modeling. We use data from an electroencephalogram (EEG) and anxiety study to illustrate Bayesian models. Specifically, the models examine the relationship between error-related negativity (ERN), a particular event-related potential, and trait anxiety. Methodological topics covered include: how to set up a regression model in a Bayesian framework, specifying priors, examining convergence of the model, visualizing and interpreting posterior distributions, interval estimates, expected and predicted values, and model comparison tools. We also discuss situations where Bayesian methods can outperform frequentist methods, as well as how to specify more complicated regression models. Finally, we conclude with recommendations about reporting guidelines for those using Bayesian methods in their own research. We provide data and R code for replicating our analyses. Copyright © 2017 Elsevier Ltd. All rights reserved.
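
    A minimal closed-form sketch of Bayesian linear regression in the spirit of the article, with invented ERN/anxiety data; the noise variance is assumed known and a Gaussian prior is used so the posterior has a closed form, whereas the article works with richer MCMC-based tooling.

      import numpy as np

      rng = np.random.default_rng(7)
      n = 100
      anxiety = rng.normal(0, 1, n)                      # hypothetical trait-anxiety scores
      ern = -0.4 * anxiety + rng.normal(0, 1, n)         # synthetic ERN amplitudes

      X = np.column_stack([np.ones(n), anxiety])
      sigma2 = 1.0                                       # noise variance assumed known here
      tau2 = 10.0                                        # variance of the N(0, tau2) prior

      # conjugate posterior: N(mu_post, V_post) with Gaussian prior and likelihood
      V_post = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)
      mu_post = V_post @ X.T @ ern / sigma2

      sd = np.sqrt(np.diag(V_post))
      lo, hi = mu_post - 1.96 * sd, mu_post + 1.96 * sd  # approximate 95% credible intervals
      print("posterior mean:", np.round(mu_post, 3))
      print("95% interval for slope:", np.round([lo[1], hi[1]], 3))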

  27. Application of Machine-Learning Models to Predict Tacrolimus Stable Dose in Renal Transplant Recipients

    NASA Astrophysics Data System (ADS)

    Tang, Jie; Liu, Rong; Zhang, Yue-Li; Liu, Mou-Ze; Hu, Yong-Fang; Shao, Ming-Jie; Zhu, Li-Jun; Xin, Hua-Wen; Feng, Gui-Wen; Shang, Wen-Jun; Meng, Xiang-Guang; Zhang, Li-Rong; Ming, Ying-Zi; Zhang, Wei

    2017-02-01

    Tacrolimus has a narrow therapeutic window and considerable variability in clinical use. Our goal was to compare the performance of multiple linear regression (MLR) and eight machine learning techniques in pharmacogenetic algorithm-based prediction of tacrolimus stable dose (TSD) in a large Chinese cohort. A total of 1,045 renal transplant patients were recruited, 80% of which were randomly selected as the “derivation cohort” to develop dose-prediction algorithm, while the remaining 20% constituted the “validation cohort” to test the final selected algorithm. MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied and their performances were compared in this work. Among all the machine learning models, RT performed best in both derivation [0.71 (0.67-0.76)] and validation cohorts [0.73 (0.63-0.82)]. In addition, the ideal rate of RT was 4% higher than that of MLR. To our knowledge, this is the first study to use machine learning models to predict TSD, which will further facilitate personalized medicine in tacrolimus administration in the future.

  28. Advanced statistics: linear regression, part II: multiple linear regression.

    PubMed

    Marill, Keith A

    2004-01-01

    The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
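
    A short sketch of two concepts stressed above, multicollinearity and interaction effects, on invented data: an interaction term is added to the design matrix, and variance inflation factors are used to flag near-linear dependence among predictors.

      import numpy as np
      import statsmodels.api as sm
      from statsmodels.stats.outliers_influence import variance_inflation_factor

      rng = np.random.default_rng(8)
      n = 150
      x1 = rng.normal(size=n)
      x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)           # deliberately collinear with x1
      y = 1 + 2 * x1 + x2 + 0.5 * x1 * x2 + rng.normal(size=n)

      X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))  # interaction term included
      fit = sm.OLS(y, X).fit()
      print(fit.params.round(2))

      # variance inflation factors flag near-linear dependence among predictors
      for j, name in enumerate(["const", "x1", "x2", "x1*x2"]):
          print(name, round(variance_inflation_factor(X, j), 1))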

  29. Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

    NASA Astrophysics Data System (ADS)

    Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno

    2017-03-01

    This paper presents statistical analyses of rock engineering properties and the measured penetration rate of a tunnel boring machine (TBM) based on data from an actual project. The aim of this study is to analyze the influence of rock engineering properties, including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness, on the TBM rate of penetration (ROP). Four statistical regression models (two linear and two nonlinear) are built to predict the ROP of the TBM. Finally, a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict TBM performance. The R-squared value (R²) of the fuzzy logic model is the highest, at 0.714, against 0.667 for the runner-up, the multiple-variable nonlinear regression model.

  30. Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

    PubMed Central

    Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric

    2016-01-01

    Regression clustering is a mixture of unsupervised and supervised statistical learning and data mining methods, found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with the analysis of a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939
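
    A minimal sketch of the iterative partition-and-regression loop described above, on invented data with two latent regression lines: each point is assigned to the hyperplane that fits it best, then each hyperplane is refit to its cluster (cluster-number selection and robust estimation are not shown).

      import numpy as np

      rng = np.random.default_rng(9)
      x = rng.uniform(-2, 2, 300)
      z = rng.integers(0, 2, 300)                        # hidden cluster labels
      y = np.where(z == 0, 1 + 2 * x, -1 - x) + rng.normal(0, 0.2, 300)

      K = 2
      betas = rng.normal(size=(K, 2))                    # random initial hyperplanes
      X = np.column_stack([np.ones_like(x), x])
      for _ in range(20):                                # iterate: partition, then regress
          resid = np.abs(y[:, None] - X @ betas.T)       # |residual| under each model
          labels = resid.argmin(axis=1)                  # assign point to best-fitting line
          for k in range(K):
              mask = labels == k
              if mask.sum() > 2:
                  betas[k] = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
      print(np.round(betas, 2))                          # recovered intercept/slope pairs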

  31. Analysis of the Influence of Quantile Regression Model on Mainland Tourists' Service Satisfaction Performance

    PubMed Central

    Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen

    2014-01-01

    It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916

  32. Analysis of the influence of quantile regression model on mainland tourists' service satisfaction performance.

    PubMed

    Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen

    2014-01-01

    It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.
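
    A small sketch of the quantile-regression comparison described above, on invented satisfaction scores: the slope of one item is estimated at several quantiles of the overall-satisfaction distribution and set against the least-squares slope.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(10)
      n = 400
      item = rng.uniform(1, 5, n)                        # a single satisfaction item (toy)
      overall = 1 + 0.7 * item + rng.normal(0, 0.3 + 0.2 * item, n)

      X = sm.add_constant(item)
      for q in (0.25, 0.5, 0.75):                        # the effect may differ across quantiles
          fit = sm.QuantReg(overall, X).fit(q=q)
          print(f"q = {q}: slope = {fit.params[1]:.3f}")
      print("OLS slope:", round(sm.OLS(overall, X).fit().params[1], 3))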

  33. The moderation of resilience on the negative effect of pain on depression and post-traumatic growth in individuals with spinal cord injury.

    PubMed

    Min, Jung-Ah; Lee, Chang-Uk; Hwang, Sung-Il; Shin, Jung-In; Lee, Bum-Suk; Han, Sang-Hoon; Ju, Hye-In; Lee, Cha-Yeon; Lee, Chul; Chae, Jeong-Ho

    2014-01-01

    To determine the moderating effect of resilience on the negative effects of chronic pain on depression and post-traumatic growth. Community-dwelling individuals with SCI (n = 37) were recruited at short-term admission for yearly regular health examination. Participants completed self-rating standardized questionnaires measuring pain, resilience, depression and post-traumatic growth. Hierarchical linear regression analysis was performed to identify the moderating effect of resilience on the relationships of pain with depression and post-traumatic growth after controlling for relevant covariates. In the regression model of depression, the effect of pain severity on depression was decreased (β was changed from 0.47 to 0.33) after entering resilience into the model. In the final model, both pain and resilience were significant independent predictors for depression (β = 0.33, p = 0.038 and β = -0.47, p = 0.012, respectively). In the regression model of post-traumatic growth, the effect of pain severity became insignificant after entering resilience into the model. In the final model, resilience was a significant predictor (β = 0.51, p = 0.016). Resilience potentially mitigated the negative effects of pain. Moreover, it independently contributed to reduced depression and greater post-traumatic growth. Our findings suggest that resilience might provide a potential target for intervention in SCI individuals.

  34. Random forest models to predict aqueous solubility.

    PubMed

    Palmer, David S; O'Boyle, Noel M; Glen, Robert C; Mitchell, John B O

    2007-01-01

    Random Forest regression (RF), Partial-Least-Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueous solubility more accurately than those created by PLS, SVM, and ANN and offered methods for automatic descriptor selection, an assessment of descriptor importance, and an in-parallel measure of predictive ability, all of which serve to recommend its use. The prediction of log molar solubility for an external test set of 330 molecules that are solid at 25 °C gave an r² = 0.89 and RMSE = 0.69 log S units. For a standard data set selected from the literature, the model performed well with respect to other documented methods. Finally, the diversity of the training and test sets are compared to the chemical space occupied by molecules in the MDL drug data report, on the basis of molecular descriptors selected by the regression analysis.
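
    A minimal sketch of the Random Forest regression workflow described above, with random stand-ins for the 988 molecules and their descriptors (so the printed metrics are not the paper's): fit, external-test evaluation, and descriptor-importance ranking.

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.metrics import mean_squared_error, r2_score
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(11)
      n, p = 988, 20                                     # toy stand-ins for molecules/descriptors
      X = rng.normal(size=(n, p))
      logS = -1 + 1.5 * X[:, 0] - X[:, 1] * X[:, 2] + rng.normal(0, 0.5, n)

      Xtr, Xte, ytr, yte = train_test_split(X, logS, test_size=0.33, random_state=0)
      rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(Xtr, ytr)
      pred = rf.predict(Xte)
      rmse = mean_squared_error(yte, pred) ** 0.5
      print(f"r2 = {r2_score(yte, pred):.2f}, RMSE = {rmse:.2f} log S units")
      print("top descriptors:", np.argsort(rf.feature_importances_)[::-1][:3])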

  35. Regression-based adaptive sparse polynomial dimensional decomposition for sensitivity analysis

    NASA Astrophysics Data System (ADS)

    Tang, Kunkun; Congedo, Pietro; Abgrall, Remi

    2014-11-01

    Polynomial dimensional decomposition (PDD) is employed in this work for global sensitivity analysis and uncertainty quantification of stochastic systems subject to a large number of random input variables. Due to the intimate structure between PDD and Analysis-of-Variance, PDD is able to provide a simpler and more direct evaluation of the Sobol' sensitivity indices than polynomial chaos (PC). Unfortunately, the number of PDD terms grows exponentially with the size of the input random vector, which makes the computational cost of the standard method unaffordable for real engineering applications. To address this curse of dimensionality, this work proposes a variance-based adaptive strategy aiming to build a cheap meta-model by sparse PDD with coefficients computed by regression. During this adaptive procedure, the PDD representation contains only a few terms, so that the cost of repeatedly solving the linear system of the least-squares regression problem is negligible. The size of the final sparse-PDD representation is much smaller than the full PDD, since only significant terms are eventually retained. Consequently, far fewer calls to the deterministic model are required to compute the final PDD coefficients.

  36. Multilevel covariance regression with correlated random effects in the mean and variance structure.

    PubMed

    Quintero, Adrian; Lesaffre, Emmanuel

    2017-09-01

    Multivariate regression methods generally assume a constant covariance matrix for the observations. In cases where a heteroscedastic model is needed, the parametric and nonparametric covariance regression approaches in the literature can be restrictive. We propose a multilevel regression model for the mean and covariance structure, including random intercepts in both components and allowing for correlation between them. The implied conditional covariance function can differ across clusters as a result of the random effect in the variance structure. In addition, allowing for correlation between the random intercepts in the mean and covariance makes the model convenient for skewed response distributions. Furthermore, it permits us to analyse directly the relation between the mean response level and the variability in each cluster. Parameter estimation is carried out via Gibbs sampling. We compare the performance of our model to other covariance modelling approaches in a simulation study. Finally, the proposed model is applied to the RN4CAST dataset to identify the variables that impact burnout of nurses in Belgium. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  37. Using Gamma and Quantile Regressions to Explore the Association between Job Strain and Adiposity in the ELSA-Brasil Study: Does Gender Matter?

    PubMed

    Fonseca, Maria de Jesus Mendes da; Juvanhol, Leidjaira Lopes; Rotenberg, Lúcia; Nobre, Aline Araújo; Griep, Rosane Härter; Alves, Márcia Guimarães de Mello; Cardoso, Letícia de Oliveira; Giatti, Luana; Nunes, Maria Angélica; Aquino, Estela M L; Chor, Dóra

    2017-11-17

    This paper explores the association between job strain and adiposity, using two statistical analysis approaches and considering the role of gender. The research evaluated 11,960 active baseline participants (2008-2010) in the ELSA-Brasil study. Job strain was evaluated through a demand-control questionnaire, while body mass index (BMI) and waist circumference (WC) were evaluated in continuous form. The associations were estimated using gamma regression models with an identity link function. Quantile regression models were also estimated from the final set of co-variables established by gamma regression. The relationship that was found varied by analytical approach and gender. Among the women, no association was observed between job strain and adiposity in the fitted gamma models. In the quantile models, a pattern of increasing effects of high strain was observed at higher BMI and WC distribution quantiles. Among the men, high strain was associated with adiposity in the gamma regression models. However, when quantile regression was used, that association was found not to be homogeneous across outcome distributions. In addition, in the quantile models an association was observed between active jobs and BMI. Our results point to an association between job strain and adiposity, which follows a heterogeneous pattern. Modelling strategies can produce different results and should, accordingly, be used to complement one another.
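
    A toy sketch of the gamma-regression component described above: a gamma GLM with identity link (assuming a reasonably recent statsmodels) fit to an invented positive, right-skewed adiposity outcome, so the exposure effect is read on the outcome's own scale.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(12)
      n = 500
      strain = rng.binomial(1, 0.3, n)                   # toy high-job-strain indicator
      mu = 26 + 1.2 * strain                             # mean BMI by exposure group
      bmi = rng.gamma(shape=40, scale=mu / 40)           # positive, right-skewed outcome

      X = sm.add_constant(strain.astype(float))
      gamma_fit = sm.GLM(bmi, X,
                         family=sm.families.Gamma(link=sm.families.links.Identity())).fit()
      print(gamma_fit.params.round(2))                   # identity link: effects on the BMI scale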

  38. Bias in logistic regression due to imperfect diagnostic test results and practical correction approaches.

    PubMed

    Valle, Denis; Lima, Joanna M Tucker; Millar, Justin; Amratia, Punam; Haque, Ubydul

    2015-11-04

    Logistic regression is a statistical model widely used in cross-sectional and cohort studies to identify and quantify the effects of potential disease risk factors. However, the impact of imperfect tests on adjusted odds ratios (and thus on the identification of risk factors) is under-appreciated. The purpose of this article is to draw attention to the problem associated with modelling imperfect diagnostic tests, and propose simple Bayesian models to adequately address this issue. A systematic literature review was conducted to determine the proportion of malaria studies that appropriately accounted for false-negatives/false-positives in a logistic regression setting. Inference from the standard logistic regression was also compared with that from three proposed Bayesian models using simulations and malaria data from the western Brazilian Amazon. A systematic literature review suggests that malaria epidemiologists are largely unaware of the problem of using logistic regression to model imperfect diagnostic test results. Simulation results reveal that statistical inference can be substantially improved when using the proposed Bayesian models versus the standard logistic regression. Finally, analysis of original malaria data with one of the proposed Bayesian models reveals that microscopy sensitivity is strongly influenced by how long people have lived in the study region, and an important risk factor (i.e., participation in forest extractivism) is identified that would have been missed by standard logistic regression. Given the numerous diagnostic methods employed by malaria researchers and the ubiquitous use of logistic regression to model the results of these diagnostic tests, this paper provides critical guidelines to improve data analysis practice in the presence of misclassification error. Easy-to-use code that can be readily adapted to WinBUGS is provided, enabling straightforward implementation of the proposed Bayesian models.
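
    A simplified sketch of the core correction described above, as a maximum-likelihood analogue of the proposed Bayesian models (invented data; sensitivity and specificity assumed known): the likelihood uses P(test positive) = se·p + (1−sp)·(1−p) instead of treating the imperfect test result as true disease status.

      import numpy as np
      from scipy.optimize import minimize

      rng = np.random.default_rng(13)
      n = 1000
      x = rng.normal(size=n)
      p_true = 1 / (1 + np.exp(-(-1 + 1.5 * x)))         # true infection probability
      d = rng.binomial(1, p_true)                        # latent true disease status
      se, sp = 0.8, 0.95                                 # assumed test sensitivity/specificity
      t = np.where(d == 1, rng.binomial(1, se, n), rng.binomial(1, 1 - sp, n))

      def negloglik(beta):
          p = 1 / (1 + np.exp(-(beta[0] + beta[1] * x)))
          p_obs = se * p + (1 - sp) * (1 - p)            # P(test positive | covariates)
          return -np.sum(t * np.log(p_obs) + (1 - t) * np.log(1 - p_obs))

      fit = minimize(negloglik, x0=[0.0, 0.0])
      print("adjusted estimates:", np.round(fit.x, 2))   # compare with the true (-1, 1.5)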

  19. Vector autoregressive models: A Gini approach

    NASA Astrophysics Data System (ADS)

    Mussard, Stéphane; Ndiaye, Oumar Hamady

    2018-02-01

    In this paper, it is proven that the usual VAR models may be performed in the Gini sense, that is, on an ℓ1 metric space. The Gini regression is robust to outliers. As a consequence, when data are contaminated by extreme values, we show that semi-parametric VAR-Gini regressions may be used to obtain robust estimators. Inference about the estimators is based on the ℓ1 norm. Impulse response functions and Gini decompositions for prediction errors are also introduced. Finally, Granger causality tests are properly derived based on U-statistics.

  20. Independent contrasts and PGLS regression estimators are equivalent.

    PubMed

    Blomberg, Simon P; Lefevre, James G; Wells, Jessie A; Waterhouse, Mary

    2012-05-01

    We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLS) regression under a Brownian motion model of evolution. This equivalence has several implications: 1. Understanding the structure of the linear model for GLS regression provides insight into when and why phylogeny is important in comparative studies. 2. The limitations of the PIC regression analysis are the same as the limitations of the GLS model. In particular, phylogenetic covariance applies only to the response variable in the regression and the explanatory variable should be regarded as fixed. Calculation of PICs for explanatory variables should be treated as a mathematical idiosyncrasy of the PIC regression algorithm. 3. Since the GLS estimator is the best linear unbiased estimator (BLUE), the slope parameter estimated using PICs is also BLUE. 4. If the slope is estimated using different branch lengths for the explanatory and response variables in the PIC algorithm, the estimator is no longer the BLUE, so this is not recommended. Finally, we discuss whether and how to accommodate phylogenetic covariance in regression analyses, particularly in relation to the problem of phylogenetic uncertainty. This discussion is from both frequentist and Bayesian perspectives.
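
    The GLS side of the equivalence is a one-line formula, β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y, with V the covariance matrix implied by Brownian motion on the tree. The sketch below evaluates it on a toy four-taxon covariance matrix with invented branch lengths; by the paper's result, the slope matches the through-origin OLS slope on the contrasts.

    ```python
    # Sketch: the PGLS slope under a Brownian-motion covariance matrix V.
    import numpy as np

    V = np.array([[1.0, 0.5, 0.0, 0.0],
                  [0.5, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.7],
                  [0.0, 0.0, 0.7, 1.0]])   # shared branch lengths (toy tree)
    rng = np.random.default_rng(2)
    x = rng.normal(size=4)
    y = 2.0 * x + rng.multivariate_normal(np.zeros(4), V)

    X = np.column_stack([np.ones(4), x])
    Vinv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
    print(beta[1])   # GLS slope; equal to the through-origin PIC estimate
    ```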

  1. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression

    PubMed Central

    Dipnall, Joanna F.

    2016-01-01

    Background Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. Methods The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009–2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. Results After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). Conclusion The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin. PMID:26848571
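
    A compressed sketch of the three-step pipeline on synthetic data (a single imputation instead of 20, no survey weights, and hypothetical "biomarkers"): chained-equation imputation, a boosted model to rank candidates, then a plain logistic model on the survivors.

    ```python
    # Sketch: imputation -> boosted ranking -> logistic model on top features.
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    X = rng.normal(size=(800, 30))                 # 30 synthetic "biomarkers"
    y = (X[:, 0] - X[:, 1] + rng.normal(size=800) > 0).astype(int)
    X[rng.random(X.shape) < 0.1] = np.nan          # 10% missingness

    X_imp = IterativeImputer(random_state=0).fit_transform(X)        # step 1
    gbm = GradientBoostingClassifier(random_state=0).fit(X_imp, y)   # step 2
    top = np.argsort(gbm.feature_importances_)[-3:]                  # keep 3
    logit = LogisticRegression().fit(X_imp[:, top], y)               # step 3
    print(top, np.exp(logit.coef_))                # odds ratios for finalists
    ```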

  2. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression.

    PubMed

    Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny

    2016-01-01

    Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.

  3. Latent Transition Analysis of Pre-Service Teachers' Efficacy in Mathematics and Science

    ERIC Educational Resources Information Center

    Ward, Elizabeth Kennedy

    2009-01-01

    This study modeled changes in pre-service teacher efficacy in mathematics and science over the course of the final year of teacher preparation using latent transition analysis (LTA), a longitudinal form of analysis that builds on two modeling traditions (latent class analysis (LCA) and auto-regressive modeling). Data were collected using the…

  4. An investigation on fatality of drivers in vehicle-fixed object accidents on expressways in China: Using multinomial logistic regression model.

    PubMed

    Peng, Yong; Peng, Shuangling; Wang, Xinghua; Tan, Shiyang

    2018-06-01

    This study aims to identify the effects of vehicle, roadway, driver, and environment characteristics on driver fatality in vehicle-fixed object accidents on expressways in the Changsha-Zhuzhou-Xiangtan district of Hunan province, China, by developing multinomial logistic regression models. For this purpose, 121 vehicle-fixed object accidents from 2011 to 2017 are included in the modeling process. First, descriptive statistical analysis is performed to understand the main characteristics of the vehicle-fixed object crashes. Then, 19 explanatory variables are selected, and pairwise correlation analysis is conducted to choose the variables to be included. Finally, five multinomial logistic regression models with different independent variables are compared, and the model with the best fit and prediction capability is chosen as the final model. The results showed that turning to avoid fixed objects raised the probability of driver death. About 64% of the drivers who died were ejected from the car, of whom 50% had not been wearing a seatbelt before the fatal accident. Drivers are more likely to die when they encounter bad weather on the expressway. Drivers with less than 10 years of driving experience are more likely to die in these accidents. Fatigue or distracted driving is also a significant factor in driver fatality. Findings from this research provide insight into reducing driver fatality in vehicle-fixed object accidents.
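
    A minimal multinomial-logit sketch on synthetic crash records; the three severity levels and the predictors (seatbelt use, weather, driving experience) are illustrative stand-ins for the study's variables.

    ```python
    # Sketch: multinomial logistic regression over three severity outcomes.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(4)
    n = 600
    X = np.column_stack([rng.integers(0, 2, n),    # seatbelt used
                         rng.integers(0, 2, n),    # bad weather
                         rng.uniform(0, 30, n)])   # years of experience
    logits = np.column_stack([np.zeros(n),
                              0.5 - 1.0 * X[:, 0] + 0.8 * X[:, 1],
                              1.0 - 2.0 * X[:, 0] + 1.2 * X[:, 1] - 0.05 * X[:, 2]])
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    y = np.array([rng.choice(3, p=row) for row in p])  # 0=none,1=injury,2=fatal

    # With a multiclass target, the lbfgs solver fits the multinomial logit.
    model = LogisticRegression(max_iter=1000).fit(X, y)
    print(model.coef_)   # one row of coefficients per outcome class
    ```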

  5. Models for predicting the mass of lime fruits by some engineering properties.

    PubMed

    Miraei Ashtiani, Seyed-Hassan; Baradaran Motie, Jalal; Emadi, Bagher; Aghkhani, Mohammad-Hosein

    2014-11-01

    Grading fruits based on mass is important in packaging, reduces waste, and increases the marketing value of agricultural produce. The aim of this study was mass modeling of two major cultivars of Iranian limes based on engineering attributes. Models were classified into three groups: (1) single and multiple variable regressions of lime mass on dimensional characteristics; (2) single and multiple variable regressions of lime mass on projected areas; and (3) single regressions of lime mass on its actual volume and on volumes calculated by assuming ellipsoid and prolate spheroid shapes. All properties considered in the current study were found to be statistically significant (P < 0.01). The results indicated that mass models based on minor diameter and on the first projected area were the most appropriate in the first and second groups, respectively. In the third group, the best model was obtained on the basis of the prolate spheroid volume. It was concluded that a suitable grading system for lime mass can be based on prolate spheroid volume.

  6. Simulation of parametric model towards the fixed covariate of right censored lung cancer data

    NASA Astrophysics Data System (ADS)

    Afiqah Muhamad Jamil, Siti; Asrul Affendi Abdullah, M.; Kek, Sie Long; Ridwan Olaniran, Oyebayo; Enera Amran, Syahila

    2017-09-01

    In this study, a simulation procedure was applied to measure the effect of a fixed covariate on right-censored data using a parametric survival model. The scale and shape parameters were varied to differentiate the analysis of the parametric regression survival model. Statistically, the biases, mean biases and coverage probabilities were used in this analysis. Different sample sizes (50, 100, 150 and 200) were employed to distinguish the impact of the parametric regression model on right-censored data. The R statistical software was used to develop the simulation code for right-censored data. The final right-censored simulation model was then compared with right-censored lung cancer data from Malaysia. It was found that varying the shape and scale parameters across different sample sizes helps to improve the simulation strategy for right-censored data, and that the Weibull regression survival model is a suitable fit for the survival data of lung cancer patients in Malaysia.
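
    A minimal version of the simulation loop, assuming the Weibull model: generate right-censored Weibull data and recover the shape and scale by maximum likelihood, using the censored log-likelihood (log-density for observed events, log-survival for censored observations).

    ```python
    # Sketch: simulate right-censored Weibull data and fit by censored MLE.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(5)
    shape, scale, n = 1.5, 10.0, 200
    t = scale * rng.weibull(shape, n)        # true event times
    c = rng.uniform(0, 25, n)                # censoring times
    obs = np.minimum(t, c)
    event = t <= c                           # False = right-censored

    def negloglik(params):
        k, lam = np.exp(params)              # enforce positivity
        z = obs / lam
        logpdf = np.log(k / lam) + (k - 1) * np.log(z) - z**k
        logsurv = -z**k                      # log S(t) for censored cases
        return -np.where(event, logpdf, logsurv).sum()

    fit = minimize(negloglik, x0=[0.0, 2.0])
    print(np.exp(fit.x))   # estimated (shape, scale); bias shrinks with n
    ```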

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Kunkun; Congedo, Pietro M.

    The Polynomial Dimensional Decomposition (PDD) is employed in this work for the global sensitivity analysis and uncertainty quantification (UQ) of stochastic systems subject to a moderate to large number of input random variables. Due to the intimate connection between the PDD and the Analysis of Variance (ANOVA) approaches, PDD is able to provide a simpler and more direct evaluation of the Sobol' sensitivity indices, when compared to the Polynomial Chaos expansion (PC). Unfortunately, the number of PDD terms grows exponentially with respect to the size of the input random vector, which makes the computational cost of standard methods unaffordable for real engineering applications. In order to address the problem of the curse of dimensionality, this work proposes essentially variance-based adaptive strategies aiming to build a cheap meta-model (i.e. surrogate model) by employing the sparse PDD approach with its coefficients computed by regression. Three levels of adaptivity are carried out in this paper: 1) the truncated dimensionality for ANOVA component functions, 2) the active dimension technique especially for second- and higher-order parameter interactions, and 3) the stepwise regression approach designed to retain only the most influential polynomials in the PDD expansion. During this adaptive procedure featuring stepwise regressions, the surrogate model representation keeps containing few terms, so that the cost to resolve repeatedly the linear systems of the least-squares regression problem is negligible. The size of the finally obtained sparse PDD representation is much smaller than the one of the full expansion, since only significant terms are eventually retained. Consequently, a much smaller number of calls to the deterministic model is required to compute the final PDD coefficients.

  8. ToxiM: A Toxicity Prediction Tool for Small Molecules Developed Using Machine Learning and Chemoinformatics Approaches.

    PubMed

    Sharma, Ashok K; Srivastava, Gopal N; Roy, Ankita; Sharma, Vineet K

    2017-01-01

    The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, different chemical and structural features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews correlation coefficient (0.84). The performances of all three models were comparable (Matthews correlation coefficient = 0.84-0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. The random forest based regression model for the prediction of solubility performed better (R² = 0.84) than the multi-linear regression (MLR) and partial least squares regression (PLSR) models, whereas the partial least squares based regression model for the prediction of permeability (Caco-2) performed better (R² = 0.68) than the random forest and MLR based regression models. The performance of the final classification and regression models was evaluated using two validation datasets including known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules.
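
    A sketch of the regression comparison on synthetic descriptors (not ToxiM's features): a random forest and a multiple linear regression fitted to the same nonlinear target and scored by held-out R².

    ```python
    # Sketch: random forest vs. multiple linear regression on held-out data.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(6)
    X = rng.normal(size=(500, 20))                 # synthetic descriptors
    y = X[:, 0] ** 2 + X[:, 1] + 0.3 * rng.normal(size=500)  # nonlinear target

    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    for model in (RandomForestRegressor(random_state=0), LinearRegression()):
        model.fit(Xtr, ytr)
        print(type(model).__name__, round(r2_score(yte, model.predict(Xte)), 2))
    ```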

  9. ToxiM: A Toxicity Prediction Tool for Small Molecules Developed Using Machine Learning and Chemoinformatics Approaches

    PubMed Central

    Sharma, Ashok K.; Srivastava, Gopal N.; Roy, Ankita; Sharma, Vineet K.

    2017-01-01

    The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, different chemical and structural features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews correlation coefficient (0.84). The performances of all three models were comparable (Matthews correlation coefficient = 0.84–0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. The random forest based regression model for the prediction of solubility performed better (R² = 0.84) than the multi-linear regression (MLR) and partial least squares regression (PLSR) models, whereas the partial least squares based regression model for the prediction of permeability (Caco-2) performed better (R² = 0.68) than the random forest and MLR based regression models. The performance of the final classification and regression models was evaluated using two validation datasets including known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules. PMID:29249969

  10. New robust statistical procedures for the polytomous logistic regression models.

    PubMed

    Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro

    2018-05-17

    This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real-life examples are presented to justify the requirement of suitable robust statistical procedures in place of likelihood-based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.

  11. Adaptive surrogate modeling by ANOVA and sparse polynomial dimensional decomposition for global sensitivity analysis in fluid simulation

    NASA Astrophysics Data System (ADS)

    Tang, Kunkun; Congedo, Pietro M.; Abgrall, Rémi

    2016-06-01

    The Polynomial Dimensional Decomposition (PDD) is employed in this work for the global sensitivity analysis and uncertainty quantification (UQ) of stochastic systems subject to a moderate to large number of input random variables. Due to the intimate connection between the PDD and the Analysis of Variance (ANOVA) approaches, PDD is able to provide a simpler and more direct evaluation of the Sobol' sensitivity indices, when compared to the Polynomial Chaos expansion (PC). Unfortunately, the number of PDD terms grows exponentially with respect to the size of the input random vector, which makes the computational cost of standard methods unaffordable for real engineering applications. In order to address the problem of the curse of dimensionality, this work proposes essentially variance-based adaptive strategies aiming to build a cheap meta-model (i.e. surrogate model) by employing the sparse PDD approach with its coefficients computed by regression. Three levels of adaptivity are carried out in this paper: 1) the truncated dimensionality for ANOVA component functions, 2) the active dimension technique especially for second- and higher-order parameter interactions, and 3) the stepwise regression approach designed to retain only the most influential polynomials in the PDD expansion. During this adaptive procedure featuring stepwise regressions, the surrogate model representation keeps containing few terms, so that the cost to resolve repeatedly the linear systems of the least-squares regression problem is negligible. The size of the finally obtained sparse PDD representation is much smaller than the one of the full expansion, since only significant terms are eventually retained. Consequently, a much smaller number of calls to the deterministic model is required to compute the final PDD coefficients.

  12. Modeling and managing risk early in software development

    NASA Technical Reports Server (NTRS)

    Briand, Lionel C.; Thomas, William M.; Hetmanski, Christopher J.

    1993-01-01

    In order to improve the quality of the software development process, we need to be able to build empirical multivariate models based on data collectable early in the software process. These models need to be both useful for prediction and easy to interpret, so that remedial actions may be taken in order to control and optimize the development process. We present an automated modeling technique which can be used as an alternative to regression techniques. We show how it can be used to facilitate the identification and aid the interpretation of the significant trends which characterize 'high risk' components in several Ada systems. Finally, we evaluate the effectiveness of our technique based on a comparison with logistic regression based models.

  13. Family and school environmental predictors of sleep bruxism in children.

    PubMed

    Rossi, Debora; Manfredini, Daniele

    2013-01-01

    To identify potential predictors of self-reported sleep bruxism (SB) within children's family and school environments. A total of 65 primary school children (55.4% males, mean age 9.3 ± 1.9 years) were administered a 10-item questionnaire investigating the prevalence of self-reported SB as well as nine family and school-related potential bruxism predictors. Regression analyses were performed to assess the correlation between the potential predictors and SB. A positive answer to the self-reported SB item was endorsed by 18.8% of subjects, with no sex differences. Multiple variable regression analysis identified a final model showing that having divorced parents and not falling asleep easily were the only two weak predictors of self-reported SB. The percentage of explained variance for SB by the final multiple regression model was 13.3% (Nagelkerke's R² = 0.133). While having a high specificity and a good negative predictive value, the model showed unacceptable sensitivity and positive predictive values. The resulting accuracy to predict the presence of self-reported SB was 73.8%. The present investigation suggested that, among family and school-related matters, having divorced parents and not falling asleep easily were two predictors, even if weak, of a child's self-report of SB.
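
    Nagelkerke's R², as reported above, is computed from the log-likelihoods of the fitted model and the intercept-only null: the Cox-Snell value 1 − exp(2(ℓ₀ − ℓ₁)/n) rescaled by its maximum 1 − exp(2ℓ₀/n). A sketch on synthetic data:

    ```python
    # Sketch: Nagelkerke's R-squared from fitted and null log-likelihoods.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    n = 300
    x = rng.integers(0, 2, (n, 2))                 # two binary predictors
    y = (rng.random(n) < 0.15 + 0.15 * x[:, 0]).astype(int)

    full = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
    null = sm.Logit(y, np.ones((n, 1))).fit(disp=0)
    cox_snell = 1 - np.exp(2 * (null.llf - full.llf) / n)
    nagelkerke = cox_snell / (1 - np.exp(2 * null.llf / n))
    print(round(nagelkerke, 3))
    ```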

  14. The effect of service satisfaction and spiritual well-being on the quality of life of patients with schizophrenia.

    PubMed

    Lanfredi, Mariangela; Candini, Valentina; Buizza, Chiara; Ferrari, Clarissa; Boero, Maria E; Giobbio, Gian M; Goldschmidt, Nicoletta; Greppo, Stefania; Iozzino, Laura; Maggi, Paolo; Melegari, Anna; Pasqualetti, Patrizio; Rossi, Giuseppe; de Girolamo, Giovanni

    2014-05-15

    Quality of life (QOL) has been considered an important outcome measure in psychiatric research, and determinants of QOL have been widely investigated. We aimed to detect predictors of QOL at baseline and to test the longitudinal interrelations of the baseline predictors with QOL scores at a 1-year follow-up in a sample of patients living in Residential Facilities (RFs). Logistic regression models were adopted to evaluate the association between WHOQoL-Bref scores and potential determinants of QOL. In addition, all variables significantly associated with QOL domains in the final logistic regression model were included in a structural equation model (SEM). We included 139 patients with a diagnosis of schizophrenia spectrum disorder. In the final logistic regression model, level of activity, social support, age, service satisfaction, spiritual well-being and symptom severity were identified as predictors of QOL scores at baseline. Longitudinal analyses carried out by SEM showed that 40% of QOL follow-up variability was explained by QOL at baseline, and significant indirect effects on QOL at follow-up were found for satisfaction with services and for social support. Rehabilitation plans for people with schizophrenia living in RFs should also consider mediators of change in subjective QOL, such as satisfaction with mental health services. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  15. Carbon emissions risk map from deforestation in the tropical Amazon

    NASA Astrophysics Data System (ADS)

    Ometto, J.; Soler, L. S.; Assis, T. D.; Oliveira, P. V.; Aguiar, A. P.

    2011-12-01

    This work aims to estimate the carbon emissions from tropical deforestation in the Brazilian Amazon associated with the risk assessment of future land use change. The emissions are estimated by incorporating temporal deforestation dynamics, accounting for the biophysical and socioeconomic heterogeneity of the region, as well as secondary forest growth dynamics in abandoned areas. The land cover change model that supported the risk assessment of deforestation was run based on linear regressions. This method takes into account the spatial heterogeneity of deforestation, as the spatial variables adopted to fit the final regression model comprise environmental aspects, economic attractiveness, accessibility and land tenure structure. After fitting a suitable regression model for each land cover category, the potential of each cell to be deforested (at 25x25 km and 5x5 km resolution) in the near future was used to calculate the risk assessment of land cover change. The carbon emissions model combines high-resolution new forest clear-cut mapping and four alternative sources of spatial information on biomass distribution for different vegetation types. The risk assessment map of CO2 emissions was obtained by crossing the simulation results of the historical land cover changes with a map of aboveground biomass contained in the remaining forest. This final map represents the risk of CO2 emissions at 25x25 km and 5x5 km resolution until 2020, under a scenario of carbon emission reduction targets.

  16. [Local Regression Algorithm Based on Net Analyte Signal and Its Application in Near Infrared Spectral Analysis].

    PubMed

    Zhang, Hong-guang; Lu, Jian-gang

    2016-02-01

    To overcome the problems of significant differences among samples and nonlinearity between the property and spectra of samples in spectral quantitative analysis, a local regression algorithm is proposed in this paper. In this algorithm, the net analyte signal (NAS) method was first used to obtain the net analyte signal of the calibration samples and unknown samples; then the Euclidean distance between the net analyte signal of each unknown sample and those of the calibration samples was calculated and used as a similarity index. According to the defined similarity index, a local calibration set was individually selected for each unknown sample. Finally, a local PLS regression model was built on each local calibration set for each unknown sample. The proposed method was applied to a set of near infrared spectra of meat samples. The results demonstrate that the prediction precision and model complexity of the proposed method are superior to those of the global PLS regression method and of a conventional local regression algorithm based on spectral Euclidean distance.
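
    A sketch of the local-regression loop on synthetic spectra. It uses plain Euclidean distance in spectral space as the similarity index; the paper's NAS projection, which would first strip the interferent subspace, is omitted here.

    ```python
    # Sketch: per-sample local calibration sets with a PLS model for each.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(8)
    X_cal = rng.normal(size=(100, 50))          # calibration "spectra"
    y_cal = X_cal[:, :5].sum(axis=1) + 0.1 * rng.normal(size=100)
    X_new = rng.normal(size=(10, 50))           # unknown samples

    k = 30                                      # local calibration set size
    preds = []
    for x in X_new:
        d = np.linalg.norm(X_cal - x, axis=1)   # similarity index
        idx = np.argsort(d)[:k]                 # k most similar samples
        pls = PLSRegression(n_components=5).fit(X_cal[idx], y_cal[idx])
        preds.append(pls.predict(x[None, :])[0, 0])
    print(np.round(preds, 2))
    ```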

  17. Fatigue design of a cellular phone folder using regression model-based multi-objective optimization

    NASA Astrophysics Data System (ADS)

    Kim, Young Gyun; Lee, Jongsoo

    2016-08-01

    In a folding cellular phone, the folding device is repeatedly opened and closed by the user, which eventually results in fatigue damage, particularly to the front of the folder. Hence, it is important to improve the safety and endurance of the folder while also reducing its weight. This article presents an optimal design for the folder front that maximizes its fatigue endurance while minimizing its thickness. Design data for analysis and optimization were obtained experimentally using a test jig. Multi-objective optimization was carried out using a nonlinear regression model. Three regression methods were employed: back-propagation neural networks, logistic regression and support vector machines. The AdaBoost ensemble technique was also used to improve the approximation. Two-objective Pareto-optimal solutions were identified using the non-dominated sorting genetic algorithm (NSGA-II). Finally, a numerically optimized solution was validated against experimental product data, in terms of both fatigue endurance and thickness index.

  18. Schistosomiasis Breeding Environment Situation Analysis in Dongting Lake Area

    NASA Astrophysics Data System (ADS)

    Li, Chuanrong; Jia, Yuanyuan; Ma, Lingling; Liu, Zhaoyan; Qian, Yonggang

    2013-01-01

    Monitoring the environmental characteristics, such as vegetation and soil moisture, of the spatial/temporal distribution of Oncomelania hupensis (O. hupensis) is of vital importance to schistosomiasis prevention and control. In this study, the relationship between environmental factors derived from remotely sensed data and the density of O. hupensis was first analyzed by a multiple linear regression model. Secondly, spatial analysis of the regression residuals was investigated by the semi-variogram method. Thirdly, the spatial analysis of the regression residuals and the multiple linear regression model were both employed to estimate the spatial variation of O. hupensis density. Finally, the approach was used to monitor and predict the spatial and temporal variations of oncomelania in the Dongting Lake region, China. The areas of potential O. hupensis habitats were predicted and the influence of the Three Gorges Dam (TGD) project on the density of O. hupensis was analyzed.

  19. Discriminative least squares regression for multiclass classification and feature selection.

    PubMed

    Xiang, Shiming; Nie, Feiping; Meng, Gaofeng; Pan, Chunhong; Zhang, Changshui

    2012-11-01

    This paper presents a framework of discriminative least squares regression (LSR) for multiclass classification and feature selection. The core idea is to enlarge the distance between different classes under the conceptual framework of LSR. First, a technique called ε-dragging is introduced to force the regression targets of different classes moving along opposite directions such that the distances between classes can be enlarged. Then, the ε-draggings are integrated into the LSR model for multiclass classification. Our learning framework, referred to as discriminative LSR, has a compact model form, where there is no need to train two-class machines that are independent of each other. With its compact form, this model can be naturally extended for feature selection. This goal is achieved in terms of L2,1 norm of matrix, generating a sparse learning model for feature selection. The model for multiclass classification and its extension for feature selection are finally solved elegantly and efficiently. Experimental evaluation over a range of benchmark datasets indicates the validity of our method.

  20. POWER PRIOR DISTRIBUTIONS FOR REGRESSION MODELS. (R824757)

    EPA Science Inventory

    The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Concl...

  1. FIRE: an SPSS program for variable selection in multiple linear regression analysis via the relative importance of predictors.

    PubMed

    Lorenzo-Seva, Urbano; Ferrando, Pere J

    2011-03-01

    We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the regression equation, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.

  2. Bootstrap Prediction Intervals in Non-Parametric Regression with Applications to Anomaly Detection

    NASA Technical Reports Server (NTRS)

    Kumar, Sricharan; Srivastava, Ashok N.

    2012-01-01

    Prediction intervals provide a measure of the probable interval in which the outputs of a regression model can be expected to occur. Subsequently, these prediction intervals can be used to determine if the observed output is anomalous or not, conditioned on the input. In this paper, a procedure for determining prediction intervals for outputs of nonparametric regression models using bootstrap methods is proposed. Bootstrap methods allow for a non-parametric approach to computing prediction intervals with no specific assumptions about the sampling distribution of the noise or the data. The asymptotic fidelity of the proposed prediction intervals is theoretically proved. Subsequently, the validity of the bootstrap based prediction intervals is illustrated via simulations. Finally, the bootstrap prediction intervals are applied to the problem of anomaly detection on aviation data.
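
    One simple instance of the idea, under stated assumptions (i.i.d. additive noise, with a k-NN regressor standing in for the nonparametric model): resample residuals, refit, and take percentile bands as prediction intervals; observations outside the band are flagged as anomalous.

    ```python
    # Sketch: residual-bootstrap prediction intervals around a k-NN regression.
    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    rng = np.random.default_rng(9)
    x = np.sort(rng.uniform(0, 10, 200))[:, None]
    y = np.sin(x[:, 0]) + 0.2 * rng.normal(size=200)

    model = KNeighborsRegressor(n_neighbors=15).fit(x, y)
    resid = y - model.predict(x)

    B, grid = 500, np.linspace(0, 10, 50)[:, None]
    boot_preds = np.empty((B, len(grid)))
    for b in range(B):
        idx = rng.integers(0, len(x), len(x))            # resample inputs
        yb = model.predict(x[idx]) + rng.choice(resid, len(x))
        mb = KNeighborsRegressor(n_neighbors=15).fit(x[idx], yb)
        boot_preds[b] = mb.predict(grid) + rng.choice(resid, len(grid))
    lo, hi = np.percentile(boot_preds, [2.5, 97.5], axis=0)
    print(lo[:3], hi[:3])   # flag observations outside [lo, hi] as anomalies
    ```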

  3. Deconvolution single shot multibox detector for supermarket commodity detection and classification

    NASA Astrophysics Data System (ADS)

    Li, Dejian; Li, Jian; Nie, Binling; Sun, Shouqian

    2017-07-01

    This paper proposes an image detection model to detect and classify commodities on supermarket shelves. Based on the principle that the quality of the features directly affects the accuracy of the final classification, feature maps are constructed to combine high-level features with low-level features. Fixed anchors are then set on those feature maps, and finally the label and position of each commodity are generated by box regression and classification. In this work, we propose a model named Deconvolution Single Shot MultiBox Detector and evaluate it using 300 images photographed from real supermarket shelves. Following the same protocol as other recent methods, the results showed that our model outperformed the baseline methods.

  4. The Effect of Attending Tutoring on Course Grades in Calculus I

    ERIC Educational Resources Information Center

    Rickard, Brian; Mills, Melissa

    2018-01-01

    Tutoring centres are common in universities in the United States, but there are few published studies that statistically examine the effects of tutoring on student success. This study utilizes multiple regression analysis to model the effect of tutoring attendance on final course grades in Calculus I. Our model predicted that every three visits to…

  5. The Multivariate Regression Statistics Strategy to Investigate Content-Effect Correlation of Multiple Components in Traditional Chinese Medicine Based on a Partial Least Squares Method.

    PubMed

    Peng, Ying; Li, Su-Ning; Pei, Xuexue; Hao, Kun

    2018-03-01

    A multivariate regression statistics strategy was developed to clarify the content-effect correlation of multiple components of Panax ginseng saponins extract and to predict the pharmacological effect from component content. In example 1, we first compared pharmacological effects between Panax ginseng saponins extract and individual saponin combinations. Secondly, we examined the anti-platelet aggregation effect in seven different saponin combinations of ginsenoside Rb1, Rg1, Rh, Rd, Ra3 and notoginsenoside R1. Finally, the correlation between anti-platelet aggregation and the content of multiple components was analyzed by a partial least squares algorithm. In example 2, 18 common peaks were first identified in ten batches of Panax ginseng saponins extracts from different origins. Then, we investigated the anti-myocardial ischemia reperfusion injury effects of the ten extracts. Finally, the correlation between the fingerprints and the cardioprotective effects was analyzed by a partial least squares algorithm. In both examples, the relationship between component content and pharmacological effect was modeled well by the partial least squares regression equations. Importantly, the predicted effect curve was close to the observed data points plotted on the partial least squares regression model. This study provides evidence that multi-component content is promising information for predicting the pharmacological effects of traditional Chinese medicine.

  6. A framework for longitudinal data analysis via shape regression

    NASA Astrophysics Data System (ADS)

    Fishbaugh, James; Durrleman, Stanley; Piven, Joseph; Gerig, Guido

    2012-02-01

    Traditional longitudinal analysis begins by extracting desired clinical measurements, such as volume or head circumference, from discrete imaging data. Typically, the continuous evolution of a scalar measurement is estimated by choosing a 1D regression model, such as kernel regression or fitting a polynomial of fixed degree. This type of analysis not only leads to separate models for each measurement, but there is no clear anatomical or biological interpretation to aid in the selection of the appropriate paradigm. In this paper, we propose a consistent framework for the analysis of longitudinal data by estimating the continuous evolution of shape over time as twice differentiable flows of deformations. In contrast to 1D regression models, one model is chosen to realistically capture the growth of anatomical structures. From the continuous evolution of shape, we can simply extract any clinical measurements of interest. We demonstrate on real anatomical surfaces that volume extracted from a continuous shape evolution is consistent with a 1D regression performed on the discrete measurements. We further show how the visualization of shape progression can aid in the search for significant measurements. Finally, we present an example on a shape complex of the brain (left hemisphere, right hemisphere, cerebellum) that demonstrates a potential clinical application for our framework.

  7. Estimating Soil Cation Exchange Capacity from Soil Physical and Chemical Properties

    NASA Astrophysics Data System (ADS)

    Bateni, S. M.; Emamgholizadeh, S.; Shahsavani, D.

    2014-12-01

    The soil Cation Exchange Capacity (CEC) is an important soil characteristic that has many applications in soil science and environmental studies. For example, CEC influences soil fertility by controlling the exchange of ions in the soil. Measurement of CEC is costly and difficult. Consequently, several studies attempted to obtain CEC from readily measurable soil physical and chemical properties such as soil pH, organic matter, soil texture, bulk density, and particle size distribution. These studies have often used multiple regression or artificial neural network models. Regression-based models cannot capture the intricate relationship between CEC and soil physical and chemical attributes and provide inaccurate CEC estimates. Although neural network models perform better than regression methods, they act like a black box and cannot generate an explicit expression for retrieval of CEC from soil properties. In a departure from regression and neural network models, this study uses Genetic Expression Programming (GEP) and Multivariate Adaptive Regression Splines (MARS) to estimate CEC from easily measurable soil variables such as clay, pH, and OM. CEC estimates from GEP and MARS are compared with measurements at two field sites in Iran. Results show that GEP and MARS can estimate CEC accurately. Also, the MARS model performs slightly better than GEP. Finally, a sensitivity test indicates that organic matter and pH have respectively the least and the most significant impact on CEC.

  8. The effect of attending tutoring on course grades in Calculus I

    NASA Astrophysics Data System (ADS)

    Rickard, Brian; Mills, Melissa

    2018-04-01

    Tutoring centres are common in universities in the United States, but there are few published studies that statistically examine the effects of tutoring on student success. This study utilizes multiple regression analysis to model the effect of tutoring attendance on final course grades in Calculus I. Our model predicted that every three visits to the tutoring centre were associated with a one per cent increase in a student's course grade, after controlling for prior academic ability. We also found that for lower-achieving students, attending tutoring had a greater impact on final grades.
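
    A sketch of this kind of regression on synthetic grades; "act" is an invented stand-in for the prior-ability control, and the coefficients are made up to mimic the reported effect size.

    ```python
    # Sketch: grade on tutoring visits, controlling for prior ability.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(10)
    n = 400
    df = pd.DataFrame({"visits": rng.poisson(3, n),
                       "act": rng.normal(24, 3, n)})   # prior-ability control
    df["grade"] = 50 + df["visits"] / 3 + 1.5 * df["act"] + rng.normal(0, 8, n)

    fit = smf.ols("grade ~ visits + act", data=df).fit()
    print(fit.params["visits"] * 3)   # grade change per three visits, ~1 point
    ```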

  9. Estimating severity of sideways fall using a generic multi linear regression model based on kinematic input variables.

    PubMed

    van der Zijden, A M; Groen, B E; Tanck, E; Nienhuis, B; Verdonschot, N; Weerdesteyn, V

    2017-03-21

    Many research groups have studied fall impact mechanics to understand how fall severity can be reduced to prevent hip fractures. Yet, direct impact force measurements with force plates are restricted to a very limited repertoire of experimental falls. The purpose of this study was to develop a generic model for estimating hip impact forces (i.e. fall severity) in sideways falls in vivo without the use of force plates. Twelve experienced judokas performed sideways Martial Arts (MA) and Block ('natural') falls on a force plate, both with and without a mat on top. Data were analyzed to determine the hip impact force and to derive 11 selected (subject-specific and kinematic) variables. Falls from kneeling height were used to perform a stepwise regression procedure to assess the effects of these input variables and build the model. The final model includes four input variables, involving one subject-specific measure and three kinematic variables: maximum upper body deceleration, body mass, shoulder angle at the instant of 'maximum impact' and maximum hip deceleration. The results showed that estimated and measured hip impact forces were linearly related (explained variances ranging from 46 to 63%). Hip impact forces of MA falls onto the mat from a standing position (3650 ± 916 N) estimated by the final model were comparable with measured values (3698 ± 689 N), even though these data were not used for training the model. In conclusion, a generic linear regression model was developed that enables the assessment of fall severity through kinematic measures of sideways falls, without using force plates. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. A computational approach to compare regression modelling strategies in prediction research.

    PubMed

    Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H

    2016-08-25

    It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9 to 94.9%, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.
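
    A minimal comparison in the same spirit: two strategies (maximum likelihood vs. ridge shrinkage, toggled via the regularization constant) scored on an external split with the Brier score. Data and strategies here are illustrative, not the paper's.

    ```python
    # Sketch: scoring two modelling strategies on an external set (Brier score).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import brier_score_loss

    rng = np.random.default_rng(11)
    X = rng.normal(size=(400, 10))
    y = (rng.random(400) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
    Xd, Xe, yd, ye = train_test_split(X, y, random_state=0)  # dev / external

    for name, C in (("maximum likelihood", 1e6), ("ridge shrinkage", 0.5)):
        p = LogisticRegression(C=C).fit(Xd, yd).predict_proba(Xe)[:, 1]
        print(name, round(brier_score_loss(ye, p), 3))
    ```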

  11. A kinetic model of municipal sludge degradation during non-catalytic wet oxidation.

    PubMed

    Prince-Pike, Arrian; Wilson, David I; Baroutian, Saeid; Andrews, John; Gapes, Daniel J

    2015-12-15

    Wet oxidation is a successful process for the treatment of municipal sludge. In addition, the resulting effluent from wet oxidation is a useful carbon source for subsequent biological nutrient removal processes in wastewater treatment. Owing to limitations with current kinetic models, this study produced a kinetic model which predicts the concentrations of key intermediate components during wet oxidation. The model was regressed from lab-scale experiments and then subsequently validated using data from a wet oxidation pilot plant. The model was shown to be accurate in predicting the concentrations of each component, and produced good results when applied to a plant 500 times larger in size. A statistical study was undertaken to investigate the validity of the regressed model parameters. Finally the usefulness of the model was demonstrated by suggesting optimum operating conditions such that volatile fatty acids were maximised. Copyright © 2015 Elsevier Ltd. All rights reserved.

  12. A hybrid sales forecasting scheme by combining independent component analysis with K-means clustering and support vector regression.

    PubMed

    Lu, Chi-Jie; Chang, Chi-Chang

    2014-01-01

    Sales forecasting plays an important role in operating a business since it can be used to determine the required inventory level to meet consumer demand and avoid the problem of under/overstocking. Improving the accuracy of sales forecasting has become an important issue of operating a business. This study proposes a hybrid sales forecasting scheme by combining independent component analysis (ICA) with K-means clustering and support vector regression (SVR). The proposed scheme first uses the ICA to extract hidden information from the observed sales data. The extracted features are then applied to the K-means algorithm for clustering the sales data into several disjoint clusters. Finally, SVR forecasting models are applied to each cluster to generate the final forecasting results. Experimental results from information technology (IT) product agent sales data reveal that the proposed sales forecasting scheme outperforms the three comparison models and hence provides an efficient alternative for sales forecasting.
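
    The three-stage scheme, sketched with scikit-learn stand-ins (FastICA, KMeans, SVR) on synthetic sales-like features; cluster counts and component numbers are arbitrary choices, not the paper's settings.

    ```python
    # Sketch: ICA feature extraction -> K-means clustering -> per-cluster SVR.
    import numpy as np
    from sklearn.decomposition import FastICA
    from sklearn.cluster import KMeans
    from sklearn.svm import SVR

    rng = np.random.default_rng(12)
    X = rng.normal(size=(300, 8))                  # observed sales features
    y = 2 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=300)

    S = FastICA(n_components=4, random_state=0).fit_transform(X)          # 1
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(S)  # 2
    models = {c: SVR().fit(X[labels == c], y[labels == c]) for c in range(3)}  # 3
    print({c: round(m.score(X[labels == c], y[labels == c]), 2)
           for c, m in models.items()})
    ```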

  13. A Hybrid Sales Forecasting Scheme by Combining Independent Component Analysis with K-Means Clustering and Support Vector Regression

    PubMed Central

    2014-01-01

    Sales forecasting plays an important role in operating a business since it can be used to determine the required inventory level to meet consumer demand and avoid the problem of under/overstocking. Improving the accuracy of sales forecasting has become an important issue of operating a business. This study proposes a hybrid sales forecasting scheme by combining independent component analysis (ICA) with K-means clustering and support vector regression (SVR). The proposed scheme first uses the ICA to extract hidden information from the observed sales data. The extracted features are then applied to the K-means algorithm for clustering the sales data into several disjoint clusters. Finally, SVR forecasting models are applied to each cluster to generate the final forecasting results. Experimental results from information technology (IT) product agent sales data reveal that the proposed sales forecasting scheme outperforms the three comparison models and hence provides an efficient alternative for sales forecasting. PMID:25045738

  14. Analytic model for the long-term evolution of circular Earth satellite orbits including lunar node regression

    NASA Astrophysics Data System (ADS)

    Zhu, Ting-Lei; Zhao, Chang-Yin; Zhang, Ming-Jiang

    2017-04-01

    This paper aims to obtain an analytic approximation to the evolution of circular orbits governed by the Earth's J2 and luni-solar gravitational perturbations. Assuming that the lunar orbital plane coincides with the ecliptic plane, Allan and Cook (Proc. R. Soc. A, Math. Phys. Eng. Sci. 280(1380):97, 1964) derived an analytic solution for the orbital plane evolution of circular orbits. Using their result as an intermediate solution, we establish an approximate analytic model with the lunar orbital inclination and its node regression taken into account. Finally, an approximate analytic expression is derived, which is accurate compared to the numerical results except for the resonant cases, when the period of the reference orbit approximately equals integer multiples (especially 1 or 2 times) of the lunar node regression period.

  15. Integration of logistic regression, Markov chain and cellular automata models to simulate urban expansion

    NASA Astrophysics Data System (ADS)

    Jokar Arsanjani, Jamal; Helbich, Marco; Kainz, Wolfgang; Darvishi Boloorani, Ali

    2013-04-01

    This research analyses the suburban expansion in the metropolitan area of Tehran, Iran. A hybrid model consisting of a logistic regression model, Markov chain (MC), and cellular automata (CA) was designed to improve the performance of the standard logistic regression model. Environmental and socio-economic variables dealing with urban sprawl were operationalised to create a probability surface of spatiotemporal states of built-up land use for the years 2006, 2016, and 2026. For validation, the model was evaluated by means of relative operating characteristic values for different sets of variables. The approach was calibrated for 2006 by cross-comparison of actual and simulated land use maps. The achieved outcomes represent a match of 89% between simulated and actual maps of 2006, which was satisfactory to confirm the calibration process. Thereafter, the calibrated hybrid approach was implemented for forthcoming years. Finally, future land use maps for 2016 and 2026 were predicted by means of this hybrid approach. The simulated maps illustrate a new wave of suburban development in the vicinity of Tehran at the western border of the metropolis during the next decades.

  16. Locally-constrained Boundary Regression for Segmentation of Prostate and Rectum in the Planning CT Images

    PubMed Central

    Shao, Yeqin; Gao, Yaozong; Wang, Qian; Yang, Xin; Shen, Dinggang

    2015-01-01

    Automatic and accurate segmentation of the prostate and rectum in planning CT images is a challenging task due to low image contrast, unpredictable organ (relative) position, and uncertain existence of bowel gas across different patients. Recently, regression forest was adopted for organ deformable segmentation on 2D medical images by training one landmark detector for each point on the shape model. However, it seems impractical for regression forest to guide 3D deformable segmentation as a landmark detector, due to the large number of vertices in the 3D shape model as well as the difficulty in building accurate 3D vertex correspondence for each landmark detector. In this paper, we propose a novel boundary detection method by exploiting the power of regression forest for prostate and rectum segmentation. The contributions of this paper are as follows: 1) we introduce regression forest as a local boundary regressor to vote the entire boundary of a target organ, which avoids training a large number of landmark detectors and building an accurate 3D vertex correspondence for each landmark detector; 2) an auto-context model is integrated with regression forest to improve the accuracy of the boundary regression; 3) we further combine a deformable segmentation method with the proposed local boundary regressor for the final organ segmentation by integrating organ shape priors. Our method is evaluated on a planning CT image dataset with 70 images from 70 different patients. The experimental results show that our proposed boundary regression method outperforms the conventional boundary classification method in guiding the deformable model for prostate and rectum segmentation. Compared with other state-of-the-art methods, our method also shows a competitive performance. PMID:26439938

  17. Methods for estimating annual exceedance probability discharges for streams in Arkansas, based on data through water year 2013

    USGS Publications Warehouse

    Wagner, Daniel M.; Krieger, Joshua D.; Veilleux, Andrea G.

    2016-08-04

    In 2013, the U.S. Geological Survey initiated a study to update regional skew, annual exceedance probability discharges, and regional regression equations used to estimate annual exceedance probability discharges for ungaged locations on streams in the study area with the use of recent geospatial data, new analytical methods, and available annual peak-discharge data through the 2013 water year. An analysis of regional skew using Bayesian weighted least-squares/Bayesian generalized-least squares regression was performed for Arkansas, Louisiana, and parts of Missouri and Oklahoma. The newly developed constant regional skew of -0.17 was used in the computation of annual exceedance probability discharges for 281 streamgages used in the regional regression analysis. Based on analysis of covariance, four flood regions were identified for use in the generation of regional regression models. Thirty-nine basin characteristics were considered as potential explanatory variables, and ordinary least-squares regression techniques were used to determine the optimum combinations of basin characteristics for each of the four regions. Basin characteristics in candidate models were evaluated based on multicollinearity with other basin characteristics (variance inflation factor < 2.5) and statistical significance at the 95-percent confidence level (p ≤ 0.05). Generalized least-squares regression was used to develop the final regression models for each flood region. Average standard errors of prediction of the generalized least-squares models ranged from 32.76 to 59.53 percent, with the largest range in flood region D. Pseudo coefficients of determination of the generalized least-squares models ranged from 90.29 to 97.28 percent, with the largest range also in flood region D. The regional regression equations apply only to locations on streams in Arkansas where annual peak discharges are not substantially affected by regulation, diversion, channelization, backwater, or urbanization. The applicability and accuracy of the regional regression equations depend on the basin characteristics measured for an ungaged location on a stream being within range of those used to develop the equations.
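
    The multicollinearity screen described above can be sketched directly: compute the variance inflation factor for each candidate basin characteristic and drop those at or above the 2.5 threshold. The basin characteristics below are invented, not the study's 39 candidates.

    ```python
    # Sketch: VIF screening of candidate explanatory variables (threshold 2.5).
    import numpy as np
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(13)
    area = rng.lognormal(3, 1, 100)
    slope = rng.normal(5, 1, 100)
    length = 0.7 * area + rng.normal(0, 5, 100)    # collinear with area
    X = np.column_stack([np.ones(100), area, slope, length])

    for i, name in enumerate(["const", "area", "slope", "length"]):
        print(name, round(variance_inflation_factor(X, i), 1))
    ```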

  18. Post-processing through linear regression

    NASA Astrophysics Data System (ADS)

    van Schaeybroeck, B.; Vannitsem, S.

    2011-03-01

    Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast, and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the Lorenz 63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. Contrary to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times, the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread should be preferred.
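
    A minimal sketch of the contrast between OLS post-processing and Tikhonov-regularized regression, using scikit-learn's Ridge as a generic stand-in for the time-dependent TDTR scheme; the forecast data are synthetic and the regularization weight alpha is arbitrary.

      import numpy as np
      from sklearn.linear_model import LinearRegression, Ridge

      rng = np.random.default_rng(2)
      n = 300

      # Synthetic forecast/observation pairs: correlated predictors (raw
      # forecast, its square, an ensemble-spread proxy) and verifying obs.
      truth = rng.normal(size=n)
      forecast = truth + rng.normal(scale=0.5, size=n)   # biased, noisy forecast
      spread = np.abs(rng.normal(scale=0.3, size=n))     # ensemble spread proxy
      X = np.column_stack([forecast, forecast**2, spread])

      # OLS post-processing: minimizes MSE but can damp forecast variability.
      ols = LinearRegression().fit(X, truth)

      # Tikhonov (ridge) regularization shrinks coefficients, which helps when
      # predictors are collinear; alpha would be tuned, possibly per lead time.
      ridge = Ridge(alpha=1.0).fit(X, truth)

      print("OLS corrected variance:  ", round(float(ols.predict(X).var()), 3))
      print("Ridge corrected variance:", round(float(ridge.predict(X).var()), 3))
      print("Observed variance:       ", round(float(truth.var()), 3))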

  19. Logistic regression for risk factor modelling in stuttering research.

    PubMed

    Reed, Phil; Wu, Yaqionq

    2013-06-01

    To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed is demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.
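
    A minimal illustration of the kind of analysis described, using statsmodels on hypothetical data: persistence versus recovery is regressed on two invented risk factors, and exponentiated coefficients are read as odds ratios.

      import numpy as np
      import pandas as pd
      import statsmodels.api as sm

      rng = np.random.default_rng(3)
      n = 200

      # Hypothetical risk-factor data: predict persistence (1) vs recovery (0)
      # from family history and age at onset; names and effects are invented.
      df = pd.DataFrame({
          "family_history": rng.integers(0, 2, n),
          "onset_age": rng.normal(4.0, 1.0, n),
      })
      logit_p = -2.0 + 1.2 * df["family_history"] + 0.3 * df["onset_age"]
      df["persistent"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

      X = sm.add_constant(df[["family_history", "onset_age"]])
      model = sm.Logit(df["persistent"], X).fit(disp=False)

      # Exponentiated coefficients are odds ratios, the usual effect
      # measure in risk-factor modelling.
      print(np.exp(model.params).round(2))
      print(model.summary2().tables[1].round(3))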

  20. Household water treatment in developing countries: comparing different intervention types using meta-regression.

    PubMed

    Hunter, Paul R

    2009-12-01

    Household water treatment (HWT) is being widely promoted as an appropriate intervention for reducing the burden of waterborne disease in poor communities in developing countries. A recent study has raised concerns about the effectiveness of HWT, in part because of concerns over the lack of blinding and in part because of considerable heterogeneity in the reported effectiveness of randomized controlled trials. This study set out to investigate the causes of this heterogeneity and so identify factors associated with good health gains. Studies identified in an earlier systematic review and meta-analysis were supplemented with more recently published randomized controlled trials. A total of 28 separate studies of randomized controlled trials of HWT with 39 intervention arms were included in the analysis. Heterogeneity was studied using the "metareg" command in Stata. Initial analyses with single candidate predictors were undertaken, and all variables significant at the P < 0.2 level were included in a final regression model. Further analyses were done to estimate the effect of the interventions over time by Monte Carlo modeling using @Risk and the parameter estimates from the final regression model. The overall effect size of all unblinded studies was relative risk = 0.56 (95% confidence intervals 0.51-0.63), but after adjusting for bias due to lack of blinding the effect size was much lower (RR = 0.85, 95% CI = 0.76-0.97). Four main variables were significant predictors of effectiveness of intervention in a multipredictor meta-regression model: log duration of study follow-up (regression coefficient of log effect size = 0.186, standard error (SE) = 0.072), whether or not the study was blinded (coefficient 0.251, SE 0.066), and being conducted in an emergency setting (coefficient -0.351, SE 0.076) were all significant predictors of effect size in the final model. Compared to the ceramic filter, all other interventions were much less effective (Biosand 0.247, 0.073; chlorine and safe waste storage 0.295, 0.061; combined coagulant-chlorine 0.2349, 0.067; SODIS 0.302, 0.068). A Monte Carlo model predicted that over 12 months ceramic filters were likely to be still effective at reducing disease, whereas SODIS, chlorination, and coagulation-chlorination had little if any benefit. Indeed, these three interventions are predicted to have the same or less effect than would be expected purely from reporting bias in unblinded studies. With the currently available evidence, ceramic filters are the most effective form of HWT in the long term; disinfection-only interventions, including SODIS, appear to have poor if any long-term public health benefit.
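
    For readers without Stata, the fixed-effect core of a meta-regression like the "metareg" analysis above can be sketched as inverse-variance weighted least squares of log effect sizes on study-level covariates. The data below are synthetic, and the between-study variance component that metareg also estimates is omitted.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(4)
      k = 39  # intervention arms in the review

      # Synthetic study-level data: log relative risk, its variance, and two
      # candidate moderators (blinding indicator, log follow-up duration).
      blinded = rng.integers(0, 2, k)
      log_followup = rng.normal(2.0, 0.5, k)
      var_lnrr = rng.uniform(0.01, 0.05, k)
      lnrr = (-0.58 + 0.25 * blinded + 0.19 * log_followup
              + rng.normal(0, np.sqrt(var_lnrr)))

      # Fixed-effect meta-regression: WLS with inverse-variance weights.
      # (metareg additionally estimates between-study variance tau^2 and
      # weights by 1 / (v_i + tau^2); that refinement is not shown here.)
      X = sm.add_constant(np.column_stack([blinded, log_followup]))
      fit = sm.WLS(lnrr, X, weights=1.0 / var_lnrr).fit()
      print(fit.params.round(3), fit.pvalues.round(3))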

  1. Individualized Prediction of Heat Stress in Firefighters: A Data-Driven Approach Using Classification and Regression Trees.

    PubMed

    Mani, Ashutosh; Rao, Marepalli; James, Kelley; Bhattacharya, Amit

    2015-01-01

    The purpose of this study was to explore data-driven models, based on decision trees, to develop practical and easy to use predictive models for early identification of firefighters who are likely to cross the threshold of hyperthermia during live-fire training. Predictive models were created for three consecutive live-fire training scenarios. The final predicted outcome was a categorical variable: will a firefighter cross the upper threshold of hyperthermia - Yes/No. Two tiers of models were built, one with and one without taking into account the outcome (whether a firefighter crossed hyperthermia or not) from the previous training scenario. First tier of models included age, baseline heart rate and core body temperature, body mass index, and duration of training scenario as predictors. The second tier of models included the outcome of the previous scenario in the prediction space, in addition to all the predictors from the first tier of models. Classification and regression trees were used independently for prediction. The response variable for the regression tree was the quantitative variable: core body temperature at the end of each scenario. The predicted quantitative variable from regression trees was compared to the upper threshold of hyperthermia (38°C) to predict whether a firefighter would enter hyperthermia. The performance of classification and regression tree models was satisfactory for the second (success rate = 79%) and third (success rate = 89%) training scenarios but not for the first (success rate = 43%). Data-driven models based on decision trees can be a useful tool for predicting physiological response without modeling the underlying physiological systems. Early prediction of heat stress coupled with proactive interventions, such as pre-cooling, can help reduce heat stress in firefighters.
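
    A minimal sketch of the regression-tree arm of the approach using scikit-learn: a tree predicts end-of-scenario core temperature from first-tier predictors, and the 38°C threshold converts the prediction into the Yes/No hyperthermia call. The data and value ranges below are invented.

      import numpy as np
      from sklearn.tree import DecisionTreeRegressor

      rng = np.random.default_rng(5)
      n = 100

      # Synthetic firefighter data mirroring the first-tier predictors:
      # age, baseline heart rate, baseline core temperature, BMI, duration.
      X = np.column_stack([
          rng.uniform(20, 55, n),      # age (years)
          rng.uniform(60, 100, n),     # baseline heart rate (bpm)
          rng.uniform(36.5, 37.5, n),  # baseline core temperature (deg C)
          rng.uniform(20, 35, n),      # body mass index
          rng.uniform(10, 30, n),      # scenario duration (min)
      ])
      core_end = X[:, 2] + 0.03 * X[:, 4] + rng.normal(0, 0.2, n)

      # Regression tree predicts end-of-scenario core temperature; comparing
      # the prediction to 38 deg C yields the binary hyperthermia call.
      tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, core_end)
      hyperthermia = tree.predict(X) >= 38.0
      print(f"predicted hyperthermia rate: {hyperthermia.mean():.2f}")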

  2. Quantile Regression Models for Current Status Data

    PubMed Central

    Ou, Fang-Shu; Zeng, Donglin; Cai, Jianwen

    2016-01-01

    Current status data arise frequently in demography, epidemiology, and econometrics where the exact failure time cannot be determined but is only known to have occurred before or after a known observation time. We propose a quantile regression model to analyze current status data, because it does not require distributional assumptions and the coefficients can be interpreted as direct regression effects on the distribution of failure time in the original time scale. Our model assumes that the conditional quantile of failure time is a linear function of covariates. We assume conditional independence between the failure time and observation time. An M-estimator, computed using the concave-convex procedure, is developed for parameter estimation, and its confidence intervals are constructed using a subsampling method. Asymptotic properties for the estimator are derived and proven using modern empirical process theory. The small sample performance of the proposed method is demonstrated via simulation studies. Finally, we apply the proposed method to analyze data from the Mayo Clinic Study of Aging. PMID:27994307

  3. Determination of osteoporosis risk factors using a multiple logistic regression model in postmenopausal Turkish women.

    PubMed

    Akkus, Zeki; Camdeviren, Handan; Celik, Fatma; Gur, Ali; Nas, Kemal

    2005-09-01

    To determine the risk factors of osteoporosis using a multiple binary logistic regression method and to assess the risk variables for osteoporosis, which is a major and growing health problem in many countries. We present a case-control study, consisting of 126 postmenopausal healthy women as the control group and 225 postmenopausal osteoporotic women as the case group. The study was carried out in the Department of Physical Medicine and Rehabilitation, Dicle University, Diyarbakir, Turkey between 1999-2002. The data from the 351 participants were collected using a standard questionnaire that contains 43 variables. A multiple logistic regression model was then used to evaluate the data and to find the best regression model. We classified 80.1% (281/351) of the participants using the regression model. Furthermore, the specificity value of the model was 67% (84/126) of the control group while the sensitivity value was 88% (197/225) of the case group. We found the distribution of residual values standardized for the final model to be exponential using the Kolmogorov-Smirnov test (p=0.193). The receiver operating characteristic curve was found successful in predicting patients with risk for osteoporosis. This study suggests that low levels of dietary calcium intake, physical activity, and education, and longer duration of menopause are independent predictors of the risk of low bone density in our population. Adequate dietary calcium intake in combination with maintaining daily physical activity, increasing educational level, decreasing birth rate, and duration of breast-feeding may contribute to healthy bones and play a role in the practical prevention of osteoporosis in Southeast Anatolia. In addition, the findings of the present study indicate that the use of a multivariate statistical method such as multiple logistic regression in osteoporosis, which may be influenced by many variables, is better than univariate statistical evaluation.

  4. Improved accuracy in quantitative laser-induced breakdown spectroscopy using sub-models

    USGS Publications Warehouse

    Anderson, Ryan; Clegg, Samuel M.; Frydenvang, Jens; Wiens, Roger C.; McLennan, Scott M.; Morris, Richard V.; Ehlmann, Bethany L.; Dyar, M. Darby

    2017-01-01

    Accurate quantitative analysis of diverse geologic materials is one of the primary challenges faced by the Laser-Induced Breakdown Spectroscopy (LIBS)-based ChemCam instrument on the Mars Science Laboratory (MSL) rover. The SuperCam instrument on the Mars 2020 rover, as well as other LIBS instruments developed for geochemical analysis on Earth or other planets, will face the same challenge. Consequently, part of the ChemCam science team has focused on the development of improved multivariate analysis calibration methods. Developing a single regression model capable of accurately determining the composition of very different target materials is difficult because the response of an element’s emission lines in LIBS spectra can vary with the concentration of other elements. We demonstrate a conceptually simple “sub-model” method for improving the accuracy of quantitative LIBS analysis of diverse target materials. The method is based on training several regression models on sets of targets with limited composition ranges and then “blending” these “sub-models” into a single final result. Tests of the sub-model method show improvement in test set root mean squared error of prediction (RMSEP) for almost all cases. The sub-model method, using partial least squares regression (PLS), is being used as part of the current ChemCam quantitative calibration, but the sub-model method is applicable to any multivariate regression method and may yield similar improvements.
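
    The sketch below shows the general shape of a sub-model scheme with scikit-learn's PLSRegression: two sub-models trained on restricted composition ranges are blended according to a full-range model's first-pass estimate. The synthetic spectra, range split, and blending weights are illustrative only, not ChemCam's actual calibration.

      import numpy as np
      from sklearn.cross_decomposition import PLSRegression

      rng = np.random.default_rng(6)

      # Synthetic "spectra" (200 channels) and one element's concentration.
      n, p = 300, 200
      conc = rng.uniform(0, 100, n)
      spectra = rng.normal(size=(n, p)) + np.outer(conc, rng.normal(size=p)) * 0.01

      full = PLSRegression(n_components=5).fit(spectra, conc)

      # Sub-models trained on restricted composition ranges.
      lo, hi = conc < 50, conc >= 50
      pls_lo = PLSRegression(n_components=5).fit(spectra[lo], conc[lo])
      pls_hi = PLSRegression(n_components=5).fit(spectra[hi], conc[hi])

      def blended_predict(x):
          # First predict with the full-range model, then blend the sub-model
          # predictions according to where that first estimate falls.
          first = full.predict(x.reshape(1, -1))[0, 0]
          w = np.clip((first - 25.0) / 50.0, 0.0, 1.0)  # ramp 0 -> 1 over 25..75
          return ((1 - w) * pls_lo.predict(x.reshape(1, -1))[0, 0]
                  + w * pls_hi.predict(x.reshape(1, -1))[0, 0])

      print(round(blended_predict(spectra[0]), 1), "vs true", round(conc[0], 1))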

  5. Perceptions about Homeless Elders and Community Responsibility

    ERIC Educational Resources Information Center

    Kane, Michael N.; Green, Diane; Jacobs, Robin

    2013-01-01

    Human service students were surveyed ("N" = 207) to determine their perceptions about homeless elders and communal responsibility for their well-being. Using a backward regression analysis, a final model ("F" = 15.617, "df" = 7, "p" < 0.001) for Perceptions about Homeless Persons and Community…

  6. ESTIMATING GROUND LEVEL PM 2.5 IN THE EASTERN UNITED STATES USING SATELLITE REMOTE SENSING

    EPA Science Inventory

    An empirical model based on the regression between daily average fine particle (PM2.5) concentrations and aerosol optical thickness (AOT) measurements from the Multi-angle Imaging SpectroRadiometer (MISR) was developed and tested using data from the eastern United States during ...

  7. Regression Analysis and Calibration Recommendations for the Characterization of Balance Temperature Effects

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.; Volden, T.

    2018-01-01

    Analysis and use of temperature-dependent wind tunnel strain-gage balance calibration data are discussed in the paper. First, three different methods are presented and compared that may be used to process temperature-dependent strain-gage balance data. The first method uses an extended set of independent variables in order to process the data and predict balance loads. The second method applies an extended load iteration equation during the analysis of balance calibration data. The third method uses temperature-dependent sensitivities for the data analysis. Physical interpretations of the most important temperature-dependent regression model terms are provided that relate temperature compensation imperfections and the temperature-dependent nature of the gage factor to sets of regression model terms. Finally, balance calibration recommendations are listed so that temperature-dependent calibration data can be obtained and successfully processed using the reviewed analysis methods.

  8. Occlusal factors are not related to self-reported bruxism.

    PubMed

    Manfredini, Daniele; Visscher, Corine M; Guarda-Nardini, Luca; Lobbezoo, Frank

    2012-01-01

    To estimate the contribution of various occlusal features of the natural dentition that may identify self-reported bruxers compared to nonbruxers. Two age- and sex-matched groups of self-reported bruxers (n = 67) and self-reported nonbruxers (n = 75) took part in the study. For each patient, the following occlusal features were clinically assessed: retruded contact position (RCP) to intercuspal contact position (ICP) slide length (< 2 mm was considered normal), vertical overlap (< 0 mm was considered an anterior open bite; > 4 mm, a deep bite), horizontal overlap (> 4 mm was considered a large horizontal overlap), incisor dental midline discrepancy (< 2 mm was considered normal), and the presence of a unilateral posterior crossbite, mediotrusive interferences, and laterotrusive interferences. A multiple logistic regression model was used to identify the significant associations between the assessed occlusal features (independent variables) and self-reported bruxism (dependent variable). Accuracy values to predict self-reported bruxism were unacceptable for all occlusal variables. The only variable remaining in the final regression model was laterotrusive interferences (P = .030). The percentage of explained variance for bruxism by the final multiple regression model was 4.6%. This model including only one occlusal factor showed low positive (58.1%) and negative predictive values (59.7%), thus showing a poor accuracy to predict the presence of self-reported bruxism (59.2%). This investigation suggested that the contribution of occlusion to the differentiation between bruxers and nonbruxers is negligible. This finding supports theories that advocate a much diminished role for peripheral anatomical-structural factors in the pathogenesis of bruxism.

  9. Multivariate Boosting for Integrative Analysis of High-Dimensional Cancer Genomic Data

    PubMed Central

    Xiong, Lie; Kuan, Pei-Fen; Tian, Jianan; Keles, Sunduz; Wang, Sijian

    2015-01-01

    In this paper, we propose a novel multivariate component-wise boosting method for fitting multivariate response regression models under the high-dimension, low sample size setting. Our method is motivated by modeling the association among different biological molecules based on multiple types of high-dimensional genomic data. Particularly, we are interested in two applications: studying the influence of DNA copy number alterations on RNA transcript levels and investigating the association between DNA methylation and gene expression. For this purpose, we model the dependence of the RNA expression levels on DNA copy number alterations and the dependence of gene expression on DNA methylation through multivariate regression models and utilize boosting-type method to handle the high dimensionality as well as model the possible nonlinear associations. The performance of the proposed method is demonstrated through simulation studies. Finally, our multivariate boosting method is applied to two breast cancer studies. PMID:26609213

  10. Correlations of turbidity to suspended-sediment concentration in the Toutle River Basin, near Mount St. Helens, Washington, 2010-11

    USGS Publications Warehouse

    Uhrich, Mark A.; Kolasinac, Jasna; Booth, Pamela L.; Fountain, Robert L.; Spicer, Kurt R.; Mosbrucker, Adam R.

    2014-01-01

    Researchers at the U.S. Geological Survey, Cascades Volcano Observatory, investigated alternatives to the traditional sample-based sediment record procedure for determining suspended-sediment concentration (SSC) and discharge. One such sediment-surrogate technique was developed using turbidity and discharge to estimate SSC for two gaging stations in the Toutle River Basin near Mount St. Helens, Washington. To provide context for the study, methods for collecting sediment data and monitoring turbidity are discussed. Statistical methods used include the development of ordinary least squares regression models for each gaging station. Issues of time-related autocorrelation also are evaluated. Addition of lagged explanatory variables was used to account for autocorrelation in the turbidity, discharge, and SSC data. Final regression model equations and plots are presented for the two gaging stations. The regression models support near-real-time estimates of SSC and improved suspended-sediment discharge records by incorporating continuous instream turbidity. Future use of such models may potentially lower the costs of sediment monitoring by reducing the time it takes to collect and process samples and to derive a sediment-discharge record.
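
    A minimal sketch of an OLS turbidity-SSC model with lagged explanatory variables, in the spirit described above, using statsmodels on synthetic series; the lag structure is illustrative, and a Durbin-Watson statistic is printed as a rough check on residual autocorrelation.

      import numpy as np
      import pandas as pd
      import statsmodels.api as sm

      rng = np.random.default_rng(7)
      n = 500

      # Synthetic time series: log-turbidity, log-discharge, and log-SSC,
      # smoothed to induce serial correlation.
      turb = pd.Series(rng.normal(2.0, 0.4, n)).rolling(5, min_periods=1).mean()
      q = pd.Series(rng.normal(4.0, 0.3, n)).rolling(5, min_periods=1).mean()
      ssc = 0.9 * turb + 0.4 * q + rng.normal(0, 0.1, n)

      df = pd.DataFrame({"ssc": ssc, "turb": turb, "q": q})
      # Lagged explanatory variables absorb serial correlation in the errors.
      df["turb_lag1"] = df["turb"].shift(1)
      df["q_lag1"] = df["q"].shift(1)
      df = df.dropna()

      X = sm.add_constant(df[["turb", "q", "turb_lag1", "q_lag1"]])
      fit = sm.OLS(df["ssc"], X).fit()
      print(fit.params.round(3))
      print("Durbin-Watson:", round(sm.stats.durbin_watson(fit.resid), 2))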

  11. Prediction of hourly PM2.5 using a space-time support vector regression model

    NASA Astrophysics Data System (ADS)

    Yang, Wentao; Deng, Min; Xu, Feng; Wang, Hang

    2018-05-01

    Real-time air quality prediction has been an active field of research in atmospheric environmental science. The existing methods of machine learning are widely used to predict pollutant concentrations because of their enhanced ability to handle complex non-linear relationships. However, because pollutant concentration data, as typical geospatial data, also exhibit spatial heterogeneity and spatial dependence, they may violate the assumptions of independent and identically distributed random variables in most of the machine learning methods. As a result, a space-time support vector regression model is proposed to predict hourly PM2.5 concentrations. First, to address spatial heterogeneity, spatial clustering is executed to divide the study area into several homogeneous or quasi-homogeneous subareas. To handle spatial dependence, a Gauss vector weight function is then developed to determine spatial autocorrelation variables as part of the input features. Finally, a local support vector regression model with spatial autocorrelation variables is established for each subarea. Experimental data on PM2.5 concentrations in Beijing are used to verify whether the results of the proposed model are superior to those of other methods.
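
    A rough sketch of the two-step structure with scikit-learn: stations are clustered into quasi-homogeneous subareas, and a local SVR is fit per cluster with a distance-weighted neighbour mean standing in for the paper's Gauss vector weight function. All data and parameter values are synthetic.

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.svm import SVR

      rng = np.random.default_rng(8)
      n = 400

      # Synthetic stations: coordinates, a local meteorology feature, and a
      # spatial-autocorrelation feature (distance-weighted neighbour mean of
      # observed PM2.5), a crude stand-in for the Gauss vector weight function.
      coords = rng.uniform(0, 50, size=(n, 2))
      met = rng.normal(size=n)
      pm25 = 35 + 10 * np.sin(coords[:, 0] / 10) + 5 * met + rng.normal(0, 2, n)

      d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
      w = np.exp(-(d / 5.0) ** 2)
      np.fill_diagonal(w, 0.0)
      neigh = (w @ pm25) / w.sum(axis=1)   # distance-weighted neighbour mean

      X = np.column_stack([met, neigh])

      # Step 1: spatial clustering into quasi-homogeneous subareas.
      labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(coords)

      # Step 2: one local SVR per subarea.
      models = {c: SVR(C=10.0).fit(X[labels == c], pm25[labels == c])
                for c in range(4)}
      print("per-cluster models fitted:", len(models))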

  12. Workers' compensation costs among construction workers: a robust regression analysis.

    PubMed

    Friedman, Lee S; Forst, Linda S

    2009-11-01

    Workers' compensation data are an important source for evaluating costs associated with construction injuries. We describe the characteristics of injured construction workers filing claims in Illinois between 2000 and 2005 and the factors associated with compensation costs using a robust regression model. In the final multivariable model, the cumulative percent temporary and permanent disability (measures of injury severity) explained 38.7% of the variance in cost. Attorney costs explained only 0.3% of the variance of the dependent variable. The model used in this study clearly indicated that percent disability was the most important determinant of cost, although the method and uniformity of percent impairment allocation could be better elucidated. There is a need to integrate analytical methods that are suitable for skewed data when analyzing claim costs.
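
    A minimal sketch of robust regression on skewed cost data, using a Huber M-estimator from statsmodels rather than the study's specific robust estimator; the claim data and effect sizes below are synthetic.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(9)
      n = 1000

      # Synthetic claims: percent disability drives cost; heavy right tail.
      pct_disability = rng.uniform(0, 60, n)
      attorney = rng.integers(0, 2, n)
      log_cost = (7 + 0.05 * pct_disability + 0.02 * attorney
                  + rng.standard_t(df=3, size=n))   # heavy-tailed errors

      X = sm.add_constant(np.column_stack([pct_disability, attorney]))

      # Robust regression (Huber M-estimator) downweights outlying claims
      # instead of letting them dominate the fit, unlike OLS on skewed costs.
      rlm = sm.RLM(log_cost, X, M=sm.robust.norms.HuberT()).fit()
      print(rlm.params.round(3))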

  13. [The Quality of the Family Physician-Patient Relationship. Patient-Related Predictors in a Sample Representative for the German Population].

    PubMed

    Dinkel, Andreas; Schneider, Antonius; Schmutzer, Gabriele; Brähler, Elmar; Henningsen, Peter; Häuser, Winfried

    2016-03-01

    Patient-centeredness and a strong working alliance are core elements of family medicine. Surveys in Germany showed that most people are satisfied with the quality of the family physician-patient relationship. However, the factors that are responsible for the quality of the family physician-patient relationship remain unclear. This study aimed at identifying patient-related predictors of the quality of this relationship. Participants of a cross-sectional survey representative for the general German population were assessed using standardized questionnaires. The perceived quality of the family physician-patient relationship was measured with the German version of the Patient-Doctor Relationship Questionnaire (PDRQ-9). Associations of demographic and clinical variables (comorbidity, somatic symptom burden, psychological distress) with the quality of the family physician-patient relationship were assessed by applying hierarchical linear regression. 2278 participants (91.9%) reported having a family physician. The mean total score of the PDRQ-9 was high (M=4.12, SD=0.70). The final regression model showed that higher age, being female, and most notably less somatic and less depressive symptoms predicted a higher quality of the family physician-patient relationship. Comorbidity lost significance when somatic symptom burden was added to the regression model. The final model explained 11% of the variance, indicating a small effect. Experiencing somatic and depressive symptoms emerged as the most relevant patient-related predictors of the quality of the family physician-patient relationship. © Georg Thieme Verlag KG Stuttgart · New York.

  14. Impact of volunteer-related and methodology-related factors on the reproducibility of brachial artery flow-mediated vasodilation: analysis of 672 individual repeated measurements.

    PubMed

    van Mil, Anke C C M; Greyling, Arno; Zock, Peter L; Geleijnse, Johanna M; Hopman, Maria T; Mensink, Ronald P; Reesink, Koen D; Green, Daniel J; Ghiadoni, Lorenzo; Thijssen, Dick H

    2016-09-01

    Brachial artery flow-mediated dilation (FMD) is a popular technique to examine endothelial function in humans. Identifying volunteer and methodological factors related to variation in FMD is important to improve measurement accuracy and applicability. Volunteer-related and methodology-related parameters were collected in 672 volunteers from eight affiliated centres worldwide who underwent repeated measures of FMD. All centres adopted contemporary expert-consensus guidelines for FMD assessment. After calculating the coefficient of variation (%) of the FMD for each individual, we constructed quartiles (n = 168 per quartile). Based on two regression models (volunteer-related factors and methodology-related factors), statistically significant components of these two models were added to a final regression model (calculated as β-coefficient and R). This allowed us to identify factors that independently contributed to the variation in FMD%. The median coefficient of variation was 17.5%, with healthy volunteers demonstrating a coefficient of variation of 9.3%. Regression models revealed age (β = 0.248, P < 0.001), hypertension (β = 0.104, P < 0.001), dyslipidemia (β = 0.331, P < 0.001), time between measurements (β = 0.318, P < 0.001), lab experience (β = -0.133, P < 0.001) and baseline FMD% (β = 0.082, P < 0.05) as contributors to the coefficient of variation. After including all significant factors in the final model, we found that time between measurements, hypertension, baseline FMD% and lab experience with FMD independently predicted brachial artery variability (total R = 0.202). Although FMD% showed good reproducibility, larger variation was observed in conditions with longer time between measurements, hypertension, less experience and lower baseline FMD%. Accounting for these factors may improve FMD% variability.

  15. Modelling subject-specific childhood growth using linear mixed-effect models with cubic regression splines.

    PubMed

    Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William

    2016-01-01

    Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and accounts for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compare cubic regression splines with linear piecewise splines, using varying numbers and positions of knots. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001) when using a linear mixed-effect model with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercepts (p < 0.001) and slopes (p < 0.001) of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95 % CI 0.64 to 0.68; p < 0.001), which we modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves rather than the coefficients. Moreover, use of cubic regression splines provides biologically meaningful growth velocity and acceleration curves despite increased complexity in coefficient interpretation. Through this stepwise approach, we provide a set of tools to model longitudinal childhood data for non-statisticians using linear mixed-effect models.
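
    A minimal sketch of the core model with statsmodels: a linear mixed-effects fit with a cubic regression spline basis (patsy's cr) for age, plus subject-specific random intercepts and slopes. The continuous AR(1) residual term used in the paper is not available in MixedLM and is omitted; the growth data are simulated.

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(10)

      # Simulated longitudinal heights: 50 children, 9 visits over ages 0-4.
      kids = np.repeat(np.arange(50), 9)
      age = np.tile(np.linspace(0, 4, 9), 50)
      child_int = rng.normal(0, 2, 50)[kids]       # random intercepts
      child_slope = rng.normal(0, 0.5, 50)[kids]   # random slopes
      height = (50 + 14 * age - 1.2 * age**2 + child_int
                + child_slope * age + rng.normal(0, 0.8, len(age)))
      df = pd.DataFrame({"child": kids, "age": age, "height": height})

      # Mixed model: cubic regression spline basis for age (fixed effects),
      # random intercept and random age slope per child.
      m = smf.mixedlm("height ~ cr(age, df=4)", df, groups=df["child"],
                      re_formula="~age").fit()
      print(m.summary())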

  16. Regression Analysis of Top of Descent Location for Idle-thrust Descents

    NASA Technical Reports Server (NTRS)

    Stell, Laurel; Bronsvoort, Jesper; McDonald, Greg

    2013-01-01

    In this paper, multiple regression analysis is used to model the top of descent (TOD) location of user-preferred descent trajectories computed by the flight management system (FMS) on over 1000 commercial flights into Melbourne, Australia. The independent variables cruise altitude, final altitude, cruise Mach, descent speed, wind, and engine type were also recorded or computed post-operations. Both first-order and second-order models are considered, where cross-validation, hypothesis testing, and additional analysis are used to compare models. This identifies the models that should give the smallest errors if used to predict TOD location for new data in the future. A model that is linear in TOD altitude, final altitude, descent speed, and wind gives an estimated standard deviation of 3.9 nmi for TOD location given the trajectory parameters, which means about 80% of predictions would have error less than 5 nmi in absolute value. This accuracy is better than demonstrated by other ground automation predictions using kinetic models. Furthermore, this approach would enable online learning of the model. Additional data or further knowledge of algorithms is necessary to conclude definitively that no second-order terms are appropriate. Possible applications of the linear model are described, including enabling arriving aircraft to fly optimized descents computed by the FMS even in congested airspace. In particular, a model for TOD location that is linear in the independent variables would enable decision support tool human-machine interfaces for which a kinetic approach would be computationally too slow.

  17. An Exploratory Analysis of Personality, Attitudes, and Study Skills on the Learning Curve within a Team-based Learning Environment

    PubMed Central

    Henry, Teague; Campbell, Ashley

    2015-01-01

    Objective. To examine factors that determine the interindividual variability of learning within a team-based learning environment. Methods. Students in a pharmacokinetics course were given 4 interim, low-stakes cumulative assessments throughout the semester and a cumulative final examination. Students’ Myers-Briggs personality type was assessed, as well as their study skills, motivations, and attitudes towards team-learning. A latent curve model (LCM) was applied and various covariates were assessed to improve the regression model. Results. A quadratic LCM was applied for the first 4 assessments to predict final examination performance. None of the covariates examined significantly impacted the regression model fit except metacognitive self-regulation, which explained some of the variability in the rate of learning. There were some correlations between personality type and attitudes towards team learning, with introverts having a lower opinion of team-learning than extroverts. Conclusion. The LCM could readily describe the learning curve. Extroverted and introverted personality types had the same learning performance even though preference for team-learning was lower in introverts. Other personality traits, study skills, or practice did not significantly contribute to the learning variability in this course. PMID:25861101

  18. An exploratory analysis of personality, attitudes, and study skills on the learning curve within a team-based learning environment.

    PubMed

    Persky, Adam M; Henry, Teague; Campbell, Ashley

    2015-03-25

    To examine factors that determine the interindividual variability of learning within a team-based learning environment. Students in a pharmacokinetics course were given 4 interim, low-stakes cumulative assessments throughout the semester and a cumulative final examination. Students' Myers-Briggs personality type was assessed, as well as their study skills, motivations, and attitudes towards team-learning. A latent curve model (LCM) was applied and various covariates were assessed to improve the regression model. A quadratic LCM was applied for the first 4 assessments to predict final examination performance. None of the covariates examined significantly impacted the regression model fit except metacognitive self-regulation, which explained some of the variability in the rate of learning. There were some correlations between personality type and attitudes towards team learning, with introverts having a lower opinion of team-learning than extroverts. The LCM could readily describe the learning curve. Extroverted and introverted personality types had the same learning performance even though preference for team-learning was lower in introverts. Other personality traits, study skills, or practice did not significantly contribute to the learning variability in this course.

  19. Linear and evolutionary polynomial regression models to forecast coastal dynamics: Comparison and reliability assessment

    NASA Astrophysics Data System (ADS)

    Bruno, Delia Evelina; Barca, Emanuele; Goncalves, Rodrigo Mikosz; de Araujo Queiroz, Heithor Alexandre; Berardi, Luigi; Passarella, Giuseppe

    2018-01-01

    In this paper, the Evolutionary Polynomial Regression data modelling strategy has been applied to study small scale, short-term coastal morphodynamics, given its capability to treat a wide database of known information non-linearly. Simple linear and multilinear regression models were also applied, to strike a balance between computational load and the reliability of the three models' estimations. In fact, even though it is easy to imagine that the more complex the model, the more the prediction improves, sometimes a "slight" worsening of estimations can be accepted in exchange for the time saved in data organization and computational load. The models' outcomes were validated through a detailed statistical error analysis, which revealed slightly better estimation by the polynomial model with respect to the multilinear model, as expected. On the other hand, even though the data organization was identical for the two models, the multilinear one required a simpler simulation setting and a faster run time. Finally, the most reliable evolutionary polynomial regression model was used to make some conjectures about how uncertainty increases with the extrapolation time of the estimation. The overlapping rate between the confidence band of the mean of the known coast position and the prediction band of the estimated position can be a good index of the weakness in producing reliable estimations when the extrapolation time increases too much. The proposed models and tests have been applied to a coastal sector located near Torre Colimena in the Apulia region, south Italy.

  20. Regression estimators for generic health-related quality of life and quality-adjusted life years.

    PubMed

    Basu, Anirban; Manca, Andrea

    2012-01-01

    To develop regression models for outcomes with truncated supports, such as health-related quality of life (HRQoL) data, and account for features typical of such data such as a skewed distribution, spikes at 1 or 0, and heteroskedasticity. Regression estimators based on features of the Beta distribution. First, both a single equation and a 2-part model are presented, along with estimation algorithms based on maximum-likelihood, quasi-likelihood, and Bayesian Markov-chain Monte Carlo methods. A novel Bayesian quasi-likelihood estimator is proposed. Second, a simulation exercise is presented to assess the performance of the proposed estimators against ordinary least squares (OLS) regression for a variety of HRQoL distributions that are encountered in practice. Finally, the performance of the proposed estimators is assessed by using them to quantify the treatment effect on QALYs in the EVALUATE hysterectomy trial. Overall model fit is studied using several goodness-of-fit tests such as Pearson's correlation test, link and reset tests, and a modified Hosmer-Lemeshow test. The simulation results indicate that the proposed methods are more robust in estimating covariate effects than OLS, especially when the effects are large or the HRQoL distribution has a large spike at 1. Quasi-likelihood techniques are more robust than maximum likelihood estimators. When applied to the EVALUATE trial, all but the maximum likelihood estimators produce unbiased estimates of the treatment effect. One and 2-part Beta regression models provide flexible approaches to regress the outcomes with truncated supports, such as HRQoL, on covariates, after accounting for many idiosyncratic features of the outcomes distribution. This work will provide applied researchers with a practical set of tools to model outcomes in cost-effectiveness analysis.
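
    A minimal sketch of single-equation beta regression fit by maximum likelihood with scipy, for outcomes strictly inside (0, 1): the mean follows a logit link and phi is the beta precision parameter. The two-part extension for spikes at 0 or 1, and the quasi-likelihood and Bayesian variants discussed above, are not shown; the data are simulated.

      import numpy as np
      from scipy.optimize import minimize
      from scipy.special import gammaln, expit

      rng = np.random.default_rng(11)
      n = 500

      # Simulated HRQoL-like scores in (0, 1): mean depends on one covariate
      # through a logit link; phi controls dispersion.
      x = rng.normal(size=n)
      mu_true = expit(0.8 + 0.5 * x)
      phi_true = 10.0
      y = rng.beta(mu_true * phi_true, (1 - mu_true) * phi_true)

      def negloglik(theta):
          # Beta log-likelihood with a logit mean link; phi kept positive
          # by estimating it on the log scale.
          b0, b1, log_phi = theta
          mu = expit(b0 + b1 * x)
          phi = np.exp(log_phi)
          a, b = mu * phi, (1 - mu) * phi
          return -np.sum(gammaln(phi) - gammaln(a) - gammaln(b)
                         + (a - 1) * np.log(y) + (b - 1) * np.log(1 - y))

      fit = minimize(negloglik, x0=np.zeros(3), method="BFGS")
      b0, b1, log_phi = fit.x
      print("coefficients:", round(b0, 2), round(b1, 2),
            "phi:", round(float(np.exp(log_phi)), 1))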

  1. Bias-motivated bullying and psychosocial problems: implications for HIV risk behaviors among young men who have sex with men.

    PubMed

    Li, Michael Jonathan; Distefano, Anthony; Mouttapa, Michele; Gill, Jasmeet K

    2014-02-01

    The present study aimed to determine whether the experience of bias-motivated bullying was associated with behaviors known to increase the risk of HIV infection among young men who have sex with men (YMSM) aged 18-29, and to assess whether the psychosocial problems moderated this relationship. Using an Internet-based direct marketing approach in sampling, we recruited 545 YMSM residing in the USA to complete an online questionnaire. Multiple linear regression analyses tested three regression models where we controlled for sociodemographics. The first model indicated that bullying during high school was associated with unprotected receptive anal intercourse within the past 12 months, while the second model indicated that bullying after high school was associated with engaging in anal intercourse while under the influence of drugs or alcohol in the past 12 months. In the final regression model, our composite measure of HIV risk behavior was found to be associated with lifetime verbal harassment. None of the psychosocial problems measured in this study - depression, low self-esteem, and internalized homonegativity - moderated any of the associations between bias-motivated bullying victimization and HIV risk behaviors in our regression models. Still, these findings provide novel evidence that bullying prevention programs in schools and communities should be included in comprehensive approaches to HIV prevention among YMSM.

  2. ISC-GEM: Global Instrumental Earthquake Catalogue (1900-2009), III. Re-computed MS and mb, proxy MW, final magnitude composition and completeness assessment

    NASA Astrophysics Data System (ADS)

    Di Giacomo, Domenico; Bondár, István; Storchak, Dmitry A.; Engdahl, E. Robert; Bormann, Peter; Harris, James

    2015-02-01

    This paper outlines the re-computation and compilation of the magnitudes now contained in the final ISC-GEM Reference Global Instrumental Earthquake Catalogue (1900-2009). The catalogue is available via the ISC website (http://www.isc.ac.uk/iscgem/). The available re-computed MS and mb provided an ideal basis for deriving new conversion relationships to moment magnitude MW. Therefore, rather than using previously published regression models, we derived new empirical relationships using both generalized orthogonal linear and exponential non-linear models to obtain MW proxies from MS and mb. The new models were tested against true values of MW, and the newly derived exponential models were then preferred to the linear ones in computing MW proxies. For the final magnitude composition of the ISC-GEM catalogue, we preferred directly measured MW values as published by the Global CMT project for the period 1976-2009 (plus intermediate-depth earthquakes between 1962 and 1975). In addition, over 1000 publications have been examined to obtain direct seismic moment M0 and, therefore, also MW estimates for 967 large earthquakes during 1900-1978 (Lee and Engdahl, 2015) by various alternative methods to the current GCMT procedure. In all other instances we computed MW proxy values by converting our re-computed MS and mb values into MW, using the newly derived non-linear regression models. The final magnitude composition is an improvement in terms of magnitude homogeneity compared to previous catalogues. The magnitude completeness is not homogeneous over the 110 years covered by the ISC-GEM catalogue. Therefore, seismicity rate estimates may be strongly affected without a careful time window selection. In particular, the ISC-GEM catalogue appears to be complete down to MW 5.6 starting from 1964, whereas for the early instrumental period the completeness varies from ∼7.5 to 6.2. Further time and resources would be necessary to homogenize the magnitude of completeness over the entire catalogue length.
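
    The sketch below illustrates fitting an exponential MS-to-MW conversion with errors in both magnitudes via scipy's orthogonal distance regression; the functional form and all coefficients are invented for illustration and are not the catalogue's published models.

      import numpy as np
      from scipy import odr

      rng = np.random.default_rng(12)
      n = 300

      # Synthetic MS / MW pairs with measurement error in both variables.
      ms_true = rng.uniform(5.0, 8.0, n)
      mw_true = np.exp(-0.2 + 0.25 * ms_true) + 2.0   # illustrative form only
      ms_obs = ms_true + rng.normal(0, 0.2, n)
      mw_obs = mw_true + rng.normal(0, 0.1, n)

      # Orthogonal distance regression accounts for error in the predictor
      # (MS) as well as the response (MW), unlike ordinary least squares.
      expo = odr.Model(lambda b, x: np.exp(b[0] + b[1] * x) + b[2])
      data = odr.RealData(ms_obs, mw_obs, sx=0.2, sy=0.1)
      out = odr.ODR(data, expo, beta0=[0.0, 0.2, 2.0]).run()
      print("fitted coefficients:", np.round(out.beta, 3))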

  3. Hybrid Rocket Performance Prediction with Coupling Method of CFD and Thermal Conduction Calculation

    NASA Astrophysics Data System (ADS)

    Funami, Yuki; Shimada, Toru

    The final purpose of this study is to develop a design tool for hybrid rocket engines. This tool is a computer code that will be used to investigate rocket performance characteristics and unsteady phenomena lasting through the burning time, such as fuel regression or combustion oscillation. When phenomena inside a combustion chamber, namely boundary layer combustion, are described, it is difficult to use rigorous models for this target, because the calculation cost may be too expensive. Therefore, simple models are required for this calculation. In this study, quasi-one-dimensional compressible Euler equations for flowfields inside a chamber and the equation for thermal conduction inside a solid fuel are numerically solved. The energy balance equation at the solid fuel surface is solved to estimate the fuel regression rate. The heat feedback model is Karabeyoglu's model, which depends on total mass flux. The combustion model is a global single-step reaction model with 4 chemical species or a chemical equilibrium model with 9 chemical species. As a first step, steady-state solutions are reported.

  4. Local Composite Quantile Regression Smoothing for Harris Recurrent Markov Processes

    PubMed Central

    Li, Degui; Li, Runze

    2016-01-01

    In this paper, we study the local polynomial composite quantile regression (CQR) smoothing method for the nonlinear and nonparametric models under the Harris recurrent Markov chain framework. The local polynomial CQR regression method is a robust alternative to the widely-used local polynomial method, and has been well studied in stationary time series. In this paper, we relax the stationarity restriction on the model, and allow that the regressors are generated by a general Harris recurrent Markov process which includes both the stationary (positive recurrent) and nonstationary (null recurrent) cases. Under some mild conditions, we establish the asymptotic theory for the proposed local polynomial CQR estimator of the mean regression function, and show that the convergence rate for the estimator in nonstationary case is slower than that in stationary case. Furthermore, a weighted type local polynomial CQR estimator is provided to improve the estimation efficiency, and a data-driven bandwidth selection is introduced to choose the optimal bandwidth involved in the nonparametric estimators. Finally, we give some numerical studies to examine the finite sample performance of the developed methodology and theory. PMID:27667894

  5. Nonlinear-regression groundwater flow modeling of a deep regional aquifer system

    USGS Publications Warehouse

    Cooley, Richard L.; Konikow, Leonard F.; Naff, Richard L.

    1986-01-01

    A nonlinear regression groundwater flow model, based on a Galerkin finite-element discretization, was used to analyze steady state two-dimensional groundwater flow in the areally extensive Madison aquifer in a 75,000 mi2 area of the Northern Great Plains. Regression parameters estimated include intrinsic permeabilities of the main aquifer and separate lineament zones, discharges from eight major springs surrounding the Black Hills, and specified heads on the model boundaries. Aquifer thickness and temperature variations were included as specified functions. The regression model was applied using sequential F testing so that the fewest number and simplest zonation of intrinsic permeabilities, combined with the simplest overall model, were evaluated initially; additional complexities (such as subdivisions of zones and variations in temperature and thickness) were added in stages to evaluate the subsequent degree of improvement in the model results. It was found that only the eight major springs, a single main aquifer intrinsic permeability, two separate lineament intrinsic permeabilities of much smaller values, and temperature variations are warranted by the observed data (hydraulic heads and prior information on some parameters) for inclusion in a model that attempts to explain significant controls on groundwater flow. Addition of thickness variations did not significantly improve model results; however, thickness variations were included in the final model because they are fairly well defined. Effects on the observed head distribution from other features, such as vertical leakage and regional variations in intrinsic permeability, apparently were overshadowed by measurement errors in the observed heads. Estimates of the parameters correspond well to estimates obtained from other independent sources.

  6. Nonlinear-Regression Groundwater Flow Modeling of a Deep Regional Aquifer System

    NASA Astrophysics Data System (ADS)

    Cooley, Richard L.; Konikow, Leonard F.; Naff, Richard L.

    1986-12-01

    A nonlinear regression groundwater flow model, based on a Galerkin finite-element discretization, was used to analyze steady state two-dimensional groundwater flow in the areally extensive Madison aquifer in a 75,000 mi2 area of the Northern Great Plains. Regression parameters estimated include intrinsic permeabilities of the main aquifer and separate lineament zones, discharges from eight major springs surrounding the Black Hills, and specified heads on the model boundaries. Aquifer thickness and temperature variations were included as specified functions. The regression model was applied using sequential F testing so that the fewest number and simplest zonation of intrinsic permeabilities, combined with the simplest overall model, were evaluated initially; additional complexities (such as subdivisions of zones and variations in temperature and thickness) were added in stages to evaluate the subsequent degree of improvement in the model results. It was found that only the eight major springs, a single main aquifer intrinsic permeability, two separate lineament intrinsic permeabilities of much smaller values, and temperature variations are warranted by the observed data (hydraulic heads and prior information on some parameters) for inclusion in a model that attempts to explain significant controls on groundwater flow. Addition of thickness variations did not significantly improve model results; however, thickness variations were included in the final model because they are fairly well defined. Effects on the observed head distribution from other features, such as vertical leakage and regional variations in intrinsic permeability, apparently were overshadowed by measurement errors in the observed heads. Estimates of the parameters correspond well to estimates obtained from other independent sources.

  7. Hybrid ABC Optimized MARS-Based Modeling of the Milling Tool Wear from Milling Run Experimental Data

    PubMed Central

    García Nieto, Paulino José; García-Gonzalo, Esperanza; Ordóñez Galán, Celestino; Bernardo Sánchez, Antonio

    2016-01-01

    Milling cutters are important cutting tools used in milling machines to perform milling operations, which are prone to wear and subsequent failure. In this paper, a practical new hybrid model to predict the milling tool wear in a regular cut, as well as entry cut and exit cut, of a milling tool is proposed. The model was based on the optimization tool termed artificial bee colony (ABC) in combination with multivariate adaptive regression splines (MARS) technique. This optimization mechanism involved the parameter setting in the MARS training procedure, which significantly influences the regression accuracy. Therefore, an ABC–MARS-based model was successfully used here to predict the milling tool flank wear (output variable) as a function of the following input variables: the time duration of experiment, depth of cut, feed, type of material, etc. Regression with optimal hyperparameters was performed and a determination coefficient of 0.94 was obtained. The ABC–MARS-based model's goodness of fit to experimental data confirmed the good performance of this model. This new model also allowed us to ascertain the most influential parameters on the milling tool flank wear with a view to proposing milling machine's improvements. Finally, conclusions of this study are exposed. PMID:28787882

  8. Hybrid ABC Optimized MARS-Based Modeling of the Milling Tool Wear from Milling Run Experimental Data.

    PubMed

    García Nieto, Paulino José; García-Gonzalo, Esperanza; Ordóñez Galán, Celestino; Bernardo Sánchez, Antonio

    2016-01-28

    Milling cutters are important cutting tools used in milling machines to perform milling operations, which are prone to wear and subsequent failure. In this paper, a practical new hybrid model to predict the milling tool wear in a regular cut, as well as entry cut and exit cut, of a milling tool is proposed. The model was based on the optimization tool termed artificial bee colony (ABC) in combination with multivariate adaptive regression splines (MARS) technique. This optimization mechanism involved the parameter setting in the MARS training procedure, which significantly influences the regression accuracy. Therefore, an ABC-MARS-based model was successfully used here to predict the milling tool flank wear (output variable) as a function of the following input variables: the time duration of experiment, depth of cut, feed, type of material, etc. Regression with optimal hyperparameters was performed and a determination coefficient of 0.94 was obtained. The ABC-MARS-based model's goodness of fit to experimental data confirmed the good performance of this model. This new model also allowed us to ascertain the most influential parameters on the milling tool flank wear with a view to proposing milling machine's improvements. Finally, conclusions of this study are exposed.

  9. Study for Updated Gout Classification Criteria (SUGAR): identification of features to classify gout

    PubMed Central

    Taylor, William J.; Fransen, Jaap; Jansen, Tim L.; Dalbeth, Nicola; Schumacher, H. Ralph; Brown, Melanie; Louthrenoo, Worawit; Vazquez-Mellado, Janitzia; Eliseev, Maxim; McCarthy, Geraldine; Stamp, Lisa K.; Perez-Ruiz, Fernando; Sivera, Francisca; Ea, Hang-Korng; Gerritsen, Martijn; Scire, Carlo; Cavagna, Lorenzo; Lin, Chingtsai; Chou, Yin-Yi; Tausche, Anne-Kathrin; Vargas-Santos, Ana Beatriz; Janssen, Matthijs; Chen, Jiunn-Horng; Slot, Ole; Cimmino, Marco A.; Uhlig, Till; Neogi, Tuhina

    2015-01-01

    Objective To determine which clinical, laboratory and imaging features most accurately distinguished gout from non-gout. Methods A cross-sectional study of consecutive rheumatology clinic patients with at least one swollen joint or subcutaneous tophus. Gout was defined by synovial fluid or tophus aspirate microscopy by certified examiners in all patients. The sample was randomly divided into a model development (2/3) and test sample (1/3). Univariate and multivariate association between clinical features and MSU-defined gout was determined using logistic regression modelling. Shrinkage of regression weights was performed to prevent over-fitting of the final model. Latent class analysis was conducted to identify patterns of joint involvement. Results In total, 983 patients were included. Gout was present in 509 (52%). In the development sample (n=653), the following features were selected for the final model (multivariate ORs): joint erythema (2.13), difficulty walking (7.34), time to maximal pain < 24 hours (1.32), resolution by 2 weeks (3.58), tophus (7.29), MTP1 ever involved (2.30), location of currently tender joints in other foot/ankle (2.28) or MTP1 (2.82), serum urate level > 6 mg/dl (0.36 mmol/l) (3.35), ultrasound double contour sign (7.23), and X-ray erosion or cyst (2.49). The final model performed adequately in the test set with no evidence of misfit, high discrimination and predictive ability. MTP1 involvement was the most common joint pattern (39.4%) in gout cases. Conclusion Ten key discriminating features have been identified for further evaluation for new gout classification criteria. Ultrasound findings and degree of uricemia add discriminating value, and will significantly contribute to more accurate classification criteria. PMID:25777045

  10. The relationship between venture capital investment and macro economic variables via statistical computation method

    NASA Astrophysics Data System (ADS)

    Aygunes, Gunes

    2017-07-01

    The objective of this paper is to survey and determine the macroeconomic factors affecting the level of venture capital (VC) investments in a country. The literature relates venture capitalists' quality to countries' venture capital investments. We examine the relationship between venture capital investment and macroeconomic variables via a statistical computation method: using data on a set of countries and macroeconomic variables, we derive the correlation between venture capital investments and macroeconomic variables. According to the logistic regression model (logit regression or logit model), the macroeconomic variables are correlated with each other in three groups. Venture capitalists regard these correlations as an indicator. Finally, we give the correlation matrix of our results.

  11. A nonparametric method for assessment of interactions in a median regression model for analyzing right censored data.

    PubMed

    Lee, MinJae; Rahbar, Mohammad H; Talebi, Hooshang

    2018-01-01

    We propose a nonparametric test for interactions when we are concerned with investigation of the simultaneous effects of two or more factors in a median regression model with right censored survival data. Our approach is developed to detect interaction in special situations, when the covariates have a finite number of levels with a limited number of observations in each level, and it allows varying levels of variance and censorship at different levels of the covariates. Through simulation studies, we compare the power of detecting an interaction between the study group variable and a covariate using our proposed procedure with that of the Cox Proportional Hazard (PH) model and censored quantile regression model. We also assess the impact of censoring rate and type on the standard error of the estimators of parameters. Finally, we illustrate application of our proposed method to real life data from Prospective Observational Multicenter Major Trauma Transfusion (PROMMTT) study to test an interaction effect between type of injury and study sites using median time for a trauma patient to receive three units of red blood cells. The results from simulation studies indicate that our procedure performs better than both Cox PH model and censored quantile regression model based on statistical power for detecting the interaction, especially when the number of observations is small. It is also relatively less sensitive to censoring rates or even the presence of conditionally independent censoring that is conditional on the levels of covariates.

  12. Improved accuracy in quantitative laser-induced breakdown spectroscopy using sub-models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Ryan B.; Clegg, Samuel M.; Frydenvang, Jens

    We report that accurate quantitative analysis of diverse geologic materials is one of the primary challenges faced by the Laser-Induced Breakdown Spectroscopy (LIBS)-based ChemCam instrument on the Mars Science Laboratory (MSL) rover. The SuperCam instrument on the Mars 2020 rover, as well as other LIBS instruments developed for geochemical analysis on Earth or other planets, will face the same challenge. Consequently, part of the ChemCam science team has focused on the development of improved multivariate analysis calibration methods. Developing a single regression model capable of accurately determining the composition of very different target materials is difficult because the response of an element’s emission lines in LIBS spectra can vary with the concentration of other elements. We demonstrate a conceptually simple “sub-model” method for improving the accuracy of quantitative LIBS analysis of diverse target materials. The method is based on training several regression models on sets of targets with limited composition ranges and then “blending” these “sub-models” into a single final result. Tests of the sub-model method show improvement in test set root mean squared error of prediction (RMSEP) for almost all cases. Lastly, the sub-model method, using partial least squares regression (PLS), is being used as part of the current ChemCam quantitative calibration, but the sub-model method is applicable to any multivariate regression method and may yield similar improvements.
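
    A minimal sketch of the sub-model idea using scikit-learn's PLS regression on synthetic spectra; the hard routing by a full-range model is a simplification, since the published method blends neighbouring sub-models near the range boundaries.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
n, p = 300, 50                                 # spectra x channels, synthetic
X = rng.normal(size=(n, p))
y = 50 + X[:, 0] * 20 + rng.normal(0, 2, n)    # "composition", e.g. wt% of an oxide

# Full-range model used only to route a spectrum to a sub-model.
full = PLSRegression(n_components=5).fit(X, y)

# Sub-models trained on limited composition ranges.
low, high = y < 40, y > 60
mid = ~(low | high)
submodels = {
    "low":  PLSRegression(n_components=5).fit(X[low], y[low]),
    "mid":  PLSRegression(n_components=5).fit(X[mid], y[mid]),
    "high": PLSRegression(n_components=5).fit(X[high], y[high]),
}

def predict_submodel(x):
    """Route by the full-range prediction, then refine with a sub-model."""
    first = full.predict(x.reshape(1, -1)).item()
    key = "low" if first < 40 else "high" if first > 60 else "mid"
    return submodels[key].predict(x.reshape(1, -1)).item()

print(predict_submodel(X[0]))
```

    Routing on the full-range prediction keeps each test-time input self-contained; the trade-off is a discontinuity at range boundaries, which the published blending step removes.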

  14. Supply Rate and Equilibrium Inventory of Air Force Enlisted Personnel: A Simultaneous Model of the Accession and Retention Markets Incorporating Force Level Constraints. Final Report for Period July 1969-June 1976.

    ERIC Educational Resources Information Center

    DeVany, Arthur S.; And Others

    This research was designed to develop and test a model of the Air Force manpower market. The study indicates that previous manpower supply studies failed to account for simultaneous determination of enlistments and retentions and misinterpreted regressions as supply equations. They are, instead, reduced form equations resulting from joint…

  15. Epidemiological characteristics of reported sporadic and outbreak cases of E. coli O157 in people from Alberta, Canada (2000-2002): methodological challenges of comparing clustered to unclustered data.

    PubMed

    Pearl, D L; Louie, M; Chui, L; Doré, K; Grimsrud, K M; Martin, S W; Michel, P; Svenson, L W; McEwen, S A

    2008-04-01

    Using multivariable models, we compared whether there were significant differences between reported outbreak and sporadic cases in terms of their sex, age, and mode and site of disease transmission. We also determined the potential role of administrative, temporal, and spatial factors within these models. We compared a variety of approaches to account for clustering of cases in outbreaks, including weighted logistic regression, random effects models, generalized estimating equations, robust variance estimates, and the random selection of one case from each outbreak. Age and mode of transmission were the only epidemiologically and statistically significant covariates in our final models using the above approaches. Weighting observations in a logistic regression model by the inverse of their outbreak size appeared to be a relatively robust and valid means of modelling these data. Some analytical techniques designed to account for clustering had difficulty converging or produced unrealistic measures of association.
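
    A sketch of the weighting scheme the authors found robust, fitted with statsmodels on an invented line listing; passing the inverse outbreak sizes as freq_weights is one reasonable encoding (var_weights would be an alternative).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 500
# Each record is one reported case; outbreak cases share an outbreak size > 1,
# sporadic cases are treated as "outbreaks" of size 1.
outbreak_size = rng.choice([1, 1, 1, 2, 5, 20], size=n)
y = (outbreak_size > 1).astype(int)            # 1 = outbreak-associated case
age = rng.normal(30 + 5 * y, 15).clip(0)       # invented covariate shift
foodborne = rng.binomial(1, 0.3 + 0.2 * y)

X = sm.add_constant(pd.DataFrame({"age": age, "foodborne": foodborne}))
# Weight each case by the inverse of its outbreak size so that every outbreak
# contributes roughly one observation's worth of information.
w = 1.0 / outbreak_size
fit = sm.GLM(y, X, family=sm.families.Binomial(), freq_weights=w).fit()
print(fit.summary())
```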

  16. A statistical learning framework for groundwater nitrate models of the Central Valley, California, USA

    USGS Publications Warehouse

    Nolan, Bernard T.; Fienen, Michael N.; Lorenz, David L.

    2015-01-01

    We used a statistical learning framework to evaluate the ability of three machine-learning methods to predict nitrate concentration in shallow groundwater of the Central Valley, California: boosted regression trees (BRT), artificial neural networks (ANN), and Bayesian networks (BN). Machine learning methods can learn complex patterns in the data but because of overfitting may not generalize well to new data. The statistical learning framework involves cross-validation (CV) training and testing data and a separate hold-out data set for model evaluation, with the goal of optimizing predictive performance by controlling for model overfit. The order of prediction performance according to both CV testing R2 and that for the hold-out data set was BRT > BN > ANN. For each method we identified two models based on CV testing results: that with maximum testing R2 and a version with R2 within one standard error of the maximum (the 1SE model). The former yielded CV training R2 values of 0.94–1.0. Cross-validation testing R2 values indicate predictive performance, and these were 0.22–0.39 for the maximum R2 models and 0.19–0.36 for the 1SE models. Evaluation with hold-out data suggested that the 1SE BRT and ANN models predicted better for an independent data set compared with the maximum R2 versions, which is relevant to extrapolation by mapping. Scatterplots of predicted vs. observed hold-out data obtained for final models helped identify prediction bias, which was fairly pronounced for ANN and BN. Lastly, the models were compared with multiple linear regression (MLR) and a previous random forest regression (RFR) model. Whereas BRT results were comparable to RFR, MLR had low hold-out R2 (0.07) and explained less than half the variation in the training data. Spatial patterns of predictions by the final, 1SE BRT model agreed reasonably well with previously observed patterns of nitrate occurrence in groundwater of the Central Valley.
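
    The 1SE selection rule can be sketched with scikit-learn, using gradient boosting as a stand-in for BRT; the data, grid and scoring below are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=400, n_features=8, noise=10, random_state=0)

grid = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [25, 50, 100, 200, 400]},  # complexity, ascending
    cv=5, scoring="r2",
)
grid.fit(X, y)

means = grid.cv_results_["mean_test_score"]
ses = grid.cv_results_["std_test_score"] / np.sqrt(grid.n_splits_)
best = np.argmax(means)
threshold = means[best] - ses[best]
# "1SE model": the least complex candidate whose CV R^2 is within one
# standard error of the maximum -- here, the fewest trees.
one_se = next(i for i in range(len(means)) if means[i] >= threshold)
print("max-R2 model:", grid.cv_results_["params"][best])
print("1SE model:   ", grid.cv_results_["params"][one_se])
```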

  17. Hybrid modelling based on support vector regression with genetic algorithms in forecasting the cyanotoxins presence in the Trasona reservoir (Northern Spain).

    PubMed

    García Nieto, P J; Alonso Fernández, J R; de Cos Juez, F J; Sánchez Lasheras, F; Díaz Muñiz, C

    2013-04-01

    Cyanotoxins, poisonous substances produced by cyanobacteria, are responsible for health risks in drinking and recreational waters. As a result, anticipating their presence is important for preventing these risks. The aim of this study is to use a hybrid approach based on support vector regression (SVR) in combination with genetic algorithms (GAs), known as a genetic algorithm support vector regression (GA-SVR) model, for forecasting the presence of cyanotoxins in the Trasona reservoir (Northern Spain). The GA-SVR approach is aimed at highly nonlinear biological problems with sharp peaks, and the tests carried out proved its high performance. Some physical-chemical parameters have been considered along with the biological ones. The results obtained are twofold. In the first place, the significance of each biological and physical-chemical variable for the presence of cyanotoxins in the reservoir is determined successfully. Secondly, a predictive model able to forecast the possible presence of cyanotoxins in the short term was obtained. Copyright © 2013 Elsevier Inc. All rights reserved.
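
    A toy version of GA-SVR: a miniature genetic algorithm searching SVR hyperparameters in log-space by cross-validated R2. The operators and parameter ranges are simplifications of what such hybrid schemes typically use, and the dataset is synthetic.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X, y = make_friedman1(n_samples=200, noise=0.5, random_state=3)

def fitness(genome):
    """CV R^2 of an SVR whose (C, gamma, epsilon) are encoded in log-space."""
    C, gamma, eps = np.exp(genome)
    return cross_val_score(SVR(C=C, gamma=gamma, epsilon=eps),
                           X, y, cv=3, scoring="r2").mean()

# Minimal generational GA: tournament selection, blend crossover, Gaussian mutation.
pop = rng.uniform([-2, -6, -5], [6, 1, 0], size=(20, 3))
for generation in range(15):
    scores = np.array([fitness(g) for g in pop])
    children = []
    for _ in range(len(pop)):
        i, j = rng.integers(len(pop), size=2)
        a = pop[i] if scores[i] > scores[j] else pop[j]   # tournament pick 1
        k, l = rng.integers(len(pop), size=2)
        b = pop[k] if scores[k] > scores[l] else pop[l]   # tournament pick 2
        children.append(0.5 * (a + b) + rng.normal(0, 0.3, 3))  # blend + mutate
    pop = np.array(children)

best = max(pop, key=fitness)
print("best (C, gamma, epsilon):", np.exp(best), " CV R2:", fitness(best))
```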

  18. Vegetation Monitoring with Gaussian Processes and Latent Force Models

    NASA Astrophysics Data System (ADS)

    Camps-Valls, Gustau; Svendsen, Daniel; Martino, Luca; Campos, Manuel; Luengo, David

    2017-04-01

    Monitoring vegetation by biophysical parameter retrieval from Earth observation data is a challenging problem, where machine learning is currently a key player. Neural networks, kernel methods, and Gaussian Process (GP) regression have excelled in parameter retrieval tasks at both local and global scales. GP regression is based on solid Bayesian statistics, yields efficient and accurate parameter estimates, and provides advantages over competing machine learning approaches, such as confidence intervals for its predictions. However, GP models are hampered by a lack of interpretability, which has prevented their widespread adoption by a larger community. In this presentation we will summarize some of our latest developments to address this issue. We will review the main characteristics of GPs and their advantages in standard vegetation monitoring applications. Then, three advanced GP models will be introduced. First, we will derive sensitivity maps for the GP predictive function, which allow us to obtain feature rankings from the model and to assess the influence of examples on the solution. Second, we will introduce a Joint GP (JGP) model that combines in situ measurements and simulated radiative transfer data in a single GP model. The JGP regression provides more sensible confidence intervals for the predictions, respects the physics of the underlying processes, and allows for transferability across time and space. Finally, a latent force model (LFM) for GP modeling that encodes ordinary differential equations to blend data-driven modeling and physical models of the system is presented. The LFM performs multi-output regression, adapts to the signal characteristics, is able to cope with missing data in the time series, and provides explicit latent functions that allow system analysis and evaluation. Empirical evidence of the performance of these models will be presented through illustrative examples.
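
    The confidence intervals cited as a GP advantage fall out of the predictive standard deviation; a minimal sketch on an invented one-dimensional retrieval problem:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
# Toy retrieval: one spectral index vs. a biophysical parameter (e.g. LAI).
X = rng.uniform(0, 1, (60, 1))
y = np.sin(4 * X[:, 0]) + rng.normal(0, 0.1, 60)

kernel = RBF(length_scale=0.2) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

Xs = np.linspace(0, 1, 5).reshape(-1, 1)
mean, std = gp.predict(Xs, return_std=True)
# The predictive std gives per-point confidence intervals directly.
for m, s in zip(mean, std):
    print(f"prediction {m:6.3f} +/- {1.96 * s:.3f}")
```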

  19. Modeling the probability of giving birth at health institutions among pregnant women attending antenatal care in West Shewa Zone, Oromia, Ethiopia: a cross sectional study.

    PubMed

    Dida, Nagasa; Birhanu, Zewdie; Gerbaba, Mulusew; Tilahun, Dejen; Morankar, Sudhakar

    2014-06-01

    Although antenatal care and institutional delivery are effective means of reducing maternal morbidity and mortality, the probability of giving birth at health institutions among antenatal care attendants has not been modeled in Ethiopia. Therefore, the objective of this study was to model predictors of giving birth at health institutions among expectant mothers following antenatal care. A facility-based cross-sectional study was conducted among 322 consecutively selected mothers who were following antenatal care in two districts of West Shewa Zone, Oromia Regional State, Ethiopia. Participants were proportionally recruited from six health institutions. The data were analyzed using SPSS version 17.0. Multivariable logistic regression was employed to develop the prediction model. The final regression model had good discrimination power (89.2%), optimum sensitivity (89.0%) and specificity (80.0%) for predicting the probability of giving birth at health institutions. Accordingly, self-efficacy (beta=0.41), perceived barrier (beta=-0.31) and perceived susceptibility (beta=0.29) significantly predicted the probability of giving birth at health institutions. The present study showed that a logistic regression model can predict the probability of giving birth at health institutions and identified significant predictors which health care providers should take into account in the promotion of institutional delivery.

  20. Vesicular stomatitis forecasting based on Google Trends

    PubMed Central

    Lu, Yi; Zhou, GuangYa; Chen, Qin

    2018-01-01

    Background Vesicular stomatitis (VS) is an important viral disease of livestock. The main feature of VS is irregular blisters that occur on the lips, tongue, oral mucosa, hoof crown and nipple. Humans can also be infected with vesicular stomatitis and develop meningitis. This study analyses the 2014 American VS outbreaks in order to accurately predict vesicular stomatitis outbreak trends. Methods American VS outbreak data were collected from the OIE. The data for VS keywords were obtained by inputting 24 disease-related keywords into Google Trends. After calculating the Pearson and Spearman correlation coefficients, it was found that there was a relationship between outbreaks and keywords derived from Google Trends. Finally, the prediction model was constructed based on qualitative classification and quantitative regression. Results For the regression model, the Pearson correlation coefficients between the predicted outbreaks and actual outbreaks are 0.953 and 0.948, respectively. For the qualitative classification model, we constructed five classification predictive models and chose the best one as the result. The SN (sensitivity), SP (specificity) and ACC (prediction accuracy) values of the best classification predictive model are 78.52%, 72.5% and 77.14%, respectively. Conclusion This study applied Google search data to construct a qualitative classification model and a quantitative regression model. The results show that the method is effective and that these two models produce accurate forecasts. PMID:29385198
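
    The correlation screening step is straightforward with SciPy; the weekly outbreak counts and search-volume series below are invented for illustration.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical weekly series: VS outbreak counts and normalized Google Trends
# search volume for one disease-related keyword.
outbreaks = np.array([0, 1, 3, 7, 12, 9, 5, 2, 1, 0])
trend = np.array([5, 8, 20, 45, 70, 60, 30, 15, 10, 6])

r, p_r = pearsonr(outbreaks, trend)
rho, p_rho = spearmanr(outbreaks, trend)
print(f"Pearson r = {r:.3f} (p = {p_r:.3g}); Spearman rho = {rho:.3f} (p = {p_rho:.3g})")
```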

  1. An algebraic method for constructing stable and consistent autoregressive filters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harlim, John, E-mail: jharlim@psu.edu; Department of Meteorology, the Pennsylvania State University, University Park, PA 16802; Hong, Hoon, E-mail: hong@ncsu.edu

    2015-02-15

    In this paper, we introduce an algebraic method to construct stable and consistent univariate autoregressive (AR) models of low order for filtering and predicting nonlinear turbulent signals with memory depth. By stable, we refer to the classical stability condition for the AR model. By consistent, we refer to the classical consistency constraints of Adams–Bashforth methods of order two. One attractive feature of this algebraic method is that the model parameters can be obtained without directly knowing any training data set, as opposed to many standard, regression-based parameterization methods. It takes only long-time average statistics as inputs. The proposed method provides a discretization time step interval which guarantees the existence of a stable and consistent AR model and simultaneously produces the parameters for the AR models. In our numerical examples with two chaotic time series with different characteristics of decaying time scales, we find that the proposed AR models produce significantly more accurate short-term predictive skill and comparable filtering skill relative to the linear regression-based AR models. These encouraging results are robust across wide ranges of discretization times, observation times, and observation noise variances. Finally, we also find that the proposed model produces an improved short-time prediction relative to the linear regression-based AR models in forecasting a data set that characterizes the variability of the Madden–Julian Oscillation, a dominant tropical atmospheric wave pattern.
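
    The classical AR stability condition referred to here is easy to check numerically; a small helper, assuming the convention x_n = a_1 x_{n-1} + ... + a_p x_{n-p} + noise:

```python
import numpy as np

def is_stable(ar_coeffs):
    """Classical stability condition for an AR(p) model: all roots of
    z^p - a_1 z^(p-1) - ... - a_p must lie strictly inside the unit circle."""
    poly = np.concatenate(([1.0], -np.asarray(ar_coeffs, dtype=float)))
    return bool(np.all(np.abs(np.roots(poly)) < 1.0))

print(is_stable([0.5, 0.3]))    # True: stationary AR(2)
print(is_stable([1.2, 0.1]))    # False: explosive
```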

  2. Characterizing mammographic images by using generic texture features

    PubMed Central

    2012-01-01

    Introduction Although mammographic density is an established risk factor for breast cancer, its use is limited in clinical practice because of a lack of automated and standardized measurement methods. The aims of this study were to evaluate a variety of automated texture features in mammograms as risk factors for breast cancer and to compare them with the percentage mammographic density (PMD) by using a case-control study design. Methods A case-control study including 864 cases and 418 controls was analyzed automatically. Four hundred seventy features were explored as possible risk factors for breast cancer. These included statistical features, moment-based features, spectral-energy features, and form-based features. An elaborate variable selection process using logistic regression analyses was performed to identify those features that were associated with case-control status. In addition, PMD was assessed and included in the regression model. Results Of the 470 image-analysis features explored, 46 remained in the final logistic regression model. An area under the curve of 0.79, with an odds ratio per standard deviation change of 2.88 (95% CI, 2.28 to 3.65), was obtained with validation data. Adding the PMD did not improve the final model. Conclusions Using texture features to predict the risk of breast cancer appears feasible. PMD did not show any additional value in this study. With regard to the features assessed, most of the analysis tools appeared to reflect mammographic density, although some features did not correlate with PMD. It remains to be investigated in larger case-control studies whether these features can contribute to increased prediction accuracy. PMID:22490545

  3. A comparative analysis of predictive models of morbidity in intensive care unit after cardiac surgery - part II: an illustrative example.

    PubMed

    Cevenini, Gabriele; Barbini, Emanuela; Scolletta, Sabino; Biagioli, Bonizella; Giomarelli, Pierpaolo; Barbini, Paolo

    2007-11-22

    Popular predictive models for estimating morbidity probability after heart surgery are compared critically in a unitary framework. The study is divided into two parts. In the first part modelling techniques and intrinsic strengths and weaknesses of different approaches were discussed from a theoretical point of view. In this second part the performances of the same models are evaluated in an illustrative example. Eight models were developed: Bayes linear and quadratic models, k-nearest neighbour model, logistic regression model, Higgins and direct scoring systems and two feed-forward artificial neural networks with one and two layers. Cardiovascular, respiratory, neurological, renal, infectious and hemorrhagic complications were defined as morbidity. Training and testing sets each of 545 cases were used. The optimal set of predictors was chosen among a collection of 78 preoperative, intraoperative and postoperative variables by a stepwise procedure. Discrimination and calibration were evaluated by the area under the receiver operating characteristic curve and Hosmer-Lemeshow goodness-of-fit test, respectively. Scoring systems and the logistic regression model required the largest set of predictors, while Bayesian and k-nearest neighbour models were much more parsimonious. In testing data, all models showed acceptable discrimination capacities, however the Bayes quadratic model, using only three predictors, provided the best performance. All models showed satisfactory generalization ability: again the Bayes quadratic model exhibited the best generalization, while artificial neural networks and scoring systems gave the worst results. Finally, poor calibration was obtained when using scoring systems, k-nearest neighbour model and artificial neural networks, while Bayes (after recalibration) and logistic regression models gave adequate results. Although all the predictive models showed acceptable discrimination performance in the example considered, the Bayes and logistic regression models seemed better than the others, because they also had good generalization and calibration. The Bayes quadratic model seemed to be a convincing alternative to the much more usual Bayes linear and logistic regression models. It showed its capacity to identify a minimum core of predictors generally recognized as essential to pragmatically evaluate the risk of developing morbidity after heart surgery.

  4. Allometric scaling of biceps strength before and after resistance training in men.

    PubMed

    Zoeller, Robert F; Ryan, Eric D; Gordish-Dressman, Heather; Price, Thomas B; Seip, Richard L; Angelopoulos, Theodore J; Moyna, Niall M; Gordon, Paul M; Thompson, Paul D; Hoffman, Eric P

    2007-06-01

    The purposes of this study were 1) to derive allometric scaling models of isometric biceps muscle strength using pretraining body mass (BM) and muscle cross-sectional area (CSA) as scaling variables in adult males, 2) to test model appropriateness using regression diagnostics, and 3) to cross-validate the models before and after 12 wk of resistance training. A subset of FAMuSS (Functional SNP Associated with Muscle Size and Strength) study data (N=136) was randomly split into two groups (A and B). Allometric scaling models using pretraining BM and CSA were derived and tested for group A. The scaling exponents determined from these models were then applied to and tested on group B pretraining data. Finally, these scaling exponents were applied to and tested on group A and B posttraining data. The BM and CSA models produced scaling exponents of 0.64 and 0.71, respectively. Regression diagnostics determined both models to be appropriate. Cross-validation of the models on group B showed that the BM model, but not the CSA model, was appropriate. Removal of the largest six subjects (CSA > 30 cm²) from group B resulted in an appropriate fit for the CSA model. Application of the models to group A posttraining data showed that both models were appropriate, but only the body mass model was successful for group B. These data suggest that scaling exponents of 0.64 and 0.71, using BM and CSA, respectively, are appropriate for scaling isometric biceps strength in adult males. However, the scaling exponent using CSA may not be appropriate for individuals with biceps CSA > 30 cm². Finally, 12 wk of resistance training does not alter the relationship between BM, CSA, and muscular strength as assessed by allometric scaling.
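
    Allometric scaling itself is a one-line computation; a sketch applying the reported body-mass exponent of 0.64 to hypothetical strength data:

```python
import numpy as np

def allometric_scale(strength, body_mass, exponent=0.64):
    """Allometrically scaled strength: strength / body_mass**b.
    The study reports b = 0.64 for body mass and b = 0.71 for biceps CSA."""
    return strength / body_mass ** exponent

# Two hypothetical subjects with equal raw strength but different body mass.
print(allometric_scale(np.array([60.0, 60.0]), np.array([70.0, 95.0])))
```

    Dividing by BM**0.64 rather than by BM itself avoids the over-penalizing of heavier subjects that simple ratio scaling produces.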

  5. Animal models of maternal high fat diet exposure and effects on metabolism in offspring: a meta-regression analysis.

    PubMed

    Ribaroff, G A; Wastnedge, E; Drake, A J; Sharpe, R M; Chambers, T J G

    2017-06-01

    Animal models of maternal high fat diet (HFD) demonstrate perturbed offspring metabolism although the effects differ markedly between models. We assessed studies investigating metabolic parameters in the offspring of HFD fed mothers to identify factors explaining these inter-study differences. A total of 171 papers were identified, which provided data from 6047 offspring. Data were extracted regarding body weight, adiposity, glucose homeostasis and lipidaemia. Information regarding the macronutrient content of diet, species, time point of exposure and gestational weight gain were collected and utilized in meta-regression models to explore predictive factors. Publication bias was assessed using Egger's regression test. Maternal HFD exposure did not affect offspring birthweight but increased weaning weight, final bodyweight, adiposity, triglyceridaemia, cholesterolaemia and insulinaemia in both female and male offspring. Hyperglycaemia was found in female offspring only. Meta-regression analysis identified lactational HFD exposure as a key moderator. The fat content of the diet did not correlate with any outcomes. There was evidence of significant publication bias for all outcomes except birthweight. Maternal HFD exposure was associated with perturbed metabolism in offspring but between studies was not accounted for by dietary constituents, species, strain or maternal gestational weight gain. Specific weaknesses in experimental design predispose many of the results to bias. © 2017 The Authors. Obesity Reviews published by John Wiley & Sons Ltd on behalf of World Obesity Federation.

  6. Regression Analysis of Stage Variability for West-Central Florida Lakes

    USGS Publications Warehouse

    Sacks, Laura A.; Ellison, Donald L.; Swancar, Amy

    2008-01-01

    The variability in a lake's stage depends upon many factors, including surface-water flows, meteorological conditions, and hydrogeologic characteristics near the lake. An understanding of the factors controlling lake-stage variability for a population of lakes may be helpful to water managers who set regulatory levels for lakes. The goal of this study is to determine whether lake-stage variability can be predicted using multiple linear regression and readily available lake and basin characteristics defined for each lake. Regressions were evaluated for a recent 10-year period (1996-2005) and for a historical 10-year period (1954-63). Ground-water pumping is considered to have affected stage at many of the 98 lakes included in the recent period analysis, and not to have affected stage at the 20 lakes included in the historical period analysis. For the recent period, regression models had coefficients of determination (R2) values ranging from 0.60 to 0.74, and up to five explanatory variables. Standard errors ranged from 21 to 37 percent of the average stage variability. Net leakage was the most important explanatory variable in regressions describing the full range and low range in stage variability for the recent period. The most important explanatory variable in the model predicting the high range in stage variability was the height over median lake stage at which surface-water outflow would occur. Other explanatory variables in final regression models for the recent period included the range in annual rainfall for the period and several variables related to local and regional hydrogeology: (1) ground-water pumping within 1 mile of each lake, (2) the amount of ground-water inflow (by category), (3) the head gradient between the lake and the Upper Floridan aquifer, and (4) the thickness of the intermediate confining unit. Many of the variables in final regression models are related to hydrogeologic characteristics, underscoring the importance of ground-water exchange in controlling the stage of karst lakes in Florida. Regression equations were used to predict lake-stage variability for the recent period for 12 additional lakes, and the median difference between predicted and observed values ranged from 11 to 23 percent. Coefficients of determination for the historical period were considerably lower (maximum R2 of 0.28) than for the recent period. Reasons for these low R2 values are probably related to the small number of lakes (20) with stage data for an equivalent time period that were unaffected by ground-water pumping, the similarity of many of the lake types (large surface-water drainage lakes), and the greater uncertainty in defining historical basin characteristics. The lack of lake-stage data unaffected by ground-water pumping and the poor regression results obtained for that group of lakes limit the ability to predict natural lake-stage variability using this method in west-central Florida.

  7. Fundamental Phenomena on Fuel Decomposition and Boundary-Layer Combustion Precesses with Applications to Hybrid Rocket Motors. Part 1; Experimental Investigation

    NASA Technical Reports Server (NTRS)

    Kuo, Kenneth K.; Lu, Yeu-Cherng; Chiaverini, Martin J.; Johnson, David K.; Serin, Nadir; Risha, Grant A.; Merkle, Charles L.; Venkateswaran, Sankaran

    1996-01-01

    This final report summarizes the major findings on the subject of 'Fundamental Phenomena on Fuel Decomposition and Boundary-Layer Combustion Processes with Applications to Hybrid Rocket Motors', performed from 1 April 1994 to 30 June 1996. Both experimental results from Task 1 and theoretical/numerical results from Task 2 are reported here in two parts. Part 1 covers the experimental work performed and describes the test facility setup, data reduction techniques employed, and results of the test firings, including effects of operating conditions and fuel additives on solid fuel regression rate and thermal profiles of the condensed phase. Part 2 concerns the theoretical/numerical work. It covers physical modeling of the combustion processes including gas/surface coupling, and radiation effect on regression rate. The numerical solution of the flowfield structure and condensed phase regression behavior are presented. Experimental data from the test firings were used for numerical model validation.

  8. Evaluation of logistic regression models and effect of covariates for case-control study in RNA-Seq analysis.

    PubMed

    Choi, Seung Hoan; Labadorf, Adam T; Myers, Richard H; Lunetta, Kathryn L; Dupuis, Josée; DeStefano, Anita L

    2017-02-06

    Next generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate the data adaptive method in order to compare false positive rates. When the sample size is small or the expression levels of a gene are highly dispersed, the NB regression shows inflated Type-I error rates but the Classical logistic and Bayes logistic (BL) regressions are conservative. Firth's logistic (FL) regression performs well or is slightly conservative. Large sample size and low dispersion generally make Type-I error rates of all methods close to nominal alpha levels of 0.05 and 0.01. However, Type-I error rates are controlled after applying the data adaptive method. The NB, BL, and FL regressions gain increased power with large sample size, large log2 fold-change, and low dispersion. The FL regression has comparable power to NB regression. We conclude that implementing the data adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth's logistic regression provides a concise statistical inference process and reduces spurious associations from inaccurately estimated dispersion parameters in the negative binomial framework.
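
    The two modeling directions compared in the paper can be sketched with statsmodels on simulated overdispersed counts for a single gene; the dispersion, effect sizes and sample size are invented.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 120
status = rng.integers(0, 2, n)                  # 0 = control, 1 = case
# Overdispersed counts, higher in cases (gamma-Poisson mixture).
mu = np.exp(3.0 + 0.6 * status)
counts = rng.poisson(rng.gamma(shape=2.0, scale=mu / 2.0))

# NB regression: counts as outcome, disease status as predictor.
nb = sm.GLM(counts, sm.add_constant(status),
            family=sm.families.NegativeBinomial()).fit()

# Logistic regression flips the model: status as outcome, expression as predictor.
logit = sm.Logit(status, sm.add_constant(np.log1p(counts))).fit(disp=0)

print("NB coef on status:        ", nb.params[1])
print("logistic coef on log1p(): ", logit.params[1])
```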

  9. NARMAX model identification of a palm oil biodiesel engine using multi-objective optimization differential evolution

    NASA Astrophysics Data System (ADS)

    Mansor, Zakwan; Zakaria, Mohd Zakimi; Nor, Azuwir Mohd; Saad, Mohd Sazli; Ahmad, Robiah; Jamaluddin, Hishamuddin

    2017-09-01

    This paper presents the black-box modelling of a palm oil biodiesel engine (POB) using the multi-objective optimization differential evolution (MOODE) algorithm. Two objective functions are considered in the algorithm for optimization: minimizing the number of terms of a model structure and minimizing the mean square error between actual and predicted outputs. The mathematical model used in this study to represent the POB system is the nonlinear auto-regressive moving average with exogenous input (NARMAX) model. Finally, model validity tests are applied in order to validate the possible models obtained from the MOODE algorithm, leading to the selection of an optimal model.
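
    MOODE itself searches over candidate structures; as a sketch of the underlying fitting step, the snippet below estimates one fixed NARX-style term set by least squares on simulated input-output data. A full NARMAX structure would add lagged noise terms, and the multi-objective search would trade the term count against this MSE.

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic input/output data from an assumed nonlinear dynamic system.
u = rng.uniform(-1, 1, 300)
y = np.zeros(300)
for t in range(2, 300):
    y[t] = (0.7 * y[t-1] - 0.2 * y[t-2] + 0.5 * u[t-1]
            + 0.3 * y[t-1] * u[t-1] + rng.normal(0, 0.02))

def regressors(t):
    """One candidate term set: lagged outputs, inputs, and cross/square terms."""
    return [y[t-1], y[t-2], u[t-1], u[t-2], y[t-1] * u[t-1], y[t-1] ** 2]

T = list(range(2, 300))
Phi = np.array([regressors(t) for t in T])
theta, *_ = np.linalg.lstsq(Phi, y[T], rcond=None)
mse = np.mean((Phi @ theta - y[T]) ** 2)
print("estimated parameters:", np.round(theta, 3), " MSE:", mse)
```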

  10. A methodology for the design of experiments in computational intelligence with multiple regression models.

    PubMed

    Fernandez-Lozano, Carlos; Gestal, Marcos; Munteanu, Cristian R; Dorado, Julian; Pazos, Alejandro

    2016-01-01

    The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is of relevance to use a statistical approach to indicate whether the differences are statistically significant using this kind of algorithms. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as for bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.

  12. Identification and quantification of ciprofloxacin in urine through excitation-emission fluorescence and three-way PARAFAC calibration.

    PubMed

    Ortiz, M C; Sarabia, L A; Sánchez, M S; Giménez, D

    2009-05-29

    Due to the second-order advantage, calibration models based on parallel factor analysis (PARAFAC) decomposition of three-way data are becoming important in routine analysis. This work studies the possibility of fitting PARAFAC models with excitation-emission fluorescence data for the determination of ciprofloxacin in human urine. The finally chosen PARAFAC decomposition is built with calibration samples spiked with ciprofloxacin, together with other series of urine samples that were also spiked. One of the series of samples also contained another drug because the patient was taking mesalazine. Mesalazine is a fluorescent substance that interferes with ciprofloxacin. Finally, the procedure is applied to samples of a patient who was being treated with ciprofloxacin. The trueness has been established by the regression of "predicted concentration versus added concentration". The recovery factor is 88.3% for ciprofloxacin in urine, and the mean of the absolute value of the relative errors is 4.2% for 46 test samples. The multivariate sensitivity of the fitted calibration model is evaluated by a regression between the PARAFAC loadings linked to ciprofloxacin and the true concentration in spiked samples. The multivariate capability of discrimination is near 8 microg L(-1) when the probabilities of false non-compliance and false compliance are fixed at 5%.

  13. Collaborative simulations and experiments for a novel yield model of coal devolatilization in oxy-coal combustion conditions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Iavarone, Salvatore; Smith, Sean T.; Smith, Philip J.

    Oxy-coal combustion is an emerging low-cost “clean coal” technology for emissions reduction and Carbon Capture and Sequestration (CCS). The use of Computational Fluid Dynamics (CFD) tools is crucial for the development of cost-effective oxy-fuel technologies and the minimization of environmental concerns at industrial scale. The coupling of detailed chemistry models and CFD simulations is still challenging, especially for large-scale plants, because of the high computational efforts required. The development of scale-bridging models is therefore necessary, to find a good compromise between computational efforts and the physical-chemical modeling precision. This paper presents a procedure for scale-bridging modeling of coal devolatilization, in the presence of experimental error, that puts emphasis on the thermodynamic aspect of devolatilization, namely the final volatile yield of coal, rather than kinetics. The procedure consists of an engineering approach based on dataset consistency and Bayesian methodology including Gaussian-Process Regression (GPR). Experimental data from devolatilization tests carried out in an oxy-coal entrained flow reactor were considered and CFD simulations of the reactor were performed. Jointly evaluating experiments and simulations, a novel yield model was validated against the data via consistency analysis. In parallel, a Gaussian-Process Regression was performed, to improve the understanding of the uncertainty associated with the devolatilization, based on the experimental measurements. Potential model forms that could predict yield during devolatilization were obtained. The set of model forms obtained via GPR includes the yield model that was proven to be consistent with the data. Finally, the overall procedure has resulted in a novel yield model for coal devolatilization and in a valuable evaluation of uncertainty in the data, in the model form, and in the model parameters.

  15. Real-time model learning using Incremental Sparse Spectrum Gaussian Process Regression.

    PubMed

    Gijsberts, Arjan; Metta, Giorgio

    2013-05-01

    Novel applications in unstructured and non-stationary human environments require robots that learn from experience and adapt autonomously to changing conditions. Predictive models therefore not only need to be accurate, but should also be updated incrementally in real-time and require minimal human intervention. Incremental Sparse Spectrum Gaussian Process Regression is an algorithm that is targeted specifically for use in this context. Rather than developing a novel algorithm from the ground up, the method is based on the thoroughly studied Gaussian Process Regression algorithm, therefore ensuring a solid theoretical foundation. Non-linearity and a bounded update complexity are achieved simultaneously by means of a finite dimensional random feature mapping that approximates a kernel function. As a result, the computational cost for each update remains constant over time. Finally, algorithmic simplicity and support for automated hyperparameter optimization ensures convenience when employed in practice. Empirical validation on a number of synthetic and real-life learning problems confirms that the performance of Incremental Sparse Spectrum Gaussian Process Regression is superior with respect to the popular Locally Weighted Projection Regression, while computational requirements are found to be significantly lower. The method is therefore particularly suited for learning with real-time constraints or when computational resources are limited. Copyright © 2012 Elsevier Ltd. All rights reserved.
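
    A compact sketch of the core idea, with a plain linear solve at prediction time (the published algorithm instead maintains a Cholesky factor via rank-one updates so that prediction stays cheap as well); all constants are illustrative.

```python
import numpy as np

class IncrementalRFFRegressor:
    """Sketch of sparse-spectrum GP regression: random Fourier features
    approximating an RBF kernel, with constant-time ridge updates."""

    def __init__(self, dim, n_features=100, lengthscale=1.0, reg=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0, 1.0 / lengthscale, (n_features, dim))
        self.b = rng.uniform(0, 2 * np.pi, n_features)
        self.A = reg * np.eye(n_features)    # running Phi^T Phi + reg*I
        self.c = np.zeros(n_features)        # running Phi^T y

    def _phi(self, x):
        # Finite-dimensional random feature map approximating the RBF kernel.
        return np.sqrt(2.0 / len(self.b)) * np.cos(self.W @ x + self.b)

    def update(self, x, y):
        """O(D^2) per sample, independent of how many samples were seen."""
        phi = self._phi(x)
        self.A += np.outer(phi, phi)
        self.c += y * phi

    def predict(self, x):
        w = np.linalg.solve(self.A, self.c)
        return self._phi(x) @ w

model = IncrementalRFFRegressor(dim=1, n_features=200, lengthscale=0.5, seed=1)
rng = np.random.default_rng(2)
for _ in range(500):                         # stream of observations
    x = rng.uniform(-3, 3, 1)
    model.update(x, np.sin(x[0]) + rng.normal(0, 0.1))
print(model.predict(np.array([1.0])), "vs true", np.sin(1.0))
```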

  16. Molecular Classification Substitutes for the Prognostic Variables Stage, Age, and MYCN Status in Neuroblastoma Risk Assessment.

    PubMed

    Rosswog, Carolina; Schmidt, Rene; Oberthuer, André; Juraeva, Dilafruz; Brors, Benedikt; Engesser, Anne; Kahlert, Yvonne; Volland, Ruth; Bartenhagen, Christoph; Simon, Thorsten; Berthold, Frank; Hero, Barbara; Faldum, Andreas; Fischer, Matthias

    2017-12-01

    Current risk stratification systems for neuroblastoma patients consider clinical, histopathological, and genetic variables, and additional prognostic markers have been proposed in recent years. We here sought to select highly informative covariates in a multistep strategy based on consecutive Cox regression models, resulting in a risk score that integrates hazard ratios of prognostic variables. A cohort of 695 neuroblastoma patients was divided into a discovery set (n=75) for multigene predictor generation, a training set (n=411) for risk score development, and a validation set (n=209). Relevant prognostic variables were identified by stepwise multivariable L1-penalized least absolute shrinkage and selection operator (LASSO) Cox regression, followed by backward selection in multivariable Cox regression, and then integrated into a novel risk score. The variables stage, age, MYCN status, and two multigene predictors, NB-th24 and NB-th44, were selected as independent prognostic markers by LASSO Cox regression analysis. Following backward selection, only the multigene predictors were retained in the final model. Integration of these classifiers in a risk scoring system distinguished three patient subgroups that differed substantially in their outcome. The scoring system discriminated patients with diverging outcome in the validation cohort (5-year event-free survival, 84.9±3.4 vs 63.6±14.5 vs 31.0±5.4; P<.001), and its prognostic value was validated by multivariable analysis. We here propose a translational strategy for developing risk assessment systems based on hazard ratios of relevant prognostic variables. Our final neuroblastoma risk score comprised two multigene predictors only, supporting the notion that molecular properties of the tumor cells strongly impact clinical courses of neuroblastoma patients. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  17. Research on Influence and Prediction Model of Urban Traffic Link Tunnel curvature on Fire Temperature Based on Pyrosim--SPSS Multiple Regression Analysis

    NASA Astrophysics Data System (ADS)

    Li, Xiao Ju; Yao, Kun; Dai, Jun Yu; Song, Yun Long

    2018-05-01

    The underground space, also known as the “fourth dimension” of the city, reflects the intensive and efficient use of urban development space. An urban traffic link tunnel is a typical underground space of limited length. Due to its geographical location, special spatial structure and curvature, high-temperature smoke can easily produce the phenomenon of “smoke turning”, and the fire risk is extremely high. This paper takes an urban traffic link tunnel as an example and focuses on the relationship between curvature and the temperature near the fire source. Fire models with different curvatures were built in Pyrosim to analyze the influence of curvature on fire temperature, and SPSS multivariate regression analysis was then applied to the tunnel curvature and fire temperature data. Finally, a model predicting the effect of urban traffic link tunnel curvature on fire temperature was proposed. The regression model analysis and tests show that curvature is negatively correlated with tunnel temperature. The model is feasible and can provide a theoretical reference for urban traffic link tunnel fire protection design and the preparation of evacuation plans, as well as some reference for the curvature design and smoke control measures of other curved tunnels.

  18. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges

    PubMed Central

    Goldstein, Benjamin A.; Navar, Ann Marie; Carter, Rickey E.

    2017-01-01

    Abstract Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors, which operate in the same way on everyone and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for the development of risk prediction models. Typically presented as black-box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis and are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider trying to predict mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods, including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. PMID:27436868

  19. Simulation of urban land surface temperature based on sub-pixel land cover in a coastal city

    NASA Astrophysics Data System (ADS)

    Zhao, Xiaofeng; Deng, Lei; Feng, Huihui; Zhao, Yanchuang

    2014-11-01

    Sub-pixel urban land cover has been shown to correlate clearly with land surface temperature (LST), yet these relationships have seldom been used to simulate LST. In this study we provide a new approach to urban LST simulation based on sub-pixel land cover modeling. Landsat TM/ETM+ images of Xiamen city, China, from January of 2002 and of 2007 were used to derive land cover and to extract the transformation rule using logistic regression. After normalization, the transformation possibility of each class was taken as its percentage within the pixel. Cellular automata were then used to obtain simulated sub-pixel land cover for 2007 and 2017. In parallel, the correlations between the retrieved LST and the sub-pixel land cover obtained by spectral mixture analysis in 2002 were examined and a regression model was built. This regression model was then applied to the simulated 2007 land cover to model the LST of 2007. Finally, the LST of 2017 was simulated for urban planning and management. The results showed that our method is useful for LST simulation. Although the simulation accuracy is not yet fully satisfactory, the approach provides an important idea and a good starting point for the modeling of urban LST.

  20. Development and validation of a mortality risk model for pediatric sepsis.

    PubMed

    Chen, Mengshi; Lu, Xiulan; Hu, Li; Liu, Pingping; Zhao, Wenjiao; Yan, Haipeng; Tang, Liang; Zhu, Yimin; Xiao, Zhenghui; Chen, Lizhang; Tan, Hongzhuan

    2017-05-01

    Pediatric sepsis is a burdensome public health problem. Assessing the mortality risk of pediatric sepsis patients, offering effective treatment guidance, and improving prognosis to reduce mortality rates, are crucial. We extracted data derived from electronic medical records of pediatric sepsis patients that were collected during the first 24 hours after admission to the pediatric intensive care unit (PICU) of the Hunan Children's hospital from January 2012 to June 2014. A total of 788 children were randomly divided into a training (592, 75%) and validation group (196, 25%). The risk factors for mortality among these patients were identified by conducting multivariate logistic regression in the training group. Based on the established logistic regression equation, the logit probabilities for all patients (in both groups) were calculated to verify the model's internal and external validities. According to the training group, 6 variables (brain natriuretic peptide, albumin, total bilirubin, D-dimer, lactate levels, and mechanical ventilation in 24 hours) were included in the final logistic regression model. The areas under the curves of the model were 0.854 (0.826, 0.881) and 0.844 (0.816, 0.873) in the training and validation groups, respectively. The Mortality Risk Model for Pediatric Sepsis we established in this study showed acceptable accuracy to predict the mortality risk in pediatric sepsis patients.

  2. On the interest of combining an analog model to a regression model for the adaptation of the downscaling link. Application to probabilistic prediction of precipitation over France.

    NASA Astrophysics Data System (ADS)

    Chardon, Jérémy; Hingray, Benoit; Favre, Anne-Catherine

    2016-04-01

    Scenarios of surface weather required for impact studies have to be unbiased and adapted to the space and time scales of the considered hydro-systems. Hence, surface weather scenarios obtained directly from global climate models and/or numerical weather prediction models are not really appropriate, and the outputs of these models have to be post-processed, which is often carried out with Statistical Downscaling Methods (SDMs). Among those SDMs, approaches based on regression are often applied. For a given station, a regression link can be established between a set of large-scale atmospheric predictors and the surface weather variable. These links are then used for the prediction of the latter. However, the physical processes generating surface weather vary in time; this is well known for precipitation, for instance. The most relevant predictors and the regression link are therefore also likely to vary in time. A better prediction skill is thus classically obtained with a seasonal stratification of the data. Another strategy is to identify the most relevant predictor set and establish the regression link from dates that are similar - or analog - to the target date. In practice, these dates can be selected thanks to an analog model. In this study, we explore the possibility of improving the local performance of an analog model - where the analogy is applied to the geopotential heights at 1000 and 500 hPa - using additional local-scale predictors for the probabilistic prediction of Safran precipitation over France. For each prediction day, the prediction is obtained from two GLM regression models - one for the occurrence and one for the quantity of precipitation - whose predictors and parameters are estimated from the analog dates. Firstly, the resulting combined model noticeably increases prediction performance by adapting the downscaling link for each prediction day. Secondly, the predictors selected for a given prediction depend on the large-scale situation and on the considered region. Finally, even with such an adaptive predictor identification, the downscaling link appears to be robust: for a same prediction day, the predictors selected for different locations of a given region are similar, and the regression parameters are consistent within the region of interest.

  3. PM10 modeling in the Oviedo urban area (Northern Spain) by using multivariate adaptive regression splines

    NASA Astrophysics Data System (ADS)

    Nieto, Paulino José García; Antón, Juan Carlos Álvarez; Vilán, José Antonio Vilán; García-Gonzalo, Esperanza

    2014-10-01

    The aim of this research work is to build a regression model of particulate matter up to 10 micrometers in size (PM10) in the Oviedo urban area (Northern Spain) at local scale using the multivariate adaptive regression splines (MARS) technique, a nonparametric regression algorithm with the ability to approximate the relationship between inputs and outputs and express that relationship mathematically. In this sense, hazardous air pollutants or toxic air contaminants refer to any substance that may cause or contribute to an increase in mortality or serious illness, or that may pose a present or potential hazard to human health. To accomplish the objective of this study, experimental data on nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3) and dust (PM10) were collected over 3 years (2006-2008) and used to create a highly nonlinear model of PM10 in the Oviedo urban nucleus based on the MARS technique. One main objective of this model is to obtain a preliminary estimate of the dependence between PM10 and the other main pollutants in the Oviedo urban area at local scale. A second aim is to determine the factors with the greatest bearing on air quality with a view to proposing health and lifestyle improvements. The United States National Ambient Air Quality Standards (NAAQS) establish the limit values of the main pollutants in the atmosphere in order to protect human health. Firstly, this MARS regression model captures the main insight of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of these numerical calculations, the conclusions of this research work are presented.
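
    For readers unfamiliar with MARS, the sketch below shows its key ingredient: hinge functions max(0, x - t) and max(0, t - x) combined by least squares. The knots are fixed by hand here, whereas a real MARS implementation searches knot locations and prunes terms; the variable names and data are illustrative only.

      # Hand-rolled hinge basis at fixed knots; not a full MARS algorithm.
      import numpy as np

      def hinge_basis(x, knots):
          """Columns max(0, x - t) and max(0, t - x) for each knot t, plus intercept."""
          cols = [np.maximum(0.0, x - t) for t in knots] + \
                 [np.maximum(0.0, t - x) for t in knots]
          return np.column_stack([np.ones_like(x)] + cols)

      rng = np.random.default_rng(2)
      nox = rng.uniform(0, 100, 300)            # e.g. NOx as a single predictor
      pm10 = 20 + 0.3 * np.maximum(0, nox - 40) + rng.normal(0, 2, 300)

      B = hinge_basis(nox, knots=[25, 40, 60])
      coef, *_ = np.linalg.lstsq(B, pm10, rcond=None)
      pred = B @ coef
      print("R^2:", 1 - np.var(pm10 - pred) / np.var(pm10))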

  4. A regression-kriging model for estimation of rainfall in the Laohahe basin

    NASA Astrophysics Data System (ADS)

    Wang, Hong; Ren, Li L.; Liu, Gao H.

    2009-10-01

    This paper presents a multivariate geostatistical algorithm called regression-kriging (RK) for predicting the spatial distribution of rainfall by incorporating five topographic/geographic factors: latitude, longitude, altitude, slope and aspect. The technique is illustrated using rainfall data collected at 52 rain gauges in the Laohahe basin in northeast China during 1986-2005. Rainfall data from 44 stations were selected for modeling and the remaining 8 stations were used for model validation. To eliminate multicollinearity, the five explanatory factors were first transformed using factor analysis, with three Principal Components (PCs) extracted. The rainfall data were then fitted using step-wise regression, and the residuals were interpolated using simple kriging (SK). The regression coefficients were estimated by generalized least squares (GLS), which takes the spatial heteroskedasticity between rainfall and the PCs into account. Finally, the rainfall prediction based on RK was compared with that predicted from ordinary kriging (OK) and ordinary least squares (OLS) multiple regression (MR). Because correlated topographic factors are taken into account, RK improves the efficiency of predictions. RK achieved a lower relative root mean square error (RMSE) (44.67%) than MR (49.23%) and OK (73.60%) and a lower bias than MR and OK (23.82 versus 30.89 and 32.15 mm) for annual rainfall. It is much more effective for the wet season than for the dry season. RK is suitable for estimation of rainfall in areas where there are no stations nearby and where topography has a major influence on rainfall.
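
    A compact two-step sketch of regression-kriging follows: regress rainfall on principal components of the topographic factors, then krige the residuals and add the two parts back together. It assumes the optional pykrige package, uses ordinary least squares in place of the paper's GLS step-wise fit, and runs on synthetic station data.

      # Hedged regression-kriging sketch; assumes pykrige is installed.
      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.linear_model import LinearRegression
      from pykrige.ok import OrdinaryKriging

      rng = np.random.default_rng(3)
      n = 44
      lon, lat = rng.uniform(118, 120, n), rng.uniform(41, 43, n)
      topo = rng.normal(size=(n, 5))          # latitude, longitude, altitude, slope, aspect
      rain = 400 + 20 * (topo @ rng.normal(size=5)) + rng.normal(0, 10, n)

      pcs = PCA(n_components=3).fit_transform(topo)   # decorrelate the five factors
      trend = LinearRegression().fit(pcs, rain)
      resid = rain - trend.predict(pcs)

      ok = OrdinaryKriging(lon, lat, resid, variogram_model="spherical")
      resid_hat, _ = ok.execute("points", lon, lat)   # validation points in practice
      rk_pred = trend.predict(pcs) + resid_hat
      print("RMSE at training stations:", np.sqrt(np.mean((rk_pred - rain) ** 2)))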

  5. A Cross-Domain Collaborative Filtering Algorithm Based on Feature Construction and Locally Weighted Linear Regression

    PubMed Central

    Jiang, Feng; Han, Ji-zhong

    2018-01-01

    Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot effectively evaluate the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted into a weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid the underfitting or overfitting problems that occur in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods. PMID:29623088
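
    Locally weighted linear regression, the final step of FCLWLR, fits a separate weighted least-squares problem for every query point, with kernel weights that decay with distance from the query. A plain numpy sketch on illustrative data (not the paper's rating matrices):

      # Textbook LWLR with a Gaussian kernel; bandwidth tau is a free parameter.
      import numpy as np

      def lwlr_predict(x_query, X, y, tau=0.5):
          """Predict y at x_query with Gaussian kernel weights of bandwidth tau."""
          Xb = np.column_stack([np.ones(len(X)), X])           # add intercept
          w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
          W = np.diag(w)
          theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)
          return np.array([1.0, *np.atleast_1d(x_query)]) @ theta

      rng = np.random.default_rng(4)
      X = rng.uniform(-3, 3, size=(200, 1))
      y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)            # nonlinear target
      print(lwlr_predict(np.array([1.0]), X, y), "vs true", np.sin(1.0))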

  6. A Cross-Domain Collaborative Filtering Algorithm Based on Feature Construction and Locally Weighted Linear Regression.

    PubMed

    Yu, Xu; Lin, Jun-Yu; Jiang, Feng; Du, Jun-Wei; Han, Ji-Zhong

    2018-01-01

    Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot effectively evaluate the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted into a weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid the underfitting or overfitting problems that occur in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods.

  7. Hypothesis testing in functional linear regression models with Neyman's truncation and wavelet thresholding for longitudinal data.

    PubMed

    Yang, Xiaowei; Nie, Kun

    2008-03-15

    Longitudinal data sets in biomedical research often consist of large numbers of repeated measures. In many cases, the trajectories do not look globally linear or polynomial, making it difficult to summarize the data or test hypotheses using standard longitudinal data analysis based on various linear models. An alternative approach is to apply functional data analysis, which directly targets the continuous nonlinear curves underlying discretely sampled repeated measures. For the purposes of data exploration, many functional data analysis strategies have been developed based on various schemes of smoothing, but fewer options are available for making causal inferences regarding predictor-outcome relationships, a common task in hypothesis-driven medical studies. To compare groups of curves, two testing strategies with good power have been proposed for high-dimensional analysis of variance: the Fourier-based adaptive Neyman test and the wavelet-based thresholding test. Using a smoking cessation clinical trial data set, this paper demonstrates how to extend these strategies for hypothesis testing into the framework of functional linear regression models (FLRMs) with continuous functional responses and categorical or continuous scalar predictors. The analysis procedure consists of three steps: first, apply the Fourier or wavelet transform to the original repeated measures; then fit a multivariate linear model in the transformed domain; and finally, test the regression coefficients using either adaptive Neyman or thresholding statistics. Since an FLRM can be viewed as a natural extension of the traditional multiple linear regression model, the development of this model and computational tools should enhance the capacity of medical statistics for longitudinal data.
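
    The three-step procedure can be sketched as follows, using the Fourier route: transform each subject's curve, fit a per-coefficient linear model on a scalar predictor, and pool the standardized coefficients with an adaptive Neyman statistic. This is a schematic reading of the procedure, keeping only the real parts of the Fourier coefficients for simplicity; the data and effect sizes are synthetic.

      # Schematic Fourier-domain test; adaptive Neyman statistic
      # T = max_m sum_{j<=m} (z_j^2 - 1) / sqrt(2m).
      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(5)
      n_subj, n_time = 60, 32
      group = rng.binomial(1, 0.5, n_subj).astype(float)       # scalar predictor
      t = np.linspace(0, 1, n_time)
      curves = (np.sin(2 * np.pi * t) + 0.5 * group[:, None] * t
                + rng.normal(0, 0.3, (n_subj, n_time)))

      coeffs = np.fft.rfft(curves, axis=1).real                # step 1: transform
      X = sm.add_constant(group)
      z = np.empty(coeffs.shape[1])
      for j in range(coeffs.shape[1]):                         # step 2: fit per coefficient
          res = sm.OLS(coeffs[:, j], X).fit()
          z[j] = res.params[1] / res.bse[1]

      m = np.arange(1, len(z) + 1)                             # step 3: adaptive Neyman
      print("adaptive Neyman statistic:", np.max(np.cumsum(z ** 2 - 1) / np.sqrt(2 * m)))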

  8. Quality Reporting of Multivariable Regression Models in Observational Studies: Review of a Representative Sample of Articles Published in Biomedical Journals.

    PubMed

    Real, Jordi; Forné, Carles; Roso-Llorach, Albert; Martínez-Sánchez, Jose M

    2016-05-01

    Controlling for confounders is a crucial step in analytical observational studies, and multivariable models are widely used as statistical adjustment techniques. However, the validation of the assumptions of the multivariable regression models (MRMs) should be made clear in scientific reporting. The objective of this study was to review the quality of statistical reporting of the most commonly used MRMs (logistic, linear, and Cox regression) that were applied in analytical observational studies published between 2003 and 2014 by journals indexed in MEDLINE. We reviewed a representative sample of articles indexed in MEDLINE (n = 428) with an observational design and use of MRMs (logistic, linear, and Cox regression). We assessed the quality of reporting of model assumptions and goodness-of-fit, interactions, sensitivity analysis, crude and adjusted effect estimates, and specification of more than 1 adjusted model. The tests of underlying assumptions or goodness-of-fit of the MRMs used were described in 26.2% (95% CI: 22.0-30.3) of the articles, and 18.5% (95% CI: 14.8-22.1) reported the interaction analysis. Reporting of all items assessed was higher in articles published in journals with a higher impact factor. A low percentage of articles indexed in MEDLINE that used multivariable techniques provided information demonstrating rigorous application of the model selected as an adjustment method. Given the importance of these methods to the final results and conclusions of observational studies, greater rigor is required in reporting the use of MRMs in the scientific literature.

  9. Interrupted time series regression for the evaluation of public health interventions: a tutorial.

    PubMed

    Bernal, James Lopez; Cummins, Steven; Gasparrini, Antonio

    2017-02-01

    Interrupted time series (ITS) analysis is a valuable study design for evaluating the effectiveness of population-level health interventions that have been implemented at a clearly defined point in time. It is increasingly being used to evaluate the effectiveness of interventions ranging from clinical therapy to national public health legislation. Whereas the design shares many properties of regression-based approaches in other epidemiological studies, there are a range of unique features of time series data that require additional methodological considerations. In this tutorial we use a worked example to demonstrate a robust approach to ITS analysis using segmented regression. We begin by describing the design and considering when ITS is an appropriate design choice. We then discuss the essential, yet often omitted, step of proposing the impact model a priori. Subsequently, we demonstrate the approach to statistical analysis including the main segmented regression model. Finally we describe the main methodological issues associated with ITS analysis: over-dispersion of time series data, autocorrelation, adjusting for seasonal trends and controlling for time-varying confounders, and we also outline some of the more complex design adaptations that can be used to strengthen the basic ITS design.
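
    The core segmented-regression model is small enough to write down directly: a baseline trend, a level change at the intervention, and a slope change afterwards. The sketch below fits this on simulated monthly counts with a Poisson GLM; it mirrors the structure of the tutorial's worked example but is not the tutorial's code or data.

      # Minimal ITS segmented regression on synthetic data, assuming statsmodels.
      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(6)
      months = np.arange(48)
      post = (months >= 24).astype(float)                 # intervention at month 24
      time_since = np.where(post == 1, months - 24, 0)

      mu = np.exp(3.0 + 0.01 * months - 0.3 * post - 0.02 * time_since)
      y = rng.poisson(mu)

      X = sm.add_constant(np.column_stack([months, post, time_since]))
      fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
      print(fit.params)   # intercept, baseline trend, level change, slope change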

  10. Interrupted time series regression for the evaluation of public health interventions: a tutorial

    PubMed Central

    Bernal, James Lopez; Cummins, Steven; Gasparrini, Antonio

    2017-01-01

    Abstract Interrupted time series (ITS) analysis is a valuable study design for evaluating the effectiveness of population-level health interventions that have been implemented at a clearly defined point in time. It is increasingly being used to evaluate the effectiveness of interventions ranging from clinical therapy to national public health legislation. Whereas the design shares many properties of regression-based approaches in other epidemiological studies, there are a range of unique features of time series data that require additional methodological considerations. In this tutorial we use a worked example to demonstrate a robust approach to ITS analysis using segmented regression. We begin by describing the design and considering when ITS is an appropriate design choice. We then discuss the essential, yet often omitted, step of proposing the impact model a priori. Subsequently, we demonstrate the approach to statistical analysis including the main segmented regression model. Finally we describe the main methodological issues associated with ITS analysis: over-dispersion of time series data, autocorrelation, adjusting for seasonal trends and controlling for time-varying confounders, and we also outline some of the more complex design adaptations that can be used to strengthen the basic ITS design. PMID:27283160

  11. Regression analysis for LED color detection of visual-MIMO system

    NASA Astrophysics Data System (ADS)

    Banik, Partha Pratim; Saha, Rappy; Kim, Ki-Doo

    2018-04-01

    Color detection from a light emitting diode (LED) array using a smartphone camera is very difficult in a visual multiple-input multiple-output (visual-MIMO) system. In this paper, we propose a method to determine the LED color using a smartphone camera by applying regression analysis. We employ a multivariate regression model to identify the LED color. After taking a picture of an LED array, we select the LED array region and detect the LEDs using an image processing algorithm. We then apply the k-means clustering algorithm to determine the number of potential colors for feature extraction of each LED. Finally, we apply the multivariate regression model to predict the color of the transmitted LEDs. In this paper, we show our results for three types of environmental light conditions: room environmental light, low environmental light (560 lux), and strong environmental light (2450 lux). We compare the results of our proposed algorithm in terms of training and test R-Square (%) values and the percentage of closeness between transmitted and predicted colors, and we also examine the number of distorted test data points using a distortion bar graph in the CIE1931 color space.
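
    A rough sketch of that pipeline: cluster the pixels of each detected LED with k-means to get dominant-colour features, then map features to the transmitted RGB values with a multi-output linear regression. Everything below is synthetic and illustrative, including the choice of three clusters per LED.

      # Hedged sketch of k-means feature extraction plus multivariate regression.
      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(7)

      def led_feature(pixels, k=3):
          """Dominant-colour feature: cluster centres sorted by cluster size."""
          km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
          order = np.argsort(-np.bincount(km.labels_, minlength=k))
          return km.cluster_centers_[order].ravel()

      true_rgb = rng.uniform(0, 255, size=(100, 3))                # transmitted colours
      features = np.array([led_feature(c + rng.normal(0, 8, (50, 3)))
                           for c in true_rgb])                     # 50 noisy pixels per LED

      reg = LinearRegression().fit(features[:80], true_rgb[:80])   # train
      pred = reg.predict(features[80:])                            # test
      print("mean abs error per channel:", np.abs(pred - true_rgb[80:]).mean(axis=0))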

  12. Searching for a two-factor model of marriage duration: commentary on Gottman and Levenson.

    PubMed

    DeKay, Michael L; Greeno, Catherine G; Houck, Patricia R

    2002-01-01

    Gottman and Levenson (2002) report a number of post hoc ordinary least squares regressions to "predict" the length of marriage, given that divorce has occurred. We argue that the type of statistical model they use is inappropriate for answering clinically relevant questions about the causes and timing of divorce, and present several reasons why an alternative family of models called duration models would be more appropriate. The distribution of marriage length is not bimodal, as Gottman and Levenson suggest, and their search for a two-factor model for explaining marriage length is misguided. Their regression models omit many variables known to affect marriage length, and instead use variables that were pre-screened for their predictive ability. Their final model is based on data for only 15 cases, including one unusual case that has undue influence on the results. For these and other technical reasons presented in the text, we believe that Gottman and Levenson's results are not replicable, and that they should not be used to guide interventions for couples in clinical settings.

  13. The Effect of Alcohol Abuse and Dependence and School Experiences on Depression: A National Study of Adolescents

    ERIC Educational Resources Information Center

    Merianos, Ashley L.; King, Keith A.; Vidourek, Rebecca A.; Hardee, Angelica M.

    2016-01-01

    The study purpose was to examine the effect alcohol abuse/dependence and school experiences have on depression among a nationwide sample of adolescents. A secondary analysis of the 2013 National Survey on Drug Use and Health was conducted. The results of the final multivariable logistic regression model revealed that adolescents who reported…

  14. Genotype-phenotype association study via new multi-task learning model

    PubMed Central

    Huo, Zhouyuan; Shen, Dinggang

    2018-01-01

    Research on the associations between genetic variations and imaging phenotypes is developing with advances in high-throughput genotyping and brain imaging techniques. Regression analysis of single nucleotide polymorphisms (SNPs) and imaging measures as quantitative traits (QTs) has been proposed to identify the quantitative trait loci (QTL) via multi-task learning models. Recent studies consider the interlinked structures within SNPs and imaging QTs through group lasso, e.g. the ℓ2,1-norm, leading to better predictive results and insights into SNPs. However, group sparsity is not enough for representing the correlation between multiple tasks, and ℓ2,1-norm regularization is not robust either. In this paper, we propose a new multi-task learning model to analyze the associations between SNPs and QTs. We suppose that a low-rank structure is also beneficial to uncover the correlation between genetic variations and imaging phenotypes. Finally, we conduct regression analysis of SNPs and QTs. Experimental results show that our model is more accurate in prediction than the compared methods and provides new insights into SNPs. PMID:29218896

  15. Predictive model of third molar eruption after second molar extraction.

    PubMed

    De-la-Rosa-Gay, Cristina; Valmaseda-Castellón, Eduard; Gay-Escoda, Cosme

    2010-03-01

    Extraction of second permanent molars is an option for providing space in orthodontic treatment. Although many articles have described its impact on the outcome, there are few data on the prognosis of the eruption of the adjacent third molars. The aims of this investigation were to provide predictive models of eruption of third molars after second permanent molar extraction and to validate them. A total of 48 patients (ages, 11-23 years) who had 128 second permanent molars (54 maxillary, 74 mandibular) extracted during orthodontic treatment were followed until eruption of the third molars was complete. A linear regression model predicted the final angle of the third molars relative to the permanent first molar by using the variables of initial angle, jaw, and the developmental stage of the third molar. A logistic regression model predicted the probability of correct eruption by using the variables of initial angle, jaw, sex, age, and the developmental stage of the third molar. 2010 American Association of Orthodontists. Published by Mosby, Inc. All rights reserved.

  16. Using a Guided Machine Learning Ensemble Model to Predict Discharge Disposition following Meningioma Resection.

    PubMed

    Muhlestein, Whitney E; Akagi, Dallin S; Kallos, Justiss A; Morone, Peter J; Weaver, Kyle D; Thompson, Reid C; Chambless, Lola B

    2018-04-01

    Objective  Machine learning (ML) algorithms are powerful tools for predicting patient outcomes. This study pilots a novel approach to algorithm selection and model creation using prediction of discharge disposition following meningioma resection as a proof of concept. Materials and Methods  A diversity of ML algorithms were trained on a single-institution database of meningioma patients to predict discharge disposition. Algorithms were ranked by predictive power and top performers were combined to create an ensemble model. The final ensemble was internally validated on never-before-seen data to demonstrate generalizability. The predictive power of the ensemble was compared with a logistic regression. Further analyses were performed to identify how important variables impact the ensemble. Results  Our ensemble model predicted disposition significantly better than a logistic regression (area under the curve of 0.78 and 0.71, respectively, p  = 0.01). Tumor size, presentation at the emergency department, body mass index, convexity location, and preoperative motor deficit most strongly influence the model, though the independent impact of individual variables is nuanced. Conclusion  Using a novel ML technique, we built a guided ML ensemble model that predicts discharge destination following meningioma resection with greater predictive power than a logistic regression, and that provides greater clinical insight than a univariate analysis. These techniques can be extended to predict many other patient outcomes of interest.
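
    The comparison reported above can be sketched generically: train a soft-voting ensemble of several algorithm families and a plain logistic regression on the same split, then compare held-out AUCs. The estimator choices and synthetic data below are placeholders, not the authors' guided selection procedure.

      # Hedged sketch: soft-voting ensemble versus logistic regression baseline.
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                                    VotingClassifier)
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import train_test_split

      X, y = make_classification(n_samples=600, n_features=15, random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

      ensemble = VotingClassifier(
          estimators=[("rf", RandomForestClassifier(random_state=0)),
                      ("gb", GradientBoostingClassifier(random_state=0)),
                      ("lr", LogisticRegression(max_iter=1000))],
          voting="soft",
      ).fit(X_tr, y_tr)
      baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

      print("ensemble AUC:", roc_auc_score(y_te, ensemble.predict_proba(X_te)[:, 1]))
      print("logistic AUC:", roc_auc_score(y_te, baseline.predict_proba(X_te)[:, 1]))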

  17. Parameter estimation in Cox models with missing failure indicators and the OPPERA study.

    PubMed

    Brownstein, Naomi C; Cai, Jianwen; Slade, Gary D; Bair, Eric

    2015-12-30

    In a prospective cohort study, examining all participants for incidence of the condition of interest may be prohibitively expensive. For example, the "gold standard" for diagnosing temporomandibular disorder (TMD) is a physical examination by a trained clinician. In large studies, examining all participants in this manner is infeasible. Instead, it is common to use questionnaires to screen for incidence of TMD and perform the "gold standard" examination only on participants who screen positively. Unfortunately, some participants may leave the study before receiving the "gold standard" examination. Within the framework of survival analysis, this results in missing failure indicators. Motivated by the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) study, a large cohort study of TMD, we propose a method for parameter estimation in survival models with missing failure indicators. We estimate the probability of being an incident case for those lacking a "gold standard" examination using logistic regression. These estimated probabilities are used to generate multiple imputations of case status for each missing examination that are combined with observed data in appropriate regression models. The variance introduced by the procedure is estimated using multiple imputation. The method can be used to estimate both regression coefficients in Cox proportional hazard models as well as incidence rates using Poisson regression. We simulate data with missing failure indicators and show that our method performs as well as or better than competing methods. Finally, we apply the proposed method to data from the OPPERA study. Copyright © 2015 John Wiley & Sons, Ltd.
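
    The imputation idea can be condensed as follows: estimate P(case) by logistic regression for participants who never received the gold-standard examination, draw case status several times, fit a Cox model per draw, and pool with Rubin's rules. The sketch uses synthetic data and statsmodels' PHReg as a stand-in for the authors' estimators.

      # Hedged sketch of multiple imputation for missing failure indicators.
      import numpy as np
      import statsmodels.api as sm
      from statsmodels.duration.hazard_regression import PHReg

      rng = np.random.default_rng(8)
      n = 500
      x = rng.normal(size=(n, 1))                       # covariate of interest
      time = rng.exponential(np.exp(-0.5 * x[:, 0]))
      event = rng.binomial(1, 0.7, n).astype(float)
      miss = rng.binomial(1, 0.2, n).astype(bool)       # exam never performed

      screen = sm.add_constant(rng.normal(event, 1.0))  # questionnaire screening score
      p_case = sm.GLM(event[~miss], screen[~miss],
                      family=sm.families.Binomial()).fit().predict(screen[miss])

      n_imp, betas, variances = 20, [], []
      for _ in range(n_imp):                            # multiple imputations
          ev = event.copy()
          ev[miss] = rng.binomial(1, p_case)
          res = PHReg(time, x, status=ev).fit()
          betas.append(res.params[0])
          variances.append(res.bse[0] ** 2)

      b = np.mean(betas)                                # Rubin's rules
      var = np.mean(variances) + (1 + 1 / n_imp) * np.var(betas, ddof=1)
      print("pooled log-HR:", b, "SE:", np.sqrt(var))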

  18. Characterizing the spatial distribution of ambient ultrafine particles in Toronto, Canada: A land use regression model.

    PubMed

    Weichenthal, Scott; Van Ryswyk, Keith; Goldstein, Alon; Shekarrizfard, Maryam; Hatzopoulou, Marianne

    2016-01-01

    Exposure models are needed to evaluate the chronic health effects of ambient ultrafine particles (UFPs; <0.1 μm). We developed a land use regression model for ambient UFPs in Toronto, Canada using mobile monitoring data collected during summer/winter 2010-2011. In total, 405 road segments were included in the analysis. The final model explained 67% of the spatial variation in mean UFPs and included terms for the logarithm of distances to highways, major roads, the central business district, Pearson airport, and bus routes as well as variables for the number of on-street trees, parks, open space, and the length of bus routes within a 100 m buffer. There was no systematic difference between measured and predicted values when the model was evaluated in an external dataset, although the R² value decreased (R² = 50%). This model will be used to evaluate the chronic health effects of UFPs using population-based cohorts in the Toronto area. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.
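
    Structurally, a land use regression of this kind is an ordinary least squares fit with log-distance and buffer terms as predictors, as in the sketch below. The predictors and coefficients are synthetic stand-ins for the GIS-derived variables named in the abstract.

      # Hedged LUR sketch with log-distance predictors, assuming statsmodels.
      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(9)
      n = 405                                        # road segments
      d_highway = rng.uniform(20, 5000, n)
      d_major = rng.uniform(10, 2000, n)
      trees_100m = rng.poisson(15, n)

      ufp = (25000 - 3000 * np.log(d_highway) - 1500 * np.log(d_major)
             - 200 * trees_100m + rng.normal(0, 2000, n))

      X = sm.add_constant(np.column_stack([np.log(d_highway), np.log(d_major), trees_100m]))
      fit = sm.OLS(ufp, X).fit()
      print("R^2:", fit.rsquared)                    # cf. the 67% reported above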

  19. PREDICTION OF MALIGNANT BREAST LESIONS FROM MRI FEATURES: A COMPARISON OF ARTIFICIAL NEURAL NETWORK AND LOGISTIC REGRESSION TECHNIQUES

    PubMed Central

    McLaren, Christine E.; Chen, Wen-Pin; Nie, Ke; Su, Min-Ying

    2009-01-01

    Rationale and Objectives Dynamic contrast enhanced MRI (DCE-MRI) is a clinical imaging modality for detection and diagnosis of breast lesions. Analytical methods were compared for diagnostic feature selection and performance of lesion classification to differentiate between malignant and benign lesions in patients. Materials and Methods The study included 43 malignant and 28 benign histologically-proven lesions. Eight morphological parameters, ten gray level co-occurrence matrices (GLCM) texture features, and fourteen Laws’ texture features were obtained using automated lesion segmentation and quantitative feature extraction. Artificial neural network (ANN) and logistic regression analysis were compared for selection of the best predictors of malignant lesions among the normalized features. Results Using ANN, the final four selected features were compactness, energy, homogeneity, and Law_LS, with area under the receiver operating characteristic curve (AUC) = 0.82, and accuracy = 0.76. The diagnostic performance of these four features computed on the basis of logistic regression yielded AUC = 0.80 (95% CI, 0.688 to 0.905), similar to that of ANN. The analysis also shows that the odds of a malignant lesion decreased by 48% (95% CI, 25% to 92%) for every increase of 1 SD in the Law_LS feature, adjusted for differences in compactness, energy, and homogeneity. Using logistic regression with z-score transformation, a model comprised of compactness, NRL entropy, and gray level sum average was selected, and it had the highest overall accuracy of 0.75 among all models, with AUC = 0.77 (95% CI, 0.660 to 0.880). When logistic modeling of transformations using the Box-Cox method was performed, the most parsimonious model with predictors, compactness and Law_LS, had an AUC of 0.79 (95% CI, 0.672 to 0.898). Conclusion The diagnostic performance of models selected by ANN and logistic regression was similar. The analytic methods were found to be roughly equivalent in terms of predictive ability when a small number of variables were chosen. The robust ANN methodology utilizes a sophisticated non-linear model, while logistic regression analysis provides insightful information to enhance interpretation of the model features. PMID:19409817

  20. Internship workplace preferences of final-year medical students at Zagreb University Medical School, Croatia: all roads lead to Zagreb.

    PubMed

    Polasek, Ozren; Kolcic, Ivana; Dzakula, Aleksandar; Bagat, Mario

    2006-04-01

    Human resources management in health often encounters problems related to workforce geographical distribution. The aim of this study was to investigate the internship workplace preferences of final-year medical students and the reasons associated with their choices. A total of 204 out of 240 final-year medical students at Zagreb University Medical School, Croatia, were surveyed a few months before graduation. We collected data on each student's background, workplace preference, academic performance and emigration preferences. Logistic regression was used to analyse the factors underlying internship workplace preference, classified into two categories: Zagreb versus other areas. Only 39 respondents (19.1%) wanted to obtain internships outside Zagreb, the Croatian capital. Gender and age were not significantly associated with internship workplace preference. A single predictor variable significantly contributed to the logistic regression model: students who believed they would not get the desired specialty more often chose Zagreb as a preferred internship workplace (odds ratio 0.32, 95% CI 0.12-0.86). A strong preference for Zagreb as an internship workplace was recorded. Uncertainty about getting the desired specialty was associated with choosing Zagreb as a workplace, possibly due to more extensive and diverse job opportunities.

  1. High dimensional linear regression models under long memory dependence and measurement error

    NASA Astrophysics Data System (ADS)

    Kaul, Abhishek

    This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates the problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution and then show its asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p > n), where p can increase exponentially with n. Finally, we show the n^{1/2-d}-consistency of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed; here d is the memory parameter of the stationary error sequence. The performance of Lasso in the present setup is also analysed with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement error models, where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where the unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimensional and high dimensional sparse setups; in the latter setup, the dimensionality can grow exponentially with the sample size. In the fixed dimensional setting we provide the oracle properties associated with the proposed estimators. In the high dimensional setting, we provide bounds for the statistical error associated with the estimation that hold with asymptotic probability 1, thereby establishing the ℓ1-consistency of the proposed estimator. We also establish model selection consistency in terms of the correctly estimated zero components of the parameter vector. A simulation study that investigates the finite sample accuracy of the proposed estimator is also included in this chapter.
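
    A small numerical experiment in the spirit of the second chapter is sketched below: Lasso support recovery when the errors are strongly serially dependent. A high-persistence AR(1) process stands in for the long memory moving average errors of the dissertation, which are harder to simulate without specialised tools.

      # Hedged sketch: Lasso under dependent errors (AR(1) stand-in).
      import numpy as np
      from sklearn.linear_model import Lasso

      rng = np.random.default_rng(10)
      n, p = 200, 50
      beta = np.zeros(p)
      beta[:3] = [2.0, -1.5, 1.0]                            # sparse truth

      eps = np.zeros(n)                                      # dependent errors
      for t in range(1, n):
          eps[t] = 0.9 * eps[t - 1] + rng.normal(0, 0.5)

      X = rng.normal(size=(n, p))
      y = X @ beta + eps

      fit = Lasso(alpha=0.1).fit(X, y)
      print("selected support:", np.flatnonzero(fit.coef_))  # ideally {0, 1, 2}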

  2. Lifespan development of pro- and anti-saccades: multiple regression models for point estimates.

    PubMed

    Klein, Christoph; Foerster, Friedrich; Hartnegg, Klaus; Fischer, Burkhart

    2005-12-07

    The comparative study of anti- and pro-saccade task performance contributes to our functional understanding of the frontal lobes, their alterations in psychiatric or neurological populations, and their changes during the life span. In the present study, we apply regression analysis to model life span developmental effects on various pro- and anti-saccade task parameters, using data of a non-representative sample of 327 participants aged 9 to 88 years. Development up to the age of about 27 years was dominated by curvilinear rather than linear effects of age. Furthermore, the largest developmental differences were found for intra-subject variability measures and the anti-saccade task parameters. Ageing, by contrast, had the shape of a global linear decline of the investigated saccade functions, lacking the differential effects of age observed during development. While these results do support the assumption that frontal lobe functions can be distinguished from other functions by their strong and protracted development, they do not confirm the assumption of disproportionate deterioration of frontal lobe functions with ageing. We finally show that the regression models applied here to quantify life span developmental effects can also be used for individual predictions in applied research contexts or clinical practice.

  3. Combined Prediction Model of Death Toll for Road Traffic Accidents Based on Independent and Dependent Variables

    PubMed Central

    Zhong-xiang, Feng; Shi-sheng, Lu; Wei-hua, Zhang; Nan-nan, Zhang

    2014-01-01

    In order to build a combined model that captures the variation in death toll data for road traffic accidents, reflects the influence of multiple factors on traffic accidents, and improves prediction accuracy, the Verhulst model was built based on the death tolls for road traffic accidents in China from 2002 to 2011; car ownership, population, GDP, highway freight volume, highway passenger transportation volume, and highway mileage were chosen as the factors for a multivariate linear regression model of the death toll. The two models were then combined into a single prediction model with weight coefficients, and the Shapley value method was applied to calculate the weight coefficients by assessing each model's contribution. Finally, the combined model was used to recalculate the death tolls from 2002 to 2011, and it was compared with the Verhulst and multivariate linear regression models. The results showed that the new model could not only characterize the death toll data but also quantify the degree of influence of each factor on the death toll, and that it had high accuracy as well as strong practicability. PMID:25610454
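
    For two models, the Shapley weighting step has a closed form, sketched below with illustrative numbers: treat the reduction in squared error relative to a naive baseline as the payoff of each coalition of models and split the grand-coalition payoff by Shapley value. The forecast series and the averaging rule for the coalition of both models are assumptions for illustration, not the paper's data.

      # Toy two-player Shapley weighting of forecast models; data are invented.
      import numpy as np

      actual = np.array([104, 107, 99, 96, 90, 86, 81, 79, 76, 70], dtype=float)
      f1 = actual + np.random.default_rng(11).normal(0, 4, 10)   # e.g. Verhulst model
      f2 = actual + np.random.default_rng(12).normal(0, 6, 10)   # e.g. linear regression

      def payoff(pred):            # error reduction versus a naive mean forecast
          base = np.sum((actual - actual.mean()) ** 2)
          return base - np.sum((actual - pred) ** 2)

      v1, v2, v12 = payoff(f1), payoff(f2), payoff((f1 + f2) / 2)

      phi1 = 0.5 * v1 + 0.5 * (v12 - v2)   # Shapley values, two-player game
      phi2 = 0.5 * v2 + 0.5 * (v12 - v1)
      w1, w2 = phi1 / (phi1 + phi2), phi2 / (phi1 + phi2)
      combined = w1 * f1 + w2 * f2
      print("weights:", w1, w2, "combined RMSE:",
            np.sqrt(np.mean((actual - combined) ** 2)))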

  4. Combined prediction model of death toll for road traffic accidents based on independent and dependent variables.

    PubMed

    Feng, Zhong-xiang; Lu, Shi-sheng; Zhang, Wei-hua; Zhang, Nan-nan

    2014-01-01

    In order to build a combined model that captures the variation in death toll data for road traffic accidents, reflects the influence of multiple factors on traffic accidents, and improves prediction accuracy, the Verhulst model was built based on the death tolls for road traffic accidents in China from 2002 to 2011; car ownership, population, GDP, highway freight volume, highway passenger transportation volume, and highway mileage were chosen as the factors for a multivariate linear regression model of the death toll. The two models were then combined into a single prediction model with weight coefficients, and the Shapley value method was applied to calculate the weight coefficients by assessing each model's contribution. Finally, the combined model was used to recalculate the death tolls from 2002 to 2011, and it was compared with the Verhulst and multivariate linear regression models. The results showed that the new model could not only characterize the death toll data but also quantify the degree of influence of each factor on the death toll, and that it had high accuracy as well as strong practicability.

  5. Relationship between body composition and postural control in prepubertal overweight/obese children: A cross-sectional study.

    PubMed

    Villarrasa-Sapiña, Israel; Álvarez-Pitti, Julio; Cabeza-Ruiz, Ruth; Redón, Pau; Lurbe, Empar; García-Massó, Xavier

    2018-02-01

    Excess body weight during childhood causes reduced motor functionality and problems in postural control, a negative influence that has been reported in the literature. Nevertheless, no information regarding the effect of body composition on the postural control of overweight and obese children is available. The objective of this study was therefore to establish these relationships. A cross-sectional design was used to establish relationships between body composition and postural control variables obtained in bipedal eyes-open and eyes-closed conditions in twenty-two children. Centre of pressure signals were analysed in the temporal and frequency domains. Pearson correlations were applied to establish relationships between variables. Principal component analysis was applied to the body composition variables to avoid potential multicollinearity in the regression models. These principal components were used to perform a multiple linear regression analysis, from which regression models were obtained to predict postural control. Height and leg mass were the body composition variables that showed the highest correlation with postural control. Multiple regression models were also obtained, and several of these models showed a higher correlation coefficient in predicting postural control than simple correlations. These models revealed that leg and trunk mass were good predictors of postural control. More equations were found in the eyes-open than in the eyes-closed condition. Body weight and height are negatively correlated with postural control. However, leg and trunk mass are better postural control predictors than arm or body mass. Finally, body composition variables are more useful in predicting postural control when the eyes are open. Copyright © 2017 Elsevier Ltd. All rights reserved.
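
    The multicollinearity workaround described above (principal components feeding a multiple linear regression) is easy to sketch as a pipeline; the body composition variables and the postural control outcome below are synthetic placeholders.

      # Hedged PCA-then-regression sketch, assuming scikit-learn.
      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.linear_model import LinearRegression
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(13)
      n = 22
      leg_mass = rng.normal(8, 2, n)
      trunk_mass = rng.normal(25, 5, n)
      height = rng.normal(140, 10, n)
      body_comp = np.column_stack([leg_mass, trunk_mass, height,
                                   leg_mass + rng.normal(0, 0.5, n)])  # collinear column

      cop_area = 0.4 * leg_mass + 0.2 * trunk_mass + rng.normal(0, 1, n)

      model = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())
      model.fit(body_comp, cop_area)
      print("R^2:", model.score(body_comp, cop_area))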

  6. Large signal-to-noise ratio quantification in MLE for ARARMAX models

    NASA Astrophysics Data System (ADS)

    Zou, Yiqun; Tang, Xiafei

    2014-06-01

    It has been shown that closed-loop linear system identification by the indirect method can generally be transferred to open-loop ARARMAX (AutoRegressive AutoRegressive Moving Average with eXogenous input) estimation. For such models, gradient-related optimisation with a large enough signal-to-noise ratio (SNR) can avoid the potential local convergence in maximum likelihood estimation. To ease the application of this condition, the threshold SNR needs to be quantified. In this paper, we build the amplitude coefficient, which is an equivalent of the SNR, and prove the finiteness of the threshold amplitude coefficient within the stability region. The quantification of the threshold is achieved by minimisation of an elaborately designed multi-variable cost function which unifies all the restrictions on the amplitude coefficient. The corresponding algorithm, based on two sets of physically realisable system input-output data, details the minimisation and also points out how to use the gradient-related method to estimate ARARMAX parameters when a local minimum is present because the SNR is small. The algorithm is then tested on a theoretical AutoRegressive Moving Average with eXogenous input model for the derivation of the threshold and on a real gas turbine engine system for model identification, respectively. Finally, the graphical validation of the threshold on a two-dimensional plot is discussed.

  7. Sensitivity Analysis of Mechanical Parameters of Different Rock Layers to the Stability of Coal Roadway in Soft Rock Strata

    PubMed Central

    Zhao, Zeng-hui; Wang, Wei-ming; Gao, Xin; Yan, Ji-xing

    2013-01-01

    According to the geological characteristics of the Xinjiang Ili mine in the western area of China, a physical model of interstratified strata composed of soft rock and a hard coal seam was established. Selecting the tunnel position, deformation modulus, and strength parameters of each layer as influencing factors, the sensitivity coefficient of roadway deformation to each parameter was first analyzed based on a Mohr-Coulomb strain softening model and nonlinear elastic-plastic finite element analysis. The effects of the influencing factors which showed high sensitivity were then discussed further. Finally, a regression model for the relationship between roadway displacements and multiple factors was obtained by equivalent linear regression under multiple factors. The results show that roadway deformation is highly sensitive to the depth of the coal seam under the floor, which should be considered in the layout of the coal roadway; the deformation modulus and strength of the coal seam and floor have a great influence on the global stability of the tunnel; on the contrary, roadway deformation is not sensitive to the mechanical parameters of the soft roof; and roadway deformation under random combinations of multiple factors can be deduced from the regression model. These conclusions provide theoretical guidance for the arrangement and stability maintenance of coal roadways. PMID:24459447

  8. High and low frequency unfolded partial least squares regression based on empirical mode decomposition for quantitative analysis of fuel oil samples.

    PubMed

    Bian, Xihui; Li, Shujuan; Lin, Ligang; Tan, Xiaoyao; Fan, Qingjie; Li, Ming

    2016-06-21

    Accurate model prediction is fundamental to the successful analysis of complex samples. To utilize the abundant information embedded in the frequency and time domains, a novel regression model is presented for quantitative analysis of hydrocarbon contents in fuel oil samples. The proposed method, named high and low frequency unfolded PLSR (HLUPLSR), integrates empirical mode decomposition (EMD) and an unfolding strategy with partial least squares regression (PLSR). In the proposed method, the original signals are first decomposed by EMD into a finite number of intrinsic mode functions (IMFs) and a residue. Second, the former high frequency IMFs are summed into a high frequency matrix, and the latter IMFs and the residue are summed into a low frequency matrix. Finally, the two matrices are unfolded into an extended matrix along the variable dimension, and a PLSR model is built between the extended matrix and the target values. Coupled with ultraviolet (UV) spectroscopy, HLUPLSR has been applied to determine the hydrocarbon contents of light gas oil and diesel fuel samples. Compared with single PLSR and other signal processing techniques, the proposed method shows superior prediction ability and better model interpretation. HLUPLSR therefore provides a promising tool for quantitative analysis of complex samples. Copyright © 2016 Elsevier B.V. All rights reserved.
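
    An HLUPLSR-style pipeline can be sketched in a few lines: split each spectrum into high- and low-frequency parts with EMD, unfold the two parts side by side along the variable axis, and fit a PLS regression on the extended matrix. The sketch assumes the PyEMD package (distributed as EMD-signal) and synthetic spectra; the split point between "high" and "low" IMFs is an arbitrary choice here.

      # Hedged HLUPLSR-style sketch; assumes PyEMD and scikit-learn.
      import numpy as np
      from PyEMD import EMD
      from sklearn.cross_decomposition import PLSRegression

      rng = np.random.default_rng(14)
      n_samples, n_vars = 40, 128
      wl = np.linspace(0, 1, n_vars)
      spectra = np.array([c * np.exp(-(wl - 0.5) ** 2 / 0.02) + rng.normal(0, 0.01, n_vars)
                          for c in rng.uniform(0.5, 1.5, n_samples)])
      y = 10 * spectra[:, 64] + rng.normal(0, 0.05, n_samples)  # content proxy

      def split_freq(signal, n_high=2):
          imfs = EMD().emd(signal)              # IMFs ordered high to low frequency
          return imfs[:n_high].sum(axis=0), imfs[n_high:].sum(axis=0)

      parts = [split_freq(s) for s in spectra]
      extended = np.hstack([np.array([h for h, _ in parts]),    # unfold: high | low
                            np.array([l for _, l in parts])])

      pls = PLSRegression(n_components=3).fit(extended, y)
      print("R^2 on training data:", pls.score(extended, y))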

  9. A Retrospective Medical Records Review of Risk Factors for the Development of Respiratory Tract Secretions (Death Rattle) in the Dying Patient.

    PubMed

    Kolb, Hildegard; Snowden, Austyn; Stevens, Elaine; Atherton, Iain

    2018-05-09

    The aim of this study was to identify risk factors predicting the development of death rattle. Respiratory tract secretions, often called death rattle, are among the most common symptoms in dying patients around the world. It is unknown whether death rattle causes distress in patients, but it has been reported globally that distress levels can be high in family members. Although the evidence base is poor, treatment with antimuscarinic medication is standard practice worldwide, and prompt intervention is recognised as crucial for effectiveness. The identification of risk factors for the development of death rattle would allow for targeted interventions. A case-control study was designed to retrospectively review two hundred consecutive medical records of mainly cancer patients who died in a hospice inpatient setting between 2009 and 2011. Fifteen potential risk factors, including weight, smoking, final opioid dose, and final midazolam dose, were investigated. Binary logistic regression was used to identify risk factors for death rattle development. Univariate analysis showed death rattle was significantly associated with final midazolam doses and final opioid doses, the length of the dying phase, and the anticholinergic drug load in the pre-terminal phase. In the final logistic regression model only midazolam was statistically significant, and only at final doses of 20 mg/24 h or over (OR 3.81, 95% CI 1.41-10.34). Dying patients with a requirement for a high dose of midazolam have an increased likelihood of developing death rattle. This article is protected by copyright. All rights reserved.

  10. The statistical analysis of multi-environment data: modeling genotype-by-environment interaction and its genetic basis

    PubMed Central

    Malosetti, Marcos; Ribaut, Jean-Marcel; van Eeuwijk, Fred A.

    2013-01-01

    Genotype-by-environment interaction (GEI) is an important phenomenon in plant breeding. This paper presents a series of models for describing, exploring, understanding, and predicting GEI. All models depart from a two-way table of genotype by environment means. First, a series of descriptive and explorative models/approaches are presented: Finlay–Wilkinson model, AMMI model, GGE biplot. All of these approaches have in common that they merely try to group genotypes and environments and do not use other information than the two-way table of means. Next, factorial regression is introduced as an approach to explicitly introduce genotypic and environmental covariates for describing and explaining GEI. Finally, QTL modeling is presented as a natural extension of factorial regression, where marker information is translated into genetic predictors. Tests for regression coefficients corresponding to these genetic predictors are tests for main effect QTL expression and QTL by environment interaction (QEI). QTL models for which QEI depends on environmental covariables form an interesting model class for predicting GEI for new genotypes and new environments. For realistic modeling of genotypic differences across multiple environments, sophisticated mixed models are necessary to allow for heterogeneity of genetic variances and correlations across environments. The use and interpretation of all models is illustrated by an example data set from the CIMMYT maize breeding program, containing environments differing in drought and nitrogen stress. To help readers to carry out the statistical analyses, GenStat® programs, 15th Edition and Discovery® version, are presented as “Appendix.” PMID:23487515

  11. On the Effectiveness of Security Countermeasures for Critical Infrastructures.

    PubMed

    Hausken, Kjell; He, Fei

    2016-04-01

    A game-theoretic model is developed where an infrastructure of N targets is protected against terrorism threats. An original threat score is determined by the terrorist's threat against each target and the government's inherent protection level and original protection. The final threat score is impacted by the government's additional protection. We investigate and verify the effectiveness of countermeasures using empirical data and two methods. The first is to estimate the model's parameter values to minimize the sum of the squared differences between the government's additional resource investment predicted by the model and the empirical data. The second is to develop a multivariate regression model where the final threat score varies approximately linearly relative to the original threat score, sectors, and threat scenarios, and depends nonlinearly on the additional resource investment. The model and method are offered as tools, and as a way of thinking, to determine optimal resource investments across vulnerable targets subject to terrorism threats. © 2014 Society for Risk Analysis.

  12. Incremental online learning in high dimensions.

    PubMed

    Vijayakumar, Sethu; D'Souza, Aaron; Schaal, Stefan

    2005-12-01

    Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high-dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it (1) learns rapidly with second-order learning methods based on incremental training, (2) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, (3) adjusts its weighting kernels based on only local information in order to minimize the danger of negative interference of incremental learning, (4) has a computational complexity that is linear in the number of inputs, and (5) can deal with a large number of (possibly redundant) inputs, as shown in various empirical evaluations with up to 90 dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.

  13. The regression discontinuity design showed to be a valid alternative to a randomized controlled trial for estimating treatment effects.

    PubMed

    Maas, Iris L; Nolte, Sandra; Walter, Otto B; Berger, Thomas; Hautzinger, Martin; Hohagen, Fritz; Lutz, Wolfgang; Meyer, Björn; Schröder, Johanna; Späth, Christina; Klein, Jan Philipp; Moritz, Steffen; Rose, Matthias

    2017-02-01

    To compare treatment effect estimates obtained from a regression discontinuity (RD) design with results from an actual randomized controlled trial (RCT). Data from an RCT (EVIDENT), which studied the effect of an Internet intervention on depressive symptoms measured with the Patient Health Questionnaire (PHQ-9), were used to perform an RD analysis in which treatment allocation was determined by a cutoff value at baseline (PHQ-9 = 10). A linear regression model was fitted to the data, selecting participants above the cutoff who had received the intervention (n = 317) and control participants below the cutoff (n = 187). The outcome was the PHQ-9 sum score 12 weeks after baseline. The robustness of the effect estimate was studied, and the estimate was compared with the RCT treatment effect. The final regression model showed a regression coefficient of -2.29 [95% confidence interval (CI): -3.72 to -0.85] compared with a treatment effect found in the RCT of -1.57 (95% CI: -2.07 to -1.07). Although the estimates obtained from the two designs are not equal, their confidence intervals overlap, suggesting that an RD design can be a valid alternative to RCTs. This finding is particularly important for situations where an RCT may not be feasible or ethical, as is often the case in clinical research settings. Copyright © 2016 Elsevier Inc. All rights reserved.
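
    A bare-bones version of such an RD estimate is sketched below: a linear model with the centred baseline score as the running variable and treatment assigned by the PHQ-9 >= 10 cutoff. The data are simulated, not the EVIDENT trial data, and the single linear term is the simplest of the specifications one would check in practice.

      # Hedged RD sketch on simulated data, assuming statsmodels and pandas.
      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(15)
      n = 504
      baseline = rng.uniform(5, 20, n)
      treated = (baseline >= 10).astype(int)          # allocation by cutoff
      phq9_12w = 2 + 0.8 * baseline - 2.0 * treated + rng.normal(0, 2, n)

      df = pd.DataFrame({"y": phq9_12w, "run": baseline - 10, "treated": treated})
      fit = smf.ols("y ~ treated + run", data=df).fit()
      print("treatment effect:", fit.params["treated"])
      print("95% CI:", fit.conf_int().loc["treated"].values)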

  14. A Two-Step Method to Select Major Surge-Producing Extratropical Cyclones from a 10,000-Year Stochastic Catalog

    NASA Astrophysics Data System (ADS)

    Keshtpoor, M.; Carnacina, I.; Yablonsky, R. M.

    2016-12-01

    Extratropical cyclones (ETCs) are the primary driver of storm surge events along the UK and northwest mainland Europe coastlines. In an effort to evaluate the storm surge risk in coastal communities in this region, a stochastic catalog was developed by perturbing the historical storm seeds of European ETCs to represent 10,000 years of possible ETCs. Numerical simulation of the storm surge generated by the full 10,000-year stochastic catalog, however, is computationally expensive and may take several months to complete with available computational resources. A new statistical regression model was developed to select the major surge-generating events from the stochastic ETC catalog. This regression model is based on the maximum storm surge, obtained via numerical simulations using a calibrated version of the Delft3D-FM hydrodynamic model with a relatively coarse mesh, of 1750 historical ETC events that occurred over the past 38 years in Europe. These numerically simulated surge values were regressed against the local sea level pressure and the U and V components of the wind field at the locations of 196 tide gauge stations near the UK and northwest mainland Europe coastal areas. The regression model suggests that storm surge values in the area of interest are highly correlated with the U and V components of wind speed, as well as the sea level pressure. Based on these correlations, the regression model was then used to select surge-generating storms from the 10,000-year stochastic catalog. Results suggest that roughly 105,000 events out of 480,000 stochastic storms are surge-generating events and need to be considered for numerical simulation using a hydrodynamic model. The selected stochastic storms were then simulated in Delft3D-FM, and the final refinement of the storm population was performed based on return period analysis of the 1750 historical event simulations at each of the 196 tide gauges, in preparation for Delft3D-FM fine mesh simulations.

  15. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges.

    PubMed

    Goldstein, Benjamin A; Navar, Ann Marie; Carter, Rickey E

    2017-06-14

    Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider predicting mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Cardiology.

  16. Voxel-wise prostate cell density prediction using multiparametric magnetic resonance imaging and machine learning.

    PubMed

    Sun, Yu; Reynolds, Hayley M; Wraith, Darren; Williams, Scott; Finnegan, Mary E; Mitchell, Catherine; Murphy, Declan; Haworth, Annette

    2018-04-26

    There are currently no methods to estimate cell density in the prostate. This study aimed to develop predictive models to estimate prostate cell density from multiparametric magnetic resonance imaging (mpMRI) data at a voxel level using machine learning techniques. In vivo mpMRI data were collected from 30 patients before radical prostatectomy. Sequences included T2-weighted imaging, diffusion-weighted imaging and dynamic contrast-enhanced imaging. Ground truth cell density maps were computed from histology and co-registered with mpMRI. Feature extraction and selection were performed on the mpMRI data. Final models were fitted using three regression algorithms: multivariate adaptive regression splines (MARS), polynomial regression (PR) and generalised additive models (GAM). Model parameters were optimised using leave-one-out cross-validation on the training data, and model performance was evaluated on test data using root mean square error (RMSE) measurements. Predictive models to estimate voxel-wise prostate cell density were successfully trained and tested using the three algorithms. The best model (GAM) achieved an RMSE of 1.06 (±0.06) × 10³ cells/mm² and a relative deviation of 13.3 ± 0.8%. Prostate cell density can be quantitatively estimated non-invasively from mpMRI data using high-quality co-registered data at a voxel level. These cell density predictions could be used for tissue classification, treatment response evaluation and personalised radiotherapy.

  17. Application of mathematical model methods for optimization tasks in construction materials technology

    NASA Astrophysics Data System (ADS)

    Fomina, E. V.; Kozhukhova, N. I.; Sverguzova, S. V.; Fomin, A. E.

    2018-05-01

    In this paper, the regression equations method was studied for the design of construction materials. Regression and polynomial equations representing the correlations between the studied parameters were proposed. The logic design and software interface of the method focused on parameter optimization to provide an energy-saving effect at the design stage of autoclave aerated concrete, considering the replacement of traditionally used quartz sand by a coal mining by-product such as argillite. A mathematical model, represented by a quadratic polynomial from the design of experiments, was obtained using calculated and experimental data. This allowed the relationship between the composition and the final properties of the aerated concrete to be estimated. The response surface, presented graphically as a nomogram, allowed concrete properties to be estimated in response to variation of the composition within the x-space. An optimal range of argillite content was obtained, leading to a reduction in raw material demand, development of the target plastic strength of the aerated concrete, and a reduction of the curing time before autoclave treatment. Generally, this method allows the design of autoclave aerated concrete with the required performance without additional resource and time costs.

  18. Seven years of surface ozone in a coastal city of central Italy: Observations and models

    NASA Astrophysics Data System (ADS)

    Biancofiore, Fabio; Verdecchia, Marco; Di Carlo, Piero; Tomassetti, Barbara; Aruffo, Eleonora; Busilacchio, Marcella; Bianco, Sebastiano; Di Tommaso, Sinibaldo; Colangeli, Carlo

    2014-05-01

    Hourly concentrations of ozone (O3) and nitrogen dioxide (NO2) were measured for seven years, from 1998 to 2005, in a seaside town in central Italy. Seasonal trends of O3 and NO2 recorded over these years are studied. Furthermore, we focused our attention on the data collected during 2005, analyzing them with two different methods: a regression model and a neural network model. Both models are used to simulate the hourly ozone concentration using several sets of inputs. To evaluate model performance, four statistical criteria are used: the correlation coefficient (R), fractional bias (FB), normalized mean squared error (NMSE) and factor of two (FA2). All criteria show that the neural network outperforms the regression model in all the simulations. In addition, we tested several improvements to the neural network model, and the results of these tests are discussed. Finally, we used the neural network to forecast hourly ozone concentrations a day ahead and 1, 3, 6 and 12 hours ahead. The model's performance in predicting ozone levels is discussed.
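
    For reference, the four evaluation criteria named above have standard definitions in air-quality model evaluation; the sketch below implements them with NumPy on illustrative arrays (the observed and predicted series are invented, not the study's data).

```python
# Standard air-quality evaluation metrics: R, FB, NMSE and FA2.
import numpy as np

def evaluate(obs, pred):
    r = np.corrcoef(obs, pred)[0, 1]                                    # correlation coefficient
    fb = 2.0 * (obs.mean() - pred.mean()) / (obs.mean() + pred.mean())  # fractional bias
    nmse = np.mean((obs - pred) ** 2) / (obs.mean() * pred.mean())      # normalized mean squared error
    ratio = pred / obs
    fa2 = np.mean((ratio >= 0.5) & (ratio <= 2.0))                      # factor of two
    return r, fb, nmse, fa2

obs = np.array([40.0, 55.0, 62.0, 80.0, 95.0])    # hourly O3, illustrative
pred = np.array([38.0, 60.0, 58.0, 85.0, 90.0])
print(evaluate(obs, pred))
```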

  19. Short-term electric power demand forecasting based on economic-electricity transmission model

    NASA Astrophysics Data System (ADS)

    Li, Wenfeng; Bai, Hongkun; Liu, Wei; Liu, Yongmin; Wang, Yubin Mao; Wang, Jiangbo; He, Dandan

    2018-04-01

    Short-term electricity demand forecasting is basic work for ensuring the safe operation of the power system. In this paper, a practical economic-electricity transmission model (EETM) is built. Using the intelligent adaptive modeling capabilities of Prognoz Platform 7.2, an econometric model consisting of three industrial added values and income levels is first built, followed by the electricity demand transmission model. Through multiple regression, moving averages and seasonal decomposition, the problem of multicollinearity between variables is effectively overcome in EETM. The validity of EETM is demonstrated by comparison with actual values for Henan Province. Finally, the EETM model is used to forecast electricity consumption for the first to fourth quarters of 2018.

  20. Calibration transfer of a Raman spectroscopic quantification method for the assessment of liquid detergent compositions from at-line laboratory to in-line industrial scale.

    PubMed

    Brouckaert, D; Uyttersprot, J-S; Broeckx, W; De Beer, T

    2018-03-01

    Calibration transfer or standardisation aims at creating a uniform spectral response on different spectroscopic instruments or under varying conditions, without requiring a full recalibration for each situation. In the current study, this strategy is applied to construct at-line multivariate calibration models and consequently employ them in-line in a continuous industrial production line, using the same spectrometer. Firstly, quantitative multivariate models are constructed at-line at laboratory scale for predicting the concentration of two main ingredients in hard surface cleaners. By regressing the Raman spectra of a set of small-scale calibration samples against their reference concentration values, partial least squares (PLS) models are developed to quantify the surfactant levels in the liquid detergent compositions under investigation. After evaluating the models' performance with a set of independent validation samples, a univariate slope/bias correction is applied in view of transferring these at-line calibration models to an in-line manufacturing set-up. This standardisation technique allows a fast and easy transfer of the PLS regression models, by simply correcting the model predictions on the in-line set-up, without adjusting anything in the original multivariate calibration models. An extensive statistical analysis is performed in order to assess the predictive quality of the transferred regression models. Before and after transfer, the R² and RMSEP of both models are compared to evaluate whether their magnitudes are similar. T-tests are then performed to investigate whether the slope and intercept of the transferred regression line do not differ statistically from 1 and 0, respectively. Furthermore, it is verified that no significant bias is present. F-tests are executed as well, to assess the linearity of the transfer regression line and to investigate the statistical coincidence of the transfer and validation regression lines. Finally, a paired t-test is performed to compare the original at-line model to the slope/bias corrected in-line model, using interval hypotheses. It is shown that the calibration models of Surfactant 1 and Surfactant 2 yield satisfactory in-line predictions after slope/bias correction. While Surfactant 1 passes seven out of eight statistical tests, the recommended validation parameters are 100% successful for Surfactant 2. It is hence concluded that the proposed strategy for transferring at-line calibration models to an in-line industrial environment via a univariate slope/bias correction of the predicted values offers a successful standardisation approach. Copyright © 2017 Elsevier B.V. All rights reserved.
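
    The univariate slope/bias correction described above amounts to fitting a straight line between the at-line model's predictions and the reference values on in-line transfer samples, then applying that line to all future predictions. A minimal sketch, with invented concentration values standing in for the surfactant data:

```python
# Univariate slope/bias correction for calibration transfer: the at-line
# PLS model is left untouched; only its in-line predictions are corrected.
import numpy as np

# Predictions of the at-line PLS model on in-line transfer samples,
# with their known reference concentrations (values are illustrative).
pred_inline = np.array([4.8, 6.1, 7.3, 8.9, 10.2])
ref_inline = np.array([5.0, 6.0, 7.5, 9.0, 10.0])

# Fit reference = slope * prediction + bias by ordinary least squares.
slope, bias = np.polyfit(pred_inline, ref_inline, deg=1)

def correct(pred):
    """Apply the slope/bias correction to new in-line predictions."""
    return slope * pred + bias

print(correct(np.array([5.5, 9.4])))
```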

  1. A quantitative model for designing keyboard layout.

    PubMed

    Shieh, K K; Lin, C C

    1999-02-01

    This study analyzed the quantitative relationship between keytapping times and ergonomic principles in typewriting skills. Keytapping times and key-operating characteristics of a female subject typing on the Qwerty and Dvorak keyboards for six weeks each were collected and analyzed. The results showed that the characteristics of the typed material and the movements of hands and fingers were significantly related to keytapping times. The most significant factors affecting keytapping times were the association frequency between letters, consecutive use of the same hand or finger, and the finger used. A regression equation relating keytapping times to ergonomic principles was fitted to the data. Finally, a protocol for the design of a computerized keyboard layout based on the regression equation was proposed.

  2. Development of hybrid genetic-algorithm-based neural networks using regression trees for modeling air quality inside a public transportation bus.

    PubMed

    Kadiyala, Akhil; Kaur, Devinder; Kumar, Ashok

    2013-02-01

    The present study developed a novel approach to modeling the indoor air quality (IAQ) of a public transportation bus through hybrid genetic-algorithm-based neural networks (also known as evolutionary neural networks) with input variables optimized using regression trees, referred to as the GART approach. This study validated the applicability of the GART modeling approach to solving complex nonlinear systems by accurately predicting the monitored contaminants of carbon dioxide (CO2), carbon monoxide (CO), nitric oxide (NO), sulfur dioxide (SO2), 0.3-0.4 μm sized particle numbers, 0.4-0.5 μm sized particle numbers, particulate matter (PM) concentrations less than 1.0 μm (PM1.0), and PM concentrations less than 2.5 μm (PM2.5) inside a public transportation bus operating on 20% grade biodiesel in Toledo, OH. First, the important variables affecting each monitored in-bus contaminant were determined using regression trees. Second, analysis of variance was used as a complementary sensitivity analysis to the regression tree results to determine a subset of statistically significant variables affecting each monitored in-bus contaminant. Finally, the identified subsets of statistically significant variables were used as inputs to develop three artificial neural network (ANN) models: a regression tree-based back-propagation network (BPN-RT), a regression tree-based radial basis function network (RBFN-RT), and the GART model. Performance measures were used to validate the predictive capacity of the developed IAQ models. The results from this approach were compared with those obtained from a theoretical approach and a generalized practicable approach to modeling IAQ that considered additional independent variables when developing the aforementioned ANN models. The hybrid GART models were able to capture the majority of the variance in the monitored in-bus contaminants, and the genetic-algorithm-based neural network IAQ models outperformed the traditional ANN methods of back-propagation and radial basis function networks. The novelty of this research is the development of an approach to modeling vehicular indoor air quality that integrates the advanced methods of genetic algorithms, regression trees, and analysis of variance for the monitored in-vehicle gaseous and particulate matter contaminants, and the comparison of the results with conventional artificial intelligence techniques of back-propagation and radial basis function networks. This study validated the newly developed approach using holdout and threefold cross-validation. These results are of great interest to scientists, researchers, and the public in understanding the various aspects of modeling an indoor microenvironment, and the methodology can easily be extended to other fields of study.
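
    A minimal sketch of the GART idea, regression-tree feature selection feeding a neural network, is shown below; the data, the top-3 cutoff and the network size are illustrative assumptions, and the genetic-algorithm optimization of the network used in the study is omitted.

```python
# Regression-tree input selection followed by a neural network.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))             # candidate predictors
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(0, 0.5, 500)  # e.g. in-bus CO2

# Rank candidate inputs by tree importance, keep the top three.
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
important = np.argsort(tree.feature_importances_)[::-1][:3]

# Train the network only on the selected inputs.
ann = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                   random_state=0).fit(X[:, important], y)
print("selected inputs:", important, "R^2:", ann.score(X[:, important], y))
```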

  3. Parametric optimization of multiple quality characteristics in laser cutting of Inconel-718 by using hybrid approach of multiple regression analysis and genetic algorithm

    NASA Astrophysics Data System (ADS)

    Shrivastava, Prashant Kumar; Pandey, Arun Kumar

    2018-06-01

    Inconel-718 is in high demand in different industries due to its superior mechanical properties. Traditional cutting methods face difficulties in cutting this alloy due to its low thermal potential, lower elasticity and high chemical compatibility at elevated temperature. The challenges of machining and/or finishing unusual shapes and/or sizes in these materials are also faced by traditional machining. Laser beam cutting may be applied for miniaturization and ultra-precision cutting and/or finishing by appropriate control of the different process parameters. This paper presents multi-objective optimization of the kerf deviation, kerf width and kerf taper in the laser cutting of Inconel-718 sheet. Second-order regression models have been developed for the different quality characteristics using the data obtained through experimentation. The regression models have been used as objective functions for multi-objective optimization based on the hybrid approach of multiple regression analysis and genetic algorithm. Comparison of the optimization results to experimental results shows an improvement of 88%, 10.63% and 42.15% in kerf deviation, kerf width and kerf taper, respectively. Finally, the effects of the different process parameters on the quality characteristics are also discussed.
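
    The hybrid scheme described above can be sketched as a second-order (quadratic) regression fitted to experimental data and then minimized with an evolutionary optimizer. In the sketch below, SciPy's differential evolution stands in for the paper's genetic algorithm, and the parameters, bounds and data are illustrative assumptions.

```python
# Second-order regression model used as the objective of an
# evolutionary optimizer (differential evolution as a GA stand-in).
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from scipy.optimize import differential_evolution

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(40, 3))        # e.g. power, speed, gas pressure (scaled)
kerf = (0.2 + 0.5 * X[:, 0] ** 2 - 0.3 * X[:, 1]
        + 0.1 * X[:, 0] * X[:, 2] + rng.normal(0, 0.01, 40))

poly = PolynomialFeatures(degree=2)            # second-order terms
model = LinearRegression().fit(poly.fit_transform(X), kerf)

def objective(x):
    # Predicted kerf deviation for one parameter setting.
    return model.predict(poly.transform(x.reshape(1, -1)))[0]

result = differential_evolution(objective, bounds=[(0, 1)] * 3, seed=0)
print("optimal parameters:", result.x, "predicted kerf:", result.fun)
```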

  4. Modelling the effect of the physical and chemical characteristics of the materials used as casing layers on the production parameters of Agaricus bisporus.

    PubMed

    Pardo, Arturo; Emilio Pardo, J; de Juan, J Arturo; Zied, Diego Cunha

    2010-12-01

    The aim of this research was to present the mathematical relationships obtained through correlations between the physical and chemical characteristics of casing layers and the final properties of the mushrooms. For this purpose, 8 casing layers were used: soil, soil + peat moss, soil + black peat, soil + composted pine bark, soil + coconut fibre pith, soil + wood fibre, soil + composted vine shoots and, finally, the casing of La Rioja subjected to the ruffling practice. It was concluded that explaining the fructification process from the physical and chemical characteristics of the casing alone is complicated. The variability observed in earliness could be explained in non-ruffled cultivation. The variability observed for the mushroom weight and mushroom diameter variables could be explained in both ruffled and non-ruffled cultivations. Finally, the properties defining the final quality of the mushrooms were established by regression analysis.

  5. "Photographing money" task pricing

    NASA Astrophysics Data System (ADS)

    Jia, Zhongxiang

    2018-05-01

    "Photographing money" [1]is a self-service model under the mobile Internet. The task pricing is reasonable, related to the success of the commodity inspection. First of all, we analyzed the position of the mission and the membership, and introduced the factor of membership density, considering the influence of the number of members around the mission on the pricing. Multivariate regression of task location and membership density using MATLAB to establish the mathematical model of task pricing. At the same time, we can see from the life experience that membership reputation and the intensity of the task will also affect the pricing, and the data of the task success point is more reliable. Therefore, the successful point of the task is selected, and its reputation, task density, membership density and Multiple regression of task positions, according to which a nhew task pricing program. Finally, an objective evaluation is given of the advantages and disadvantages of the established model and solution method, and the improved method is pointed out.

  6. PCI fuel failure analysis: a report on a cooperative program undertaken by Pacific Northwest Laboratory and Chalk River Nuclear Laboratories.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mohr, C.L.; Pankaskie, P.J.; Heasler, P.G.

    Reactor fuel failure data sets in the form of initial power (P_i), final power (P_f), transient increase in power (ΔP), and burnup (Bu) were obtained for pressurized heavy water reactors (PHWRs), boiling water reactors (BWRs), and pressurized water reactors (PWRs). These data sets were evaluated and used as the basis for developing two predictive fuel failure models: a graphical concept called the PCI-OGRAM, and a nonlinear regression-based model called PROFIT. The PCI-OGRAM is an extension of the FUELOGRAM developed by AECL. It is based on a critical threshold concept for stress-dependent stress corrosion cracking. The PROFIT model, developed at Pacific Northwest Laboratory, is the result of applying standard statistical regression methods to the available PCI fuel failure data and an analysis of the environmental and strain-rate-dependent stress-strain properties of the Zircaloy cladding.

  7. Sense of coherence and hardiness as predictors of the mental health of college students.

    PubMed

    Knowlden, Adam P; Sharma, Manoj; Kanekar, Amar; Atri, Ashutosh

    Psychological distress has a deleterious impact on the mental health of college students. The purpose of this study was to specify a theoretical, sense of coherence- and hardiness-based regression model to predict the mental health of college students. The instruments employed to build the model included the Kessler Psychological Distress Scale K-6, the Sense of Coherence-29, and the College Student Hardiness Measure. Data were collected from a sample of college students (n = 220) attending a Midwestern university. Each of the theoretical predictors regressed on mental health was deemed significant. Collectively, the significant predictors produced an adjusted R² value of 0.434 (p < 0.001), suggesting the final specified model explained 43.4% of the variance in mental health in the sample of participants. Qualitative cut-points were developed for each scale to aid in the measurement of health promotion and education interventions designed to improve the mental health of college students.

  8. Estimation of Particulate Mass and Manganese Exposure Levels among Welders

    PubMed Central

    Hobson, Angela; Seixas, Noah; Sterling, David; Racette, Brad A.

    2011-01-01

    Background: Welders are frequently exposed to manganese (Mn), which may increase the risk of neurological impairment. Historical exposure estimates for welding-exposed workers are needed for epidemiological studies evaluating the relationship between welding and neurological or other health outcomes. The objective of this study was to develop and validate a multivariate model to estimate quantitative levels of welding fume exposure based on welding particulate mass and Mn concentrations reported in the published literature. Methods: Articles that described welding particulate and Mn exposures during field welding activities were identified through a comprehensive literature search. Summary measures of exposure and related determinants such as year of sampling, welding process performed, type of ventilation used, degree of enclosure, base metal, and location of sampling filter were extracted from each article. The natural log of the reported arithmetic mean exposure level was used as the dependent variable in model building, while the independent variables included the exposure determinants. Cross-validation was performed to aid in model selection and to evaluate the generalizability of the models. Results: A total of 33 particulate and 27 Mn means were included in the regression analysis. The final model explained 76% of the variability in the mean exposures and included welding process and degree of enclosure as predictors. There was very little change in the explained variability and root mean squared error between the final model and its cross-validation model, indicating the final model is robust given the available data. Conclusions: This model may be improved with more detailed exposure determinants; however, the relatively large amount of variance explained by the final model, along with the positive generalizability results of the cross-validation, increases confidence that the estimates derived from this model can be used for estimating welder exposures in the absence of individual measurement data. PMID:20870928

  9. Hybrid rocket engine, theoretical model and experiment

    NASA Astrophysics Data System (ADS)

    Chelaru, Teodor-Viorel; Mingireanu, Florin

    2011-06-01

    The purpose of this paper is to build a theoretical model for the hybrid rocket engine/motor and to validate it using experimental results. The work approaches the main problems of the hybrid motor: scalability, stability/controllability of the operating parameters, and increasing the solid fuel regression rate. First, we focus on theoretical models for the hybrid rocket motor and compare the results with experimental data already available from various research groups. A primary computation model is presented together with results from a numerical algorithm based on a computational model. We present theoretical predictions for several commercial hybrid rocket motors of different scales and compare them with experimental measurements of those motors. Next, the paper focuses on the tribrid rocket motor concept, in which supplementary liquid fuel injection can improve thrust controllability. A complementary computation model is also presented to estimate the regression rate increase of solid fuel doped with oxidizer. Finally, the stability of the hybrid rocket motor is investigated using Lyapunov theory. The stability coefficients obtained depend on the burning parameters, and the stability and command matrices are identified. The paper thoroughly presents the input data of the model, which ensures the reproducibility of the numerical results by independent researchers.

  10. Relationship between long working hours and depression: a 3-year longitudinal study of clerical workers.

    PubMed

    Amagasa, Takashi; Nakayama, Takeo

    2013-08-01

    To clarify how long working hours affect the likelihood of current and future depression. Using data from four repeated measurements collected from 218 clerical workers, four models associating work-related factors with the depressive mood scale were established. The final model was constructed after comparing and testing goodness-of-fit indices using structural equation modeling. Multiple logistic regression analysis was also performed. The final model showed the best fit (normed fit index = 0.908; goodness-of-fit index = 0.936; root-mean-square error of approximation = 0.018). Its standardized total effect indicated that long working hours affected depression at the time of evaluation and 1 to 3 years later. The odds ratio for depression risk was 14.7 in employees who were not working long hours at the time of the initial survey but were doing so by the second survey. Long working hours increase the current and future risks of depression.

  11. Identification of usual interstitial pneumonia pattern using RNA-Seq and machine learning: challenges and solutions.

    PubMed

    Choi, Yoonha; Liu, Tiffany Ting; Pankratz, Daniel G; Colby, Thomas V; Barth, Neil M; Lynch, David A; Walsh, P Sean; Raghu, Ganesh; Kennedy, Giulia C; Huang, Jing

    2018-05-09

    We developed a classifier using RNA sequencing data that identifies the usual interstitial pneumonia (UIP) pattern for the diagnosis of idiopathic pulmonary fibrosis. We addressed significant challenges, including limited sample size, biological and technical sample heterogeneity, and reagent and assay batch effects. We identified inter- and intra-patient heterogeneity, particularly within the non-UIP group. The models classified UIP on transbronchial biopsy samples with a receiver-operating characteristic area under the curve of ~ 0.9 in cross-validation. Using in silico mixed samples in training, we prospectively defined a decision boundary to optimize specificity at ≥85%. The penalized logistic regression model showed greater reproducibility across technical replicates and was chosen as the final model. The final model showed sensitivity of 70% and specificity of 88% in the test set. We demonstrated that the suggested methodologies appropriately addressed challenges of the sample size, disease heterogeneity and technical batch effects and developed a highly accurate and robust classifier leveraging RNA sequencing for the classification of UIP.

  12. Modeling the milling tool wear by using an evolutionary SVM-based model from milling runs experimental data

    NASA Astrophysics Data System (ADS)

    Nieto, Paulino José García; García-Gonzalo, Esperanza; Vilán, José Antonio Vilán; Robleda, Abraham Segade

    2015-12-01

    The main aim of this research work is to build a new practical hybrid regression model to predict milling tool wear in a regular cut, as well as entry cut and exit cut, of a milling tool. The model is based on Particle Swarm Optimization (PSO) in combination with support vector machines (SVMs). This optimization mechanism involves kernel parameter setting in the SVM training procedure, which significantly influences the regression accuracy. Bearing this in mind, a PSO-SVM-based model grounded in statistical learning theory was successfully used here to predict the milling tool flank wear (output variable) as a function of the following input variables: the time duration of the experiment, depth of cut, feed, type of material, etc. To accomplish the objective of this study, the experimental dataset represents experiments from runs on a milling machine under various operating conditions. Data sampled by three different types of sensors (acoustic emission sensor, vibration sensor and current sensor) were acquired at several positions. A second aim is to determine the factors with the greatest bearing on milling tool flank wear, with a view to proposing improvements to the milling machine. First, this hybrid PSO-SVM-based regression model captures the main insight of statistical learning theory in order to obtain a good prediction of the dependence between the flank wear (output variable) and the input variables (time, depth of cut, feed, etc.). Indeed, regression with optimal hyperparameters was performed and a coefficient of determination of 0.95 was obtained. The agreement of this model with the experimental data confirmed its good performance. Second, the main advantages of this PSO-SVM-based model are its capacity to produce a simple, easy-to-interpret model, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, the main conclusions of this study are presented.
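
    A minimal sketch of the PSO-SVM combination, a small particle swarm searching the (C, gamma) space of an RBF-kernel support vector regressor scored by cross-validation, is given below; the swarm constants, features and wear target are illustrative assumptions rather than the paper's settings.

```python
# Tiny PSO over SVR hyperparameters, scored by cross-validated R^2.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 5))                    # sensor-derived features
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, 120)  # flank wear

def fitness(params):
    C, gamma = np.exp(params)                    # search in log space
    return cross_val_score(SVR(kernel="rbf", C=C, gamma=gamma), X, y, cv=3).mean()

n_particles, n_iter = 12, 25
pos = rng.uniform(-3, 3, size=(n_particles, 2))  # positions: (log C, log gamma)
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((2, n_particles, 1))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, -6.0, 6.0)          # keep parameters in a sane range
    vals = np.array([fitness(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()].copy()

print("best (C, gamma):", np.exp(gbest), "cross-validated R^2:", pbest_val.max())
```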

  13. Increasing maternal healthcare use in Rwanda: implications for child nutrition and survival.

    PubMed

    Pierce, Hayley; Heaton, Tim B; Hoffmann, John

    2014-04-01

    Rwanda has made great progress in improving maternal utilization of health care through coordination of external aid and more efficient health policy. Using data from the 2005 and 2010 Rwandan Demographic and Health Surveys, we examine three related questions regarding the impact of expansion of health care in Rwanda. First, did the increased use of health center deliveries apply to women across varying levels of education, economic status, and area of residency? Second, did the benefits associated with being delivered at a health center diminish as utilization became more widespread? Finally, did inequality in child outcomes decline as a result of increased health care utilization? Propensity score matching was used to address the selectivity that arises when choosing to deliver at a hospital. In addition, the regression models include a linear model to predict child nutritional status and Cox regression to predict child survival. The analysis shows that the largest increases in delivery at a health center occur among less educated, less wealthy, and rural Rwandan women. In addition, delivery at a health center is associated with better nutritional status and survival and the benefit is not diminished following the dramatic increase in use of health centers. Finally, educational, economic and residential inequality in child survival and nutrition did not decline. Copyright © 2014 Elsevier Ltd. All rights reserved.

  14. Optimization and Prediction of Ultimate Tensile Strength in Metal Active Gas Welding.

    PubMed

    Ampaiboon, Anusit; Lasunon, On-Uma; Bubphachot, Bopit

    2015-01-01

    We investigated the effect of welding parameters on the ultimate tensile strength of structural steel, ST37-2, welded by Metal Active Gas welding. A fractional factorial design was used to determine the significance of six parameters: wire feed rate, welding voltage, welding speed, travel angle, tip-to-work distance, and shielding gas flow rate. A regression model to predict ultimate tensile strength was developed. Finally, we verified the optimization of the process parameters experimentally. We achieved an optimum tensile strength of 558 MPa. Wire feed rate (19 m/min) had the greatest effect, followed by tip-to-work distance (7 mm), welding speed (200 mm/min), welding voltage (30 V), and travel angle (60°). A shielding gas flow rate of 10 L/min was slightly better, but the effect was small within the 10-20 L/min range. Tests showed that our regression model was able to predict the ultimate tensile strength to within 4%.

  15. Building interpretable predictive models for pediatric hospital readmission using Tree-Lasso logistic regression.

    PubMed

    Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris

    2016-09-01

    Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, the high dimensionality, sparsity, and class imbalance of electronic health data, together with the complexity of risk quantification, challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from the California State Inpatient Databases, Healthcare Cost and Utilization Project, between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data-driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression, resulting in models that are easier to interpret via fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve, AUC) as traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, the interpretations of the models are in accordance with existing medical understanding of pediatric readmission. The best performing models have similar performances, reaching AUC values of 0.783 and 0.779 for traditional Lasso and Tree-Lasso, respectively. However, the information loss of the Lasso model is 0.35 bits higher compared to the Tree-Lasso model. We propose a method for building predictive models applicable for the detection of readmission risk based on electronic health records. Integration of domain knowledge (in the form of the ICD-9-CM taxonomy) and a data-driven, sparse predictive algorithm (Tree-Lasso logistic regression) resulted in an increase in the interpretability of the resulting model. The models are interpreted for the readmission prediction problem in the general pediatric population in California, as well as for several important subpopulations, and the interpretations comply with existing medical understanding of pediatric readmission. Finally, a quantitative assessment of the interpretability of the models is given that goes beyond simple counts of selected low-level features. Copyright © 2016 Elsevier B.V. All rights reserved.
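
    The sparse (Lasso) baseline described above can be sketched with L1-regularized logistic regression; the Tree-Lasso structured penalty over the ICD-9-CM hierarchy has no off-the-shelf scikit-learn implementation and is omitted here, and the diagnosis-code data below are simulated.

```python
# L1-regularized (Lasso) logistic regression on sparse binary features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
X = (rng.random((2000, 300)) < 0.05).astype(float)   # sparse diagnosis-code indicators
logit = X[:, 0] * 2.0 + X[:, 1] * 1.5 - 3.0          # only a few codes truly predictive
y = (rng.random(2000) < 1 / (1 + np.exp(-logit))).astype(int)

# The L1 penalty zeroes out most coefficients, keeping the model sparse.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("selected features:", int((clf.coef_ != 0).sum()))
print("AUC:", roc_auc_score(y, clf.predict_proba(X)[:, 1]))
```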

  16. Monthly streamflow forecasting in the Rhine basin

    NASA Astrophysics Data System (ADS)

    Schick, Simon; Rössler, Ole; Weingartner, Rolf

    2017-04-01

    Forecasting seasonal streamflow of the Rhine river is of societal relevance, as the Rhine is an important waterway and water resource in Western Europe. The present study investigates the predictability of monthly mean streamflow at lead times of zero, one, and two months, with a focus on the potential benefits of integrating seasonal climate predictions. Specifically, we use seasonal predictions of precipitation and surface air temperature released by the European Centre for Medium-Range Weather Forecasts (ECMWF) for a regression analysis. In order to disentangle forecast uncertainty, the 'Reverse Ensemble Streamflow Prediction' framework is adapted here to the context of regression: by using appropriate subsets of predictors, the regression model is constrained to either the initial conditions, the meteorological forcing, or both. An operational application is mimicked by equipping the model with the seasonal climate predictions provided by ECMWF. Finally, to mitigate the spatial aggregation of the meteorological fields, the model is also applied at the subcatchment scale, and the resulting predictions are combined afterwards. The hindcast experiment is carried out for the period 1982-2011 in cross-validation mode at two gauging stations, namely the Rhine at Lobith and Basel. The results show that monthly forecasts are skillful with respect to climatology only at zero lead time. In addition, at zero lead time the integration of seasonal climate predictions decreases the mean absolute error by 5 to 10 percent compared to forecasts based solely on initial conditions. This reduction is most likely induced by the seasonal prediction of precipitation rather than air temperature. The study is completed by benchmarking the regression model against runoff simulations from ECMWF's seasonal forecast system. By simply using basin averages followed by a linear bias correction, these runoff simulations translate well to monthly streamflow. Though the regression model is only slightly outperformed, we argue that runoff from the land surface component of seasonal climate forecasting systems is an interesting option when it comes to seasonal streamflow forecasting in large river basins.

  17. Influences on Academic Achievement Across High and Low Income Countries: A Re-Analysis of IEA Data.

    ERIC Educational Resources Information Center

    Heyneman, S.; Loxley, W.

    Previous international studies of science achievement put the data through a process of winnowing to decide which variables to keep in the final regressions. Variables were allowed to enter the final regressions if they met a minimum beta coefficient criterion of 0.05 averaged across rich and poor countries alike. The criterion was an average…

  18. Assessing the impact of local meteorological variables on surface ozone in Hong Kong during 2000-2015 using quantile and multiple linear regression models

    NASA Astrophysics Data System (ADS)

    Zhao, Wei; Fan, Shaojia; Guo, Hai; Gao, Bo; Sun, Jiaren; Chen, Laiguo

    2016-11-01

    The quantile regression (QR) method has been increasingly introduced into atmospheric environmental studies to explore the non-linear relationship between local meteorological conditions and ozone mixing ratios. In this study, we applied QR for the first time, together with multiple linear regression (MLR), to analyze the dominant meteorological parameters influencing the mean, 10th percentile, 90th percentile and 99th percentile of maximum daily 8-h average (MDA8) ozone concentrations in 2000-2015 in Hong Kong. Dominance analysis (DA) was used to assess the relative importance of meteorological variables in the regression models. Results showed that the MLR models worked better at suburban and rural sites than at urban sites, and better in winter than in summer. QR models performed better in summer for the 99th and 90th percentiles, and better in autumn and winter, as well as in suburban and rural areas, for the 10th percentile. The top three dominant variables associated with MDA8 ozone concentrations, which changed with season and region, were frequently drawn from six meteorological parameters: boundary layer height, humidity, wind direction, surface solar radiation, total cloud cover and sea level pressure. Temperature rarely became a significant variable in any season, which could partly explain the peak of monthly average ozone concentrations in October in Hong Kong. We also found that the effect of solar radiation was enhanced during extreme ozone pollution episodes (i.e., at the 99th percentile). Finally, meteorological effects on MDA8 ozone showed no significant changes before and after the 2010 Asian Games.
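
    As a sketch of the QR approach, statsmodels' QuantReg can fit the conditional 10th, 50th, 90th and 99th percentiles of MDA8 ozone on meteorological predictors; the two predictors and the synthetic data below are illustrative assumptions.

```python
# Quantile regression of MDA8 ozone on meteorological predictors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 1000
radiation = rng.uniform(0, 30, n)          # surface solar radiation, illustrative units
blh = rng.uniform(200, 2000, n)            # boundary layer height, m
mda8 = 20 + 2.5 * radiation - 0.01 * blh + rng.gamma(2.0, 5.0, n)

X = sm.add_constant(np.column_stack([radiation, blh]))
for q in (0.10, 0.50, 0.90, 0.99):
    fit = sm.QuantReg(mda8, X).fit(q=q)    # one fit per conditional quantile
    print(f"q={q:.2f} coefficients:", np.round(fit.params, 3))
```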

  19. Estimation of soil cation exchange capacity using Genetic Expression Programming (GEP) and Multivariate Adaptive Regression Splines (MARS)

    NASA Astrophysics Data System (ADS)

    Emamgolizadeh, S.; Bateni, S. M.; Shahsavani, D.; Ashrafi, T.; Ghorbani, H.

    2015-10-01

    The soil cation exchange capacity (CEC) is one of the main soil chemical properties, which is required in various fields such as environmental and agricultural engineering as well as soil science. In situ measurement of CEC is time consuming and costly. Hence, numerous studies have used traditional regression-based techniques to estimate CEC from more easily measurable soil parameters (e.g., soil texture, organic matter (OM), and pH). However, these models may not be able to adequately capture the complex and highly nonlinear relationship between CEC and its influential soil variables. In this study, Genetic Expression Programming (GEP) and Multivariate Adaptive Regression Splines (MARS) were employed to estimate CEC from more readily measurable soil physical and chemical variables (e.g., OM, clay, and pH) by developing functional relations. The GEP- and MARS-based functional relations were tested at two field sites in Iran. Results showed that GEP and MARS can provide reliable estimates of CEC. It was also found that the MARS model (with a root-mean-square error (RMSE) of 0.318 cmol+ kg⁻¹ and a coefficient of determination (R²) of 0.864) generated slightly better results than the GEP model (with an RMSE of 0.270 cmol+ kg⁻¹ and an R² of 0.807). The performance of the GEP and MARS models was compared with two existing approaches, namely artificial neural networks (ANN) and multiple linear regression (MLR). The comparison indicated that MARS and GEP outperformed the MLR model, but did not perform as well as ANN. Finally, a sensitivity analysis was conducted to determine the most and least influential variables affecting CEC. It was found that OM and pH have the most and least significant effects on CEC, respectively.

  20. Using Time Series Analysis to Predict Cardiac Arrest in a PICU.

    PubMed

    Kennedy, Curtis E; Aoki, Noriaki; Mariscalco, Michele; Turley, James P

    2015-11-01

    To build and test cardiac arrest prediction models in a PICU, using time series analysis as input, and to measure changes in prediction accuracy attributable to different classes of time series data. Retrospective cohort study. Thirty-one bed academic PICU that provides care for medical and general surgical (not congenital heart surgery) patients. Patients experiencing a cardiac arrest in the PICU and requiring external cardiac massage for at least 2 minutes. None. One hundred three cases of cardiac arrest and 109 control cases were used to prepare a baseline dataset that consisted of 1,025 variables in four data classes: multivariate, raw time series, clinical calculations, and time series trend analysis. We trained 20 arrest prediction models using a matrix of five feature sets (combinations of data classes) with four modeling algorithms: linear regression, decision tree, neural network, and support vector machine. The reference model (multivariate data with regression algorithm) had an accuracy of 78% and 87% area under the receiver operating characteristic curve. The best model (multivariate + trend analysis data with support vector machine algorithm) had an accuracy of 94% and 98% area under the receiver operating characteristic curve. Cardiac arrest predictions based on a traditional model built with multivariate data and a regression algorithm misclassified cases 3.7 times more frequently than predictions that included time series trend analysis and built with a support vector machine algorithm. Although the final model lacks the specificity necessary for clinical application, we have demonstrated how information from time series data can be used to increase the accuracy of clinical prediction models.

  1. A crystal plasticity model for slip in hexagonal close packed metals based on discrete dislocation simulations

    NASA Astrophysics Data System (ADS)

    Messner, Mark C.; Rhee, Moono; Arsenlis, Athanasios; Barton, Nathan R.

    2017-06-01

    This work develops a method for calibrating a crystal plasticity model to the results of discrete dislocation (DD) simulations. The crystal model explicitly represents junction formation and annihilation mechanisms and applies these mechanisms to describe hardening in hexagonal close packed metals. The model treats these dislocation mechanisms separately from elastic interactions among populations of dislocations, which the model represents through a conventional strength-interaction matrix. This split between elastic interactions and junction formation mechanisms more accurately reproduces the DD data and results in a multi-scale model that better represents the lower scale physics. The fitting procedure employs concepts of machine learning—feature selection by regularized regression and cross-validation—to develop a robust, physically accurate crystal model. The work also presents a method for ensuring the final, calibrated crystal model respects the physical symmetries of the crystal system. Calibrating the crystal model requires fitting two linear operators: one describing elastic dislocation interactions and another describing junction formation and annihilation dislocation reactions. The structure of these operators in the final, calibrated model reflect the crystal symmetry and slip system geometry of the DD simulations.

  2. The weighted priors approach for combining expert opinions in logistic regression experiments

    DOE PAGES

    Quinlan, Kevin R.; Anderson-Cook, Christine M.; Myers, Kary L.

    2017-04-24

    When modeling the reliability of a system or component, it is not uncommon for more than one expert to provide very different prior estimates of the expected reliability as a function of an explanatory variable such as age or temperature. Our goal in this paper is to incorporate all information from the experts when choosing a design about which units to test. Bayesian design of experiments has been shown to be very successful for generalized linear models, including logistic regression models. We use this approach to develop methodology for the case where there are several potentially non-overlapping priors under consideration. While multiple priors have been used for analysis in the past, they have never been used in a design context. The Weighted Priors method performs well for a broad range of true underlying model parameter choices and is more robust when compared to other reasonable design choices. Finally, we illustrate the method through multiple scenarios and a motivating example. Additional figures for this article are available in the online supplementary information.

  3. Face aging effect simulation model based on multilayer representation and shearlet transform

    NASA Astrophysics Data System (ADS)

    Li, Yuancheng; Li, Yan

    2017-09-01

    In order to extract detailed facial features, we build a face aging effect simulation model based on multilayer representation and shearlet transform. The face is divided into three layers: the global face layer, the local features layer, and the texture layer, and an aging model is established separately for each. First, the training samples are classified according to different age groups, and we use an active appearance model (AAM) at the global level to obtain facial features. The regression equations of shape and texture with age are obtained by fitting support vector machine regression based on the radial basis function, and AAM is used to simulate the aging of facial organs. Then, for the texture detail layer, we acquire the significant high-frequency characteristic components of the face using the multiscale shearlet transform. Finally, we obtain the final simulated aging images of the human face by a fusion algorithm. Experiments are carried out on the FG-NET dataset, and the experimental results show that the simulated face images differ little from the original images and demonstrate a good face aging simulation effect.

  5. Internship workplace preferences of final-year medical students at Zagreb University Medical School, Croatia: all roads lead to Zagreb

    PubMed Central

    Polasek, Ozren; Kolcic, Ivana; Dzakula, Aleksandar; Bagat, Mario

    2006-01-01

    Background Human resources management in health often encounters problems related to workforce geographical distribution. The aim of this study was to investigate the internship workplace preferences of final-year medical students and the reasons associated with their choices. Method A total of 204 out of 240 final-year medical students at Zagreb University Medical School, Croatia, were surveyed a few months before graduation. We collected data on each student's background, workplace preference, academic performance and emigration preferences. Logistic regression was used to analyse the factors underlying internship workplace preference, classified into two categories: Zagreb versus other areas. Results Only 39 respondents (19.1%) wanted to obtain internships outside Zagreb, the Croatian capital. Gender and age were not significantly associated with internship workplace preference. A single predictor variable significantly contributed to the logistic regression model: students who believed they would not get the desired specialty more often chose Zagreb as a preferred internship workplace (odds ratio 0.32, 95% CI 0.12–0.86). Conclusion A strong preference for Zagreb as an internship workplace was recorded. Uncertainty about getting the desired specialty was associated with choosing Zagreb as a workplace, possibly due to more extensive and diverse job opportunities. PMID:16579857

  6. Water Quality Variable Estimation using Partial Least Squares Regression and Multi-Scale Remote Sensing.

    NASA Astrophysics Data System (ADS)

    Peterson, K. T.; Wulamu, A.

    2017-12-01

    Water, essential to all living organisms, is one of the Earth's most precious resources. Remote sensing offers an ideal approach to monitoring water quality compared with traditional in-situ techniques, which are highly time- and resource-consuming. Using a multi-scale approach, data from handheld spectroscopy, UAS-based hyperspectral imaging, and satellite multispectral images were collected in coordination with in-situ water quality samples for two midwestern watersheds. The remote sensing data were modeled and correlated to the in-situ water quality variables, including chlorophyll content (Chl), turbidity, and total dissolved solids (TDS), using Normalized Difference Spectral Indices (NDSI) and Partial Least Squares Regression (PLSR). The results of the study supported the original hypothesis that correlating water quality variables with remotely sensed data benefits greatly from the use of more complex modeling and regression techniques such as PLSR. The final results generated from the PLSR analysis showed much higher R² values for all variables when compared to NDSI. The combination of NDSI and PLSR analysis also identified key wavelengths that aligned with previous studies' findings. This research displays the advantages of, and future for, complex modeling and machine learning techniques to improve water quality variable estimation from spectral data.
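
    A minimal sketch of PLSR on reflectance spectra, regressing a water quality variable on many correlated bands through a few latent components, is shown below; the simulated spectra, the chlorophyll target and the choice of five components are illustrative assumptions.

```python
# PLSR: regress a water quality variable on many spectral bands.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
n_samples, n_bands = 60, 200
spectra = rng.normal(size=(n_samples, n_bands)).cumsum(axis=1)  # smooth-ish spectra
chl = 0.05 * spectra[:, 80] - 0.03 * spectra[:, 150] + rng.normal(0, 0.1, n_samples)

# A handful of latent components handles the highly collinear bands.
pls = PLSRegression(n_components=5)
print("cross-validated R^2:", cross_val_score(pls, spectra, chl, cv=5).mean())
pls.fit(spectra, chl)
```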

  7. Estimating soil temperature using neighboring station data via multi-nonlinear regression and artificial neural network models.

    PubMed

    Bilgili, Mehmet; Sahin, Besir; Sangun, Levent

    2013-01-01

    The aim of this study is to estimate the soil temperatures of a target station using only the soil temperatures of neighboring stations, without any consideration of other variables or parameters related to soil properties. To this end, soil temperatures were measured at depths of 5, 10, 20, 50, and 100 cm below the earth surface at eight measuring stations in Turkey. First, multiple nonlinear regression analysis was performed with the "Enter" method to determine the relationship between the values at the target station and the neighboring stations. Then, stepwise regression analysis was applied to determine the best independent variables. Finally, an artificial neural network (ANN) model was developed to estimate the soil temperature of a target station. For the training data set, the mean absolute percentage error and correlation coefficient ranged from 1.45% to 3.11% and from 0.9979 to 0.9986, respectively, while corresponding ranges of 1.685% to 3.65% and 0.9988 to 0.9991 were obtained for the testing data set. The obtained results show that the developed ANN model provides a simple and accurate way to determine soil temperature. In addition, missing data at the target station could be determined with a high degree of accuracy.
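
    A minimal sketch of the neighbor-station approach, predicting the target station's soil temperature from neighboring stations' series with a small neural network, follows; the three synthetic seasonal series and the network size are illustrative assumptions.

```python
# Predict a target station's soil temperature from neighbor stations.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
t = np.arange(1000)
neighbors = np.column_stack([
    15 + 10 * np.sin(2 * np.pi * t / 365 + phase) + rng.normal(0, 0.5, t.size)
    for phase in (0.0, 0.2, 0.4)
])                                           # soil temps at three neighbor stations
target = neighbors.mean(axis=1) + rng.normal(0, 0.3, t.size)

X_tr, X_te, y_tr, y_te = train_test_split(neighbors, target, random_state=0)
ann = MLPRegressor(hidden_layer_sizes=(8,), max_iter=3000, random_state=0)
ann.fit(X_tr, y_tr)
print("test R^2:", ann.score(X_te, y_te))
```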

  8. The implementation of rare events logistic regression to predict the distribution of mesophotic hard corals across the main Hawaiian Islands.

    PubMed

    Veazey, Lindsay M; Franklin, Erik C; Kelley, Christopher; Rooney, John; Frazer, L Neil; Toonen, Robert J

    2016-01-01

    Predictive habitat suitability models are powerful tools for cost-effective, statistically robust assessment of the environmental drivers of species distributions. The aim of this study was to develop predictive habitat suitability models for two genera of scleractinian corals (Leptoseris and Montipora) found within the mesophotic zone across the main Hawaiian Islands. The mesophotic zone (30-180 m) is challenging to reach, and therefore historically understudied, because it falls between the maximum limit for SCUBA divers and the minimum typical working depth of submersible vehicles. Here, we implement a logistic regression with rare events corrections to account for the scarcity of presence observations within the dataset. These corrections reduced the coefficient error and improved the overall prediction success (73.6% and 74.3%) of both original regression models. The final models included depth, rugosity, slope, mean current velocity, and wave height as the best environmental covariates for predicting the occurrence of the two genera in the mesophotic zone. Using an objectively selected theta ("presence") threshold, the predicted presence probability values (on average 0.051 for Leptoseris and 0.040 for Montipora) were translated into spatially explicit habitat suitability maps of the main Hawaiian Islands at 25 m grid cell resolution. Our maps are the first of their kind to use extant presence and absence data to examine the habitat preferences of these two dominant mesophotic coral genera across Hawai'i.
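
    One common rare-events correction, prior correction of the intercept after King and Zeng, can be sketched as follows; whether the study used exactly this variant is not stated in the abstract, and the covariates, sample and assumed true presence rate below are illustrative.

```python
# Logistic regression with a rare-events prior correction of the intercept.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
X = rng.normal(size=(3000, 3))                  # e.g. depth, rugosity, slope (scaled)
p = 1 / (1 + np.exp(-(-4.0 + 1.2 * X[:, 0])))   # rare presences
y = (rng.random(3000) < p).astype(int)

clf = LogisticRegression().fit(X, y)
tau = 0.04                                      # assumed true population presence rate
ybar = y.mean()                                 # presence rate in the sample
# Prior correction: shift the intercept for the event-rate mismatch.
correction = np.log(((1 - tau) / tau) * (ybar / (1 - ybar)))
intercept = clf.intercept_[0] - correction

def presence_prob(x):
    """Corrected probability of coral presence for covariates x."""
    return 1 / (1 + np.exp(-(intercept + x @ clf.coef_[0])))

print(presence_prob(np.array([0.5, 0.0, -0.2])))
```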

  9. Monitoring and modeling to predict Escherichia coli at Presque Isle Beach 2, City of Erie, Erie County, Pennsylvania

    USGS Publications Warehouse

    Zimmerman, Tammy M.

    2006-01-01

    The Lake Erie shoreline in Pennsylvania spans nearly 40 miles and is a valuable recreational resource for Erie County. Nearly 7 miles of the Lake Erie shoreline lies within Presque Isle State Park in Erie, Pa. Concentrations of Escherichia coli (E. coli) bacteria at permitted Presque Isle beaches occasionally exceed the single-sample bathing-water standard, resulting in unsafe swimming conditions and closure of the beaches. E. coli concentrations and other water-quality and environmental data collected at Presque Isle Beach 2 during the 2004 and 2005 recreational seasons were used to develop models using tobit regression analyses to predict E. coli concentrations. All variables statistically related to E. coli concentrations were included in the initial regression analyses, and after several iterations, only those explanatory variables that made the models significantly better at predicting E. coli concentrations were included in the final models. Regression models were developed using data from 2004, 2005, and the combined 2-year dataset. Variables in the 2004 model and the combined 2004-2005 model were log10 turbidity, rain weight, wave height (calculated), and wind direction. Variables in the 2005 model were log10 turbidity and wind direction. Explanatory variables not included in the final models were water temperature, streamflow, wind speed, and current speed; model results indicated these variables did not meet significance criteria at the 95-percent confidence level (probabilities were greater than 0.05). The predicted E. coli concentrations produced by the models were used to develop probabilities that concentrations would exceed the single-sample bathing-water standard for E. coli of 235 colonies per 100 milliliters. Analysis of the exceedance probabilities helped determine a threshold probability for each model, chosen such that the correct number of exceedances and nonexceedances was maximized and the number of false positives and false negatives was minimized. Future samples with computed exceedance probabilities higher than the selected threshold probability, as determined by the model, will likely exceed the E. coli standard, and a beach advisory or closing may need to be issued; computed exceedance probabilities lower than the threshold probability will likely indicate the standard will not be exceeded. Additional data collected each year can be used to test and possibly improve the model. This study will aid beach managers in more rapidly determining when waters are not safe for recreational use and, subsequently, when to issue beach advisories or closings.

  10. [State Recognition of Solid Fermentation Process Based on Near Infrared Spectroscopy with Adaboost and Spectral Regression Discriminant Analysis].

    PubMed

    Yu, Shuang; Liu, Guo-hai; Xia, Rong-sheng; Jiang, Hui

    2016-01-01

    In order to achieve rapid monitoring of the process state of solid state fermentation (SSF), this study attempted qualitative identification of the process state of SSF of feed protein using Fourier transform near infrared (FT-NIR) spectroscopy. More specifically, FT-NIR spectroscopy combined with an Adaboost-SRDA-NN integrated learning algorithm was used to accurately and rapidly monitor chemical and physical changes in SSF of feed protein without the need for chemical analysis. First, the raw spectra of all 140 fermentation samples were collected with a Fourier transform near infrared spectrometer (Antaris II) and preprocessed with the standard normal variate transformation (SNV) algorithm. Thereafter, the characteristic information of the preprocessed spectra was extracted by spectral regression discriminant analysis (SRDA). Finally, a nearest neighbors (NN) algorithm was selected as the base classifier, and a state recognition model was built to identify the different fermentation samples in the validation set. Experimental results showed that the SRDA-NN model outperformed two other NN models developed with feature information from principal component analysis (PCA) and linear discriminant analysis (LDA), achieving a correct recognition rate of 94.28% in the validation set. To further improve the recognition accuracy of the final model, an Adaboost-SRDA-NN ensemble learning algorithm was proposed by integrating Adaboost with SRDA-NN, and this algorithm was used to construct the online monitoring model of the process state of SSF of feed protein. The prediction performance of the SRDA-NN model was further enhanced by the Adaboost lifting algorithm, and the correct recognition rate of the Adaboost-SRDA-NN model reached 100% in the validation set. The overall results demonstrate that the SRDA algorithm can effectively extract spectral feature information and reduce the spectral dimension in the model calibration process of qualitative NIR spectroscopy analysis. In addition, the Adaboost lifting algorithm can improve the classification accuracy of the final model. The results obtained in this work provide a research foundation for developing online instruments for monitoring the SSF process.

  11. Research on Fault Rate Prediction Method of T/R Component

    NASA Astrophysics Data System (ADS)

    Hou, Xiaodong; Yang, Jiangping; Bi, Zengjun; Zhang, Yu

    2017-07-01

    The T/R component is an important part of a large phased-array radar antenna; because such components are numerous and have a high fault rate, fault prediction for them is of practical significance. To address the problems of the traditional grey model GM(1,1) in practical use, this paper establishes a discrete grey model based on the original, introduces an optimization factor to improve the background value, and adds a linear term to the prediction model, yielding an improved discrete grey model with linear regression. Finally, an example is simulated and compared with other models. The results show that the proposed method has higher accuracy, a simple solution procedure, and a wider scope of application.
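    A minimal sketch of the baseline discrete grey model DGM(1,1), fitted by least squares on the accumulated series and differenced back to the original scale, is shown below; the fault-count series is hypothetical, and the paper's background-value optimization and added linear term are not reproduced.

```python
import numpy as np

def dgm11_forecast(x0, steps):
    """Discrete grey model DGM(1,1): fit x1[k+1] = b1*x1[k] + b2 on the
    accumulated series x1 = cumsum(x0), then difference back to the x0 scale."""
    x1 = np.cumsum(x0)
    A = np.column_stack([x1[:-1], np.ones(len(x1) - 1)])
    b1, b2 = np.linalg.lstsq(A, x1[1:], rcond=None)[0]
    x1_hat = [x1[0]]                          # anchor the recursion at the first point
    for _ in range(len(x0) + steps - 1):
        x1_hat.append(b1 * x1_hat[-1] + b2)
    return np.diff(np.array(x1_hat), prepend=0.0)  # fitted values plus forecasts

# Hypothetical yearly fault counts for a batch of T/R components
faults = np.array([3.2, 3.9, 4.4, 5.3, 6.1])
print(dgm11_forecast(faults, steps=2)[-2:])   # two-step-ahead forecast
```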

  12. Contribution of neurocognition to 18-month employment outcomes in first-episode psychosis.

    PubMed

    Karambelas, George J; Cotton, Sue M; Farhall, John; Killackey, Eóin; Allott, Kelly A

    2017-10-27

    To examine whether baseline neurocognition predicts vocational outcomes over 18 months in patients with first-episode psychosis enrolled in a randomized controlled trial of Individual Placement and Support or treatment as usual. One-hundred and thirty-four first-episode psychosis participants completed an extensive neurocognitive battery. Principal axis factor analysis using PROMAX rotation was used to determine the underlying structure of the battery. Setwise (hierarchical) multiple linear and logistic regressions were used to examine predictors of (1) total hours employed over 18 months and (2) employment status, respectively. Neurocognition factors were entered in the models after accounting for age, gender, premorbid IQ, negative symptoms, treatment group allocation and employment status at baseline. Five neurocognitive factors were extracted: (1) processing speed, (2) verbal learning and memory, (3) knowledge and reasoning, (4) attention and working memory and (5) visual organization and memory. Employment status over 18 months was not significantly predicted by any of the predictors in the final model. Total hours employed over 18 months were significantly predicted by gender (P = .027), negative symptoms (P = .032) and verbal learning and memory (P = .040). Every step of the regression model was a significant predictor of total hours worked overall (final model: P = .013). Verbal learning and memory, negative symptoms and gender were implicated in duration of employment in first-episode psychosis. The other neurocognitive domains did not significantly contribute to the prediction of vocational outcomes over 18 months. Interventions targeting verbal memory may improve vocational outcomes in early psychosis. © 2017 John Wiley & Sons Australia, Ltd.

  13. Predicting the demand of physician workforce: an international model based on "crowd behaviors".

    PubMed

    Tsai, Tsuen-Chiuan; Eliasziw, Misha; Chen, Der-Fang

    2012-03-26

    The appropriateness of the physician workforce greatly influences the quality of healthcare. When a physician shortage occurs, correcting the workforce always takes an extended period, during which both the public and health personnel suffer. To calculate an appropriate Physician Density (PD) for a specific country, this study created a PD prediction model based on health-related data from many countries. Twelve factors that could plausibly affect the demand for physicians were chosen, and data on these factors were extracted for 130 of the 195 countries reviewed. Multiple stepwise linear regression was used to derive the PD prediction model, and a split-sample cross-validation procedure was performed to evaluate the generalizability of the results. Using data from 130 countries, accounting for correlations between variables and guarding against multicollinearity, seven of the 12 predictor variables were selected for entry into the stepwise regression procedure. The final model was: PD = (5.014 - 0.128 × proportion under age 15 years + 0.034 × life expectancy)², with an R² of 80.4%. Using the prediction equation, 70 countries had PDs with "negative discrepancy", while 58 had PDs with "positive discrepancy". This study provides a regression-based model to calculate a "norm" PD for a specific country. A large PD discrepancy in a country indicates the need to examine physicians' workloads and well-being, the effectiveness and efficiency of medical care, the promotion of population health, and team resource management.
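    A worked example of the reported prediction equation follows; the inputs are illustrative, and the units of PD are as defined in the source.

```python
def predicted_physician_density(pct_under_15, life_expectancy):
    """Final model from the paper:
    PD = (5.014 - 0.128 * proportion under age 15 + 0.034 * life expectancy)**2"""
    return (5.014 - 0.128 * pct_under_15 + 0.034 * life_expectancy) ** 2

# Illustrative country: 20% of the population under age 15, life expectancy 78 years
print(predicted_physician_density(20, 78))  # (5.014 - 2.56 + 2.652)^2 = 5.106^2 ≈ 26.07
```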

  14. Socioeconomic and Demographic Disparities in Knowledge of Reproductive Healthcare among Female University Students in Bangladesh

    PubMed Central

    Islam Mondal, Md. Nazrul; Nasir Ullah, Md. Monzur Morshad; Khan, Md. Nuruzzaman; Islam, Mohammad Zamirul; Islam, Md. Nurul; Moni, Sabiha Yasmin; Hoque, Md. Nazrul; Rahman, Md. Mashiur

    2015-01-01

    Background: Reproductive health (RH) is a critical component of women's health and overall well-being around the world, especially in developing countries. We examine the factors that determine knowledge of RH care among female university students in Bangladesh. Methods: Data on 300 female students were collected from Rajshahi University, Bangladesh through a structured questionnaire using a purposive sampling technique. Univariate analysis was used to describe the variables; bivariate analysis was used to examine associations between variables; and finally, multivariate analysis (a binary logistic regression model) was used to fit the model and interpret the parameter estimates, especially in terms of odds ratios. Results: The results revealed that more than one-third (34.3%) of respondents do not have sufficient knowledge of RH care. The χ²-test identified significant (p < 0.05) associations of respondents' knowledge of RH care with age, education, family type, watching television, and knowledge about pregnancy, family planning, and contraceptive use. Finally, the binary logistic regression model identified respondents' age, education, family type, and knowledge about family planning and contraceptive use as significant (p < 0.05) predictors of RH care. Conclusions and Global Health Implications: Knowledge of RH care among female university students was found to be unsatisfactory. Government and concerned organizations should promote and strengthen health education programs focused on RH care, especially for female university students in Bangladesh. PMID:27622005

  15. Parameters of Models of Structural Transformations in Alloy Steel Under Welding Thermal Cycle

    NASA Astrophysics Data System (ADS)

    Kurkin, A. S.; Makarov, E. L.; Kurkin, A. B.; Rubtsov, D. E.; Rubtsov, M. E.

    2017-05-01

    A mathematical model of structural transformations in an alloy steel under the thermal cycle of multipass welding is suggested for computer implementation. The minimum necessary set of parameters for describing the transformations under heating and cooling is determined. Ferritic-pearlitic, bainitic and martensitic transformations under cooling of a steel are considered. A method for deriving the necessary temperature and time parameters of the model from the chemical composition of the steel is described. Published data are used to derive regression models of the temperature ranges and parameters of transformation kinetics in alloy steels. It is shown that the disadvantages of the active visual methods of analysis of the final phase composition of steels are responsible for inaccuracy and mismatch of published data. The hardness of a specimen, which correlates with some other mechanical properties of the material, is chosen as the most objective and reproducible criterion of the final phase composition. The models developed are checked by a comparative analysis of computational results and experimental data on the hardness of 140 alloy steels after cooling at various rates.

  16. Evolutionary grinding model for nanometric control of surface roughness for aspheric optical surfaces.

    PubMed

    Han, Jeong-Yeol; Kim, Sug-Whan; Han, Inwoo; Kim, Geon-Hee

    2008-03-17

    A new evolutionary grinding process model has been developed for nanometric control of material removal from an aspheric surface of Zerodur substrate. The model incorporates novel control features such as i) a growing database; ii) an evolving, multi-variable regression equation; and iii) an adaptive correction factor for target surface roughness (Ra) for the next machine run. This process model demonstrated a unique evolutionary controllability of machining performance resulting in the final grinding accuracy (i.e. averaged difference between target and measured surface roughness) of -0.2+/-2.3(sigma) nm Ra over seven trial machine runs for the target surface roughness ranging from 115 nm to 64 nm Ra.
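    The control loop described in i)-iii) can be sketched roughly as follows; this is a schematic reading of the approach, with hypothetical machine parameters, a plain least-squares regression, and the last run's residual standing in for the paper's adaptive correction factor.

```python
import numpy as np

# Growing database of past runs: [wheel_speed, feed_rate, depth_of_cut, measured_Ra (nm)]
database = [
    [1200, 10.0, 2.0, 118.0],
    [1300, 9.0, 1.8, 105.0],
    [1400, 8.0, 1.5, 96.0],
    [1600, 6.0, 1.0, 81.0],
]

def choose_next_parameters(target_ra, candidate_params):
    """Refit the multi-variable regression on all runs so far, apply an
    adaptive correction for the last run's prediction error, and return
    the candidate parameter set whose corrected predicted Ra is nearest
    the target roughness."""
    data = np.asarray(database, dtype=float)
    X = np.column_stack([data[:, :-1], np.ones(len(data))])
    y = data[:, -1]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    correction = y[-1] - X[-1] @ coef            # last run's residual
    C = np.column_stack([candidate_params, np.ones(len(candidate_params))])
    preds = C @ coef + correction                # corrected Ra predictions
    return candidate_params[np.argmin(np.abs(preds - target_ra))]

grid = np.array([[1500, 7.0, 1.2], [1700, 5.0, 0.8], [1800, 4.5, 0.6]])
print(choose_next_parameters(64.0, grid))        # parameters for the next run;
# after the run, the measured Ra is appended to `database` and the model refit
```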

  17. Cost Estimation Techniques for C3I System Software.

    DTIC Science & Technology

    1984-07-01

    opment manmonth have been determined for maxi, midi, and mini type computers. Small to median size timeshared developments used 0.2 to 1.5 hours ... development schedule 1.23 1.00 1.10 ... Detailed Model: The final codification of the COCOMO regressions was the development of separate effort ... regardless of the software structure level being estimated: DEVC -- the expected development computer (maxi, midi, mini, micro); MODE -- the expected

  18. Nitrogen dioxide concentrations in neighborhoods adjacent to a commercial airport: a land use regression modeling study

    PubMed Central

    2010-01-01

    Background There is growing concern in communities surrounding airports regarding the contribution of various emission sources (such as aircraft and ground support equipment) to nearby ambient concentrations. We used extensive monitoring of nitrogen dioxide (NO2) in neighborhoods surrounding T.F. Green Airport in Warwick, RI, and land-use regression (LUR) modeling techniques to determine the impact of proximity to the airport and local traffic on these concentrations. Methods Palmes diffusion tube samplers were deployed along the airport's fence line and within surrounding neighborhoods for one to two weeks. In total, 644 measurements were collected over three sampling campaigns (October 2007, March 2008 and June 2008) and each sampling location was geocoded. GIS-based variables were created as proxies for local traffic and airport activity. A forward stepwise regression methodology was employed to create general linear models (GLMs) of NO2 variability near the airport. The effect of local meteorology on associations with GIS-based variables was also explored. Results Higher concentrations of NO2 were seen near the airport terminal, entrance roads to the terminal, and near major roads, with qualitatively consistent spatial patterns between seasons. In our final multivariate model (R2 = 0.32), the local influences of highways and arterial/collector roads were statistically significant, as were local traffic density and distance to the airport terminal (all p < 0.001). Local meteorology did not significantly affect associations with principal GIS variables, and the regression model structure was robust to various model-building approaches. Conclusion Our study has shown that there are clear local variations in NO2 in the neighborhoods that surround an urban airport, which are spatially consistent across seasons. LUR modeling demonstrated a strong influence of local traffic, except for the smallest roads that predominate in residential areas, as well as proximity to the airport terminal. PMID:21083910

  19. Nitrogen dioxide concentrations in neighborhoods adjacent to a commercial airport: a land use regression modeling study.

    PubMed

    Adamkiewicz, Gary; Hsu, Hsiao-Hsien; Vallarino, Jose; Melly, Steven J; Spengler, John D; Levy, Jonathan I

    2010-11-17

    There is growing concern in communities surrounding airports regarding the contribution of various emission sources (such as aircraft and ground support equipment) to nearby ambient concentrations. We used extensive monitoring of nitrogen dioxide (NO2) in neighborhoods surrounding T.F. Green Airport in Warwick, RI, and land-use regression (LUR) modeling techniques to determine the impact of proximity to the airport and local traffic on these concentrations. Palmes diffusion tube samplers were deployed along the airport's fence line and within surrounding neighborhoods for one to two weeks. In total, 644 measurements were collected over three sampling campaigns (October 2007, March 2008 and June 2008) and each sampling location was geocoded. GIS-based variables were created as proxies for local traffic and airport activity. A forward stepwise regression methodology was employed to create general linear models (GLMs) of NO2 variability near the airport. The effect of local meteorology on associations with GIS-based variables was also explored. Higher concentrations of NO2 were seen near the airport terminal, entrance roads to the terminal, and near major roads, with qualitatively consistent spatial patterns between seasons. In our final multivariate model (R2 = 0.32), the local influences of highways and arterial/collector roads were statistically significant, as were local traffic density and distance to the airport terminal (all p < 0.001). Local meteorology did not significantly affect associations with principal GIS variables, and the regression model structure was robust to various model-building approaches. Our study has shown that there are clear local variations in NO2 in the neighborhoods that surround an urban airport, which are spatially consistent across seasons. LUR modeling demonstrated a strong influence of local traffic, except for the smallest roads that predominate in residential areas, as well as proximity to the airport terminal.

  20. Psychological well-being in individuals with mild cognitive impairment.

    PubMed

    Gates, Nicola; Valenzuela, Michael; Sachdev, Perminder S; Singh, Maria A Fiatarone

    2014-01-01

    Cognitive impairments associated with aging and dementia are major sources of burden, deterioration in life quality, and reduced psychological well-being (PWB). Preventative measures to both reduce incident disease and improve PWB in those afflicted are increasingly targeting individuals with mild cognitive impairment (MCI) at early disease stage. However, there is very limited information regarding the relationships between early cognitive changes and memory concern, and life quality and PWB in adults with MCI; furthermore, PWB outcomes are too commonly overlooked in intervention trials. The purpose of this study was therefore to empirically test a theoretical model of PWB in MCI in order to inform clinical intervention. Baseline data from a convenience sample of 100 community-dwelling adults diagnosed with MCI enrolled in the Study of Mental Activity and Regular Training (SMART) trial were collected. A series of regression analyses were performed to develop a reduced model, then hierarchical regression with the Baron Kenny test of mediation derived the final three-tiered model of PWB. Significant predictors of PWB were subjective memory concern, cognitive function, evaluations of quality of life, and negative affect, with a final model explaining 61% of the variance of PWB in MCI. Our empirical findings support a theoretical tiered model of PWB in MCI and contribute to an understanding of the way in which early subtle cognitive deficits impact upon PWB. Multiple targets and entry points for clinical intervention were identified. These include improving the cognitive difficulties associated with MCI. Additionally, these highlight the importance of reducing memory concern, addressing low mood, and suggest that improving a person's quality of life may attenuate the negative effects of depression and anxiety on PWB in this cohort.

  1. Analysis of Market Opportunities for Chinese Private Express Delivery Industry

    NASA Astrophysics Data System (ADS)

    Jiang, Changbing; Bai, Lijun; Tong, Xiaoqing

    China's express delivery market has become an arena in which every express enterprise competes, owing to its huge potential demand and highly profitable prospects. Qualitative and quantitative forecasts of future changes in China's express delivery market can therefore help enterprises understand market conditions and shifts in social demand, and adjust their business activities in time to enhance their competitiveness. This chapter first introduces the development of China's express delivery industry and then reviews the theoretical basis of the regression model. We predict demand trends in China's express delivery market using Pearson correlation analysis and regression analysis, from qualitative and quantitative perspectives respectively. Finally, we draw conclusions and offer recommendations for China's express delivery industry.

  2. Advances in nowcasting influenza-like illness rates using search query logs

    NASA Astrophysics Data System (ADS)

    Lampos, Vasileios; Miller, Andrew C.; Crossan, Steve; Stefansen, Christian

    2015-08-01

    User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.

  3. Advances in nowcasting influenza-like illness rates using search query logs.

    PubMed

    Lampos, Vasileios; Miller, Andrew C; Crossan, Steve; Stefansen, Christian

    2015-08-03

    User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.
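    A compressed Python sketch of the described two-stage pipeline follows, with synthetic query and ILI data: Elastic Net selects a sparse query set, and a Gaussian process regression is then fitted on the selected queries. The paper's composite GP kernel over query clusters and the autoregressive component are collapsed here into a single RBF-plus-noise kernel.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.random((260, 500))                                # weekly query frequencies (weeks x queries)
ili = X[:, :5].sum(axis=1) + 0.05 * rng.normal(size=260)  # synthetic ILI rate proxy

# Step 1: Elastic Net selects a sparse set of predictive queries
enet = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X, ili)
selected = np.flatnonzero(enet.coef_)

# Step 2: nonlinear GP regression over the selected queries
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(),
                              normalize_y=True)
gp.fit(X[:, selected], ili)
pred, std = gp.predict(X[:5, selected], return_std=True)  # nowcasts with uncertainty
print(len(selected), pred.round(2))
```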

  4. Rainfall-induced Landslide Susceptibility assessment at the Longnan county

    NASA Astrophysics Data System (ADS)

    Hong, Haoyuan; Zhang, Ying

    2017-04-01

    Landslides are a serious hazard in Longnan County, China, so landslide susceptibility assessment is a useful tool for government and decision making. The main objective of this study is to investigate and compare the frequency ratio, support vector machine, and logistic regression methods. Longnan County (Jiangxi Province, China) was selected as the case study. First, a landslide inventory map with 354 landslide locations was constructed. The landslide locations were then randomly divided in a 70/30 ratio for training and validating the models. Second, fourteen landslide conditioning factors were prepared: slope, aspect, altitude, topographic wetness index (TWI), stream power index (SPI), sediment transport index (STI), plan curvature, lithology, distance to faults, distance to rivers, distance to roads, land use, normalized difference vegetation index (NDVI), and rainfall. Using the frequency ratio, support vector machine, and logistic regression methods, three landslide susceptibility models were constructed. Finally, the overall performance of the resulting models was assessed and compared using the receiver operating characteristic (ROC) curve technique. The results showed that the support vector machine model performed best in the study area, with a success rate of 88.39% and a prediction rate of 84.06%.
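    A minimal sketch of the 70/30 split and ROC-based comparison for two of the three methods (logistic regression and support vector machine) follows, using a synthetic conditioning-factor matrix in place of the real inventory.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(708, 14))   # 14 conditioning factors; 354 landslide + 354 non-landslide cells
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(size=708) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=1)

for name, clf in [("logistic", LogisticRegression(max_iter=1000)),
                  ("svm", SVC(probability=True))]:
    model = make_pipeline(StandardScaler(), clf).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])  # area under the ROC curve
    print(name, round(auc, 3))
```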

  5. [Gaussian process regression and its application in near-infrared spectroscopy analysis].

    PubMed

    Feng, Ai-Ming; Fang, Li-Min; Lin, Min

    2011-06-01

    Gaussian process (GP) regression is applied in the present paper as a chemometric method to explore the complicated relationship between near-infrared (NIR) spectra and ingredient concentrations. After outliers were detected by the Monte Carlo cross-validation (MCCV) method and removed from the dataset, different preprocessing methods, such as multiplicative scatter correction (MSC), smoothing, and derivatives, were tried to obtain the best model performance. Furthermore, uninformative variable elimination (UVE) was introduced as a variable selection technique, and the characteristic wavelengths obtained were employed as model inputs. A public dataset with 80 NIR spectra of corn was used as an example for evaluating the new algorithm. Optimal models for oil, starch, and protein were obtained by GP regression. The performance of the final models was evaluated according to the root mean square error of calibration (RMSEC), root mean square error of cross-validation (RMSECV), root mean square error of prediction (RMSEP), and correlation coefficient (r). The models show good calibration ability, with r values above 0.99, and satisfactory prediction ability, with r values above 0.96. The overall results demonstrate that GP regression is an effective chemometric method and is promising for NIR analysis.
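    The MCCV outlier screen described above can be sketched as follows, with synthetic corn-like spectra: each sample's out-of-split residuals are accumulated over repeated random splits, and samples with consistently large errors are flagged. The GP kernel and the three-sigma cutoff are assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 700))                            # 80 synthetic NIR spectra
y = X[:, :10].mean(axis=1) + 0.02 * rng.normal(size=80)   # e.g. oil content

# Monte Carlo cross-validation: accumulate each sample's out-of-split residuals
n_obs = np.zeros(len(y))
sq_err = np.zeros(len(y))
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
for tr, te in ShuffleSplit(n_splits=100, test_size=0.25, random_state=1).split(X):
    gp.fit(X[tr], y[tr])
    r = y[te] - gp.predict(X[te])
    n_obs[te] += 1
    sq_err[te] += r**2

rmse_i = np.sqrt(sq_err / np.maximum(n_obs, 1))           # per-sample prediction error
outliers = np.flatnonzero(rmse_i > rmse_i.mean() + 3 * rmse_i.std())
print(outliers)                                           # candidates for removal
```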

  6. Hierarchical Bayesian Markov switching models with application to predicting spawning success of shovelnose sturgeon

    USGS Publications Warehouse

    Holan, S.H.; Davis, G.M.; Wildhaber, M.L.; DeLonay, A.J.; Papoulias, D.M.

    2009-01-01

    The timing of spawning in fish is tightly linked to environmental factors; however, these factors are not very well understood for many species. Specifically, little information is available to guide recruitment efforts for endangered species such as the sturgeon. Therefore, we propose a Bayesian hierarchical model for predicting the success of spawning of the shovelnose sturgeon which uses both biological and behavioural (longitudinal) data. In particular, we use data that were produced from a tracking study that was conducted in the Lower Missouri River. The data that were produced from this study consist of biological variables associated with readiness to spawn along with longitudinal behavioural data collected by using telemetry and archival data storage tags. These high frequency data are complex both biologically and in the underlying behavioural process. To accommodate such complexity we developed a hierarchical linear regression model that uses an eigenvalue predictor, derived from the transition probability matrix of a two-state Markov switching model with generalized auto-regressive conditional heteroscedastic dynamics. Finally, to minimize the computational burden that is associated with estimation of this model, a parallel computing approach is proposed. © Journal compilation 2009 Royal Statistical Society.

  7. Assessing Principal Component Regression Prediction of Neurochemicals Detected with Fast-Scan Cyclic Voltammetry

    PubMed Central

    2011-01-01

    Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook’s distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards. PMID:21966586

  8. Assessing principal component regression prediction of neurochemicals detected with fast-scan cyclic voltammetry.

    PubMed

    Keithley, Richard B; Wightman, R Mark

    2011-06-07

    Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook's distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards.
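    A minimal sketch of principal component regression with a Cook's-distance screen of the training set follows, on synthetic voltammograms and concentrations; the number of components and the 4/n cutoff are conventional choices, not necessarily the paper's.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 1000))                        # training voltammograms
c = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=60)   # known dopamine concentrations

pca = PCA(n_components=3).fit(X)
T = pca.transform(X)                                   # principal component scores
reg = LinearRegression().fit(T, c)                     # regression on the scores

# Cook's distance on the score-space regression to flag outlying training points
T1 = np.column_stack([np.ones(len(T)), T])
H = T1 @ np.linalg.inv(T1.T @ T1) @ T1.T               # hat matrix
resid = c - reg.predict(T)
p = T1.shape[1]
mse = resid @ resid / (len(c) - p)
lev = np.diag(H)
cooks = resid**2 / (p * mse) * lev / (1 - lev) ** 2
outliers = np.flatnonzero(cooks > 4 / len(c))          # rule-of-thumb cutoff
print(outliers)
```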

  9. Mammalian cell culture monitoring using in situ spectroscopy: Is your method really optimised?

    PubMed

    André, Silvère; Lagresle, Sylvain; Hannas, Zahia; Calvosa, Éric; Duponchel, Ludovic

    2017-03-01

    In recent years, as a result of the process analytical technology initiative of the US Food and Drug Administration, many different works have been carried out on direct and in situ monitoring of critical parameters for mammalian cell cultures by Raman spectroscopy and multivariate regression techniques. However, despite interesting results, it cannot be said that the proposed monitoring strategies, which will reduce errors of the regression models and thus confidence limits of the predictions, are really optimized. Hence, the aim of this article is to optimize some critical steps of spectroscopic acquisition and data treatment in order to reach a higher level of accuracy and robustness of bioprocess monitoring. In this way, we propose first an original strategy to assess the most suited Raman acquisition time for the processes involved. In a second part, we demonstrate the importance of the interbatch variability on the accuracy of the predictive models with a particular focus on the optical probes adjustment. Finally, we propose a methodology for the optimization of the spectral variables selection in order to decrease prediction errors of multivariate regressions. © 2017 American Institute of Chemical Engineers Biotechnol. Prog., 33:308-316, 2017. © 2017 American Institute of Chemical Engineers.

  10. Inferring gene regression networks with model trees

    PubMed Central

    2010-01-01

    Background Novel strategies are required to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes by building so-called gene co-expression networks. These are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful for determining whether two genes have a strong global similarity, but they do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph of all the relationships among output and input genes is built, taking into account whether each pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: a Saccharomyces cerevisiae and an E. coli data set. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as a control to compare the results to those of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth- and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and the others is calculated simultaneously. Model trees are very useful techniques for estimating the numerical values of target genes by linear regression functions. They are often more precise than linear regression models because they can fit different linear regressions to separate areas of the search space, favoring the inference of localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET. PMID:20950452
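    The per-gene tree idea can be sketched as follows. scikit-learn has no model trees (linear models in the leaves), so an ordinary regression tree stands in, and the significance testing and false-discovery-rate control described above are omitted; the expression data are synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 30))   # samples x genes (synthetic)
edges = set()

for g in range(expr.shape[1]):
    X = np.delete(expr, g, axis=1)                 # all other genes as predictors
    tree = DecisionTreeRegressor(max_depth=3).fit(X, expr[:, g])
    used = np.flatnonzero(tree.feature_importances_ > 0)
    for u in used:
        src = u if u < g else u + 1                # map back to the original gene index
        edges.add((src, g))                        # candidate regulatory edge src -> g

print(len(edges))                                  # size of the raw association graph
```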

  11. Relationship between body composition and vertical ground reaction forces in obese children when walking.

    PubMed

    Villarrasa-Sapiña, Israel; Serra-Añó, Pilar; Pardo-Ibáñez, Alberto; Gonzalez, Luis-Millán; García-Massó, Xavier

    2017-01-01

    Obesity is now a serious worldwide challenge, especially in children. This condition can cause a number of different health problems, including musculoskeletal disorders, some of which are due to mechanical stress caused by excess body weight. The aim of this study was to determine the association between body composition and the vertical ground reaction force produced during walking in obese children. Sixteen children participated in the study, six females and ten males [11.5 (1.2) years old, 69.8 (15.5) kg, 1.56 (0.09) m, and a body mass index (BMI) of 28.36 (3.74) kg/m²]. Total weight, lean mass and fat mass were measured by dual-energy X-ray absorptiometry and vertical forces while walking were obtained by a force platform. The vertical force variables analysed were impact and propulsive forces, and the rate of development of both. Multiple regression models for each vertical force parameter were calculated using the body composition variables as input. The impact force regression model was found to be positively related to the weight of obese children and negatively related to lean mass. The regression model showed lean mass was positively related to the propulsive rate. Finally, regression models for impact and propulsive force showed a direct relationship with body weight. Impact force is positively related to the weight of obese children, but lean mass helps to reduce the impact force in this population. Exercise could help obese persons to reduce their total body weight and increase their lean mass, thus reducing impact forces during sports and other activities. Copyright © 2016 Elsevier Ltd. All rights reserved.

  12. Breast mass detection in mammography and tomosynthesis via fully convolutional network-based heatmap regression

    NASA Astrophysics Data System (ADS)

    Zhang, Jun; Cain, Elizabeth Hope; Saha, Ashirbani; Zhu, Zhe; Mazurowski, Maciej A.

    2018-02-01

    Breast mass detection in mammography and digital breast tomosynthesis (DBT) is an essential step in computerized breast cancer analysis. Deep learning-based methods incorporate feature extraction and model learning into a unified framework and have achieved impressive performance in various medical applications (e.g., disease diagnosis, tumor detection, and landmark detection). However, these methods require large-scale accurately annotated data. Unfortunately, it is challenging to get precise annotations of breast masses. To address this issue, we propose a fully convolutional network (FCN) based heatmap regression method for breast mass detection, using only weakly annotated mass regions in mammography images. Specifically, we first generate heat maps of masses based on human-annotated rough regions for breast masses. We then develop an FCN model for end-to-end heatmap regression with an F-score loss function, where the mammography images are regarded as the input and heatmaps for breast masses are used as the output. Finally, the probability map of mass locations can be estimated with the trained model. Experimental results on a mammography dataset with 439 subjects demonstrate the effectiveness of our method. Furthermore, we evaluate whether we can use mammography data to improve detection models for DBT, since mammography shares similar structure with tomosynthesis. We propose a transfer learning strategy by fine-tuning the learned FCN model from mammography images. We test this approach on a small tomosynthesis dataset with only 40 subjects, and we show an improvement in the detection performance as compared to training the model from scratch.
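    A rough numpy sketch of a differentiable (soft) F-score loss between a predicted heat map and a weak mass annotation follows; the abstract does not give the paper's exact loss definition, so this is one common formulation.

```python
import numpy as np

def soft_fscore_loss(pred, target, beta=1.0, eps=1e-7):
    """Soft F-beta loss between a predicted heat map in [0, 1] and a target
    heat map; returns 1 - F_beta so that lower is better."""
    tp = np.sum(pred * target)                 # soft true positives
    precision = tp / (np.sum(pred) + eps)
    recall = tp / (np.sum(target) + eps)
    f = (1 + beta**2) * precision * recall / (beta**2 * precision + recall + eps)
    return 1.0 - f

rng = np.random.default_rng(0)
pred = rng.random((256, 256))                  # network output after a sigmoid
target = np.zeros((256, 256))
target[100:140, 90:130] = 1.0                  # heat map from a rough mass region
print(soft_fscore_loss(pred, target))
```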

  13. Instinct as interactive structure: Freud's psychoanalysis in historical and metatheoretical perspective.

    PubMed

    Jovanović, Gordana

    2005-01-01

    When Freud began formulating the basic postulates of psychoanalytic theory the concept of instinct was in widespread use. There are very different models of conceptualizing instincts in psychoanalysis: reflex arc, representation, interaction, subject and finally a regressive structure. Freud revised the traditional concept of instinct and his models formed a peculiar metatheoretical history of psychoanalysis. Defining human nature by reference to its determining instinctive essence and commitment to the ideal of natural science led Freud to a naturalistic fallacy. Yet at the same time the hermeneutics of instinct theory reveal a socio-historical meaning of naturalism.

  14. Dental Care Utilization for Examination and Regional Deprivation

    PubMed Central

    Kim, Cheol-Sin; Han, Sun-Young; Lee, Seung Eun; Kang, Jeong-Hee; Kim, Chul-Woung

    2015-01-01

    Objectives: Receiving proper dental care plays a significant role in maintaining good oral health. We investigated the relationship between regional deprivation and dental care utilization. Methods: Multilevel logistic regression was used to identify the relationship between the regional deprivation level and dental care utilization purpose, adjusting for individual-level variables, in adults aged 19+ in the 2008 Korean Community Health Survey (n=220 258). Results: Among Korean adults, 12.8% used dental care to undergo examination and 21.0% visited a dentist for other reasons. In the final model, regional deprivation level was associated with significant variations in dental care utilization for examination (p<0.001). However, this relationship was not shown with dental care utilization for other reasons in the final model. Conclusions: This study’s findings suggest that policy interventions should be considered to reduce regional variations in rates of dental care utilization for examination. PMID:26265665

  15. Modeling groundwater nitrate concentrations in private wells in Iowa

    USGS Publications Warehouse

    Wheeler, David C.; Nolan, Bernard T.; Flory, Abigail R.; DellaValle, Curt T.; Ward, Mary H.

    2015-01-01

    Contamination of drinking water by nitrate is a growing problem in many agricultural areas of the country. Ingested nitrate can lead to the endogenous formation of N-nitroso compounds, potent carcinogens. We developed a predictive model for nitrate concentrations in private wells in Iowa. Using 34,084 measurements of nitrate in private wells, we trained and tested random forest models to predict log nitrate levels by systematically assessing the predictive performance of 179 variables in 36 thematic groups (well depth, distance to sinkholes, location, land use, soil characteristics, nitrogen inputs, meteorology, and other factors). The final model contained 66 variables in 17 groups. Some of the most important variables were well depth, slope length within 1 km of the well, year of sample, and distance to nearest animal feeding operation. The correlation between observed and estimated nitrate concentrations was excellent in the training set (r-square = 0.77) and was acceptable in the testing set (r-square = 0.38). The random forest model had substantially better predictive performance than a traditional linear regression model or a regression tree. Our model will be used to investigate the association between nitrate levels in drinking water and cancer risk in the Iowa participants of the Agricultural Health Study cohort.
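    A minimal sketch of the modeling step follows, with synthetic data in place of the 34,084 well measurements: a random forest is fitted to log nitrate, scored on held-out data, and ranked predictors are read from the feature importances.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 66))                              # stand-in for the 66 retained predictors
log_no3 = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=5000)    # synthetic log nitrate

X_tr, X_te, y_tr, y_te = train_test_split(X, log_no3, random_state=1)
rf = RandomForestRegressor(n_estimators=500, random_state=1).fit(X_tr, y_tr)

print(rf.score(X_tr, y_tr), rf.score(X_te, y_te))            # train / held-out r-square
top = np.argsort(rf.feature_importances_)[::-1][:5]          # in the study: well depth,
print(top)                                                   # slope length, sample year, ...
```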

  16. Modeling groundwater nitrate concentrations in private wells in Iowa.

    PubMed

    Wheeler, David C; Nolan, Bernard T; Flory, Abigail R; DellaValle, Curt T; Ward, Mary H

    2015-12-01

    Contamination of drinking water by nitrate is a growing problem in many agricultural areas of the country. Ingested nitrate can lead to the endogenous formation of N-nitroso compounds, potent carcinogens. We developed a predictive model for nitrate concentrations in private wells in Iowa. Using 34,084 measurements of nitrate in private wells, we trained and tested random forest models to predict log nitrate levels by systematically assessing the predictive performance of 179 variables in 36 thematic groups (well depth, distance to sinkholes, location, land use, soil characteristics, nitrogen inputs, meteorology, and other factors). The final model contained 66 variables in 17 groups. Some of the most important variables were well depth, slope length within 1 km of the well, year of sample, and distance to nearest animal feeding operation. The correlation between observed and estimated nitrate concentrations was excellent in the training set (r-square=0.77) and was acceptable in the testing set (r-square=0.38). The random forest model had substantially better predictive performance than a traditional linear regression model or a regression tree. Our model will be used to investigate the association between nitrate levels in drinking water and cancer risk in the Iowa participants of the Agricultural Health Study cohort. Copyright © 2015 Elsevier B.V. All rights reserved.

  17. Improving the Spatial Prediction of Soil Organic Carbon Stocks in a Complex Tropical Mountain Landscape by Methodological Specifications in Machine Learning Approaches.

    PubMed

    Ließ, Mareike; Schmidt, Johannes; Glaser, Bruno

    2016-01-01

    Tropical forests are significant carbon sinks and their soils' carbon storage potential is immense. However, little is known about the soil organic carbon (SOC) stocks of tropical mountain areas whose complex soil-landscape and difficult accessibility pose a challenge to spatial analysis. The choice of methodology for spatial prediction is of high importance to improve the expected poor model results in case of low predictor-response correlations. Four aspects were considered to improve model performance in predicting SOC stocks of the organic layer of a tropical mountain forest landscape: different spatial predictor settings, predictor selection strategies, various machine learning algorithms and model tuning. Five machine learning algorithms: random forests, artificial neural networks, multivariate adaptive regression splines, boosted regression trees and support vector machines were trained and tuned to predict SOC stocks from predictors derived from a digital elevation model and satellite image. Topographical predictors were calculated with a GIS search radius of 45 to 615 m. Finally, three predictor selection strategies were applied to the total set of 236 predictors. All machine learning algorithms-including the model tuning and predictor selection-were compared via five repetitions of a tenfold cross-validation. The boosted regression tree algorithm resulted in the overall best model. SOC stocks ranged from 0.2 to 17.7 kg m⁻², displaying a huge variability with diffuse insolation and curvatures of different scale guiding the spatial pattern. Predictor selection and model tuning improved the models' predictive performance in all five machine learning algorithms. The rather low number of selected predictors favours forward compared to backward selection procedures. Choosing predictors based on their individual performance was outperformed by the two procedures that accounted for predictor interaction.
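    A compressed sketch of model tuning under repeated tenfold cross-validation for one of the five algorithms (boosted regression trees, here scikit-learn's gradient boosting) follows; the predictors, grid, and data shapes are illustrative, not the study's.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, RepeatedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))                                # terrain + spectral predictors
soc = np.abs(X[:, 0] + 0.3 * X[:, 5] + rng.normal(size=300))  # synthetic SOC stocks, kg/m^2

cv = RepeatedKFold(n_splits=10, n_repeats=5, random_state=0)  # five repetitions of tenfold CV
grid = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [200, 500],
                "learning_rate": [0.01, 0.1],
                "max_depth": [2, 3]},
    cv=cv, scoring="neg_root_mean_squared_error",
)
grid.fit(X, soc)
print(grid.best_params_, -grid.best_score_)                   # tuned settings and CV RMSE
```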

  18. Population Dynamics of Dactylella oviparasitica and Heterodera schachtii: Toward a Decision Model for Sugar Beet Planting

    PubMed Central

    Yang, Jiue-in; Benecke, Scott; Jeske, Daniel R.; Rocha, Fernando S.; Smith Becker, Jennifer; Timper, Patricia; Ole Becker, J.

    2012-01-01

    A series of experiments were performed to examine the population dynamics of the sugarbeet cyst nematode, Heterodera schachtii, and the nematophagous fungus Dactylella oviparasitica. After two nematode generations, the population densities of H. schachtii were measured in relation to various initial infestation densities of both D. oviparasitica and H. schachtii. In general, higher initial population densities of D. oviparasitica were associated with lower final population densities of H. schachtii. Regression models showed that the initial densities of D. oviparasitica were only significant when predicting the final densities of H. schachtii J2 and eggs as well as fungal egg parasitism, while the initial densities of J2 were significant for all final H. schachtii population density measurements. We also showed that the densities of H. schachtii-associated D. oviparasitica fluctuate greatly, with rRNA gene numbers going from zero in most field-soil-collected cysts to an average of 4.24 × 10⁸ in mature females isolated directly from root surfaces. Finally, phylogenetic analysis of rRNA genes suggested that D. oviparasitica belongs to a clade of nematophagous fungi that includes Arkansas Fungus strain L (ARF-L) and that these fungi are widely distributed. We anticipate that these findings will provide foundational data facilitating the development of more effective decision models for sugar beet planting. PMID:23481664

  19. A Model to Guide Development of Environmental Final Governing Standards for Overseas United States Department of Defense Installations

    DTIC Science & Technology

    2014-03-28

    four sub-sections were included into "System" because none of them address limits of contaminants or chemicals in the water. The Hazardous ... maximum contaminant levels (MCL) of chemicals, stricter emission standards, stricter control limits, greater minimum separation distances, prohibited ... [table fragment of country classifications and scores: Indonesia, Strugglers, 52.29, -0.40; Malaysia, Progressives, 62.51, 0.34; Mongolia, Regressives, 45.37, -0.21; Myanmar, Strugglers, 52.72, -1.09; Nepal ...]

  20. Essays in energy economics: The electricity industry

    NASA Astrophysics Data System (ADS)

    Martinez-Chombo, Eduardo

    Electricity demand analysis using cointegration and error-correction models with time varying parameters: The Mexican case. In this essay we show how some flexibility can be allowed in modeling the parameters of the electricity demand function by employing the time varying coefficient (TVC) cointegrating model developed by Park and Hahn (1999). With the income elasticity of electricity demand modeled as a TVC, we perform tests to examine the adequacy of the proposed model against the cointegrating regression with fixed coefficients, as well as against the spuriousness of the regression with TVC. The results reject the specification of the model with fixed coefficients and favor the proposed model. We also show how some flexibility is gained in the specification of the error correction model based on the proposed TVC cointegrating model, by including more lags of the error correction term as predetermined variables. Finally, we present the results of some out-of-sample forecast comparison among competing models. Electricity demand and supply in Mexico. In this essay we present a simplified model of the Mexican electricity transmission network. We use the model to approximate the marginal cost of supplying electricity to consumers in different locations and at different times of the year. We examine how costs and system operations will be affected by proposed investments in generation and transmission capacity given a forecast of growth in regional electricity demands. Decomposing electricity prices with jumps. In this essay we propose a model that decomposes electricity prices into two independent stochastic processes: one that represents the "normal" pattern of electricity prices and the other that captures temporary shocks, or "jumps", with non-lasting effects in the market. Each contains specific mean reverting parameters to estimate. In order to identify such components we specify a state-space model with regime switching. Using Kim's (1994) filtering algorithm we estimate the parameters of the model, the transition probabilities and the unobservable components for the mean adjusted series of New South Wales' electricity prices. Finally, bootstrap simulations were performed to estimate the expected contribution of each of the components in the overall electricity prices.

  1. Predicting introductory programming performance: A multi-institutional multivariate study

    NASA Astrophysics Data System (ADS)

    Bergin, Susan; Reilly, Ronan

    2006-12-01

    A model for predicting student performance on introductory programming modules is presented. The model uses attributes identified in a study carried out at four third-level institutions in the Republic of Ireland. Four instruments were used to collect the data and over 25 attributes were examined. A data reduction technique was applied and a logistic regression model using 10-fold stratified cross validation was developed. The model used three attributes: Leaving Certificate Mathematics result (final mathematics examination at second level), number of hours playing computer games while taking the module and programming self-esteem. Prediction success was significant with 80% of students correctly classified. The model also works well on a per-institution level. A discussion on the implications of the model is provided and future work is outlined.
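    A minimal sketch of a logistic regression evaluated with 10-fold stratified cross-validation on the three retained attributes follows; the data below are synthetic stand-ins for the study's instruments.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
# Three predictors: mathematics result, weekly gaming hours, programming self-esteem
X = np.column_stack([rng.integers(1, 7, 120),
                     rng.integers(0, 30, 120),
                     rng.normal(size=120)])
passed = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=120) > 3.5).astype(int)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
acc = cross_val_score(LogisticRegression(max_iter=1000), X, passed, cv=cv)
print(acc.mean())   # the paper reports ~80% of students correctly classified
```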

  2. Determination of biodiesel content in biodiesel/diesel blends using NIR and visible spectroscopy with variable selection.

    PubMed

    Fernandes, David Douglas Sousa; Gomes, Adriano A; Costa, Gean Bezerra da; Silva, Gildo William B da; Véras, Germano

    2011-12-15

    This work evaluates the use of the visible and near-infrared (NIR) ranges, separately and combined, to determine the biodiesel content in biodiesel/diesel blends using multiple linear regression (MLR) with variable selection by the successive projections algorithm (SPA). Full-spectrum models employing partial least squares (PLS), stepwise (SW) regression coupled with MLR, and PLS with variable selection by jack-knifing (Jk) were compared with the proposed methodology. Several preprocessing methods were evaluated; a Savitzky-Golay derivative with a second-order polynomial and a 17-point window was chosen for the NIR and visible-NIR ranges, with offset correction. A total of 100 blends with biodiesel content between 5 and 50% (v/v) were prepared from ten samples of biodiesel. In the NIR and visible regions the best models were obtained by SPA-MLR, using only two and eight wavelengths with RMSEPs of 0.6439% (v/v) and 0.5741% (v/v), respectively, while in the visible-NIR region the best model was obtained by SW-MLR, using five wavelengths with an RMSEP of 0.9533% (v/v). The results indicate that both spectral ranges show potential for developing a rapid and nondestructive method to quantify biodiesel in blends with mineral diesel. Finally, the improvement in prediction error obtained with the variable selection procedure was significant. Copyright © 2011 Elsevier B.V. All rights reserved.
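    The preprocessing step can be sketched as follows: a first-derivative Savitzky-Golay filter with a second-order polynomial and 17-point window, followed by MLR on a few wavelengths (chosen arbitrarily here; SPA or stepwise selection would supply them in practice). All data are synthetic.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
spectra = rng.normal(size=(100, 800))      # synthetic NIR absorbances for 100 blends
biodiesel = rng.uniform(5, 50, 100)        # biodiesel content, % (v/v)

# First-derivative Savitzky-Golay: 17-point window, second-order polynomial
d1 = savgol_filter(spectra, window_length=17, polyorder=2, deriv=1, axis=1)

# MLR on a handful of wavelengths (placeholders for SPA/SW-selected variables)
selected = [120, 340, 555]
mlr = LinearRegression().fit(d1[:, selected], biodiesel)
rmsep = np.sqrt(np.mean((mlr.predict(d1[:, selected]) - biodiesel) ** 2))
print(round(rmsep, 4))
```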

  3. Explicit criteria for prioritization of cataract surgery

    PubMed Central

    Ma Quintana, José; Escobar, Antonio; Bilbao, Amaia

    2006-01-01

    Background Consensus techniques have been used previously to create explicit criteria to prioritize cataract extraction; however, the appropriateness of the intervention was not included explicitly in previous studies. We developed a prioritization tool for cataract extraction according to the RAND method. Methods Criteria were developed using a modified Delphi panel judgment process. A panel of 11 ophthalmologists was assembled. Ratings were analyzed regarding the level of agreement among panelists. We studied the effect of all variables on the final panel score using general linear and logistic regression models. Priority scoring systems were developed by means of optimal scaling and general linear models. The explicit criteria developed were summarized by means of regression tree analysis. Results Eight variables were considered to create the indications. Of the 310 indications that the panel evaluated, 22.6% were considered high priority, 52.3% intermediate priority, and 25.2% low priority. Agreement was reached for 31.9% of the indications and disagreement for 0.3%. Logistic regression and general linear models showed that the preoperative visual acuity of the cataractous eye, visual function, and anticipated visual acuity postoperatively were the most influential variables. Alternative and simple scoring systems were obtained by optimal scaling and general linear models where the previous variables were also the most important. The decision tree also shows the importance of the previous variables and the appropriateness of the intervention. Conclusion Our results showed acceptable validity as an evaluation and management tool for prioritizing cataract extraction. It also provides easy algorithms for use in clinical practice. PMID:16512893

  4. Predicting the potential distribution of invasive exotic species using GIS and information-theoretic approaches: A case of ragweed (Ambrosia artemisiifolia L.) distribution in China

    USGS Publications Warehouse

    Hao, Chen; LiJun, Chen; Albright, Thomas P.

    2007-01-01

    Invasive exotic species pose a growing threat to the economy, public health, and ecological integrity of nations worldwide. Explaining and predicting the spatial distribution of invasive exotic species is of great importance to prevention and early-warning efforts. We are investigating the potential distribution of invasive exotic species, the environmental factors that influence these distributions, and the ability to predict them using statistical and information-theoretic approaches. For some species, detailed presence/absence occurrence data are available, allowing the use of a variety of standard statistical techniques. However, for most species, absence data are not available. Presented with the challenge of developing a model based on presence-only information, we developed an improved logistic regression approach using information theory and frequency statistics to produce a relative suitability map. In this paper we generated a variety of distributions of ragweed (Ambrosia artemisiifolia L.) from logistic regression models applied to herbarium specimen location data and a suite of GIS layers including climatic, topographic, and land cover information. Our logistic regression model was selected using Akaike's Information Criterion (AIC) from a suite of ecologically reasonable predictor variables. Based on the results, we provide a new frequency-statistics method to compartmentalize habitat suitability in the native range. Finally, we used the model and the compartmentalized criterion developed in the native range to "project" a potential distribution onto the exotic range and build habitat-suitability maps. © Science in China Press 2007.

  5. Error analysis of leaf area estimates made from allometric regression models

    NASA Technical Reports Server (NTRS)

    Feiveson, A. H.; Chhikara, R. S.

    1986-01-01

    Biological net productivity, measured in terms of the change in biomass with time, affects global productivity and the quality of life through biochemical and hydrological cycles and by its effect on the overall energy balance. Estimating leaf area for large ecosystems is one of the more important means of monitoring this productivity. For a particular forest plot, the leaf area is often estimated by a two-stage process. In the first stage, known as dimension analysis, a small number of trees are felled so that their areas can be measured as accurately as possible. These leaf areas are then related to non-destructive, easily-measured features such as bole diameter and tree height, by using a regression model. In the second stage, the non-destructive features are measured for all or for a sample of trees in the plots and then used as input into the regression model to estimate the total leaf area. Because both stages of the estimation process are subject to error, it is difficult to evaluate the accuracy of the final plot leaf area estimates. This paper illustrates how a complete error analysis can be made, using an example from a study made on aspen trees in northern Minnesota. The study was a joint effort by NASA and the University of California at Santa Barbara known as COVER (Characterization of Vegetation with Remote Sensing).

  6. Improving the Spatial Prediction of Soil Organic Carbon Stocks in a Complex Tropical Mountain Landscape by Methodological Specifications in Machine Learning Approaches

    PubMed Central

    Schmidt, Johannes; Glaser, Bruno

    2016-01-01

    Tropical forests are significant carbon sinks and their soils’ carbon storage potential is immense. However, little is known about the soil organic carbon (SOC) stocks of tropical mountain areas whose complex soil-landscape and difficult accessibility pose a challenge to spatial analysis. The choice of methodology for spatial prediction is of high importance to improve the expected poor model results in case of low predictor-response correlations. Four aspects were considered to improve model performance in predicting SOC stocks of the organic layer of a tropical mountain forest landscape: different spatial predictor settings, predictor selection strategies, various machine learning algorithms and model tuning. Five machine learning algorithms: random forests, artificial neural networks, multivariate adaptive regression splines, boosted regression trees and support vector machines were trained and tuned to predict SOC stocks from predictors derived from a digital elevation model and satellite image. Topographical predictors were calculated with a GIS search radius of 45 to 615 m. Finally, three predictor selection strategies were applied to the total set of 236 predictors. All machine learning algorithms—including the model tuning and predictor selection—were compared via five repetitions of a tenfold cross-validation. The boosted regression tree algorithm resulted in the overall best model. SOC stocks ranged from 0.2 to 17.7 kg m⁻², displaying a huge variability with diffuse insolation and curvatures of different scale guiding the spatial pattern. Predictor selection and model tuning improved the models’ predictive performance in all five machine learning algorithms. The rather low number of selected predictors favours forward compared to backward selection procedures. Choosing predictors based on their individual performance was outperformed by the two procedures that accounted for predictor interaction. PMID:27128736

  7. Development and Validation of an Empiric Tool to Predict Favorable Neurologic Outcomes Among PICU Patients.

    PubMed

    Gupta, Punkaj; Rettiganti, Mallikarjuna; Gossett, Jeffrey M; Daufeldt, Jennifer; Rice, Tom B; Wetzel, Randall C

    2018-01-01

    To create a novel tool to predict favorable neurologic outcomes during ICU stay among children with critical illness. Logistic regression models using adaptive lasso methodology were used to identify independent factors associated with favorable neurologic outcomes. A mixed effects logistic regression model was used to create the final prediction model including all predictors selected from the lasso model. Model validation was performed using a 10-fold internal cross-validation approach. Virtual Pediatric Systems (VPS, LLC, Los Angeles, CA) database. Patients less than 18 years old admitted to one of the participating ICUs in the Virtual Pediatric Systems database were included (2009-2015). None. A total of 160,570 patients from 90 hospitals qualified for inclusion. Of these, 1,675 patients (1.04%) had a decline in the Pediatric Cerebral Performance Category scale of at least 2 between ICU admission and ICU discharge (unfavorable neurologic outcome). The independent factors associated with unfavorable neurologic outcome included higher weight at ICU admission, higher Pediatric Index of Mortality-2 score at ICU admission, cardiac arrest, stroke, seizures, head/nonhead trauma, use of conventional mechanical ventilation and high-frequency oscillatory ventilation, prolonged ICU length of stay, and prolonged use of mechanical ventilation. The presence of chromosomal anomaly, cardiac surgery, and utilization of nitric oxide were associated with favorable neurologic outcome. The final online prediction tool can be accessed at https://soipredictiontool.shinyapps.io/GNOScore/. In the internal validation sample, our model predicted 139,688 patients with favorable neurologic outcomes, against an observed 139,591 such patients. The area under the receiver operating curve for the validation model was 0.90. This proposed prediction tool encompasses 20 risk factors in one probability to predict favorable neurologic outcome during ICU stay among children with critical illness. Future studies should seek external validation and improved discrimination of this prediction tool.
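
    A hedged sketch of the selection-then-refit idea: the paper's adaptive lasso and mixed-effects stages are approximated here by a plain L1-penalized logistic regression followed by an unpenalized refit on the selected columns (synthetic data, illustrative settings).

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression

      # Synthetic, heavily imbalanced outcome, mimicking a rare unfavorable event.
      X, y = make_classification(n_samples=5000, n_features=40, n_informative=8,
                                 weights=[0.99, 0.01], random_state=0)

      # L1-penalized selection step (stands in for the adaptive lasso).
      lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X, y)
      selected = np.flatnonzero(lasso.coef_[0])      # predictors kept by the lasso
      print("selected predictor indices:", selected)

      # Unpenalized final model on the selected predictors.
      final = LogisticRegression(max_iter=1000).fit(X[:, selected], y)
      print("final-model coefficients:", np.round(final.coef_[0], 3))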

  8. Growth status of Korean orphans raised in the affluent West: anthropometric trend, multivariate determinants, and descriptive comparison with their North and South Korean peers.

    PubMed

    Schwekendiek, Daniel J

    2017-04-01

    This paper investigates the trend in height among adult Korean orphans who were adopted in early life into affluent Western nations. Final heights of 148 females were analyzed based on a Korean government survey conducted in 2008. Height of the orphans was descriptively compared against final heights of South and North Koreans. Furthermore, statistical determinants of orphan height were investigated in multivariate regressions. Mean height of Korean orphans was 160.44 cm (SD 5.89), which was higher than that of South Koreans at 158.83 cm (SD 5.01). Both Korean orphans and South Koreans were taller than North Koreans at 155.30 cm (SD 4.94). However, height of Korean orphans stagnated at around 160-161 cm while those of North and South Koreans improved over time. In the regression analysis, the socioeconomic status of the adoptive family was statistically significant in all models, while dummies for the adoptive nations and age at adoption were insignificant. This study shows that the mean final height of women experiencing extreme environmental improvements in early life is capped at 160-161 cm, tentatively suggesting that social stress factors in the host nation or early-life factors in the birth nation might have offset some of the environmental enrichment effects achieved through intercountry adoption.

  9. Oxidative desulfurization: kinetic modelling.

    PubMed

    Dhir, S; Uppaluri, R; Purkait, M K

    2009-01-30

    Increasing environmental legislation, coupled with enhanced production of petroleum products, demands the deployment of novel technologies to remove organic sulfur efficiently. This work presents the kinetic modelling of oxidative desulfurization (ODS) using H2O2 over a tungsten-containing layered double hydroxide (LDH), using the experimental data provided by Hulea et al. [V. Hulea, A.L. Maciuca, F. Fajula, E. Dumitriu, Catalytic oxidation of thiophenes and thioethers with hydrogen peroxide in the presence of W-containing layered double hydroxides, Appl. Catal. A: Gen. 313 (2) (2006) 200-207]. The kinetic modelling approach first generates a superstructure of micro-kinetic reaction schemes and models assuming Langmuir-Hinshelwood (LH) and Eley-Rideal (ER) mechanisms. Subsequently, the models are screened by profile-based elimination of incompetent schemes, followed by a non-linear regression search performed with the Levenberg-Marquardt algorithm (LMA) for the remaining candidates. This analysis inferred that the Eley-Rideal mechanism describes the kinetic behavior of the ODS process using tungsten-containing LDH, with adsorption of the reactant and the intermediate product only taking place on the catalyst surface. Finally, an economic index is presented that assesses the economic aspects of the novel catalytic technology using the parameters obtained during regression analysis, concluding that the cost factor for the catalyst is 0.0062-0.04759 US$ per barrel.
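
    A sketch of the regression step under stated assumptions: a generic Eley-Rideal rate law (oxidant adsorbed on the surface, sulfide reacting from solution) fitted with the Levenberg-Marquardt algorithm via scipy; the rate form, parameter values and "data" below are illustrative, not taken from the paper.

      import numpy as np
      from scipy.optimize import least_squares

      # Hypothetical ER rate law: r = k*K*Cs*Co / (1 + K*Co).
      def rate(p, Cs, Co):
          k, K = p
          return k * K * Cs * Co / (1.0 + K * Co)

      rng = np.random.default_rng(1)
      Cs = rng.uniform(0.01, 0.10, 25)      # sulfur compound, mol/L (synthetic)
      Co = rng.uniform(0.05, 0.50, 25)      # H2O2, mol/L (synthetic)
      r_obs = rate((2.0, 8.0), Cs, Co) * (1 + 0.03 * rng.standard_normal(25))

      # Levenberg-Marquardt non-linear least squares on the residuals.
      fit = least_squares(lambda p: rate(p, Cs, Co) - r_obs,
                          x0=[1.0, 1.0], method="lm")
      print("k, K estimates:", np.round(fit.x, 3))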

  10. Global Land Use Regression Model for Nitrogen Dioxide Air Pollution.

    PubMed

    Larkin, Andrew; Geddes, Jeffrey A; Martin, Randall V; Xiao, Qingyang; Liu, Yang; Marshall, Julian D; Brauer, Michael; Hystad, Perry

    2017-06-20

    Nitrogen dioxide is a common air pollutant with growing evidence of health impacts independent of other common pollutants such as ozone and particulate matter. However, the worldwide distribution of NO2 exposure and associated impacts on health is still largely uncertain. To advance global exposure estimates we created a global nitrogen dioxide (NO2) land use regression model for 2011 using annual measurements from 5,220 air monitors in 58 countries. The model captured 54% of global NO2 variation, with a mean absolute error of 3.7 ppb. Regional performance varied from R2 = 0.42 (Africa) to R2 = 0.67 (South America). Repeated 10% cross-validation using bootstrap sampling (n = 10,000) demonstrated a robust performance with respect to air monitor sampling in North America, Europe, and Asia (adjusted R2 within 2%) but not for Africa and Oceania (adjusted R2 within 11%) where NO2 monitoring data are sparse. The final model included 10 variables that captured both between- and within-city spatial gradients in NO2 concentrations. Variable contributions differed between continental regions, but major roads within 100 m and satellite-derived NO2 were consistently the strongest predictors. The resulting model can be used for global risk assessments and health studies, particularly in countries without existing NO2 monitoring data or models.
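
    A minimal land use regression sketch in Python; the monitor values and predictor names (major_road_100m, satellite_no2, impervious_1km) are hypothetical stand-ins for the published variables.

      import numpy as np
      import pandas as pd
      from sklearn.linear_model import LinearRegression

      # Hypothetical monitor table: annual NO2 (ppb) plus land-use predictors.
      df = pd.DataFrame({
          "no2_ppb":         [22.1, 8.4, 15.3, 30.2, 5.1, 18.7],
          "major_road_100m": [1.20, 0.00, 0.45, 2.10, 0.00, 0.80],  # km within 100 m
          "satellite_no2":   [4.1, 1.2, 2.8, 5.6, 0.9, 3.3],        # column density
          "impervious_1km":  [0.71, 0.12, 0.44, 0.88, 0.05, 0.59],
      })
      X, y = df.drop(columns="no2_ppb"), df["no2_ppb"]
      lur = LinearRegression().fit(X, y)

      print("in-sample R2 =", round(lur.score(X, y), 2))
      print(dict(zip(X.columns, np.round(lur.coef_, 2))))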

  11. An improved advertising CTR prediction approach based on the fuzzy deep neural network

    PubMed Central

    Gao, Shu; Li, Mingjiang

    2018-01-01

    Combining a deep neural network with fuzzy theory, this paper proposes an advertising click-through rate (CTR) prediction approach based on a fuzzy deep neural network (FDNN). In this approach, a fuzzy Gaussian-Bernoulli restricted Boltzmann machine (FGBRBM) is first applied to the raw input data from advertising datasets. Next, fuzzy restricted Boltzmann machines (FRBMs) are used to construct the fuzzy deep belief network (FDBN) layer by layer in an unsupervised manner. Finally, fuzzy logistic regression (FLR) is utilized to model the CTR. The experimental results show that the proposed FDNN model outperforms several baseline models in terms of both data representation capability and robustness on advertising click log datasets with noise. PMID:29727443

  12. An improved advertising CTR prediction approach based on the fuzzy deep neural network.

    PubMed

    Jiang, Zilong; Gao, Shu; Li, Mingjiang

    2018-01-01

    Combining a deep neural network with fuzzy theory, this paper proposes an advertising click-through rate (CTR) prediction approach based on a fuzzy deep neural network (FDNN). In this approach, a fuzzy Gaussian-Bernoulli restricted Boltzmann machine (FGBRBM) is first applied to the raw input data from advertising datasets. Next, fuzzy restricted Boltzmann machines (FRBMs) are used to construct the fuzzy deep belief network (FDBN) layer by layer in an unsupervised manner. Finally, fuzzy logistic regression (FLR) is utilized to model the CTR. The experimental results show that the proposed FDNN model outperforms several baseline models in terms of both data representation capability and robustness on advertising click log datasets with noise.
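
    A classical, non-fuzzy analogue of the stack described above: scikit-learn's BernoulliRBM layers trained greedily inside a Pipeline, with a logistic regression head. The fuzzy parametrization is omitted, and the digits data set stands in for click logs.

      from sklearn.datasets import load_digits
      from sklearn.linear_model import LogisticRegression
      from sklearn.neural_network import BernoulliRBM
      from sklearn.pipeline import Pipeline
      from sklearn.preprocessing import MinMaxScaler

      X, y = load_digits(return_X_y=True)    # stand-in for click-log features

      # Two RBMs learned layer by layer (unsupervised), then logistic regression.
      model = Pipeline([
          ("scale", MinMaxScaler()),         # RBMs expect inputs in [0, 1]
          ("rbm1", BernoulliRBM(n_components=128, learning_rate=0.05, random_state=0)),
          ("rbm2", BernoulliRBM(n_components=64, learning_rate=0.05, random_state=0)),
          ("clf", LogisticRegression(max_iter=1000)),
      ])
      print("training accuracy:", round(model.fit(X, y).score(X, y), 3))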

  13. Decision tree modeling using R.

    PubMed

    Zhang, Zhongheng

    2016-08-01

    In the machine learning field, the decision tree learner is powerful and easy to interpret. It employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests are random sampling and the restricted set of input variables available for selection at each split. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
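
    The article works in R; a rough Python analogue of the same workflow (one recursive-partitioning tree, then a random forest to stabilise it) could look like this, on an illustrative data set.

      from sklearn.datasets import load_diabetes
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.model_selection import train_test_split
      from sklearn.tree import DecisionTreeRegressor

      X, y = load_diabetes(return_X_y=True)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      # A single recursive-partitioning tree, interpretable but unstable.
      tree = DecisionTreeRegressor(min_samples_leaf=10, random_state=0).fit(X_tr, y_tr)
      # A random forest: bootstrap sampling plus random feature subsets per split.
      forest = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)

      print("single tree R2  :", round(tree.score(X_te, y_te), 3))
      print("random forest R2:", round(forest.score(X_te, y_te), 3))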

  14. Is Early Prescribing of Opioid and Psychotropic Medications Associated With Delayed Return to Work and Increased Final Workers' Compensation Cost?

    PubMed

    Tao, Xuguang Grant; Lavin, Robert A; Yuspeh, Larry; Weaver, Virginia M; Bernacki, Edward J

    2015-12-01

    To explore the association between the initial 60 days of prescriptions for psychotropic medications and final workers' compensation claim outcomes. A cohort of 11,394 claimants involved in lost-time injuries between 1999 and 2002 was followed through December 31, 2009. Logistic regressions and Cox proportional hazards models were used in the analysis. Prescriptions for psychotropic medications within the initial 60 days were significantly associated with a final claim cost of at least $100,000; odds ratios were 1.88 for short-acting opioids, 2.14 for hypnotics, antianxiety agents, or antidepressants, and 3.91 for long-acting opioids. Significant associations were also found with delayed return to work and decreased claim closure during the study period. Early prescription of opioids and other psychotropic drugs may be a useful predictor of high claim costs and time lost from work.
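
    A hedged sketch of the survival part of such an analysis using the lifelines library: a Cox proportional hazards model of time to claim closure with early-prescription indicators. All column names and values are invented.

      import pandas as pd
      from lifelines import CoxPHFitter

      # Hypothetical claims table: follow-up days, claim-closure event flag,
      # and early-prescription indicators.
      df = pd.DataFrame({
          "days_open":       [120, 850, 400, 60, 1500, 300, 90, 700],
          "closed":          [1, 1, 0, 1, 1, 1, 1, 0],
          "early_opioid_la": [0, 1, 0, 0, 1, 0, 0, 1],   # long-acting opioid
          "early_hypnotic":  [0, 1, 1, 0, 0, 1, 0, 0],
      })
      cph = CoxPHFitter().fit(df, duration_col="days_open", event_col="closed")
      cph.print_summary()   # hazard ratios for each early-prescription flag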

  15. A New Z Score Curve of the Coronary Arterial Internal Diameter Using the Lambda-Mu-Sigma Method in a Pediatric Population.

    PubMed

    Kobayashi, Tohru; Fuse, Shigeto; Sakamoto, Naoko; Mikami, Masashi; Ogawa, Shunichi; Hamaoka, Kenji; Arakaki, Yoshio; Nakamura, Tsuneyuki; Nagasawa, Hiroyuki; Kato, Taichi; Jibiki, Toshiaki; Iwashima, Satoru; Yamakawa, Masaru; Ohkubo, Takashi; Shimoyama, Shinya; Aso, Kentaro; Sato, Seiichi; Saji, Tsutomu

    2016-08-01

    Several coronary artery Z score models have been developed. However, a Z score model derived by the lambda-mu-sigma (LMS) method has not been established. Echocardiographic measurements of the proximal right coronary artery, left main coronary artery, proximal left anterior descending coronary artery, and proximal left circumflex artery were prospectively collected in 3,851 healthy children ≤18 years of age and divided into developmental and validation data sets. In the developmental data set, smooth curves were fitted for each coronary artery using linear, logarithmic, square-root, and LMS methods for both sexes. The relative goodness of fit of these models was compared using the Bayesian information criterion. The best-fitting model was tested for reproducibility using the validation data set. The goodness of fit of the selected model was visually compared with that of previously reported regression models using a Q-Q plot. Because the internal diameters of the coronary arteries differed between sexes, sex-specific Z score models were developed. The LMS model with body surface area as the independent variable showed the best goodness of fit; therefore, the internal diameter of each coronary artery was transformed into a sex-specific Z score on the basis of body surface area using the LMS method. In the validation data set, a Q-Q plot of each model indicated that the distribution of Z scores in the LMS models was closer to the normal distribution than in previously reported regression models. Finally, the definitive models for each coronary artery in both sexes were developed using the combined developmental and validation data sets. A Microsoft Excel-based Z score calculator was also created, which is freely available online (http://raise.umin.jp/zsp/calculator/). Novel LMS models with which to estimate the sex-specific Z score of each internal coronary artery diameter were generated and validated using a large pediatric population. Copyright © 2016 American Society of Echocardiography. Published by Elsevier Inc. All rights reserved.
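
    For reference, the standard LMS transform behind such Z scores is Z = ((y/M)^L - 1)/(L*S), with the log-normal limit ln(y/M)/S as L approaches 0. A small self-contained sketch; the parameter values below are illustrative, not the published ones.

      import math

      def lms_z(y, L, M, S):
          """Z score of measurement y given LMS parameters L, M (median), S."""
          if abs(L) < 1e-7:                      # log-normal limit as L -> 0
              return math.log(y / M) / S
          return ((y / M) ** L - 1.0) / (L * S)

      # Illustrative (not published) parameters for one artery at a given BSA:
      print(round(lms_z(y=2.8, L=0.4, M=2.3, S=0.12), 2))  # diameter in mm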

  16. Analysis of the relationship between community characteristics and depression using geographically weighted regression.

    PubMed

    Choi, Hyungyun; Kim, Ho

    2017-01-01

    Achieving national health equity is currently a pressing issue, and large regional variations in health determinants are observed. Depression, one of the most common mental disorders, shows large variations in incidence among different populations and thus must be analyzed regionally. The present study aimed at analyzing regional disparities in depressive symptoms and identifying the health determinants that require regional interventions. Using health indicators of depression in the Korea Community Health Survey 2011 and 2013, Moran's I was calculated for each variable to assess spatial autocorrelation, and geographically weighted regression models were built with ArcGIS version 10.1 for three domains: health behavior, morbidity, and the social and physical environment; the final model combined the significant variables from these domain models. In the health behavior domain, a weekly breakfast intake frequency of 1-2 times was the most significantly correlated with depression in all regions, followed by exposure to secondhand smoke and the level of perceived stress in some regions. In the morbidity domain, the rate of lifetime diagnosis of myocardial infarction was the most significantly correlated with depression. In the social and physical environment domain, the trust environment within the local community was highly correlated with depression: the lower the level of trust, the higher the level of depression. A final model was constructed and analyzed using the most influential variables from each domain. The models were divided into two groups according to the significance of each variable's correlation with the experience of depressive symptoms. The indicators of regional health status are significantly associated with the incidence of depressive symptoms within a region, and the significance of this correlation varied across regions.
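
    A hedged open-source sketch of the spatial-autocorrelation step, using PySAL in place of the ArcGIS workflow used in the study; the shapefile and column names are hypothetical.

      import geopandas as gpd
      from esda.moran import Moran
      from libpysal.weights import Queen

      # Global Moran's I for a depression-rate column in a polygon layer.
      gdf = gpd.read_file("districts.shp")    # hypothetical file and column names
      w = Queen.from_dataframe(gdf)           # contiguity-based spatial weights
      w.transform = "r"                       # row-standardise the weights

      mi = Moran(gdf["depression_rate"], w)
      print(f"Moran's I = {mi.I:.3f}, pseudo p = {mi.p_sim:.3f}")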

  17. Ensemble predictive model for more accurate soil organic carbon spectroscopic estimation

    NASA Astrophysics Data System (ADS)

    Vašát, Radim; Kodešová, Radka; Borůvka, Luboš

    2017-07-01

    A myriad of signal pre-processing strategies and multivariate calibration techniques have been explored in an attempt to improve the spectroscopic prediction of soil organic carbon (SOC) over the last few decades. Coming up with a novel, more powerful and accurate predictive approach is therefore a challenging task. One possibility, following ensemble learning theory, is to combine several individual predictions into a single final one. Because this approach performs best when combining inherently different predictive algorithms calibrated with structurally different predictor variables, we tested predictors of two different kinds: 1) reflectance values (or transforms) at each wavelength and 2) absorption feature parameters. Consequently, we applied four different calibration techniques, two per type of predictor: a) partial least squares regression and support vector machines for type 1, and b) multiple linear regression and random forest for type 2. The weights assigned to the individual predictions within the ensemble model (constructed as a weighted average) were determined by an automated procedure that ensured the best solution among all possible was selected. The approach was tested on soil samples taken from the surface horizons of four sites differing in the prevailing soil units. By employing the ensemble predictive model, the prediction accuracy of SOC improved at all four sites. The coefficient of determination in cross-validation (R2cv) increased from 0.849, 0.611, 0.811 and 0.644 (the best individual predictions) to 0.864, 0.650, 0.824 and 0.698 for Sites 1, 2, 3 and 4, respectively. Generally, the ensemble model reduced the largest deviations of predicted versus observed values among the individual predictions, so the correlation cloud became thinner, as desired.
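
    A toy version of the weighted-average ensemble: four individual predictions are combined with weights scanned over a coarse simplex grid, keeping the combination with the best R2. All numbers are invented.

      import itertools
      import numpy as np
      from sklearn.metrics import r2_score

      y_true = np.array([1.2, 0.8, 2.5, 3.1, 0.4, 1.9])
      preds = np.array([
          [1.0, 0.9, 2.2, 3.4, 0.6, 1.7],   # PLSR on reflectance spectra
          [1.4, 0.7, 2.6, 2.8, 0.3, 2.1],   # SVM on reflectance spectra
          [1.1, 1.0, 2.4, 3.0, 0.5, 1.8],   # MLR on absorption features
          [1.3, 0.6, 2.7, 3.3, 0.4, 2.0],   # RF on absorption features
      ])

      def score(w):
          # Weights are integers in twentieths, so they sum exactly to 1.
          return r2_score(y_true, (np.asarray(w) / 20.0) @ preds)

      simplex = [w for w in itertools.product(range(21), repeat=4) if sum(w) == 20]
      best = max(simplex, key=score)
      print("weights:", tuple(w / 20 for w in best), "R2:", round(score(best), 3))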

  18. Q-learning residual analysis: application to the effectiveness of sequences of antipsychotic medications for patients with schizophrenia.

    PubMed

    Ertefaie, Ashkan; Shortreed, Susan; Chakraborty, Bibhas

    2016-06-15

    Q-learning is a regression-based approach that uses longitudinal data to construct dynamic treatment regimes, which are sequences of decision rules that use patient information to inform future treatment decisions. An optimal dynamic treatment regime is composed of a sequence of decision rules that indicate how to optimally individualize treatment using the patients' baseline and time-varying characteristics to optimize the final outcome. Constructing optimal dynamic regimes using Q-learning depends heavily on the assumption that regression models at each decision point are correctly specified; yet model checking in the context of Q-learning has been largely overlooked in the current literature. In this article, we show that residual plots obtained from standard Q-learning models may fail to adequately check the quality of the model fit. We present a modified Q-learning procedure that accommodates residual analyses using standard tools. We present simulation studies showing the advantage of the proposed modification over standard Q-learning. We illustrate this new Q-learning approach using data collected from a sequential multiple assignment randomized trial of patients with schizophrenia. Copyright © 2016 John Wiley & Sons, Ltd.
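
    A minimal sketch of two-stage Q-learning with linear Q-functions and backward induction, on synthetic data (not the trial's analysis): the stage-2 fit yields pseudo-outcomes that the stage-1 regression then targets.

      import numpy as np
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(0)
      n = 2000
      x1 = rng.standard_normal(n)                  # baseline covariate
      a1 = rng.integers(0, 2, n)                   # stage-1 treatment
      x2 = 0.5 * x1 + rng.standard_normal(n)       # time-varying covariate
      a2 = rng.integers(0, 2, n)                   # stage-2 treatment
      y = x1 + a1 * (1 - x1) + a2 * (x2 - 0.5) + rng.standard_normal(n)

      # Stage 2: regress the final outcome on history and stage-2 treatment.
      H2 = np.column_stack([x1, a1, x2, a2, a2 * x2])
      q2 = LinearRegression().fit(H2, y)

      # Pseudo-outcome: predicted value under the best stage-2 action.
      def q2_value(a):
          return q2.predict(np.column_stack([x1, a1, x2, a, a * x2]))
      y_tilde = np.maximum(q2_value(np.zeros(n)), q2_value(np.ones(n)))

      # Stage 1: regress the pseudo-outcome on baseline history and treatment.
      H1 = np.column_stack([x1, a1, a1 * x1])
      q1 = LinearRegression().fit(H1, y_tilde)
      print("stage-1 coefficients:", np.round(q1.coef_, 2))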

  19. Robust Joint Graph Sparse Coding for Unsupervised Spectral Feature Selection.

    PubMed

    Zhu, Xiaofeng; Li, Xuelong; Zhang, Shichao; Ju, Chunhua; Wu, Xindong

    2017-06-01

    In this paper, we propose a new unsupervised spectral feature selection model by embedding a graph regularizer into the framework of joint sparse regression to preserve the local structure of the data. To do this, we first extract the bases of the training data using previous dictionary learning methods and then map the original data into the basis space to generate new representations, by proposing a novel joint graph sparse coding (JGSC) model. In JGSC, we first formulate the objective function by simultaneously taking subspace learning and joint sparse regression into account, then design a new optimization solution to solve the resulting objective function, and further prove the convergence of the proposed solution. Furthermore, we extend JGSC to a robust JGSC (RJGSC) by replacing the least squares loss function with a robust loss function, achieving the same goals while avoiding the impact of outliers. Finally, experimental results on real data sets showed that both JGSC and RJGSC outperformed state-of-the-art algorithms in terms of k-nearest neighbor classification performance.

  20. A canonical correlation neural network for multicollinearity and functional data.

    PubMed

    Gou, Zhenkun; Fyfe, Colin

    2004-03-01

    We review a recent neural implementation of Canonical Correlation Analysis and show, using ideas suggested by Ridge Regression, how to make the algorithm robust. The network is shown to operate on data sets which exhibit multicollinearity. We develop a second model which performs well not only on multicollinear data but also on general data sets. This model allows us to vary a single parameter so that the network is capable of performing Partial Least Squares regression (at one extreme) to Canonical Correlation Analysis (at the other) and every intermediate operation between the two. On multicollinear data the parameter setting is shown to be important, but on more general data no particular parameter setting is required. Finally, we develop a second penalty term which acts on such data as a smoother, in that the resulting weight vectors are much smoother and more interpretable than the weights without the robustification term. We illustrate our algorithms on both artificial and real data.

  1. Model averaging and muddled multimodel inferences.

    PubMed

    Cade, Brian S

    2015-09-01

    Three flawed practices associated with model averaging coefficients for predictor variables in regression models commonly occur when making multimodel inferences in analyses of ecological data. Model-averaged regression coefficients based on Akaike information criterion (AIC) weights have been recommended for addressing model uncertainty but they are not valid, interpretable estimates of partial effects for individual predictors when there is multicollinearity among the predictor variables. Multicollinearity implies that the scaling of units in the denominators of the regression coefficients may change across models such that neither the parameters nor their estimates have common scales, therefore averaging them makes no sense. The associated sums of AIC model weights recommended to assess relative importance of individual predictors are really a measure of relative importance of models, with little information about contributions by individual predictors compared to other measures of relative importance based on effect size or variance reduction. Sometimes the model-averaged regression coefficients for predictor variables are incorrectly used to make model-averaged predictions of the response variable when the models are not linear in the parameters. I demonstrate the issues with the first two practices using the college grade point average example extensively analyzed by Burnham and Anderson. I show how partial standard deviations of the predictor variables can be used to detect changing scales of their estimates with multicollinearity. Standardizing estimates based on partial standard deviations for their variables can be used to make the scaling of the estimates commensurate across models, a necessary but not sufficient condition for model averaging of the estimates to be sensible. A unimodal distribution of estimates and valid interpretation of individual parameters are additional requisite conditions. The standardized estimates or equivalently the t statistics on unstandardized estimates also can be used to provide more informative measures of relative importance than sums of AIC weights. Finally, I illustrate how seriously compromised statistical interpretations and predictions can be for all three of these flawed practices by critiquing their use in a recent species distribution modeling technique developed for predicting Greater Sage-Grouse (Centrocercus urophasianus) distribution in Colorado, USA. These model averaging issues are common in other ecological literature and ought to be discontinued if we are to make effective scientific contributions to ecological knowledge and conservation of natural resources.
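
    A sketch of the partial-standard-deviation standardization on synthetic collinear data, assuming the VIF-based form s*_j = s_j * sqrt(1/VIF_j) * sqrt((n-1)/(n-p)); treat the exact formula as an assumption and consult the paper for the authoritative definition.

      import numpy as np
      import statsmodels.api as sm
      from statsmodels.stats.outliers_influence import variance_inflation_factor

      rng = np.random.default_rng(0)
      n, p = 200, 3
      x1 = rng.standard_normal(n)
      x2 = 0.9 * x1 + 0.3 * rng.standard_normal(n)   # collinear with x1
      x3 = rng.standard_normal(n)
      X = np.column_stack([x1, x2, x3])
      y = 1.0 * x1 + 0.5 * x3 + rng.standard_normal(n)

      Xc = sm.add_constant(X)
      fit = sm.OLS(y, Xc).fit()
      for j in range(p):
          vif = variance_inflation_factor(Xc, j + 1)      # skip the intercept
          s_part = X[:, j].std(ddof=1) * np.sqrt(1 / vif) * np.sqrt((n - 1) / (n - p))
          print(f"x{j+1}: b = {fit.params[j+1]:+.3f}, "
                f"standardized b = {fit.params[j+1] * s_part:+.3f}")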

  2. Model averaging and muddled multimodel inferences

    USGS Publications Warehouse

    Cade, Brian S.

    2015-01-01

    Three flawed practices associated with model averaging coefficients for predictor variables in regression models commonly occur when making multimodel inferences in analyses of ecological data. Model-averaged regression coefficients based on Akaike information criterion (AIC) weights have been recommended for addressing model uncertainty but they are not valid, interpretable estimates of partial effects for individual predictors when there is multicollinearity among the predictor variables. Multicollinearity implies that the scaling of units in the denominators of the regression coefficients may change across models such that neither the parameters nor their estimates have common scales, therefore averaging them makes no sense. The associated sums of AIC model weights recommended to assess relative importance of individual predictors are really a measure of relative importance of models, with little information about contributions by individual predictors compared to other measures of relative importance based on effect size or variance reduction. Sometimes the model-averaged regression coefficients for predictor variables are incorrectly used to make model-averaged predictions of the response variable when the models are not linear in the parameters. I demonstrate the issues with the first two practices using the college grade point average example extensively analyzed by Burnham and Anderson. I show how partial standard deviations of the predictor variables can be used to detect changing scales of their estimates with multicollinearity. Standardizing estimates based on partial standard deviations for their variables can be used to make the scaling of the estimates commensurate across models, a necessary but not sufficient condition for model averaging of the estimates to be sensible. A unimodal distribution of estimates and valid interpretation of individual parameters are additional requisite conditions. The standardized estimates or equivalently the t statistics on unstandardized estimates also can be used to provide more informative measures of relative importance than sums of AIC weights. Finally, I illustrate how seriously compromised statistical interpretations and predictions can be for all three of these flawed practices by critiquing their use in a recent species distribution modeling technique developed for predicting Greater Sage-Grouse (Centrocercus urophasianus) distribution in Colorado, USA. These model averaging issues are common in other ecological literature and ought to be discontinued if we are to make effective scientific contributions to ecological knowledge and conservation of natural resources.

  3. Logistic and Multiple Regression: A Two-Pronged Approach to Accurately Estimate Cost Growth in Major DoD Weapon Systems

    DTIC Science & Technology

    2004-03-01

    Breusch-Pagan test for constant variance of the residuals. Using Microsoft Excel® we calculate a p-value of 0.841237. This high p-value, which is above ... our alpha of 0.05, indicates that our residuals indeed pass the Breusch-Pagan test for constant variance. In addition to the assumption tests, we ... Wilk Test for Normality – Support (Reduced) Model (OLS) Finally, we perform a Breusch-Pagan test for constant variance of the residuals. Using ...
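
    For reference, the same diagnostic is available programmatically; a hedged sketch using statsmodels' het_breuschpagan on a synthetic OLS fit (a large p-value is consistent with constant residual variance).

      import numpy as np
      import statsmodels.api as sm
      from statsmodels.stats.diagnostic import het_breuschpagan

      rng = np.random.default_rng(0)
      X = sm.add_constant(rng.standard_normal((60, 2)))   # synthetic design matrix
      y = X @ [1.0, 0.5, -0.3] + rng.standard_normal(60)  # homoskedastic errors

      resid = sm.OLS(y, X).fit().resid
      lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, X)
      print(f"Breusch-Pagan LM p-value: {lm_pvalue:.3f}")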

  4. Stepwise group sparse regression (SGSR): gene-set-based pharmacogenomic predictive models with stepwise selection of functional priors.

    PubMed

    Jang, In Sock; Dienstmann, Rodrigo; Margolin, Adam A; Guinney, Justin

    2015-01-01

    Complex mechanisms involving genomic aberrations in numerous proteins and pathways are believed to be a key cause of many diseases such as cancer. With recent advances in genomics, elucidating the molecular basis of cancer at a patient level is now feasible, and has led to personalized treatment strategies whereby a patient is treated according to his or her genomic profile. However, there is growing recognition that existing treatment modalities are overly simplistic, and do not fully account for the deep genomic complexity associated with sensitivity or resistance to cancer therapies. To overcome these limitations, large-scale pharmacogenomic screens of cancer cell lines--in conjunction with modern statistical learning approaches--have been used to explore the genetic underpinnings of drug response. While these analyses have demonstrated the ability to infer genetic predictors of compound sensitivity, to date most modeling approaches have been data-driven, i.e. they do not explicitly incorporate domain-specific knowledge (priors) in the process of learning a model. While a purely data-driven approach offers an unbiased perspective of the data--and may yield unexpected or novel insights--this strategy introduces challenges for both model interpretability and accuracy. In this study, we propose a novel prior-incorporated sparse regression model in which the choice of informative predictor sets is carried out by knowledge-driven priors (gene sets) in a stepwise fashion. Under regularization in a linear regression model, our algorithm is able to incorporate prior biological knowledge across the predictive variables thereby improving the interpretability of the final model with no loss--and often an improvement--in predictive performance. We evaluate the performance of our algorithm compared to well-known regularization methods such as LASSO, Ridge and Elastic net regression in the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (Sanger) pharmacogenomics datasets, demonstrating that incorporation of the biological priors selected by our model confers improved predictability and interpretability, despite much fewer predictors, over existing state-of-the-art methods.
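
    A greatly simplified sketch of the stepwise, group-wise idea: greedily add whichever predictor group (a stand-in for a gene set) most improves cross-validated R2 under a ridge-penalized linear model. This is illustrative, not the paper's SGSR algorithm.

      import numpy as np
      from sklearn.datasets import make_regression
      from sklearn.linear_model import Ridge
      from sklearn.model_selection import cross_val_score

      X, y = make_regression(n_samples=120, n_features=60, n_informative=10,
                             noise=5.0, random_state=0)
      # Hypothetical gene sets: six disjoint groups of 10 predictors each.
      gene_sets = {f"set{i}": list(range(10 * i, 10 * i + 10)) for i in range(6)}

      chosen, cols = [], []
      for _ in range(len(gene_sets)):
          def cv_r2(extra):
              return cross_val_score(Ridge(), X[:, cols + extra], y, cv=5).mean()
          base = cv_r2([]) if cols else 0.0
          best_name, best_gain = None, 0.0
          for name, idx in gene_sets.items():
              if name in chosen:
                  continue
              gain = cv_r2(idx) - base
              if gain > best_gain:
                  best_name, best_gain = name, gain
          if best_name is None:        # no group improves the CV fit: stop
              break
          chosen.append(best_name)
          cols += gene_sets[best_name]
      print("selected gene sets:", chosen)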

  5. Effects of urban form on the urban heat island effect based on spatial regression model.

    PubMed

    Yin, Chaohui; Yuan, Man; Lu, Youpeng; Huang, Yaping; Liu, Yanfang

    2018-09-01

    The urban heat island (UHI) effect is becoming more of a concern with the accelerated process of urbanization. However, few studies have examined the effect of urban form on land surface temperature (LST), especially from an urban planning perspective. This paper used a spatial regression model to investigate the effects of both land use composition and urban form on LST in Wuhan City, China, based on the regulatory planning management unit. Landsat ETM+ image data were used to estimate LST. Land use composition was characterized by impervious surface area proportion, vegetated area proportion, and water proportion, while urban form indicators included sky view factor (SVF), building density, and floor area ratio (FAR). We first tested for spatial autocorrelation of urban LST, which confirmed that a traditional regression method would be invalid. A spatial error model (SEM) was chosen because its diagnostics were superior to those of a spatial lag model (SLM). The results showed that urban form metrics should be the focus of mitigation efforts for UHI effects. In addition, analysis of the relationship between urban form and the UHI effect based on the regulatory planning management unit was helpful for promoting corresponding UHI mitigation rules in practice. Finally, the spatial regression model is recommended as an appropriate method for dealing with problems related to the urban thermal environment. Results suggested that the impact of urbanization on the UHI effect can be mitigated not only by balancing various land use types, but also, and even more effectively, by optimizing urban form. This research expands the scientific understanding of the effects of urban form on UHI by explicitly analyzing indicators closely related to detailed urban planning at the level of the regulatory planning management unit. In addition, it may provide important insights and effective regulation measures for urban planners to mitigate future UHI effects. Copyright © 2018 Elsevier B.V. All rights reserved.

  6. Geospatial Predictive Modelling for Climate Mapping of Selected Severe Weather Phenomena Over Poland: A Methodological Approach

    NASA Astrophysics Data System (ADS)

    Walawender, Ewelina; Walawender, Jakub P.; Ustrnul, Zbigniew

    2017-02-01

    The main purpose of the study is to introduce methods for mapping the spatial distribution of the occurrence of selected atmospheric phenomena (thunderstorms, fog, glaze and rime) over Poland from 1966 to 2010 (45 years). Limited in situ observations, as well as the discontinuous and location-dependent nature of these phenomena, make traditional interpolation inappropriate. Spatially continuous maps were created with the use of geospatial predictive modelling techniques. For each phenomenon, an algorithm identifying its favourable meteorological and environmental conditions was created on the basis of observations recorded at 61 weather stations in Poland. Annual frequency maps presenting the probability of a day with a thunderstorm, fog, glaze or rime were created by applying the predefined algorithms to a modelled, gridded dataset. Relevant explanatory variables were derived from the NCEP/NCAR reanalysis and downscaled with a Regional Climate Model. The resulting maps of favourable meteorological conditions were found to be valuable and representative at the country scale, but with varying correlation (r) strength against in situ data (from r = 0.84 for thunderstorms to r = 0.15 for fog). The weak correlation between gridded estimates of fog occurrence and observational data indicated the very local nature of this phenomenon. For this reason, additional environmental predictors of fog occurrence were also examined. Topographic parameters derived from the SRTM elevation model and reclassified CORINE Land Cover data were used as external explanatory variables for the multiple linear regression kriging used to obtain the final map. The regression model explained 89% of the variability in the annual frequency of fog in the study area. Regression residuals were interpolated via simple kriging.

  7. MO-C-17A-03: A GPU-Based Method for Validating Deformable Image Registration in Head and Neck Radiotherapy Using Biomechanical Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Neylon, J; Min, Y; Qi, S

    2014-06-15

    Purpose: Deformable image registration (DIR) plays a pivotal role in head and neck adaptive radiotherapy, but systematic validation of DIR algorithms has been limited by a lack of quantitative high-resolution ground truth. We address this limitation by developing a GPU-based framework that provides a systematic DIR validation by generating (a) model-guided synthetic CTs representing posture and physiological changes, and (b) model-guided landmark-based validation. Method: The GPU-based framework was developed to generate massive mass-spring biomechanical models from patient simulation CTs and contoured structures. The biomechanical model represented soft tissue deformations for known rigid skeletal motion. Posture changes were simulated by articulating skeletal anatomy, which subsequently applied elastic corrective forces upon the soft tissue. Physiological changes such as tumor regression and weight loss were simulated in a biomechanically precise manner. Synthetic CT data was then generated from the deformed anatomy. The initial and final positions of one hundred randomly chosen mass elements inside each of the internal contoured structures were recorded as ground truth data. The process was automated to create 45 synthetic CT datasets for a given patient CT. For instance, head rotation was varied between ±4 degrees along each axis, and tumor volumes were systematically reduced by up to 30%. Finally, the original CT and deformed synthetic CT were registered using an optical flow based DIR. Results: Each synthetic data creation took approximately 28 seconds of computation time. The number of landmarks per data set varied between two and three thousand. The validation method is able to perform sub-voxel analysis of the DIR and report the results by structure, giving a much more in-depth investigation of the error. Conclusions: We presented a GPU-based high-resolution biomechanical head and neck model to validate DIR algorithms by generating CT-equivalent 3D volumes with simulated posture changes and physiological regression.

  8. Predicting fundamental and realized distributions based on thermal niche: A case study of a freshwater turtle

    NASA Astrophysics Data System (ADS)

    Rodrigues, João Fabrício Mota; Coelho, Marco Túlio Pacheco; Ribeiro, Bruno R.

    2018-04-01

    Species distribution models (SDMs) have been broadly used in ecology to address theoretical and practical problems. Currently, there are two main approaches to generating SDMs: (i) correlative models, which are based on species occurrences and environmental predictor layers, and (ii) process-based models, which are constructed from species' functional traits and physiological tolerances. The distributions estimated by each approach are based on different components of the species niche: predictions of correlative models approximate species' realized niches, while predictions of process-based models are more akin to the fundamental niche. Here, we integrated the predictions of fundamental and realized distributions of the freshwater turtle Trachemys dorbigni. The fundamental distribution was estimated using data on T. dorbigni's egg incubation temperature, and the realized distribution was estimated using species occurrence records. Both types of distribution were estimated using the same regression approaches (logistic regression and support vector machines), considering both macroclimatic and microclimatic temperatures. The realized distribution of T. dorbigni was generally nested within its fundamental distribution, reinforcing the theoretical assumption that a species' realized niche is a subset of its fundamental niche. Both modelling algorithms produced similar results, but microtemperature generated better results than macrotemperature for the incubation model. Finally, our results reinforce the conclusion that species' realized distributions are constrained by factors other than thermal tolerances alone.

  9. Statistical analysis of whole-body absorption depending on anatomical human characteristics at a frequency of 2.1 GHz.

    PubMed

    Habachi, A El; Conil, E; Hadjem, A; Vazquez, E; Wong, M F; Gati, A; Fleury, G; Wiart, J

    2010-04-07

    In this paper, we propose identification of the morphological factors that may impact the whole-body averaged specific absorption rate (WBSAR). This study is conducted for the case of exposure to a front plane wave at a 2100 MHz frequency carrier. This study is based on the development of different regression models for estimating the WBSAR as a function of morphological factors. For this purpose, a database of 12 anatomical human models (phantoms) has been considered. Also, 18 supplementary phantoms obtained using the morphing technique were generated to build the required relation. This paper presents three models based on external morphological factors such as the body surface area, the body mass index or the body mass. These models show good results in estimating the WBSAR (<10%) for families obtained by the morphing technique, but these are still less accurate (30%) when applied to different original phantoms. This study stresses the importance of the internal morphological factors such as muscle and fat proportions in characterization of the WBSAR. The regression models are then improved using internal morphological factors with an estimation error of approximately 10% on the WBSAR. Finally, this study is suitable for establishing the statistical distribution of the WBSAR for a given population characterized by its morphology.

  10. Parametrically Guided Generalized Additive Models with Application to Mergers and Acquisitions Data

    PubMed Central

    Fan, Jianqing; Maity, Arnab; Wang, Yihui; Wu, Yichao

    2012-01-01

    Generalized nonparametric additive models present a flexible way to evaluate the effects of several covariates on a general outcome of interest via a link function. In this modeling framework, one assumes that the effect of each of the covariates is nonparametric and additive. However, in practice, there is often prior information available about the shape of the regression functions, possibly from pilot studies or exploratory analysis. In this paper, we consider such situations and propose an estimation procedure where the prior information is used as a parametric guide to fit the additive model. Specifically, we first posit a parametric family for each of the regression functions using the prior information (parametric guides). After removing these parametric trends, we then estimate the remainder of the nonparametric functions using a nonparametric generalized additive model, and form the final estimates by adding back the parametric trend. We investigate the asymptotic properties of the estimates and show that when a good guide is chosen, the asymptotic bias of the estimates can be reduced significantly while keeping the asymptotic variance the same as that of the unguided estimator. We observe the performance of our method via a simulation study and demonstrate our method by applying it to a real data set on mergers and acquisitions. PMID:23645976

  11. Statistical analysis of whole-body absorption depending on anatomical human characteristics at a frequency of 2.1 GHz

    NASA Astrophysics Data System (ADS)

    El Habachi, A.; Conil, E.; Hadjem, A.; Vazquez, E.; Wong, M. F.; Gati, A.; Fleury, G.; Wiart, J.

    2010-04-01

    In this paper, we propose identification of the morphological factors that may impact the whole-body averaged specific absorption rate (WBSAR). This study is conducted for the case of exposure to a front plane wave at a 2100 MHz frequency carrier. This study is based on the development of different regression models for estimating the WBSAR as a function of morphological factors. For this purpose, a database of 12 anatomical human models (phantoms) has been considered. Also, 18 supplementary phantoms obtained using the morphing technique were generated to build the required relation. This paper presents three models based on external morphological factors such as the body surface area, the body mass index or the body mass. These models show good results in estimating the WBSAR (<10%) for families obtained by the morphing technique, but these are still less accurate (30%) when applied to different original phantoms. This study stresses the importance of the internal morphological factors such as muscle and fat proportions in characterization of the WBSAR. The regression models are then improved using internal morphological factors with an estimation error of approximately 10% on the WBSAR. Finally, this study is suitable for establishing the statistical distribution of the WBSAR for a given population characterized by its morphology.

  12. Modeling CO2 and H2S solubility in MDEA and DEA: Design implications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rochelle, G.T.; Posey, M.

    1996-12-31

    The solubility of H2S and CO2 in aqueous alkanolamines affects solution capacity and the required circulation rate for acid gas absorption. These thermodynamics also determine the relationship of steam rate and the lean loading of the solution, which in turn sets the leak of acid gas from the top of the absorber. Finally, the mechanisms of mass transfer and the role of kinetics, especially in stripping, depend on the vapor/liquid equilibria. Published measurements of CO2 and H2S solubility in methyldiethanolamine (MDEA) and diethanolamine (DEA) are not in general agreement, especially at low loading of acid gas. The available sets of solubility data have been regressed with the AspenPlus electrolyte/NRTL model. All of the parameters and constants that make up this model have been carefully evaluated. Independent thermodynamic data such as freezing point and heat of mixing have been included in the regression to strengthen the estimates of model parameters. The parameters for each set of solubility data have been evaluated in an attempt to determine which set is correct. Each evaluated model has been used to calculate the acid gas capacity and minimum stripping steam rate for several industrial cases of acid gas absorption/stripping.

  13. Parametrically Guided Generalized Additive Models with Application to Mergers and Acquisitions Data.

    PubMed

    Fan, Jianqing; Maity, Arnab; Wang, Yihui; Wu, Yichao

    2013-01-01

    Generalized nonparametric additive models present a flexible way to evaluate the effects of several covariates on a general outcome of interest via a link function. In this modeling framework, one assumes that the effect of each of the covariates is nonparametric and additive. However, in practice, there is often prior information available about the shape of the regression functions, possibly from pilot studies or exploratory analysis. In this paper, we consider such situations and propose an estimation procedure where the prior information is used as a parametric guide to fit the additive model. Specifically, we first posit a parametric family for each of the regression functions using the prior information (parametric guides). After removing these parametric trends, we then estimate the remainder of the nonparametric functions using a nonparametric generalized additive model, and form the final estimates by adding back the parametric trend. We investigate the asymptotic properties of the estimates and show that when a good guide is chosen, the asymptotic bias of the estimates can be reduced significantly while keeping the asymptotic variance the same as that of the unguided estimator. We observe the performance of our method via a simulation study and demonstrate our method by applying it to a real data set on mergers and acquisitions.

  14. Estimation of retinal vessel caliber using model fitting and random forests

    NASA Astrophysics Data System (ADS)

    Araújo, Teresa; Mendonça, Ana Maria; Campilho, Aurélio

    2017-03-01

    Retinal vessel caliber changes are associated with several major diseases, such as diabetes and hypertension. These caliber changes can be evaluated using eye fundus images. However, the clinical assessment is tiresome and prone to errors, motivating the development of automatic methods. An automatic method based on vessel cross-section intensity profile model fitting for the estimation of vessel caliber in retinal images is herein proposed. First, vessels are segmented from the image, vessel centerlines are detected, and individual segments are extracted and smoothed. Intensity profiles are extracted perpendicularly to the vessel, and the profile lengths are determined. Then, model fitting is applied to the smoothed profiles. A novel parametric model (DoG-L7) is used, consisting of a Difference-of-Gaussians multiplied by a line, which is able to describe profile asymmetry. Finally, the parameters of the best-fit model are used to determine the vessel width through regression using ensembles of bagged regression trees with random sampling of the predictors (random forests). The method is evaluated on the REVIEW public dataset. A precision close to that of the human observers is achieved, outperforming other state-of-the-art methods. The method is robust and reliable for width estimation in images with pathologies and artifacts, with performance independent of the range of diameters.

  15. Internalized stigma among psychiatric outpatients: Associations with quality of life, functioning, hope and self-esteem.

    PubMed

    Picco, Louisa; Pang, Shirlene; Lau, Ying Wen; Jeyagurunathan, Anitha; Satghare, Pratika; Abdin, Edimansyah; Vaingankar, Janhavi Ajit; Lim, Susan; Poh, Chee Lien; Chong, Siow Ann; Subramaniam, Mythily

    2016-12-30

    This study aimed to: (i) determine the prevalence and the socio-demographic and clinical correlates of internalized stigma and (ii) explore the association between internalized stigma and quality of life, general functioning, hope and self-esteem, among a multi-ethnic Asian population of patients with mental disorders. This cross-sectional survey recruited adult patients (n=280) who were seeking treatment at outpatient and affiliated clinics of the only tertiary psychiatric hospital in Singapore. Internalized stigma was measured using the Internalized Stigma of Mental Illness scale. Overall, 43.6% experienced moderate to high internalized stigma. After adjustment in multiple logistic regression analysis, no socio-demographic or clinical correlates of internalized stigma remained significant. Individual logistic regression models found a negative relationship between quality of life, self-esteem, general functioning and internalized stigma, whereby lower scores were associated with higher internalized stigma. In the final regression model, which included all psychosocial variables together, self-esteem was the only variable significantly and negatively associated with internalized stigma. The results of this study contribute to our understanding of the role internalized stigma plays in patients with mental illness and the impact it can have on psychosocial aspects of their lives. Copyright © 2016 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.

  16. Determining delayed admission to intensive care unit for mechanically ventilated patients in the emergency department.

    PubMed

    Hung, Shih-Chiang; Kung, Chia-Te; Hung, Chih-Wei; Liu, Ber-Ming; Liu, Jien-Wei; Chew, Ghee; Chuang, Hung-Yi; Lee, Wen-Huei; Lee, Tzu-Chi

    2014-08-23

    The adverse effects of delayed admission to the intensive care unit (ICU) have been recognized in previous studies. However, the definitions of delayed admission vary across studies. This study proposed a model to define "delayed admission" and explored the effect of ICU-waiting time on patients' outcomes. This retrospective cohort study included non-traumatic adult patients on mechanical ventilation in the emergency department (ED), from July 2009 to June 2010. The primary outcome measures were 21-ventilator-day mortality and prolonged hospital stay (over 30 days). Cox regression and logistic regression models were used for multivariate analysis. Non-delayed ICU-waiting was defined as a period in which the time effect on mortality was not statistically significant in a Cox regression model. To identify a suitable cut-off point between "delayed" and "non-delayed", subsets of the overall data were formed based on ICU-waiting time, and the hazard ratio per ICU-waiting hour was iteratively calculated in each subset. The cut-off time was then used to evaluate the impact of delayed ICU admission on mortality and prolonged length of hospital stay. The final analysis included 1,242 patients. The time effect on mortality emerged after 4 hours; we therefore defined an ED ICU-waiting time of more than 4 hours as delayed. By logistic regression analysis, delayed ICU admission affected both 21-ventilator-day mortality and prolonged hospital stay, with odds ratios of 1.41 (95% confidence interval, 1.05 to 1.89) and 1.56 (95% confidence interval, 1.07 to 2.27), respectively. For patients on mechanical ventilation at the ED, delayed ICU admission is associated with a higher probability of mortality and additional resource expenditure. A benchmark waiting time of no more than 4 hours for ICU admission is recommended.
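
    A hedged sketch of the cut-point search on synthetic data: within subsets defined by increasing waiting-time ceilings, check when the per-hour effect on mortality becomes significant in a Cox model (lifelines; all numbers invented, not the study's data).

      import numpy as np
      import pandas as pd
      from lifelines import CoxPHFitter

      rng = np.random.default_rng(0)
      n = 1200
      wait = rng.exponential(4.0, n)                  # hours waiting in the ED
      risk = 0.05 * np.clip(wait - 4.0, 0, None)      # effect appears after 4 h
      time = rng.exponential(21.0 / (1.0 + risk))     # days to death
      df = pd.DataFrame({"time": np.minimum(time, 21.0),
                         "died": (time <= 21.0).astype(int),
                         "wait_hr": wait})

      # Scan waiting-time ceilings; within each subset, test the per-hour effect.
      for cut in [2, 4, 6, 8]:
          sub = df[df["wait_hr"] <= cut]
          cph = CoxPHFitter().fit(sub, duration_col="time", event_col="died")
          p = cph.summary.loc["wait_hr", "p"]
          print(f"waiting <= {cut} h: n={len(sub)}, p(wait_hr)={p:.3f}")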

  17. An investigation of the speeding-related crash designation through crash narrative reviews sampled via logistic regression.

    PubMed

    Fitzpatrick, Cole D; Rakasi, Saritha; Knodler, Michael A

    2017-01-01

    Speed is one of the most important factors in traffic safety as higher speeds are linked to increased crash risk and higher injury severities. Nearly a third of fatal crashes in the United States are designated as "speeding-related", which is defined as either "the driver behavior of exceeding the posted speed limit or driving too fast for conditions." While many studies have utilized the speeding-related designation in safety analyses, no studies have examined the underlying accuracy of this designation. Herein, we investigate the speeding-related crash designation through the development of a series of logistic regression models that were derived from the established speeding-related crash typologies and validated using a blind review, by multiple researchers, of 604 crash narratives. The developed logistic regression model accurately identified crashes which were not originally designated as speeding-related but had crash narratives that suggested speeding as a causative factor. Only 53.4% of crashes designated as speeding-related contained narratives which described speeding as a causative factor. Further investigation of these crashes revealed that the driver contributing code (DCC) of "driving too fast for conditions" was being used in three separate situations. Additionally, this DCC was also incorrectly used when "exceeding the posted speed limit" would likely have been a more appropriate designation. Finally, it was determined that the responding officer only utilized one DCC in 82% of crashes not designated as speeding-related but contained a narrative indicating speed as a contributing causal factor. The use of logistic regression models based upon speeding-related crash typologies offers a promising method by which all possible speeding-related crashes could be identified. Published by Elsevier Ltd.

  18. Calibration and Data Analysis of the MC-130 Air Balance

    NASA Technical Reports Server (NTRS)

    Booth, Dennis; Ulbrich, N.

    2012-01-01

    Design, calibration, calibration analysis, and intended use of the MC-130 air balance are discussed. The MC-130 balance is an 8.0 inch diameter force balance that has two separate internal air flow systems and one external bellows system. The manual calibration of the balance consisted of a total of 1854 data points with both unpressurized and pressurized air flowing through the balance. A subset of 1160 data points was chosen for the calibration data analysis. The regression analysis of the subset was performed using two fundamentally different analysis approaches. First, the data analysis was performed using a recently developed extension of the Iterative Method. This approach fits gage outputs as a function of both applied balance loads and bellows pressures while still allowing the application of the iteration scheme that is used with the Iterative Method. Then, for comparison, the axial force was also analyzed using the Non-Iterative Method. This alternate approach directly fits loads as a function of measured gage outputs and bellows pressures and does not require a load iteration. The regression models used by both the extended Iterative and Non-Iterative Method were constructed such that they met a set of widely accepted statistical quality requirements. These requirements lead to reliable regression models and prevent overfitting of data because they ensure that no hidden near-linear dependencies between regression model terms exist and that only statistically significant terms are included. Finally, a comparison of the axial force residuals was performed. Overall, axial force estimates obtained from both methods show excellent agreement as the differences of the standard deviation of the axial force residuals are on the order of 0.001 % of the axial force capacity.

  19. Generating linear regression model to predict motor functions by use of laser range finder during TUG.

    PubMed

    Adachi, Daiki; Nishiguchi, Shu; Fukutani, Naoto; Hotta, Takayuki; Tashiro, Yuto; Morino, Saori; Shirooka, Hidehiko; Nozaki, Yuma; Hirata, Hinako; Yamaguchi, Moe; Yorozu, Ayanori; Takahashi, Masaki; Aoyama, Tomoki

    2017-05-01

    The purpose of this study was to investigate which spatial and temporal parameters of the Timed Up and Go (TUG) test are associated with motor function in elderly individuals. This study included 99 community-dwelling women aged 72.9 ± 6.3 years. Step length, step width, single support time, variability of the aforementioned parameters, gait velocity, cadence, reaction time from the starting signal to the first step, and the minimum distance between the foot and a marker placed 3 m in front of the chair were measured using our analysis system. The 10-m walk test, five times sit-to-stand (FTSTS) test, and one-leg standing (OLS) test were used to assess motor function. Stepwise multivariate linear regression analysis was used to determine which TUG test parameters were associated with each motor function test. Finally, we calculated a predictive model for each motor function test using each regression coefficient. In the stepwise linear regression analysis, step length and cadence were significantly associated with the 10-m walk test, FTSTS test, and OLS test. Reaction time was associated with the FTSTS test, and step width was associated with the OLS test. Each predictive model showed a strong correlation with the 10-m walk test and OLS test (P < 0.01), though not a significantly higher correlation than that of the TUG test time itself. We showed which TUG test parameters were associated with each motor function test. Moreover, the TUG test time, regarded as a measure of lower extremity function and mobility, has strong predictive ability for each motor function test. Copyright © 2017 The Japanese Orthopaedic Association. Published by Elsevier B.V. All rights reserved.
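
    A minimal backward-stepwise routine in the spirit of the analysis above, run on simulated data with placeholder variable names (these are not the study's measurements): the least significant predictor is dropped until every remaining p-value falls below 0.05.

      # Backward-stepwise OLS sketch with invented gait variables.
      import numpy as np
      import pandas as pd
      import statsmodels.api as sm

      rng = np.random.default_rng(2)
      n = 99
      df = pd.DataFrame({
          "step_length": rng.normal(0.5, 0.1, n),
          "cadence": rng.normal(110, 10, n),
          "step_width": rng.normal(0.1, 0.02, n),
          "reaction_time": rng.normal(0.6, 0.1, n),
      })
      df["walk_test"] = (10 - 5 * df["step_length"] - 0.02 * df["cadence"]
                         + rng.normal(0, 0.2, n))

      predictors = list(df.columns[:-1])
      while predictors:
          model = sm.OLS(df["walk_test"], sm.add_constant(df[predictors])).fit()
          pvals = model.pvalues.drop("const")
          worst = pvals.idxmax()
          if pvals[worst] < 0.05:
              break
          predictors.remove(worst)  # eliminate the least significant term
      print(model.summary().tables[1])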

  20. SU-E-J-244: Development and Validation of a Knowledge Based Planning Model for External Beam Radiation Therapy of Locally Advanced Non-Small Cell Lung Cancer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Z; Kennedy, A; Larsen, E

    2015-06-15

    Purpose: The study aims to develop and validate a knowledge based planning (KBP) model for external beam radiation therapy of locally advanced non-small cell lung cancer (LA-NSCLC). Methods: RapidPlan™ technology was used to develop a lung KBP model. Plans from 65 patients with LA-NSCLC were used to train the model. 25 patients were treated with VMAT, and the other patients were treated with IMRT. Organs-at-risk (OARs) included right lung, left lung, heart, esophagus, and spinal cord. DVH and geometric distribution DVH were extracted from the treated plans. The model was trained using principal component analysis and step-wise multiple regression. Box plot and regression plot tools were used to identify geometric and dosimetric outliers and to help fine-tune the model. The validation was performed by (a) comparing predicted DVH boundaries to actual DVHs of 63 patients and (b) using an independent set of treatment planning data. Results: 63 out of 65 plans were included in the final KBP model, with PTV volume ranging from 102.5cc to 1450.2cc. Total treatment dose prescription varied from 50Gy to 70Gy based on institutional guidelines. One patient was excluded as a geometric outlier because 2.18cc of spinal cord was included in the PTV. The other patient was excluded as a dosimetric outlier because dose sparing to the spinal cord was heavily enforced in the clinical plan. Target volume, OAR volume, OAR overlap volume percentage to target, and OAR out-of-field volume were included in the trained model. Lungs and heart had two principal component scores of GEDVH, whereas spinal cord and esophagus had three in the final model. The predicted DVH band (mean ±1 standard deviation) represented 66.2±3.6% of all DVHs. Conclusion: A KBP model was developed and validated for radiotherapy of LA-NSCLC in a commercial treatment planning system. Its clinical implementation may improve the consistency of IMRT/VMAT planning.

  1. An Application of Robust Method in Multiple Linear Regression Model toward Credit Card Debt

    NASA Astrophysics Data System (ADS)

    Amira Azmi, Nur; Saifullah Rusiman, Mohd; Khalid, Kamil; Roslan, Rozaini; Sufahani, Suliadi; Mohamad, Mahathir; Salleh, Rohayu Mohd; Hamzah, Nur Shamsidah Amir

    2018-04-01

    The credit card is a convenient alternative to cash or cheques and an essential component of electronic and internet commerce. In this study, the researchers attempt to determine the relationship between credit card debt and demographic variables such as age, household income, education level, years with current employer, years at current address, debt-to-income ratio, and other debt, and to identify which variables are significant. The data cover information on 850 customers. Three methods were applied to the credit card debt data: multiple linear regression (MLR) models, MLR models with the least quartile difference (LQD) method, and MLR models with the mean absolute deviation method. After comparing the three methods, the MLR model with the LQD method was found to be the best model, with the lowest mean square error (MSE). According to the final model, years with current employer, years at current address, household income in thousands, and debt-to-income ratio are positively associated with the amount of credit card debt, while age, level of education, and other debt are negatively associated with it. This study may serve as a reference for banks applying robust methods, so that they can better understand their options and choose the approach best aligned with their goals for inference regarding credit card debt.
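
    The LQD estimator is not available in common Python libraries, so the sketch below contrasts OLS with median (least absolute deviation) regression, a related robust method, on simulated debt data containing outliers; it illustrates the robustness idea, not the paper's exact procedure.

      # OLS vs. a robust (median) regression on data with injected outliers.
      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(3)
      n = 850
      income = rng.normal(50, 15, n)                # hypothetical predictor
      debt = 0.08 * income + rng.normal(0, 1, n)
      debt[:20] += 30                               # inject outlying customers

      X = sm.add_constant(income)
      ols = sm.OLS(debt, X).fit()
      lad = sm.QuantReg(debt, X).fit(q=0.5)         # median regression
      print("OLS slope:", round(ols.params[1], 3))
      print("LAD slope:", round(lad.params[1], 3))  # closer to the true 0.08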

  2. Contribution of Submarine Groundwater on the Water-Food Nexus in Coastal Ecosystems: Effects on Biodiversity and Fishery Production

    NASA Astrophysics Data System (ADS)

    Shoji, J.; Sugimoto, R.; Honda, H.; Tominaga, O.; Taniguchi, M.

    2014-12-01

    In the past decade, machine-learning methods for empirical rainfall-runoff modeling have seen extensive development. However, the majority of research has focused on a small number of methods, such as artificial neural networks (ANNs), while not considering other approaches for non-parametric regression that have been developed in recent years. These methods may be able to achieve comparable predictive accuracy to ANNs and more easily provide physical insights into the system of interest through evaluation of covariate influence. Additionally, these methods could provide a straightforward, computationally efficient way of evaluating climate change impacts in basins where the data to support physical hydrologic models are limited. In this paper, we use multiple regression and machine-learning approaches to predict monthly streamflow in five highly seasonal rivers in the highlands of Ethiopia. We find that generalized additive models, random forests, and cubist models achieve better predictive accuracy than ANNs in many of the basins assessed and are also able to outperform physical models developed for the same region. We discuss some challenges that could hinder the use of such models for climate impact assessment, such as biases resulting from model formulation and prediction under extreme climate conditions, and suggest methods for preventing and addressing these challenges. Finally, we demonstrate how predictor variable influence can be assessed to provide insights into the physical functioning of data-sparse watersheds.
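
    A compact, hedged version of the model comparison described above: cross-validated R2 for a linear model versus a random forest on synthetic monthly streamflow. The predictors and data generation are stand-ins, not the Ethiopian basin records.

      # Cross-validated comparison of a linear baseline and a random forest.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(4)
      n = 240                                  # twenty years of monthly records
      rain = rng.gamma(2.0, 50.0, n)
      month = np.tile(np.arange(12), n // 12)
      flow = 0.4 * rain * (1 + np.sin(2 * np.pi * month / 12)) + rng.normal(0, 10, n)
      X = np.column_stack([rain, month])

      for name, model in [("linear", LinearRegression()),
                          ("random forest", RandomForestRegressor(random_state=0))]:
          r2 = cross_val_score(model, X, flow, cv=5, scoring="r2")
          print(f"{name}: mean CV R2 = {r2.mean():.2f}")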

  3. Machine learning modeling of plant phenology based on coupling satellite and gridded meteorological dataset

    NASA Astrophysics Data System (ADS)

    Czernecki, Bartosz; Nowosad, Jakub; Jabłońska, Katarzyna

    2018-04-01

    Changes in the timing of plant phenological phases are important proxies in contemporary climate research. However, most of the commonly used traditional phenological observations do not give any coherent spatial information. While consistent spatial data can be obtained from airborne sensors and preprocessed gridded meteorological data, not many studies robustly benefit from these data sources. Therefore, the main aim of this study is to create and evaluate different statistical models for reconstructing, predicting, and improving the quality of phenological phase monitoring using satellite and meteorological products. A quality-controlled dataset of 13 BBCH plant phenophases in Poland was collected for the period 2007-2014. For each phenophase, statistical models were built using the most commonly applied regression-based machine learning techniques, such as multiple linear regression, the lasso, principal component regression, generalized boosted models, and random forests. The quality of the models was estimated using k-fold cross-validation. The obtained results showed varying potential for coupling meteorologically derived indices with remote sensing products in terms of phenological modeling; however, the application of both data sources improves model accuracy by 0.6 to 4.6 days in terms of RMSE. A robust prediction of early phenological phases is mostly related to meteorological indices, whereas for autumn phenophases there is a stronger information signal provided by satellite-derived vegetation metrics. Choosing a specific set of predictors and applying robust preprocessing procedures is more important for the final results than the selection of a particular statistical model. The average RMSE for the best models across all phenophases is 6.3 days, while individual RMSEs vary seasonally from 3.5 to 10 days. The models give a reliable proxy for ground observations, with RMSE below 5 days for early spring and late spring phenophases; for other phenophases, RMSEs are higher, rising to 9-10 days in the case of the earliest spring phenophases.

  4. Conditional parametric models for storm sewer runoff

    NASA Astrophysics Data System (ADS)

    Jonsdottir, H.; Nielsen, H. Aa; Madsen, H.; Eliasson, J.; Palsson, O. P.; Nielsen, M. K.

    2007-05-01

    The method of conditional parametric modeling is introduced for flow prediction in a sewage system. It is a well-known fact that, in hydrological modeling, the response (runoff) to input (precipitation) varies depending on soil moisture and several other factors; consequently, nonlinear input-output models are needed. The model formulation described in this paper is similar to traditional linear models like the finite impulse response (FIR) and autoregressive exogenous (ARX) models, except that the parameters vary as a function of some external variables. The parameter variation is modeled by local lines, using kernels for local linear regression. As such, the method might be referred to as a nearest neighbor method. The results achieved in this study were compared to results from the conventional linear methods, FIR and ARX. The increase in the coefficient of determination is substantial. Furthermore, the new approach conserves the mass balance better. Hence this new approach looks promising for various hydrological models and analyses.
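
    The core ingredient of conditional parametric modeling, kernel-weighted local linear regression, can be sketched as follows; the example is entirely illustrative, fitting a local line around each query point with Gaussian kernel weights.

      # Locally weighted linear regression: coefficients vary with the input.
      import numpy as np

      def local_linear(x_query, x, y, bandwidth=0.2):
          """Fit a weighted line around x_query and return its prediction there."""
          w = np.exp(-0.5 * ((x - x_query) / bandwidth) ** 2)  # kernel weights
          A = np.column_stack([np.ones_like(x), x - x_query])
          W = np.diag(w)
          beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
          return beta[0]  # intercept = fitted value at x_query

      rng = np.random.default_rng(5)
      x = np.sort(rng.uniform(0, 1, 200))
      y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 200)   # nonlinear response
      print([round(local_linear(q, x, y), 2) for q in (0.1, 0.5, 0.9)])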

  5. Cognitive predictors of a common multitasking ability: Contributions from working memory, attention control, and fluid intelligence.

    PubMed

    Redick, Thomas S; Shipstead, Zach; Meier, Matthew E; Montroy, Janelle J; Hicks, Kenny L; Unsworth, Nash; Kane, Michael J; Hambrick, D Zachary; Engle, Randall W

    2016-11-01

    Previous research has identified several cognitive abilities that are important for multitasking, but few studies have attempted to measure a general multitasking ability using a diverse set of multitasks. In the final dataset, 534 young adult subjects completed measures of working memory (WM), attention control, fluid intelligence, and multitasking. Correlations, hierarchical regression analyses, confirmatory factor analyses, structural equation models, and relative weight analyses revealed several key findings. First, although the complex tasks used to assess multitasking differed greatly in their task characteristics and demands, a coherent construct specific to multitasking ability was identified. Second, the cognitive ability predictors accounted for substantial variance in the general multitasking construct, with WM and fluid intelligence accounting for the most multitasking variance compared to attention control. Third, the magnitude of the relationships among the cognitive abilities and multitasking varied as a function of the complexity and structure of the various multitasks assessed. Finally, structural equation models based on a multifaceted model of WM indicated that attention control and capacity fully mediated the WM and multitasking relationship. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  6. Birth weight and cognitive development in adolescence: causal relationship or social selection?

    PubMed

    Gorman, Bridget K

    2002-01-01

    Using data from the National Longitudinal Survey of Adolescent Health (Add Health), I investigate the relationship between birth weight and cognitive development among adolescents aged 12-17. Initial OLS regression models reveal a significant, positive relationship between low birth weight and verbal ability. Controlling for demographic, socioeconomic, and other adolescent characteristics modifies, but does not eliminate, this relationship. Additional models that stratify the sample by parental education illustrate the greater importance of other family and adolescent characteristics for cognitive development in adolescence, and a diminished role of birth weight. In the final section of the paper, fixed effects models of non-twin full siblings indicate no significant association between birth weight and verbal ability, suggesting that traditional cross-sectional models overstate the influence of birth weight for cognitive development in adolescence.

  7. Rapid Detection of Volatile Oil in Mentha haplocalyx by Near-Infrared Spectroscopy and Chemometrics.

    PubMed

    Yan, Hui; Guo, Cheng; Shao, Yang; Ouyang, Zhen

    2017-01-01

    Near-infrared spectroscopy combined with partial least squares regression (PLSR) and support vector machine (SVM) modeling was applied for the rapid determination of volatile oil content in Mentha haplocalyx. The effects of data pre-processing methods on the accuracy of the PLSR calibration models were investigated, and the performance of the final model was evaluated according to the correlation coefficient (R) and root mean square error of prediction (RMSEP). For the PLSR model, the best preprocessing combination was first-order derivative, standard normal variate transformation (SNV), and mean centering, which gave correlation coefficients of 0.8805 (calibration) and 0.8719 (prediction), with an RMSEC of 0.091 and an RMSEP of 0.097. Analysis of the loading weights and variable importance in projection (VIP) scores showed that the wavenumber variables linked to volatile oil lie between 5500 and 4000 cm-1. For the SVM model, six LVs (fewer than the seven LVs in the PLSR model) were adopted, and the result was better than that of the PLSR model: the correlation coefficients were 0.9232 (calibration) and 0.9202 (prediction), with an RMSEC of 0.084 and an RMSEP of 0.082, indicating that the predicted values were accurate and reliable. Because the quality of the medicine links directly to clinical efficacy, it is important to control the quality of M. haplocalyx; this work demonstrated that near-infrared reflectance spectroscopy with chemometrics can be used to rapidly determine the main volatile oil content in M. haplocalyx. Abbreviations used: 1st der: first-order derivative; 2nd der: second-order derivative; LOO: leave-one-out; LVs: latent variables; MC: mean centering; NIR: near-infrared; NIRS: near-infrared spectroscopy; PCR: principal component regression; PLSR: partial least squares regression; RBF: radial basis function; RMSECV: root mean square error of cross-validation; RMSEC: root mean square error of calibration; RMSEP: root mean square error of prediction; SNV: standard normal variate transformation; SVM: support vector machine; VIP: variable importance in projection.
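
    A hedged sketch of the preprocessing chain and PLSR fit named above (first-order derivative, SNV, then partial least squares); the spectra are simulated with a single analyte band, and seven latent variables are used only to mirror the abstract.

      # Simulated NIR spectra: sloped baseline + one analyte band near 5000 1/cm.
      import numpy as np
      from sklearn.cross_decomposition import PLSRegression
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import mean_squared_error

      rng = np.random.default_rng(6)
      n, p = 120, 300
      wn = np.linspace(4000, 10000, p)              # wavenumber grid (1/cm)
      conc = rng.uniform(0.1, 1.0, n)               # fake volatile oil content
      band = np.exp(-0.5 * ((wn - 5000) / 150) ** 2)
      baseline = rng.normal(1.0, 0.1, (n, 1)) * np.linspace(1.0, 2.0, p)
      spectra = baseline + conc[:, None] * band + rng.normal(0, 0.01, (n, p))

      def snv(X):
          # Standard normal variate: center and scale each spectrum (row).
          return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

      X = snv(np.gradient(spectra, axis=1))         # 1st derivative, then SNV
      X_tr, X_te, y_tr, y_te = train_test_split(X, conc, random_state=0)
      pls = PLSRegression(n_components=7).fit(X_tr, y_tr)  # 7 LVs as in the text
      rmsep = mean_squared_error(y_te, pls.predict(X_te)) ** 0.5
      print(f"RMSEP = {rmsep:.3f}")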

  8. Prediction models of health-related quality of life in different neck pain conditions: a cross-sectional study.

    PubMed

    Beltran-Alacreu, Hector; López-de-Uralde-Villanueva, Ibai; Calvo-Lobo, César; La Touche, Roy; Cano-de-la-Cuerda, Roberto; Gil-Martínez, Alfonso; Fernández-Ayuso, David; Fernández-Carnero, Josué

    2018-01-01

    The main aim of the study was to predict health-related quality of life (HRQoL) based on physical, functional, and psychological measures in patients with different types of neck pain (NP). This cross-sectional study included 202 patients from a primary health center and the physiotherapy outpatient department of a hospital. Patients were divided into four groups according to their NP characteristics: chronic (CNP), acute whiplash (WHIP), chronic NP associated with temporomandibular dysfunction (NP-TMD), or chronic NP associated with chronic primary headache (NP-PH). The following measures were taken: Short Form-12 Health Survey (SF-12), Neck Disability Index (NDI), visual analog scale (VAS), State-Trait Anxiety Inventory (STAI), Beck Depression Inventory (BECK), and cervical range of movement (CROM). The regression models based on the SF-12 total HRQoL for the CNP and NP-TMD groups showed that only the NDI was a significant predictor of the worst HRQoL (48.9% and 48.4% of the variance, respectively). In the WHIP group, the regression model showed that BECK was the only significant predictor variable for the worst HRQoL (31.7% of the variance). Finally, in the NP-PH group, the regression showed that the BECK, STAI, and VAS model predicted the worst HRQoL (75.1% of the variance). In chronic nonspecific NP and chronic NP associated with temporomandibular dysfunction, neck disability was the main predictor of HRQoL, while depression, anxiety, and pain were the main predictors for whiplash and for primary headache associated with CNP.

  9. Color vision impairment in multiple sclerosis points to retinal ganglion cell damage.

    PubMed

    Lampert, E J; Andorra, M; Torres-Torres, R; Ortiz-Pérez, S; Llufriu, S; Sepúlveda, M; Sola, N; Saiz, A; Sánchez-Dalmau, B; Villoslada, P; Martínez-Lapiscina, Elena H

    2015-11-01

    Multiple Sclerosis (MS) results in color vision impairment regardless of optic neuritis (ON). The exact location of the injury remains undefined. The objective of this study is to identify the region leading to dyschromatopsia in the non-ON (NON) eyes of MS patients. We evaluated Spearman correlations between color vision and measures of different regions in the afferent visual pathway in 106 MS patients. Regions with significant correlations were included in logistic regression models to assess their independent role in dyschromatopsia. We evaluated color vision with Hardy-Rand-Rittler plates and retinal damage using Optical Coherence Tomography. We ran SIENAX to measure Normalized Brain Parenchymal Volume (NBPV), FIRST for thalamus volume, and Freesurfer for visual cortex areas. We found moderate, significant correlations between color vision and macular retinal nerve fiber layer (rho = 0.289, p = 0.003), ganglion cell complex (GCC = GCIP) (rho = 0.353, p < 0.001), thalamus volume (rho = 0.361, p < 0.001), and lesion volume within the optic radiations (rho = -0.230, p = 0.030). Only GCC thickness remained significant (p = 0.023) in the logistic regression model. In the final model, which included lesion load and NBPV as markers of diffuse neuroaxonal damage, GCC remained associated with dyschromatopsia [OR = 0.88, 95% CI 0.80-0.97, p = 0.016]. This association remained significant when sex, age, and disease duration were also added as covariates in the regression model. Dyschromatopsia in NON eyes is thus due to damage of retinal ganglion cells (RGC) in MS, and color vision can serve as a marker of RGC damage in MS.

  10. Structural classification of marshes with Polarimetric SAR highlighting the temporal mapping of marshes exposed to oil

    USGS Publications Warehouse

    Ramsey, Elijah W.; Rangoonwala, Amina; Jones, Cathleen E.

    2015-01-01

    Empirical relationships between field-derived Leaf Area Index (LAI) and Leaf Angle Distribution (LAD) and polarimetric synthetic aperture radar (PolSAR) based biophysical indicators were created and applied to map S. alterniflora marsh canopy structure. PolSAR and field data were collected near concurrently in the summers of 2010, 2011, and 2012 in coastal marshes, and PolSAR data alone were acquired in 2009. Regression analyses showed that LAI correspondence with the PolSAR biophysical indicator variables equaled or exceeded that of vegetation water content (VWC). In the final six-regressor model, the ratio HV/VV explained 49% of the total 77% explained LAI variance, with the HH-VV coherence and phase information accounting for the remainder. HV/HH dominated the two-regressor LAD relationship, while spatial heterogeneity and backscatter mechanism, followed by coherence information, dominated the final three-regressor model that explained 74% of the LAD variance. Regression results applied to the 2009 through 2012 PolSAR images showed substantial changes in marsh LAI and LAD. Although the direct cause was not substantiated, following a release of freshwater in response to the 2010 Deepwater Horizon oil spill, the fairly uniform interior marsh structure of 2009 became more vertical and dense shortly after the spill's cessation. After 2010, marsh structure generally progressed back toward the 2009 uniformity; however, the trend was more disjointed in oil-impacted marshes.

  11. Factors Impacting Growth in Infants with Single Ventricle Physiology: A Report from Pediatric Heart Network Infant Single Ventricle Trial

    PubMed Central

    Williams, Richard V.; Zak, Victor; Ravishankar, Chitra; Altmann, Karen; Anderson, Jeffrey; Atz, Andrew M.; Dunbar-Masterson, Carolyn; Ghanayem, Nancy; Lambert, Linda; Lurito, Karen; Medoff-Cooper, Barbara; Margossian, Renee; Pemberton, Victoria L.; Russell, Jennifer; Stylianou, Mario; Hsu, Daphne

    2011-01-01

    Objectives To describe growth patterns in infants with single ventricle physiology and determine factors influencing growth. Study design Data from 230 subjects enrolled in the Pediatric Heart Network Infant Single Ventricle Enalapril Trial were used to assess factors influencing change in weight-for-age z-score (Δz) from study enrollment (0.7 ± 0.4 months) to pre-superior cavopulmonary connection (SCPC) (5.1 ± 1.8 months, period 1), and pre-SCPC to final study visit (14.1 ± 0.9 months, period 2). Predictor variables included patient characteristics, feeding regimen, clinical center, and medical factors during neonatal (period 1) and SCPC hospitalizations (period 2). Univariate regression analysis was performed, followed by backward stepwise regression and bootstrapping reliability to inform a final multivariable model. Results Weights were available for 197/230 subjects for period 1 and 173/197 for period 2. For period 1, greater gestational age, younger age at study enrollment, tube feeding at neonatal discharge, and clinical center were associated with a greater negative Δz (poorer growth) in multivariable modeling (adjusted R2 = 0.39, p < 0.001). For period 2, younger age at SCPC and greater daily caloric intake were associated with greater positive Δz (better growth) (R2 = 0.10, p = 0.002). Conclusions Aggressive nutritional support and earlier SCPC are modifiable factors associated with a favorable change in weight-for-age z-score. PMID:21784436

  12. Spatio-temporal modeling of chronic PM 10 exposure for the Nurses' Health Study

    NASA Astrophysics Data System (ADS)

    Yanosky, Jeff D.; Paciorek, Christopher J.; Schwartz, Joel; Laden, Francine; Puett, Robin; Suh, Helen H.

    2008-06-01

    Chronic epidemiological studies of airborne particulate matter (PM) have typically characterized the chronic PM exposures of their study populations using city- or county-wide ambient concentrations, which limit the studies to areas where nearby monitoring data are available and which ignore within-city spatial gradients in ambient PM concentrations. To provide more spatially refined and precise chronic exposure measures, we used a Geographic Information System (GIS)-based spatial smoothing model to predict monthly outdoor PM10 concentrations in the northeastern and midwestern United States. This model included monthly smooth spatial terms and smooth regression terms of GIS-derived and meteorological predictors. Using cross-validation and other pre-specified selection criteria, terms for distance to road by road class, urban land use, block group and county population density, point- and area-source PM10 emissions, elevation, wind speed, and precipitation were found to be important determinants of PM10 concentrations and were included in the final model. Final model performance was strong (cross-validation R2=0.62), with little bias (-0.4 μg m-3) and high precision (6.4 μg m-3). The final model (with monthly spatial terms) performed better than a model with seasonal spatial terms (cross-validation R2=0.54). The addition of GIS-derived and meteorological predictors improved predictive performance over spatial smoothing (cross-validation R2=0.51) or inverse distance weighted interpolation (cross-validation R2=0.29) methods alone and increased the spatial resolution of predictions. The model performed well in both rural and urban areas, across seasons, and across the entire time period. The strong model performance demonstrates its suitability as a means to estimate individual-specific chronic PM10 exposures for large populations.

  13. Using a binary logistic regression method and GIS for evaluating and mapping the groundwater spring potential in the Sultan Mountains (Aksehir, Turkey)

    NASA Astrophysics Data System (ADS)

    Ozdemir, Adnan

    2011-07-01

    The purpose of this study is to produce a groundwater spring potential map of the Sultan Mountains in central Turkey, based on a logistic regression method within a Geographic Information System (GIS) environment. Using field surveys, the locations of 440 springs were determined in the study area. Seventeen spring-related factors were used in the analysis: geology, relative permeability, land use/land cover, precipitation, elevation, slope, aspect, total curvature, plan curvature, profile curvature, wetness index, stream power index, sediment transport capacity index, distance to drainage, distance to fault, drainage density, and fault density. The coefficients of the predictor variables were estimated using binary logistic regression analysis and were used to calculate the groundwater spring potential for the entire study area. The accuracy of the final spring potential map was evaluated against the observed springs by calculating the relative operating characteristic; the area under the curve was found to be 0.82, indicating that the model is a good estimator of spring potential in the study area. The spring potential map shows that the very low, low, moderate, and high groundwater spring potential classes cover 105.586 km2 (28.99%), 74.271 km2 (19.906%), 101.203 km2 (27.14%), and 90.05 km2 (24.671%), respectively. Interpretation of the potential map showed that stream power index, relative permeability of lithologies, geology, elevation, aspect, wetness index, plan curvature, and drainage density play major roles in spring occurrence and distribution in the Sultan Mountains. The logistic regression approach had not previously been used to delineate groundwater potential zones; in this study, it was used to locate potential zones for groundwater springs in the Sultan Mountains, and the resulting model was found to be in strong agreement with the available groundwater spring test data. Hence, this method can be used routinely in groundwater exploration under favourable conditions.
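
    A schematic version of this workflow: fit a binary logistic model on a few invented terrain predictors and score it with the area under the ROC curve (the abstract reports 0.82). Data and coefficients are synthetic.

      # Logistic spring-potential model scored by ROC AUC (illustrative only).
      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score

      rng = np.random.default_rng(7)
      n = 440
      elevation = rng.uniform(1000, 2500, n)
      slope = rng.uniform(0, 45, n)
      drainage_density = rng.uniform(0, 5, n)
      logit = -3 + 0.002 * elevation - 0.05 * slope + 0.6 * drainage_density
      spring = rng.random(n) < 1 / (1 + np.exp(-logit))   # simulated presence

      X = np.column_stack([elevation, slope, drainage_density])
      model = LogisticRegression(max_iter=1000).fit(X, spring)
      auc = roc_auc_score(spring, model.predict_proba(X)[:, 1])
      print(f"area under ROC curve = {auc:.2f}")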

  14. Predicting Reactive Intermediate Quantum Yields from Dissolved Organic Matter Photolysis Using Optical Properties and Antioxidant Capacity.

    PubMed

    Mckay, Garrett; Huang, Wenxi; Romera-Castillo, Cristina; Crouch, Jenna E; Rosario-Ortiz, Fernando L; Jaffé, Rudolf

    2017-05-16

    The antioxidant capacity and formation of photochemically produced reactive intermediates (RI) were studied for water samples collected from the Florida Everglades with different spatial (marsh versus estuarine) and temporal (wet versus dry season) characteristics. Measured RI included triplet excited states of dissolved organic matter (3DOM*), singlet oxygen (1O2), and the hydroxyl radical (•OH). Single and multiple linear regression modeling were performed using a broad range of extrinsic (to predict RI formation rates, R_RI) and intrinsic (to predict RI quantum yields, Φ_RI) parameters. Multiple linear regression models consistently led to better predictions of R_RI and Φ_RI for our data set but poor prediction of Φ_RI for a previously published data set [1], probably because the predictors are intercorrelated (Pearson's r > 0.5). Single linear regression models were built with data compiled from previously published studies (n ≈ 120) in which E2:E3, S, and Φ_RI values were measured, which revealed a high degree of similarity between RI-optical property relationships across DOM samples of diverse sources. This study reveals that •OH formation is, in general, decoupled from 3DOM* and 1O2 formation, providing supporting evidence that 3DOM* is not a •OH precursor. Finally, Φ_RI for 1O2 and 3DOM* correlated negatively with antioxidant activity (a surrogate for electron donating capacity) for the collected samples, which is consistent with intramolecular oxidation of DOM moieties by 3DOM*.

  15. Novel solutions for an old disease: diagnosis of acute appendicitis with random forest, support vector machines, and artificial neural networks.

    PubMed

    Hsieh, Chung-Ho; Lu, Ruey-Hwa; Lee, Nai-Hsin; Chiu, Wen-Ta; Hsu, Min-Huei; Li, Yu-Chuan Jack

    2011-01-01

    Diagnosing acute appendicitis clinically is still difficult. We developed random forests, support vector machines, and artificial neural network models to diagnose acute appendicitis. Between January 2006 and December 2008, patients who had a consultation session with surgeons for suspected acute appendicitis were enrolled. Seventy-five percent of the data set was used to construct models including random forest, support vector machines, artificial neural networks, and logistic regression. Twenty-five percent of the data set was withheld to evaluate model performance. The area under the receiver operating characteristic curve (AUC) was used to evaluate performance, which was compared with that of the Alvarado score. Data from a total of 180 patients were collected, 135 used for training and 45 for testing. The mean age of patients was 39.4 years (range, 16-85). Final diagnosis revealed 115 patients with and 65 without appendicitis. The AUC of random forest, support vector machines, artificial neural networks, logistic regression, and Alvarado was 0.98, 0.96, 0.91, 0.87, and 0.77, respectively. The sensitivity, specificity, positive, and negative predictive values of random forest were 94%, 100%, 100%, and 87%, respectively. Random forest performed better than artificial neural networks, logistic regression, and Alvarado. We demonstrated that random forest can predict acute appendicitis with good accuracy and, deployed appropriately, can be an effective tool in clinical decision making. Copyright © 2011 Mosby, Inc. All rights reserved.

  16. An application of model-fitting procedures for marginal structural models.

    PubMed

    Mortimer, Kathleen M; Neugebauer, Romain; van der Laan, Mark; Tager, Ira B

    2005-08-15

    Marginal structural models (MSMs) are being used more frequently to obtain causal effect estimates in observational studies. Although the principal estimator of MSM coefficients has been the inverse probability of treatment weight (IPTW) estimator, there are few published examples that illustrate how to apply IPTW or discuss the impact of model selection on effect estimates. The authors applied IPTW estimation of an MSM to observational data from the Fresno Asthmatic Children's Environment Study (2000-2002) to evaluate the effect of asthma rescue medication use on pulmonary function and compared their results with those obtained through traditional regression methods. Akaike's Information Criterion and cross-validation methods were used to fit the MSM. In this paper, the influence of model selection and the evaluation of key assumptions, such as the experimental treatment assignment assumption, are discussed in detail. Traditional analyses suggested that medication use was not associated with an improvement in pulmonary function, a finding that is counterintuitive and probably due to confounding by symptoms and asthma severity. The final MSM estimated that medication use was causally related to a 7% improvement in pulmonary function. The authors present examples that should encourage investigators who use IPTW estimation to undertake and discuss the impact of model-fitting procedures in order to justify the choice of the final weights.
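
    A bare-bones IPTW sketch under simulated confounding, purely to make the estimation steps concrete: a symptom-severity variable drives both medication use and lung function, so the naive regression is biased while inverse-probability weighting approximately recovers the true effect. All variable names are hypothetical.

      # Step 1: propensity model; step 2: stabilized inverse-probability weights;
      # step 3: weighted outcome regression (the MSM).
      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(8)
      n = 5000
      symptoms = rng.normal(size=n)                       # confounder
      p_treat = 1 / (1 + np.exp(-1.5 * symptoms))         # sicker kids medicate more
      treated = rng.random(n) < p_treat
      lung = 1.0 * treated - 2.0 * symptoms + rng.normal(size=n)  # true effect = +1

      ps = sm.Logit(treated.astype(float),
                    sm.add_constant(symptoms)).fit(disp=0).predict()
      w = np.where(treated, treated.mean() / ps, (1 - treated.mean()) / (1 - ps))

      naive = sm.OLS(lung, sm.add_constant(treated.astype(float))).fit()
      msm = sm.WLS(lung, sm.add_constant(treated.astype(float)), weights=w).fit()
      print("naive effect:", round(naive.params[1], 2))   # biased downward
      print("IPTW effect:", round(msm.params[1], 2))      # near the true +1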

  17. The MET Inhibitor AZD6094 (Savolitinib, HMPL-504) Induces Regression in Papillary Renal Cell Carcinoma Patient-Derived Xenograft Models.

    PubMed

    Schuller, Alwin G; Barry, Evan R; Jones, Rhys D O; Henry, Ryan E; Frigault, Melanie M; Beran, Garry; Linsenmayer, David; Hattersley, Maureen; Smith, Aaron; Wilson, Joanne; Cairo, Stefano; Déas, Olivier; Nicolle, Delphine; Adam, Ammar; Zinda, Michael; Reimer, Corinne; Fawell, Stephen E; Clark, Edwin A; D'Cruz, Celina M

    2015-06-15

    Papillary renal cell carcinoma (PRCC) is the second most common cancer of the kidney and carries a poor prognosis for patients with nonlocalized disease. The HGF receptor MET plays a central role in PRCC, with aberrations, whether through mutation, copy number gain, or trisomy of chromosome 7, occurring in the majority of cases. The development of effective therapies in PRCC has been hampered in part by a lack of available preclinical models. We determined the pharmacodynamic and antitumor response of the selective MET inhibitor AZD6094 in two PRCC patient-derived xenograft (PDX) models. Two PRCC PDX models were identified and their MET mutation status and copy number determined. Pharmacodynamic and antitumor activity of AZD6094 was tested using doses up to 25 mg/kg daily, representing clinically achievable exposures, and compared with the activity of the RCC standard-of-care sunitinib (in RCC43b) or the multikinase inhibitor crizotinib (in RCC47). AZD6094 treatment resulted in tumor regressions, whereas sunitinib or crizotinib resulted in unsustained growth inhibition. Pharmacodynamic analysis of tumors revealed that AZD6094 could robustly suppress pMET, and the duration of target inhibition was dose related. AZD6094 inhibited multiple signaling nodes, including MAPK, PI3K, and EGFR. Finally, at doses that induced tumor regression, AZD6094 resulted in a dose- and time-dependent induction of cleaved PARP, a marker of cell death. The data presented provide the first report of therapeutics tested in preclinical in vivo models of PRCC and support the clinical development of AZD6094 in this indication. ©2015 American Association for Cancer Research.

  18. Aspects of porosity prediction using multivariate linear regression

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Byrnes, A.P.; Wilson, M.D.

    1991-03-01

    Highly accurate multiple linear regression models have been developed for sandstones of diverse compositions. Porosity reduction or enhancement processes are controlled by the fundamental variables Pressure (P), Temperature (T), Time (t), and Composition (X), where composition includes mineralogy, size, sorting, fluid composition, etc. The multiple linear regression equation, of which all linear porosity prediction models are subsets, takes the generalized form: Porosity = C0 + C1(P) + C2(T) + C3(X) + C4(t) + C5(PT) + C6(PX) + C7(Pt) + C8(TX) + C9(Tt) + C10(Xt) + C11(PTX) + C12(PXt) + C13(PTt) + C14(TXt) + C15(PTXt). The first four primary variables are often interactive, thus requiring terms involving two or more primary variables (the form shown implies interaction and not necessarily multiplication). The final terms used may also involve simple mathematical transforms such as log X, e^T, or X^2, or more complex transformations such as the Time-Temperature Index (TTI). The X term in the equation above represents a suite of compositional variables and, therefore, a fully expanded equation may include a series of terms incorporating these variables. Numerous published bivariate porosity prediction models involving P (or depth) or Tt (TTI) are effective to a degree, largely because of the high degree of colinearity between P and TTI. However, all such bivariate models ignore the unique contributions of P and Tt, as well as various X terms. These simpler models become poor predictors in regions where colinear relations change, where important variables have been ignored, or where the database does not include a sufficient range or weight distribution for the critical variables.
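
    A worked miniature of the generalized equation above, assuming multiplicative interactions for concreteness (the text notes that interaction need not mean multiplication); P, T, X, t, and the porosity values are synthetic.

      # OLS with interaction terms via a regression formula.
      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(9)
      n = 300
      df = pd.DataFrame({
          "P": rng.uniform(10, 60, n),      # pressure (MPa)
          "T": rng.uniform(40, 160, n),     # temperature (degC)
          "X": rng.uniform(0, 1, n),        # a single composition score
          "t": rng.uniform(1, 100, n),      # time (Ma)
      })
      df["porosity"] = (35 - 0.2 * df.P - 0.05 * df.T - 0.001 * df.T * df.t
                        + rng.normal(0, 1, n))

      # "T:t" and "P:T" add interaction terms, mirroring C9(Tt) and C5(PT).
      fit = smf.ols("porosity ~ P + T + X + t + T:t + P:T", data=df).fit()
      print(fit.params.round(4))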

  19. Poisson Mixture Regression Models for Heart Disease Prediction.

    PubMed

    Mufudza, Chipo; Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model-based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. The analysis and application of Poisson mixture regression models are addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary generalized linear Poisson regression model, owing to its low Bayesian Information Criterion value. Furthermore, a zero-inflated Poisson mixture regression model turned out to be the best model overall for heart disease prediction, as it both clusters individuals into high- or low-risk categories and predicts the rate of heart disease componentwise within the available clusters. It is deduced that heart disease prediction can be done effectively by identifying the major risks componentwise using Poisson mixture regression models.
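
    To make the mixture idea concrete, the sketch below runs EM for a two-component Poisson mixture with intercepts only; a full mixture regression would let each component's rate depend on covariates, and a concomitant-variable model would additionally model the mixing weights. Data and rates are simulated.

      # EM for a two-component Poisson mixture (low-risk vs. high-risk counts).
      import numpy as np
      from scipy.stats import poisson

      rng = np.random.default_rng(10)
      y = np.concatenate([rng.poisson(2.0, 300), rng.poisson(9.0, 200)])

      pi, lam = 0.5, np.array([1.0, 5.0])   # initial mixing weight and rates
      for _ in range(200):
          # E-step: posterior probability each count came from component 1.
          p0 = (1 - pi) * poisson.pmf(y, lam[0])
          p1 = pi * poisson.pmf(y, lam[1])
          r = p1 / (p0 + p1)
          # M-step: update the mixing weight and component rates.
          pi = r.mean()
          lam = np.array([np.sum((1 - r) * y) / np.sum(1 - r),
                          np.sum(r * y) / np.sum(r)])
      print(f"weights = ({1 - pi:.2f}, {pi:.2f}), rates = {np.round(lam, 2)}")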

  20. Poisson Mixture Regression Models for Heart Disease Prediction

    PubMed Central

    Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model-based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. The analysis and application of Poisson mixture regression models are addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary generalized linear Poisson regression model, owing to its low Bayesian Information Criterion value. Furthermore, a zero-inflated Poisson mixture regression model turned out to be the best model overall for heart disease prediction, as it both clusters individuals into high- or low-risk categories and predicts the rate of heart disease componentwise within the available clusters. It is deduced that heart disease prediction can be done effectively by identifying the major risks componentwise using Poisson mixture regression models. PMID:27999611

  1. Modelling the standing timber volume of Baden-Württemberg-A large-scale approach using a fusion of Landsat, airborne LiDAR and National Forest Inventory data

    NASA Astrophysics Data System (ADS)

    Maack, Joachim; Lingenfelder, Marcus; Weinacker, Holger; Koch, Barbara

    2016-07-01

    Remote sensing-based timber volume estimation is key for modelling the regional potential, accessibility and price of lignocellulosic raw material for an emerging bioeconomy. We used a unique wall-to-wall airborne LiDAR dataset and Landsat 7 satellite images in combination with terrestrial inventory data derived from the National Forest Inventory (NFI), and applied generalized additive models (GAM) to estimate spatially explicit timber distribution and volume in forested areas. Since the NFI data showed an underlying structure regarding size and ownership, we additionally constructed a socio-economic predictor to enhance the accuracy of the analysis. Furthermore, we balanced the training dataset with a bootstrap method to achieve unbiased regression weights for interpolating timber volume. Finally, we compared and discussed the model performance of the original approach (r2 = 0.56, NRMSE = 9.65%), the approach with balanced training data (r2 = 0.69, NRMSE = 12.43%) and the final approach with balanced training data and the additional socio-economic predictor (r2 = 0.72, NRMSE = 12.17%). The results demonstrate the usefulness of remote sensing techniques for mapping timber volume for a future lignocellulose-based bioeconomy.

  2. Prevalence and Characteristics of Bed-Sharing Among Black and White Infants in Georgia.

    PubMed

    Salm Ward, Trina C; Robb, Sara Wagner; Kanu, Florence A

    2016-02-01

    To examine: (1) the prevalence and characteristics of bed-sharing among non-Hispanic Black and White infants in Georgia, and (2) differences in bed-sharing and sleep position behaviors prior to and after the American Academy of Pediatrics' 2005 recommendations against bed-sharing. Georgia Pregnancy Risk Assessment Monitoring System (PRAMS) data were obtained from the Georgia Department of Public Health. Analysis was guided by the socioecological model levels of Infant, Maternal, Family, and Community/Society within the context of race. Data from 2004 to 2011 were analyzed to address the first objective, and data from 2000 to 2004 and 2006 to 2011 to address the second. Rao-Scott chi-square tests and backward-selection unconditional logistic regression models for weighted data were built separately by race; odds ratios (ORs) and 95% confidence intervals (CIs) were calculated. A total of 6595 (3528 Black and 3067 White) cases were analyzed between 2004 and 2011. Significantly more Black mothers (81.9%) reported "ever" bed-sharing compared to White mothers (56%), p < 0.001. Logistic regression results indicated that the most parsimonious model included variables from all socioecological levels. For Blacks, the final model included infant age, pregnancy intention, number of dependents, and use of Women, Infants and Children (WIC) services. For Whites, the final model included infant age, maternal age, financial stress, partner-related stress, and WIC. When comparing the period 2000-2004 to 2006-2011, a total of 10,015 (5373 Black and 4642 White) cases were analyzed. A significant decrease in bed-sharing was found for both Blacks and Whites; rates of non-supine sleep position decreased significantly for Blacks but not Whites. Continued high rates of bed-sharing and non-supine sleep position for both Blacks and Whites demonstrate an ongoing need for safe infant sleep messaging. Risk profiles for Black and White mothers differed, suggesting the importance of tailored messaging. Specific research and practice implications are identified and described.

  3. Solving large mixed linear models using preconditioned conjugate gradient iteration.

    PubMed

    Strandén, I; Lidauer, M

    1999-12-01

    Continuous evaluation of dairy cattle with a random regression test-day model requires a fast solving method and algorithm. A new computing technique feasible in Jacobi and conjugate gradient based iterative methods using iteration on data is presented. In the new computing technique, the calculations in the multiplication of a vector by a matrix were reordered into three steps instead of the commonly used two steps. The three-step method was implemented in a general mixed linear model program that used preconditioned conjugate gradient iteration. The performance of this program was assessed against other general solving programs via estimation of breeding values using univariate, multivariate, and random regression test-day models. Central processing unit time per iteration with the new three-step technique was, at best, one-third that needed with the old technique. Performance was best with the test-day model, which was the largest and most complex model used. The new program did well in comparison to other general software: programs keeping the mixed model equations in random access memory required at least 20% and 435% more time to solve the univariate and multivariate animal models, respectively, and the second-best iteration-on-data program took approximately three and five times longer than the new program for the animal and test-day models, respectively. The good performance was due to fast computing time per iteration and quick convergence to the final solutions. The use of preconditioned conjugate gradient based methods in solving large breeding value problems is supported by our findings.
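
    For reference, a compact Jacobi-preconditioned conjugate gradient solver is sketched below; the paper's actual contribution, the three-step iteration-on-data technique for forming matrix-vector products without storing the equations, is a data-access strategy not shown here.

      # Conjugate gradient with a Jacobi (diagonal) preconditioner for SPD systems.
      import numpy as np

      def pcg(A, b, tol=1e-10, max_iter=1000):
          """Solve A x = b for symmetric positive definite A."""
          M_inv = 1.0 / np.diag(A)            # preconditioner: inverse diagonal
          x = np.zeros_like(b)
          r = b - A @ x
          z = M_inv * r
          p = z.copy()
          for i in range(max_iter):
              Ap = A @ p
              alpha = (r @ z) / (p @ Ap)
              x += alpha * p
              r_new = r - alpha * Ap
              if np.linalg.norm(r_new) < tol:
                  return x, i + 1
              z_new = M_inv * r_new
              beta = (r_new @ z_new) / (r @ z)
              p = z_new + beta * p
              r, z = r_new, z_new
          return x, max_iter

      rng = np.random.default_rng(11)
      Q = rng.normal(size=(200, 200))
      A = Q @ Q.T + 200 * np.eye(200)          # SPD test matrix
      b = rng.normal(size=200)
      x, iters = pcg(A, b)
      print(f"converged in {iters} iterations, "
            f"residual = {np.linalg.norm(b - A @ x):.1e}")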

  4. Using Patient Demographics and Statistical Modeling to Predict Knee Tibia Component Sizing in Total Knee Arthroplasty.

    PubMed

    Ren, Anna N; Neher, Robert E; Bell, Tyler; Grimm, James

    2018-06-01

    Preoperative planning is important to achieve successful implantation in primary total knee arthroplasty (TKA). However, traditional TKA templating techniques are not accurate enough to predict the component size to within a very close range. With the goal of developing a general predictive statistical model using patient demographic information, ordinal logistic regression was applied to build a proportional odds model to predict the tibia component size. The study retrospectively collected data from 1992 primary Persona Knee System TKA procedures. Of them, 199 procedures were randomly selected as testing data, and the rest of the data were randomly partitioned between model training data and model evaluation data with a ratio of 7:3. Different models were trained and evaluated on the training and validation data sets after data exploration. The final model had patient gender, age, weight, and height as independent variables and predicted the tibia size to within one size 96% of the time on the validation data, 94% of the time on the testing data, and 92% of the time on a prospective cadaver data set. The study results indicated that the statistical model built by ordinal logistic regression can increase the accuracy of tibia sizing information for Persona Knee preoperative templating. This research shows that statistical modeling may be used with radiographs to dramatically enhance templating accuracy, efficiency, and quality. In general, this methodology can be applied to other TKA products when the data are applicable. Copyright © 2018 Elsevier Inc. All rights reserved.
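
    A hedged sketch of a proportional-odds model for an ordinal implant size using statsmodels' OrderedModel; the patient data are simulated, and the size categories, coefficients, and cut-points are placeholders, not the Persona sizing.

      # Proportional-odds (ordinal logistic) sketch with simulated patients.
      import numpy as np
      import pandas as pd
      from statsmodels.miscmodels.ordinal_model import OrderedModel

      rng = np.random.default_rng(12)
      n = 1992
      height = rng.normal(170, 10, n)
      weight = rng.normal(80, 15, n)
      latent = 0.08 * height + 0.03 * weight + rng.logistic(size=n)
      size = pd.cut(latent, bins=[-np.inf, 14.5, 16.0, 17.5, np.inf],
                    labels=[1, 2, 3, 4]).astype(int)   # fake ordinal tibia size

      X = pd.DataFrame({"height": height, "weight": weight})
      fit = OrderedModel(size, X, distr="logit").fit(method="bfgs", disp=0)
      probs = np.asarray(fit.predict(X))               # category probabilities
      pred = probs.argmax(axis=1) + 1                  # most likely size
      print(f"within one size: {np.mean(np.abs(pred - size) <= 1):.0%}")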

  5. [A novel approach to NIR spectral quantitative analysis: semi-supervised least-squares support vector regression machine].

    PubMed

    Li, Lin; Xu, Shuo; An, Xin; Zhang, Lu-Da

    2011-10-01

    In quantitative near-infrared (NIR) spectral analysis, the precision of the samples' measured chemical values sets the theoretical limit on the precision attainable by quantitative models. However, the number of samples whose chemical values can be measured accurately is small. Many models exclude the samples without chemical values and consider only those with chemical values when modeling the contents of sample compositions. To address this problem, a semi-supervised LS-SVR (S2 LS-SVR) model is proposed on the basis of LS-SVR, which can utilize samples without chemical values as well as those with chemical values. As with LS-SVR, training this model is equivalent to solving a linear system. Finally, samples of flue-cured tobacco were taken as the experimental material, and corresponding quantitative analysis models were constructed for four composition contents (total sugar, reducing sugar, total nitrogen, and nicotine) with PLS regression, LS-SVR, and S2 LS-SVR. For the S2 LS-SVR model, the average relative errors between actual and predicted values for the four composition contents are 6.62%, 7.56%, 6.11%, and 8.20%, respectively, and the correlation coefficients are 0.9741, 0.9733, 0.9230, and 0.9486, respectively. Experimental results show the S2 LS-SVR model outperforms the other two, verifying its feasibility and efficiency.
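
    The supervised core of the method, plain LS-SVR, indeed reduces to one linear solve, as sketched below; the semi-supervised extension (incorporating unlabeled samples) is omitted, so this illustrates only the training mechanics, on synthetic data.

      # LS-SVR: solve the KKT system [[0, 1^T], [1, K + I/C]] [b; alpha] = [0; y].
      import numpy as np

      def rbf_kernel(A, B, gamma=1.0):
          d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
          return np.exp(-gamma * d2)

      def lssvr_fit(X, y, gamma=1.0, C=10.0):
          """Return the bias b and dual weights alpha from one linear solve."""
          n = len(y)
          K = rbf_kernel(X, X, gamma) + np.eye(n) / C   # ridge-regularized kernel
          top = np.concatenate([[0.0], np.ones(n)])
          body = np.column_stack([np.ones(n), K])
          A = np.vstack([top, body])
          sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
          return sol[0], sol[1:]

      rng = np.random.default_rng(13)
      X = rng.uniform(-3, 3, size=(100, 1))
      y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 100)
      b, alpha = lssvr_fit(X, y)
      y_hat = rbf_kernel(X, X) @ alpha + b              # in-sample prediction
      print("train RMSE:", round(float(np.sqrt(np.mean((y - y_hat) ** 2))), 3))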

  6. Using Explanatory Item Response Models to Evaluate Complex Scientific Tasks Designed for the Next Generation Science Standards

    NASA Astrophysics Data System (ADS)

    Chiu, Tina

    This dissertation includes three studies that analyze a new set of assessment tasks developed by the Learning Progressions in Middle School Science (LPS) Project. These assessment tasks were designed to measure science content knowledge on the structure of matter domain and scientific argumentation, while following the goals from the Next Generation Science Standards (NGSS). The three studies focus on the evidence available for the success of this design and its implementation, generally labelled as "validity" evidence. I use explanatory item response models (EIRMs) as the overarching framework to investigate these assessment tasks. These models can be useful when gathering validity evidence for assessments as they can help explain student learning and group differences. In the first study, I explore the dimensionality of the LPS assessment by comparing the fit of unidimensional, between-item multidimensional, and Rasch testlet models to see which is most appropriate for this data. By applying multidimensional item response models, multiple relationships can be investigated, and in turn, allow for a more substantive look into the assessment tasks. The second study focuses on person predictors through latent regression and differential item functioning (DIF) models. Latent regression models show the influence of certain person characteristics on item responses, while DIF models test whether one group is differentially affected by specific assessment items, after conditioning on latent ability. Finally, the last study applies the linear logistic test model (LLTM) to investigate whether item features can help explain differences in item difficulties.

  7. Boosted structured additive regression for Escherichia coli fed-batch fermentation modeling.

    PubMed

    Melcher, Michael; Scharl, Theresa; Luchner, Markus; Striedner, Gerald; Leisch, Friedrich

    2017-02-01

    The quality of biopharmaceuticals and patients' safety are of the highest priority, and there are tremendous efforts to replace empirical production process designs by knowledge-based approaches. The main challenge in this context is that real-time access to process variables related to product quality and quantity is severely limited. To date, comprehensive on- and offline monitoring platforms are used to generate process data sets that allow for the development of mechanistic and/or data-driven models for real-time prediction of these important quantities; the ultimate goal is to implement model-based feedback control loops that facilitate online control of product quality. In this contribution, we explore structured additive regression (STAR) models in combination with boosting as a variable selection tool for modeling the cell dry mass, product concentration, and optical density on the basis of online-available process variables and two-dimensional fluorescence spectroscopic data. STAR models are powerful extensions of linear models allowing for the inclusion of smooth effects or interactions between predictors. Boosting constructs the final model in a stepwise manner and provides a variable importance measure via predictor selection frequencies. Our results show that the cell dry mass can be modeled with a relative error of about ±3%, the optical density with ±6%, the soluble protein with ±16%, and the insoluble product with an accuracy of ±12%. Biotechnol. Bioeng. 2017;114: 321-334. © 2016 Wiley Periodicals, Inc.

  8. A Global Land Use Regression Model for Nitrogen Dioxide Air Pollution

    PubMed Central

    Larkin, Andrew; Geddes, Jeffrey A.; Martin, Randall V.; Xiao, Qingyang; Liu, Yang; Marshall, Julian D.; Brauer, Michael; Hystad, Perry

    2017-01-01

    Nitrogen dioxide is a common air pollutant with growing evidence of health impacts independent of other common pollutants such as ozone and particulate matter. However, the global distribution of NO2 exposure and associated impacts on global health is still largely uncertain. To advance global exposure estimates we created a global nitrogen dioxide (NO2) land use regression model for 2011 using annual measurements from 5,220 air monitors in 58 countries. The model captured 54% of global NO2 variation, with a mean absolute error of 3.7 ppb. Regional performance varied from R2 = 0.42 (Africa) to 0.67 (South America). Repeated 10% cross-validation using bootstrap sampling (n=10,000) demonstrated robust performance with respect to air monitor sampling in North America, Europe, and Asia (adjusted R2 within 2%) but not for Africa and Oceania (adjusted R2 within 11%) where NO2 monitoring data are sparse. The final model included 10 variables that captured both between and within-city spatial gradients in NO2 concentrations. Variable contributions differed between continental regions but major roads within 100m and satellite-derived NO2 were consistently the strongest predictors. The resulting model will be made available and can be used for global risk assessments and health studies, particularly in countries without existing NO2 monitoring data or models. PMID:28520422

  9. Study of cyanotoxins presence from experimental cyanobacteria concentrations using a new data mining methodology based on multivariate adaptive regression splines in Trasona reservoir (Northern Spain).

    PubMed

    Garcia Nieto, P J; Sánchez Lasheras, F; de Cos Juez, F J; Alonso Fernández, J R

    2011-11-15

    There is an increasing need to describe cyanobacteria blooms, since some cyanobacteria produce toxins, termed cyanotoxins. These can be toxic and dangerous to humans as well as other animals and life in general. Cyanobacteria reproduce explosively under certain conditions, resulting in algae blooms that can become harmful to other species if the cyanobacteria involved produce cyanotoxins. In this research work, the evolution of cyanotoxins in the Trasona reservoir (Principality of Asturias, Northern Spain) was successfully studied using a data mining methodology based on the multivariate adaptive regression splines (MARS) technique. The results of the present study are two-fold: on the one hand, the importance of the different kinds of cyanobacteria for the presence of cyanotoxins in the reservoir is established through the MARS model; on the other hand, a predictive model able to forecast the possible presence of cyanotoxins in the short term was obtained. The agreement of the MARS model with experimental data confirmed its good performance. Finally, conclusions of this innovative research are presented. Copyright © 2011 Elsevier B.V. All rights reserved.

  10. Robust regression for large-scale neuroimaging studies.

    PubMed

    Fritsch, Virgile; Da Mota, Benoit; Loth, Eva; Varoquaux, Gaël; Banaschewski, Tobias; Barker, Gareth J; Bokde, Arun L W; Brühl, Rüdiger; Butzek, Brigitte; Conrod, Patricia; Flor, Herta; Garavan, Hugh; Lemaitre, Hervé; Mann, Karl; Nees, Frauke; Paus, Tomas; Schad, Daniel J; Schümann, Gunter; Frouin, Vincent; Poline, Jean-Baptiste; Thirion, Bertrand

    2015-05-01

    Multi-subject datasets used in neuroimaging group studies have a complex structure, as they exhibit non-stationary statistical properties across regions and display various artifacts. While studies with small sample sizes can rarely be shown to deviate from standard hypotheses (such as the normality of the residuals) due to the poor sensitivity of normality tests with low degrees of freedom, large-scale studies (e.g. >100 subjects) exhibit more obvious deviations from these hypotheses and call for more refined models for statistical inference. Here, we demonstrate the benefits of robust regression as a tool for analyzing large neuroimaging cohorts. First, we use an analytic test based on robust parameter estimates; based on simulations, this procedure is shown to provide an accurate statistical control without resorting to permutations. Second, we show that robust regression yields more detections than standard algorithms using as an example an imaging genetics study with 392 subjects. Third, we show that robust regression can avoid false positives in a large-scale analysis of brain-behavior relationships with over 1500 subjects. Finally we embed robust regression in the Randomized Parcellation Based Inference (RPBI) method and demonstrate that this combination further improves the sensitivity of tests carried out across the whole brain. Altogether, our results show that robust procedures provide important advantages in large-scale neuroimaging group studies. Copyright © 2015 Elsevier Inc. All rights reserved.
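    A hedged sketch of the core tool: Huber's M-estimator downweights outlying observations that would otherwise drag an ordinary least squares fit. The statsmodels RLM call below is standard; the data are synthetic stand-ins, not the imaging genetics cohort analyzed above.

    ```python
    # Robust linear regression with Huber's M-estimator via statsmodels,
    # compared against OLS on data with injected outliers.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    x = rng.normal(size=300)
    y = 0.5 * x + rng.normal(size=300)
    y[:10] += 8                      # inject outliers typical of imaging artifacts

    X = sm.add_constant(x)
    ols = sm.OLS(y, X).fit()
    rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
    print("OLS slope:", round(ols.params[1], 3))   # pulled toward outliers
    print("RLM slope:", round(rlm.params[1], 3))   # close to the true 0.5
    ```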

  11. Correlational analysis of neck/shoulder pain and low back pain with the use of digital products, physical activity and psychological status among adolescents in Shanghai.

    PubMed

    Shan, Zhi; Deng, Guoying; Li, Jipeng; Li, Yangyang; Zhang, Yongxing; Zhao, Qinghua

    2013-01-01

    This study investigates neck/shoulder pain (NSP) and low back pain (LBP) among current high school students in Shanghai and explores the relationship between these pains and their possible influences, including digital products, physical activity, and psychological status. An anonymous self-assessment was administered to 3,600 students across 30 high schools in Shanghai. This questionnaire examined the prevalence of NSP and LBP and the level of physical activity as well as the use of mobile phones, personal computers (PC) and tablet computers (Tablet). The CES-D (Center for Epidemiological Studies Depression) scale was also included in the survey. The survey data were analyzed using the chi-square test, univariate logistic analyses and a multivariate logistic regression model. A total of 3,016 valid questionnaires were received, including 1,460 (48.41%) from male respondents and 1,556 (51.59%) from female respondents. The high school students in this study showed NSP and LBP rates of 40.8% and 33.1%, respectively, and the prevalence of both was influenced by the student's grade, use of digital products, and mental status; these factors affected the rates of NSP and LBP to varying degrees. The multivariate logistic regression analysis revealed that gender, grade, soreness after exercise, PC usage habits, tablet use, sitting time after school and academic stress entered the final model for NSP, while the final model for LBP consisted of gender, grade, soreness after exercise, PC usage habits, mobile phone use, sitting time after school, academic stress and CES-D score. High school students in Shanghai showed a high prevalence of NSP and LBP that was closely related to multiple factors. Appropriate interventions should be implemented to reduce the occurrence of NSP and LBP.

  12. Predicting the duration of sickness absence for patients with common mental disorders in occupational health care.

    PubMed

    Nieuwenhuijsen, Karen; Verbeek, Jos H A M; de Boer, Angela G E M; Blonk, Roland W B; van Dijk, Frank J H

    2006-02-01

    This study attempted to determine the factors that best predict the duration of absence from work among employees with common mental disorders. A cohort of 188 employees, of whom 102 were teachers, on sick leave with common mental disorders was followed for 1 year. Only information potentially available to the occupational physician during a first consultation was included in the predictive model. The predictive power of the variables was tested using Cox's regression analysis with a stepwise backward selection procedure. The hazard ratios (HR) from the final model were used to deduce a simple prediction rule. The resulting prognostic scores were then used to predict the probability of not returning to work after 3, 6, and 12 months. The discriminative ability of the prediction rule was tested by calculating the area under the receiver operating characteristic (ROC) curve. The final Cox's regression model produced the following four predictors of a longer time until return to work: age older than 50 years [HR 0.5, 95% confidence interval (95% CI) 0.3-0.8], expectation of an absence duration longer than 3 months (HR 0.5, 95% CI 0.3-0.8), higher educational level (HR 0.5, 95% CI 0.3-0.8), and a diagnosis of depression or anxiety disorder (HR 0.7, 95% CI 0.4-0.9). The resulting prognostic score yielded areas under the curve ranging from 0.68 to 0.73, which represents acceptable discrimination of the rule. A prediction rule based on four simple variables can be used by occupational physicians to identify unfavorable cases and to predict the duration of sickness absence.
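    As an illustration of the approach (not the paper's data), the sketch below fits a Cox model with the lifelines package and derives a count-based prognostic score from dichotomous predictors; all column names and values are hypothetical.

    ```python
    # Illustrative sketch: Cox regression for time to return to work, then
    # a simple prognostic score counting unfavorable predictors.
    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(2)
    n = 188
    df = pd.DataFrame({
        "weeks": rng.exponential(20, n),          # time until return to work
        "returned": rng.integers(0, 2, n),        # 1 = return observed (event)
        "age_over_50": rng.integers(0, 2, n),
        "expects_long_absence": rng.integers(0, 2, n),
        "higher_education": rng.integers(0, 2, n),
    })
    cph = CoxPHFitter()
    cph.fit(df, duration_col="weeks", event_col="returned")
    print(cph.summary[["exp(coef)"]])             # hazard ratios

    # Each unfavorable predictor (HR < 1 for return) adds one point.
    df["score"] = df[["age_over_50", "expects_long_absence",
                      "higher_education"]].sum(axis=1)
    ```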

  13. The relationship of bone and blood lead to hypertension: Further analyses of the normative aging study data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hu, H.; Kim, Rokho; Korrick, S.

    1996-12-31

    In an earlier report based on participants in the Veterans Administration Normative Aging Study, we found a significant association between the risk of hypertension and lead levels in tibia. To examine the possible confounding effects of education and occupation, we considered in this study five levels of education and three levels of occupation as independent variables in the statistical model. Of 1,171 active subjects seen between August 1991 and December 1994, 563 provided complete data for this analysis. In the initial logistic regression model, age, body mass index, family history of hypertension, and dietary sodium intake, but neither cumulative smoking nor alcohol ingestion, conferred increased odds ratios for being hypertensive that were statistically significant. When the lead biomarkers were added separately to this initial logistic model, tibia lead and patella lead levels were associated with significantly elevated odds ratios for hypertension. In the final backward elimination logistic regression model that included categorical variables for education and occupation, the only variables retained were body mass index, family history of hypertension, and tibia lead level. We conclude that education and occupation variables were not confounding the association between the lead biomarkers and hypertension that we reported previously. 27 refs., 3 tabs.

  14. Parametric regression model for survival data: Weibull regression model as an example

    PubMed Central

    2016-01-01

    The Weibull regression model is one of the most popular forms of parametric regression model, in that it provides an estimate of the baseline hazard function as well as coefficients for covariates. Because of technical difficulties, the Weibull regression model is seldom used in the medical literature compared with the semi-parametric proportional hazards model. To make clinical investigators familiar with the Weibull regression model, this article introduces some basic knowledge of the model and then illustrates how to fit it with R software. The SurvRegCensCov package is useful in converting estimated coefficients to clinically relevant statistics such as the hazard ratio (HR) and the event time ratio (ETR). Model adequacy can be assessed by inspecting Kaplan-Meier curves stratified by categorical variables. The eha package provides an alternative way to fit the Weibull regression model, and its check.dist() function helps to assess the goodness-of-fit of the model. Variable selection is based on the importance of a covariate, which can be tested using the anova() function. Alternatively, backward elimination starting from a full model is an efficient way to develop a model. Visualizing the Weibull regression model after model development is also worthwhile, as it provides another way to report the findings. PMID:28149846
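    The abstract demonstrates the workflow in R; a roughly analogous Python sketch using the lifelines WeibullAFTFitter is shown below on synthetic data. In the accelerated failure time parameterization, exponentiated coefficients are event time ratios (ETR), which connects to the statistics mentioned above.

    ```python
    # Hedged sketch of a parametric Weibull survival regression in Python;
    # lifelines plays a role analogous to the R packages named above.
    import numpy as np
    import pandas as pd
    from lifelines import WeibullAFTFitter

    rng = np.random.default_rng(3)
    n = 200
    df = pd.DataFrame({
        "time": rng.weibull(1.5, n) * 10,   # event/censoring times
        "event": rng.integers(0, 2, n),     # 1 = event observed
        "treatment": rng.integers(0, 2, n), # hypothetical covariate
    })
    aft = WeibullAFTFitter()
    aft.fit(df, duration_col="time", event_col="event")
    aft.print_summary()  # exp(coef) on the AFT scale is an event time ratio
    ```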

  15. Survival modeling for the estimation of transition probabilities in model-based economic evaluations in the absence of individual patient data: a tutorial.

    PubMed

    Diaby, Vakaramoko; Adunlin, Georges; Montero, Alberto J

    2014-02-01

    Survival modeling techniques are increasingly being used as part of decision modeling for health economic evaluations. As many models are available, it is imperative for interested readers to know about the steps in selecting and using the most suitable ones. The objective of this paper is to propose a tutorial for the application of appropriate survival modeling techniques to estimate transition probabilities, for use in model-based economic evaluations, in the absence of individual patient data (IPD). An illustration of the use of the tutorial is provided based on the final progression-free survival (PFS) analysis of the BOLERO-2 trial in metastatic breast cancer (mBC). An algorithm was adopted from Guyot and colleagues, and was then run in the statistical package R to reconstruct IPD, based on the final PFS analysis of the BOLERO-2 trial. It should be emphasized that the reconstructed IPD represent an approximation of the original data. Afterwards, we fitted parametric models to the reconstructed IPD in the statistical package Stata. Both statistical and graphical tests were conducted to verify the relative and absolute validity of the findings. Finally, the equations for transition probabilities were derived using the general equation for transition probabilities used in model-based economic evaluations, and the parameters were estimated from fitted distributions. The results of the application of the tutorial suggest that the log-logistic model best fits the reconstructed data from the latest published Kaplan-Meier (KM) curves of the BOLERO-2 trial. Results from the regression analyses were confirmed graphically. An equation for transition probabilities was obtained for each arm of the BOLERO-2 trial. In this paper, a tutorial was proposed and used to estimate the transition probabilities for model-based economic evaluation, based on the results of the final PFS analysis of the BOLERO-2 trial in mBC. The results of our study can serve as a basis for any model (Markov) that needs the parameterization of transition probabilities, and only has summary KM plots available.
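    The "general equation for transition probabilities" referred to above is commonly written as follows; this is a standard formulation from the decision-modeling literature, stated here from general knowledge rather than quoted from the paper. For a model cycle of length u, the probability of the event during the cycle ending at time t is derived from the fitted survival function S(t).

    ```latex
    % Time-dependent transition probability over a model cycle of length u,
    % derived from a fitted parametric survival function S(t):
    \[
      tp(t_u) \;=\; 1 - \frac{S(t)}{S(t - u)}
    \]
    % For example, for a Weibull fit with scale \lambda and shape \gamma,
    % S(t) = \exp(-\lambda t^{\gamma}), which gives
    \[
      tp(t_u) \;=\; 1 - \exp\!\bigl\{\lambda (t-u)^{\gamma} - \lambda t^{\gamma}\bigr\}
    \]
    ```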

  16. Comment on "Cosmic-ray-driven reaction and greenhouse effect of halogenated molecules: Culprits for atmospheric ozone depletion and global climate change"

    NASA Astrophysics Data System (ADS)

    Nuccitelli, Dana; Cowtan, Kevin; Jacobs, Peter; Richardson, Mark; Way, Robert G.; Blackburn, Anne-Marie; Stolpe, Martin B.; Cook, John

    2014-04-01

    Lu (2013) (L13) argued that solar effects and anthropogenic halogenated gases can explain most of the observed warming of global mean surface air temperatures since 1850, with virtually no contribution from atmospheric carbon dioxide (CO2) concentrations. Here we show that this conclusion is based on assumptions about the saturation of the CO2-induced greenhouse effect that have been experimentally falsified. L13 also confuses equilibrium and transient response, and relies on data sources that have been superseded due to known inaccuracies. Furthermore, the statistical approach of sequential linear regression artificially shifts variance onto the first predictor. L13's artificial choice of regression order and neglect of other relevant data are the fundamental causes of the incorrect main conclusion. Consideration of more modern data and a more parsimonious multiple regression model leads to contradiction with L13's statistical results. Finally, the correlation arguments in L13 are falsified by considering either the more appropriate metric of global heat accumulation or data on longer timescales.

  17. Regression analysis of case K interval-censored failure time data in the presence of informative censoring.

    PubMed

    Wang, Peijie; Zhao, Hui; Sun, Jianguo

    2016-12-01

    Interval-censored failure time data occur in many fields such as demography, economics, medical research, and reliability, and many inference procedures for them have been developed (Sun, 2006; Chen, Sun, and Peace, 2012). However, most of the existing approaches assume that the mechanism that yields interval censoring is independent of the failure time of interest, and it is clear that this may not be true in practice (Zhang et al., 2007; Ma, Hu, and Sun, 2015). In this article, we consider regression analysis of case K interval-censored failure time data when the censoring mechanism may be related to the failure time of interest. For this problem, an estimated sieve maximum-likelihood approach is proposed for data arising from the proportional hazards frailty model, and a two-step procedure is presented for estimation. In addition, the asymptotic properties of the proposed estimators of the regression parameters are established, and an extensive simulation study suggests that the method works well. Finally, we apply the method to a set of real interval-censored data that motivated this study. © 2016, The International Biometric Society.

  18. The application of artificial neural networks and support vector regression for simultaneous spectrophotometric determination of commercial eye drop contents

    NASA Astrophysics Data System (ADS)

    Valizadeh, Maryam; Sohrabi, Mahmoud Reza

    2018-03-01

    In the present study, artificial neural networks (ANNs) and support vector regression (SVR), as intelligent methods coupled with UV spectroscopy, were applied to the simultaneous quantitative determination of Dorzolamide (DOR) and Timolol (TIM) in eye drops. Several synthetic mixtures were analyzed to validate the proposed methods. First, a neural network time series, one type of artificial neural network, was employed and its efficiency was evaluated. Afterwards, a radial basis network was applied as another neural network; the results showed that its performance is suitable for prediction. Finally, support vector regression was proposed to construct the Zilomole prediction model, and the root mean square error (RMSE) and mean recovery (%) were calculated for the SVR method. Moreover, the proposed methods were compared to high-performance liquid chromatography (HPLC) as a reference method. A one-way analysis of variance (ANOVA) test at the 95% confidence level, applied to the comparison of the suggested and reference methods, showed that there were no significant differences between them. The effect of interferences was also investigated in spiked solutions.
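    For readers unfamiliar with the SVR step, the sketch below shows a typical spectra-to-concentration support vector regression in scikit-learn; the matrix shapes, kernel settings, and data are invented stand-ins for the eye-drop mixtures.

    ```python
    # Hedged sketch: support vector regression from UV absorbance spectra
    # to an analyte concentration, with a train/test split and RMSE.
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(4)
    X = rng.normal(size=(60, 120))                 # 60 mixtures x 120 wavelengths
    y = X[:, 10] * 0.8 + rng.normal(0, 0.05, 60)   # synthetic concentration

    model = make_pipeline(StandardScaler(),
                          SVR(kernel="rbf", C=10.0, epsilon=0.01))
    model.fit(X[:45], y[:45])
    pred = model.predict(X[45:])
    rmse = mean_squared_error(y[45:], pred) ** 0.5  # RMSE, as reported above
    print(f"RMSE: {rmse:.3f}")
    ```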

  19. Introduction to the use of regression models in epidemiology.

    PubMed

    Bender, Ralf

    2009-01-01

    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
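    A compact sketch of three of the four model families named above, using the statsmodels formula API on a synthetic epidemiologic data frame (all variable names are assumed); Cox regression is omitted here because it lives in separate survival libraries.

    ```python
    # Linear, logistic, and Poisson regression on one synthetic data frame.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(5)
    n = 500
    df = pd.DataFrame({
        "age": rng.uniform(30, 70, n),
        "exposure": rng.integers(0, 2, n),
    })
    df["outcome"] = 0.1 * df.age + rng.normal(size=n)        # continuous response
    df["case"] = rng.binomial(1, 0.2 + 0.1 * df.exposure)    # binary response
    df["count"] = rng.poisson(1 + df.exposure)               # counts/rates

    linear = smf.ols("outcome ~ exposure + age", data=df).fit()
    logit = smf.logit("case ~ exposure + age", data=df).fit(disp=0)
    poisson = smf.glm("count ~ exposure + age", data=df,
                      family=sm.families.Poisson()).fit()
    print(np.exp(logit.params["exposure"]))  # adjusted odds ratio for exposure
    ```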

  20. An overview of techniques for linking high-dimensional molecular data to time-to-event endpoints by risk prediction models.

    PubMed

    Binder, Harald; Porzelius, Christine; Schumacher, Martin

    2011-03-01

    Analysis of molecular data promises identification of biomarkers for improving prognostic models, thus potentially enabling better patient management. For identifying such biomarkers, risk prediction models can be employed that link high-dimensional molecular covariate data to a clinical endpoint. In low-dimensional settings, a multitude of statistical techniques already exists for building such models, e.g. allowing for variable selection or for quantifying the added value of a new biomarker. We provide an overview of techniques for regularized estimation that transfer this toward high-dimensional settings, with a focus on models for time-to-event endpoints. Techniques for incorporating specific covariate structure are discussed, as well as techniques for dealing with more complex endpoints. Employing gene expression data from patients with diffuse large B-cell lymphoma, some typical modeling issues from low-dimensional settings are illustrated in a high-dimensional application. First, the performance of classical stepwise regression is compared to stage-wise regression, as implemented by a component-wise likelihood-based boosting approach. A second issue arises when artificially transforming the response into a binary variable: the effects of the resulting loss of efficiency and potential bias in a high-dimensional setting are illustrated, and a link to competing risks models is provided. Finally, we discuss conditions for adequately quantifying the added value of high-dimensional gene expression measurements, both at the stage of model fitting and when performing evaluation. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Selection of relevant input variables in storm water quality modeling by multiobjective evolutionary polynomial regression paradigm

    NASA Astrophysics Data System (ADS)

    Creaco, E.; Berardi, L.; Sun, Siao; Giustolisi, O.; Savic, D.

    2016-04-01

    The growing availability of field data, from information and communication technologies (ICTs) in "smart" urban infrastructures, allows data modeling to understand complex phenomena and to support management decisions. Among the analyzed phenomena, those related to storm water quality modeling have recently been gaining interest in the scientific literature. Nonetheless, the large amount of available data poses the problem of selecting relevant variables to describe a phenomenon and enable robust data modeling. This paper presents a procedure for the selection of relevant input variables using the multiobjective evolutionary polynomial regression (EPR-MOGA) paradigm. The procedure is based on scrutinizing the explanatory variables that appear inside the set of EPR-MOGA symbolic model expressions of increasing complexity and goodness of fit to target output. The strategy also enables the selection to be validated by engineering judgement. In such context, the multiple case study extension of EPR-MOGA, called MCS-EPR-MOGA, is adopted. The application of the proposed procedure to modeling storm water quality parameters in two French catchments shows that it was able to significantly reduce the number of explanatory variables for successive analyses. Finally, the EPR-MOGA models obtained after the input selection are compared with those obtained by using the same technique without benefitting from input selection and with those obtained in previous works where other data-modeling techniques were used on the same data. The comparison highlights the effectiveness of both EPR-MOGA and the input selection procedure.

  2. Interpretation of commonly used statistical regression models.

    PubMed

    Kasza, Jessica; Wolfe, Rory

    2014-01-01

    A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.

  3. Relationship Between Dental Students' Pre-Admission Record and Performance on the Comprehensive Basic Science Examination.

    PubMed

    Lee, Kevin C; Lee, Victor Y; Zubiaurre, Laureen A; Grbic, John T; Eisig, Sidney B

    2018-04-01

    The Comprehensive Basic Science Examination (CBSE) is the entrance examination for oral and maxillofacial surgery, but its implementation among dental students is a relatively recent and unintended use. The aim of this study was to examine the relationship between pre-admission data and performance on the CBSE for dental students at the Columbia University College of Dental Medicine (CDM). This study followed a retrospective cohort, examining data for the CDM Classes of 2014-19. Data collected were Dental Admission Test (DAT) and CBSE scores and undergraduate GPAs for 49 CDM students who took the CBSE from September 2013 to July 2016. The results showed that the full regression model did not demonstrate significant predictive capability (F[8,40]=1.70, p=0.13). Following stepwise regression, only the DAT Perceptual Ability score remained in the final model (F[1,47]=7.97, p<0.01). Variations in DAT Perceptual Ability scores explained 15% of the variability in CBSE scores (R2 = 0.15). This study found that, among these students, pre-admission data were poor predictors of CBSE performance.

  4. Testing in Microbiome-Profiling Studies with MiRKAT, the Microbiome Regression-Based Kernel Association Test

    PubMed Central

    Zhao, Ni; Chen, Jun; Carroll, Ian M.; Ringel-Kulka, Tamar; Epstein, Michael P.; Zhou, Hua; Zhou, Jin J.; Ringel, Yehuda; Li, Hongzhe; Wu, Michael C.

    2015-01-01

    High-throughput sequencing technology has enabled population-based studies of the role of the human microbiome in disease etiology and exposure response. Distance-based analysis is a popular strategy for evaluating the overall association between microbiome diversity and outcome, wherein the phylogenetic distance between individuals’ microbiome profiles is computed and tested for association via permutation. Despite their practical popularity, distance-based approaches suffer from important challenges, especially in selecting the best distance and extending the methods to alternative outcomes, such as survival outcomes. We propose the microbiome regression-based kernel association test (MiRKAT), which directly regresses the outcome on the microbiome profiles via the semi-parametric kernel machine regression framework. MiRKAT allows for easy covariate adjustment and extension to alternative outcomes while non-parametrically modeling the microbiome through a kernel that incorporates phylogenetic distance. It uses a variance-component score statistic to test for the association with analytical p value calculation. The model also allows simultaneous examination of multiple distances, alleviating the problem of choosing the best distance. Our simulations demonstrated that MiRKAT provides correctly controlled type I error and adequate power in detecting overall association. “Optimal” MiRKAT, which considers multiple candidate distances, is robust in that it suffers from little power loss in comparison to when the best distance is used and can achieve tremendous power gain in comparison to when a poor distance is chosen. Finally, we applied MiRKAT to real microbiome datasets to show that microbial communities are associated with smoking and with fecal protease levels after confounders are controlled for. PMID:25957468
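    Conceptually, the kernel-machine score test regresses the outcome on covariates alone and then asks whether the residuals align with a kernel built from between-sample distances. The numpy sketch below illustrates that logic using Gower's distance-to-kernel transformation and, for brevity, a permutation p value in place of MiRKAT's analytic calculation; all data are synthetic and the code is a conceptual sketch, not the MiRKAT software.

    ```python
    # Conceptual sketch of a kernel-machine score test on synthetic data.
    import numpy as np

    rng = np.random.default_rng(6)
    n = 100
    # Synthetic symmetric distance matrix standing in for a phylogenetic
    # (e.g., UniFrac-style) distance between microbiome profiles.
    D = rng.uniform(size=(n, n))
    D = (D + D.T) / 2.0
    np.fill_diagonal(D, 0.0)

    # Gower's transformation: distance matrix -> centered kernel K
    J = np.eye(n) - np.ones((n, n)) / n
    K = -0.5 * J @ (D ** 2) @ J

    X = np.column_stack([np.ones(n), rng.normal(size=n)])  # covariates
    y = rng.normal(size=n)                                 # outcome
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta                                       # null-model residuals

    Q = r @ K @ r                                          # score-type statistic
    perms = np.empty(2000)
    for i in range(2000):
        rp = rng.permutation(r)
        perms[i] = rp @ K @ rp
    print("permutation p value:", (perms >= Q).mean())
    ```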

  5. A simple prognostic model for overall survival in metastatic renal cell carcinoma.

    PubMed

    Assi, Hazem I; Patenaude, Francois; Toumishey, Ethan; Ross, Laura; Abdelsalam, Mahmoud; Reiman, Tony

    2016-01-01

    The primary purpose of this study was to develop a simpler prognostic model to predict overall survival for patients treated for metastatic renal cell carcinoma (mRCC) by examining variables shown in the literature to be associated with survival. We conducted a retrospective analysis of patients treated for mRCC at two Canadian centres. All patients who started first-line treatment were included in the analysis. A multivariate Cox proportional hazards regression model was constructed using a stepwise procedure. Patients were assigned to risk groups depending on how many of the three risk factors from the final multivariate model they had. There were three risk factors in the final multivariate model: hemoglobin, prior nephrectomy, and time from diagnosis to treatment. Patients in the high-risk group (two or three risk factors) had a median survival of 5.9 months, while those in the intermediate-risk group (one risk factor) had a median survival of 16.2 months, and those in the low-risk group (no risk factors) had a median survival of 50.6 months. In multivariate analysis, shorter survival times were associated with hemoglobin below the lower limit of normal, absence of prior nephrectomy, and initiation of treatment within one year of diagnosis.

  6. A simple prognostic model for overall survival in metastatic renal cell carcinoma

    PubMed Central

    Assi, Hazem I.; Patenaude, Francois; Toumishey, Ethan; Ross, Laura; Abdelsalam, Mahmoud; Reiman, Tony

    2016-01-01

    Introduction: The primary purpose of this study was to develop a simpler prognostic model to predict overall survival for patients treated for metastatic renal cell carcinoma (mRCC) by examining variables shown in the literature to be associated with survival. Methods: We conducted a retrospective analysis of patients treated for mRCC at two Canadian centres. All patients who started first-line treatment were included in the analysis. A multivariate Cox proportional hazards regression model was constructed using a stepwise procedure. Patients were assigned to risk groups depending on how many of the three risk factors from the final multivariate model they had. Results: There were three risk factors in the final multivariate model: hemoglobin, prior nephrectomy, and time from diagnosis to treatment. Patients in the high-risk group (two or three risk factors) had a median survival of 5.9 months, while those in the intermediate-risk group (one risk factor) had a median survival of 16.2 months, and those in the low-risk group (no risk factors) had a median survival of 50.6 months. Conclusions: In multivariate analysis, shorter survival times were associated with hemoglobin below the lower limit of normal, absence of prior nephrectomy, and initiation of treatment within one year of diagnosis. PMID:27217858

  7. Latent transition analysis of pre-service teachers' efficacy in mathematics and science

    NASA Astrophysics Data System (ADS)

    Ward, Elizabeth Kennedy

    This study modeled changes in pre-service teacher efficacy in mathematics and science over the course of the final year of teacher preparation using latent transition analysis (LTA), a longitudinal form of analysis that builds on two modeling traditions: latent class analysis (LCA) and auto-regressive modeling. Data were collected using the STEBI-B, MTEBI-r, and ABNTMS instruments. The findings suggest that LTA is a viable technique for use in teacher efficacy research. Teacher efficacy is modeled as a construct with two dimensions: personal teaching efficacy (PTE) and outcome expectancy (OE). Findings suggest that the mathematics and science teaching efficacy (PTE) of pre-service teachers is a multi-class phenomenon. The analyses revealed a four-class model of PTE at the beginning and end of the final year of teacher training. Results indicate that when pre-service teachers transition between classes, they tend to move from a lower efficacy class into a higher efficacy class. In addition, the findings suggest that time-varying variables (attitudes and beliefs) and time-invariant variables (previous coursework, previous experiences, and teacher perceptions) are statistically significant predictors of efficacy class membership. Further, the analyses suggest that the measures used to assess outcome expectancy are not suitable for LCA and LTA procedures.

  8. Extreme Sparse Multinomial Logistic Regression: A Fast and Robust Framework for Hyperspectral Image Classification

    NASA Astrophysics Data System (ADS)

    Cao, Faxian; Yang, Zhijing; Ren, Jinchang; Ling, Wing-Kuen; Zhao, Huimin; Marshall, Stephen

    2017-12-01

    Although sparse multinomial logistic regression (SMLR) has provided a useful tool for sparse classification, it deals inefficiently with high-dimensional features and relies on manually set initial regressor values. This has significantly constrained its applications for hyperspectral image (HSI) classification. In order to tackle these two drawbacks, an extreme sparse multinomial logistic regression (ESMLR) is proposed for effective classification of HSI. First, the HSI dataset is projected to a new feature space with randomly generated weights and biases. Second, an optimization model is established by the Lagrange multiplier method and the dual principle to automatically determine a good initial regressor for SMLR by minimizing the training error and the regressor value. Furthermore, extended multi-attribute profiles (EMAPs) are utilized for extracting both spectral and spatial features. A combinational linear multiple features learning (MFL) method is proposed to further enhance the features extracted by ESMLR and EMAPs. Finally, logistic regression via variable splitting and augmented Lagrangian (LORSAL) is adopted in the proposed framework to reduce the computational time. Experiments conducted on two well-known HSI datasets, namely the Indian Pines dataset and the Pavia University dataset, demonstrate the fast and robust performance of the proposed ESMLR framework.
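    As a rough Python analogue (not the authors' ESMLR/LORSAL implementation), the sketch below applies a random feature projection followed by an l1-penalized multinomial logistic regression in scikit-learn; dimensions and data are invented.

    ```python
    # Hedged sketch: random projection + sparse multinomial logistic
    # regression, standing in for the ESMLR pipeline described above.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(7)
    X = rng.normal(size=(300, 50))            # 300 pixels x 50 features
    y = rng.integers(0, 4, 300)               # 4 land-cover classes

    # Random projection, echoing ESMLR's randomly generated weights/biases:
    W = rng.normal(size=(50, 100))
    b = rng.normal(size=100)
    H = np.tanh(X @ W + b)

    clf = LogisticRegression(penalty="l1", solver="saga", C=1.0, max_iter=2000)
    clf.fit(H, y)                             # multinomial fit with l1 sparsity
    print("nonzero coefficients:", np.count_nonzero(clf.coef_))
    ```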

  9. Application and interpretation of functional data analysis techniques to differential scanning calorimetry data from lupus patients.

    PubMed

    Kendrick, Sarah K; Zheng, Qi; Garbett, Nichola C; Brock, Guy N

    2017-01-01

    DSC is used to determine thermally-induced conformational changes of biomolecules within a blood plasma sample. Recent research has indicated that DSC curves (or thermograms) may have different characteristics based on disease status and, thus, may be useful as a monitoring and diagnostic tool for some diseases. Since thermograms are curves measured over a range of temperature values, they are considered functional data. In this paper we apply functional data analysis techniques to analyze differential scanning calorimetry (DSC) data from individuals from the Lupus Family Registry and Repository (LFRR). The aim was to assess the effect of lupus disease status as well as additional covariates on the thermogram profiles, and use FD analysis methods to create models for classifying lupus vs. control patients on the basis of the thermogram curves. Thermograms were collected for 300 lupus patients and 300 controls without lupus who were matched with diseased individuals based on sex, race, and age. First, functional regression with a functional response (DSC) and categorical predictor (disease status) was used to determine how thermogram curve structure varied according to disease status and other covariates including sex, race, and year of birth. Next, functional logistic regression with disease status as the response and functional principal component analysis (FPCA) scores as the predictors was used to model the effect of thermogram structure on disease status prediction. The prediction accuracy for patients with Osteoarthritis and Rheumatoid Arthritis but without Lupus was also calculated to determine the ability of the classifier to differentiate between Lupus and other diseases. Data were divided 1000 times into separate 2/3 training and 1/3 test data for evaluation of predictions. Finally, derivatives of thermogram curves were included in the models to determine whether they aided in prediction of disease status. Functional regression with thermogram as a functional response and disease status as predictor showed a clear separation in thermogram curve structure between cases and controls. The logistic regression model with FPCA scores as the predictors gave the most accurate results with a mean 79.22% correct classification rate with a mean sensitivity = 79.70%, and specificity = 81.48%. The model correctly classified OA and RA patients without Lupus as controls at a rate of 75.92% on average with a mean sensitivity = 79.70% and specificity = 77.6%. Regression models including FPCA scores for derivative curves did not perform as well, nor did regression models including covariates. Changes in thermograms observed in the disease state likely reflect covalent modifications of plasma proteins or changes in large protein-protein interacting networks resulting in the stabilization of plasma proteins towards thermal denaturation. By relating functional principal components from thermograms to disease status, our Functional Principal Component Analysis model provides results that are more easily interpretable compared to prior studies. Further, the model could also potentially be coupled with other biomarkers to improve diagnostic classification for lupus.
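    A schematic version of the winning pipeline, under the assumption that each thermogram is sampled on a common temperature grid: extract principal component scores from the curves and classify with logistic regression. The curves below are synthetic, not LFRR thermograms.

    ```python
    # Sketch: PCA scores from curve data (an FPCA-like step) feeding a
    # logistic regression classifier, with a 2/3 - 1/3 split as above.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(8)
    grid = np.linspace(45, 90, 450)                # temperature grid (deg C)
    n = 600
    labels = rng.integers(0, 2, n)                 # 0 = control, 1 = case
    # Cases get a shifted denaturation peak, plus measurement noise.
    curves = np.exp(-((grid - (62 + 3 * labels[:, None])) ** 2) / 20) \
             + rng.normal(0, 0.05, (n, grid.size))

    scores = PCA(n_components=5).fit_transform(curves)   # FPCA-like scores
    Xtr, Xte, ytr, yte = train_test_split(scores, labels, test_size=1/3,
                                          random_state=0)
    clf = LogisticRegression().fit(Xtr, ytr)
    print("test accuracy:", round(accuracy_score(yte, clf.predict(Xte)), 3))
    ```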

  10. Injury risk functions based on population-based finite element model responses: Application to femurs under dynamic three-point bending.

    PubMed

    Park, Gwansik; Forman, Jason; Kim, Taewung; Panzer, Matthew B; Crandall, Jeff R

    2018-02-28

    The goal of this study was to explore a framework for developing injury risk functions (IRFs) in a bottom-up approach based on responses of parametrically variable finite element (FE) models representing exemplar populations. First, a parametric femur modeling tool was developed and validated using a subject-specific (SS)-FE modeling approach. Second, principal component analysis and regression were used to identify parametric geometric descriptors of the human femur and the distribution of those factors for 3 target occupant sizes (5th, 50th, and 95th percentile males). Third, distributions of material parameters of cortical bone were obtained from the literature for 3 target occupant ages (25, 50, and 75 years) using regression analysis. A Monte Carlo method was then implemented to generate populations of FE models of the femur for target occupants, using the parametric femur modeling tool. Simulations were conducted with each of these models under 3-point dynamic bending. Finally, model-based IRFs were developed using logistic regression analysis, based on the moments at fracture observed in the FE simulations. In total, 100 femur FE models incorporating the variation in the population of interest were generated, and 500,000 moments at fracture were observed (applying 5,000 ultimate strains to each of the 100 synthesized femur FE models) for each set of target occupant characteristics. Using the proposed framework in this study, model-based IRFs for 3 target male occupant sizes (5th, 50th, and 95th percentiles) and ages (25, 50, and 75 years) were developed. The model-based IRF was located in the 95% confidence interval of the test-based IRF for the range of 15 to 70% injury risk. The 95% confidence interval of the developed IRF was almost in line with the mean curve due to the large number of data points. The framework proposed in this study would be beneficial for developing IRFs in a bottom-up manner, whose range of variability is informed by the population-based FE model responses. Specifically, this method mitigates the uncertainties in applying empirical scaling and may improve IRF fidelity when a limited number of experimental specimens are available.
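    The final step can be illustrated with a short sketch: logistic regression of simulated fracture outcomes on applied bending moment yields an injury risk function that can be evaluated at any moment. All values below are invented for illustration.

    ```python
    # Sketch: fit an injury risk function by logistic regression of
    # fracture (0/1) on bending moment, then read off risk at 350 N*m.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(9)
    moment = rng.uniform(100, 500, 1000)                 # N*m, simulated
    true_p = 1 / (1 + np.exp(-(moment - 300) / 40))
    fracture = rng.binomial(1, true_p)

    X = sm.add_constant(moment)
    irf = sm.Logit(fracture, X).fit(disp=0)
    m = np.array([1.0, 350.0])                           # intercept + moment
    risk = 1 / (1 + np.exp(-m @ irf.params))
    print(f"predicted fracture risk at 350 N*m: {risk:.2f}")
    ```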

  11. Land Use Regression Models for Alkylbenzenes in a Middle Eastern Megacity: Tehran Study of Exposure Prediction for Environmental Health Research (Tehran SEPEHR).

    PubMed

    Amini, Heresh; Schindler, Christian; Hosseini, Vahid; Yunesian, Masud; Künzli, Nino

    2017-08-01

    Land use regression (LUR) has not been applied thus far to ambient alkylbenzenes in highly polluted megacities. We developed LUR models for benzene, toluene, ethylbenzene, p-xylene, m-xylene, o-xylene (BTEX), and total BTEX using measurement-based estimates of annual means at 179 sites in the Tehran megacity, Iran. Overall, 520 predictors were evaluated, such as Weather Research and Forecasting model meteorology predictions, the emission inventory, and several new others. The final models, with R2 values ranging from 0.64 for p-xylene to 0.70 for benzene, were mainly driven by traffic-related variables, but proximity to sewage treatment plants was present in all models, indicating a major local source of alkylbenzenes not used in any previous study. We further found that large buffers are needed to explain annual mean concentrations of alkylbenzenes in the complex situation of a megacity. About 83% of Tehran's surface had benzene concentrations above the air quality standard of 5 μg/m3 set by the European Union and the Iranian Government. Toluene was the predominant alkylbenzene, and the most polluted area was the city center. Our analyses of differences between wealthier and poorer areas also showed somewhat higher concentrations for the latter. This is the largest LUR study to predict all BTEX species in a megacity.

  12. A New Hybrid Spatio-temporal Model for Estimating Daily Multi-year PM2.5 Concentrations Across Northeastern USA Using High Resolution Aerosol Optical Depth Data

    NASA Technical Reports Server (NTRS)

    Kloog, Itai; Chudnovsky, Alexandra A.; Just, Allan C.; Nordio, Francesco; Koutrakis, Petros; Coull, Brent A.; Lyapustin, Alexei; Wang, Yujie; Schwartz, Joel

    2014-01-01

    The use of satellite-based aerosol optical depth (AOD) to estimate fine particulate matter (PM2.5) for epidemiology studies has increased substantially over the past few years. These recent studies often report moderate predictive power, which can generate downward bias in effect estimates. In addition, AOD measurements have only moderate spatial resolution, and have substantial missing data. We make use of recent advances in MODIS satellite data processing algorithms (Multi-Angle Implementation of Atmospheric Correction, MAIAC), which allow us to use 1 km (versus currently available 10 km) resolution AOD data. We developed and cross validated models to predict daily PM2.5 at a 1 × 1 km resolution across the northeastern USA (New England, New York and New Jersey) for the years 2003-2011, allowing us to better differentiate daily and long term exposure between urban, suburban, and rural areas. Additionally, we developed an approach that allows us to generate daily high-resolution 200 m localized predictions representing deviations from the area 1 × 1 km grid predictions. We used mixed models regressing PM2.5 measurements against day-specific random intercepts, and fixed and random AOD and temperature slopes. We then use generalized additive mixed models with spatial smoothing to generate grid cell predictions when AOD was missing. Finally, to get 200 m localized predictions, we regressed the residuals from the final model for each monitor against the local spatial and temporal variables at each monitoring site. Our model performance was excellent (mean out-of-sample R2 = 0.88). The spatial and temporal components of the out-of-sample results also presented very good fits to the withheld data (R2 = 0.87, R2 = 0.87). In addition, our results revealed very little bias in the predicted concentrations (slope of predictions versus withheld observations = 0.99). Our daily model results show high predictive accuracy at high spatial resolutions and will be useful in reconstructing exposure histories for epidemiological studies across this region.
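    A minimal sketch of the first calibration stage, assuming invented variable names and synthetic data: PM2.5 monitor readings are regressed on AOD with day-specific random intercepts and random AOD slopes via statsmodels MixedLM.

    ```python
    # Hedged sketch: mixed model with day-specific random intercepts and
    # random AOD slopes, loosely mirroring stage one described above.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(10)
    n_days, n_sites = 50, 20
    df = pd.DataFrame({
        "day": np.repeat(np.arange(n_days), n_sites),
        "aod": rng.gamma(2.0, 0.2, n_days * n_sites),
        "temp": rng.normal(15, 8, n_days * n_sites),
    })
    day_fx = rng.normal(0, 2, n_days)[df.day]            # day-to-day variation
    df["pm25"] = 5 + 20 * df.aod + 0.1 * df.temp + day_fx \
                 + rng.normal(0, 1, len(df))

    model = smf.mixedlm("pm25 ~ aod + temp", df, groups="day",
                        re_formula="~aod")               # random intercept + slope
    fit = model.fit()
    print(fit.params[["Intercept", "aod", "temp"]])      # fixed effects
    ```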

  13. A New Hybrid Spatio-Temporal Model For Estimating Daily Multi-Year PM2.5 Concentrations Across Northeastern USA Using High Resolution Aerosol Optical Depth Data.

    PubMed

    Kloog, Itai; Chudnovsky, Alexandra A; Just, Allan C; Nordio, Francesco; Koutrakis, Petros; Coull, Brent A; Lyapustin, Alexei; Wang, Yujie; Schwartz, Joel

    2014-10-01

    The use of satellite-based aerosol optical depth (AOD) to estimate fine particulate matter (PM2.5) for epidemiology studies has increased substantially over the past few years. These recent studies often report moderate predictive power, which can generate downward bias in effect estimates. In addition, AOD measurements have only moderate spatial resolution, and have substantial missing data. We make use of recent advances in MODIS satellite data processing algorithms (Multi-Angle Implementation of Atmospheric Correction, MAIAC), which allow us to use 1 km (versus currently available 10 km) resolution AOD data. We developed and cross validated models to predict daily PM2.5 at a 1 × 1 km resolution across the northeastern USA (New England, New York and New Jersey) for the years 2003-2011, allowing us to better differentiate daily and long term exposure between urban, suburban, and rural areas. Additionally, we developed an approach that allows us to generate daily high-resolution 200 m localized predictions representing deviations from the area 1 × 1 km grid predictions. We used mixed models regressing PM2.5 measurements against day-specific random intercepts, and fixed and random AOD and temperature slopes. We then use generalized additive mixed models with spatial smoothing to generate grid cell predictions when AOD was missing. Finally, to get 200 m localized predictions, we regressed the residuals from the final model for each monitor against the local spatial and temporal variables at each monitoring site. Our model performance was excellent (mean out-of-sample R2 = 0.88). The spatial and temporal components of the out-of-sample results also presented very good fits to the withheld data (R2 = 0.87, R2 = 0.87). In addition, our results revealed very little bias in the predicted concentrations (slope of predictions versus withheld observations = 0.99). Our daily model results show high predictive accuracy at high spatial resolutions and will be useful in reconstructing exposure histories for epidemiological studies across this region.

  14. A New Hybrid Spatio-Temporal Model For Estimating Daily Multi-Year PM2.5 Concentrations Across Northeastern USA Using High Resolution Aerosol Optical Depth Data

    PubMed Central

    Kloog, Itai; Chudnovsky, Alexandra A.; Just, Allan C.; Nordio, Francesco; Koutrakis, Petros; Coull, Brent A.; Lyapustin, Alexei; Wang, Yujie; Schwartz, Joel

    2017-01-01

    Background The use of satellite-based aerosol optical depth (AOD) to estimate fine particulate matter (PM2.5) for epidemiology studies has increased substantially over the past few years. These recent studies often report moderate predictive power, which can generate downward bias in effect estimates. In addition, AOD measurements have only moderate spatial resolution, and have substantial missing data. Methods We make use of recent advances in MODIS satellite data processing algorithms (Multi-Angle Implementation of Atmospheric Correction, MAIAC), which allow us to use 1 km (versus currently available 10 km) resolution AOD data. We developed and cross validated models to predict daily PM2.5 at a 1 × 1 km resolution across the northeastern USA (New England, New York and New Jersey) for the years 2003–2011, allowing us to better differentiate daily and long term exposure between urban, suburban, and rural areas. Additionally, we developed an approach that allows us to generate daily high-resolution 200 m localized predictions representing deviations from the area 1 × 1 km grid predictions. We used mixed models regressing PM2.5 measurements against day-specific random intercepts, and fixed and random AOD and temperature slopes. We then use generalized additive mixed models with spatial smoothing to generate grid cell predictions when AOD was missing. Finally, to get 200 m localized predictions, we regressed the residuals from the final model for each monitor against the local spatial and temporal variables at each monitoring site. Results Our model performance was excellent (mean out-of-sample R2 = 0.88). The spatial and temporal components of the out-of-sample results also presented very good fits to the withheld data (R2 = 0.87, R2 = 0.87). In addition, our results revealed very little bias in the predicted concentrations (slope of predictions versus withheld observations = 0.99). Conclusion Our daily model results show high predictive accuracy at high spatial resolutions and will be useful in reconstructing exposure histories for epidemiological studies across this region. PMID:28966552

  15. A new approach to correct the QT interval for changes in heart rate using a nonparametric regression model in beagle dogs.

    PubMed

    Watanabe, Hiroyuki; Miyazaki, Hiroyasu

    2006-01-01

    Over- and/or under-correction of QT intervals for changes in heart rate may lead to misleading conclusions and/or mask the potential of a drug to prolong the QT interval. This study examines a nonparametric regression model (loess smoother) for adjusting the QT interval for differences in heart rate, with improved fit over a wide range of heart rates. 240 sets of (QT, RR) observations collected from each of 8 conscious and non-treated beagle dogs were used as the materials for investigation. The fit of the nonparametric regression model to the QT-RR relationship was compared with four models (individual linear regression, common linear regression, and Bazett's and Fridericia's correction models) with reference to Akaike's Information Criterion (AIC). Residuals were assessed visually. The bias-corrected AIC of the nonparametric regression model was the best of the models examined in this study. Whereas the parametric models fitted poorly, the nonparametric regression model improved the fit at both fast and slow heart rates. The nonparametric regression model is the more flexible method compared with the parametric approach: the fit of the linear regression models was unsatisfactory at both fast and slow heart rates, while the nonparametric regression model showed significant improvement at all heart rates in beagle dogs.
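    For comparison with the parametric corrections, a lowess fit of the QT-RR relationship is a one-liner in statsmodels; the sketch below also computes Bazett's (QTc = QT/RR^(1/2)) and Fridericia's (QTc = QT/RR^(1/3)) corrections on simulated beagle-like values, not the study's data.

    ```python
    # Sketch: nonparametric lowess fit of QT on RR versus the classical
    # Bazett and Fridericia corrections, on simulated values.
    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    rng = np.random.default_rng(11)
    rr = rng.uniform(0.3, 1.0, 240)                    # RR interval (s)
    qt = 0.25 * rr ** 0.4 + rng.normal(0, 0.005, 240)  # QT interval (s)

    smoothed = lowess(qt, rr, frac=0.5)    # returns columns: sorted RR, fit
    qtc_bazett = qt / np.sqrt(rr)          # Bazett's correction
    qtc_fridericia = qt / np.cbrt(rr)      # Fridericia's correction
    print("lowess fit at RR =", smoothed[0, 0].round(2), "s:",
          smoothed[0, 1].round(3), "s")
    ```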

  16. Improvement of near infrared spectroscopic (NIRS) analysis of caffeine in roasted Arabica coffee by variable selection method of stability competitive adaptive reweighted sampling (SCARS).

    PubMed

    Zhang, Xuan; Li, Wei; Yin, Bin; Chen, Weizhong; Kelly, Declan P; Wang, Xiaoxin; Zheng, Kaiyi; Du, Yiping

    2013-10-01

    Coffee is the most heavily consumed beverage in the world after water, and quality is a key consideration in its commercial trade. Caffeine content, which has a significant effect on the final quality of coffee products, therefore needs to be determined rapidly and reliably by new analytical techniques. The main purpose of this work was to establish a powerful and practical analytical method based on near infrared spectroscopy (NIRS) and chemometrics for the quantitative determination of caffeine content in roasted Arabica coffees. Ground coffee samples spanning a wide range of roast levels were analyzed by NIR; meanwhile, their caffeine contents were quantitatively determined by the most commonly used HPLC-UV method to provide the reference values. Calibration models were then developed based on chemometric analyses of the NIR spectral data and the reference concentrations of the coffee samples. Partial least squares (PLS) regression was used to construct the models. Furthermore, diverse spectral pretreatment and variable selection techniques were applied in order to obtain robust and reliable reduced-spectrum regression models. Comparing the respective quality of the different models constructed, the application of second-derivative pretreatment and stability competitive adaptive reweighted sampling (SCARS) variable selection provided a notably improved regression model, with a root mean square error of cross validation (RMSECV) of 0.375 mg/g and a correlation coefficient (R) of 0.918 at a PLS factor of 7. An independent test set was used to assess the model, with a root mean square error of prediction (RMSEP) of 0.378 mg/g, a mean relative error of 1.976% and a mean relative standard deviation (RSD) of 1.707%. Thus, the results provided by the high-quality calibration model demonstrated the feasibility of NIR spectroscopy for at-line prediction of the caffeine content of unknown roasted coffee samples, thanks to the analysis time of a few seconds and the non-destructive advantages of NIRS. Copyright © 2013 Elsevier B.V. All rights reserved.
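    The calibration step itself is straightforward to reproduce in outline: a PLS regression from spectra to caffeine content, scored by cross-validated RMSE. The scikit-learn sketch below uses synthetic spectra and keeps the abstract's 7 PLS factors; the second-derivative pretreatment and SCARS variable selection steps are not reproduced.

    ```python
    # Sketch: PLS calibration from NIR-like spectra to caffeine content,
    # evaluated with 10-fold cross-validated RMSE (RMSECV).
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(12)
    X = rng.normal(size=(80, 700))               # 80 samples x 700 wavelengths
    y = X[:, :50].mean(axis=1) * 2 + 10 + rng.normal(0, 0.1, 80)  # mg/g

    pls = PLSRegression(n_components=7)          # 7 factors, as in the abstract
    y_cv = cross_val_predict(pls, X, y, cv=10).ravel()
    rmsecv = mean_squared_error(y, y_cv) ** 0.5
    print(f"RMSECV: {rmsecv:.3f} mg/g")
    ```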

  17. Improvement of near infrared spectroscopic (NIRS) analysis of caffeine in roasted Arabica coffee by variable selection method of stability competitive adaptive reweighted sampling (SCARS)

    NASA Astrophysics Data System (ADS)

    Zhang, Xuan; Li, Wei; Yin, Bin; Chen, Weizhong; Kelly, Declan P.; Wang, Xiaoxin; Zheng, Kaiyi; Du, Yiping

    2013-10-01

    Coffee is the most heavily consumed beverage in the world after water, and quality is a key consideration in its commercial trade. Caffeine content, which has a significant effect on the final quality of coffee products, therefore needs to be determined rapidly and reliably by new analytical techniques. The main purpose of this work was to establish a powerful and practical analytical method based on near infrared spectroscopy (NIRS) and chemometrics for the quantitative determination of caffeine content in roasted Arabica coffees. Ground coffee samples spanning a wide range of roast levels were analyzed by NIR; meanwhile, their caffeine contents were quantitatively determined by the most commonly used HPLC-UV method to provide the reference values. Calibration models were then developed based on chemometric analyses of the NIR spectral data and the reference concentrations of the coffee samples. Partial least squares (PLS) regression was used to construct the models. Furthermore, diverse spectral pretreatment and variable selection techniques were applied in order to obtain robust and reliable reduced-spectrum regression models. Comparing the respective quality of the different models constructed, the application of second-derivative pretreatment and stability competitive adaptive reweighted sampling (SCARS) variable selection provided a notably improved regression model, with a root mean square error of cross validation (RMSECV) of 0.375 mg/g and a correlation coefficient (R) of 0.918 at a PLS factor of 7. An independent test set was used to assess the model, with a root mean square error of prediction (RMSEP) of 0.378 mg/g, a mean relative error of 1.976% and a mean relative standard deviation (RSD) of 1.707%. Thus, the results provided by the high-quality calibration model demonstrated the feasibility of NIR spectroscopy for at-line prediction of the caffeine content of unknown roasted coffee samples, thanks to the analysis time of a few seconds and the non-destructive advantages of NIRS.

  18. Simulation of CO2 Solubility in Polystyrene-b-Polybutadiene-b-Polystyrene (SEBS) by artificial neural network (ANN) method

    NASA Astrophysics Data System (ADS)

    Sharudin, R. W.; AbdulBari Ali, S.; Zulkarnain, M.; Shukri, M. A.

    2018-05-01

    This study reports on the integration of artificial neural networks (ANNs) with experimental data to predict the solubility of carbon dioxide (CO2) blowing agent in SEBS, aiming at the highest possible regression coefficient (R2). Foaming of a thermoplastic elastomer with CO2 is highly affected by the CO2 solubility. The ability of an ANN to predict interpolated CO2 solubility data was investigated by comparing training results across different network training methods. Regarding the final prediction of CO2 solubility by the ANN, the predicted trend corroborated the experimental results. The comparison of training methods showed that Gradient Descent with Momentum & Adaptive LR (traingdx) required a longer training time and more accurate input to produce better output, with a final regression value of 0.88, whereas the Levenberg-Marquardt (trainlm) technique produced better output in a shorter training time, with a final regression value of 0.91.

  19. Analysis of the inter- and extracellular formation of platinum nanoparticles by Fusarium oxysporum f. sp. lycopersici using response surface methodology

    NASA Astrophysics Data System (ADS)

    Riddin, T. L.; Gericke, M.; Whiteley, C. G.

    2006-07-01

    A Fusarium oxysporum fungal strain was screened and found to be successful for the inter- and extracellular production of platinum nanoparticles. Nanoparticle formation was observed visually, over time, by the colour of the extracellular solution and/or the fungal biomass turning from yellow to dark brown, and their concentration was determined from the amount of residual hexachloroplatinic acid measured against a standard curve at 456 nm. The extracellular nanoparticles were characterized by transmission electron microscopy. Nanoparticles of varying size (10-100 nm) and shape (hexagons, pentagons, circles, squares, rectangles) were produced at both the extracellular and intercellular levels by the Fusarium oxysporum. The particles precipitate out of solution and bioaccumulate by nucleation either intercellularly, on the cell wall/membrane, or extracellularly in the surrounding medium. The importance of pH, temperature and hexachloroplatinic acid (H2PtCl6) concentration in nanoparticle formation was examined through the use of a statistical response surface methodology. Only the extracellular production of nanoparticles proved to be statistically significant, with a concentration yield of 4.85 mg l-1 estimated by a first-order regression model. With a second-order polynomial regression, the predicted yield of nanoparticles increased to 5.66 mg l-1 and, after a backward elimination step, the regression gave a final model with a yield of 6.59 mg l-1.
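    A second-order response surface fit of the kind described can be sketched with a quadratic polynomial basis and ordinary least squares; the factor ranges and yields below are invented placeholders, not the experimental values.

    ```python
    # Sketch: quadratic (second-order) response surface in pH, temperature
    # and H2PtCl6 concentration, fit by least squares via scikit-learn.
    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(13)
    n = 30
    X = np.column_stack([rng.uniform(4, 9, n),      # pH
                         rng.uniform(25, 65, n),    # temperature (deg C)
                         rng.uniform(0.5, 2.0, n)]) # H2PtCl6 (mM)
    yield_ = 5 - 0.3 * (X[:, 0] - 6.5) ** 2 + rng.normal(0, 0.2, n)  # mg/l

    quad = PolynomialFeatures(degree=2, include_bias=False)
    model = LinearRegression().fit(quad.fit_transform(X), yield_)
    best = X[np.argmax(model.predict(quad.transform(X)))]
    print("best observed settings (pH, T, conc):", best.round(2))
    ```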

  20. Role of social support, hardiness, and acculturation as predictors of mental health among international students of Asian Indian origin.

    PubMed

    Atri, Ashutosh; Sharma, Manoj; Cottrell, Randall

    This study determined the role of social support, hardiness, and acculturation as predictors of mental health among international Asian Indian students enrolled at two large public universities in Ohio. A sample of 185 students completed a 75-item online instrument assessing their social support levels, acculturation, hardiness, and their mental health. Regression analyses were conducted to test for variance in mental health attributable to each of the three independent variables. The final regression model revealed that the belonging aspect of social support, acculturation and prejudice of acculturation scale, and commitment and control of hardiness were all predictive of mental health (R2 = 0.523). Recommendations have been offered to develop interventions that will help strengthen the social support, hardiness, and acculturation of international students and help improve their mental health. Recommendations for development of future Web-based studies also are offered.

  1. Environmental fate model for ultra-low-volume insecticide applications used for adult mosquito management

    USGS Publications Warehouse

    Schleier, Jerome J.; Peterson, Robert K.D.; Irvine, Kathryn M.; Marshall, Lucy M.; Weaver, David K.; Preftakes, Collin J.

    2012-01-01

    One of the more effective ways of managing high densities of adult mosquitoes that vector human and animal pathogens is ultra-low-volume (ULV) aerosol application of insecticides. The U.S. Environmental Protection Agency uses models that are not validated for ULV insecticide applications, together with exposure assumptions, to perform its human and ecological risk assessments. Currently, there is no validated model that can accurately predict the deposition of insecticides applied using ULV technology for adult mosquito management. In addition, little is known about the deposition and drift of small droplets like those used under conditions encountered during ULV applications. The objective of this study was to perform field studies to measure environmental concentrations of insecticides and to develop a validated model to predict the deposition of ULV insecticides. The final regression model was selected by minimizing the Bayesian Information Criterion, and its prediction performance was evaluated using k-fold cross validation. The coefficients for formulation density and for the density by CMD interaction were the largest in the model. The results showed that as the density of the formulation decreases, deposition increases. The interaction of density and CMD showed that higher-density formulations and larger droplets resulted in greater deposition. These results are supported by the aerosol physics literature. A k-fold cross validation demonstrated that the mean square error of the selected regression model is not biased, and the mean square error and mean square prediction error indicated good predictive ability.
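    A hedged sketch of the validation strategy, with simulated placeholders for the deposition data: fit a linear model including a density by CMD interaction and score it with 10-fold cross-validation.

    ```python
    # Sketch: linear regression with an interaction term, validated by
    # 10-fold cross-validation (mean squared error).
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import KFold, cross_val_score

    rng = np.random.default_rng(14)
    n = 120
    X = np.column_stack([rng.uniform(0.8, 1.1, n),   # formulation density
                         rng.uniform(10, 30, n)])    # droplet CMD (um)
    X = np.column_stack([X, X[:, 0] * X[:, 1]])      # density x CMD interaction
    y = -2 * X[:, 0] + 0.05 * X[:, 2] + rng.normal(0, 0.3, n)  # deposition

    cv = KFold(n_splits=10, shuffle=True, random_state=0)
    mse = -cross_val_score(LinearRegression(), X, y, cv=cv,
                           scoring="neg_mean_squared_error")
    print(f"mean CV MSE: {mse.mean():.3f}")
    ```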

  2. A crash-prediction model for multilane roads.

    PubMed

    Caliendo, Ciro; Guida, Maurizio; Parisi, Alessandra

    2007-07-01

    Considerable research has been carried out in recent years to establish relationships between crashes and traffic flow, geometric infrastructure characteristics and environmental factors for two-lane rural roads. Crash-prediction models focused on multilane rural roads, however, have rarely been investigated. In addition, most research has paid little attention to the safety effects of variables such as stopping sight distance and pavement surface characteristics. Moreover, the statistical approaches have generally included Poisson and Negative Binomial regression models, whilst the Negative Multinomial regression model has been used to a lesser extent. Finally, as far as the authors are aware, prediction models involving all the above-mentioned factors have still not been developed in Italy for multilane roads, such as motorways. Thus, in this paper crash-prediction models for a four-lane median-divided Italian motorway were set up on the basis of accident data observed during a 5-year monitoring period extending from 1999 to 2003. The Poisson, Negative Binomial and Negative Multinomial regression models, applied separately to tangents and curves, were used to model the frequency of accident occurrence. Model parameters were estimated by the Maximum Likelihood Method, and the Generalized Likelihood Ratio Test was applied to detect the significant variables to be included in the model equation. Goodness-of-fit was measured by means of both the explained fraction of total variation and the explained fraction of systematic variation. The Cumulative Residuals Method was also used to test the adequacy of a regression model throughout the range of each variable. The candidate set of explanatory variables was: length (L), curvature (1/R), annual average daily traffic (AADT), sight distance (SD), side friction coefficient (SFC), longitudinal slope (LS) and the presence of a junction (J). Separate prediction models for total crashes and for fatal and injury crashes only were considered. For curves it is shown that the significant variables are L, 1/R and AADT, whereas for tangents they are L, AADT and junctions. The effect of rain precipitation was analysed on the basis of hourly rainfall data and assumptions about drying time. It is shown that a wet pavement significantly increases the number of crashes. The models developed in this paper for Italian motorways appear to be useful for many applications, such as the detection of critical factors, the estimation of accident reduction due to infrastructure and pavement improvement, and the prediction of accident counts when comparing different design options. Thus, this research may represent a point of reference for engineers in adjusting or designing multilane roads.
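
    For readers unfamiliar with the count-data models named here, the sketch below fits Poisson and Negative Binomial regressions to simulated crash counts and screens one variable with a likelihood ratio test. The covariates and coefficients are invented; the paper's Negative Multinomial model and cumulative-residual diagnostics are not reproduced.

    ```python
    # Sketch: Poisson vs Negative Binomial crash-frequency models, plus a
    # likelihood ratio test for one candidate variable. Simulated data.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    from scipy import stats

    rng = np.random.default_rng(2)
    n = 200
    df = pd.DataFrame({
        "L": rng.uniform(0.5, 5, n),        # segment length, km
        "AADT": rng.uniform(2e4, 9e4, n),   # annual average daily traffic
        "curv": rng.uniform(0, 1/400, n),   # curvature 1/R
    })
    mu = np.exp(-1 + np.log(df.L) + 0.00002*df.AADT + 300*df.curv)
    df["crashes"] = rng.poisson(mu)

    pois = smf.glm("crashes ~ np.log(L) + AADT + curv", data=df,
                   family=sm.families.Poisson()).fit()
    negbin = smf.glm("crashes ~ np.log(L) + AADT + curv", data=df,
                     family=sm.families.NegativeBinomial(alpha=1.0)).fit()

    # Likelihood ratio test: does curvature add significantly to the model?
    reduced = smf.glm("crashes ~ np.log(L) + AADT", data=df,
                      family=sm.families.Poisson()).fit()
    lr = 2 * (pois.llf - reduced.llf)
    print("LR stat %.2f, p = %.4f" % (lr, stats.chi2.sf(lr, df=1)))
    ```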

  3. Predicting the chance of live birth for women undergoing IVF: a novel pretreatment counselling tool.

    PubMed

    Dhillon, R K; McLernon, D J; Smith, P P; Fishel, S; Dowell, K; Deeks, J J; Bhattacharya, S; Coomarasamy, A

    2016-01-01

    Which pretreatment patient variables have an effect on live birth rates following assisted conception? The predictors in the final multivariate logistic regression model found to be significantly associated with reduced chances of IVF/ICSI success were increasing age (particularly above 36 years), tubal factor infertility, unexplained infertility and Asian or Black ethnicity. The two most widely recognized prediction models for live birth following IVF were developed on data from 1991 to 2007, pre-dating significant changes in clinical practice. These existing IVF outcome prediction models do not incorporate key pretreatment predictors, such as BMI, ethnicity and ovarian reserve, which are readily available now. In this cohort study a model to predict live birth was derived using data collected from 9915 women who underwent IVF/ICSI treatment at any CARE (Centres for Assisted Reproduction) clinic from 2008 to 2012. Model validation was performed on data collected from 2723 women who underwent treatment in 2013. The primary outcome for the model was live birth, defined as any birth event in which at least one baby was born alive and survived for more than 1 month. Data were collected from 12 fertility clinics within the CARE consortium in the UK. Multivariable logistic regression was used to develop the model. Discriminatory ability was assessed using the area under the receiver operating characteristic (AUROC) curve, and calibration was assessed using calibration-in-the-large and the calibration slope test. The predictors in the final model were female age, BMI, ethnicity, antral follicle count (AFC), previous live birth, previous miscarriage, and cause and duration of infertility. Upon assessing predictive ability, the AUROC was 0.62 (95% confidence interval (CI) 0.61-0.63) for the final model and 0.62 (95% CI 0.60-0.64) for the validation cohort. Calibration-in-the-large showed a systematic over-estimation of the predicted probability of live birth (intercept (95% CI) = -0.168 (-0.252 to -0.084), P < 0.001). However, the calibration slope test was not significant (slope (95% CI) = 1.129 (0.893-1.365), P = 0.28). Because the calibration-in-the-large test was significant, we recalibrated the final model; the recalibrated model showed much-improved calibration. Our model is unable to account for factors such as smoking and alcohol that can affect IVF/ICSI outcome, and is somewhat restricted to representing the ethnic distribution and outcomes of the UK population only. We were unable to account for socioeconomic status, and since 75% of the population paid privately for their treatment, the results may not generalize to people of all socioeconomic backgrounds. In addition, patients and clinicians should understand that this model is designed for use before treatment begins and does not include variables that become available (oocyte, embryo and endometrial) as treatment progresses. Finally, this model is limited to use prior to the first cycle only. To our knowledge, this is the first study to present a novel, up-to-date model encompassing three readily available prognostic factors, female BMI, ovarian reserve and ethnicity, which have not previously been used in prediction models for IVF outcome. Following geographical validation, the model can be used to build a user-friendly interface to aid decision-making for couples and their clinicians. Thereafter, a feasibility study of its implementation could focus on patient acceptability and quality of decision-making. © The Author 2015. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
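
    The validation statistics reported here (AUROC, calibration-in-the-large, calibration slope) can all be computed from a fitted logistic model's linear predictor. A minimal sketch on simulated development and validation cohorts, assuming the usual definitions (intercept of an offset logistic model for calibration-in-the-large; slope of a logistic model on the linear predictor for the calibration slope):

    ```python
    # Sketch: external validation of a logistic prediction model.
    # All data are simulated; no clinical variables are real.
    import numpy as np
    import statsmodels.api as sm
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(3)
    X_dev, X_val = rng.normal(size=(5000, 4)), rng.normal(size=(2000, 4))
    beta = np.array([0.5, -0.4, 0.3, 0.2])
    y_dev = rng.binomial(1, 1 / (1 + np.exp(-(X_dev @ beta - 0.8))))
    y_val = rng.binomial(1, 1 / (1 + np.exp(-(X_val @ beta - 0.8))))

    model = LogisticRegression().fit(X_dev, y_dev)
    p = model.predict_proba(X_val)[:, 1]
    print("validation AUROC: %.3f" % roc_auc_score(y_val, p))

    lp = np.log(p / (1 - p))                   # linear predictor (logit)

    # Calibration-in-the-large: intercept of a logistic model with the
    # linear predictor as offset; ~0 for a well-calibrated model.
    citl = sm.GLM(y_val, np.ones((len(lp), 1)), offset=lp,
                  family=sm.families.Binomial()).fit()
    # Calibration slope: coefficient on the linear predictor; ~1 ideally.
    slope = sm.GLM(y_val, sm.add_constant(lp),
                   family=sm.families.Binomial()).fit()
    print("calibration-in-the-large: %.3f" % citl.params[0])
    print("calibration slope: %.3f" % slope.params[1])
    ```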

  4. Modified Regression Correlation Coefficient for Poisson Regression Model

    NASA Astrophysics Data System (ADS)

    Kaengthong, Nattacha; Domthong, Uthumporn

    2017-09-01

    This study considers indicators of the predictive power of generalized linear models (GLMs), which are widely used but often subject to restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model, where the dependent variable follows a Poisson distribution. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and in the presence of multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient is better than the traditional one in terms of bias and root mean square error (RMSE).
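
    The traditional regression correlation coefficient referred to here is simply the correlation between Y and the fitted E(Y|X). The abstract does not give the modified coefficient's formula, so the sketch below computes only the classical version for a simulated Poisson regression.

    ```python
    # Sketch: the traditional regression correlation coefficient for a
    # Poisson regression -- corr(Y, fitted E(Y|X)). Simulated data.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 500
    X = rng.normal(size=(n, 2))
    mu = np.exp(0.3 + 0.5*X[:, 0] - 0.4*X[:, 1])
    y = rng.poisson(mu)

    fit = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson()).fit()
    mu_hat = fit.fittedvalues                 # estimated E(Y|X)
    r = np.corrcoef(y, mu_hat)[0, 1]          # regression correlation coefficient
    print("r =", r)
    ```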

  5. Predicting perceptual quality of images in realistic scenario using deep filter banks

    NASA Astrophysics Data System (ADS)

    Zhang, Weixia; Yan, Jia; Hu, Shiyong; Ma, Yang; Deng, Dexiang

    2018-03-01

    Classical image perceptual quality assessment models usually resort to natural scene statistics methods, which are based on the assumption that certain reliable statistical regularities hold for undistorted images and are corrupted by introduced distortions. However, these models usually fail to accurately predict the degradation severity of images in realistic scenarios, since complex, multiple, and interacting authentic distortions usually appear in them. We propose a quality prediction model based on a convolutional neural network. Quality-aware features extracted from the filter banks of multiple convolutional layers are aggregated into the image representation. Furthermore, an easy-to-implement and effective feature selection strategy is used to further refine the image representation, and finally a linear support vector regression model is trained to map the image representation to subjective perceptual quality scores. The experimental results on benchmark databases demonstrate the effectiveness and generalizability of the proposed model.

  6. Multivariate random regression analysis for body weight and main morphological traits in genetically improved farmed tilapia (Oreochromis niloticus).

    PubMed

    He, Jie; Zhao, Yunfeng; Zhao, Jingli; Gao, Jin; Han, Dandan; Xu, Pao; Yang, Runqing

    2017-11-02

    Because of their high economic importance, growth traits in fish are under continuous improvement. For growth traits that are recorded at multiple time-points in life, the use of univariate and multivariate animal models is limited by the variable and irregular timing of these measures. Thus, the univariate random regression model (RRM) was introduced for the genetic analysis of dynamic growth traits in fish breeding. We used a multivariate random regression model (MRRM) to analyze genetic changes in growth traits recorded at multiple time-points in genetically improved farmed tilapia. Legendre polynomials of different orders were applied to characterize the influences of fixed and random effects on growth trajectories. The final MRRM was determined by optimizing the univariate RRM for each analyzed trait separately via an adaptively penalized likelihood criterion, which is superior to both the Akaike information criterion and the Bayesian information criterion. In the selected MRRM, the additive genetic effects were modeled by Legendre polynomials of order three for body weight (BWE) and body length (BL), and of order two for body depth (BD). Using the covariance functions of the MRRM, estimated heritabilities were between 0.086 and 0.628 for BWE, 0.155 and 0.556 for BL, and 0.056 and 0.607 for BD. Only heritabilities for BD measured from 60 to 140 days of age were consistently higher than those estimated by the univariate RRM. All genetic correlations between growth time-points, whether within a single trait or between pairs of traits, exceeded 0.5. Moreover, correlations between early and late growth time-points were lower. Thus, for phenotypes that are measured repeatedly in aquaculture, an MRRM can enhance the efficiency of comprehensive selection for BWE and the main morphological traits.
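
    The Legendre-polynomial machinery behind random regression models can be illustrated in a few lines: standardize ages to [-1, 1], build the basis, and fit a trajectory. The sketch below fits only a mean growth curve by least squares on invented weights; the full MRRM's genetic (co)variance structure requires dedicated mixed-model software.

    ```python
    # Sketch: Legendre-polynomial basis for a random regression model.
    # Only the mean growth curve is fitted here; ages and weights are fake.
    import numpy as np
    from numpy.polynomial import legendre

    ages = np.array([60, 80, 100, 120, 140], dtype=float)   # days
    t = 2 * (ages - ages.min()) / (ages.max() - ages.min()) - 1

    def legendre_basis(t, order):
        # Columns are Legendre polynomials P_0..P_order evaluated at t.
        return np.column_stack([legendre.legval(t, np.eye(order + 1)[k])
                                for k in range(order + 1)])

    Phi = legendre_basis(t, order=3)          # order three, as for BWE/BL
    bwe = np.array([21., 55., 110., 180., 260.])   # fake mean weights, g
    coef, *_ = np.linalg.lstsq(Phi, bwe, rcond=None)
    print("mean-curve coefficients:", coef)
    print("fitted:", Phi @ coef)
    ```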

  7. Representation of limb kinematics in Purkinje cell simple spike discharge is conserved across multiple tasks

    PubMed Central

    Hewitt, Angela L.; Popa, Laurentiu S.; Pasalar, Siavash; Hendrix, Claudia M.

    2011-01-01

    Encoding of movement kinematics in Purkinje cell simple spike discharge has important implications for hypotheses of cerebellar cortical function. Several outstanding questions remain regarding representation of these kinematic signals. It is uncertain whether kinematic encoding occurs in unpredictable, feedback-dependent tasks or whether kinematic signals are conserved across tasks. Additionally, there is a need to understand the signals encoded in the instantaneous discharge of single cells without averaging across trials or time. To address these questions, this study recorded Purkinje cell firing in monkeys trained to perform a manual random tracking task in addition to circular tracking and center-out reach. Random tracking provides extensive coverage of kinematic workspaces. Direction and speed errors are significantly greater during random than circular tracking. Cross-correlation analyses comparing hand and target velocity profiles show that hand velocity lags target velocity during random tracking. Correlations between simple spike firing from 120 Purkinje cells and hand position, velocity, and speed were evaluated with linear regression models including a time constant, τ, as a measure of the firing lead/lag relative to the kinematic parameters. Across the population, velocity accounts for the majority of simple spike firing variability (63 ± 30% of the adjusted R2), followed by position (28 ± 24%) and speed (11 ± 19%). Simple spike firing often leads hand kinematics. Comparison of regression models based on averaged vs. nonaveraged firing and kinematics reveals lower adjusted R2 values for nonaveraged data; however, regression coefficients and τ values are highly similar. Finally, for most cells, model coefficients generated from random tracking accurately estimate simple spike firing in either circular tracking or center-out reach. These findings imply that the cerebellum controls movement kinematics, consistent with a forward internal model that predicts upcoming limb kinematics. PMID:21795616
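
    One simple way to estimate the lead/lag τ described here is to scan time shifts between firing and kinematics and keep the shift with the highest regression R². The sketch below does this for simulated signals with a built-in 100 ms lead; it illustrates the idea, not the authors' exact estimation procedure.

    ```python
    # Sketch: recover the lead (tau) of a firing-rate signal over hand
    # velocity by scanning shifts. Signals are simulated; a 100 ms lead
    # (10 bins of 10 ms) is built in.
    import numpy as np

    dt = 0.01                                  # 10 ms bins
    t = np.arange(0, 60, dt)
    rng = np.random.default_rng(5)
    vel = np.sin(0.7*t) + 0.3*rng.normal(size=t.size)   # hand velocity
    lead = 10                                  # firing leads by 10 bins
    fr = 40 + 15*np.roll(vel, -lead) + 2*rng.normal(size=t.size)

    def r2_at_shift(shift):
        # Pair fr[i] with vel[i+shift] and compute the regression R^2.
        f = fr[:-shift or None]                # all of fr when shift == 0
        v = vel[shift:] if shift else vel
        X = np.column_stack([np.ones_like(v), v])
        beta, *_ = np.linalg.lstsq(X, f, rcond=None)
        resid = f - X @ beta
        return 1 - resid.var() / f.var()

    best = max(range(0, 30), key=r2_at_shift)
    print("estimated tau = %d ms" % (best * dt * 1000))
    ```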

  8. Prediction model of critical weight loss in cancer patients during particle therapy.

    PubMed

    Zhang, Zhihong; Zhu, Yu; Zhang, Lijuan; Wang, Ziying; Wan, Hongwei

    2018-01-01

    The objective of this study is to investigate the predictors of critical weight loss in cancer patients receiving particle therapy, and to build a prediction model based on those predictive factors. Patients receiving particle therapy were enrolled between June 2015 and June 2016. Body weight was measured at the start and end of particle therapy. Associations between critical weight loss (defined as >5%) during particle therapy and patients' demographic and clinical characteristics, pre-therapeutic nutrition risk screening (NRS 2002) and BMI were evaluated by logistic regression and decision tree analysis. In total, 375 cancer patients receiving particle therapy were included. Mean weight loss was 0.55 kg, and 11.5% of patients experienced critical weight loss during particle therapy. The main predictors of critical weight loss during particle therapy were head and neck tumour location, total radiation dose ≥70 Gy on the primary tumour, and no prior surgery, as indicated by both logistic regression and decision tree analysis. The prediction model including tumour location, total radiation dose and surgery status had good predictive ability, with areas under the receiver operating characteristic curve of 0.79 (95% CI: 0.71-0.88) and 0.78 (95% CI: 0.69-0.86) for the decision tree and logistic regression models, respectively. Cancer patients with head and neck tumour location, total radiation dose ≥70 Gy and no prior surgery were at higher risk of critical weight loss during particle therapy, and early intensive nutrition counselling or intervention should be targeted at this population. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
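
    Comparing a decision tree with logistic regression by cross-validated AUC, as this study does, is a standard workflow. A sketch with fabricated binary predictors loosely echoing the study's (tumour site, dose, surgery); the coefficients and prevalence are invented.

    ```python
    # Sketch: logistic regression vs decision tree for a binary outcome,
    # compared by cross-validated ROC AUC. All data are fabricated.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(13)
    n = 375
    head_neck = rng.integers(0, 2, n)
    dose_ge70 = rng.integers(0, 2, n)
    surgery = rng.integers(0, 2, n)
    logit = -2.5 + 1.2*head_neck + 0.9*dose_ge70 - 0.8*surgery
    y = rng.binomial(1, 1 / (1 + np.exp(-logit)))
    X = np.column_stack([head_neck, dose_ge70, surgery])

    for name, clf in [("logistic", LogisticRegression()),
                      ("tree", DecisionTreeClassifier(max_depth=3,
                                                      random_state=0))]:
        auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
        print("%s cross-validated AUC: %.2f" % (name, auc))
    ```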

  9. Predicting use of effective vegetable parenting practices with the Model of Goal Directed Behavior.

    PubMed

    Diep, Cassandra S; Beltran, Alicia; Chen, Tzu-An; Thompson, Debbe; O'Connor, Teresia; Hughes, Sheryl; Baranowski, Janice; Baranowski, Tom

    2015-06-01

    To model effective vegetable parenting practices using the Model of Goal Directed Vegetable Parenting Practices construct scales. An Internet survey was conducted with parents of pre-school children to assess their agreement with effective vegetable parenting practices and Model of Goal Directed Vegetable Parenting Practices items. Block regression modelling was conducted using the composite score of effective vegetable parenting practices scales as the outcome variable and the Model of Goal Directed Vegetable Parenting Practices constructs as predictors in separate and sequential blocks: demographics, intention, desire (intrinsic motivation), perceived barriers, autonomy, relatedness, self-efficacy, habit, anticipated emotions, perceived behavioural control, attitudes and lastly norms. Backward deletion was employed at the end for any variable not significant at P<0·05. Houston, TX, USA. Three hundred and seven parents (mostly mothers) of pre-school children. Significant predictors in the final model in order of relationship strength included habit of active child involvement in vegetable selection, habit of positive vegetable communications, respondent not liking vegetables, habit of keeping a positive vegetable environment and perceived behavioural control of having a positive influence on child's vegetable consumption. The final model's adjusted R2 was 0·486. This was the first study to test scales from a behavioural model to predict effective vegetable parenting practices. Further research needs to assess these Model of Goal Directed Vegetable Parenting Practices scales for their (i) predictiveness of child consumption of vegetables in longitudinal samples and (ii) utility in guiding design of vegetable parenting practices interventions.

  10. Determining association constants from titration experiments in supramolecular chemistry.

    PubMed

    Thordarson, Pall

    2011-03-01

    The most common approach for quantifying interactions in supramolecular chemistry is a titration of the guest into a solution of the host, noting the changes in some physical property through NMR, UV-Vis, fluorescence or other techniques. Despite the apparent simplicity of this approach, several issues need to be carefully addressed to ensure that the final results are reliable. These include the use of non-linear rather than linear regression methods, careful choice of the stoichiometric binding model, the choice of method (e.g., NMR vs. UV-Vis) and host concentration, the application of advanced data analysis methods such as global analysis and, finally, the estimation of uncertainties and confidence intervals for the results obtained. This tutorial review gives a systematic overview of all these issues, highlighting some of the key messages with simulated data analysis examples.
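
    The review's central recommendation, non-linear regression against an exact binding isotherm, is easy to demonstrate. Below is a sketch fitting a 1:1 host-guest isotherm to simulated titration data with scipy; the host concentration, Ka, and signal amplitude are invented, and the parameter uncertainties come from the fit covariance, as the review advocates.

    ```python
    # Sketch: non-linear least-squares fit of a 1:1 binding isotherm to
    # simulated UV-Vis titration data (no linearization).
    import numpy as np
    from scipy.optimize import curve_fit

    H0 = 1e-4                                  # host concentration, M

    def isotherm(G0, Ka, dA_max):
        # Exact 1:1 complex concentration from the binding quadratic.
        b = H0 + G0 + 1.0/Ka
        HG = 0.5 * (b - np.sqrt(b**2 - 4*H0*G0))
        return dA_max * HG / H0                # observed signal change

    G0 = np.linspace(0, 1e-3, 15)              # guest titration points, M
    rng = np.random.default_rng(6)
    y = isotherm(G0, Ka=5e3, dA_max=0.35) + rng.normal(0, 0.004, G0.size)

    popt, pcov = curve_fit(isotherm, G0, y, p0=[1e3, 0.2],
                           bounds=([1, 0], [1e7, 2]))
    perr = np.sqrt(np.diag(pcov))              # 1-sigma uncertainties
    print("Ka = %.0f +/- %.0f M^-1, dA_max = %.3f"
          % (popt[0], perr[0], popt[1]))
    ```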

  11. The impact of global signal regression on resting state correlations: Are anti-correlated networks introduced?

    PubMed Central

    Murphy, Kevin; Birn, Rasmus M.; Handwerker, Daniel A.; Jones, Tyler B.; Bandettini, Peter A.

    2009-01-01

    Low-frequency fluctuations in fMRI signal have been used to map several consistent resting state networks in the brain. Using the posterior cingulate cortex as a seed region, functional connectivity analyses have found not only positive correlations in the default mode network but negative correlations in another resting state network related to attentional processes. The interpretation is that the human brain is intrinsically organized into dynamic, anti-correlated functional networks. Global variations of the BOLD signal are often considered nuisance effects and are commonly removed using a general linear model (GLM) technique. This global signal regression method has been shown to introduce negative activation measures in standard fMRI analyses. The topic of this paper is whether such a correction technique could be the cause of anti-correlated resting state networks in functional connectivity analyses. Here we show that, after global signal regression, correlation values to a seed voxel must sum to a negative value. Simulations also show that small phase differences between regions can lead to spurious negative correlation values. A combination breath holding and visual task demonstrates that the relative phase of global and local signals can affect connectivity measures and that, experimentally, global signal regression leads to bell-shaped correlation value distributions, centred on zero. Finally, analyses of negatively correlated networks in resting state data show that global signal regression is most likely the cause of anti-correlations. These results call into question the interpretation of negatively correlated regions in the brain when using global signal regression as an initial processing step. PMID:18976716
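
    The paper's key mathematical point, that correlations to a seed must sum to a negative value after global signal regression, can be reproduced numerically in a few lines. A sketch on synthetic "voxel" time series that are all positively coupled before the regression:

    ```python
    # Sketch: after global signal regression, the correlations between a
    # seed and all other "voxels" sum to a negative value, even though
    # every voxel was positively coupled beforehand. Synthetic data.
    import numpy as np

    rng = np.random.default_rng(7)
    n_vox, n_t = 200, 300
    common = rng.normal(size=n_t)                        # shared signal
    data = 0.8*common + rng.normal(size=(n_vox, n_t))

    g = data.mean(axis=0)                                # global signal
    g0 = (g - g.mean()) / np.linalg.norm(g - g.mean())   # unit vector
    clean = data - data.mean(axis=1, keepdims=True)
    clean -= np.outer(clean @ g0, g0)                    # regress out g

    seed = clean[0]
    r = np.array([np.corrcoef(seed, v)[0, 1] for v in clean[1:]])
    # The sum comes out negative, as the paper's math predicts.
    print("sum of correlations to seed: %.4f" % r.sum())
    print("fraction of 'anti-correlated' voxels: %.2f" % (r < 0).mean())
    ```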

  12. The impact of global signal regression on resting state correlations: are anti-correlated networks introduced?

    PubMed

    Murphy, Kevin; Birn, Rasmus M; Handwerker, Daniel A; Jones, Tyler B; Bandettini, Peter A

    2009-02-01

    Low-frequency fluctuations in fMRI signal have been used to map several consistent resting state networks in the brain. Using the posterior cingulate cortex as a seed region, functional connectivity analyses have found not only positive correlations in the default mode network but negative correlations in another resting state network related to attentional processes. The interpretation is that the human brain is intrinsically organized into dynamic, anti-correlated functional networks. Global variations of the BOLD signal are often considered nuisance effects and are commonly removed using a general linear model (GLM) technique. This global signal regression method has been shown to introduce negative activation measures in standard fMRI analyses. The topic of this paper is whether such a correction technique could be the cause of anti-correlated resting state networks in functional connectivity analyses. Here we show that, after global signal regression, correlation values to a seed voxel must sum to a negative value. Simulations also show that small phase differences between regions can lead to spurious negative correlation values. A combination breath holding and visual task demonstrates that the relative phase of global and local signals can affect connectivity measures and that, experimentally, global signal regression leads to bell-shaped correlation value distributions, centred on zero. Finally, analyses of negatively correlated networks in resting state data show that global signal regression is most likely the cause of anti-correlations. These results call into question the interpretation of negatively correlated regions in the brain when using global signal regression as an initial processing step.

  13. Regression models for predicting peak and continuous three-dimensional spinal loads during symmetric and asymmetric lifting tasks.

    PubMed

    Fathallah, F A; Marras, W S; Parnianpour, M

    1999-09-01

    Most biomechanical assessments of spinal loading during industrial work have focused on estimating peak spinal compressive forces under static and sagittally symmetric conditions. The main objective of this study was to explore the potential of feasibly predicting three-dimensional (3D) spinal loading in industry from various combinations of trunk kinematics, kinetics, and subject-load characteristics. The study used spinal loading, predicted by a validated electromyography-assisted model, from 11 male participants who performed a series of symmetric and asymmetric lifts. Three classes of models were developed: (a) models using workplace, subject, and trunk motion parameters as independent variables (kinematic models); (b) models using workplace, subject, and measured moments variables (kinetic models); and (c) models incorporating workplace, subject, trunk motion, and measured moments variables (combined models). The results showed that peak 3D spinal loading during symmetric and asymmetric lifting were predicted equally well using all three types of regression models. Continuous 3D loading was predicted best using the combined models. When the use of such models is infeasible, the kinematic models can provide adequate predictions. Finally, lateral shear forces (peak and continuous) were consistently underestimated using all three types of models. The study demonstrated the feasibility of predicting 3D loads on the spine under specific symmetric and asymmetric lifting tasks without the need for collecting EMG information. However, further validation and development of the models should be conducted to assess and extend their applicability to lifting conditions other than those presented in this study. Actual or potential applications of this research include exposure assessment in epidemiological studies, ergonomic intervention, and laboratory task assessment.

  14. High-resolution daily gridded datasets of air temperature and wind speed for Europe

    NASA Astrophysics Data System (ADS)

    Brinckmann, S.; Krähenmann, S.; Bissolli, P.

    2015-08-01

    New high-resolution datasets for near-surface daily air temperature (minimum, maximum and mean) and daily mean wind speed for Europe (the CORDEX domain) are provided for the period 2001-2010 for the purpose of regional model validation in the framework of DecReg, a sub-project of the German MiKlip project, which aims to develop decadal climate predictions. The main input data sources are hourly SYNOP observations, partly supplemented by station data from the ECA&D dataset (http://www.ecad.eu). These data are quality tested to eliminate erroneous data and various kinds of inhomogeneities. Grids at a resolution of 0.044° (about 5 km) are derived by spatial interpolation of these station data over the CORDEX area. For temperature interpolation a modified version of a regression kriging method developed by Krähenmann et al. (2011) is used. First, predictor fields of altitude, continentality and zonal mean temperature are chosen for a regression applied to monthly station data. The residuals of the monthly regression and the deviations of the daily data from the monthly averages are interpolated using simple kriging in a second and third step. For wind speed a new method based on the concept used for temperature was developed, involving predictor fields of exposure, roughness length, coastal distance and ERA-Interim reanalysis wind speed at 850 hPa. Interpolation uncertainty is estimated by means of the kriging variance and regression uncertainties. Furthermore, to assess the quality of the final daily grid data, cross validation is performed. Explained variance ranges from 70 to 90 % for monthly temperature and from 50 to 60 % for monthly wind speed. The resulting RMSE for the final daily grid data amounts to 1-2 °C for the daily temperature parameters and 1-1.5 m s-1 for daily mean wind speed, depending on season. The datasets presented in this article are published at http://dx.doi.org/10.5676/DWD_CDC/DECREG0110v1.
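
    Regression kriging as described here separates a deterministic regression on predictor fields from spatial interpolation of the residuals. The sketch below is a simplified stand-in (one predictor, and a Gaussian process approximating the kriging step) with invented station data, not the paper's multi-predictor implementation.

    ```python
    # Sketch of regression kriging: linear regression on a predictor
    # field (altitude), then kriging of the residuals over space.
    # Station positions and temperatures are invented.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(8)
    n = 80
    lonlat = rng.uniform(0, 10, size=(n, 2))     # station positions, deg
    alt = rng.uniform(0, 2500, n)                # altitude, m
    # Fake monthly temperature: lapse rate + regional pattern + noise.
    temp = 15 - 0.0065*alt + 2*np.sin(lonlat[:, 0]) + rng.normal(0, 0.5, n)

    # Step 1: regression on the predictor field(s).
    reg = LinearRegression().fit(alt[:, None], temp)
    resid = temp - reg.predict(alt[:, None])

    # Step 2: kriging of the regression residuals over space.
    gp = GaussianProcessRegressor(kernel=RBF(2.0) + WhiteKernel(0.3),
                                  normalize_y=True).fit(lonlat, resid)

    # Predict at a new grid point (coordinates, altitude hypothetical).
    new_xy, new_alt = np.array([[5.0, 5.0]]), np.array([[800.0]])
    pred = reg.predict(new_alt) + gp.predict(new_xy)
    print("interpolated temperature: %.2f degC" % pred[0])
    ```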

  15. Health related quality of life in parents of six to eight year old children with Down syndrome.

    PubMed

    Marchal, Jan Pieter; Maurice-Stam, Heleen; Hatzmann, Janneke; van Trotsenburg, A S Paul; Grootenhuis, Martha A

    2013-11-01

    Raising a child with Down syndrome (DS) has been found to be associated with lowered health related quality of life (HRQoL) in the domains of cognitive functioning, social functioning, daily activities and vitality. We aimed to explore which socio-demographic, child functioning and psychosocial variables were related to these HRQoL domains in parents of children with DS. Parents of 98 children with DS completed the TNO-AZL adult quality of life questionnaire (TAAQOL) and a questionnaire assessing socio-demographic, child functioning and psychosocial predictors. Using multiple linear regression analyses for each category of predictors, we selected relevant predictors for the final models. The final multiple linear regression models revealed that cognitive functioning was best predicted by the sleep of the child (β=.29, p<.01) and by the parent having given up a hobby (β=-.29, p<.01); social functioning by the quality of the partner relation (β=.34, p<.001); daily activities by the parent having to care for an ill friend or family member (β=-.31, p<.01); and vitality by the parent having enough personal time (β=.32, p<.01). Overall, psychosocial variables rather than socio-demographics or child functioning showed the most consistent and powerful relations to the HRQoL domains of cognitive functioning, social functioning, daily activities and vitality. These psychosocial variables mainly related to social support and time pressure. Systematic screening of parents to detect problems in a timely manner, and interventions targeting the supportive network and demands on parents' time, are recommended. Copyright © 2013 Elsevier Ltd. All rights reserved.

  16. Reliability and validity of tongue color analysis in the prediction of symptom patterns in terms of East Asian Medicine.

    PubMed

    Park, Young-Jae; Lee, Jin-Moo; Yoo, Seung-Yeon; Park, Young-Bae

    2016-04-01

    To examine whether color parameters of tongue inspection (TI) using a digital camera are reliable and valid, and to examine which color parameters serve as predictors of symptom patterns in terms of East Asian medicine (EAM). Two hundred female subjects' tongue substances were photographed with a megapixel digital camera. Together with the photographs, the subjects were asked to complete Yin deficiency, Phlegm pattern, and Cold-Heat pattern questionnaires. Using three sets of digital imaging software, each digital image was exposure- and white-balance-corrected, and finally the L* (luminance), a* (red-green balance), and b* (yellow-blue balance) values of the tongues were calculated. To examine the intra- and inter-rater reliabilities and criterion validity of the color analysis method, three raters were asked to calculate color parameters for 20 digital image samples. Finally, four hierarchical regression models were formed. Color parameters showed good or excellent reliability (0.627-0.887 for intra-class correlation coefficients) and significant criterion validity (0.523-0.718 for Spearman's correlation). In the hierarchical regression models, age was a significant predictor of Yin deficiency (β = 0.192), and the b* value of the tip of the tongue was a determinant predictor of the Yin deficiency, Phlegm, and Heat patterns (β = -0.212, -0.172, and -0.163, respectively). Luminance (L*) was predictive of the Yin deficiency (β = -0.172) and Cold (β = 0.173) patterns. Our results suggest that color analysis of the tongue using the L*a*b* system is reliable and valid, and that color parameters partially serve as symptom pattern predictors in EAM practice.

  17. Regression modeling of ground-water flow

    USGS Publications Warehouse

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to give the student practice with nearly all of the methods presented for modeling and statistical analysis. Three computer programs implement the more complex methods: a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium; a program to calculate a measure of model nonlinearity with respect to the regression parameters; and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  18. The impact of truant and alcohol-related behavior on educational aspirations: a study of US high school seniors.

    PubMed

    Barry, Adam E; Chaney, Beth; Chaney, J Don

    2011-08-01

    Truancy and alcohol use are quality indicators of academic achievement and success. However, there remains a paucity of substantive research articulating the impact these deviant behaviors have on an adolescent's educational aspirations. The purpose of this study is to assess whether recent alcohol use and truancy impact students' educational aspirations among a nationally representative sample of US high school seniors. This study conducted a secondary data analysis of the Monitoring the Future project data, 2006. Logistic regression was conducted to assess how alcohol use and truancy affected educational aspirations. Subsequent interaction effects were assessed in the final multivariable model. Demographic variables such as age, sex, race, and father and mother's educational level were included as covariates in the regression model. Results indicate that as students engage in increased alcohol use and/or truancy, educational aspirations decrease. Thus, students who indicated a desire to attend a 4-year college/university were less likely to engage in high-risk drinking behavior and/or truancy. Moreover, in testing the interaction between truancy and alcohol use, as it relates to educational aspirations, the logistic regression model found both of these independent variables to be statistically significant predictors of the likelihood students would attend a 4-year college/university. To ensure that adolescents further their education and maximize their potential life opportunities, school and public health officials should initiate efforts to reduce alcohol consumption and truancy among students. Furthermore, future research should examine the risk and protective factors that may influence one's educational aspirations. © 2011, American School Health Association.

  19. A weighted least squares estimation of the polynomial regression model on paddy production in the area of Kedah and Perlis

    NASA Astrophysics Data System (ADS)

    Musa, Rosliza; Ali, Zalila; Baharum, Adam; Nor, Norlida Mohd

    2017-08-01

    The linear regression model assumes that all random error components are identically and independently distributed with constant variance. Hence, each data point provides equally precise information about the deterministic part of the total variation. In other words, the standard deviations of the error terms are constant over all values of the predictor variables. When the assumption of constant variance is violated, the ordinary least squares estimator of the regression coefficients loses its property of minimum variance in the class of linear unbiased estimators. Weighted least squares estimation is often used to maximize the efficiency of parameter estimation. A procedure that treats all of the data equally would give less precisely measured points more influence than they should have and highly precise points too little influence. Optimizing the weighted fitting criterion to find the parameter estimates allows the weights to determine the contribution of each observation to the final parameter estimates. This study used a polynomial model with weighted least squares estimation to investigate the paddy production of different paddy lots based on paddy cultivation characteristics and environmental characteristics in the area of Kedah and Perlis. The results indicated that the factors affecting paddy production are the mixture fertilizer application cycle, average temperature, the squared effect of average rainfall, the squared effect of pest and disease, the interaction between acreage and amount of mixture fertilizer, the interaction between paddy variety and NPK fertilizer application cycle, and the interaction between pest and disease and NPK fertilizer application cycle.
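
    The weighted least squares idea described above is directly available in statsmodels: observations are weighted by the inverse of their error variance. A sketch on simulated heteroscedastic data with a quadratic term, using a known variance function for clarity (in practice it must be estimated, e.g. from residuals):

    ```python
    # Sketch: WLS for a polynomial model when error variance grows with
    # the predictor. Variables are generic stand-ins, not the paddy data.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(9)
    n = 100
    x = rng.uniform(0, 10, n)                  # e.g. a rainfall index
    sigma = 0.2 + 0.15*x                       # error sd increases with x
    y = 1 + 0.8*x - 0.05*x**2 + rng.normal(0, sigma)

    X = sm.add_constant(np.column_stack([x, x**2]))
    ols = sm.OLS(y, X).fit()
    # Weights inversely proportional to the (known, here) error variance.
    wls = sm.WLS(y, X, weights=1.0/sigma**2).fit()
    print("OLS:", ols.params.round(3), "se:", ols.bse.round(3))
    print("WLS:", wls.params.round(3), "se:", wls.bse.round(3))
    ```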

  20. An Update of the Bodeker Scientific Vertically Resolved, Global, Gap-Free Ozone Database

    NASA Astrophysics Data System (ADS)

    Kremser, S.; Bodeker, G. E.; Lewis, J.; Hassler, B.

    2016-12-01

    High vertical resolution ozone measurements from multiple satellite-based instruments have been merged with measurements from the global ozonesonde network to calculate monthly mean ozone values in 5º latitude zones. Ozone number densities and ozone mixing ratios are provided on 70 altitude levels (1 to 70 km) and on 70 pressure levels spaced approximately 1 km apart (878.4 hPa to 0.046 hPa). These data are sparse and do not cover the entire globe or altitude range. To provide a gap-free database, a least squares regression model is fitted to these data and then evaluated globally. By applying a single fit at each level, and using the approach of allowing the regression fits to change only slightly from one level to the next, the regression is less sensitive to measurement anomalies at individual stations or to individual satellite-based instruments. Particular attention is paid to ensuring that the low ozone abundances in the polar regions are captured. This presentation reports on updates to an earlier version of the vertically resolved ozone database, including the incorporation of new ozone measurements and new techniques for combining the data. Compared to previous versions of the database, particular attention is paid to avoiding spatial and temporal sampling biases and tracing uncertainties through to the final product. This updated database, developed within the New Zealand Deep South National Science Challenge, is suitable for assessing ozone fields from chemistry-climate model simulations or for providing the ozone boundary conditions for global climate model simulations that do not treat stratospheric chemistry interactively.

  1. Application of machine learning for the evaluation of turfgrass plots using aerial images

    NASA Astrophysics Data System (ADS)

    Ding, Ke; Raheja, Amar; Bhandari, Subodh; Green, Robert L.

    2016-05-01

    Historically, investigation of turfgrass characteristics has been limited to visual ratings. Although relevant information may result from such evaluations, final inferences may be questionable because of the subjective manner in which the data are collected. Recent advances in computer vision techniques allow researchers to objectively measure turfgrass characteristics such as percent ground cover, turf color, and turf quality from digital images. This paper focuses on developing a methodology for automated assessment of turfgrass quality from aerial images. Images of several turfgrass plots of varying quality were gathered using a camera mounted on an unmanned aerial vehicle. The quality of these plots was also evaluated by visual ratings. The goal was to use the aerial images to generate quality evaluations on a regular basis for the optimization of water treatment. The aerial images are used to train a neural network, a nonlinear classifier commonly used in machine learning, with features such as intensity, color, and texture extracted from the images. The output of the trained neural network model is a rating of the grass, which is compared with the visual ratings. Currently, the quality and the color of turfgrass, measured as the greenness of the grass, are evaluated. Textures are calculated using Gabor filters and the co-occurrence matrix. Other classifiers, such as support vector machines, and simpler linear regression models, such as Ridge regression and LARS regression, are also used, and the performance of each model is compared. The results show encouraging potential for using machine learning techniques for the evaluation of turfgrass quality and color.

  2. Heterogeneous models for an early discrimination between sepsis and non-infective SIRS in medical ward patients: a pilot study.

    PubMed

    Mearelli, Filippo; Fiotti, Nicola; Altamura, Nicola; Zanetti, Michela; Fernandes, Giovanni; Burekovic, Ismet; Occhipinti, Alessandro; Orso, Daniele; Giansante, Carlo; Casarsa, Chiara; Biolo, Gianni

    2014-10-01

    The objective of the study was to determine the accuracy of phospholipase A2 group II (PLA2-II), interferon-gamma-inducible protein 10 (IP-10), angiopoietin-2 (Ang-2), and procalcitonin (PCT) plasma levels in early ruling in/out of sepsis among systemic inflammatory response syndrome (SIRS) patients. Biomarker levels were determined in 80 SIRS patients during the first 4 h of admission to the medical ward. The final diagnosis of sepsis or non-infective SIRS was issued according to good clinical practice. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for sepsis diagnosis were assessed. The optimal combinations of biomarkers with clinical variables were investigated by logistic regression and decision tree (CART) analysis. PLA2-II, IP-10 and PCT, but not Ang-2, were significantly higher in septic (n = 60) than in non-infective SIRS (n = 20) patients (P ≤ 0.001, 0.027, and 0.002, respectively). The PPV and NPV of PLA2-II were 88 and 86%, respectively. The corresponding figures were 100 and 31% for IP-10, and 93 and 35% for PCT. The binary logistic regression model had 100% PPV and NPV, while the manual and software-generated CART models reached overall accuracies of 95 and 98%, respectively, both with 100% NPV. PLA2-II and IP-10, combined with clinical variables in regression or decision tree heterogeneous models, may be valuable biomarkers for sepsis diagnosis in SIRS patients admitted to the medical ward (MW). Further studies are needed to introduce them into clinical practice.

  3. The Plumbing of Land Surface Models: Is Poor Performance a Result of Methodology or Data Quality?

    NASA Technical Reports Server (NTRS)

    Haughton, Ned; Abramowitz, Gab; Pitman, Andy J.; Or, Dani; Best, Martin J.; Johnson, Helen R.; Balsamo, Gianpaolo; Boone, Aaron; Cuntz, Matthais; Decharme, Bertrand

    2016-01-01

    The PALS Land sUrface Model Benchmarking Evaluation pRoject (PLUMBER) illustrated the value of prescribing a priori performance targets in model intercomparisons. It showed that the performance of turbulent energy flux predictions from different land surface models (LSMs), at a broad range of flux tower sites using common evaluation metrics, was on average worse than that of relatively simple empirical models. For sensible heat fluxes, all land surface models were outperformed by a linear regression against downward shortwave radiation. For latent heat flux, all land surface models were outperformed by a regression against downward shortwave radiation, surface air temperature and relative humidity. These results are explored here in greater detail and possible causes are investigated. We examine whether particular metrics or sites unduly influence the collated results, whether results change according to time-scale aggregation, and whether a lack of energy conservation in flux tower data gives the empirical models an unfair advantage in the intercomparison. We demonstrate that energy conservation in the observational data is not responsible for these results. We also show that the partitioning between sensible and latent heat fluxes in LSMs, rather than the calculation of available energy, is the cause of the original findings. Finally, we present evidence suggesting that the nature of this partitioning problem is likely shared among all contributing LSMs. While we do not find a single candidate explanation for why land surface models perform poorly relative to empirical benchmarks in PLUMBER, we do exclude multiple possible explanations and provide guidance on where future research should focus.

  4. Improving stability of prediction models based on correlated omics data by using network approaches.

    PubMed

    Tissier, Renaud; Houwing-Duistermaat, Jeanine; Rodríguez-Girondo, Mar

    2018-01-01

    Building prediction models based on complex omics datasets such as transcriptomics, proteomics and metabolomics remains a challenge in bioinformatics and biostatistics. Regularized regression techniques are typically used to deal with the high dimensionality of these datasets. However, due to the presence of correlation in the datasets, it is difficult to select the best model, and application of these methods yields unstable results. We propose a novel strategy for model selection in which the obtained models also perform well in terms of overall predictability. Several three-step approaches are considered, where the steps are 1) network construction, 2) clustering to empirically derive modules or pathways, and 3) building a prediction model incorporating the information on the modules. For the first step, we use weighted correlation networks and Gaussian graphical modelling. Identification of groups of features is performed by hierarchical clustering. The grouping information is included in the prediction model by using group-based variable selection or group-specific penalization. We compare the performance of our new approaches with standard regularized regression via simulations. Based on these results we provide recommendations for selecting a strategy for building a prediction model, given the specific goal of the analysis and the sizes of the datasets. Finally, we illustrate the advantages of our approach by applying the methodology to two problems: prediction of body mass index in the DIetary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome study (DILGOM), and prediction of the response of breast cancer cell lines to treatment with specific drugs using a breast cancer cell line pharmacogenomics dataset.
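
    The three-step strategy described here can be sketched compactly: build a correlation network, cluster features into modules, and feed module summaries to a penalized regression. The sketch below simplifies each step (absolute-correlation distance, average-linkage clustering, module means into ridge regression) on synthetic omics-like data; it is an illustration of the strategy, not the paper's implementation.

    ```python
    # Sketch: network -> modules -> module-summary penalized regression.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from sklearn.linear_model import RidgeCV

    rng = np.random.default_rng(14)
    n, p, n_mod = 100, 60, 6
    latent = rng.normal(size=(n, n_mod))            # module drivers
    X = np.repeat(latent, p // n_mod, axis=1) + 0.7*rng.normal(size=(n, p))
    y = latent[:, 0] - 0.8*latent[:, 1] + 0.3*rng.normal(size=n)

    # 1) correlation "network" and 2) hierarchical clustering into modules.
    dist = 1 - np.abs(np.corrcoef(X.T))
    Z = linkage(dist[np.triu_indices(p, 1)], method="average")
    modules = fcluster(Z, t=n_mod, criterion="maxclust")

    # 3) summarize each module by its feature mean and fit a ridge model.
    M = np.column_stack([X[:, modules == k].mean(axis=1)
                         for k in range(1, modules.max() + 1)])
    model = RidgeCV().fit(M, y)
    print("module coefficients:", model.coef_.round(2))
    ```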

  5. The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring

    ERIC Educational Resources Information Center

    Haberman, Shelby J.; Sinharay, Sandip

    2010-01-01

    Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
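
    A cumulative logit model treats the essay score as ordinal rather than continuous. The sketch below fits one with statsmodels' OrderedModel on invented essay features; it illustrates the model class, not the scoring program in the article.

    ```python
    # Sketch: cumulative logit (proportional-odds) model for ordinal
    # essay scores. Features and scores are simulated.
    import numpy as np
    import pandas as pd
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    rng = np.random.default_rng(10)
    n = 400
    df = pd.DataFrame({
        "length": rng.normal(0, 1, n),     # standardized essay features
        "grammar": rng.normal(0, 1, n),
    })
    latent = 1.2 * df.length + 0.8 * df.grammar + rng.logistic(size=n)
    df["score"] = pd.cut(latent, [-np.inf, -1, 1, 3, np.inf],
                         labels=[1, 2, 3, 4]).astype(int)

    # P(score <= k) is modelled with common slopes and k-specific cutpoints.
    res = OrderedModel(df["score"], df[["length", "grammar"]],
                       distr="logit").fit(method="bfgs", disp=False)
    print(res.params)                      # slopes followed by cutpoints

    probs = np.asarray(res.predict(df[["length", "grammar"]]))
    print("first essay's category probabilities:", probs[0].round(3))
    ```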

  6. A Development of Nonstationary Regional Frequency Analysis Model with Large-scale Climate Information: Its Application to Korean Watershed

    NASA Astrophysics Data System (ADS)

    Kim, Jin-Young; Kwon, Hyun-Han; Kim, Hung-Soo

    2015-04-01

    The existing regional frequency analysis has the disadvantage that it is difficult to consider geographical characteristics when estimating areal rainfall. In this regard, this study aims to develop a hierarchical-Bayesian-model-based nonstationary regional frequency analysis in which spatial patterns of the design rainfall are explicitly informed by geographical information (e.g. latitude, longitude and altitude). This study assumes that the parameters of the Gumbel (or GEV) distribution are a function of geographical characteristics within a general linear regression framework. Posterior distributions of the regression parameters are estimated by the Bayesian Markov chain Monte Carlo (MCMC) method, and the identified functional relationship is used to spatially interpolate the parameters of the distributions, using digital elevation models (DEMs) as inputs. The proposed model is applied to derive design rainfalls over the entire Han River watershed. It was found that the proposed Bayesian regional frequency analysis model showed results similar to those of L-moment based regional frequency analysis. In addition, the model showed an advantage in terms of quantifying the uncertainty of the design rainfall and estimating areal rainfall considering geographical information. Finally, a comprehensive discussion of design rainfall in the context of nonstationarity will be presented. KEYWORDS: Regional frequency analysis, Nonstationary, Spatial information, Bayesian. Acknowledgement: This research was supported by a grant (14AWMP-B082564-01) from the Advanced Water Management Research Program funded by the Ministry of Land, Infrastructure and Transport of the Korean government.
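
    The core modelling idea, distribution parameters expressed as regression functions of geographic covariates, can be sketched with plain maximum likelihood; the paper's hierarchical Bayesian MCMC estimation is beyond this illustration. Altitude, rainfall values, and coefficients below are simulated.

    ```python
    # Sketch: Gumbel location parameter as a linear function of altitude,
    # estimated by maximum likelihood. Simulated annual maxima.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(11)
    n = 300
    alt = rng.uniform(0, 1500, n)                # station altitude, m
    mu_true = 60 + 0.02*alt                      # location grows with altitude
    rain = rng.gumbel(loc=mu_true, scale=12.0)   # annual max rainfall, mm

    def negloglik(theta):
        # Gumbel NLL with location mu = a + b*alt and scale exp(log_scale).
        a, b, log_scale = theta
        mu, beta = a + b*alt, np.exp(log_scale)
        z = (rain - mu) / beta
        return np.sum(np.log(beta) + z + np.exp(-z))

    res = minimize(negloglik, x0=[50.0, 0.0, np.log(10.0)],
                   method="Nelder-Mead")
    a, b, log_scale = res.x
    print("location = %.1f + %.4f * altitude, scale = %.1f"
          % (a, b, np.exp(log_scale)))
    ```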

  7. Use of a machine learning framework to predict substance use disorder treatment success

    PubMed Central

    Acion, Laura; Kelmansky, Diana; van der Laan, Mark; Sahker, Ethan; Jones, DeShauna; Arndt, Stephan

    2017-01-01

    There are several methods for building prediction models. The wealth of currently available modeling techniques usually forces the researcher to judge, a priori, what will likely be the best method. Super learning (SL) is a methodology that facilitates this decision by combining all identified prediction algorithms pertinent for a particular prediction problem. SL generates a final model that is at least as good as any of the other models considered for predicting the outcome. The overarching aim of this work is to introduce SL to analysts and practitioners. This work compares the performance of logistic regression, penalized regression, random forests, deep learning neural networks, and SL to predict successful substance use disorders (SUD) treatment. A nationwide database including 99,013 SUD treatment patients was used. All algorithms were evaluated using the area under the receiver operating characteristic curve (AUC) in a test sample that was not included in the training sample used to fit the prediction models. AUC for the models ranged between 0.793 and 0.820. SL was superior to all but one of the algorithms compared. An explanation of SL steps is provided. SL is the first step in targeted learning, an analytic framework that yields double robust effect estimation and inference with fewer assumptions than the usual parametric methods. Different aspects of SL depending on the context, its function within the targeted learning framework, and the benefits of this methodology in the addiction field are discussed. PMID:28394905
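
    Super learning is closely related to stacked generalization, for which scikit-learn provides StackingClassifier: base learners are cross-validated and combined by a meta-learner. The sketch below uses that relative on synthetic data; it is not the SuperLearner software or the paper's exact algorithm library.

    ```python
    # Sketch: a super-learning-style ensemble via stacked generalization,
    # compared with its base learners by test-set AUC. Synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0)

    base = [
        ("logit", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ]
    stack = StackingClassifier(estimators=base,
                               final_estimator=LogisticRegression(),
                               cv=5, stack_method="predict_proba")
    stack.fit(X_tr, y_tr)

    for name, est in base:
        p = est.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
        print("%s AUC: %.3f" % (name, roc_auc_score(y_te, p)))
    print("stack AUC: %.3f"
          % roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1]))
    ```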

  8. Use of a machine learning framework to predict substance use disorder treatment success.

    PubMed

    Acion, Laura; Kelmansky, Diana; van der Laan, Mark; Sahker, Ethan; Jones, DeShauna; Arndt, Stephan

    2017-01-01

    There are several methods for building prediction models. The wealth of currently available modeling techniques usually forces the researcher to judge, a priori, what will likely be the best method. Super learning (SL) is a methodology that facilitates this decision by combining all identified prediction algorithms pertinent for a particular prediction problem. SL generates a final model that is at least as good as any of the other models considered for predicting the outcome. The overarching aim of this work is to introduce SL to analysts and practitioners. This work compares the performance of logistic regression, penalized regression, random forests, deep learning neural networks, and SL to predict successful substance use disorders (SUD) treatment. A nationwide database including 99,013 SUD treatment patients was used. All algorithms were evaluated using the area under the receiver operating characteristic curve (AUC) in a test sample that was not included in the training sample used to fit the prediction models. AUC for the models ranged between 0.793 and 0.820. SL was superior to all but one of the algorithms compared. An explanation of SL steps is provided. SL is the first step in targeted learning, an analytic framework that yields double robust effect estimation and inference with fewer assumptions than the usual parametric methods. Different aspects of SL depending on the context, its function within the targeted learning framework, and the benefits of this methodology in the addiction field are discussed.

  9. Predonation Volume of Future Remnant Cortical Kidney Helps Predict Postdonation Renal Function in Live Kidney Donors.

    PubMed

    Fananapazir, Ghaneh; Benzl, Robert; Corwin, Michael T; Chen, Ling-Xin; Sageshima, Junichiro; Stewart, Susan L; Troppmann, Christoph

    2018-07-01

    Purpose To determine whether the predonation computed tomography (CT)-based volume of the future remnant kidney is predictive of postdonation renal function in living kidney donors. Materials and Methods This institutional review board-approved, retrospective, HIPAA-compliant study included 126 live kidney donors who had undergone predonation renal CT between January 2007 and December 2014 as well as 2-year postdonation measurement of estimated glomerular filtration rate (eGFR). The whole kidney volume and cortical volume of the future remnant kidney were measured and standardized for body surface area (BSA). Bivariate linear associations of the whole kidney volume-to-BSA and cortical volume-to-BSA ratios with 2-year postdonation eGFR were obtained. A linear regression model for 2-year postdonation eGFR that incorporated donor age, sex, and either the whole kidney volume-to-BSA ratio or the cortical volume-to-BSA ratio was created, and the coefficient of determination (R2) for the model was calculated. Factors not statistically additive in assessing 2-year eGFR were removed by using backward elimination, and the coefficient of determination for this parsimonious model was calculated. Results Correlation was slightly better for the cortical volume-to-BSA ratio than for the whole kidney volume-to-BSA ratio (r = 0.48 vs r = 0.44, respectively). The linear regression model incorporating all donor factors had an R2 of 0.66. The only factors that were significantly additive to the equation were the cortical volume-to-BSA ratio and predonation eGFR (P = .01 and P < .01, respectively), and the final parsimonious linear regression model incorporating these two variables explained almost the same amount of variance (R2 = 0.65) as did the full model. Conclusion The cortical volume of the future remnant kidney helped predict postdonation eGFR at 2 years. The cortical volume-to-BSA ratio should thus be considered for addition as an important variable to living kidney donor evaluation and selection guidelines. © RSNA, 2018.

  10. New models to predict depth of infiltration in endometrial carcinoma based on transvaginal sonography.

    PubMed

    De Smet, F; De Brabanter, J; Van den Bosch, T; Pochet, N; Amant, F; Van Holsbeke, C; Moerman, P; De Moor, B; Vergote, I; Timmerman, D

    2006-06-01

    Preoperative knowledge of the depth of myometrial infiltration is important in patients with endometrial carcinoma. This study aimed at assessing the value of histopathological parameters obtained from an endometrial biopsy (Pipelle de Cornier; results available preoperatively) and ultrasound measurements obtained after transvaginal sonography with color Doppler imaging in the preoperative prediction of the depth of myometrial invasion, as determined by the final histopathological examination of the hysterectomy specimen (the gold standard). We first collected ultrasound and histopathological data from 97 consecutive women with endometrial carcinoma and divided them into two groups according to surgical stage (Stages Ia and Ib vs. Stages Ic and higher). The areas (AUC) under the receiver-operating characteristics curves of the subjective assessment of depth of invasion by an experienced gynecologist and of the individual ultrasound parameters were calculated. Subsequently, we used these variables to train a logistic regression model and least squares support vector machines (LS-SVM) with linear and RBF (radial basis function) kernels. Finally, these models were validated prospectively on data from 76 new patients in order to make a preoperative prediction of the depth of invasion. Of all ultrasound parameters, the ratio of the endometrial and uterine volumes had the largest AUC (78%), while that of the subjective assessment was 79%. The AUCs of the blood flow indices were low (range, 51-64%). Stepwise logistic regression selected the degree of differentiation, the number of fibroids, the endometrial thickness and the volume of the tumor. Compared with the AUC of the subjective assessment (72%), prospective evaluation of the mathematical models resulted in a higher AUC for the LS-SVM model with an RBF kernel (77%), but this difference was not significant. Single morphological parameters do not improve the predictive power when compared with the subjective assessment of depth of myometrial invasion of endometrial cancer, and blood flow indices do not contribute to the prediction of stage. In this study an LS-SVM model with an RBF kernel gave the best prediction; while this might be more reliable than subjective assessment, confirmation by larger prospective studies is required. Copyright 2006 ISUOG. Published by John Wiley & Sons, Ltd.

  11. Moderation analysis using a two-level regression model.

    PubMed

    Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott

    2014-10-01

    Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.

  12. Surrogate Model Application to the Identification of Optimal Groundwater Exploitation Scheme Based on Regression Kriging Method—A Case Study of Western Jilin Province

    PubMed Central

    An, Yongkai; Lu, Wenxi; Cheng, Weiguo

    2015-01-01

    This paper introduces a surrogate model to identify an optimal groundwater exploitation scheme; western Jilin province was selected as the study area. A numerical simulation model of groundwater flow was established first, and four exploitation wells were sited in Tongyu county and Qian Gorlos county to supply water to Daan county. Second, the Latin Hypercube Sampling (LHS) method was used to collect data in the feasible region for the input variables. A surrogate model of the numerical simulation model of groundwater flow was developed using the regression kriging method. An optimization model was established to search for an optimal groundwater exploitation scheme, using the minimum average drawdown of the groundwater table and the minimum cost of groundwater exploitation as multi-objective functions. Finally, the surrogate model was invoked by the optimization model in the process of solving the optimization problem. Results show that the relative error and root mean square error of the groundwater table drawdown between the simulation model and the surrogate model for 10 validation samples are both lower than 5%, which is a high approximation accuracy. Solving the same optimization problem, the surrogate-based simulation optimization model needed only 5.5 hours, whereas the conventional simulation optimization model needed 25 days. The above results indicate that the surrogate model developed in this study can not only considerably reduce the computational burden of the simulation optimization process, but also maintain high computational accuracy. This can thus provide an effective method for identifying an optimal groundwater exploitation scheme quickly and accurately. PMID:26264008
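
    A minimal sketch of the LHS-plus-kriging surrogate idea follows, with the groundwater flow simulator replaced by a toy function and scikit-learn's Gaussian process standing in for regression kriging; these are assumptions for illustration, not the paper's code.

    ```python
    import numpy as np
    from scipy.stats import qmc
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def simulator(x):
        """Toy stand-in for the numerical groundwater flow model."""
        return np.sin(x[:, 0]) + 0.5 * x[:, 1] ** 2

    # Latin Hypercube samples over a 2-D feasible region of pumping rates
    sampler = qmc.LatinHypercube(d=2, seed=0)
    X = qmc.scale(sampler.random(n=40), [0.0, 0.0], [3.0, 2.0])
    y = simulator(X)

    surrogate = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, y)

    # Validate the surrogate on held-out samples, cf. the <5% error criterion above
    X_val = qmc.scale(sampler.random(n=10), [0.0, 0.0], [3.0, 2.0])
    rmse = np.sqrt(np.mean((surrogate.predict(X_val) - simulator(X_val)) ** 2))
    print(rmse)
    ```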

  13. The microcomputer scientific software series 2: general linear model--regression.

    Treesearch

    Harold M. Rauscher

    1983-01-01

    The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...

  14. Comparative analysis on the probability of being a good payer

    NASA Astrophysics Data System (ADS)

    Mihova, V.; Pavlov, V.

    2017-10-01

    Credit risk assessment is crucial for the bank industry. Current practice uses various approaches for the calculation of credit risk. The core of these approaches is the use of multiple regression models, applied in order to assess the risk associated with the approval of people applying for certain products (loans, credit cards, etc.). Based on data from the past, these models try to predict what will happen in the future. Different data require different types of models. This work studies the causal link between an applicant's repayment behaviour and the data provided at the time of application. A database of 100 borrowers from a commercial bank is used for the purposes of the study. The available data include information from the time of application and the credit history while paying off the loan. Customers are divided into two groups based on the credit history: good and bad payers. Linear and logistic regression are applied in parallel to the data in order to estimate the probability that a new borrower will be a good payer. A variable, which contains a value of 1 for good borrowers and a value of 0 for bad candidates, is modeled as the dependent variable. To decide which of the variables listed in the database should be used in the modelling process (as independent variables), a correlation analysis is made. Based on its results, several combinations of independent variables are tested as initial models, both with linear and logistic regression. The best linear and logistic models are obtained after an initial transformation of the data and following a set of standard and robust statistical criteria. A comparative analysis between the two final models is made, and scorecards are obtained from both models to assess new customers at the time of application. A cut-off level of points, below which applications are rejected and above which they are accepted, has been suggested for both models, applying the strategy of keeping the same accept rate as in the current data.
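
    The parallel fit of a linear and a logistic model on a good/bad payer flag can be sketched as below; the three features and the 70% accept rate are invented for illustration.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 3))                        # e.g. income, age, loan size
    good = (X @ np.array([1.0, 0.5, -0.8]) + rng.normal(0, 1, 100) > 0).astype(int)

    p_linear = LinearRegression().fit(X, good).predict(X).clip(0, 1)
    p_logistic = LogisticRegression().fit(X, good).predict_proba(X)[:, 1]

    # Choose a cut-off that preserves the current accept rate (say 70%)
    cutoff = np.quantile(p_logistic, 0.30)
    accept = p_logistic >= cutoff
    print(cutoff, accept.mean())
    ```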

  15. Risk factors for displaced abomasum or ketosis in Swedish dairy herds.

    PubMed

    Stengärde, L; Hultgren, J; Tråvén, M; Holtenius, K; Emanuelson, U

    2012-03-01

    Risk factors associated with high or low long-term incidence of displaced abomasum (DA) or clinical ketosis were studied in 60 Swedish dairy herds, using multivariable logistic regression modelling. Forty high-incidence herds were included as cases and 20 low-incidence herds as controls. Incidence rates were calculated based on veterinary records of clinical diagnoses. During the 3-year period preceding the herd classification, herds with a high incidence had a disease incidence of DA or clinical ketosis above the 3rd quartile in a national database for disease recordings. Control herds had no cows with DA or clinical ketosis. All herds were visited during the housing period and herdsmen were interviewed about management routines, housing, feeding, milk yield, and herd health. Target groups were heifers in late gestation, dry cows, and cows in early lactation. Univariable logistic regression was used to screen for factors associated with being a high-incidence herd. A multivariable logistic regression model was built using stepwise regression. A higher maximum daily milk yield in multiparous cows and a large herd size (p=0.054 and p=0.066, respectively) tended to be associated with being a high-incidence herd. Not cleaning the heifer feeding platform daily increased the odds of having a high-incidence herd twelvefold (p<0.01). Keeping cows in only one group in the dry period increased the odds of having a high-incidence herd eightfold (p=0.03). Herd size was confounded with housing system. Housing system was therefore added to the final logistic regression model. In conclusion, a large herd size, a high maximum daily milk yield, keeping dry cows in one group, and not cleaning the feeding platform daily appear to be important risk factors for a high incidence of DA or clinical ketosis in Swedish dairy herds. These results confirm the importance of housing, management and feeding in the prevention of metabolic disorders in dairy cows around parturition and in early lactation. Copyright © 2011 Elsevier B.V. All rights reserved.

  16. Climate variations and salmonellosis transmission in Adelaide, South Australia: a comparison between regression models

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Bi, Peng; Hiller, Janet

    2008-01-01

    This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was also inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggest that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.
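
    A sketch of the SARIMA-with-climate-covariates fit is shown below using statsmodels; the monthly synthetic series, the lagged temperature effect, and the (1,0,1)x(1,0,0,12) orders are illustrative assumptions only.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    idx = pd.date_range("1990-01-01", periods=168, freq="MS")    # 1990-2003, monthly
    month = idx.month.to_numpy()
    temp = 20 + 8 * np.sin(2 * np.pi * month / 12) + rng.normal(0, 1, 168)
    cases = 30 + 1.5 * np.roll(temp, 1) + rng.normal(0, 5, 168)  # lagged temperature effect

    model = sm.tsa.SARIMAX(cases, exog=temp, order=(1, 0, 1),
                           seasonal_order=(1, 0, 0, 12)).fit(disp=False)
    print(model.aic)   # goodness of fit, comparable across the candidate models
    ```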

  17. Self-efficacy is independently associated with brain volume in older women.

    PubMed

    Davis, Jennifer C; Nagamatsu, Lindsay S; Hsu, Chun Liang; Beattie, B Lynn; Liu-Ambrose, Teresa

    2012-07-01

    Ageing is highly associated with neurodegeneration and atrophy of the brain. Evidence suggests that personality variables are risk factors for reduced brain volume. We examine whether falls-related self-efficacy is independently associated with brain volume. A cross-sectional analysis of whether falls-related self-efficacy is independently associated with brain volumes (total, grey and white matter). Three multivariate regression models were constructed. Covariates included in the models were age, global cognition, systolic blood pressure, functional comorbidity index and current physical activity level. MRI scans were acquired from 79 community-dwelling senior women aged 65-75 years. Falls-related self-efficacy was assessed by the Activities-specific Balance Confidence (ABC) scale. After accounting for covariates, falls-related self-efficacy was independently associated with both total brain volume and total grey matter volume. The final model for total brain volume accounted for 17% of the variance, with the ABC score accounting for 8%. For total grey matter volume, the final model accounted for 24% of the variance, with the ABC score accounting for 10%. We provide novel evidence that falls-related self-efficacy, a modifiable risk factor for healthy ageing, is positively associated with total brain volume and total grey matter volume. ClinicalTrials.gov Identifier: NCT00426881.

  18. Self-efficacy is independently associated with brain volume in older women

    PubMed Central

    Davis, Jennifer C.; Nagamatsu, Lindsay S.; Hsu, Chun Liang; Beattie, B. Lynn; Liu-Ambrose, Teresa

    2015-01-01

    Background Aging is highly associated with neurodegeneration and atrophy of the brain. Evidence suggests that personality variables are risk factors for reduced brain volume. We examine whether falls-related self-efficacy is independently associated with brain volume. Method A cross-sectional analysis of whether falls-related self-efficacy is independently associated with brain volumes (total, grey, and white matter). Three multivariate regression models were constructed. Covariates included in the models were age, global cognition, systolic blood pressure, functional comorbidity index, and current physical activity level. MRI scans were acquired from 79 community-dwelling senior women aged 65 to 75 years old. Falls-related self-efficacy was assessed by the Activities Specific Balance Confidence (ABC) Scale. Results After accounting for covariates, falls-related self-efficacy was independently associated with both total brain volume and total grey matter volume. The final model for total brain volume accounted for 17% of the variance, with the ABC score accounting for 8%. For total grey matter volume, the final model accounted for 24% of the variance, with the ABC score accounting for 10%. Conclusion We provide novel evidence that falls-related self-efficacy, a modifiable risk factor for healthy aging, is positively associated with total brain volume and total grey matter volume. Trial Registration ClinicalTrials.gov Identifier: NCT00426881. PMID:22436405

  19. Stochastic search, optimization and regression with energy applications

    NASA Astrophysics Data System (ADS)

    Hannah, Lauren A.

    Designing clean energy systems will be an important task over the next few decades. One of the major roadblocks is a lack of mathematical tools to economically evaluate those energy systems. However, solutions to these mathematical problems are also of interest to the operations research and statistical communities in general. This thesis studies three problems that are of interest to the energy community itself or provide support for solution methods: R&D portfolio optimization, nonparametric regression and stochastic search with an observable state variable. First, we consider the one-stage R&D portfolio optimization problem to avoid the sequential decision process associated with the multi-stage problem. The one-stage problem is still difficult because of a non-convex, combinatorial decision space and a non-convex objective function. We propose a heuristic solution method that uses marginal project values---which depend on the selected portfolio---to create a linear objective function. In conjunction with the 0-1 decision space, this new problem can be solved as a knapsack linear program. This method scales well to large decision spaces. We also propose an alternate, provably convergent algorithm that does not exploit problem structure. These methods are compared on a solid oxide fuel cell R&D portfolio problem. Next, we propose Dirichlet process mixtures of generalized linear models (DP-GLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and responses that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate. We also give examples for when those conditions hold, including models for compactly supported continuous distributions and a model with continuous covariates and categorical response. We empirically analyze the properties of the DP-GLM and why it provides better results than existing Dirichlet process mixture regression models. We evaluate the DP-GLM on several data sets, comparing it to modern methods of nonparametric regression like CART, Bayesian trees and Gaussian processes. Compared to existing techniques, the DP-GLM provides a single model (and corresponding inference algorithms) that performs well in many regression settings. Finally, we study convex stochastic search problems where a noisy objective function value is observed after a decision is made. There are many stochastic search problems whose behavior depends on an exogenous state variable which affects the shape of the objective function. Currently, there is no general-purpose algorithm to solve this class of problems. We use nonparametric density estimation to take observations from the joint state-outcome distribution and use them to infer the optimal decision for a given query state. We propose two solution methods that depend on the problem characteristics: function-based and gradient-based optimization. We examine two weighting schemes, kernel-based weights and Dirichlet process-based weights, for use with the solution methods. The weights and solution methods are tested on a synthetic multi-product newsvendor problem and the hour-ahead wind commitment problem. Our results show that in some cases Dirichlet process weights offer substantial benefits over kernel-based weights and, more generally, that nonparametric estimation methods provide good solutions to otherwise intractable problems.

  20. [Evaluation of estimation of prevalence ratio using bayesian log-binomial regression model].

    PubMed

    Gao, W L; Lin, H; Liu, X N; Ren, X W; Li, J S; Shen, X P; Zhu, S L

    2017-03-10

    To evaluate the estimation of the prevalence ratio (PR) using a Bayesian log-binomial regression model and its application, we estimated the PR of medical care-seeking prevalence relative to caregivers' recognition of risk signs of diarrhea in their infants by using a Bayesian log-binomial regression model in OpenBUGS software. The results showed that caregivers' recognition of an infant's risk signs of diarrhea was significantly associated with a 13% increase in medical care-seeking. Meanwhile, we compared the differences in the point and interval estimates of the PR and the convergence of three models (model 1: not adjusting for covariates; model 2: adjusting for duration of caregivers' education; model 3: adjusting for distance between village and township and child month-age, based on model 2) between the Bayesian log-binomial regression model and the conventional log-binomial regression model. All three Bayesian log-binomial regression models converged, and the estimated PRs were 1.130 (95%CI: 1.005-1.265), 1.128 (95%CI: 1.001-1.264) and 1.132 (95%CI: 1.004-1.267), respectively. Conventional log-binomial regression models 1 and 2 converged, with PRs of 1.130 (95%CI: 1.055-1.206) and 1.126 (95%CI: 1.051-1.203), respectively, but model 3 failed to converge, so the COPY method was used to estimate the PR, which was 1.125 (95%CI: 1.051-1.200). In addition, the point and interval estimates of the PRs from the three Bayesian log-binomial regression models differed slightly from those of the conventional log-binomial regression models, but they had good consistency in estimating the PR. Therefore, the Bayesian log-binomial regression model can effectively estimate the PR with less misconvergence and has advantages in application compared with the conventional log-binomial regression model.
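
    The conventional log-binomial model in the comparison above is a binomial GLM with a log link, whose exponentiated coefficient is the PR. The snippet below simulates data with a true PR of 1.13; the Bayesian variant would fit the same likelihood by MCMC (e.g. in OpenBUGS, as above).

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    recognizes = rng.integers(0, 2, 500)           # caregiver recognizes risk signs (0/1)
    p = 0.5 * 1.13 ** recognizes                   # true prevalence ratio of 1.13
    seeks_care = rng.binomial(1, p)

    X = sm.add_constant(recognizes)
    fit = sm.GLM(seeks_care, X,
                 family=sm.families.Binomial(link=sm.families.links.Log())).fit()
    print(np.exp(fit.params[1]))                   # estimated PR, close to 1.13
    ```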

  1. Accelerating Approximate Bayesian Computation with Quantile Regression: application to cosmological redshift distributions

    NASA Astrophysics Data System (ADS)

    Kacprzak, T.; Herbel, J.; Amara, A.; Réfrégier, A.

    2018-02-01

    Approximate Bayesian Computation (ABC) is a method to obtain a posterior distribution without a likelihood function, using simulations and a set of distance metrics. For that reason, it has recently been gaining popularity as an analysis tool in cosmology and astrophysics. Its drawback, however, is a slow convergence rate. We propose a novel method, which we call qABC, to accelerate ABC with quantile regression. In this method, we create a model of quantiles of the distance measure as a function of input parameters. This model is trained on a small number of simulations and estimates which regions of the prior space are likely to be accepted into the posterior. Other regions are then immediately rejected. This procedure is then repeated as more simulations are available. We apply it to the practical problem of estimation of redshift distribution of cosmological samples, using forward modelling developed in previous work. The qABC method converges to nearly the same posterior as the basic ABC. It uses, however, only 20% of the number of simulations compared to basic ABC, achieving a fivefold gain in execution time for our problem. For other problems the acceleration rate may vary; it depends on how close the prior is to the final posterior. We discuss possible improvements and extensions to this method.
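
    A toy version of the qABC idea: regress a low quantile of the ABC distance on the parameter, then pre-reject parameter regions whose predicted quantile already exceeds the acceptance threshold. The quadratic basis, the 0.1 quantile and the threshold of 1.0 are assumptions for illustration.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    theta = rng.uniform(-3, 3, 300)                          # draws from the prior
    dist = (theta - 1.0) ** 2 + rng.exponential(0.3, 300)    # simulated ABC distances

    X = sm.add_constant(np.column_stack([theta, theta ** 2]))
    q10 = sm.QuantReg(dist, X).fit(q=0.1)                    # model of the 10th percentile

    grid = np.linspace(-3, 3, 61)
    pred = q10.predict(sm.add_constant(np.column_stack([grid, grid ** 2])))
    keep = grid[pred < 1.0]                 # only these regions receive new simulations
    print(keep.min(), keep.max())
    ```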

  2. A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models.

    PubMed

    Tabe-Bordbar, Shayan; Emad, Amin; Zhao, Sihai Dave; Sinha, Saurabh

    2018-04-26

    Cross-validation (CV) is a technique to assess the generalizability of a model to unseen data. This technique relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, well represents unseen data. This assumption does not hold when samples are obtained from different experimental conditions and the goal is to learn regulatory relationships among the genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of supervised learning methods used to learn gene regulatory networks (or in other applications). We compared the performance of a regression-based method for gene expression prediction estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of a model's generalizability compared to CCV. Next, we defined the 'distinctness' of the test set from the training set and showed that this measure is predictive of the performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that the performance of different gene expression prediction methods can be better evaluated using this method.
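
    The contrast between random and condition-aware cross-validation can be sketched with scikit-learn's GroupKFold standing in for the clustering-based CV (CCV); the data, groups and ridge regressor below are illustrative.

    ```python
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GroupKFold, KFold, cross_val_score

    rng = np.random.default_rng(6)
    X = rng.normal(size=(120, 10))                 # e.g. regulator expression levels
    y = X[:, 0] + rng.normal(0, 0.5, 120)          # target gene expression
    condition = np.repeat(np.arange(6), 20)        # six experimental conditions

    rcv = cross_val_score(Ridge(), X, y, cv=KFold(5, shuffle=True, random_state=0))
    ccv = cross_val_score(Ridge(), X, y, cv=GroupKFold(n_splits=6), groups=condition)
    print(rcv.mean(), ccv.mean())   # CCV is typically the less optimistic estimate
    ```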

  3. [Screen potential CYP450 2E1 inhibitors from Chinese herbal medicine based on support vector regression and molecular docking method].

    PubMed

    Chen, Xi; Lu, Fang; Jiang, Lu-di; Cai, Yi-Lian; Li, Gong-Yu; Zhang, Yan-Ling

    2016-07-01

    Inhibition of cytochrome P450 (CYP450) enzymes is the most common reason for drug-drug interactions, so early prediction of CYP inhibitors can help to decrease the incidence of adverse reactions caused by drug interactions. CYP450 2E1 (CYP2E1), which plays a key role in drug metabolism, has a broad spectrum of substrates. In this study, 32 CYP2E1 inhibitors were collected for the construction of a support vector regression (SVR) model. The test set data were used to verify the CYP2E1 quantitative models and obtain the optimal prediction model for CYP2E1 inhibitors. Meanwhile, a molecular docking program, CDOCKER, was utilized to analyze the interaction pattern between positive compounds and the active pocket in order to establish the optimal screening model for CYP2E1 inhibitors. The SVR model and the molecular docking prediction model were combined to screen the traditional Chinese medicine database (TCMD), which could improve calculation efficiency and prediction accuracy. 6 376 traditional Chinese medicine (TCM) compounds predicted by the SVR model were obtained, and after further verification with the molecular docking model, 247 TCM compounds with potential inhibitory activity against CYP2E1 were finally retained. Some of them have been verified by experiments. The results demonstrated that this study could provide guidance for the virtual screening of CYP450 inhibitors and the prediction of CYP-mediated DDIs, and also provide references for rational clinical drug use. Copyright© by the Chinese Pharmaceutical Association.
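
    The SVR screening stage might look like the hypothetical sketch below: train on descriptor vectors for the 32 known inhibitors, then keep database compounds whose predicted activity clears a threshold before passing them to docking. The descriptor count, activities and threshold are all invented.

    ```python
    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(7)
    X_train = rng.normal(size=(32, 8))          # molecular descriptors of 32 inhibitors
    y_train = rng.uniform(4, 8, 32)             # e.g. pIC50 activities

    model = SVR(kernel="rbf", C=10.0).fit(X_train, y_train)

    X_db = rng.normal(size=(6376, 8))           # descriptor matrix for TCMD compounds
    hits = np.where(model.predict(X_db) > 6.0)[0]
    print(len(hits))                            # candidates forwarded to CDOCKER
    ```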

  4. Construction and analysis of a modular model of caspase activation in apoptosis

    PubMed Central

    Harrington, Heather A; Ho, Kenneth L; Ghosh, Samik; Tung, KC

    2008-01-01

    Background A key physiological mechanism employed by multicellular organisms is apoptosis, or programmed cell death. Apoptosis is triggered by the activation of caspases in response to both extracellular (extrinsic) and intracellular (intrinsic) signals. The extrinsic and intrinsic pathways are characterized by the formation of the death-inducing signaling complex (DISC) and the apoptosome, respectively; both the DISC and the apoptosome are oligomers with complex formation dynamics. Additionally, the extrinsic and intrinsic pathways are coupled through the mitochondrial apoptosis-induced channel via the Bcl-2 family of proteins. Results A model of caspase activation is constructed and analyzed. The apoptosis signaling network is simplified through modularization methodologies and equilibrium abstractions for three functional modules. The mathematical model is composed of a system of ordinary differential equations which is numerically solved. Multiple linear regression analysis investigates the role of each module and reduced models are constructed to identify key contributions of the extrinsic and intrinsic pathways in triggering apoptosis for different cell lines. Conclusion Through linear regression techniques, we identified the feedbacks, dissociation of complexes, and negative regulators as the key components in apoptosis. The analysis and reduced models for our model formulation reveal that the chosen cell lines predominately exhibit strong extrinsic caspase, typical of type I cell, behavior. Furthermore, under the simplified model framework, the selected cells lines exhibit different modes by which caspase activation may occur. Finally the proposed modularized model of apoptosis may generalize behavior for additional cells and tissues, specifically identifying and predicting components responsible for the transition from type I to type II cell behavior. PMID:19077196

  5. Evaluation of weighted regression and sample size in developing a taper model for loblolly pine

    Treesearch

    Kenneth L. Cormier; Robin M. Reich; Raymond L. Czaplewski; William A. Bechtold

    1992-01-01

    A stem profile model, fit using pseudo-likelihood weighted regression, was used to estimate merchantable volume of loblolly pine (Pinus taeda L.) in the southeast. The weighted regression increased model fit marginally, but did not substantially increase model performance. In all cases, the unweighted regression models performed as well as the...

  6. Influence of genetic, biological and pharmacological factors on levodopa dose in Parkinson's disease.

    PubMed

    Altmann, Vivian; Schumacher-Schuh, Artur F; Rieck, Mariana; Callegari-Jacques, Sidia M; Rieder, Carlos R M; Hutz, Mara H

    2016-04-01

    Levodopa is the first-line treatment for Parkinson's disease motor symptoms, but dose response is highly variable. Therefore, the aim of this study was to determine how much of the levodopa dose could be explained by biological, pharmacological and genetic factors. A total of 224 Parkinson's disease patients were genotyped for SV2C and SLC6A3 polymorphisms by allelic discrimination assays. Comedication, demographic and clinical data were also assessed. All variables with p < 0.20 were included in a multiple regression analysis for dose prediction. The final model explained 23% of dose variation (F = 11.54; p < 0.000001). Although a good prediction model was obtained, it still needs to be tested in an independent sample to be validated.

  7. Predict the fatigue life of crack based on extended finite element method and SVR

    NASA Astrophysics Data System (ADS)

    Song, Weizhen; Jiang, Zhansi; Jiang, Hui

    2018-05-01

    The extended finite element method (XFEM) and support vector regression (SVR) are used to predict the fatigue life of plate cracks. First, the XFEM is employed to calculate the stress intensity factors (SIFs) for given crack sizes. Then a prediction model can be built based on the functional relationship of the SIFs with the fatigue life or crack length. Finally, the prediction model is used to predict the SIFs at different crack sizes or different numbers of cycles. Because the accuracy of the forward Euler method is ensured only by a small step size, a new prediction method is presented to resolve this issue. Numerical examples were studied to demonstrate that the proposed method allows a larger step size and retains high accuracy.
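
    Under stated assumptions (a mock K = 100√(πa) SIF curve and arbitrary Paris-law constants), the sketch below shows the core loop: an SVR fitted to XFEM-style SIF samples supplies ΔK inside a cycle-counting integration of the Paris law.

    ```python
    import numpy as np
    from sklearn.svm import SVR

    # Mock XFEM results: SIF (MPa*sqrt(m)) at ten crack lengths (mm)
    a_known = np.linspace(1.0, 20.0, 10).reshape(-1, 1)
    K_known = 100.0 * np.sqrt(np.pi * a_known / 1000.0).ravel()

    svr = SVR(kernel="rbf", C=1e3, epsilon=0.01).fit(a_known, K_known)

    C_p, m = 1e-11, 3.0              # assumed Paris constants: da/dN = C_p * dK^m (m/cycle)
    a, cycles, da = 1.0, 0.0, 0.05   # crack-length step (mm), larger than plain Euler allows
    while a < 20.0:
        dK = svr.predict([[a]])[0]   # SVR replaces a fresh XFEM solve at each step
        cycles += (da / 1000.0) / (C_p * dK ** m)
        a += da
    print(cycles)                    # predicted fatigue life in cycles
    ```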

  8. Evaluation of driver fatigue on two channels of EEG data.

    PubMed

    Li, Wei; He, Qi-chang; Fan, Xiu-min; Fei, Zhi-min

    2012-01-11

    Electroencephalogram (EEG) data are an effective indicator for evaluating driver fatigue. In the current paper, 16 channels of EEG data are collected and transformed into three bands (θ, α, and β). First, 12 types of energy parameters are computed based on the EEG data. Then, Grey Relational Analysis (GRA) is introduced to identify the optimal indicator of driver fatigue, after which the number of significant electrodes is reduced using Kernel Principal Component Analysis (KPCA). Finally, the evaluation model for driver fatigue is established with a regression equation based on the EEG data from two significant electrodes (Fp1 and O1). The experimental results verify that the model is effective in evaluating driver fatigue. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  9. Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

    NASA Astrophysics Data System (ADS)

    Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami

    2017-06-01

    A regression model represents the relationship between independent variables and a dependent variable. In logistic regression the dependent variable is categorical; when the categories of the dependent variable are ordered, the model is ordinal logistic regression. The GWOLR model is an ordinal logistic regression model influenced by the geographical location of the observation sites. Parameter estimation is needed in the model to determine population values based on a sample. The purpose of this research is parameter estimation of the GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City; the observation units are 144 villages in Semarang City. The results of the research give a local GWOLR model for each village and the probability of each category of the number of dengue fever patients.

  10. Alcohol Misuse and Psychological Resilience among U.S. Iraq and Afghanistan Era Veteran Military Personnel

    PubMed Central

    Green, Kimberly T.; Beckham, Jean C.; Youssef, Nagy; Elbogen, Eric B.

    2013-01-01

    Objective The present study sought to investigate the longitudinal effects of psychological resilience against alcohol misuse adjusting for socio-demographic factors, trauma-related variables, and self-reported history of alcohol abuse. Methodology Data were from National Post-Deployment Adjustment Study (NPDAS) participants who completed both a baseline and one-year follow-up survey (N=1090). Survey questionnaires measured combat exposure, probable posttraumatic stress disorder (PTSD), psychological resilience, and alcohol misuse, all of which were measured at two discrete time periods (baseline and one-year follow-up). Baseline resilience and change in resilience (increased or decreased) were utilized as independent variables in separate models evaluating alcohol misuse at the one-year follow-up. Results Multiple linear regression analyses controlled for age, gender, level of educational attainment, combat exposure, PTSD symptom severity, and self-reported alcohol abuse. Accounting for these covariates, findings revealed that lower baseline resilience, younger age, male gender, and self-reported alcohol abuse were related to alcohol misuse at the one-year follow-up. A separate regression analysis, adjusting for the same covariates, revealed a relationship between change in resilience (from baseline to the one-year follow-up) and alcohol misuse at the one-year follow-up. The regression model evaluating these variables in a subset of the sample in which all the participants had been deployed to Iraq and/or Afghanistan was consistent with findings involving the overall era sample. Finally, logistic regression analyses of the one-year follow-up data yielded similar results to the baseline and resilience change models. Conclusions These findings suggest that increased psychological resilience is inversely related to alcohol misuse and is protective against alcohol misuse over time. Additionally, it supports the conceptualization of resilience as a process which evolves over time. Moreover, our results underscore the importance of assessing resilience as part of alcohol use screening for preventing alcohol misuse in Iraq and Afghanistan era military veterans. PMID:24090625

  11. Modelling fourier regression for time series data- a case study: modelling inflation in foods sector in Indonesia

    NASA Astrophysics Data System (ADS)

    Prahutama, Alan; Suparti; Wahyu Utami, Tiani

    2018-03-01

    Regression analysis is used to model the relationship between response variables and predictor variables. The parametric approach to regression modelling is very strict with its assumptions, whereas the nonparametric regression model needs no model assumptions. Time series data are observations of a variable recorded over time, so if time series data are to be modelled by regression, the response and predictor variables must be determined first. The response variable in a time series is the value at time t (yt), while the predictor variables are the significant lags. In nonparametric regression modelling, one developing approach is the Fourier series approach. One advantage of the nonparametric regression approach using Fourier series is its ability to handle data with a trigonometric (periodic) pattern. Modelling with the Fourier series requires the parameter K, and the number of K can be determined with the Generalized Cross Validation method. In inflation modelling for the transportation, communication and financial services sector, the Fourier series yields an optimal K of 120 parameters with an R-square of 99%, whereas modelling by multiple linear regression yields an R-square of 90%.
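
    A minimal sketch of Fourier-series regression with K chosen by generalized cross-validation follows, on a synthetic periodic series (the inflation data above are not reproduced here).

    ```python
    import numpy as np

    rng = np.random.default_rng(8)
    t = np.linspace(0.0, 1.0, 200)
    y = np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t) + rng.normal(0, 0.2, 200)

    def fourier_design(t, K):
        cols = [np.ones_like(t)]
        for k in range(1, K + 1):
            cols += [np.cos(2 * np.pi * k * t), np.sin(2 * np.pi * k * t)]
        return np.column_stack(cols)

    def gcv(t, y, K):
        X = fourier_design(t, K)
        H = X @ np.linalg.pinv(X)                # hat matrix of the least-squares fit
        resid = y - H @ y
        n = len(y)
        return (resid @ resid / n) / (1 - np.trace(H) / n) ** 2

    best_K = min(range(1, 15), key=lambda K: gcv(t, y, K))
    print(best_K)   # GCV picks the highest needed harmonic (7 here)
    ```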

  12. A hysteretic model considering Stribeck effect for small-scale magnetorheological damper

    NASA Astrophysics Data System (ADS)

    Zhao, Yu-Liang; Xu, Zhao-Dong

    2018-06-01

    The magnetorheological (MR) damper is an ideal semi-active control device for vibration suppression. The mechanical properties of this type of device show strong nonlinear characteristics, especially for small-scale dampers. Therefore, developing a model that can accurately describe the nonlinearity of such a device is crucial to control design. In this paper, the dynamic characteristics of a small-scale MR damper developed by our research group are tested, and the Stribeck effect is observed in the low-velocity region. Then, an improved model based on the sigmoid model is proposed to describe the Stribeck effect observed in the experiment. After that, the parameters of this model are identified by genetic algorithms, and the mathematical relationship between these parameters and the input current, excitation frequency and amplitude is obtained by regression. Finally, the forces predicted by the proposed model are validated against the experimental data. The results show that this model can predict the mechanical properties of the small-scale damper well, especially the Stribeck effect in the low-velocity region.

  13. Effect of state workplace safety laws on occupational injury rates.

    PubMed

    Smitha, M W; Kirk, K A; Oestenstad, K R; Brown, K C; Lee, S D

    2001-12-01

    The purpose of this study was to evaluate the effect of four common types of mandatory state-level workplace safety regulations on injury severity rates during the period 1992 to 1997 for the manufacturing sector. The full Poisson regression model showed safety committee regulations to have a highly significant reducing effect on injury rates, χ²(1, n = 3286) = 10.1774, P = 0.0014. Safety program regulations were significant at the alpha = 0.10 level, χ²(1, n = 3286) = 3.5676, P = 0.0589. The effect of insurance carrier loss control regulations in the full model was nonsignificant. However, insurance carrier loss control regulations were highly significant (alpha = 0.01) in the final reduced model. Targeting initiatives were nonsignificant in both the full and reduced models (alpha = 0.05). The study results are important to state and federal agencies considering adopting workplace safety regulations that are similar to the four types evaluated in this study.

  14. A Developed Meta-model for Selection of Cotton Fabrics Using Design of Experiments and TOPSIS Method

    NASA Astrophysics Data System (ADS)

    Chakraborty, Shankar; Chatterjee, Prasenjit

    2017-12-01

    Selection of cotton fabrics for providing optimal clothing comfort is often considered as a multi-criteria decision making problem consisting of an array of candidate alternatives to be evaluated based on several conflicting properties. In this paper, design of experiments and the technique for order preference by similarity to ideal solution (TOPSIS) are integrated so as to develop regression meta-models for identifying the most suitable cotton fabrics with respect to the computed TOPSIS scores. The applicability of the adopted method is demonstrated using two real-time examples. These developed models can also identify the statistically significant fabric properties and their interactions affecting the measured TOPSIS scores and final selection decisions. There exists a good degree of congruence between the ranking patterns derived using these meta-models and the existing methods for cotton fabric ranking and subsequent selection.

  15. Maximum likelihood estimation for semiparametric transformation models with interval-censored data

    PubMed Central

    Mao, Lu; Lin, D. Y.

    2016-01-01

    Abstract Interval censoring arises frequently in clinical, epidemiological, financial and sociological studies, where the event or failure of interest is known only to occur within an interval induced by periodic monitoring. We formulate the effects of potentially time-dependent covariates on the interval-censored failure time through a broad class of semiparametric transformation models that encompasses proportional hazards and proportional odds models. We consider nonparametric maximum likelihood estimation for this class of models with an arbitrary number of monitoring times for each subject. We devise an EM-type algorithm that converges stably, even in the presence of time-dependent covariates, and show that the estimators for the regression parameters are consistent, asymptotically normal, and asymptotically efficient with an easily estimated covariance matrix. Finally, we demonstrate the performance of our procedures through simulation studies and application to an HIV/AIDS study conducted in Thailand. PMID:27279656

  16. Comprehensive evaluation system of intelligent urban growth

    NASA Astrophysics Data System (ADS)

    Li, Lian-Yan; Ren, Xiao-Bin

    2017-06-01

    With the rapid urbanization of the world, urban planning has become increasingly important and necessary to ensure people have access to equitable and sustainable homes, resources and jobs. This article is about building an intelligent city evaluation system. First, a System Analysis Model (SAM), which combines literature data analysis and stepwise regression analysis, is used to describe intelligent growth scientifically and obtain the evaluation indices. Then, the improved entropy method is used to obtain the weights of the evaluation indices. Afterwards, a complete Smart Growth Comprehensive Evaluation Model (SGCEM) is established. Finally, the correctness of the model is tested. Choosing Otago (New Zealand) and Yumen (China) as research objects, with data mining and the SGCEM model, the rational-degree values for Yumen and Otago are found to be 0.3485 and 0.5376 respectively. Otago's smart level is therefore believed to be higher, and the estimated value of rationality is found to be consistent with reality.

  17. Estimates of long-term mean-annual nutrient loads considered for use in SPARROW models of the Midcontinental region of Canada and the United States, 2002 base year

    USGS Publications Warehouse

    Saad, David A.; Benoy, Glenn A.; Robertson, Dale M.

    2018-05-11

    Streamflow and nutrient concentration data needed to compute nitrogen and phosphorus loads were compiled from Federal, State, Provincial, and local agency databases and also from selected university databases. The nitrogen and phosphorus loads are necessary inputs to Spatially Referenced Regressions on Watershed Attributes (SPARROW) models. SPARROW models are a way to estimate the distribution, sources, and transport of nutrients in streams throughout the Midcontinental region of Canada and the United States. After screening the data, approximately 1,500 sites sampled by 34 agencies were identified as having suitable data for calculating the long-term mean-annual nutrient loads required for SPARROW model calibration. These final sites represent a wide range in watershed sizes, types of nutrient sources, and land-use and watershed characteristics in the Midcontinental region of Canada and the United States.

  18. Distorted Perceptions of Competence and Incompetence Are More than Regression Effects

    ERIC Educational Resources Information Center

    Albanese, M.; Dottl, S.; Mejicano, G.; Zakowski, L.; Seibert, C.; Van Eyck, S.; Prucha, C.

    2006-01-01

    Students inaccurately assess their own skills, especially high- or low-performers on exams. This study assessed whether regression effects account for this observation. After completing the Infection and Immunity course final exam (IIF), second year medical students (N = 143) estimated their performance on the IIF in terms of percent correct and…

  19. Grades, Gender, and Encouragement: A Regression Discontinuity Analysis

    ERIC Educational Resources Information Center

    Owen, Ann L.

    2010-01-01

    The author employs a regression discontinuity design to provide direct evidence on the effects of grades earned in economics principles classes on the decision to major in economics and finds a differential effect for male and female students. Specifically, for female students, receiving an A for a final grade in the first economics class is…

  20. Inverse regression-based uncertainty quantification algorithms for high-dimensional models: Theory and practice

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Weixuan; Lin, Guang; Li, Bing

    2016-09-01

    A well-known challenge in uncertainty quantification (UQ) is the "curse of dimensionality". However, many high-dimensional UQ problems are essentially low-dimensional, because the randomness of the quantity of interest (QoI) is caused only by uncertain parameters varying within a low-dimensional subspace, known as the sufficient dimension reduction (SDR) subspace. Motivated by this observation, we propose and demonstrate in this paper an inverse regression-based UQ approach (IRUQ) for high-dimensional problems. Specifically, we use an inverse regression procedure to estimate the SDR subspace and then convert the original problem to a low-dimensional one, which can be efficiently solved by building a response surface model such as a polynomial chaos expansion. The novelty and advantages of the proposed approach are seen in its computational efficiency and practicality. Compared with Monte Carlo, the traditionally preferred approach for high-dimensional UQ, IRUQ with a comparable cost generally gives much more accurate solutions even for high-dimensional problems, and even when the dimension reduction is not exactly sufficient. Theoretically, IRUQ is proved to converge twice as fast as the approach it uses to seek the SDR subspace. For example, while a sliced inverse regression method converges to the SDR subspace at the rate of O(n^{-1/2}), the corresponding IRUQ converges at O(n^{-1}). IRUQ also provides several desired conveniences in practice. It is non-intrusive, requiring only a simulator to generate realizations of the QoI, and there is no need to compute the high-dimensional gradient of the QoI. Finally, error bars can be derived for the estimation results reported by IRUQ.
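
    A toy version of the sliced-inverse-regression step inside IRUQ, written from the textbook SIR recipe (slice on the output, average inputs within slices, take the leading eigenvector); the 20-dimensional example with one active direction is an assumption for illustration.

    ```python
    import numpy as np

    rng = np.random.default_rng(9)
    n, d = 2000, 20
    X = rng.normal(size=(n, d))                       # standardized uncertain parameters
    beta = np.zeros(d); beta[0] = 1.0                 # true active direction
    y = (X @ beta) ** 3 + rng.normal(0, 0.05, n)      # QoI varies along one direction

    order = np.argsort(y)
    slices = np.array_split(order, 10)                # slice samples by QoI value
    means = np.stack([X[s].mean(axis=0) for s in slices])
    weights = np.array([len(s) for s in slices]) / n
    M = (means.T * weights) @ means                   # SIR kernel matrix

    direction = np.linalg.eigh(M)[1][:, -1]           # estimated SDR direction
    print(abs(direction @ beta))                      # close to 1: subspace recovered
    ```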

  1. Ensuring the consistency of Flow Duration Curve reconstructions: the 'quantile solidarity' approach

    NASA Astrophysics Data System (ADS)

    Poncelet, Carine; Andreassian, Vazken; Oudin, Ludovic

    2015-04-01

    Flow Duration Curves (FDCs) are a hydrologic tool describing the distribution of streamflows at a catchment outlet. FDCs are commonly used for calibrating hydrological models, managing water quality and classifying catchments, among other applications. For gauged catchments, empirical FDCs can be computed from streamflow records. For ungauged catchments, on the other hand, FDCs cannot be obtained from streamflow records and must therefore be obtained in another manner, for example through reconstructions. Regression-based reconstructions are methods that estimate each quantile separately from catchment attributes (climatic or physical features). The advantage of this category of methods is that it is informative about the processes and is non-parametric. However, the large number of parameters required can cause unwanted artifacts, typically reconstructions that do not always produce increasing quantiles. In this paper we propose a new approach named Quantile Solidarity (QS), which is applied under strict proxy-basin test conditions (Klemes, 1986) to a set of 600 French catchments. Half of the catchments are considered as gauged and used to calibrate the regression and compute its residuals. The QS approach consists of a three-step regionalization scheme, which first links quantile values to physical descriptors, then reduces the number of regression parameters and finally exploits the spatial correlation of the residuals. The innovation is the use of parameter continuity across the quantiles to dramatically reduce the number of parameters. The second half of the catchments is used as an independent validation set, over which we show that the QS approach ensures strictly increasing FDC reconstructions in ungauged conditions. Reference: V. KLEMEŠ (1986) Operational testing of hydrological simulation models, Hydrological Sciences Journal, 31:1, 13-24

  2. The effects of climate change on harp seals (Pagophilus groenlandicus).

    PubMed

    Johnston, David W; Bowers, Matthew T; Friedlaender, Ari S; Lavigne, David M

    2012-01-01

    Harp seals (Pagophilus groenlandicus) have evolved life history strategies to exploit seasonal sea ice as a breeding platform. As such, individuals are prepared to deal with fluctuations in the quantity and quality of ice in their breeding areas. It remains unclear, however, how shifts in climate may affect seal populations. The present study assesses the effects of climate change on harp seals through three linked analyses. First, we tested the effects of short-term climate variability on young-of-the year harp seal mortality using a linear regression of sea ice cover in the Gulf of St. Lawrence against stranding rates of dead harp seals in the region during 1992 to 2010. A similar regression of stranding rates and North Atlantic Oscillation (NAO) index values was also conducted. These analyses revealed negative correlations between both ice cover and NAO conditions and seal mortality, indicating that lighter ice cover and lower NAO values result in higher mortality. A retrospective cross-correlation analysis of NAO conditions and sea ice cover from 1978 to 2011 revealed that NAO-related changes in sea ice may have contributed to the depletion of seals on the east coast of Canada during 1950 to 1972, and to their recovery during 1973 to 2000. This historical retrospective also reveals opposite links between neonatal mortality in harp seals in the Northeast Atlantic and NAO phase. Finally, an assessment of the long-term trends in sea ice cover in the breeding regions of harp seals across the entire North Atlantic during 1979 through 2011 using multiple linear regression models and mixed effects linear regression models revealed that sea ice cover in all harp seal breeding regions has been declining by as much as 6 percent per decade over the time series of available satellite data.

  3. The Effects of Climate Change on Harp Seals (Pagophilus groenlandicus)

    PubMed Central

    Johnston, David W.; Bowers, Matthew T.; Friedlaender, Ari S.; Lavigne, David M.

    2012-01-01

    Harp seals (Pagophilus groenlandicus) have evolved life history strategies to exploit seasonal sea ice as a breeding platform. As such, individuals are prepared to deal with fluctuations in the quantity and quality of ice in their breeding areas. It remains unclear, however, how shifts in climate may affect seal populations. The present study assesses the effects of climate change on harp seals through three linked analyses. First, we tested the effects of short-term climate variability on young-of-the year harp seal mortality using a linear regression of sea ice cover in the Gulf of St. Lawrence against stranding rates of dead harp seals in the region during 1992 to 2010. A similar regression of stranding rates and North Atlantic Oscillation (NAO) index values was also conducted. These analyses revealed negative correlations between both ice cover and NAO conditions and seal mortality, indicating that lighter ice cover and lower NAO values result in higher mortality. A retrospective cross-correlation analysis of NAO conditions and sea ice cover from 1978 to 2011 revealed that NAO-related changes in sea ice may have contributed to the depletion of seals on the east coast of Canada during 1950 to 1972, and to their recovery during 1973 to 2000. This historical retrospective also reveals opposite links between neonatal mortality in harp seals in the Northeast Atlantic and NAO phase. Finally, an assessment of the long-term trends in sea ice cover in the breeding regions of harp seals across the entire North Atlantic during 1979 through 2011 using multiple linear regression models and mixed effects linear regression models revealed that sea ice cover in all harp seal breeding regions has been declining by as much as 6 percent per decade over the time series of available satellite data. PMID:22238591

  4. Optimum pelvic incidence minus lumbar lordosis value can be determined by individual pelvic incidence.

    PubMed

    Inami, Satoshi; Moridaira, Hiroshi; Takeuchi, Daisaku; Shiba, Yo; Nohara, Yutaka; Taneichi, Hiroshi

    2016-11-01

    The adult spinal deformity (ASD) classification holding that the ideal pelvic incidence minus lumbar lordosis (PI-LL) value is within 10° has been widely accepted, but no study has focused on the optimum PI-LL value that reflects the wide variation in PI among patients. This study was conducted to determine the optimum PI-LL value specific to an individual's PI in postoperative ASD patients. 48 postoperative ASD patients were recruited. Spino-pelvic parameters and the Oswestry Disability Index (ODI) were measured at the final follow-up. Factors associated with good clinical results were determined by a stepwise multiple regression model using the ODI. The patients with ODI under the 75th percentile cutoff were designated the "good" health-related quality of life (HRQOL) group. In this group, the relationship between PI-LL and PI was assessed by regression analysis. Multiple regression analysis revealed PI-LL as a significant parameter associated with ODI. Thirty-six patients with an ODI <22 points (75th percentile cutoff) were categorized into the good HRQOL group, and linear regression models demonstrated the following equation: PI-LL = 0.41 × PI − 11.12 (r = 0.45, P = 0.0059). On the basis of this equation, in patients with a PI = 50°, the optimum PI-LL is 9°, whereas in those with a PI = 30°, the optimum PI-LL is calculated to be as low as 1°. In those with a PI = 80°, the PI-LL is estimated at 22°. Consequently, the optimum PI-LL is not a fixed value but depends on the individual PI.

  5. Predicting preterm birth among participants of North Carolina’s Pregnancy Medical Home Program

    PubMed Central

    Tucker, Christine M.; Berrien, Kate; Menard, M. Kathryn; Herring, Amy H.; Daniels, Julie; Rowley, Diane L.; Halpern, Carolyn Tucker

    2016-01-01

    Objective To determine which combination of risk factors from Community Care of North Carolina’s (CCNC) Pregnancy Medical Home (PMH) risk screening form was most predictive of preterm birth (PTB) by parity and race/ethnicity. Methods This retrospective cohort included pregnant Medicaid patients screened by the PMH program before 24 weeks gestation who delivered a live birth in North Carolina between September 2011-September 2012 (N=15,428). Data came from CCNC’s Case Management Information System, Medicaid claims, and birth certificates. Logistic regression with backward stepwise elimination was used to arrive at the final models. To internally validate the predictive model, we used bootstrapping techniques. Results The prevalence of PTB was 11%. Multifetal gestation, a previous PTB, cervical insufficiency, diabetes, renal disease, and hypertension were the strongest risk factors with odds ratios ranging from 2.34 to 10.78. Non-Hispanic black race, underweight, smoking during pregnancy, asthma, other chronic conditions, nulliparity, and a history of a low birth weight infant or fetal death/second trimester loss were additional predictors in the final predictive model. About half of the risk factors prioritized by the PMH program remained in our final model (ROC=0.66). The odds of PTB associated with food insecurity and obesity differed by parity. The influence of unsafe or unstable housing and short interpregnancy interval on PTB differed by race/ethnicity. Conclusions Evaluation of the PMH risk screen provides insight to ensure women at highest risk are prioritized for care management. Using multiple data sources, salient risk factors for PTB were identified, allowing for better-targeted approaches for PTB prevention. PMID:26112751

  6. Multi-pose facial correction based on Gaussian process with combined kernel function

    NASA Astrophysics Data System (ADS)

    Shi, Shuyan; Ji, Ruirui; Zhang, Fan

    2018-04-01

    In order to improve the recognition rate across various postures, this paper proposes a facial correction method based on a Gaussian process, which builds a nonlinear regression model between the frontal face and the side face with a combined kernel function. Face images with horizontal angles from -45° to +45° can be properly corrected to frontal faces. Finally, a Support Vector Machine is employed for face recognition. Experiments on the CAS-PEAL-R1 face database show that the Gaussian process can weaken the influence of pose changes and improve the accuracy of face recognition to a certain extent.
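
    A small sketch of Gaussian-process regression with a combined (sum) kernel, in the spirit of the pose-correction model; the five-dimensional "side-face" and "frontal-face" feature vectors are random placeholders, not real image features.

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, DotProduct, WhiteKernel

    rng = np.random.default_rng(10)
    X = rng.normal(size=(80, 5))                        # side-face feature vectors
    Y = np.tanh(X) + 0.05 * rng.normal(size=(80, 5))    # matching frontal-face features

    kernel = DotProduct() + RBF() + WhiteKernel()       # linear + nonlinear + noise terms
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, Y)
    print(gp.predict(X[:1]))                            # corrected frontal-feature estimate
    ```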

  7. Tire-road friction coefficient estimation based on the resonance frequency of in-wheel motor drive system

    NASA Astrophysics Data System (ADS)

    Chen, Long; Bian, Mingyuan; Luo, Yugong; Qin, Zhaobo; Li, Keqiang

    2016-01-01

    In this paper, a resonance frequency-based tire-road friction coefficient (TRFC) estimation method is proposed by considering the dynamics performance of the in-wheel motor drive system under small slip ratio conditions. A frequency response function (FRF) is deduced for the drive system that is composed of a dynamic tire model and a simplified motor model. A linear relationship between the squared system resonance frequency and the TRFC is described with the FRF. Furthermore, the resonance frequency is identified by the Auto-Regressive eXogenous model using the information of the motor torque and the wheel speed, and the TRFC is estimated thereafter by a recursive least squares filter with the identified resonance frequency. Finally, the effectiveness of the proposed approach is demonstrated through simulations and experimental tests on different road surfaces.
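
    The ARX-plus-recursive-least-squares identification at the heart of the method can be sketched as follows; the second-order ARX structure and its coefficients are invented to stand in for the wheel-speed/motor-torque dynamics.

    ```python
    import numpy as np

    def rls_step(theta, P, phi, y, lam=0.99):
        """One recursive-least-squares update with forgetting factor lam."""
        K = P @ phi / (lam + phi @ P @ phi)
        theta = theta + K * (y - phi @ theta)
        P = (P - np.outer(K, phi @ P)) / lam
        return theta, P

    # ARX(2,1): y_k = a1*y_{k-1} + a2*y_{k-2} + b1*u_{k-1}  (speed vs. torque)
    rng = np.random.default_rng(11)
    true = np.array([1.5, -0.7, 0.1])
    u, y = rng.normal(size=500), np.zeros(500)
    theta, P = np.zeros(3), 100.0 * np.eye(3)
    for k in range(2, 500):
        phi = np.array([y[k - 1], y[k - 2], u[k - 1]])
        y[k] = true @ phi + 0.01 * rng.normal()
        theta, P = rls_step(theta, P, phi, y[k])
    print(theta)   # approaches [1.5, -0.7, 0.1]; the AR poles give the resonance frequency
    ```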

  8. Total ion chromatographic fingerprints combined with chemometrics and mass defect filter to predict antitumor components of Picrasma quassioides.

    PubMed

    Shi, Yuanyuan; Zhan, Hao; Zhong, Liuyi; Yan, Fangrong; Feng, Feng; Liu, Wenyuan; Xie, Ning

    2016-07-01

    A method combining total ion chromatographic fingerprints with chemometrics and mass defect filtering was established for predicting active ingredients in Picrasma quassioides samples. The total ion chromatogram data of 28 batches were pretreated with wavelet transformation and correlation optimized warping to correct baseline drifts and retention time shifts. Then partial least squares regression was applied to construct a regression model bridging the total ion chromatogram fingerprints and the antitumor activity of P. quassioides. Finally, the regression coefficients were used to predict the active peaks in the total ion chromatogram fingerprints. In this strategy, mass defect filtering was employed to classify and characterize the active peaks from a chemical point of view. A total of 17 constituents were predicted as potential active compounds, 16 of which were identified as alkaloids by the developed approach. The results showed that the established method is not only simple and easy to operate, but also suitable for predicting ultraviolet-undetectable compounds and for providing chemical information for the prediction of active compounds in herbs. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
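
    [Editor's note: the PLS step that links the fingerprints to activity and flags candidate peaks by coefficient magnitude might look as follows; the data and peak indices are synthetic stand-ins for the 28 batches.]

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression

        rng = np.random.default_rng(3)
        X = rng.normal(size=(28, 500))            # 28 batches x 500 TIC points (stand-in)
        beta = np.zeros(500)
        beta[[40, 120, 305]] = [2.0, -1.5, 1.0]   # three "active" peaks
        y = X @ beta + rng.normal(0.0, 0.5, 28)   # synthetic antitumor activity

        pls = PLSRegression(n_components=3).fit(X, y)
        coef = np.ravel(pls.coef_)                # one coefficient per chromatographic point
        candidates = np.argsort(np.abs(coef))[-5:]  # largest |coefficient| -> candidates
        print(sorted(candidates.tolist()))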

  9. Visual tracking using objectness-bounding box regression and correlation filters

    NASA Astrophysics Data System (ADS)

    Mbelwa, Jimmy T.; Zhao, Qingjie; Lu, Yao; Wang, Fasheng; Mbise, Mercy

    2018-03-01

    Visual tracking is a fundamental problem in computer vision with extensive application domains in surveillance and intelligent systems. Recently, correlation filter-based tracking methods have shown great achievements in robustness, accuracy, and speed. However, such methods have difficulty dealing with fast motion (FM), motion blur (MB), illumination variation (IV), and drifting caused by occlusion (OCC). To solve this problem, a tracking method is proposed that integrates an objectness-bounding box regression (O-BBR) model and a scheme based on the kernelized correlation filter (KCF). The KCF-based scheme is used to improve tracking performance under FM and MB. To handle the drift problem caused by OCC and IV, we propose objectness proposals trained in bounding box regression as prior knowledge to provide candidates and background suppression. Finally, the KCF scheme as a base tracker and O-BBR are fused to obtain the state of the target object. Extensive experimental comparisons of the developed tracking method with other state-of-the-art trackers are performed on challenging video sequences. The comparison results show that our proposed tracking method outperforms other state-of-the-art tracking methods in effectiveness, accuracy, and robustness.

  10. Semiparametric regression analysis of interval-censored competing risks data.

    PubMed

    Mao, Lu; Lin, Dan-Yu; Zeng, Donglin

    2017-09-01

    Interval-censored competing risks data arise when each study subject may experience an event or failure from one of several causes and the failure time is not observed directly but rather is known to lie in an interval between two examinations. We formulate the effects of possibly time-varying (external) covariates on the cumulative incidence or sub-distribution function of competing risks (i.e., the marginal probability of failure from a specific cause) through a broad class of semiparametric regression models that captures both proportional and non-proportional hazards structures for the sub-distribution. We allow each subject to have an arbitrary number of examinations and accommodate missing information on the cause of failure. We consider nonparametric maximum likelihood estimation and devise a fast and stable EM-type algorithm for its computation. We then establish the consistency, asymptotic normality, and semiparametric efficiency of the resulting estimators for the regression parameters by appealing to modern empirical process theory. In addition, we show through extensive simulation studies that the proposed methods perform well in realistic situations. Finally, we provide an application to a study on HIV-1 infection with different viral subtypes. © 2017, The International Biometric Society.

  11. A New Navigation Satellite Clock Bias Prediction Method Based on Modified Clock-bias Quadratic Polynomial Model

    NASA Astrophysics Data System (ADS)

    Wang, Y. P.; Lu, Z. P.; Sun, D. S.; Wang, N.

    2016-01-01

    In order to better express the characteristics of satellite clock bias (SCB) and improve SCB prediction precision, this paper proposes a new SCB prediction model which takes into consideration the physical characteristics of the space-borne atomic clock as well as the cyclic variation and random parts of SCB. First, the new model employs a quadratic polynomial model with periodic terms to fit and extract the trend and cyclic terms of SCB; then, based on the characteristics of the fitting residuals, a time series ARIMA (Auto-Regressive Integrated Moving Average) model is used to model the residuals; eventually, the results from the two models are combined to obtain the final SCB prediction values. Finally, this paper uses precise SCB data from the IGS (International GNSS Service) to conduct prediction tests, and the results show that the proposed model is effective and has better prediction performance than the quadratic polynomial model, grey model, and ARIMA model. In addition, the new method also overcomes the insufficiency of the ARIMA model in model recognition and order determination.
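
    [Editor's note: a sketch of the two-stage fit (quadratic trend plus periodic terms, then ARIMA on the residuals) using statsmodels. The 12-hour period, the (1,0,1) order, and the synthetic series are assumptions for illustration only.]

        import numpy as np
        import statsmodels.api as sm
        from statsmodels.tsa.arima.model import ARIMA

        rng = np.random.default_rng(4)
        t = np.arange(480) / 48.0                      # epochs in days, 30-min spacing
        scb = (1e-4 + 2e-6 * t + 3e-8 * t**2
               + 5e-7 * np.sin(2 * np.pi * t / 0.5)    # assumed 12-h periodic term
               + np.cumsum(rng.normal(0.0, 1e-8, t.size)))

        def design(tt):
            return np.column_stack([np.ones_like(tt), tt, tt**2,
                                    np.sin(2 * np.pi * tt / 0.5),
                                    np.cos(2 * np.pi * tt / 0.5)])

        trend = sm.OLS(scb, design(t)).fit()           # quadratic + periodic fit
        resid = scb - trend.predict(design(t))
        arma = ARIMA(resid, order=(1, 0, 1)).fit()     # residual dynamics

        t_new = t[-1] + np.arange(1, 25) / 48.0        # next 12 hours
        pred = trend.predict(design(t_new)) + arma.forecast(steps=24)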

  12. Applying Kaplan-Meier to Item Response Data

    ERIC Educational Resources Information Center

    McNeish, Daniel

    2018-01-01

    Some IRT models can be equivalently modeled in alternative frameworks such as logistic regression. Logistic regression can also model time-to-event data, which concerns the probability of an event occurring over time. Using the relation between time-to-event models and logistic regression and the relation between logistic regression and IRT, this…

  13. Seasonal patterns of dengue fever and associated climate factors in 4 provinces in Vietnam from 1994 to 2013.

    PubMed

    Lee, Hu Suk; Nguyen-Viet, Hung; Nam, Vu Sinh; Lee, Mihye; Won, Sungho; Duc, Phuc Pham; Grace, Delia

    2017-03-20

    In Vietnam, dengue fever (DF) is still a leading cause of hospitalization. The main objective of this study was to evaluate the seasonality of DF incidence and its association with climate factors (temperature and precipitation) in the four provinces with the highest incidence rates observed from 1994 to 2013 in Vietnam. Incidence rates (per 100,000) were calculated on a monthly basis during the study period. The seasonal-decomposition procedure based on loess (STL) was used in order to assess the trend and seasonality of DF. In addition, a seasonal cycle subseries (SCS) plot and a univariate negative binomial regression (NBR) model were used to evaluate the monthly variability with statistical analysis. Lastly, a generalized estimating equation (GEE) was used to assess the relationship between monthly incidence rates and weather factors (temperature and precipitation). We found that increased incidence rates were observed in the second half of each year (from May through December), which is the rainy season in each province. In Hanoi, the final model showed that a 1 °C rise in temperature corresponded to an increase of 13% in the monthly incidence rate of DF. In Khanh Hoa, the final model showed that a 1 °C increase in temperature corresponded to an increase of 17%, while a 100 mm increase in precipitation corresponded to an increase of 11% in the DF incidence rate. For Ho Chi Minh City, none of the variables were significant in the model. In An Giang, the final model showed that a 100 mm increase of precipitation in the preceding and same months corresponded to increases of 30% and 22% in the DF incidence rate, respectively. Our findings provide insight into understanding the seasonal pattern and associated climate risk factors.
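
    [Editor's note: the GEE step can be sketched as below with statsmodels; the monthly counts and coefficient values are synthetic placeholders. Exponentiated coefficients are rate ratios, playing the role of the percent changes quoted above.]

        import numpy as np
        import pandas as pd
        import statsmodels.api as sm

        rng = np.random.default_rng(5)
        n = 240                                        # 4 provinces x 60 months (synthetic)
        df = pd.DataFrame({
            "temp": rng.normal(27.0, 3.0, n),          # monthly mean temperature, degC
            "precip_100mm": rng.gamma(2.0, 1.0, n),    # precipitation in 100 mm units
            "province": np.repeat(np.arange(4), n // 4),
        })
        df["cases"] = rng.poisson(np.exp(-1.0 + 0.12 * df["temp"]
                                         + 0.10 * df["precip_100mm"]))

        # GEE with a log link and exchangeable within-province correlation.
        model = sm.GEE(df["cases"], sm.add_constant(df[["temp", "precip_100mm"]]),
                       groups=df["province"], family=sm.families.Poisson(),
                       cov_struct=sm.cov_struct.Exchangeable())
        print(np.exp(model.fit().params))              # rate ratios per 1 degC / 100 mm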

  14. Predicting fine-scale distributions of peripheral aquatic species in headwater streams

    DOE PAGES

    DeRolph, Christopher R.; Nelson, Stacy A. C.; Kwak, Thomas J.; ...

    2014-12-09

    Headwater species and peripheral populations that occupy habitat at the edge of a species range may hold an increased conservation value to managers due to their potential to maximize intraspecies diversity and species' adaptive capabilities in the context of rapid environmental change. The southern Appalachian Mountains are the southern extent of the geographic range of native Salvelinus fontinalis and naturalized Oncorhynchus mykiss and Salmo trutta in eastern North America. In this paper, we predicted distributions of these peripheral, headwater wild trout populations at a fine scale to serve as a planning and management tool for resource managers to maximize resistance and resilience of these populations in the face of anthropogenic stressors. We developed correlative logistic regression models to predict occurrence of brook trout, rainbow trout, and brown trout for every interconfluence stream reach in the study area. A stream network was generated to capture a more consistent representation of headwater streams. Each of the final models had four significant metrics in common: stream order, fragmentation, precipitation, and land cover. Strahler stream order was found to be the most influential variable in two of the three final models and the second most influential variable in the other model. Greater than 70% presence accuracy was achieved for all three models. The underrepresentation of headwater streams in commonly used hydrography datasets is an important consideration that warrants close examination when forecasting headwater species distributions and range estimates. Finally, it appears that a relative watershed position metric (e.g., stream order) is an important surrogate variable (even when elevation is included) for biotic interactions across the landscape in areas where headwater species distributions are influenced by topographical gradients.

  15. Estimating flood magnitude and frequency at gaged and ungaged sites on streams in Alaska and conterminous basins in Canada, based on data through water year 2012

    USGS Publications Warehouse

    Curran, Janet H.; Barth, Nancy A.; Veilleux, Andrea G.; Ourso, Robert T.

    2016-03-16

    Estimates of the magnitude and frequency of floods are needed across Alaska for engineering design of transportation and water-conveyance structures, flood-insurance studies, flood-plain management, and other water-resource purposes. This report updates methods for estimating flood magnitude and frequency in Alaska and conterminous basins in Canada. Annual peak-flow data through water year 2012 were compiled from 387 streamgages on unregulated streams with at least 10 years of record. Flood-frequency estimates were computed for each streamgage using the Expected Moments Algorithm to fit a Pearson Type III distribution to the logarithms of annual peak flows. A multiple Grubbs-Beck test was used to identify potentially influential low floods in the time series of peak flows for censoring in the flood frequency analysis. For two new regional skew areas, flood-frequency estimates using station skew were computed for stations with at least 25 years of record for use in a Bayesian least-squares regression analysis to determine a regional skew value. The consideration of basin characteristics as explanatory variables for regional skew resulted in improvements in precision too small to warrant the additional model complexity, and a constant model was adopted. Regional Skew Area 1 in eastern-central Alaska had a regional skew of 0.54 and an average variance of prediction of 0.45, corresponding to an effective record length of 22 years. Regional Skew Area 2, encompassing coastal areas bordering the Gulf of Alaska, had a regional skew of 0.18 and an average variance of prediction of 0.12, corresponding to an effective record length of 59 years. Station flood-frequency estimates for study sites in regional skew areas were then recomputed using a weighted skew incorporating the station skew and regional skew. In a new regional skew exclusion area outside the regional skew areas, the density of long-record streamgages was too sparse for regional analysis and station skew was used for all estimates. Final station flood frequency estimates for all study streamgages are presented for the 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities. Regional multiple-regression analysis was used to produce equations for estimating flood frequency statistics from explanatory basin characteristics. Basin characteristics, including physical and climatic variables, were updated for all study streamgages using a geographical information system and geospatial source data. Screening for similar-sized nested basins eliminated hydrologically redundant sites, and screening for eligibility for analysis of explanatory variables eliminated regulated peaks, outburst peaks, and sites with indeterminate basin characteristics. An ordinary least-squares regression used flood-frequency statistics and basin characteristics for 341 streamgages (284 in Alaska and 57 in Canada) to determine the most suitable combination of basin characteristics for a flood-frequency regression model and to explore regional grouping of streamgages for explaining variability in flood-frequency statistics across the study area. The most suitable model for explaining flood frequency used drainage area and mean annual precipitation as explanatory variables for the entire study area as a region. Final regression equations for estimating the 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probability discharge in Alaska and conterminous basins in Canada were developed using a generalized least-squares regression. The average standard error of prediction for the regression equations for the various annual exceedance probabilities ranged from 69 to 82 percent, and the pseudo-coefficient of determination (pseudo-R2) ranged from 85 to 91 percent. The regional regression equations from this study were incorporated into the U.S. Geological Survey StreamStats program for a limited area of the State—the Cook Inlet Basin. StreamStats is a national web-based geographic information system application that facilitates retrieval of streamflow statistics and associated information. StreamStats retrieves published data for gaged sites and, for user-selected ungaged sites, delineates drainage areas from topographic and hydrographic data, computes basin characteristics, and computes flood frequency estimates using the regional regression equations.
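
    [Editor's note: the functional form of such regional equations, a power law in drainage area and precipitation fit in log space, can be illustrated as follows. The study itself used generalized least squares weighting, which this ordinary-least-squares sketch omits, and all numbers are synthetic.]

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(6)
        area = 10 ** rng.uniform(0.5, 4.0, 341)       # drainage area (synthetic)
        precip = rng.uniform(10.0, 200.0, 341)        # mean annual precipitation
        q1pct = 8.0 * area**0.8 * precip**0.5 * np.exp(rng.normal(0.0, 0.5, 341))

        # Fit log10(Q) = log10(a) + b*log10(area) + c*log10(precip), i.e. the
        # power-law form Q = a * A^b * P^c that regional flood equations take.
        X = sm.add_constant(np.column_stack([np.log10(area), np.log10(precip)]))
        fit = sm.OLS(np.log10(q1pct), X).fit()
        print(fit.params)                             # [log10(a), b, c]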

  16. Serum Folate Shows an Inverse Association with Blood Pressure in a Cohort of Chinese Women of Childbearing Age: A Cross-Sectional Study

    PubMed Central

    Shen, Minxue; Tan, Hongzhuan; Zhou, Shujin; Retnakaran, Ravi; Smith, Graeme N.; Davidge, Sandra T.; Trasler, Jacquetta; Walker, Mark C.; Wen, Shi Wu

    2016-01-01

    Background It has been reported that higher folate intake from food and supplementation is associated with decreased blood pressure (BP). The association between serum folate concentration and BP has been examined in only a few studies. We aimed to examine the association between serum folate and BP levels in a cohort of young Chinese women. Methods We used the baseline data from a pre-conception cohort of women of childbearing age in Liuyang, China, for this study. Demographic data were collected by structured interview. Serum folate concentration was measured by immunoassay, and homocysteine, blood glucose, triglyceride and total cholesterol were measured through standardized clinical procedures. Multiple linear regression and principal component regression models were applied in the analysis. Results A total of 1,532 healthy normotensive non-pregnant women were included in the final analysis. The mean concentration of serum folate was 7.5 ± 5.4 nmol/L, and 55% of the women presented with folate deficiency (< 6.8 nmol/L). Multiple linear regression and principal component regression showed that serum folate levels were inversely associated with systolic and diastolic BP, after adjusting for demographic, anthropometric, and biochemical factors. Conclusions Serum folate is inversely associated with BP in non-pregnant women of childbearing age with a high prevalence of folate deficiency. PMID:27182603
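
    [Editor's note: a minimal sketch of principal component regression with scikit-learn; the covariates and blood pressure values are random stand-ins, not the study data.]

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LinearRegression
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(7)
        X = rng.normal(size=(1532, 8))       # stand-in covariates (folate, lipids, ...)
        sbp = 110.0 - 0.8 * X[:, 0] + rng.normal(0.0, 8.0, 1532)  # synthetic systolic BP

        # Principal component regression: regress BP on leading components of the
        # standardized covariates, which tames collinearity among the markers.
        pcr = make_pipeline(StandardScaler(), PCA(n_components=4), LinearRegression())
        pcr.fit(X, sbp)
        print(pcr.named_steps["linearregression"].coef_)  # effects in component space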

  17. Self-consistent asset pricing models

    NASA Astrophysics Data System (ADS)

    Malevergne, Y.; Sornette, D.

    2007-08-01

    We discuss the foundations of factor or regression models in the light of the self-consistency condition that the market portfolio (and more generally the risk factors) is (are) constituted of the assets whose returns it is (they are) supposed to explain. As already reported in several articles, self-consistency implies correlations between the return disturbances. As a consequence, the alphas and betas of the factor model are unobservable. Self-consistency leads to renormalized betas with zero effective alphas, which are observable with standard OLS regressions. When the conditions derived from internal consistency are not met, the model is necessarily incomplete, which means that some sources of risk cannot be replicated (or hedged) by a portfolio of stocks traded on the market, even for infinite economies. Analytical derivations and numerical simulations show that, for arbitrary choices of the proxy which are different from the true market portfolio, a modified linear regression holds with a non-zero intercept α_i between an asset i's return and the proxy's return. Self-consistency also introduces “orthogonality” and “normality” conditions linking the betas, alphas (as well as the residuals) and the weights of the proxy portfolio. Two diagnostics based on these orthogonality and normality conditions are implemented on a basket of 323 assets which have been components of the S&P500 in the period from January 1990 to February 2005. These two diagnostics show interesting departures from dynamical self-consistency starting about 2 years before the end of the Internet bubble. Assuming that the CAPM holds with the self-consistency condition, the OLS method automatically obeys the resulting orthogonality and normality conditions and therefore provides a simple way to self-consistently assess the parameters of the model by using proxy portfolios made only of the assets which are used in the CAPM regressions. Finally, the factor decomposition with the self-consistency condition derives a risk-factor decomposition in the multi-factor case which is identical to the principal component analysis (PCA), thus providing a direct link between model-driven and data-driven constructions of risk factors. This correspondence shows that PCA will therefore suffer from the same limitations as the CAPM and its multi-factor generalization, namely lack of out-of-sample explanatory power and predictability. In the multi-period context, the self-consistency conditions force the betas to be time-dependent with specific constraints.

  18. Sensitivity to gaze-contingent contrast increments in naturalistic movies: An exploratory report and model comparison

    PubMed Central

    Wallis, Thomas S. A.; Dorr, Michael; Bex, Peter J.

    2015-01-01

    Sensitivity to luminance contrast is a prerequisite for all but the simplest visual systems. To examine contrast increment detection performance in a way that approximates the natural environmental input of the human visual system, we presented contrast increments gaze-contingently within naturalistic video freely viewed by observers. A band-limited contrast increment was applied to a local region of the video relative to the observer's current gaze point, and the observer made a forced-choice response to the location of the target (≈25,000 trials across five observers). We present exploratory analyses showing that performance improved as a function of the magnitude of the increment and depended on the direction of eye movements relative to the target location, the timing of eye movements relative to target presentation, and the spatiotemporal image structure at the target location. Contrast discrimination performance can be modeled by assuming that the underlying contrast response is an accelerating nonlinearity (arising from a nonlinear transducer or gain control). We implemented one such model and examined the posterior over model parameters, estimated using Markov-chain Monte Carlo methods. The parameters were poorly constrained by our data; parameters constrained using strong priors taken from previous research showed poor cross-validated prediction performance. Atheoretical logistic regression models were better constrained and provided similar prediction performance to the nonlinear transducer model. Finally, we explored the properties of an extended logistic regression that incorporates both eye movement and image content features. Models of contrast transduction may be better constrained by incorporating data from both artificial and natural contrast perception settings. PMID:26057546

  19. LOGISTIC NETWORK REGRESSION FOR SCALABLE ANALYSIS OF NETWORKS WITH JOINT EDGE/VERTEX DYNAMICS

    PubMed Central

    Almquist, Zack W.; Butts, Carter T.

    2015-01-01

    Change in group size and composition has long been an important area of research in the social sciences. Similarly, interest in interaction dynamics has a long history in sociology and social psychology. However, the effects of endogenous group change on interaction dynamics are a surprisingly understudied area. One way to explore these relationships is through social network models. Network dynamics may be viewed as a process of change in the edge structure of a network, in the vertex set on which edges are defined, or in both simultaneously. Although early studies of such processes were primarily descriptive, recent work on this topic has increasingly turned to formal statistical models. Although showing great promise, many of these modern dynamic models are computationally intensive and scale very poorly in the size of the network under study and/or the number of time points considered. Likewise, currently used models focus on edge dynamics, with little support for endogenously changing vertex sets. Here, the authors show how an existing approach based on logistic network regression can be extended to serve as a highly scalable framework for modeling large networks with dynamic vertex sets. The authors place this approach within a general dynamic exponential family (exponential-family random graph modeling) context, clarifying the assumptions underlying the framework (and providing a clear path for extensions), and they show how model assessment methods for cross-sectional networks can be extended to the dynamic case. Finally, the authors illustrate this approach on a classic data set involving interactions among windsurfers on a California beach. PMID:26120218

  1. Modeling of thermal degradation kinetics of the C-glucosyl xanthone mangiferin in an aqueous model solution as a function of pH and temperature and protective effect of honeybush extract matrix.

    PubMed

    Beelders, Theresa; de Beer, Dalene; Kidd, Martin; Joubert, Elizabeth

    2018-01-01

    Mangiferin, a C-glucosyl xanthone, abundant in mango and honeybush, is increasingly targeted for its bioactive properties and thus to enhance functional properties of food. The thermal degradation kinetics of mangiferin at pH 3, 4, 5, 6 and 7 were each modeled at five temperatures ranging between 60 and 140°C. First-order reaction models were fitted to the data using non-linear regression to determine the reaction rate constant at each pH-temperature combination. The reaction rate constant increased with increasing temperature and pH. Comparison of the reaction rate constants at 100°C revealed an exponential relationship between the reaction rate constant and pH. The data for each pH were also modeled with the Arrhenius equation using non-linear and linear regression to determine the activation energy and pre-exponential factor. Activation energies decreased slightly with increasing pH. Finally, a multi-linear model taking into account both temperature and pH was developed for mangiferin degradation. Sterilization (121°C for 4 min) of honeybush extracts dissolved at pH 4, 5 and 7 did not cause noticeable degradation of mangiferin, although the multi-linear model predicted 34% degradation at pH 7. The extract matrix is postulated to exert a protective effect, as changes in potential precursor content could not fully explain the stability of mangiferin. Copyright © 2017 Elsevier Ltd. All rights reserved.
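
    [Editor's note: the two-stage kinetic analysis, first-order fits at each temperature followed by an Arrhenius regression of the rate constants, can be sketched as below. The assumed activation energy of 60 kJ/mol and the noise level are illustrative, not the paper's estimates.]

        import numpy as np
        from scipy.optimize import curve_fit

        R = 8.314                                           # J/(mol K)
        t = np.linspace(0.0, 120.0, 13)                     # minutes

        def first_order(t, c0, k):
            """First-order decay C(t) = C0 * exp(-k t)."""
            return c0 * np.exp(-k * t)

        temps_K = np.array([333.15, 353.15, 373.15, 393.15, 413.15])  # 60-140 degC
        k_true = 1e7 * np.exp(-60000.0 / (R * temps_K))     # assumed Ea = 60 kJ/mol
        rng = np.random.default_rng(8)

        k_hat = []
        for k in k_true:
            conc = first_order(t, 1.0, k) * (1 + rng.normal(0.0, 0.02, t.size))
            (c0, kf), _ = curve_fit(first_order, t, conc, p0=(1.0, 0.01))
            k_hat.append(kf)

        # Arrhenius: ln k = ln A - Ea / (R T); slope of ln k vs 1/T gives -Ea/R.
        slope, intercept = np.polyfit(1.0 / temps_K, np.log(k_hat), 1)
        print("Ea (kJ/mol):", -slope * R / 1000.0)          # recovers ~60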

  2. Statistical approach to Higgs boson couplings in the standard model effective field theory

    NASA Astrophysics Data System (ADS)

    Murphy, Christopher W.

    2018-01-01

    We perform a parameter fit in the standard model effective field theory (SMEFT) with an emphasis on using regularized linear regression to tackle the issue of the large number of parameters in the SMEFT. In regularized linear regression, a positive definite function of the parameters of interest is added to the usual cost function. A cross-validation is performed to try to determine the optimal value of the regularization parameter to use, but it selects the standard model (SM) as the best model to explain the measurements. Nevertheless, as a proof of principle of this technique, we apply it to fitting Higgs boson signal strengths in the SMEFT, including the latest Run-2 results. Results are presented in terms of the eigensystem of the covariance matrix of the least squares estimators, as it has a degree of model independence to it. We find several results in this initial work: the SMEFT predicts the total width of the Higgs boson to be consistent with the SM prediction; the ATLAS and CMS experiments at the LHC are currently sensitive to non-resonant double Higgs boson production. Constraints are derived on the viable parameter space for electroweak baryogenesis in the SMEFT, reinforcing the notion that a first-order phase transition requires fairly low-scale beyond-the-SM physics. Finally, we study which future experimental measurements would give the most improvement on the global constraints on the Higgs sector of the SMEFT.
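
    [Editor's note: the core of regularized linear regression with a cross-validated penalty can be illustrated with scikit-learn. The design matrix standing in for signal-strength sensitivities to Wilson coefficients is random; nothing here reproduces the paper's actual fit.]

        import numpy as np
        from sklearn.linear_model import RidgeCV

        rng = np.random.default_rng(9)
        n_obs, n_par = 80, 60                  # few measurements, many parameters
        X = rng.normal(size=(n_obs, n_par))    # stand-in sensitivity matrix
        y = (X[:, :5] @ np.array([0.3, -0.2, 0.1, 0.05, -0.1])
             + rng.normal(0.0, 0.1, n_obs))

        # Ridge regression adds lambda * ||c||^2 to the least-squares cost;
        # RidgeCV picks lambda by cross-validation, the analogue of the
        # selection step described in the abstract.
        fit = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(X, y)
        print(fit.alpha_, fit.coef_[:5])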

  3. SkyFACT: high-dimensional modeling of gamma-ray emission with adaptive templates and penalized likelihoods

    NASA Astrophysics Data System (ADS)

    Storm, Emma; Weniger, Christoph; Calore, Francesca

    2017-08-01

    We present SkyFACT (Sky Factorization with Adaptive Constrained Templates), a new approach for studying, modeling and decomposing diffuse gamma-ray emission. Like most previous analyses, the approach relies on predictions from cosmic-ray propagation codes like GALPROP and DRAGON. However, in contrast to previous approaches, we account for the fact that models are not perfect and allow for a very large number (≳ 10^5) of nuisance parameters to parameterize these imperfections. We combine methods of image reconstruction and adaptive spatio-spectral template regression in one coherent hybrid approach. To this end, we use penalized Poisson likelihood regression, with regularization functions that are motivated by the maximum entropy method. We introduce methods to efficiently handle the high dimensionality of the convex optimization problem as well as the associated semi-sparse covariance matrix, using the L-BFGS-B algorithm and Cholesky factorization. We test the method both on synthetic data as well as on gamma-ray emission from the inner Galaxy, |l| < 90° and |b| < 20°, as observed by the Fermi Large Area Telescope. We finally define a simple reference model that removes most of the residual emission from the inner Galaxy, based on conventional diffuse emission components as well as components for the Fermi bubbles, the Fermi Galactic center excess, and extended sources along the Galactic disk. Variants of this reference model can serve as a basis for future studies of diffuse emission in and outside the Galactic disk.
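
    [Editor's note: a toy version of penalized Poisson likelihood regression solved with L-BFGS-B, with a single template, a quadratic penalty standing in for the MEM-motivated regularizers, and an analytic gradient supplied to the optimizer.]

        import numpy as np
        from scipy.optimize import minimize

        rng = np.random.default_rng(10)
        template = rng.uniform(0.5, 1.5, 1000)     # one emission template (stand-in)
        counts = rng.poisson(2.0 * template)       # observed photon counts per pixel

        def penalized_nll(log_theta, lam=10.0):
            """Poisson negative log-likelihood plus a quadratic penalty on the
            per-pixel log modulation parameters; returns value and gradient."""
            mu = template * np.exp(log_theta)
            f = np.sum(mu - counts * np.log(mu)) + lam * np.sum(log_theta**2)
            g = (mu - counts) + 2.0 * lam * log_theta    # d mu / d log_theta = mu
            return f, g

        res = minimize(penalized_nll, np.zeros(template.size),
                       jac=True, method="L-BFGS-B")
        modulation = np.exp(res.x)                 # fitted per-pixel corrections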

  4. Predicting ectotherm disease vector spread—benefits from multidisciplinary approaches and directions forward

    NASA Astrophysics Data System (ADS)

    Thomas, Stephanie Margarete; Beierkuhnlein, Carl

    2013-05-01

    The occurrence of ectotherm disease vectors outside of their previous distribution area and the emergence of vector-borne diseases can be increasingly observed at a global scale and are accompanied by a growing number of studies which investigate the vast range of determining factors and their causal links. Consequently, a broad span of scientific disciplines is involved in tackling these complex phenomena. First, we evaluate the citation behaviour of relevant scientific literature in order to clarify the question "do scientists consider results of other disciplines to extend their expertise?" We then highlight emerging tools and concepts useful for risk assessment. Correlative models (regression-based, machine-learning and profile techniques), mechanistic models (basic reproduction number R0) and methods of spatial regression, interaction and interpolation are described. We discuss further steps towards multidisciplinary approaches regarding new tools and emerging concepts to combine existing approaches such as Bayesian geostatistical modelling, mechanistic models which avoid the need for parameter fitting, joined correlative and mechanistic models, multi-criteria decision analysis and geographic profiling. We take the quality of both occurrence data for vector, host and disease cases, and data of the predictor variables into consideration, as both determine the accuracy of risk area identification. Finally, we underline the importance of multidisciplinary research approaches. Even if the establishment of communication networks between scientific disciplines and the sharing of specific methods is time consuming, it promises new insights for the surveillance and control of vector-borne diseases worldwide.

  5. Statistical and Biophysical Models for Predicting Total and Outdoor Water Use in Los Angeles

    NASA Astrophysics Data System (ADS)

    Mini, C.; Hogue, T. S.; Pincetl, S.

    2012-04-01

    Modeling water demand is a complex exercise in the choice of the functional form, techniques and variables to integrate in the model. The goal of the current research is to identify the determinants that control total and outdoor residential water use in semi-arid cities and to utilize that information in the development of statistical and biophysical models that can forecast spatial and temporal urban water use. The City of Los Angeles is unique in its highly diverse socio-demographic, economic and cultural characteristics across neighborhoods, which introduces significant challenges in modeling water use. Increasing climate variability also contributes to uncertainties in water use predictions in urban areas. Monthly individual water use records were acquired from the Los Angeles Department of Water and Power (LADWP) for the 2000 to 2010 period. Study predictors of residential water use include socio-demographic, economic, climate and landscaping variables at the zip code level collected from the US Census database. Climate variables are estimated from ground-based observations and calculated at the centroid of each zip code by the inverse-distance weighting method. Remotely-sensed products of vegetation biomass and landscape land cover are also utilized. Two linear regression models were developed based on the panel data and variables described: a pooled-OLS regression model and a linear mixed effects model. Both models show income per capita and the percentage of landscaped area in each zip code as being statistically significant predictors. The pooled-OLS model tends to over-estimate higher water use zip codes, and both models provide similar RMSE values. Outdoor water use was estimated at the census tract level as the residual between total water use and indoor use. This residual is being compared with the output from a biophysical model including tree and grass cover areas, climate variables and estimates of evapotranspiration at very high spatial resolution. A genetic algorithm based model (Shuffled Complex Evolution-UA; SCE-UA) is also being developed to provide estimates of the prediction and parameter uncertainties and to compare against the linear regression models. Ultimately, models will be selected to undertake predictions for a range of climate change and landscape scenarios. Finally, project results will contribute to a better understanding of water demand to help predict future water use and implement targeted landscaping conservation programs to maintain sustainable water needs for a growing population under uncertain climate variability.

  6. In vitro differential diagnosis of clavus and verruca by a predictive model generated from electrical impedance.

    PubMed

    Hung, Chien-Ya; Sun, Pei-Lun; Chiang, Shu-Jen; Jaw, Fu-Shan

    2014-01-01

    Similar clinical appearances prevent accurate diagnosis of two common skin diseases, clavus and verruca. In this study, electrical impedance is employed as a novel tool to generate a predictive model for differentiating these two diseases. We used 29 clavus and 28 verruca lesions. To obtain impedance parameters, an LCR-meter system was applied to measure capacitance (C), resistance (Re), impedance magnitude (Z), and phase angle (θ). These values were combined with lesion thickness (d) to characterize the tissue specimens. The results from clavus and verruca were then fitted to a univariate logistic regression model with the generalized estimating equations (GEE) method. In model generation, log Z_SD and θ_SD were formulated as predictors by fitting a multiple logistic regression model with the same GEE method. The potential nonlinear effects of covariates were detected by fitting generalized additive models (GAM). Moreover, the model was validated by goodness-of-fit (GOF) assessments. Significant mean differences in the indices d, Re, Z, and θ were found between clavus and verruca (p<0.001). A final predictive model was established with the Z and θ indices. The model fits the observed data quite well. In the GOF evaluation, the area under the receiver operating characteristic (ROC) curve is 0.875 (>0.7), the adjusted generalized R2 is 0.512 (>0.3), and the p value of the Hosmer-Lemeshow GOF test is 0.350 (>0.05). This technique promises to provide an approved model for differential diagnosis of clavus and verruca. It could provide a rapid, relatively low-cost, safe and non-invasive screening tool in clinical use.

  7. A Multiyear Model of Influenza Vaccination in the United States.

    PubMed

    Kamis, Arnold; Zhang, Yuji; Kamis, Tamara

    2017-07-28

    Vaccinating adults against influenza remains a challenge in the United States. Using data from the Centers for Disease Control and Prevention, we present a model for predicting who receives influenza vaccination in the United States between 2012 and 2014, inclusive. The logistic regression model contains nine predictors: age, pneumococcal vaccination, time since last checkup, highest education level attained, employment, health care coverage, number of personal doctors, smoker status, and annual household income. The model, which correctly classifies 67 percent of the data in 2013, is consistent with models tested on the 2012 and 2014 datasets. Thus, we have a multiyear model to explain and predict influenza vaccination in the United States. The results indicate room for improvement in vaccination rates. We discuss how cognitive biases may underlie reluctance to obtain vaccination. We argue that targeted communications addressing cognitive biases could be useful for effective framing of vaccination messages, thus increasing the vaccination rate. Finally, we discuss limitations of the current study and questions for future research.

  8. Artificial neural networks applied to forecasting time series.

    PubMed

    Montaño Moreno, Juan J; Palmer Pol, Alfonso; Muñoz Gracia, Pilar

    2011-04-01

    This study offers a description and comparison of the main models of Artificial Neural Networks (ANN) which have proved to be useful in time series forecasting, and also a standard procedure for the practical application of ANN in this type of task. The Multilayer Perceptron (MLP), Radial Basis Function (RBF), Generalized Regression Neural Network (GRNN), and Recurrent Neural Network (RNN) models are analyzed. With this aim in mind, we use a time series made up of 244 time points. A comparative study establishes that the error made by the four neural network models analyzed is less than 10%. In accordance with the interpretation criteria of this performance, it can be concluded that the neural network models show a close fit regarding their forecasting capacity. The model with the best performance is the RBF, followed by the RNN and MLP. The GRNN model is the one with the worst performance. Finally, we analyze the advantages and limitations of ANN, the possible solutions to these limitations, and provide an orientation towards future research.

  9. Estimating radiative feedbacks from stochastic fluctuations in surface temperature and energy imbalance

    NASA Astrophysics Data System (ADS)

    Proistosescu, C.; Donohoe, A.; Armour, K.; Roe, G.; Stuecker, M. F.; Bitz, C. M.

    2017-12-01

    Joint observations of global surface temperature and energy imbalance provide a unique opportunity to empirically constrain radiative feedbacks. However, the satellite record of Earth's radiative imbalance is relatively short and dominated by stochastic fluctuations. Estimates of radiative feedbacks obtained by regressing energy imbalance against surface temperature depend strongly on sampling choices and on assumptions about whether the stochastic fluctuations are primarily forced by atmospheric or oceanic variability (e.g. Murphy and Forster 2010, Dessler 2011, Spencer and Braswell 2011, Forster 2016). We develop a framework around a stochastic energy balance model that allows us to parse the different contributions of atmospheric and oceanic forcing based on their differing impacts on the covariance structure - or lagged regression - of temperature and radiative imbalance. We validate the framework in a hierarchy of general circulation models: the impact of atmospheric forcing is examined in unforced control simulations of fixed sea-surface temperature and slab ocean model versions; the impact of oceanic forcing is examined in coupled simulations with prescribed ENSO variability. With the impact of atmospheric and oceanic forcing constrained, we are able to predict the relationship between temperature and radiative imbalance in a fully coupled control simulation, finding that both forcing sources are needed to explain the structure of the lagged regression. We further model the dependence of feedback estimates on sampling interval by considering the effects of a finite equilibration time for the atmosphere, and issues of smoothing and aliasing. Finally, we develop a method to fit the stochastic model to the short time series of temperature and radiative imbalance by performing a Bayesian inference based on a modified version of the spectral Whittle likelihood. We are thus able to place realistic joint uncertainty estimates on both stochastic forcing and radiative feedbacks derived from observational records. We find that these records are, as of yet, too short to be useful in constraining radiative feedbacks, and we provide estimates of how the uncertainty narrows as a function of record length.
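
    [Editor's note: the lagged-regression diagnostic at the heart of this framework reduces to regressing imbalance on temperature at a range of leads and lags; a numpy sketch with synthetic monthly anomalies, not the observational records.]

        import numpy as np

        def lagged_regression(T, N, max_lag=24):
            """Slope of N regressed on T with N shifted k months after T (k may be
            negative); the shape of slope-vs-lag helps separate forcing sources."""
            lags = np.arange(-max_lag, max_lag + 1)
            slopes = []
            for k in lags:
                x, y = (T[:T.size - k], N[k:]) if k >= 0 else (T[-k:], N[:k])
                slopes.append(np.polyfit(x, y, 1)[0])
            return lags, np.array(slopes)

        rng = np.random.default_rng(11)
        T = rng.normal(size=600)                    # monthly temperature anomaly (stand-in)
        N = -1.2 * T + rng.normal(0.0, 0.5, 600)    # imbalance, -1.2 W/m2/K feedback
        lags, slopes = lagged_regression(T, N)
        print(slopes[lags == 0][0])                 # close to -1.2 at lag zero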

  10. Extrapolating regional probability of drying of headwater streams using discrete observations and gauging networks

    NASA Astrophysics Data System (ADS)

    Beaufort, Aurélien; Lamouroux, Nicolas; Pella, Hervé; Datry, Thibault; Sauquet, Eric

    2018-05-01

    Headwater streams represent a substantial proportion of river systems and many of them have intermittent flows due to their upstream position in the network. These intermittent rivers and ephemeral streams have recently seen a marked increase in interest, especially to assess the impact of drying on aquatic ecosystems. The objective of this paper is to quantify how discrete (in space and time) field observations of flow intermittence help to extrapolate over time the daily probability of drying (defined at the regional scale). Two empirical models based on linear or logistic regressions have been developed to predict the daily probability of intermittence at the regional scale across France. Explanatory variables were derived from available daily discharge and groundwater-level data of a dense gauging/piezometer network, and models were calibrated using discrete series of field observations of flow intermittence. The robustness of the models was tested using an independent, dense regional dataset of intermittence observations and observations of the year 2017 excluded from the calibration. The resulting models were used to extrapolate the daily regional probability of drying in France: (i) over the period 2011-2017 to identify the regions most affected by flow intermittence; (ii) over the period 1989-2017, using a reduced input dataset, to analyse temporal variability of flow intermittence at the national level. The two empirical regression models performed equally well between 2011 and 2017. The accuracy of predictions depended on the number of continuous gauging/piezometer stations and intermittence observations available to calibrate the regressions. Regions with the highest performance were located in sedimentary plains, where the monitoring network was dense and where the regional probability of drying was the highest. Conversely, the worst performances were obtained in mountainous regions. Finally, temporal projections (1989-2016) suggested the highest probabilities of intermittence (> 35 %) in 1989-1991, 2003 and 2005. A high density of intermittence observations improved the information provided by gauging stations and piezometers to extrapolate the temporal variability of intermittent rivers and ephemeral streams.

  11. Transient combustion in hybrid rockets

    NASA Astrophysics Data System (ADS)

    Karabeyoglu, Mustafa Arif

    1998-09-01

    Hybrid rockets regained interest recently as an alternative chemical propulsion system due to their advantages over the solid and liquid systems that are currently in use. Development efforts on hybrids revealed two important problem areas: (1) low frequency instabilities and (2) slow transient response. Both of these are closely related to the transient behavior, which is a poorly understood aspect of hybrid operation. This thesis is mainly involved with a theoretical study of transient combustion in hybrid rockets. We follow the methodology of identifying and modeling the subsystems of the motor, such as the thermal lags in the solid, boundary layer combustion and chamber gasdynamics, from a dynamic point of view. We begin with the thermal lag in the solid, which yields the regression rate for any given wall heat flux variation. Interesting phenomena such as overshooting during throttling and the amplification and phase lead regions in the frequency domain are discovered. Later we develop a quasi-steady transient hybrid combustion model supported with time delays for the boundary layer processes. This is integrated with the thermal lag system to obtain the thermal combustion (TC) coupled response. The TC coupled system with positive delays generated low frequency instabilities. The scaling of the instabilities is in good agreement with actual motor test data. Finally, we formulate a gasdynamic model for the hybrid chamber which successfully resolves the filling/emptying and longitudinal acoustic behavior of the motor. The TC coupled system is later integrated with the gasdynamic model to obtain the overall response (TCG coupled system) of gaseous oxidizer motors with stiff feed systems. Low frequency instabilities were also encountered for the TCG coupled system. Apart from the transient investigations, the regression rate behavior of liquefying hybrid propellants, such as solid cryogenic materials, is also studied. The theory is based on the possibility of enhancement of the regression rate by entrainment mass transfer from a liquid layer formed on the fuel surface. The predicted regression rates are in good agreement with the cryogenic experimental findings obtained recently at Edwards Air Force Base with a frozen pentane and gaseous oxygen system.

  12. The estimated effect of mass or footprint reduction in recent light-duty vehicles on U.S. societal fatality risk per vehicle mile traveled.

    PubMed

    Wenzel, Tom

    2013-10-01

    The National Highway Traffic Safety Administration (NHTSA) recently updated its 2003 and 2010 logistic regression analyses of the effect of a reduction in light-duty vehicle mass on US societal fatality risk per vehicle mile traveled (VMT; Kahane, 2012). Societal fatality risk includes the risk to both the occupants of the case vehicle as well as any crash partner or pedestrians. The current analysis is the most thorough investigation of this issue to date. This paper replicates the Kahane analysis and extends it by testing the sensitivity of his results to changes in the definition of risk, and the data and control variables used in the regression models. An assessment by Lawrence Berkeley National Laboratory (LBNL) indicates that the estimated effect of mass reduction on risk is smaller than in Kahane's previous studies, and is statistically non-significant for all but the lightest cars (Wenzel, 2012a). The estimated effects of a reduction in mass or footprint (i.e. wheelbase times track width) are small relative to other vehicle, driver, and crash variables used in the regression models. The recent historical correlation between mass and footprint is not so large as to prohibit including both variables in the same regression model; excluding footprint from the model, i.e. allowing footprint to decrease with mass, increases the estimated detrimental effect of mass reduction on risk in cars and crossover utility vehicles (CUVs)/minivans, but has virtually no effect on light trucks. Analysis by footprint deciles indicates that risk does not consistently increase with reduced mass for vehicles of similar footprint. Finally, the estimated effects of mass and footprint reduction are sensitive to the measure of exposure used (fatalities per induced-exposure crash, rather than per VMT), as well as to other changes in the data or control variables used. It appears that the safety penalty from lower mass can be mitigated with careful vehicle design, and that manufacturers can reduce mass as a strategy to increase their vehicles' fuel economy and reduce greenhouse gas emissions without necessarily compromising societal safety. Published by Elsevier Ltd.

  13. Meteorological influence on predicting surface SO2 concentration from satellite remote sensing in Shanghai, China.

    PubMed

    Xue, Dan; Yin, Jingyuan

    2014-05-01

    In this study, we explored the potential applications of the Ozone Monitoring Instrument (OMI) satellite sensor in air pollution research. The OMI planetary boundary layer sulfur dioxide (SO2_PBL) column density and daily average surface SO2 concentration of Shanghai from 2004 to 2012 were analyzed. After several consecutive years of increase, the surface SO2 concentration finally declined in 2007. It was higher in winter than in other seasons. The correlation coefficient between daily average surface SO2 concentration and SO2_PBL was only 0.316. But SO2_PBL was found to be a highly significant predictor of the surface SO2 concentration using the simple regression model. Five meteorological factors were considered in this study; among them, temperature, dew point, relative humidity, and wind speed were negatively correlated with surface SO2 concentration, while pressure was positively correlated. Furthermore, it was found that dew point was a more effective predictor than temperature. When these meteorological factors were used in multiple regression, the determination coefficient reached 0.379. The relationship between the surface SO2 concentration and meteorological factors was seasonally dependent. In summer and autumn, the regression model performed better than in spring and winter. The surface SO2 concentration predicting method proposed in this study can be easily adapted for other regions, and is most useful for those having no operational air pollution forecasting services or having sparse ground monitoring networks.

  14. Statistical Learning Theory for High Dimensional Prediction: Application to Criterion-Keyed Scale Development

    PubMed Central

    Chapman, Benjamin P.; Weiss, Alexander; Duberstein, Paul

    2016-01-01

    Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in “big data” problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms–Supervised Principal Components, Regularization, and Boosting—can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach—or perhaps because of them–SLT methods may hold value as a statistically rigorous approach to exploratory regression. PMID:27454257
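
    [Editor's note: of the three algorithms, supervised principal components is the least standard; a compact sketch under synthetic data, following the generic recipe of item screening, PCA on the retained items, then a regression on the component scores. The screening quantile and component count are arbitrary choices.]

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(12)
        items = rng.normal(size=(2000, 300))        # large item pool (stand-in)
        risk = 0.3 * items[:, :10].sum(axis=1)      # ten items carry the signal
        died = (rng.random(2000) < 1.0 / (1.0 + np.exp(2.0 - risk))).astype(int)

        # Step 1: screen items by univariate association with the outcome.
        r = np.abs(np.corrcoef(items.T, died)[-1, :-1])
        keep = r > np.quantile(r, 0.9)              # keep the top decile of items
        # Step 2: principal components of the retained items only.
        scores = PCA(n_components=2).fit_transform(items[:, keep])
        # Step 3: the criterion-keyed "scale" is a regression on the scores.
        clf = LogisticRegression().fit(scores, died)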

  15. Risk factors for amendment in type, duration and setting of prescribed outpatient parenteral antimicrobial therapy (OPAT) for adult patients with cellulitis: a retrospective cohort study and CART analysis.

    PubMed

    Quirke, Michael; Curran, Emma May; O'Kelly, Patrick; Moran, Ruth; Daly, Eimear; Aylward, Seamus; McElvaney, Gerry; Wakai, Abel

    2018-01-01

    To measure the percentage rate of and risk factors for amendment in the type, duration and setting of outpatient parenteral antimicrobial therapy (OPAT) for the treatment of cellulitis, a retrospective cohort study of adult patients receiving OPAT for cellulitis was performed. Treatment amendment (TA) was defined as hospital admission or change in antibiotic therapy in order to achieve clinical response. Multivariable logistic regression (MVLR) and classification and regression tree (CART) analysis were performed. There were 307 patients enrolled. TA occurred in 36 patients (11.7%). Significant risk factors for TA on MVLR were increased age, increased Numerical Pain Scale Score (NPSS) and immunocompromise. The median OPAT duration was 7 days. Increased age, heart rate and C-reactive protein were associated with treatment prolongation. CART analysis selected age <64.5 years, female gender and NPSS <2.5 in the final model, generating a low-sensitivity (27.8%), high-specificity (97.1%) decision tree. Increased age, NPSS and immunocompromise were associated with OPAT amendment. These identified risk factors can be used to support an evidence-based approach to patient selection for OPAT in cellulitis. The CART algorithm has good specificity but lacks sensitivity, and is shown in this study to be inferior to logistic regression modelling. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
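
    [Editor's note: a sketch of fitting and printing such a shallow classification tree with scikit-learn; the synthetic data mimic the three selected variables, and the depth and leaf-size limits are illustrative.]

        import numpy as np
        import pandas as pd
        from sklearn.tree import DecisionTreeClassifier, export_text

        rng = np.random.default_rng(13)
        df = pd.DataFrame({"age": rng.uniform(18, 90, 307),
                           "female": rng.integers(0, 2, 307),
                           "npss": rng.integers(0, 11, 307)})
        p = 1.0 / (1.0 + np.exp(-(-3.5 + 0.03 * df["age"] + 0.15 * df["npss"])))
        df["amended"] = rng.random(307) < p

        # A shallow CART mirroring the reported splits (age, gender, pain score);
        # limiting depth keeps the tree interpretable, at some cost in sensitivity.
        tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20, random_state=0)
        tree.fit(df[["age", "female", "npss"]], df["amended"])
        print(export_text(tree, feature_names=["age", "female", "npss"]))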

  16. A Model Comparison for Count Data with a Positively Skewed Distribution with an Application to the Number of University Mathematics Courses Completed

    ERIC Educational Resources Information Center

    Liou, Pey-Yan

    2009-01-01

    The current study examines three regression models: OLS (ordinary least square) linear regression, Poisson regression, and negative binomial regression for analyzing count data. Simulation results show that the OLS regression model performed better than the others, since it did not produce more false statistically significant relationships than…

  17. Using instrumental variables to estimate a Cox's proportional hazards regression subject to additive confounding

    PubMed Central

    Tosteson, Tor D.; Morden, Nancy E.; Stukel, Therese A.; O'Malley, A. James

    2014-01-01

    The estimation of treatment effects is one of the primary goals of statistics in medicine. Estimation based on observational studies is subject to confounding. Statistical methods for controlling bias due to confounding include regression adjustment, propensity scores and inverse probability weighted estimators. These methods require that all confounders are recorded in the data. The method of instrumental variables (IVs) can eliminate bias in observational studies even in the absence of information on confounders. We propose a method for integrating IVs within the framework of Cox's proportional hazards model and demonstrate the conditions under which it recovers the causal effect of treatment. The methodology is based on the approximate orthogonality of an instrument with unobserved confounders among those at risk. We derive an estimator as the solution to an estimating equation that resembles the score equation of the partial likelihood in much the same way as the traditional IV estimator resembles the normal equations. To justify this IV estimator for a Cox model we perform simulations to evaluate its operating characteristics. Finally, we apply the estimator to an observational study of the effect of coronary catheterization on survival. PMID:25506259

  18. Using instrumental variables to estimate a Cox's proportional hazards regression subject to additive confounding.

    PubMed

    MacKenzie, Todd A; Tosteson, Tor D; Morden, Nancy E; Stukel, Therese A; O'Malley, A James

    2014-06-01

    The estimation of treatment effects is one of the primary goals of statistics in medicine. Estimation based on observational studies is subject to confounding. Statistical methods for controlling bias due to confounding include regression adjustment, propensity scores and inverse probability weighted estimators. These methods require that all confounders are recorded in the data. The method of instrumental variables (IVs) can eliminate bias in observational studies even in the absence of information on confounders. We propose a method for integrating IVs within the framework of Cox's proportional hazards model and demonstrate the conditions under which it recovers the causal effect of treatment. The methodology is based on the approximate orthogonality of an instrument with unobserved confounders among those at risk. We derive an estimator as the solution to an estimating equation that resembles the score equation of the partial likelihood in much the same way as the traditional IV estimator resembles the normal equations. To justify this IV estimator for a Cox model we perform simulations to evaluate its operating characteristics. Finally, we apply the estimator to an observational study of the effect of coronary catheterization on survival.

  19. Optimisation of the formulation of a bubble bath by a chemometric approach market segmentation and optimisation.

    PubMed

    Marengo, Emilio; Robotti, Elisa; Gennaro, Maria Carla; Bertetto, Mariella

    2003-03-01

    The optimisation of the formulation of a commercial bubble bath was performed by chemometric analysis of Panel Test results. A first Panel Test was performed to choose the best of four essences proposed to the consumers; the chosen essence was used in the revised commercial bubble bath. Afterwards, the effect of changing the amounts of four components of the bubble bath (the primary surfactant, the essence, the hydrating agent and the colouring agent) was studied by a fractional factorial design. The segmentation of the bubble bath market was performed by a second Panel Test, in which the consumers were requested to evaluate the samples coming from the experimental design. The results were then treated by Principal Component Analysis. The market had two segments: people preferring a product with a rich formulation and people preferring a poorer, simpler formulation. The final target, i.e. the optimisation of the formulation for each segment, was achieved by calculating regression models relating the subjective evaluations given by the Panel to the compositions of the samples. The regression models made it possible to identify the best formulations for the two segments of the market.

  20. The spatial and temporal association of neighborhood drug markets and rates of sexually transmitted infections in an urban setting.

    PubMed

    Jennings, Jacky M; Woods, Stacy E; Curriero, Frank C

    2013-09-01

    This study examined temporal and spatial relationships between neighborhood drug markets and gonorrhea among census block groups from 2002 to 2005. This was a spatial, longitudinal ecologic study. Poisson regression was used with adjustment in final models for socioeconomic status, residential stability and vacant housing. Increased drug market arrests were significantly associated with an 11% increase in gonorrhea (adjusted relative risk (ARR) 1.11; 95% CI 1.05, 1.16). Increased drug market arrests in adjacent neighborhoods were significantly associated with a 27% increase in gonorrhea (ARR 1.27; 95% CI 1.16, 1.36), independent of focal neighborhood drug markets. Increased drug market arrests in the previous year in focal neighborhoods were not associated with gonorrhea (ARR 1.04; 95% CI 0.98, 1.10), adjusting for focal and adjacent drug markets. While the temporal association was not supported, our findings support an associative link between drug markets and gonorrhea. The findings suggest that drug markets and their associated sexual networks may extend beyond local neighborhood boundaries, indicating the importance of including spatial lags in regression models investigating these associations. Copyright © 2013 Elsevier Ltd. All rights reserved.

  1. The spatial and temporal association of neighborhood drug markets and rates of sexually transmitted infections in an urban setting

    PubMed Central

    Jennings, Jacky M.; Woods, Stacy E.; Curriero, Frank C.

    2013-01-01

    This study examined temporal and spatial relationships between neighborhood drug markets and gonorrhea among census block groups from 2002 to 2005. This was a spatial, longitudinal ecologic study. Poisson regression was used with adjustment in final models for socioeconomic status, residential stability and vacant housing. Increased drug market arrests were significantly associated with an 11% increase in gonorrhea (Adjusted Relative Risk (ARR) 1.11; 95% CI 1.05, 1.16). Increased drug market arrests in adjacent neighborhoods were significantly associated with a 27% increase in gonorrhea (ARR 1.27; 95% CI 1.16, 1.36), independent of focal neighborhood drug markets. Increased drug market arrests in the previous year in focal neighborhoods were not associated with gonorrhea (ARR 1.04; 95% CI 0.98, 1.10), adjusting for focal and adjacent drug markets. While the temporal association was not supported, our findings support an associative link between drug markets and gonorrhea. The findings suggest that drug markets and their associated sexual networks may extend beyond local neighborhood boundaries, indicating the importance of including spatial lags in regression models investigating these associations. PMID:23872251
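
    An illustrative sketch of the modeling idea: a Poisson regression on area counts that includes a spatially lagged covariate (the weighted mean of adjacent areas) and a population offset. The adjacency matrix W, coefficients, and all variable names are hypothetical, not the study's data.

```python
# Poisson regression with a spatial-lag covariate on simulated area data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
W = (rng.random((n, n)) < 0.05).astype(float)        # stand-in adjacency matrix
np.fill_diagonal(W, 0)
W = W / np.maximum(W.sum(axis=1, keepdims=True), 1)  # row-standardize

arrests = rng.poisson(5, size=n).astype(float)       # focal drug-market arrests
lag_arrests = W @ arrests                            # adjacent-neighborhood arrests
pop = rng.integers(500, 5000, size=n)                # population for the offset
rate = np.exp(-6 + 0.10 * arrests + 0.24 * lag_arrests)
cases = rng.poisson(pop * rate)                      # simulated gonorrhea counts

X = sm.add_constant(np.column_stack([arrests, lag_arrests]))
fit = sm.GLM(cases, X, family=sm.families.Poisson(),
             offset=np.log(pop)).fit()
print(np.exp(fit.params))  # exponentiated coefficients ~ adjusted relative risks
```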

  2. Mapping individuals' earthquake preparedness in China

    NASA Astrophysics Data System (ADS)

    Wu, Guochun; Han, Ziqiang; Xu, Weijin; Gong, Yue

    2018-05-01

    Disaster preparedness is critical for reducing potential impact. This paper contributes to current knowledge of disaster preparedness using representative national sample data from China, which faces high earthquake risks in many areas of the country. The adoption of earthquake preparedness activities by the general public, covering five indicators of material preparedness and five indicators of awareness preparedness, was surveyed; 3245 respondents from all 31 provinces of Mainland China participated in the survey. Linear regression models and logit regression models were used to analyze the effects of potential influencing factors. Overall, preparedness levels are not satisfactory, with a national material preparation score of 3.02 (on a 1-5 scale) and an awareness preparation score of 2.79 (1-5). Meanwhile, residents from western China, which has higher earthquake risk, have higher degrees of preparedness. Concern for disaster risk reduction (DRR), concern for building safety, and participation in public affairs are consistent positive predictors of both material and awareness preparedness. The effects of demographic and socioeconomic variables, such as gender, age, education, income, urban/rural division, and building size, vary across different preparedness activities. Finally, the paper concludes with a discussion of the theoretical contributions and potential implications.

  3. Female homicide in Rio Grande do Sul, Brazil.

    PubMed

    Leites, Gabriela Tomedi; Meneghel, Stela Nazareth; Hirakata, Vania Noemi

    2014-01-01

    This study aimed to assess the female homicide rate due to aggression in Rio Grande do Sul, Brazil, using this as a "proxy" for femicide. This was an ecological study which correlated the female homicide rate due to aggression in Rio Grande do Sul, according to the 35 microregions defined by the Brazilian Institute of Geography and Statistics (IBGE), with socioeconomic and demographic variables and health access indicators. Pearson's correlation test was performed with the selected variables. After this, multiple linear regressions were performed with variables with p < 0.20. The standardized average female homicide rate due to aggression in the period from 2003 to 2007 was 3.1 deaths per 100,000. After multiple regression analysis, the final model included male mortality due to aggression (p = 0.016), the percentage of hospital admissions for alcohol (p = 0.005) and the proportion of ill-defined deaths (p = 0.015). The model has an explanatory power of 39% (adjusted R² = 0.391). The results are consistent with other studies and indicate a strong relationship between structural violence in society and violence against women, in addition to a higher incidence of female deaths in places with high rates of alcohol-related hospitalization.

  4. Social determinants of childhood asthma symptoms: an ecological study in urban Latin America.

    PubMed

    Fattore, Gisel L; Santos, Carlos A T; Barreto, Mauricio L

    2014-04-01

    Asthma is an important public health problem in urban Latin America. This study aimed to analyze the role of socioeconomic and environmental factors as potential determinants of asthma symptom prevalence in children from Latin American (LA) urban centers. We selected 31 LA urban centers with complete data, and an ecological analysis was performed. According to our theoretical framework, the explanatory variables were classified in three levels: distal, intermediate, and proximate. The association between variables in the three levels and the prevalence of asthma symptoms was examined by bivariate and multivariate linear regression analysis weighted by sample size. In a second stage, we fitted several linear regression models, introducing the variables sequentially according to the predefined hierarchy. In the final hierarchical model, the Gini Index, crowding, sanitation, variation in infant mortality rates and homicide rates explained a large part of the variance in asthma prevalence between centers (R² = 75.0%). We found a strong association between socioeconomic and environmental variables and the prevalence of asthma symptoms in LA urban children, and according to our hierarchical framework and the results found, we suggest that social inequality (measured by the Gini Index) is a central determinant explaining the high prevalence of asthma in LA.
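
    A minimal sketch of the hierarchical strategy described above: variables are entered sequentially by level into linear regressions weighted by sample size, and the explained variance is tracked at each stage. Data and column names are invented for illustration.

```python
# Sequential (hierarchical) weighted least-squares fits on simulated center data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "gini": rng.normal(0.5, 0.05, 31),          # distal level
    "crowding": rng.normal(1.0, 0.2, 31),       # intermediate level
    "sanitation": rng.normal(80, 10, 31),       # proximate level
    "n_children": rng.integers(800, 4000, 31),  # center sample size (weights)
})
df["asthma_prev"] = 10 + 30 * df["gini"] + 2 * df["crowding"] + rng.normal(0, 1, 31)

levels = ["asthma_prev ~ gini",
          "asthma_prev ~ gini + crowding",
          "asthma_prev ~ gini + crowding + sanitation"]
for formula in levels:
    fit = smf.wls(formula, data=df, weights=df["n_children"]).fit()
    print(formula, "R2 =", round(fit.rsquared, 3))
```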

  5. Drivers and potential predictability of summer time North Atlantic polar front jet variability

    NASA Astrophysics Data System (ADS)

    Hall, Richard J.; Jones, Julie M.; Hanna, Edward; Scaife, Adam A.; Erdélyi, Róbert

    2017-06-01

    The variability of the North Atlantic polar front jet stream is crucial in determining summer weather around the North Atlantic basin. Recent extreme summers in western Europe and North America have highlighted the need for greater understanding of this variability, in order to aid seasonal forecasting and mitigate societal, environmental and economic impacts. Here we find that simple linear regression and composite models based on a few predictable factors are able to explain up to 35 % of summertime jet stream speed and latitude variability from 1955 onwards. Sea surface temperature forcings impact predominantly on jet speed, whereas solar and cryospheric forcings appear to influence jet latitude. The cryospheric associations come from the previous autumn, suggesting the survival of an ice-induced signal through the winter season, whereas solar influences lead jet variability by a few years. Regression models covering the earlier part of the twentieth century are much less effective, presumably due to decreased availability of data, and increased uncertainty in observational reanalyses. Wavelet coherence analysis identifies that associations fluctuate over the study period but it is not clear whether this is just internal variability or genuine non-stationarity. Finally we identify areas for future research.

  6. [Stages of behavioral change regarding physical activity in students from a Brazilian town].

    PubMed

    Silva, Diego A S; Smith-Menezes, Aldemir; Almeida-Gomes, Marciusde; de Sousa, Thiago Ferreira

    2010-08-01

    To verify the association between stages of behavioural change (SBC) for physical activity (PA) and socio-demographic factors, behavioural factors and PA barriers in students from a small town in Brazil. This cross-sectional study's representative sample comprised 281 high school students from Simão Dias, Sergipe State, Brazil, with a mean age of 17.4 (±1.98) years. Socio-demographic information (gender, age, school grade, economic level (EL) and family head's EL), SBC for PA, behavioural factors (smoking, alcohol and stress) and PA barriers were collected via a self-administered instrument. A hierarchical model was used, involving Poisson regression with respective confidence intervals; the significance level was set at 5% for all analyses. 65.8% of the participating students were classified in stages corresponding to physically inactive behaviour. In the final regression model, females were 1.37 times more likely to present inactive behaviour than males (95% CI 1.14-1.65); having a low EL remained a risk factor compared to medium-EL students (PR = 1.41; 95% CI 1.15-1.72). These findings may prove useful for developing health promotion programmes in school environments, paying special attention to female and low-EL students.

  7. Breeding value accuracy estimates for growth traits using random regression and multi-trait models in Nelore cattle.

    PubMed

    Boligon, A A; Baldi, F; Mercadante, M E Z; Lobo, R B; Pereira, R J; Albuquerque, L G

    2011-06-28

    We quantified the potential increase in accuracy of expected breeding value for weights of Nelore cattle, from birth to mature age, using multi-trait and random regression models on Legendre polynomials and B-spline functions. A total of 87,712 weight records from 8144 females were used, recorded every three months from birth to mature age from the Nelore Brazil Program. For random regression analyses, all female weight records from birth to eight years of age (data set I) were considered. From this general data set, a subset was created (data set II), which included only nine weight records: at birth, weaning, 365 and 550 days of age, and 2, 3, 4, 5, and 6 years of age. Data set II was analyzed using random regression and multi-trait models. The model of analysis included the contemporary group as fixed effects and age of dam as a linear and quadratic covariable. In the random regression analyses, average growth trends were modeled using a cubic regression on orthogonal polynomials of age. Residual variances were modeled by a step function with five classes. Legendre polynomials of fourth and sixth order were utilized to model the direct genetic and animal permanent environmental effects, respectively, while third-order Legendre polynomials were considered for maternal genetic and maternal permanent environmental effects. Quadratic polynomials were applied to model all random effects in random regression models on B-spline functions. Direct genetic and animal permanent environmental effects were modeled using three segments or five coefficients, and genetic maternal and maternal permanent environmental effects were modeled with one segment or three coefficients in the random regression models on B-spline functions. For both data sets (I and II), animals ranked differently according to expected breeding value obtained by random regression or multi-trait models. With random regression models, the highest gains in accuracy were obtained at ages with a low number of weight records. The results indicate that random regression models provide more accurate expected breeding values than the traditionally finite multi-trait models. Thus, higher genetic responses are expected for beef cattle growth traits by replacing a multi-trait model with random regression models for genetic evaluation. B-spline functions could be applied as an alternative to Legendre polynomials to model covariance functions for weights from birth to mature age.
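
    A sketch of one concrete step in random regression modeling: building a Legendre-polynomial basis over age. Ages are rescaled to [-1, 1] and expanded into orthogonal polynomial covariates; the fourth order follows the abstract's choice for direct genetic effects, while the ages and everything else here are illustrative.

```python
# Legendre basis for a random regression model over age.
import numpy as np
from numpy.polynomial import legendre

age = np.array([1, 240, 365, 550, 730, 1460, 2920], dtype=float)  # days
a_min, a_max = age.min(), age.max()
t = 2 * (age - a_min) / (a_max - a_min) - 1     # standardize age to [-1, 1]

# Vandermonde-style matrix of Legendre polynomials up to order 4; its columns
# would serve as covariates for the direct genetic random effects.
Phi = legendre.legvander(t, 4)
print(Phi.shape)   # (7, 5): one column per polynomial coefficient
```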

  8. The Impact of Prior Programming Knowledge on Lecture Attendance and Final Exam

    ERIC Educational Resources Information Center

    Veerasamy, Ashok Kumar; D'Souza, Daryl; Lindén, Rolf; Laakso, Mikko-Jussi

    2018-01-01

    In this article, we report the results of the impact of prior programming knowledge (PPK) on lecture attendance (LA) and on subsequent final programming exam performance in a university level introductory programming course. This study used Spearman's rank correlation coefficient, multiple regression, Kruskal-Wallis, and Bonferroni correction…

  9. Evaluating differential effects using regression interactions and regression mixture models

    PubMed Central

    Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung

    2015-01-01

    Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, relatively new statistical methods for assessing differential effects, by comparing their results to those obtained using an interaction term in linear regression. The research questions which each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903

  10. Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models

    ERIC Educational Resources Information Center

    Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung

    2015-01-01

    Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
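
    A minimal sketch of the "interaction" side of the comparison above: a differential effect of a predictor x across a moderator z, expressed as a product term in OLS. The data are simulated; nothing here reproduces the papers' analyses.

```python
# Differential effect via an interaction term in linear regression.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 1000
z = rng.integers(0, 2, n)                 # moderator (e.g., a subgroup flag)
x = rng.normal(size=n)
y = 1.0 + 0.2 * x + 0.5 * x * z + rng.normal(size=n)  # effect of x differs by z

df = pd.DataFrame({"y": y, "x": x, "z": z})
fit = smf.ols("y ~ x * z", data=df).fit()   # expands to x + z + x:z
print(fit.params)                           # x:z estimates the differential effect
```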

  11. Modeling Linguistic Variables With Regression Models: Addressing Non-Gaussian Distributions, Non-independent Observations, and Non-linear Predictors With Random Effects and Generalized Additive Models for Location, Scale, and Shape

    PubMed Central

    Coupé, Christophe

    2018-01-01

    As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is, however, being made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for ‘difficult’ variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships. Relying on GAMLSS, we assess a range of candidate distributions, including the Sichel, Delaporte, Box-Cox Green and Cole, and Box-Cox t distributions. We find that the Box-Cox t distribution, with appropriate modeling of its parameters, best fits the conditional distribution of phonemic inventory size. We finally discuss the specificities of phoneme counts, weak effects, and how GAMLSS should be considered for other linguistic variables. PMID:29713298

  12. Modeling Linguistic Variables With Regression Models: Addressing Non-Gaussian Distributions, Non-independent Observations, and Non-linear Predictors With Random Effects and Generalized Additive Models for Location, Scale, and Shape.

    PubMed

    Coupé, Christophe

    2018-01-01

    As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is, however, being made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for 'difficult' variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships. Relying on GAMLSS, we assess a range of candidate distributions, including the Sichel, Delaporte, Box-Cox Green and Cole, and Box-Cox t distributions. We find that the Box-Cox t distribution, with appropriate modeling of its parameters, best fits the conditional distribution of phonemic inventory size. We finally discuss the specificities of phoneme counts, weak effects, and how GAMLSS should be considered for other linguistic variables.
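
    A hedged sketch of the LMM step described above: a mixed-effects regression with a random intercept per language family, as one might fit before moving to GAM/GAMLSS machinery (full GAMLSS is not standard in Python, so only the grouping idea is shown). Family labels and predictors are simulated.

```python
# Linear mixed-effects model with a random intercept per (simulated) family.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n_fam, per_fam = 40, 12
fam = np.repeat(np.arange(n_fam), per_fam)
fam_effect = rng.normal(0, 2, n_fam)[fam]          # genealogical grouping effect
log_speakers = rng.normal(10, 3, n_fam * per_fam)
inventory = 30 + 0.8 * log_speakers + fam_effect + rng.normal(0, 2, n_fam * per_fam)

df = pd.DataFrame({"inventory": inventory, "log_speakers": log_speakers,
                   "family": fam})
fit = smf.mixedlm("inventory ~ log_speakers", data=df, groups=df["family"]).fit()
print(fit.summary())
```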

  13. Analysing the accuracy of machine learning techniques to develop an integrated influent time series model: case study of a sewage treatment plant, Malaysia.

    PubMed

    Ansari, Mozafar; Othman, Faridah; Abunama, Taher; El-Shafie, Ahmed

    2018-04-01

    The function of a sewage treatment plant is to treat the sewage to acceptable standards before being discharged into the receiving waters. To design and operate such plants, it is necessary to measure and predict the influent flow rate. In this research, the influent flow rate of a sewage treatment plant (STP) was modelled and predicted by autoregressive integrated moving average (ARIMA), nonlinear autoregressive network (NAR) and support vector machine (SVM) regression time series algorithms. To evaluate the models' accuracy, the root mean square error (RMSE) and coefficient of determination (R²) were calculated as initial assessment measures, while relative error (RE), peak flow criterion (PFC) and low flow criterion (LFC) were calculated as final evaluation measures to demonstrate the detailed accuracy of the selected models. An integrated model was developed based on the individual models' prediction ability for low, average and peak flow. An initial assessment of the results showed that the ARIMA model was the least accurate and the NAR model was the most accurate. The RE results also prove that the SVM model's frequency of errors above 10% or below -10% was greater than the NAR model's. The influent was also forecasted up to 44 weeks ahead by both models. The graphical results indicate that the NAR model made better predictions than the SVM model. The final evaluation of NAR and SVM demonstrated that SVM made better predictions at peak flow and NAR fit well for low and average inflow ranges. The integrated model developed includes the NAR model for low and average influent and the SVM model for peak inflow.
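
    A sketch of the model-comparison logic: fit ARIMA and SVM regression to a simulated weekly inflow series, then compare holdout RMSE. Orders, hyperparameters, lag structure, and data are placeholders, not the study's.

```python
# ARIMA versus SVR on a simulated seasonal inflow series, judged by RMSE.
import numpy as np
from sklearn.svm import SVR
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(5)
t = np.arange(300)
flow = 100 + 10 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 3, 300)
train, test = flow[:256], flow[256:]

# ARIMA forecast of the holdout weeks.
arima_fc = ARIMA(train, order=(2, 0, 1)).fit().forecast(steps=len(test))

# SVR on lagged values; the holdout rows use the true lags for simplicity.
lags = 4
Xtr = np.column_stack([flow[i:256 - lags + i] for i in range(lags)])
ytr = train[lags:]
svr = SVR(C=10.0, epsilon=0.1).fit(Xtr, ytr)
Xte = np.column_stack([flow[256 - lags + i:300 - lags + i] for i in range(lags)])
svr_fc = svr.predict(Xte)

rmse = lambda yhat: float(np.sqrt(np.mean((test - yhat) ** 2)))
print("ARIMA RMSE:", rmse(arima_fc), "SVR RMSE:", rmse(svr_fc))
```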

  14. Accuracy of Bayes and Logistic Regression Subscale Probabilities for Educational and Certification Tests

    ERIC Educational Resources Information Center

    Rudner, Lawrence

    2016-01-01

    In the machine learning literature, it is commonly accepted as fact that as calibration sample sizes increase, Naïve Bayes classifiers initially outperform Logistic Regression classifiers in terms of classification accuracy. Applied to subtests from an on-line final examination and from a highly regarded certification examination, this study shows…
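
    A small sketch of the claim being tested: Naive Bayes versus logistic regression accuracy as the calibration sample grows. The data are entirely simulated, not exam subscores.

```python
# Classification accuracy of Naive Bayes vs logistic regression by sample size.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.5,
                                                  random_state=0)
for n in (50, 200, 1000, 2500):
    nb = GaussianNB().fit(X_pool[:n], y_pool[:n])
    lr = LogisticRegression(max_iter=1000).fit(X_pool[:n], y_pool[:n])
    print(n, round(nb.score(X_test, y_test), 3),
          round(lr.score(X_test, y_test), 3))
```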

  15. Multiple linear regression analysis

    NASA Technical Reports Server (NTRS)

    Edwards, T. R.

    1980-01-01

    Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
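
    A Python analogue of the stepwise idea: greedily add the predictor with the smallest p-value until none falls below the chosen confidence threshold. This is a generic sketch, not a port of the FORTRAN program.

```python
# Forward stepwise selection by p-value for multiple linear regression.
import numpy as np
import statsmodels.api as sm

def forward_stepwise(X, y, alpha=0.05):
    n, p = X.shape
    selected, remaining = [], list(range(p))
    while remaining:
        pvals = {}
        for j in remaining:
            cols = selected + [j]
            res = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
            pvals[j] = res.pvalues[-1]          # p-value of the candidate term
        best = min(pvals, key=pvals.get)
        if pvals[best] > alpha:                 # no significant term left
            break
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 8))
y = 2 * X[:, 1] - 1.5 * X[:, 4] + rng.normal(size=200)
print(forward_stepwise(X, y))   # typically recovers columns 1 and 4
```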

  16. Modeling absolute differences in life expectancy with a censored skew-normal regression approach

    PubMed Central

    Clough-Gorr, Kerri; Zwahlen, Marcel

    2015-01-01

    Parameter estimates from commonly used multivariable parametric survival regression models do not directly quantify differences in years of life expectancy. Gaussian linear regression models give results in terms of absolute mean differences, but are not appropriate in modeling life expectancy, because in many situations time to death has a negatively skewed distribution. A regression approach using a skew-normal distribution would be an alternative to parametric survival models in the modeling of life expectancy, because parameter estimates can be interpreted in terms of survival time differences while allowing for skewness of the distribution. In this paper we show how to use the skew-normal regression so that censored and left-truncated observations are accounted for. With this we model differences in life expectancy using data from the Swiss National Cohort Study and from official life expectancy estimates and compare the results with those derived from commonly used survival regression models. We conclude that a censored skew-normal survival regression approach for left-truncated observations can be used to model differences in life expectancy across covariates of interest. PMID:26339544
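
    A hedged sketch of the core likelihood: a skew-normal regression where right-censored observations contribute the survival function instead of the density. Left truncation and covariates beyond an intercept and one slope are omitted for brevity, and all data are simulated; this is one simple reading of the approach, not the paper's exact estimator.

```python
# Maximum likelihood for a censored skew-normal regression on simulated data.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(7)
n = 800
x = rng.integers(0, 2, n)                       # a binary covariate
t = stats.skewnorm.rvs(a=-4, loc=80 + 3 * x, scale=8, random_state=7)
c = rng.uniform(70, 95, n)                      # censoring times
obs = np.minimum(t, c)
event = (t <= c).astype(float)                  # 1 = death observed, 0 = censored

def negloglik(theta):
    b0, b1, log_scale, a = theta
    loc, scale = b0 + b1 * x, np.exp(log_scale)
    ll_event = stats.skewnorm.logpdf(obs, a, loc=loc, scale=scale)
    ll_cens = stats.skewnorm.logsf(obs, a, loc=loc, scale=scale)
    return -np.sum(event * ll_event + (1 - event) * ll_cens)

fit = optimize.minimize(negloglik, x0=[75, 0, np.log(10), -1],
                        method="Nelder-Mead")
print(fit.x[:2])   # b1 estimates the covariate's difference in survival time
```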

  17. Early Change in Stroke Size Performs Best in Predicting Response to Therapy.

    PubMed

    Simpkins, Alexis Nétis; Dias, Christian; Norato, Gina; Kim, Eunhee; Leigh, Richard

    2017-01-01

    Reliable imaging biomarkers of response to therapy in acute stroke are needed. The final infarct volume and percent of early reperfusion have been used for this purpose. Early fluctuation in stroke size is a recognized phenomenon, but its utility as a biomarker for response to therapy has not been established. This study examined the clinical relevance of early change in stroke volume and compared it with the final infarct volume and percent of early reperfusion in identifying early neurologic improvement (ENI). Acute stroke patients, enrolled between 2013 and 2014 with serial magnetic resonance imaging (MRI) scans (pretreatment baseline, 2 h post, and 24 h post), who received thrombolysis were included in the analysis. Early change in stroke volume, infarct volume at 24 h on diffusion, and percent of early reperfusion were calculated from the baseline and 2-h MRI scans and compared. ENI was defined as a ≥4-point decrease in the National Institutes of Health Stroke Scale within 24 h. Logistic regression models and receiver operating characteristic analysis were used to compare the efficacy of the 3 imaging biomarkers. Serial MRIs of 58 acute stroke patients were analyzed. Early change in stroke volume was significantly associated with ENI by logistic regression analysis (OR 0.93, p = 0.048) and remained significant after controlling for stroke size and severity (OR 0.90, p = 0.032). Thus, for every 1 mL increase in stroke volume, there was a 10% decrease in the odds of ENI, while for every 1 mL decrease in stroke volume, there was a 10% increase in the odds of ENI. Neither infarct volume at 24 h nor percent of early reperfusion was significantly associated with ENI by logistic regression. Receiver operating characteristic analysis identified early change in stroke volume as the only biomarker of the 3 that performed significantly differently from chance (p = 0.03). Early fluctuations in stroke size may represent a more reliable biomarker for response to therapy than the more traditional measures of final infarct volume and percent of early reperfusion. © 2017 S. Karger AG, Basel.
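
    A worked illustration of the odds-ratio arithmetic in the abstract: with an OR of 0.90 per 1-mL increase in early stroke growth, each extra mL multiplies the odds of ENI by 0.90 (a 10% decrease), and each 1-mL decrease multiplies them by 1/0.90 ≈ 1.11 (which the abstract rounds to a 10% increase).

```python
# Odds multipliers implied by an odds ratio of 0.90 per 1 mL of volume change.
or_per_ml = 0.90
for delta_ml in (+1, -1, +5):
    factor = or_per_ml ** delta_ml
    print(f"change of {delta_ml:+d} mL -> odds of ENI multiplied by {factor:.3f}")
```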

  18. Exploration, Sampling, And Reconstruction of Free Energy Surfaces with Gaussian Process Regression.

    PubMed

    Mones, Letif; Bernstein, Noam; Csányi, Gábor

    2016-10-11

    Practical free energy reconstruction algorithms involve three separate tasks: biasing, measuring some observable, and finally reconstructing the free energy surface from those measurements. In more than one dimension, adaptive schemes make it possible to explore only relatively low-lying regions of the landscape by progressively building up the bias toward the negative of the free energy surface so that free energy barriers are eliminated. Most schemes use the final bias as their best estimate of the free energy surface. We show that large gains in computational efficiency, as measured by the reduction of time to solution, can be obtained by separating the bias used for dynamics from the final free energy reconstruction itself. We find that biasing with metadynamics, measuring a free energy gradient estimator, and reconstructing using Gaussian process regression can give an order of magnitude reduction in computational cost.
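
    An illustrative sketch of the reconstruction step: Gaussian process regression on noisy free-energy samples along a 1-D collective variable. The kernel, the double-well surface, and the noise level are placeholders (the paper regresses on gradient estimators rather than values).

```python
# GPR reconstruction of a toy 1-D free energy surface from noisy samples.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(8)
s = rng.uniform(-2, 2, 40).reshape(-1, 1)          # sampled CV values
F = (s**2 - 1)**2                                   # a double-well "free energy"
y = F.ravel() + rng.normal(0, 0.05, 40)             # noisy measurements

gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(0.01),
                              normalize_y=True).fit(s, y)
grid = np.linspace(-2, 2, 200).reshape(-1, 1)
mean, std = gp.predict(grid, return_std=True)       # smooth FES + uncertainty
print(float(mean.min()), float(std.mean()))
```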

  19. Use of an Artificial Neural Network to Construct a Model of Predicting Deep Fungal Infection in Lung Cancer Patients.

    PubMed

    Chen, Jian; Chen, Jie; Ding, Hong-Yan; Pan, Qin-Shi; Hong, Wan-Dong; Xu, Gang; Yu, Fang-You; Wang, Yu-Min

    2015-01-01

    Several statistical methods have been used to analyze and predict risk factors for deep fungal infection in lung cancer patients, such as logistic regression analysis, meta-analysis, multivariate Cox proportional hazards model analysis and retrospective analysis, but the results are inconsistent. A total of 696 patients with lung cancer were enrolled. The factors were compared employing Student's t-test, the Mann-Whitney test or the Chi-square test, and variables that were significantly related to the presence of deep fungal infection were selected as candidates for input into the final artificial neural network (ANN) model. The receiver operating characteristic (ROC) curve and area under the curve (AUC) were used to evaluate the performance of the ANN model and the logistic regression (LR) model. The prevalence of deep fungal infection in this study population was 32.04% (223/696); deep fungal infections occurred in sputum specimens in 44.05% (200/454) of cases. Candida albicans accounted for 86.99% (194/223) of the total fungi. Older age (≥65 years), use of antibiotics, low serum albumin concentrations (≤37.18 g/L), radiotherapy, surgery, low hemoglobin (≤93.67 g/L) and long hospitalization (≥14 days) were associated with deep fungal infection, and the ANN model consisted of these seven factors. The AUC of the ANN model (0.829±0.019) was higher than that of the LR model (0.756±0.021). The artificial neural network model with variables consisting of age, use of antibiotics, serum albumin concentration, radiotherapy, surgery, hemoglobin and time of hospitalization should be useful for predicting deep fungal infection in lung cancer.
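
    A simulated sketch of the model comparison: a small neural network versus logistic regression, judged by ROC AUC. The architecture and features are placeholders, not the study's seven clinical factors.

```python
# ANN vs logistic regression on simulated data, compared by ROC AUC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=696, n_features=7, n_informative=5,
                           weights=[0.68, 0.32], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

ann = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                    random_state=0).fit(Xtr, ytr)
lr = LogisticRegression(max_iter=1000).fit(Xtr, ytr)

for name, model in [("ANN", ann), ("LR", lr)]:
    auc = roc_auc_score(yte, model.predict_proba(Xte)[:, 1])
    print(name, "AUC =", round(auc, 3))
```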

  20. A new surrogate modeling technique combining Kriging and polynomial chaos expansions – Application to uncertainty analysis in computational dosimetry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kersaudy, Pierric, E-mail: pierric.kersaudy@orange.com; Whist Lab, 38 avenue du Général Leclerc, 92130 Issy-les-Moulineaux; ESYCOM, Université Paris-Est Marne-la-Vallée, 5 boulevard Descartes, 77700 Marne-la-Vallée

    2015-04-01

    In numerical dosimetry, the recent advances in high performance computing led to a strong reduction of the required computational time to assess the specific absorption rate (SAR) characterizing the human exposure to electromagnetic waves. However, this procedure remains time-consuming and a single simulation can require several hours. As a consequence, the influence of uncertain input parameters on the SAR cannot be analyzed using crude Monte Carlo simulation. The solution presented here to perform such an analysis is surrogate modeling. This paper proposes a novel approach to build such a surrogate model from a design of experiments. Considering a sparse representation of the polynomial chaos expansions using least-angle regression as a selection algorithm to retain the most influential polynomials, this paper proposes to use the selected polynomials as regression functions for the universal Kriging model. The leave-one-out cross validation is used to select the optimal number of polynomials in the deterministic part of the Kriging model. The proposed approach, called LARS-Kriging-PC modeling, is applied to three benchmark examples and then to a full-scale metamodeling problem involving the exposure of a numerical fetus model to a femtocell device. The performances of the LARS-Kriging-PC are compared to an ordinary Kriging model and to a classical sparse polynomial chaos expansion. The LARS-Kriging-PC appears to have better performances than the two other approaches. A significant accuracy improvement is observed compared to the ordinary Kriging or to the sparse polynomial chaos depending on the studied case. This approach seems to be an optimal solution between the two other classical approaches. A global sensitivity analysis is finally performed on the LARS-Kriging-PC model of the fetus exposure problem.
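
    A loose sketch of the combination under stated simplifications: least-angle regression selects a sparse set of polynomial terms, which then serve as the trend for a Gaussian-process (Kriging) model. The actual method uses polynomial chaos bases and universal Kriging; here ordinary polynomial features and a GP fitted on the LARS residuals stand in.

```python
# Sparse polynomial trend (LARS) plus a GP on the residuals, as a surrogate.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.linear_model import LassoLars
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(9)
X = rng.uniform(-1, 1, (60, 3))                     # design of experiments
y = 1 + 2 * X[:, 0] - 3 * X[:, 1] ** 2 + 0.5 * np.sin(3 * X[:, 2])

poly = PolynomialFeatures(degree=3, include_bias=False)
P = poly.fit_transform(X)
lars = LassoLars(alpha=1e-3).fit(P, y)              # sparse term selection
trend = lars.predict(P)

gp = GaussianProcessRegressor(kernel=RBF(0.5)).fit(X, y - trend)  # residual model

X_new = rng.uniform(-1, 1, (5, 3))
y_hat = lars.predict(poly.transform(X_new)) + gp.predict(X_new)
print(y_hat)
```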

  1. Comparison of Predictive Modeling Methods of Aircraft Landing Speed

    NASA Technical Reports Server (NTRS)

    Diallo, Ousmane H.

    2012-01-01

    Expected increases in air traffic demand have stimulated the development of air traffic control tools intended to assist the air traffic controller in accurately and precisely spacing aircraft landing at congested airports. Such tools will require an accurate landing-speed prediction to increase throughput while decreasing necessary controller interventions for avoiding separation violations. There are many practical challenges to developing an accurate landing-speed model that has acceptable prediction errors. This paper discusses the development of a near-term implementation, using readily available information, to estimate/model final approach speed from the top of the descent phase of flight to the landing runway. As a first approach, all variables found to contribute directly to the landing-speed prediction model are used to build a multi-regression technique of the response surface equation (RSE). Data obtained from the operations of a major airline for a passenger transport aircraft type at the Dallas/Fort Worth International Airport are used to predict the landing speed. The approach was promising because it decreased the standard deviation of the landing-speed prediction error by at least 18% from the standard deviation of the baseline error, depending on the gust condition at the airport. However, when the number of variables is reduced to those most likely obtainable at other major airports, the RSE model shows little improvement over the existing methods. Consequently, a neural network that relies on a nonlinear regression technique is utilized as an alternative modeling approach. For the reduced-variable cases, the standard deviation of the neural network model's errors represents an over 5% reduction compared to the RSE model errors, and at least a 10% reduction over the baseline predicted landing-speed error standard deviation. Overall, the constructed models predict the landing speed more accurately and precisely than the current state-of-the-art.

  2. A conceptual disease model for adult Pompe disease.

    PubMed

    Kanters, Tim A; Redekop, W Ken; Rutten-Van Mölken, Maureen P M H; Kruijshaar, Michelle E; Güngör, Deniz; van der Ploeg, Ans T; Hakkaart, Leona

    2015-09-15

    Studies in orphan diseases are, by nature, confronted with small patient populations, meaning that randomized controlled trials will have limited statistical power. In order to estimate the effectiveness of treatments in orphan diseases and extrapolate effects into the future, alternative models might be needed. The purpose of this study is to develop a conceptual disease model for Pompe disease in adults (an orphan disease). This conceptual model describes the associations between the most important levels of health concepts for Pompe disease in adults, from biological parameters via physiological parameters, symptoms and functional indicators to health perceptions and final health outcomes as measured in terms of health-related quality of life. The structure of the Wilson-Cleary health outcomes model was used as a blueprint, and filled with clinically relevant aspects for Pompe disease based on literature and expert opinion. Multiple observations per patient from a Dutch cohort study in untreated patients were used to quantify the relationships between the different levels of health concepts in the model by means of regression analyses. Enzyme activity, muscle strength, respiratory function, fatigue, level of handicap, general health perceptions, mental and physical component scales and utility described the different levels of health concepts in the Wilson-Cleary model for Pompe disease. Regression analyses showed that functional status was affected by fatigue, muscle strength and respiratory function. Health perceptions were affected by handicap. In turn, self-reported quality of life was affected by health perceptions. We conceptualized a disease model that incorporated the mechanisms believed to be responsible for impaired quality of life in Pompe disease. The model provides a comprehensive overview of various aspects of Pompe disease in adults, which can be useful for both clinicians and policymakers to support their multi-faceted decision making.

  3. Spontaneous regression of curve in immature idiopathic scoliosis - does spinal column play a role to balance? An observation with literature review.

    PubMed

    Modi, Hitesh N; Suh, Seung-Woo; Yang, Jae-Hyuk; Hong, Jae-Young; Venkatesh, Kp; Muzaffar, Nasir

    2010-11-04

    A child with mild scoliosis is always a subject of interest for most orthopaedic surgeons regarding progression. The literature describes the Hueter-Volkmann theory regarding disc and vertebral wedging, and muscular imbalance, for the progression of adolescent idiopathic scoliosis. However, many authors have also reported spontaneous resolution of curves without any identified reason, and the rate of resolution reported is almost 25%. The purpose of this study was to question the role of a paraspinal muscle tuning/balancing mechanism, especially in patients with idiopathic scoliosis with an early mild curve, in spontaneous regression or progression as well as changing curve patterns. An observational study of serial radiograms in 169 idiopathic scoliosis children (with minimum follow-up of one year) was carried out. All children with Cobb angle < 25° who were diagnosed for the first time were selected. As a sign of immaturity at the time of diagnosis, all children had Risser sign 0. No treatment was given to the entire study group. Children were divided into three groups at final follow-up: Groups A, B and C, comprising children with regression, no change and progression of their curves, respectively. Additionally, changes in curve pattern were also noted. Average age was 9.2 years at first visit and 10.11 years at final follow-up, with an average follow-up of 21 months. 32.5% (55/169), 41.4% (70/169) and 26% (44/169) of children exhibited regression, no change and progression of their curves, respectively. 46.1% of children (78/169) showed a changing pattern of their curves during the follow-up visits before settling into the final curve. Comparing the final fate of the curve with the side of the curve and the number of curves showed no relationship (p > 0.05) in our study population. A possible reason for changing patterns could be a tuning/balancing mechanism of the spinal column that makes an effort to balance the spine, resulting in spontaneous regression or preventing further progression of the curve. If this mechanism, which we call the "tuning/balancing mechanism", fails, the curve will ultimately progress.

  4. Stock price forecasting based on time series analysis

    NASA Astrophysics Data System (ADS)

    Chi, Wan Le

    2018-05-01

    Using historical stock price data to set up a sequence model that explains the intrinsic relationships in the data, future stock prices can be forecasted. The models used are the autoregressive (AR) model, the moving-average (MA) model and the autoregressive moving-average (ARMA) model. A unit root test was applied to judge whether the original data sequence was stationary. A non-stationary original sequence required further processing by first-order differencing, after which the stationarity of the differenced sequence was re-inspected. If it was still non-stationary, second-order differencing of the sequence was carried out. Autocorrelation and partial autocorrelation diagrams were used to estimate the parameters of the identified ARMA model, including the model coefficients and model order. Finally, the model was used to fit and forecast the Shanghai Composite Index daily closing price. Results showed that the non-stationary original data series was stationary after second-order differencing. The forecast values of the Shanghai Composite Index daily closing price were close to the actual values, indicating that the ARMA model in the paper had a certain accuracy.
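
    A sketch of the workflow the abstract describes: unit-root test, difference until stationary, inspect ACF/PACF, fit the model, forecast. The index data are replaced by a simulated integrated series; orders and thresholds are illustrative.

```python
# ADF test, differencing, ACF/PACF inspection, and an ARIMA forecast.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import acf, adfuller, pacf

rng = np.random.default_rng(10)
price = np.cumsum(np.cumsum(rng.normal(0, 1, 400)))   # integrated of order 2

series, d = price, 0
while adfuller(series)[1] > 0.05:                     # p-value of the ADF test
    series, d = np.diff(series), d + 1
print("differencing order d =", d)

print("ACF :", np.round(acf(series, nlags=3), 2))     # guides the MA order
print("PACF:", np.round(pacf(series, nlags=3), 2))    # guides the AR order

fit = ARIMA(price, order=(1, d, 1)).fit()             # differencing done internally
print(fit.forecast(steps=5))
```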

  5. Error Covariance Penalized Regression: A novel multivariate model combining penalized regression with multivariate error structure.

    PubMed

    Allegrini, Franco; Braga, Jez W B; Moreira, Alessandro C O; Olivieri, Alejandro C

    2018-06-29

    A new multivariate regression model, named Error Covariance Penalized Regression (ECPR) is presented. Following a penalized regression strategy, the proposed model incorporates information about the measurement error structure of the system, using the error covariance matrix (ECM) as a penalization term. Results are reported from both simulations and experimental data based on replicate mid and near infrared (MIR and NIR) spectral measurements. The results for ECPR are better under non-iid conditions when compared with traditional first-order multivariate methods such as ridge regression (RR), principal component regression (PCR) and partial least-squares regression (PLS). Copyright © 2018 Elsevier B.V. All rights reserved.
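
    A minimal numpy sketch of the stated idea: ridge-style penalized least squares where the penalty matrix is the error covariance matrix (ECM) rather than the identity. The closed form below is one simple reading of "ECM as penalization term", offered as an assumption rather than the paper's exact estimator.

```python
# Ridge vs an ECM-penalized least-squares solve on simulated data.
import numpy as np

rng = np.random.default_rng(11)
n, p = 50, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p); beta_true[:3] = [1.0, -2.0, 0.5]
Sigma = np.diag(rng.uniform(0.5, 2.0, p))        # stand-in error covariance
y = X @ beta_true + rng.normal(0, 0.3, n)

lam = 1.0
# Ridge:        (X'X + lam*I)     beta = X'y
# ECM-penalty:  (X'X + lam*Sigma) beta = X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
beta_ecpr = np.linalg.solve(X.T @ X + lam * Sigma, X.T @ y)
print(np.round(beta_ridge[:3], 2), np.round(beta_ecpr[:3], 2))
```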

  6. Alterations of the tunica vasculosa lentis in the rat model of retinopathy of prematurity.

    PubMed

    Favazza, Tara L; Tanimoto, Naoyuki; Munro, Robert J; Beck, Susanne C; Garcia Garrido, Marina; Seide, Christina; Sothilingam, Vithiyanjali; Hansen, Ronald M; Fulton, Anne B; Seeliger, Mathias W; Akula, James D

    2013-08-01

    To study the relationship between retinal and tunica vasculosa lentis (TVL) disease in retinopathy of prematurity (ROP). Although the clinical hallmark of ROP is abnormal retinal blood vessels, the vessels of the anterior segment, including the TVL, are also altered. ROP was induced in Long-Evans pigmented and Sprague Dawley albino rats; room-air-reared (RAR) rats served as controls. Then, fluorescein angiographic images of the TVL and retinal vessels were serially obtained with a scanning laser ophthalmoscope near the height of retinal vascular disease, ~20 days of age, and again at 30 and 64 days of age. Additionally, electroretinograms (ERGs) were obtained prior to the first imaging session. The TVL images were analyzed for percent coverage of the posterior lens. The tortuosity of the retinal arterioles was determined using Retinal Image multiScale Analysis (Gelman et al. in Invest Ophthalmol Vis Sci 46:4734-4738, 2005). In the youngest ROP rats, the TVL was dense, while in RAR rats, it was relatively sparse. By 30 days, the TVL in RAR rats had almost fully regressed, while in ROP rats, it was still pronounced. By the final test age, the TVL had completely regressed in both ROP and RAR rats. In parallel, the tortuous retinal arterioles in ROP rats resolved with increasing age. ERG components indicating postreceptoral dysfunction, the b-wave, and oscillatory potentials were attenuated in ROP rats. These findings underscore the retinal vascular abnormalities and, for the first time, show abnormal anterior segment vasculature in the rat model of ROP. There is delayed regression of the TVL in the rat model of ROP. This demonstrates that ROP is a disease of the whole eye.

  7. Alterations of the Tunica Vasculosa Lentis in the Rat Model of Retinopathy of Prematurity

    PubMed Central

    Favazza, Tara L; Tanimoto, Naoyuki; Munro, Robert J.; Beck, Susanne C.; Garrido, Marina G.; Seide, Christina; Sothilingam, Vithiyanjali; Hansen, Ronald M.; Fulton, Anne B.; Seeliger, Mathias W.; Akula, James D

    2013-01-01

    Purpose To study the relation between retinal and tunica vasculosa lentis (TVL) disease in ROP. Although the clinical hallmark of retinopathy of prematurity (ROP) is abnormal retinal blood vessels, the vessels of the anterior segment, including the TVL, are also altered. Methods ROP was induced in Long Evans pigmented and Sprague-Dawley albino rats; room-air-reared (RAR) rats served as controls. Then, fluorescein angiographic images of the TVL and retinal vessels were serially obtained with a scanning laser ophthalmoscope (SLO) near the height of retinal vascular disease, ∼20 days-of-age, and again at 30 and 64 days-of-age. Additionally, electroretinograms (ERGs) were obtained prior to the first imaging session. The TVL images were analyzed for percent coverage of the posterior lens. The tortuosity of the retinal arterioles was determined using Retinal Image multiScale Analysis (RISA; Gelman et al., 2005). Results In the youngest ROP rats, the TVL was dense, while in RAR rats, it was relatively sparse. By 30 days, the TVL in RAR rats had almost fully regressed, while in ROP rats it was still pronounced. By the final test age, the TVL had completely regressed in both ROP and RAR rats. In parallel, the tortuous retinal arterioles in ROP rats resolved with increasing age. ERG components indicating postreceptoral dysfunction, the b-wave and oscillatory potentials (OPs), were attenuated in ROP rats. Conclusions These findings underscore the retinal vascular abnormalities and, for the first time, show abnormal anterior segment vasculature in the rat model of ROP. There is delayed regression of the TVL in the rat model of ROP. This demonstrates that ROP is a disease of the whole eye. PMID:23748796

  8. Prevalence and correlates of positive mental health in Chinese adolescents.

    PubMed

    Guo, Cheng; Tomson, Göran; Keller, Christina; Söderqvist, Fredrik

    2018-02-17

    Studies investigating the prevalence of positive mental health and its correlates are still scarce compared to studies on mental disorders, although there is growing interest in assessing positive mental health in adolescents. So far, no other study examining the prevalence and determinants of positive mental health in Chinese adolescents has been found. The purpose of this study was to assess the prevalence and correlates of positive mental health in Chinese adolescents. This cross-sectional study used a questionnaire including the Mental Health Continuum-Short Form (MHC-SF) and items regarding multiple aspects of adolescent life. The sample involved a total of 5399 students from grades 8 and 10 in Weifang, China. Multivariate logistic regression analyses were performed to evaluate the associations between potential indicators regarding socio-economic situation, lifestyle, social support and school life and positive mental health, and to calculate odds ratios and 95% confidence intervals. More than half (57.4%) of the participants were classified as flourishing. The correlated factors of positive mental health in the regression models included gender, perceived family economy, having sibling(s), satisfaction with self-appearance, physical activity, sleep quality, stress, social trust, desire to learn, support from teachers and parents, and being bullied at school (OR ranging from 1.23 to 2.75). The Hosmer-Lemeshow p-value for the final regression model (0.45) indicated adequate model fit. This study gives the first overview of the prevalence and correlates of positive mental health in Chinese adolescents. The prevalence of positive mental health in Chinese adolescents is higher than reported in most previous studies also using the MHC-SF. Our findings suggest that adolescents with advantageous socio-economic situations, lifestyles, social support and school life experience better positive mental health than others.
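
    A sketch of the Hosmer-Lemeshow goodness-of-fit check mentioned above: predicted probabilities are binned into deciles and observed versus expected event counts are compared with a chi-square statistic. Data are simulated; a large p-value indicates adequate fit.

```python
# Hosmer-Lemeshow test for a logistic regression on simulated data.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(12)
n = 2000
x = rng.normal(size=(n, 3))
p_true = 1 / (1 + np.exp(-(0.3 + x @ np.array([0.8, -0.5, 0.2]))))
y = rng.binomial(1, p_true)

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
p_hat = fit.predict(sm.add_constant(x))

deciles = np.quantile(p_hat, np.linspace(0, 1, 11))
groups = np.clip(np.digitize(p_hat, deciles[1:-1]), 0, 9)
hl = 0.0
for g in range(10):
    m = groups == g
    obs, exp = y[m].sum(), p_hat[m].sum()
    hl += (obs - exp) ** 2 / (exp * (1 - exp / m.sum()))  # (O-E)^2 / n*p*(1-p)
pval = chi2.sf(hl, df=8)                                  # 10 groups - 2
print(round(hl, 2), round(pval, 2))
```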

  9. Patient Stratification Using Electronic Health Records from a Chronic Disease Management Program.

    PubMed

    Chen, Robert; Sun, Jimeng; Dittus, Robert S; Fabbri, Daniel; Kirby, Jacqueline; Laffer, Cheryl L; McNaughton, Candace D; Malin, Bradley

    2016-01-04

    The goal of this study is to devise a machine learning framework to assist care coordination programs in prognostic stratification, in order to design and deliver personalized care plans and to allocate financial and medical resources effectively. This study is based on a de-identified cohort of 2,521 hypertension patients from a chronic care coordination program at the Vanderbilt University Medical Center. Patients were modeled as vectors of features derived from electronic health records (EHRs) over a six-year period. We applied a stepwise regression to identify risk factors associated with a decrease in mean arterial pressure of at least 2 mmHg after program enrollment. The resulting features were subsequently validated via a logistic regression classifier. Finally, risk factors were applied to group the patients through model-based clustering. We identified a set of predictive features that consisted of a mix of demographic, medication, and diagnostic concepts. Logistic regression over these features yielded an area under the ROC curve (AUC) of 0.71 (95% CI: [0.67, 0.76]). Based on these features, four clinically meaningful groups were identified through clustering: two represented patients with more severe disease profiles, while the remaining two represented patients with mild disease profiles. Patients with hypertension can exhibit significant variation in their blood pressure control status and responsiveness to therapy. Yet this work shows that a clustering analysis can generate more homogeneous patient groups, which may aid clinicians in designing and implementing customized care programs. The study shows that predictive modeling and clustering using EHR data can provide a systematic, generalized approach for care providers to tailor their management approach based upon patient-level factors.
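
    A sketch of the model-based clustering step: a Gaussian mixture groups patients on their risk-factor features, with the number of components chosen by BIC. The features are simulated stand-ins for EHR-derived variables.

```python
# Model-based clustering of simulated patient feature vectors via BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(13)
severe = rng.normal([2, 2, 1], 0.5, size=(300, 3))   # a "severe" profile
mild = rng.normal([0, 0, 0], 0.5, size=(700, 3))     # a "mild" profile
X = np.vstack([severe, mild])

best = min((GaussianMixture(k, random_state=0).fit(X) for k in range(1, 7)),
           key=lambda m: m.bic(X))
print("components chosen by BIC:", best.n_components)
labels = best.predict(X)                              # patient group assignments
```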

  10. Analyzing Student Learning Outcomes: Usefulness of Logistic and Cox Regression Models. IR Applications, Volume 5

    ERIC Educational Resources Information Center

    Chen, Chau-Kuang

    2005-01-01

    Logistic and Cox regression methods are practical tools used to model the relationships between certain student learning outcomes and their relevant explanatory variables. The logistic regression model fits an S-shaped curve into a binary outcome with data points of zero and one. The Cox regression model allows investigators to study the duration…
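
    A minimal sketch of a Cox model for a duration-type student outcome, using the lifelines package (an assumption; any survival library would do). All columns and values are invented for illustration.

```python
# Cox proportional hazards fit on a toy time-to-graduation dataset.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "semesters": [8, 10, 6, 12, 9, 7, 11, 5],   # duration until event/censoring
    "graduated": [1, 0, 1, 1, 0, 1, 1, 1],      # 1 = event observed
    "gpa":       [3.2, 2.8, 2.1, 3.6, 2.5, 3.0, 3.4, 1.9],
})
cph = CoxPHFitter().fit(df, duration_col="semesters", event_col="graduated")
cph.print_summary()   # hazard ratios = exp(coef) for each covariate
```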

  11. Robust geographically weighted regression of modeling the Air Polluter Standard Index (APSI)

    NASA Astrophysics Data System (ADS)

    Warsito, Budi; Yasin, Hasbi; Ispriyanti, Dwi; Hoyyi, Abdul

    2018-05-01

    The Geographically Weighted Regression (GWR) model has been widely applied in many practical fields for exploring the spatial heterogeneity of a regression model. However, this method is inherently not robust to outliers. Outliers commonly exist in data sets and may lead to a distorted estimate of the underlying regression model. One solution for handling outliers in a regression model is to use robust models, giving what is called Robust Geographically Weighted Regression (RGWR). This research aims to aid the government in the policy-making process related to air pollution mitigation by developing a standard index model for air pollution (Air Polluter Standard Index - APSI) based on the RGWR approach. In this research, we also consider seven variables that are directly related to the air pollution level: traffic velocity, population density, the business center aspect, air humidity, wind velocity, air temperature, and the area of urban forest. The best model is determined by the smallest AIC value. There are significant differences between global regression and RGWR in this case, but basic GWR using the Gaussian kernel is the best model for modeling the APSI because it has the smallest AIC.
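
    A bare-bones sketch of the GWR estimator: at each location, a weighted least-squares fit with Gaussian kernel weights over the distances to all observations. Robustness (down-weighting outlying residuals) would be layered on top of this; coordinates, bandwidth, and variables are simulated.

```python
# Local weighted least squares with a Gaussian spatial kernel (basic GWR).
import numpy as np

rng = np.random.default_rng(14)
n = 120
coords = rng.uniform(0, 10, (n, 2))
x = rng.normal(size=n)
beta_local = 1 + 0.3 * coords[:, 0]                 # spatially varying slope
y = beta_local * x + rng.normal(0, 0.2, n)

X = np.column_stack([np.ones(n), x])
bandwidth = 2.0
betas = np.empty((n, 2))
for i in range(n):
    d = np.linalg.norm(coords - coords[i], axis=1)
    w = np.exp(-(d / bandwidth) ** 2)               # Gaussian kernel weights
    Xw = X * w[:, None]
    betas[i] = np.linalg.solve(Xw.T @ X, Xw.T @ y)  # local WLS solution

print(np.corrcoef(betas[:, 1], beta_local)[0, 1])   # local slope tracks the surface
```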

  12. Efficacy of DL-methionine hydroxy analogue-free acid in comparison to DL-methionine in growing male white Pekin ducks.

    PubMed

    Kluge, H; Gessner, D K; Herzog, E; Eder, K

    2016-03-01

    The present study was performed to assess the bioefficacy of DL-methionine hydroxy analogue-free acid (MHA) in comparison to DL-methionine (DLM) as sources of methionine for growing male white Pekin ducks in the first 3 wk of life. For this aim, 580 1-day-old male ducks were allocated to 12 treatment groups and received a basal diet that contained 0.29% methionine, 0.34% cysteine and 0.63% total sulphur-containing amino acids, or the same diet supplemented with either DLM or MHA in amounts supplying 0.05, 0.10, 0.15, 0.20, and 0.25% of methionine equivalents. Ducks fed the control diet without methionine supplementation had the lowest final body weights, daily body weight gains and feed intake among all groups. Supplementation of methionine improved final body weights and daily body weight gains in a dose-dependent manner. There was, however, no significant effect of the source of methionine on any of the performance responses. Evaluation of the daily body weight gain data with an exponential regression model revealed a nearly identical efficacy (slope of the curves) of both compounds for growth (DLM = 100%, MHA = 101%). According to the exponential regression model, 95% of the maximum daily body weight gain was reached at methionine supplementation levels of 0.080% and 0.079% for DLM and MHA, respectively. Overall, the present study indicates that MHA and DLM have a similar efficacy as sources of methionine for growing ducks. It is moreover shown that dietary methionine concentrations of 0.37% are required to reach 95% of the maximum daily body weight gain in ducks during the first 3 wk of life. © 2015 Poultry Science Association Inc.
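
    A sketch of the exponential regression used for bioefficacy: gain approaches an asymptote with supplemental methionine, and the supply giving 95% of the asymptotic response is solved from the fitted curve. The numbers below are made up for illustration.

```python
# Exponential dose-response fit and the dose reaching 95% of the response.
import numpy as np
from scipy.optimize import curve_fit

def expo(x, a, b, c):
    return a + b * (1 - np.exp(-c * x))    # a = basal gain, a + b = plateau

dose = np.array([0, 0.05, 0.10, 0.15, 0.20, 0.25])       # % added methionine
gain = np.array([38.0, 45.5, 50.0, 52.5, 53.8, 54.3])    # g/day (made up)

(a, b, c), _ = curve_fit(expo, dose, gain, p0=[38, 16, 15])
x95 = -np.log(1 - 0.95) / c        # dose where the gain term reaches 95% of b
print(round(a + 0.95 * b, 1), "g/day at", round(x95, 3), "% supplementation")
```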

  13. Groundwater potential mapping using C5.0, random forest, and multivariate adaptive regression spline models in GIS.

    PubMed

    Golkarian, Ali; Naghibi, Seyed Amir; Kalantar, Bahareh; Pradhan, Biswajeet

    2018-02-17

    The ever-increasing demand for water resources for different purposes makes it essential to have a better understanding and knowledge of water resources. Groundwater is one of the main water resources, especially in countries with an arid climate. Thus, this study seeks to provide groundwater potential maps (GPMs) employing new algorithms. Accordingly, this study aims to validate the performance of the C5.0, random forest (RF), and multivariate adaptive regression splines (MARS) algorithms for generating GPMs in the eastern part of the Mashhad Plain, Iran. For this purpose, a dataset was produced consisting of spring locations as the indicator and groundwater-conditioning factors (GCFs) as input. In this research, 13 GCFs were selected: altitude, slope aspect, slope angle, plan curvature, profile curvature, topographic wetness index (TWI), slope length, distance from rivers and faults, river and fault density, land use, and lithology. The dataset was divided into training and validation classes with 70% and 30% of the springs, respectively. Then, the C5.0, RF, and MARS algorithms were employed using the R statistical software, and the final values were transformed into GPMs. Finally, two evaluation criteria, Kappa and the area under the receiver operating characteristic curve (AUC-ROC), were calculated. According to the findings of this research, MARS had the best performance with an AUC-ROC of 84.2%, followed by the RF and C5.0 algorithms with AUC-ROC values of 79.7% and 77.3%, respectively. The AUC-ROC values for all employed models exceed 70%, which indicates acceptable performance. In conclusion, the methodology could be applied in other geographical areas. GPMs can be used by water resource managers and related organizations to accelerate and facilitate water resource exploitation.
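
    A minimal sketch of the validation workflow described above (70/30 split, AUC-ROC and Kappa), using scikit-learn's random forest on synthetic stand-in data; the study itself used R, and the predicted probabilities play the role of the continuous groundwater-potential values.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import cohen_kappa_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    # Stand-in data: rows = locations, columns = 13 conditioning factors
    X = rng.normal(size=(1000, 13))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)  # 1 = spring present

    # 70/30 split, mirroring the study's training/validation design
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

    rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
    p = rf.predict_proba(X_va)[:, 1]        # continuous potential values -> the GPM surface
    print("AUC-ROC:", roc_auc_score(y_va, p))
    print("Kappa:  ", cohen_kappa_score(y_va, (p >= 0.5).astype(int)))
    ```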

  14. Herd characteristics and cow-level factors associated with Prototheca mastitis on dairy farms in Ontario, Canada.

    PubMed

    Pieper, L; Godkin, A; Roesler, U; Polleichtner, A; Slavic, D; Leslie, K E; Kelton, D F

    2012-10-01

    Prototheca spp. are algae that cause incurable acute or chronic mastitis in dairy cows. The aim of this case-control study was the identification of cow- and herd-level risk factors for this unusual mastitis pathogen. Aseptically collected composite milk samples from 2,428 milking cows in 23 case and 23 control herds were collected between January and May 2011. A questionnaire was administered to the producers, and cow-level production and demographic data were gathered. In 58 of 64 isolates, Prototheca spp. and Prototheca zopfii genotypes were differentiated using PCR and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. All isolates were identified as Prototheca zopfii genotype 2. The mean within-herd prevalence of Prototheca spp. was 5.1% (range 0.0-12.5%). Case herds had a significantly lower herd-level prevalence of Staphylococcus aureus and a higher prevalence of yeasts than did control herds. The final logistic regression model for herd-level risk factors included the use of intramammary injections of a non-intramammary drug [odds ratio (OR) = 136.8], the number of different injectable antibiotic products being used (OR = 2.82), the use of any dry cow teat sealant (external OR = 80.0; internal OR = 34.2), and having treated 3 or more displaced abomasums in the last 12 mo (OR = 44.7). The final logistic regression model for cow-level risk factors included second or greater lactation (OR = 4.40) and the logarithm of the lactation-average somatic cell count (OR = 2.99). Unsanitary or repeated intramammary infusions, antibiotic treatment, and off-label use of injectable drugs in the udder might promote Prototheca udder infection. Copyright © 2012 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
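
    Odds ratios of this kind are the exponentiated coefficients of a logistic regression. A sketch with statsmodels on simulated data; the variable names are placeholders, not the study's actual factors.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 200
    # Hypothetical herd-level predictors standing in for the study's factors
    df = pd.DataFrame({
        "offlabel_intramammary_use": rng.integers(0, 2, n),
        "n_antibiotic_products": rng.poisson(3, n),
    })
    logit = 0.9 * df["offlabel_intramammary_use"] + 0.4 * df["n_antibiotic_products"] - 1.5
    df["case"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

    X = sm.add_constant(df[["offlabel_intramammary_use", "n_antibiotic_products"]])
    fit = sm.Logit(df["case"], X).fit(disp=0)

    # Odds ratios and their 95% CIs are the exponentiated coefficients
    print(np.exp(fit.params))
    print(np.exp(fit.conf_int()))
    ```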

  15. Body-self unity and self-esteem in patients with rheumatic diseases.

    PubMed

    Bode, Christina; van der Heij, Anouk; Taal, Erik; van de Laar, Mart A F J

    2010-12-01

    Perceptions and evaluations of one's own body are important sources of self-esteem. Having a rheumatic disease challenges the maintenance of positive self-esteem due to consequences of the disease such as unfavorable sensations like pain and limited (physical) functioning. We expected that a positive experience of one's own body in spite of a rheumatic disease (body-self harmony) would be associated with higher levels of self-esteem, and that experiencing the body as an unworthy part of one's own person or as a disabler of one's strivings (body-self alienation) would result in lower levels of self-esteem. For this explorative study, the body experience questionnaire (BEQ) measuring body-self unity was developed and piloted. One hundred sixty-eight patients visiting the outpatient rheumatology clinic of the Medisch Spectrum Twente, Enschede, The Netherlands, completed a questionnaire on touchscreen computers to measure body-self unity (BEQ), illness cognitions (illness cognition questionnaire), pain intensity, functional limitations (health assessment questionnaire disability index), self-esteem (Rosenberg Self-Esteem Scale) and demographics. To analyze predictors of self-esteem, hierarchical regression analyses were employed. The BEQ revealed a two-factor structure with good reliability (subscale harmony, four items, Cronbach's α = 0.76; subscale alienation, six items, Cronbach's α = 0.84). The final model of the hierarchical regression analyses showed that self-esteem can be predicted by the illness cognitions helplessness and acceptance, by harmony and, most strongly, by alienation from the body. The R² of the final model was 0.50. The relationship between functional limitations and self-esteem was fully mediated by the psychological constructs body-self unity and illness cognitions. This explorative study showed the importance of the unity of body and self for self-esteem in patients with a rheumatic disease.
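
    Hierarchical regression of this kind enters predictor blocks stepwise and tracks the increment in R² at each step. A compact illustration on invented data; the block composition below only loosely mirrors the study's variables.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 168
    # Stand-in blocks: 1) functional limitations, 2) illness cognitions,
    # 3) body-self harmony and alienation
    limitations = rng.normal(size=n)
    cognitions = rng.normal(size=(n, 2))
    body_self = rng.normal(size=(n, 2))
    self_esteem = (-0.3 * limitations + cognitions @ [0.3, -0.4]
                   + body_self @ [0.3, -0.6] + rng.normal(size=n))

    blocks = [limitations.reshape(-1, 1), cognitions, body_self]
    X, prev_r2 = None, 0.0
    for step, block in enumerate(blocks, start=1):
        X = block if X is None else np.column_stack([X, block])
        r2 = sm.OLS(self_esteem, sm.add_constant(X)).fit().rsquared
        print(f"step {step}: R2 = {r2:.3f}, delta R2 = {r2 - prev_r2:.3f}")
        prev_r2 = r2
    ```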

  16. Beneficial and limiting factors for return to work following anterior cruciate ligament reconstruction: a retrospective cohort study.

    PubMed

    Groot, Judith A M; Jonkers, Freerk J; Kievit, Arthur J; Kuijer, P Paul F M; Hoozemans, Marco J M

    2017-02-01

    Evidence-based advice for return to work (RTW) after anterior cruciate ligament (ACL) reconstruction is not available. Therefore, the objectives of this study were to determine when patients achieve full RTW, and to explore the beneficial and limiting factors for full RTW after ACL reconstruction. A retrospective cohort study was performed after ACL reconstruction among 185 patients in one hospital. Data from patient files and a questionnaire were used to explore whether patient-, injury-, surgery-, sports-, work- and rehabilitation-related factors are beneficial or limiting for full RTW after ACL reconstruction, using a backward stepwise logistic regression analysis. Of the 125 (68%) patients that returned the questionnaire, 36 were not part of the working population. Of the remaining 89 patients, 82 (92%) had returned fully to work at follow-up. The median time to full RTW was 78 days. In the final regression model, which explained 29% of the variance, a significant OR of 5.4 (90% CI 2.2-13.1) for RTW > 78 days was observed for patients performing heavy knee-demanding work compared to patients performing light knee-demanding work. In addition, a significant positive OR (1.6, 90% CI 1.2-1.9) of the number of weeks walking with the aid of crutches for RTW > 78 days was observed in the final model. After ACL reconstruction, 92% of the patients fully returned to work at a median of 78 days. The significant predictors of full RTW > 78 days were performing heavy knee-demanding work and a longer period of walking with crutches after ACL reconstruction.
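
    Backward stepwise logistic regression of the kind used here can be sketched as repeatedly dropping the least significant covariate. Data and names are hypothetical, and the 0.10 removal threshold merely echoes the study's use of 90% confidence intervals.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    def backward_stepwise_logit(df, outcome, candidates, p_out=0.10):
        """Drop the least significant covariate until all Wald p-values < p_out."""
        kept = list(candidates)
        while kept:
            fit = sm.Logit(df[outcome], sm.add_constant(df[kept])).fit(disp=0)
            pvals = fit.pvalues.drop("const")
            if pvals.max() < p_out:
                return fit
            kept.remove(pvals.idxmax())
        return None

    rng = np.random.default_rng(3)
    n = 89
    df = pd.DataFrame({
        "heavy_knee_work": rng.integers(0, 2, n),
        "weeks_on_crutches": rng.poisson(4, n),
        "age": rng.normal(35, 10, n),
    })
    logit = 1.5 * df["heavy_knee_work"] + 0.4 * df["weeks_on_crutches"] - 2.5
    df["rtw_gt_78d"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

    model = backward_stepwise_logit(df, "rtw_gt_78d",
                                    ["heavy_knee_work", "weeks_on_crutches", "age"])
    print(np.exp(model.params))     # odds ratios of the retained covariates
    ```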

  17. Bayesian Unimodal Density Regression for Causal Inference

    ERIC Educational Resources Information Center

    Karabatsos, George; Walker, Stephen G.

    2011-01-01

    Karabatsos and Walker (2011) introduced a new Bayesian nonparametric (BNP) regression model. Through analyses of real and simulated data, they showed that the BNP regression model outperforms other parametric and nonparametric regression models of common use, in terms of predictive accuracy of the outcome (dependent) variable. The other,…

  18. Bayesian Estimation of Multivariate Latent Regression Models: Gauss versus Laplace

    ERIC Educational Resources Information Center

    Culpepper, Steven Andrew; Park, Trevor

    2017-01-01

    A latent multivariate regression model is developed that employs a generalized asymmetric Laplace (GAL) prior distribution for regression coefficients. The model is designed for high-dimensional applications where an approximate sparsity condition is satisfied, such that many regression coefficients are near zero after accounting for all the model…

  19. A simple approach to power and sample size calculations in logistic regression and Cox regression models.

    PubMed

    Vaeth, Michael; Skovlund, Eva

    2004-06-15

    For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
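
    The equivalence translates directly into code: form two equal groups whose parameters differ by the slope times twice the SD of the covariate, then apply a standard two-proportion power formula. A sketch for the logistic case, under the simplifying assumption that centring the two groups on the logit scale approximately preserves the overall event probability:

    ```python
    import numpy as np
    from scipy.stats import norm

    def logistic_slope_power(beta, sd_x, p_bar, n_total, alpha=0.05):
        """Approximate power for testing slope `beta` in logistic regression
        via the equivalent two-sample problem described in the paper."""
        delta = beta * 2.0 * sd_x                       # log-odds difference
        base = np.log(p_bar / (1.0 - p_bar))            # centre at the overall risk
        p1 = 1.0 / (1.0 + np.exp(-(base - delta / 2)))
        p2 = 1.0 / (1.0 + np.exp(-(base + delta / 2)))
        n = n_total / 2.0                               # equal group sizes
        se = np.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
        z = abs(p2 - p1) / se
        return norm.cdf(z - norm.ppf(1 - alpha / 2))

    print(f"power ~ {logistic_slope_power(beta=0.5, sd_x=1.0, p_bar=0.3, n_total=200):.2f}")
    ```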

  20. Comparative evaluation of urban storm water quality models

    NASA Astrophysics Data System (ADS)

    Vaze, J.; Chiew, Francis H. S.

    2003-10-01

    The estimation of urban storm water pollutant loads is required for the development of mitigation and management strategies to minimize impacts to receiving environments. Event pollutant loads are typically estimated using either regression equations or "process-based" water quality models. The relative merit of using regression models compared to process-based models is not clear. A modeling study is carried out here to evaluate the comparative ability of the regression equations and process-based water quality models to estimate event diffuse pollutant loads from impervious surfaces. The results indicate that, once calibrated, both the regression equations and the process-based model can estimate event pollutant loads satisfactorily. In fact, the loads estimated using the regression equation as a function of rainfall intensity and runoff rate are better than the loads estimated using the process-based model. Therefore, if only estimates of event loads are required, regression models should be used because they are simpler and require less data compared to process-based models.

  1. Bayesian feature selection for high-dimensional linear regression via the Ising approximation with applications to genomics.

    PubMed

    Fisher, Charles K; Mehta, Pankaj

    2015-06-01

    Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. Here, we introduce a new approach, the Bayesian Ising Approximation (BIA), to rapidly calculate posterior probabilities for feature relevance in L2-penalized linear regression. In the regime where the regression problem is strongly regularized by the prior, we show that computing the marginal posterior probabilities for features is equivalent to computing the magnetizations of an Ising model with weak couplings. Using a mean field approximation, we show it is possible to rapidly compute the feature selection path described by the posterior probabilities as a function of the L2 penalty. We present simulations and analytical results illustrating the accuracy of the BIA on some simple regression problems. Finally, we demonstrate the applicability of the BIA to high-dimensional regression by analyzing a gene expression dataset with nearly 30,000 features. These results also highlight the impact of correlations between features on Bayesian feature selection. An implementation of the BIA in C++, along with data for reproducing our gene expression analyses, is freely available at http://physics.bu.edu/~pankajm/BIACode. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
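
    The mean-field step at the heart of the BIA can be sketched as a self-consistency iteration for the magnetizations. Here the fields and couplings are random stand-ins rather than the quantities the paper derives from the regression design matrix.

    ```python
    import numpy as np

    def mean_field_magnetizations(h, J, n_iter=200, damping=0.5):
        """Damped iteration of m_i = tanh(h_i + sum_j J_ij m_j),
        appropriate in the weak-coupling regime the BIA exploits."""
        m = np.tanh(h)                        # decoupled starting point
        for _ in range(n_iter):
            m = damping * m + (1 - damping) * np.tanh(h + J @ m)
        return m

    rng = np.random.default_rng(4)
    p = 50
    J = rng.normal(scale=0.01, size=(p, p))   # weak couplings
    J = (J + J.T) / 2
    np.fill_diagonal(J, 0.0)
    h = rng.normal(scale=0.5, size=p)         # local fields

    m = mean_field_magnetizations(h, J)
    # In the Ising representation, feature-inclusion probabilities map to (1 + m) / 2
    print((1 + m[:5]) / 2)
    ```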

  2. Lung nodule malignancy prediction using multi-task convolutional neural network

    NASA Astrophysics Data System (ADS)

    Li, Xiuli; Kao, Yueying; Shen, Wei; Li, Xiang; Xie, Guotong

    2017-03-01

    In this paper, we investigated the problem of diagnostic lung nodule malignancy prediction using thoracic Computed Tomography (CT) screening. Unlike most existing studies, which classify nodules into the two types benign and malignant, we interpreted nodule malignancy prediction as a regression problem that predicts a continuous malignancy level. We proposed a joint multi-task learning algorithm using a Convolutional Neural Network (CNN) to capture nodule heterogeneity by extracting discriminative features from alternatingly stacked layers. We trained a CNN regression model to predict nodule malignancy and designed a multi-task learning mechanism to simultaneously share knowledge among 9 different nodule characteristics (Subtlety, Calcification, Sphericity, Margin, Lobulation, Spiculation, Texture, Diameter and Malignancy), improving the final prediction. Each CNN generates characteristic-specific feature representations, and multi-task learning is then applied to these features to predict the corresponding likelihood for each characteristic. We evaluated the proposed method on 2,620 nodule CT scans from the LIDC-IDRI dataset with a 5-fold cross-validation strategy. The multi-task CNN regression achieved an RMSE of 0.830 and a mapped classification accuracy of 83.03%, versus an RMSE of 0.894 and a mapped classification accuracy of 74.9% for single-task regression. Experiments show that the proposed method can predict lung nodule malignancy likelihood effectively and outperforms state-of-the-art methods. The learning framework could easily be applied to other anomaly likelihood prediction problems, such as skin cancer and breast cancer, and demonstrates the potential of our method to assist radiologists in nodule staging assessment and individual therapeutic planning.
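
    A schematic of the shared-trunk, multi-head design the paper describes, written in PyTorch. Layer sizes, patch shapes and the loss weighting are illustrative assumptions, not the authors' architecture.

    ```python
    import torch
    import torch.nn as nn

    CHARACTERISTICS = ["subtlety", "calcification", "sphericity", "margin",
                       "lobulation", "spiculation", "texture", "diameter", "malignancy"]

    class MultiTaskNoduleNet(nn.Module):
        """Shared convolutional trunk with one regression head per characteristic."""
        def __init__(self):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.heads = nn.ModuleDict({c: nn.Linear(32, 1) for c in CHARACTERISTICS})

        def forward(self, x):
            z = self.trunk(x)
            return {c: head(z).squeeze(-1) for c, head in self.heads.items()}

    model = MultiTaskNoduleNet()
    x = torch.randn(8, 1, 64, 64)                       # a batch of nodule patches
    targets = {c: torch.rand(8) * 5 for c in CHARACTERISTICS}

    preds = model(x)
    # Multi-task objective: the sum of the per-characteristic regression losses
    loss = sum(nn.functional.mse_loss(preds[c], targets[c]) for c in CHARACTERISTICS)
    loss.backward()
    print(float(loss))
    ```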

  3. Total pubertal growth in patients with juvenile idiopathic arthritis treated with growth hormone: analysis of a single center.

    PubMed

    Bechtold, S; Beyerlein, A; Ripperger, P; Roeb, J; Dalla Pozza, R; Häfner, R; Haas, J P; Schmidt, H

    2012-10-01

    Growth failure is a permanent sequela of juvenile idiopathic arthritis (JIA). The aim of the study was to compare pubertal growth in control and growth hormone (GH) treated JIA subjects. 64 children with JIA at a mean age of 10.38 ± 2.80 years were enrolled and followed until final height (measured in standard deviation (SD) scores). 39 children (20 male) received GH therapy and 24 (9 male) served as controls. The GH dose was 0.33 mg/kg/week. Linear regression analysis was performed to identify factors influencing total pubertal growth. Mean total pubertal growth was 21.1 ± 1.3 cm (mean ± SD) in GH-treated JIA patients and 13.8 ± 1.5 cm in controls. Final height was significantly higher with GH treatment (-1.67 ± 1.20 SD) compared to controls (-3.20 ± 1.84 SD). The linear regression model identified age at onset of puberty (β = -4.2, CI -5.9 to -2.6 in controls; β = -2.3, CI -3.6 to -1.1 in GH-treated) as the main factor for total pubertal growth. Final height SDS was determined by the difference from target height at onset of puberty (β = -0.59, CI -0.80 to -0.37 in controls; β = -0.30, CI -0.52 to -0.08 in GH-treated), age at onset of puberty (β = 0.47, CI 0.02 to 0.93 in controls; β = 0.23, CI -0.00 to 0.46 in GH-treated) and height gain during puberty (β = 0.13, CI 0.05 to 0.21 in controls; β = 0.11, CI 0.07 to 0.16 in GH-treated). Total pubertal growth in JIA patients treated with GH was 1.5 times that of controls, leading to a significantly better final height. To maximize final height, GH treatment should be initiated early to reduce the height deficit at onset of puberty. Copyright © 2012 Elsevier Ltd. All rights reserved.

  4. Association of the pre-internship objective structured clinical examination in final year medical students with comprehensive written examinations.

    PubMed

    Eftekhar, Hasan; Labaf, Ali; Anvari, Pasha; Jamali, Arsia; Sheybaee-Moghaddam, Farshad

    2012-01-01

    The purpose of this study is to evaluate the association of the pre-internship Objective Structured Clinical Examination (OSCE) in final year medical students with comprehensive written examinations. SUBJECTS AND MATERIAL: All medical students of the October 2004 admission who took part in the October 2010 National Comprehensive Pre-internship Examination (NCPE) and the pre-internship OSCE were included in the study (n = 130). OSCE and NCPE scores and medical grade point average (GPA) were collected. GPA was highly correlated with NCPE (r = 0.76, P < 0.001) and moderately with OSCE (r = 0.68, P < 0.001). Similarly, a moderate correlation was observed between NCPE and OSCE scores (r = 0.6, P < 0.001). Linear stepwise regression shows that the r² of a model applying GPA as a predictor of OSCE score is 0.46 (β = 0.68, P < 0.001), while addition of gender to the model increases r² to 0.59 (β = 0.61 and 0.36 for GPA and male gender, respectively, P < 0.001). Forward logistic regression models show that male gender and GPA are the only independent predictors of a high score in the OSCE. The ORs of GPA and male gender for a high OSCE score are 4.89 (95% CI = 2.37-10.06) and 6.95 (95% CI = 2.00-24.21), respectively (P < 0.001). Our findings indicate that the OSCE and examinations which mainly evaluate knowledge, judged by GPA and NCPE, are moderately to highly correlated. Our results illustrate the interwoven nature of knowledge and clinical skills. In other words, a certain level of knowledge is crucial for appropriate clinical performance. Our findings suggest that neither the OSCE nor written forms of assessment can replace each other. They are complementary and should also be combined with other evaluations to cover all attributes of clinical competence efficiently.

  5. Correlational Analysis of neck/shoulder Pain and Low Back Pain with the Use of Digital Products, Physical Activity and Psychological Status among Adolescents in Shanghai

    PubMed Central

    Li, Jipeng; Li, Yangyang; Zhang, Yongxing; Zhao, Qinghua

    2013-01-01

    Purpose This study investigates neck/shoulder pain (NSP) and low back pain (LBP) among current high school students in Shanghai and explores the relationship between these pains and their possible influences, including digital products, physical activity, and psychological status. Methods An anonymous self-assessment was administered to 3,600 students across 30 high schools in Shanghai. This questionnaire examined the prevalence of NSP and LBP and the level of physical activity as well as the use of mobile phones, personal computers (PC) and tablet computers (Tablet). The CES-D (Center for Epidemiological Studies Depression) scale was also included in the survey. The survey data were analyzed using the chi-square test, univariate logistic analyses and a multivariate logistic regression model. Results Three thousand and sixteen valid questionnaires were received, including 1,460 (48.41%) from male respondents and 1,556 (51.59%) from female respondents. The high school students in this study showed NSP and LBP rates of 40.8% and 33.1%, respectively, and the prevalence of both was influenced by the student’s grade, use of digital products, and mental status; these factors affected the rates of NSP and LBP to varying degrees. The multivariate logistic regression analysis revealed that gender, grade, soreness after exercise, PC usage habits, tablet use, sitting time after school and academic stress entered the final model for NSP, while the final model for LBP consisted of gender, grade, soreness after exercise, PC usage habits, mobile phone use, sitting time after school, academic stress and CES-D score. Conclusions High school students in Shanghai showed a high prevalence of NSP and LBP that was closely related to multiple factors. Appropriate interventions should be implemented to reduce the occurrence of NSP and LBP. PMID:24147114

  6. Association of the pre-internship objective structured clinical examination in final year medical students with comprehensive written examinations

    PubMed Central

    Eftekhar, Hasan; Labaf, Ali; Anvari, Pasha; Jamali, Arsia; Sheybaee-Moghaddam, Farshad

    2012-01-01

    Aim The purpose of this study is to evaluate the association of the pre-internship Objective Structured Clinical Examination (OSCE) in final year medical students with comprehensive written examinations. Subjects and material All medical students of the October 2004 admission who took part in the October 2010 National Comprehensive Pre-internship Examination (NCPE) and the pre-internship OSCE were included in the study (n = 130). OSCE and NCPE scores and medical grade point average (GPA) were collected. Results GPA was highly correlated with NCPE (r = 0.76, P < 0.001) and moderately with OSCE (r = 0.68, P < 0.001). Similarly, a moderate correlation was observed between NCPE and OSCE scores (r = 0.6, P < 0.001). Linear stepwise regression shows that the r² of a model applying GPA as a predictor of OSCE score is 0.46 (β = 0.68, P < 0.001), while addition of gender to the model increases r² to 0.59 (β = 0.61 and 0.36 for GPA and male gender, respectively, P < 0.001). Forward logistic regression models show that male gender and GPA are the only independent predictors of a high score in the OSCE. The ORs of GPA and male gender for a high OSCE score are 4.89 (95% CI = 2.37–10.06) and 6.95 (95% CI = 2.00–24.21), respectively (P < 0.001). Discussion Our findings indicate that the OSCE and examinations which mainly evaluate knowledge, judged by GPA and NCPE, are moderately to highly correlated. Our results illustrate the interwoven nature of knowledge and clinical skills. In other words, a certain level of knowledge is crucial for appropriate clinical performance. Our findings suggest that neither the OSCE nor written forms of assessment can replace each other. They are complementary and should also be combined with other evaluations to cover all attributes of clinical competence efficiently. PMID:22547924

  7. Inferring general relations between network characteristics from specific network ensembles.

    PubMed

    Cardanobile, Stefano; Pernice, Volker; Deger, Moritz; Rotter, Stefan

    2012-01-01

    Different network models have been suggested for the topology underlying complex interactions in natural systems. These models are aimed at replicating specific statistical features encountered in real-world networks. However, it is rarely considered to which degree the results obtained for one particular network class can be extrapolated to real-world networks. We address this issue by comparing different classical and more recently developed network models with respect to their ability to generate networks with large structural variability. In particular, we consider the statistical constraints which the respective construction scheme imposes on the generated networks. After having identified the most variable networks, we address the issue of which constraints are common to all network classes and are thus suitable candidates for being generic statistical laws of complex networks. In fact, we find that generic, not model-related dependencies between different network characteristics do exist. This makes it possible to infer global features from local ones using regression models trained on networks with high generalization power. Our results confirm and extend previous findings regarding the synchronization properties of neural networks. Our method seems especially relevant for large networks, which are difficult to map completely, like the neural networks in the brain. The structure of such large networks cannot be fully sampled with the present technology. Our approach provides a method to estimate global properties of under-sampled networks in good approximation. Finally, we demonstrate on three different data sets (C. elegans neuronal network, R. prowazekii metabolic network, and a network of synonyms extracted from Roget's Thesaurus) that real-world networks have statistical relations compatible with those obtained using regression models.

  8. A predictive model for diagnosing bipolar disorder based on the clinical characteristics of major depressive episodes in Chinese population.

    PubMed

    Gan, Zhaoyu; Diao, Feici; Wei, Qinling; Wu, Xiaoli; Cheng, Minfeng; Guan, Nianhong; Zhang, Ming; Zhang, Jinbei

    2011-11-01

    A correct and timely diagnosis of bipolar depression remains a major challenge for clinicians. This study aimed to develop a clinical-characteristic-based model to predict the diagnosis of bipolar disorder among patients with current major depressive episodes. A prospective study was carried out on 344 patients with current major depressive episodes, 268 of whom completed 1-year follow-up. Data were collected through structured interviews. Univariate binary logistic regression was conducted to select potential predictive variables among 19 initial variables, and then multivariate binary logistic regression was performed to analyze the combination of risk factors and build a predictive model. A receiver operating characteristic (ROC) curve was plotted. Of the 19 initial variables, 13 were preliminarily selected, and a forward stepwise procedure then produced a final model consisting of 6 variables: age at first onset, maximum duration of depressive episodes, somatalgia, hypersomnia, diurnal variation of mood, and irritability. The correct prediction rate of this model was 78% (95% CI: 75%-86%) and the area under the ROC curve was 0.85 (95% CI: 0.80-0.90). The cut-off point for age at first onset was 28.5 years, while the cut-off point for maximum duration of depressive episode was 7.5 months. The limitations of this study include the small sample size, the relatively short follow-up period and the lack of treatment information. Our predictive model, based on six clinical characteristics of major depressive episodes, proved robust and can help differentiate bipolar depression from unipolar depression. Copyright © 2011 Elsevier B.V. All rights reserved.

  9. Estimation of daily protein intake based on spot urine urea nitrogen concentration in chronic kidney disease patients.

    PubMed

    Kanno, Hiroko; Kanda, Eiichiro; Sato, Asako; Sakamoto, Kaori; Kanno, Yoshihiko

    2016-04-01

    Determination of daily protein intake in the management of chronic kidney disease (CKD) requires precision. Inaccuracies in recording dietary intake occur, and estimation from total urea excretion presents hurdles owing to the difficulty of collecting whole urine for 24 h. Spot urine has been used for measuring daily sodium intake and urinary protein excretion. In this cross-sectional study, we investigated whether the urea nitrogen (UN) concentration in spot urine can be used to predict daily protein intake instead of 24-h urine collection in 193 Japanese CKD patients (Stages G1-G5). After patient randomization into 2 datasets for the development and validation of models, bootstrapping was used to develop protein intake estimation models. The candidate parameters for the multivariate regression models were male gender, age, body mass index (BMI), diabetes mellitus, dyslipidemia, proteinuria, estimated glomerular filtration rate, serum albumin level, spot urinary UN and creatinine levels, and the spot urinary UN/creatinine ratio. The final model contained BMI and spot urinary UN level. It was selected because of its higher correlation between predicted and measured protein intakes (r = 0.558; 95% confidence interval 0.400, 0.683) and the narrower distribution of the differences between measured and predicted protein intakes compared with the other models. The results suggest that the UN concentration in spot urine may be used to estimate daily protein intake and that such a prediction formula would be useful for nutritional control in CKD patients.

  10. A generalized right truncated bivariate Poisson regression model with applications to health data.

    PubMed

    Islam, M Ataharul; Chowdhury, Rafiqul I

    2017-01-01

    A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and for over- or underdispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using a marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on the number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute, and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model.

  11. A generalized right truncated bivariate Poisson regression model with applications to health data

    PubMed Central

    Islam, M. Ataharul; Chowdhury, Rafiqul I.

    2017-01-01

    A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and for over- or underdispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using a marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on the number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute, and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model. PMID:28586344
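
    The marginal-conditional construction can be sketched as a Poisson model for the first count and a Poisson model for the second count given the first, each right-truncated by renormalising the pmf over 0..t. The data generation below simply caps simulated counts at t, a crude stand-in for genuinely truncated sampling, and the linear predictors are invented.

    ```python
    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import poisson

    def trunc_pois_logpmf(y, lam, t):
        # Right-truncated Poisson: renormalise the pmf over 0..t
        return poisson.logpmf(y, lam) - np.log(poisson.cdf(t, lam))

    def negloglik(theta, x, y1, y2, t):
        a0, a1, b0, b1, b2 = theta
        lam1 = np.exp(a0 + a1 * x)                # marginal model for y1
        lam2 = np.exp(b0 + b1 * x + b2 * y1)      # conditional model for y2 | y1
        return -(trunc_pois_logpmf(y1, lam1, t).sum()
                 + trunc_pois_logpmf(y2, lam2, t).sum())

    rng = np.random.default_rng(5)
    n, t = 500, 8                                 # t = right-truncation point
    x = rng.normal(size=n)
    y1 = np.minimum(rng.poisson(np.exp(0.3 + 0.4 * x)), t)
    y2 = np.minimum(rng.poisson(np.exp(0.1 + 0.2 * x + 0.1 * y1)), t)

    res = minimize(negloglik, x0=np.zeros(5), args=(x, y1, y2, t), method="BFGS")
    print(res.x)    # estimates of (a0, a1, b0, b1, b2)
    ```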

  12. Self-consistent core-pedestal transport simulations with neural network accelerated models

    DOE PAGES

    Meneghini, Orso; Smith, Sterling P.; Snyder, Philip B.; ...

    2017-07-12

    Fusion whole device modeling simulations require comprehensive models that are simultaneously physically accurate, fast, robust, and predictive. In this paper we describe the development of two neural-network (NN) based models as a means to perform a non-linear multivariate regression of theory-based models for the core turbulent transport fluxes and the pedestal structure. Specifically, we find that a NN-based approach can be used to consistently reproduce the results of the TGLF and EPED1 theory-based models over a broad range of plasma regimes, and with a computational speedup of several orders of magnitude. These models are then integrated into a predictive workflow that allows prediction, with self-consistent core-pedestal coupling, of the kinetic profiles within the last closed flux surface of the plasma. Finally, the NN paradigm is capable of breaking the speed-accuracy trade-off that is expected of traditional numerical physics models, and can provide the missing link towards self-consistent coupled core-pedestal whole device modeling simulations that are physically accurate and yet take only seconds to run.

  13. Self-consistent core-pedestal transport simulations with neural network accelerated models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Meneghini, Orso; Smith, Sterling P.; Snyder, Philip B.

    Fusion whole device modeling simulations require comprehensive models that are simultaneously physically accurate, fast, robust, and predictive. In this paper we describe the development of two neural-network (NN) based models as a means to perform a non-linear multivariate regression of theory-based models for the core turbulent transport fluxes and the pedestal structure. Specifically, we find that a NN-based approach can be used to consistently reproduce the results of the TGLF and EPED1 theory-based models over a broad range of plasma regimes, and with a computational speedup of several orders of magnitude. These models are then integrated into a predictive workflow that allows prediction, with self-consistent core-pedestal coupling, of the kinetic profiles within the last closed flux surface of the plasma. Finally, the NN paradigm is capable of breaking the speed-accuracy trade-off that is expected of traditional numerical physics models, and can provide the missing link towards self-consistent coupled core-pedestal whole device modeling simulations that are physically accurate and yet take only seconds to run.

  14. Model parameter estimation approach based on incremental analysis for lithium-ion batteries without using open circuit voltage

    NASA Astrophysics Data System (ADS)

    Wu, Hongjie; Yuan, Shifei; Zhang, Xi; Yin, Chengliang; Ma, Xuerui

    2015-08-01

    To improve the suitability of lithium-ion battery models under varying scenarios, such as fluctuating temperature and SoC variation, dynamic models with parameters updated in real time should be developed. In this paper, an incremental analysis-based auto-regressive exogenous (I-ARX) modeling method is proposed to eliminate the modeling error caused by the OCV effect and improve the accuracy of parameter estimation. Its numerical stability, modeling error, and parametric sensitivity are then analyzed at different sampling rates (0.02, 0.1, 0.5 and 1 s). To identify the model parameters recursively, a bias-correction recursive least squares (CRLS) algorithm is applied. Finally, pseudo-random binary sequence (PRBS) and urban dynamic driving sequence (UDDS) profiles are used to verify the real-time performance and robustness of the newly proposed model and algorithm. Different sampling rates (1 Hz and 10 Hz) and multiple temperature points (5, 25, and 45 °C) are covered in our experiments. The experimental and simulation results indicate that the proposed I-ARX model offers high accuracy and suitability for parameter identification without using open circuit voltage.
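
    The recursive identification step can be illustrated with plain recursive least squares on an ARX structure (the paper's bias-correction and incremental-analysis refinements are omitted). The second-order test system below is invented purely as a sanity check.

    ```python
    import numpy as np

    def rls_arx(u, y, na=2, nb=2, lam=0.99):
        """RLS for y[k] = -a1*y[k-1] - ... - a_na*y[k-na] + b1*u[k-1] + ... + b_nb*u[k-nb].
        Returns theta = [a1..a_na, b1..b_nb]; lam is a forgetting factor."""
        theta = np.zeros(na + nb)
        P = np.eye(na + nb) * 1e3
        for k in range(max(na, nb), len(y)):
            phi = np.concatenate([-y[k - na:k][::-1], u[k - nb:k][::-1]])
            K = P @ phi / (lam + phi @ P @ phi)
            theta = theta + K * (y[k] - phi @ theta)
            P = (P - np.outer(K, phi @ P)) / lam
        return theta

    rng = np.random.default_rng(6)
    u = rng.normal(size=2000)                 # PRBS-like excitation stand-in
    y = np.zeros_like(u)
    for k in range(2, len(u)):
        y[k] = 1.5 * y[k-1] - 0.7 * y[k-2] + 0.5 * u[k-1] + 0.2 * u[k-2] + 0.01 * rng.normal()

    # With the sign convention above, the estimate approaches [-1.5, 0.7, 0.5, 0.2]
    print(rls_arx(u, y))
    ```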

  15. A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield

    NASA Astrophysics Data System (ADS)

    Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan

    2018-04-01

    In this paper, we propose a hybrid model that combines a multiple linear regression model with the fuzzy c-means method. This research examines the relationship between paddy yield and 20 topsoil variates analyzed prior to planting at standard fertilizer rates. The data used were from the multi-location rice trials carried out by MARDI at major paddy granaries in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using a multiple linear regression model and a combination of the multiple linear regression model and the fuzzy c-means method. Analyses of normality and multicollinearity indicate that the data are normally distributed without multicollinearity among the independent variables. Fuzzy c-means analysis clusters the paddy yield into two clusters before the multiple linear regression model is applied. The comparison between the two methods indicates that the hybrid of the multiple linear regression model and the fuzzy c-means method outperforms the multiple linear regression model alone, with a lower mean square error.
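
    The hybrid can be sketched as: run fuzzy c-means on the data, harden the memberships, then fit one linear regression per cluster and compare the error against a single global regression. The FCM implementation below is a minimal textbook version and the data are synthetic.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    def fuzzy_c_means(Z, c=2, m=2.0, n_iter=100, seed=0):
        """Minimal fuzzy c-means; returns the (n x c) membership matrix."""
        rng = np.random.default_rng(seed)
        U = rng.random((len(Z), c))
        U /= U.sum(axis=1, keepdims=True)
        for _ in range(n_iter):
            Um = U ** m
            centers = (Um.T @ Z) / Um.sum(axis=0)[:, None]
            d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2) + 1e-12
            inv = d ** (-2.0 / (m - 1.0))
            U = inv / inv.sum(axis=1, keepdims=True)
        return U

    rng = np.random.default_rng(7)
    X = rng.normal(size=(200, 3))                        # stand-in soil covariates
    y = np.where(X[:, 0] > 0, 2.0 + X @ [1.0, 0.5, 0.0],
                 -1.0 + X @ [0.2, -0.8, 0.4]) + 0.1 * rng.normal(size=200)

    labels = fuzzy_c_means(np.column_stack([X, y]), c=2).argmax(axis=1)
    sse = 0.0
    for k in (0, 1):
        idx = labels == k
        fit = LinearRegression().fit(X[idx], y[idx])     # one regression per cluster
        sse += ((fit.predict(X[idx]) - y[idx]) ** 2).sum()
    print("hybrid MSE:", sse / len(y))
    print("single-model MSE:",
          ((LinearRegression().fit(X, y).predict(X) - y) ** 2).mean())
    ```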

  16. Factors associated with reporting of abuse against children and adolescents by nurses within Primary Health Care

    PubMed Central

    Rolim, Ana Carine Arruda; Moreira, Gracyelle Alves Remigio; Gondim, Sarah Maria Mendes; Paz, Soraya da Silva; Vieira, Luiza Jane Eyre de Souza

    2014-01-01

    OBJECTIVE: to analyze the factors associated with the underreporting of abuse against children and adolescents by nurses within Primary Health Care. METHOD: cross-sectional study with 616 nurses. A questionnaire addressed socio-demographic data, profession, instrumentation and knowledge on the topic, and the identification and reporting of abuse cases. Bivariate and multivariate logistic regression were used. RESULTS: female nurses, aged between 21 and 32 years, not married, five or more years since graduation, with graduate studies, and working for five or more years in PHC predominated. The final regression model showed that factors such as working for five or more years, having a reporting form within the PHC unit, and believing that reporting within Primary Health Care is an advantage facilitate reporting. CONCLUSION: the study's results may, in addition to sensitizing nurses, support management professionals in establishing strategies intended to produce compliance with reporting as a legal device that ensures the rights of children and adolescents. PMID:25591102

  17. [Predictors of Resilience in Adolescents with Leukemia].

    PubMed

    Hong, Sung Sil; Park, Ho Ran

    2015-08-01

    The purpose of this study was to identify the factors relating to resilience in adolescents with leukemia and to examine the relationships between these factors. From June to September 2014, 199 adolescents aged 11 to 21 participated in the study as they visited the out-patient clinic at C university hospital for follow-up care. To identify the predictors and the effects of resilience, uncertainty, symptom distress, perceived social support, spiritual perspective, defensive coping, courageous coping, hope, and self-transcendence were measured. Collected data were analyzed using hierarchical regression analysis with the SAS statistics program. The final regression model showed that courageous coping, hope, and self-transcendence were significant predictors of resilience in adolescents with leukemia and explained 63% of the variance in resilience. The findings indicate that adolescent-oriented intervention programs enhancing courageous coping, hope, and self-transcendence should be provided for adolescents with leukemia so that they can overcome illness-related stress and achieve physical, psychological and social adjustment.

  18. A fast identification algorithm for Box-Cox transformation based radial basis function neural network.

    PubMed

    Hong, Xia

    2006-07-01

    In this letter, a Box-Cox transformation-based radial basis function (RBF) neural network is introduced, in which an RBF neural network represents the transformed system output. Initially, a fixed, moderately sized RBF model base is derived based on a rank-revealing orthogonal matrix triangularization (QR decomposition). Then a new fast identification algorithm is introduced that uses the Gauss-Newton algorithm to derive the required Box-Cox transformation based on a maximum likelihood estimator. The main contribution of this letter is to exploit the special structure of the proposed RBF neural network for computational efficiency by utilizing a matrix block decomposition inversion lemma. Finally, the Box-Cox transformation-based RBF neural network, with good generalization and sparsity, is identified based on the derived optimal Box-Cox transformation and a D-optimality-based orthogonal forward regression algorithm. The proposed algorithm and its efficacy are demonstrated with an illustrative example in comparison with support vector machine regression.
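
    The overall idea, transform the output with a Box-Cox transformation and fit an RBF model to the transformed target, can be sketched with scipy and scikit-learn. Kernel ridge regression here merely stands in for the letter's D-optimality-based orthogonal forward regression, and the data are synthetic.

    ```python
    import numpy as np
    from scipy.stats import boxcox
    from scipy.special import inv_boxcox
    from sklearn.kernel_ridge import KernelRidge

    rng = np.random.default_rng(8)
    x = rng.uniform(0, 3, size=(300, 1))
    # Positive, skewed output, as Box-Cox requires
    y = np.exp(1.0 + 0.8 * np.sin(2 * x[:, 0]) + 0.1 * rng.normal(size=300))

    # Box-Cox transform of the output, with lambda chosen by maximum likelihood
    y_t, lam = boxcox(y)

    # RBF model of the *transformed* output
    model = KernelRidge(kernel="rbf", alpha=1e-2, gamma=1.0).fit(x, y_t)

    # Predict, then invert the Box-Cox transform back to the original scale
    y_hat = inv_boxcox(model.predict(x), lam)
    print("RMSE on original scale:", np.sqrt(np.mean((y_hat - y) ** 2)))
    ```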

  19. Gender Role Conflict, Interest in Casual Sex, and Relationship Satisfaction Among Gay Men

    PubMed Central

    Sanchez, Fráncisco J.; Bocklandt, Sven; Vilain, Eric

    2010-01-01

    This study compared single (n = 129) and partnered gay men (n = 114) to determine if they differed in their concerns over traditional masculine roles and interest in casual sex, and to measure the relationship between concerns over masculine roles and interest in casual sex. Additionally, a regression model to predict relationship satisfaction was tested. Participants were recruited at two Southern California Gay Pride festivals. Group comparisons showed single men were more restrictive in their affectionate behavior with other men (effect-size r = .14) and were more interested in casual sex than partnered men (effect-size r = .13); and partnered men were more concerned with being successful, powerful, and competitive than single men (effect-size r = .20). Different masculine roles were predictive of interest in casual sex among the two groups of men. Finally, a hierarchical regression analysis found that interest in casual sex and the length of one’s current relationship served as unique predictors of relationship satisfaction among the partnered gay men (Cohen’s f2 = .52). PMID:20721305

  20. Spatial Assessment of Model Errors from Four Regression Techniques

    Treesearch

    Lianjun Zhang; Jeffrey H. Gove

    2005-01-01

    Forest modelers have attempted to account for the spatial autocorrelation among trees in growth and yield models by applying alternative regression techniques such as linear mixed models (LMM), generalized additive models (GAM), and geographically weighted regression (GWR). However, the model errors are commonly assessed using average errors across the entire study...
