Robust mislabel logistic regression without modeling mislabel probabilities.
Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun
2018-03-01
Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes into consideration of mislabeled responses. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.
Detecting nonsense for Chinese comments based on logistic regression
NASA Astrophysics Data System (ADS)
Zhuolin, Ren; Guang, Chen; Shu, Chen
2016-07-01
To understand cyber citizens' opinion accurately from Chinese news comments, the clear definition on nonsense is present, and a detection model based on logistic regression (LR) is proposed. The detection of nonsense can be treated as a binary-classification problem. Besides of traditional lexical features, we propose three kinds of features in terms of emotion, structure and relevance. By these features, we train an LR model and demonstrate its effect in understanding Chinese news comments. We find that each of proposed features can significantly promote the result. In our experiments, we achieve a prediction accuracy of 84.3% which improves the baseline 77.3% by 7%.
NASA Astrophysics Data System (ADS)
Cary, Theodore W.; Cwanger, Alyssa; Venkatesh, Santosh S.; Conant, Emily F.; Sehgal, Chandra M.
2012-03-01
This study compares the performance of two proven but very different machine learners, Naïve Bayes and logistic regression, for differentiating malignant and benign breast masses using ultrasound imaging. Ultrasound images of 266 masses were analyzed quantitatively for shape, echogenicity, margin characteristics, and texture features. These features along with patient age, race, and mammographic BI-RADS category were used to train Naïve Bayes and logistic regression classifiers to diagnose lesions as malignant or benign. ROC analysis was performed using all of the features and using only a subset that maximized information gain. Performance was determined by the area under the ROC curve, Az, obtained from leave-one-out cross validation. Naïve Bayes showed significant variation (Az 0.733 +/- 0.035 to 0.840 +/- 0.029, P < 0.002) with the choice of features, but the performance of logistic regression was relatively unchanged under feature selection (Az 0.839 +/- 0.029 to 0.859 +/- 0.028, P = 0.605). Out of 34 features, a subset of 6 gave the highest information gain: brightness difference, margin sharpness, depth-to-width, mammographic BI-RADs, age, and race. The probabilities of malignancy determined by Naïve Bayes and logistic regression after feature selection showed significant correlation (R2= 0.87, P < 0.0001). The diagnostic performance of Naïve Bayes and logistic regression can be comparable, but logistic regression is more robust. Since probability of malignancy cannot be measured directly, high correlation between the probabilities derived from two basic but dissimilar models increases confidence in the predictive power of machine learning models for characterizing solid breast masses on ultrasound.
Deng, Yingyuan; Wang, Tianfu; Chen, Siping; Liu, Weixiang
2017-01-01
The aim of the study is to screen the significant sonographic features by logistic regression analysis and fit a model to diagnose thyroid nodules. A total of 525 pathological thyroid nodules were retrospectively analyzed. All the nodules underwent conventional ultrasonography (US), strain elastosonography (SE), and contrast -enhanced ultrasound (CEUS). Those nodules’ 12 suspicious sonographic features were used to assess thyroid nodules. The significant features of diagnosing thyroid nodules were picked out by logistic regression analysis. All variables that were statistically related to diagnosis of thyroid nodules, at a level of p < 0.05 were embodied in a logistic regression analysis model. The significant features in the logistic regression model of diagnosing thyroid nodules were calcification, suspected cervical lymph node metastasis, hypoenhancement pattern, margin, shape, vascularity, posterior acoustic, echogenicity, and elastography score. According to the results of logistic regression analysis, the formula that could predict whether or not thyroid nodules are malignant was established. The area under the receiver operating curve (ROC) was 0.930 and the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 83.77%, 89.56%, 87.05%, 86.04%, and 87.79% respectively. PMID:29228030
Pang, Tiantian; Huang, Leidan; Deng, Yingyuan; Wang, Tianfu; Chen, Siping; Gong, Xuehao; Liu, Weixiang
2017-01-01
The aim of the study is to screen the significant sonographic features by logistic regression analysis and fit a model to diagnose thyroid nodules. A total of 525 pathological thyroid nodules were retrospectively analyzed. All the nodules underwent conventional ultrasonography (US), strain elastosonography (SE), and contrast -enhanced ultrasound (CEUS). Those nodules' 12 suspicious sonographic features were used to assess thyroid nodules. The significant features of diagnosing thyroid nodules were picked out by logistic regression analysis. All variables that were statistically related to diagnosis of thyroid nodules, at a level of p < 0.05 were embodied in a logistic regression analysis model. The significant features in the logistic regression model of diagnosing thyroid nodules were calcification, suspected cervical lymph node metastasis, hypoenhancement pattern, margin, shape, vascularity, posterior acoustic, echogenicity, and elastography score. According to the results of logistic regression analysis, the formula that could predict whether or not thyroid nodules are malignant was established. The area under the receiver operating curve (ROC) was 0.930 and the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 83.77%, 89.56%, 87.05%, 86.04%, and 87.79% respectively.
Häberle, Lothar; Hack, Carolin C; Heusinger, Katharina; Wagner, Florian; Jud, Sebastian M; Uder, Michael; Beckmann, Matthias W; Schulz-Wendtland, Rüdiger; Wittenberg, Thomas; Fasching, Peter A
2017-08-30
Tumors in radiologically dense breast were overlooked on mammograms more often than tumors in low-density breasts. A fast reproducible and automated method of assessing percentage mammographic density (PMD) would be desirable to support decisions whether ultrasonography should be provided for women in addition to mammography in diagnostic mammography units. PMD assessment has still not been included in clinical routine work, as there are issues of interobserver variability and the procedure is quite time consuming. This study investigated whether fully automatically generated texture features of mammograms can replace time-consuming semi-automatic PMD assessment to predict a patient's risk of having an invasive breast tumor that is visible on ultrasound but masked on mammography (mammography failure). This observational study included 1334 women with invasive breast cancer treated at a hospital-based diagnostic mammography unit. Ultrasound was available for the entire cohort as part of routine diagnosis. Computer-based threshold PMD assessments ("observed PMD") were carried out and 363 texture features were obtained from each mammogram. Several variable selection and regression techniques (univariate selection, lasso, boosting, random forest) were applied to predict PMD from the texture features. The predicted PMD values were each used as new predictor for masking in logistic regression models together with clinical predictors. These four logistic regression models with predicted PMD were compared among themselves and with a logistic regression model with observed PMD. The most accurate masking prediction was determined by cross-validation. About 120 of the 363 texture features were selected for predicting PMD. Density predictions with boosting were the best substitute for observed PMD to predict masking. Overall, the corresponding logistic regression model performed better (cross-validated AUC, 0.747) than one without mammographic density (0.734), but less well than the one with the observed PMD (0.753). However, in patients with an assigned mammography failure risk >10%, covering about half of all masked tumors, the boosting-based model performed at least as accurately as the original PMD model. Automatically generated texture features can replace semi-automatically determined PMD in a prediction model for mammography failure, such that more than 50% of masked tumors could be discovered.
Habitat features and predictive habitat modeling for the Colorado chipmunk in southern New Mexico
Rivieccio, M.; Thompson, B.C.; Gould, W.R.; Boykin, K.G.
2003-01-01
Two subspecies of Colorado chipmunk (state threatened and federal species of concern) occur in southern New Mexico: Tamias quadrivittatus australis in the Organ Mountains and T. q. oscuraensis in the Oscura Mountains. We developed a GIS model of potentially suitable habitat based on vegetation and elevation features, evaluated site classifications of the GIS model, and determined vegetation and terrain features associated with chipmunk occurrence. We compared GIS model classifications with actual vegetation and elevation features measured at 37 sites. At 60 sites we measured 18 habitat variables regarding slope, aspect, tree species, shrub species, and ground cover. We used logistic regression to analyze habitat variables associated with chipmunk presence/absence. All (100%) 37 sample sites (28 predicted suitable, 9 predicted unsuitable) were classified correctly by the GIS model regarding elevation and vegetation. For 28 sites predicted suitable by the GIS model, 18 sites (64%) appeared visually suitable based on habitat variables selected from logistic regression analyses, of which 10 sites (36%) were specifically predicted as suitable habitat via logistic regression. We detected chipmunks at 70% of sites deemed suitable via the logistic regression models. Shrub cover, tree density, plant proximity, presence of logs, and presence of rock outcrop were retained in the logistic model for the Oscura Mountains; litter, shrub cover, and grass cover were retained in the logistic model for the Organ Mountains. Evaluation of predictive models illustrates the need for multi-stage analyses to best judge performance. Microhabitat analyses indicate prospective needs for different management strategies between the subspecies. Sensitivities of each population of the Colorado chipmunk to natural and prescribed fire suggest that partial burnings of areas inhabited by Colorado chipmunks in southern New Mexico may be beneficial. These partial burnings may later help avoid a fire that could substantially reduce habitat of chipmunks over a mountain range.
Computing group cardinality constraint solutions for logistic regression problems.
Zhang, Yong; Kwon, Dongjin; Pohl, Kilian M
2017-01-01
We derive an algorithm to directly solve logistic regression based on cardinality constraint, group sparsity and use it to classify intra-subject MRI sequences (e.g. cine MRIs) of healthy from diseased subjects. Group cardinality constraint models are often applied to medical images in order to avoid overfitting of the classifier to the training data. Solutions within these models are generally determined by relaxing the cardinality constraint to a weighted feature selection scheme. However, these solutions relate to the original sparse problem only under specific assumptions, which generally do not hold for medical image applications. In addition, inferring clinical meaning from features weighted by a classifier is an ongoing topic of discussion. Avoiding weighing features, we propose to directly solve the group cardinality constraint logistic regression problem by generalizing the Penalty Decomposition method. To do so, we assume that an intra-subject series of images represents repeated samples of the same disease patterns. We model this assumption by combining series of measurements created by a feature across time into a single group. Our algorithm then derives a solution within that model by decoupling the minimization of the logistic regression function from enforcing the group sparsity constraint. The minimum to the smooth and convex logistic regression problem is determined via gradient descent while we derive a closed form solution for finding a sparse approximation of that minimum. We apply our method to cine MRI of 38 healthy controls and 44 adult patients that received reconstructive surgery of Tetralogy of Fallot (TOF) during infancy. Our method correctly identifies regions impacted by TOF and generally obtains statistically significant higher classification accuracy than alternative solutions to this model, i.e., ones relaxing group cardinality constraints. Copyright © 2016 Elsevier B.V. All rights reserved.
Yang, Lixue; Chen, Kean
2015-11-01
To improve the design of underwater target recognition systems based on auditory perception, this study compared human listeners with automatic classifiers. Performances measures and strategies in three discrimination experiments, including discriminations between man-made and natural targets, between ships and submarines, and among three types of ships, were used. In the experiments, the subjects were asked to assign a score to each sound based on how confident they were about the category to which it belonged, and logistic regression, which represents linear discriminative models, also completed three similar tasks by utilizing many auditory features. The results indicated that the performances of logistic regression improved as the ratio between inter- and intra-class differences became larger, whereas the performances of the human subjects were limited by their unfamiliarity with the targets. Logistic regression performed better than the human subjects in all tasks but the discrimination between man-made and natural targets, and the strategies employed by excellent human subjects were similar to that of logistic regression. Logistic regression and several human subjects demonstrated similar performances when discriminating man-made and natural targets, but in this case, their strategies were not similar. An appropriate fusion of their strategies led to further improvement in recognition accuracy.
McLaren, Christine E.; Chen, Wen-Pin; Nie, Ke; Su, Min-Ying
2009-01-01
Rationale and Objectives Dynamic contrast enhanced MRI (DCE-MRI) is a clinical imaging modality for detection and diagnosis of breast lesions. Analytical methods were compared for diagnostic feature selection and performance of lesion classification to differentiate between malignant and benign lesions in patients. Materials and Methods The study included 43 malignant and 28 benign histologically-proven lesions. Eight morphological parameters, ten gray level co-occurrence matrices (GLCM) texture features, and fourteen Laws’ texture features were obtained using automated lesion segmentation and quantitative feature extraction. Artificial neural network (ANN) and logistic regression analysis were compared for selection of the best predictors of malignant lesions among the normalized features. Results Using ANN, the final four selected features were compactness, energy, homogeneity, and Law_LS, with area under the receiver operating characteristic curve (AUC) = 0.82, and accuracy = 0.76. The diagnostic performance of these 4-features computed on the basis of logistic regression yielded AUC = 0.80 (95% CI, 0.688 to 0.905), similar to that of ANN. The analysis also shows that the odds of a malignant lesion decreased by 48% (95% CI, 25% to 92%) for every increase of 1 SD in the Law_LS feature, adjusted for differences in compactness, energy, and homogeneity. Using logistic regression with z-score transformation, a model comprised of compactness, NRL entropy, and gray level sum average was selected, and it had the highest overall accuracy of 0.75 among all models, with AUC = 0.77 (95% CI, 0.660 to 0.880). When logistic modeling of transformations using the Box-Cox method was performed, the most parsimonious model with predictors, compactness and Law_LS, had an AUC of 0.79 (95% CI, 0.672 to 0.898). Conclusion The diagnostic performance of models selected by ANN and logistic regression was similar. The analytic methods were found to be roughly equivalent in terms of predictive ability when a small number of variables were chosen. The robust ANN methodology utilizes a sophisticated non-linear model, while logistic regression analysis provides insightful information to enhance interpretation of the model features. PMID:19409817
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhong, H; Wang, J; Shen, L
Purpose: The purpose of this study is to investigate the relationship between computed tomographic (CT) texture features of primary lesions and metastasis-free survival for rectal cancer patients; and to develop a datamining prediction model using texture features. Methods: A total of 220 rectal cancer patients treated with neoadjuvant chemo-radiotherapy (CRT) were enrolled in this study. All patients underwent CT scans before CRT. The primary lesions on the CT images were delineated by two experienced oncologists. The CT images were filtered by Laplacian of Gaussian (LoG) filters with different filter values (1.0–2.5: from fine to coarse). Both filtered and unfiltered imagesmore » were analyzed using Gray-level Co-occurrence Matrix (GLCM) texture analysis with different directions (transversal, sagittal, and coronal). Totally, 270 texture features with different species, directions and filter values were extracted. Texture features were examined with Student’s t-test for selecting predictive features. Principal Component Analysis (PCA) was performed upon the selected features to reduce the feature collinearity. Artificial neural network (ANN) and logistic regression were applied to establish metastasis prediction models. Results: Forty-six of 220 patients developed metastasis with a follow-up time of more than 2 years. Sixtyseven texture features were significantly different in t-test (p<0.05) between patients with and without metastasis, and 12 of them were extremely significant (p<0.001). The Area-under-the-curve (AUC) of ANN was 0.72, and the concordance index (CI) of logistic regression was 0.71. The predictability of ANN was slightly better than logistic regression. Conclusion: CT texture features of primary lesions are related to metastasisfree survival of rectal cancer patients. Both ANN and logistic regression based models can be developed for prediction.« less
A New Approach for Mobile Advertising Click-Through Rate Estimation Based on Deep Belief Nets.
Chen, Jie-Hao; Zhao, Zi-Qian; Shi, Ji-Yun; Zhao, Chong
2017-01-01
In recent years, with the rapid development of mobile Internet and its business applications, mobile advertising Click-Through Rate (CTR) estimation has become a hot research direction in the field of computational advertising, which is used to achieve accurate advertisement delivery for the best benefits in the three-side game between media, advertisers, and audiences. Current research on the estimation of CTR mainly uses the methods and models of machine learning, such as linear model or recommendation algorithms. However, most of these methods are insufficient to extract the data features and cannot reflect the nonlinear relationship between different features. In order to solve these problems, we propose a new model based on Deep Belief Nets to predict the CTR of mobile advertising, which combines together the powerful data representation and feature extraction capability of Deep Belief Nets, with the advantage of simplicity of traditional Logistic Regression models. Based on the training dataset with the information of over 40 million mobile advertisements during a period of 10 days, our experiments show that our new model has better estimation accuracy than the classic Logistic Regression (LR) model by 5.57% and Support Vector Regression (SVR) model by 5.80%.
A New Approach for Mobile Advertising Click-Through Rate Estimation Based on Deep Belief Nets
Zhao, Zi-Qian; Shi, Ji-Yun; Zhao, Chong
2017-01-01
In recent years, with the rapid development of mobile Internet and its business applications, mobile advertising Click-Through Rate (CTR) estimation has become a hot research direction in the field of computational advertising, which is used to achieve accurate advertisement delivery for the best benefits in the three-side game between media, advertisers, and audiences. Current research on the estimation of CTR mainly uses the methods and models of machine learning, such as linear model or recommendation algorithms. However, most of these methods are insufficient to extract the data features and cannot reflect the nonlinear relationship between different features. In order to solve these problems, we propose a new model based on Deep Belief Nets to predict the CTR of mobile advertising, which combines together the powerful data representation and feature extraction capability of Deep Belief Nets, with the advantage of simplicity of traditional Logistic Regression models. Based on the training dataset with the information of over 40 million mobile advertisements during a period of 10 days, our experiments show that our new model has better estimation accuracy than the classic Logistic Regression (LR) model by 5.57% and Support Vector Regression (SVR) model by 5.80%. PMID:29209363
Clustering performance comparison using K-means and expectation maximization algorithms.
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-11-14
Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K -means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K -means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
Prediction of siRNA potency using sparse logistic regression.
Hu, Wei; Hu, John
2014-06-01
RNA interference (RNAi) can modulate gene expression at post-transcriptional as well as transcriptional levels. Short interfering RNA (siRNA) serves as a trigger for the RNAi gene inhibition mechanism, and therefore is a crucial intermediate step in RNAi. There have been extensive studies to identify the sequence characteristics of potent siRNAs. One such study built a linear model using LASSO (Least Absolute Shrinkage and Selection Operator) to measure the contribution of each siRNA sequence feature. This model is simple and interpretable, but it requires a large number of nonzero weights. We have introduced a novel technique, sparse logistic regression, to build a linear model using single-position specific nucleotide compositions which has the same prediction accuracy of the linear model based on LASSO. The weights in our new model share the same general trend as those in the previous model, but have only 25 nonzero weights out of a total 84 weights, a 54% reduction compared to the previous model. Contrary to the linear model based on LASSO, our model suggests that only a few positions are influential on the efficacy of the siRNA, which are the 5' and 3' ends and the seed region of siRNA sequences. We also employed sparse logistic regression to build a linear model using dual-position specific nucleotide compositions, a task LASSO is not able to accomplish well due to its high dimensional nature. Our results demonstrate the superiority of sparse logistic regression as a technique for both feature selection and regression over LASSO in the context of siRNA design.
Characterizing mammographic images by using generic texture features
2012-01-01
Introduction Although mammographic density is an established risk factor for breast cancer, its use is limited in clinical practice because of a lack of automated and standardized measurement methods. The aims of this study were to evaluate a variety of automated texture features in mammograms as risk factors for breast cancer and to compare them with the percentage mammographic density (PMD) by using a case-control study design. Methods A case-control study including 864 cases and 418 controls was analyzed automatically. Four hundred seventy features were explored as possible risk factors for breast cancer. These included statistical features, moment-based features, spectral-energy features, and form-based features. An elaborate variable selection process using logistic regression analyses was performed to identify those features that were associated with case-control status. In addition, PMD was assessed and included in the regression model. Results Of the 470 image-analysis features explored, 46 remained in the final logistic regression model. An area under the curve of 0.79, with an odds ratio per standard deviation change of 2.88 (95% CI, 2.28 to 3.65), was obtained with validation data. Adding the PMD did not improve the final model. Conclusions Using texture features to predict the risk of breast cancer appears feasible. PMD did not show any additional value in this study. With regard to the features assessed, most of the analysis tools appeared to reflect mammographic density, although some features did not correlate with PMD. It remains to be investigated in larger case-control studies whether these features can contribute to increased prediction accuracy. PMID:22490545
Dynamic Dimensionality Selection for Bayesian Classifier Ensembles
2015-03-19
learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very cometitive to Logistic Regression but much more...classifier, Generative learning, Discriminative learning, Naïve Bayes, Feature selection, Logistic regression , higher order attribute independence 16...discriminative learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very cometitive to Logistic Regression but
Discriminating Induced-Microearthquakes Using New Seismic Features
NASA Astrophysics Data System (ADS)
Mousavi, S. M.; Horton, S.
2016-12-01
We studied characteristics of induced-microearthquakes on the basis of the waveforms recorded on a limited number of surface receivers using machine-learning techniques. Forty features in the time, frequency, and time-frequency domains were measured on each waveform, and several techniques such as correlation-based feature selection, Artificial Neural Networks (ANNs), Logistic Regression (LR) and X-mean were used as research tools to explore the relationship between these seismic features and source parameters. The results show that spectral features have the highest correlation to source depth. Two new measurements developed as seismic features for this study, spectral centroids and 2D cross-correlations in the time-frequency domain, performed better than the common seismic measurements. These features can be used by machine learning techniques for efficient automatic classification of low energy signals recorded at one or more seismic stations. We applied the technique to 440 microearthquakes-1.7Reference: Mousavi, S.M., S.P. Horton, C. A. Langston, B. Samei, (2016) Seismic features and automatic discrimination of deep and shallow induced-microearthquakes using neural network and logistic regression, Geophys. J. Int. doi: 10.1093/gji/ggw258.
NASA Astrophysics Data System (ADS)
Cao, Faxian; Yang, Zhijing; Ren, Jinchang; Ling, Wing-Kuen; Zhao, Huimin; Marshall, Stephen
2017-12-01
Although the sparse multinomial logistic regression (SMLR) has provided a useful tool for sparse classification, it suffers from inefficacy in dealing with high dimensional features and manually set initial regressor values. This has significantly constrained its applications for hyperspectral image (HSI) classification. In order to tackle these two drawbacks, an extreme sparse multinomial logistic regression (ESMLR) is proposed for effective classification of HSI. First, the HSI dataset is projected to a new feature space with randomly generated weight and bias. Second, an optimization model is established by the Lagrange multiplier method and the dual principle to automatically determine a good initial regressor for SMLR via minimizing the training error and the regressor value. Furthermore, the extended multi-attribute profiles (EMAPs) are utilized for extracting both the spectral and spatial features. A combinational linear multiple features learning (MFL) method is proposed to further enhance the features extracted by ESMLR and EMAPs. Finally, the logistic regression via the variable splitting and the augmented Lagrangian (LORSAL) is adopted in the proposed framework for reducing the computational time. Experiments are conducted on two well-known HSI datasets, namely the Indian Pines dataset and the Pavia University dataset, which have shown the fast and robust performance of the proposed ESMLR framework.
Lee, Bum Ju; Kim, Keun Ho; Ku, Boncho; Jang, Jun-Su; Kim, Jong Yeol
2013-05-01
The body mass index (BMI) provides essential medical information related to body weight for the treatment and prognosis prediction of diseases such as cardiovascular disease, diabetes, and stroke. We propose a method for the prediction of normal, overweight, and obese classes based only on the combination of voice features that are associated with BMI status, independently of weight and height measurements. A total of 1568 subjects were divided into 4 groups according to age and gender differences. We performed statistical analyses by analysis of variance (ANOVA) and Scheffe test to find significant features in each group. We predicted BMI status (normal, overweight, and obese) by a logistic regression algorithm and two ensemble classification algorithms (bagging and random forests) based on statistically significant features. In the Female-2030 group (females aged 20-40 years), classification experiments using an imbalanced (original) data set gave area under the receiver operating characteristic curve (AUC) values of 0.569-0.731 by logistic regression, whereas experiments using a balanced data set gave AUC values of 0.893-0.994 by random forests. AUC values in Female-4050 (females aged 41-60 years), Male-2030 (males aged 20-40 years), and Male-4050 (males aged 41-60 years) groups by logistic regression in imbalanced data were 0.585-0.654, 0.581-0.614, and 0.557-0.653, respectively. AUC values in Female-4050, Male-2030, and Male-4050 groups in balanced data were 0.629-0.893 by bagging, 0.707-0.916 by random forests, and 0.695-0.854 by bagging, respectively. In each group, we found discriminatory features showing statistical differences among normal, overweight, and obese classes. The results showed that the classification models built by logistic regression in imbalanced data were better than those built by the other two algorithms, and significant features differed according to age and gender groups. Our results could support the development of BMI diagnosis tools for real-time monitoring; such tools are considered helpful in improving automated BMI status diagnosis in remote healthcare or telemedicine and are expected to have applications in forensic and medical science. Copyright © 2013 Elsevier B.V. All rights reserved.
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry
2018-01-01
The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on the live fish skin using image-based features. In total, one-hundred and sixty rainbow trout (Oncorhynchus mykiss) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets including Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k-Nearest neighbours (k-NN). The SVM with radial based kernel provided the best classifier with correct classification rate (CCR) of 82% and Kappa coefficient of 0.65. Although the both LR and RF methods were less accurate than SVM, they achieved good classification with CCR 75% and 70% respectively. The k-NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as the fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, these was a close association between image-based features and fish diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during the cultivation by evaluating diet’s effects on fish skin. PMID:29596375
Saberioon, Mohammadmehdi; Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry
2018-03-29
The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on the live fish skin using image-based features. In total, one-hundred and sixty rainbow trout ( Oncorhynchus mykiss ) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets including Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k -Nearest neighbours ( k -NN). The SVM with radial based kernel provided the best classifier with correct classification rate (CCR) of 82% and Kappa coefficient of 0.65. Although the both LR and RF methods were less accurate than SVM, they achieved good classification with CCR 75% and 70% respectively. The k -NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as the fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, these was a close association between image-based features and fish diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during the cultivation by evaluating diet's effects on fish skin.
A new computational strategy for predicting essential genes.
Cheng, Jian; Wu, Wenwu; Zhang, Yinwen; Li, Xiangchen; Jiang, Xiaoqian; Wei, Gehong; Tao, Shiheng
2013-12-21
Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models have recently been developed to integrate gene features and used as alternatives to transfer gene essentiality annotations between organisms. We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) the effect of multicollinearity among gene features and (ii) the diverse and even contrasting correlations between gene features and gene essentiality existing within and among different species. To address these issues, we developed a novel model called feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. The performance of FWM was compared with other popular models, such as support vector machine, Naïve Bayes model, and logistic regression model, by applying FWM to reciprocally predict essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction. FWM can remarkably improve the accuracy of essential gene prediction and may be used as an alternative method for other classification work. This method can contribute substantially to the knowledge of the minimum gene sets required for living organisms and the discovery of new drug targets.
An ultra low power feature extraction and classification system for wearable seizure detection.
Page, Adam; Pramod Tim Oates, Siddharth; Mohsenin, Tinoosh
2015-01-01
In this paper we explore the use of a variety of machine learning algorithms for designing a reliable and low-power, multi-channel EEG feature extractor and classifier for predicting seizures from electroencephalographic data (scalp EEG). Different machine learning classifiers including k-nearest neighbor, support vector machines, naïve Bayes, logistic regression, and neural networks are explored with the goal of maximizing detection accuracy while minimizing power, area, and latency. The input to each machine learning classifier is a 198 feature vector containing 9 features for each of the 22 EEG channels obtained over 1-second windows. All classifiers were able to obtain F1 scores over 80% and onset sensitivity of 100% when tested on 10 patients. Among five different classifiers that were explored, logistic regression (LR) proved to have minimum hardware complexity while providing average F-1 score of 91%. Both ASIC and FPGA implementations of logistic regression are presented and show the smallest area, power consumption, and the lowest latency when compared to the previous work.
Patient Stratification Using Electronic Health Records from a Chronic Disease Management Program.
Chen, Robert; Sun, Jimeng; Dittus, Robert S; Fabbri, Daniel; Kirby, Jacqueline; Laffer, Cheryl L; McNaughton, Candace D; Malin, Bradley
2016-01-04
The goal of this study is to devise a machine learning framework to assist care coordination programs in prognostic stratification to design and deliver personalized care plans and to allocate financial and medical resources effectively. This study is based on a de-identified cohort of 2,521 hypertension patients from a chronic care coordination program at the Vanderbilt University Medical Center. Patients were modeled as vectors of features derived from electronic health records (EHRs) over a six-year period. We applied a stepwise regression to identify risk factors associated with a decrease in mean arterial pressure of at least 2 mmHg after program enrollment. The resulting features were subsequently validated via a logistic regression classifier. Finally, risk factors were applied to group the patients through model-based clustering. We identified a set of predictive features that consisted of a mix of demographic, medication, and diagnostic concepts. Logistic regression over these features yielded an area under the ROC curve (AUC) of 0.71 (95% CI: [0.67, 0.76]). Based on these features, four clinically meaningful groups are identified through clustering - two of which represented patients with more severe disease profiles, while the remaining represented patients with mild disease profiles. Patients with hypertension can exhibit significant variation in their blood pressure control status and responsiveness to therapy. Yet this work shows that a clustering analysis can generate more homogeneous patient groups, which may aid clinicians in designing and implementing customized care programs. The study shows that predictive modeling and clustering using EHR data can be beneficial for providing a systematic, generalized approach for care providers to tailor their management approach based upon patient-level factors.
Cakir, Ebru; Kucuk, Ulku; Pala, Emel Ebru; Sezer, Ozlem; Ekin, Rahmi Gokhan; Cakmak, Ozgur
2017-05-01
Conventional cytomorphologic assessment is the first step to establish an accurate diagnosis in urinary cytology. In cytologic preparations, the separation of low-grade urothelial carcinoma (LGUC) from reactive urothelial proliferation (RUP) can be exceedingly difficult. The bladder washing cytologies of 32 LGUC and 29 RUP were reviewed. The cytologic slides were examined for the presence or absence of the 28 cytologic features. The cytologic criteria showing statistical significance in LGUC were increased numbers of monotonous single (non-umbrella) cells, three-dimensional cellular papillary clusters without fibrovascular cores, irregular bordered clusters, atypical single cells, irregular nuclear overlap, cytoplasmic homogeneity, increased N/C ratio, pleomorphism, nuclear border irregularity, nuclear eccentricity, elongated nuclei, and hyperchromasia (p ˂ 0.05), and the cytologic criteria showing statistical significance in RUP were inflammatory background, mixture of small and large urothelial cells, loose monolayer aggregates, and vacuolated cytoplasm (p ˂ 0.05). When these variables were subjected to a stepwise logistic regression analysis, four features were selected to distinguish LGUC from RUP: increased numbers of monotonous single (non-umbrella) cells, increased nuclear cytoplasmic ratio, hyperchromasia, and presence of small and large urothelial cells (p = 0.0001). By this logistic model of the 32 cases with proven LGUC, the stepwise logistic regression analysis correctly predicted 31 (96.9%) patients with this diagnosis, and of the 29 patients with RUP, the logistic model correctly predicted 26 (89.7%) patients as having this disease. There are several cytologic features to separate LGUC from RUP. Stepwise logistic regression analysis is a valuable tool for determining the most useful cytologic criteria to distinguish these entities. © 2017 APMIS. Published by John Wiley & Sons Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huang, W; Tu, S
Purpose: We conducted a retrospective study of Radiomics research for classifying malignancy of small pulmonary nodules. A machine learning algorithm of logistic regression and open research platform of Radiomics, IBEX (Imaging Biomarker Explorer), were used to evaluate the classification accuracy. Methods: The training set included 100 CT image series from cancer patients with small pulmonary nodules where the average diameter is 1.10 cm. These patients registered at Chang Gung Memorial Hospital and received a CT-guided operation of lung cancer lobectomy. The specimens were classified by experienced pathologists with a B (benign) or M (malignant). CT images with slice thickness ofmore » 0.625 mm were acquired from a GE BrightSpeed 16 scanner. The study was formally approved by our institutional internal review board. Nodules were delineated and 374 feature parameters were extracted from IBEX. We first used the t-test and p-value criteria to study which feature can differentiate between group B and M. Then we implemented a logistic regression algorithm to perform nodule malignancy classification. 10-fold cross-validation and the receiver operating characteristic curve (ROC) were used to evaluate the classification accuracy. Finally hierarchical clustering analysis, Spearman rank correlation coefficient, and clustering heat map were used to further study correlation characteristics among different features. Results: 238 features were found differentiable between group B and M based on whether their statistical p-values were less than 0.05. A forward search algorithm was used to select an optimal combination of features for the best classification and 9 features were identified. Our study found the best accuracy of classifying malignancy was 0.79±0.01 with the 10-fold cross-validation. The area under the ROC curve was 0.81±0.02. Conclusion: Benign nodules may be treated as a malignant tumor in low-dose CT and patients may undergo unnecessary surgeries or treatments. Our study may help radiologists to differentiate nodule malignancy for low-dose CT.« less
Lee, Christine K; Hofer, Ira; Gabel, Eilon; Baldi, Pierre; Cannesson, Maxime
2018-04-17
The authors tested the hypothesis that deep neural networks trained on intraoperative features can predict postoperative in-hospital mortality. The data used to train and validate the algorithm consists of 59,985 patients with 87 features extracted at the end of surgery. Feed-forward networks with a logistic output were trained using stochastic gradient descent with momentum. The deep neural networks were trained on 80% of the data, with 20% reserved for testing. The authors assessed improvement of the deep neural network by adding American Society of Anesthesiologists (ASA) Physical Status Classification and robustness of the deep neural network to a reduced feature set. The networks were then compared to ASA Physical Status, logistic regression, and other published clinical scores including the Surgical Apgar, Preoperative Score to Predict Postoperative Mortality, Risk Quantification Index, and the Risk Stratification Index. In-hospital mortality in the training and test sets were 0.81% and 0.73%. The deep neural network with a reduced feature set and ASA Physical Status classification had the highest area under the receiver operating characteristics curve, 0.91 (95% CI, 0.88 to 0.93). The highest logistic regression area under the curve was found with a reduced feature set and ASA Physical Status (0.90, 95% CI, 0.87 to 0.93). The Risk Stratification Index had the highest area under the receiver operating characteristics curve, at 0.97 (95% CI, 0.94 to 0.99). Deep neural networks can predict in-hospital mortality based on automatically extractable intraoperative data, but are not (yet) superior to existing methods.
TU-CD-BRB-01: Normal Lung CT Texture Features Improve Predictive Models for Radiation Pneumonitis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krafft, S; The University of Texas Graduate School of Biomedical Sciences, Houston, TX; Briere, T
2015-06-15
Purpose: Existing normal tissue complication probability (NTCP) models for radiation pneumonitis (RP) traditionally rely on dosimetric and clinical data but are limited in terms of performance and generalizability. Extraction of pre-treatment image features provides a potential new category of data that can improve NTCP models for RP. We consider quantitative measures of total lung CT intensity and texture in a framework for prediction of RP. Methods: Available clinical and dosimetric data was collected for 198 NSCLC patients treated with definitive radiotherapy. Intensity- and texture-based image features were extracted from the T50 phase of the 4D-CT acquired for treatment planning. Amore » total of 3888 features (15 clinical, 175 dosimetric, and 3698 image features) were gathered and considered candidate predictors for modeling of RP grade≥3. A baseline logistic regression model with mean lung dose (MLD) was first considered. Additionally, a least absolute shrinkage and selection operator (LASSO) logistic regression was applied to the set of clinical and dosimetric features, and subsequently to the full set of clinical, dosimetric, and image features. Model performance was assessed by comparing area under the curve (AUC). Results: A simple logistic fit of MLD was an inadequate model of the data (AUC∼0.5). Including clinical and dosimetric parameters within the framework of the LASSO resulted in improved performance (AUC=0.648). Analysis of the full cohort of clinical, dosimetric, and image features provided further and significant improvement in model performance (AUC=0.727). Conclusions: To achieve significant gains in predictive modeling of RP, new categories of data should be considered in addition to clinical and dosimetric features. We have successfully incorporated CT image features into a framework for modeling RP and have demonstrated improved predictive performance. Validation and further investigation of CT image features in the context of RP NTCP modeling is warranted. This work was supported by the Rosalie B. Hite Fellowship in Cancer research awarded to SPK.« less
New machine-learning algorithms for prediction of Parkinson's disease
NASA Astrophysics Data System (ADS)
Mandal, Indrajit; Sairam, N.
2014-03-01
This article presents an enhanced prediction accuracy of diagnosis of Parkinson's disease (PD) to prevent the delay and misdiagnosis of patients using the proposed robust inference system. New machine-learning methods are proposed and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The robust methods of treating Parkinson's disease (PD) includes sparse multinomial logistic regression, rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, boosting methods. A new ensemble method comprising of the Bayesian network optimised by Tabu search algorithm as classifier and Haar wavelets as projection filter is used for relevant feature selection and ranking. The highest accuracy obtained by linear logistic regression and sparse multinomial logistic regression is 100% and sensitivity, specificity of 0.983 and 0.996, respectively. All the experiments are conducted over 95% and 99% confidence levels and establish the results with corrected t-tests. This work shows a high degree of advancement in software reliability and quality of the computer-aided diagnosis system and experimentally shows best results with supportive statistical inference.
Large unbalanced credit scoring using Lasso-logistic regression ensemble.
Wang, Hong; Xu, Qingsong; Zhou, Lifeng
2015-01-01
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.
Minimalist ensemble algorithms for genome-wide protein localization prediction.
Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun
2012-07-03
Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
Minimalist ensemble algorithms for genome-wide protein localization prediction
2012-01-01
Background Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. Results This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. Conclusions We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi. PMID:22759391
NASA Astrophysics Data System (ADS)
Xu, Chao; Zhou, Dongxiang; Zhai, Yongping; Liu, Yunhui
2015-12-01
This paper realizes the automatic segmentation and classification of Mycobacterium tuberculosis with conventional light microscopy. First, the candidate bacillus objects are segmented by the marker-based watershed transform. The markers are obtained by an adaptive threshold segmentation based on the adaptive scale Gaussian filter. The scale of the Gaussian filter is determined according to the color model of the bacillus objects. Then the candidate objects are extracted integrally after region merging and contaminations elimination. Second, the shape features of the bacillus objects are characterized by the Hu moments, compactness, eccentricity, and roughness, which are used to classify the single, touching and non-bacillus objects. We evaluated the logistic regression, random forest, and intersection kernel support vector machines classifiers in classifying the bacillus objects respectively. Experimental results demonstrate that the proposed method yields to high robustness and accuracy. The logistic regression classifier performs best with an accuracy of 91.68%.
NASA Astrophysics Data System (ADS)
Pradhan, Biswajeet
2010-05-01
This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis on the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (ndvi), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases of the application of logistic regression coefficient in the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), where as Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases from the cross application of logistic regression coefficient in other two areas, the case of Selangor based on logistic coefficient of Cameron showed highest (90%) prediction accuracy where as the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%). Qualitatively, the cross application model yields reasonable results which can be used for preliminary landslide hazard mapping.
Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble
Wang, Hong; Xu, Qingsong; Zhou, Lifeng
2015-01-01
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988
An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression
ERIC Educational Resources Information Center
Weiss, Brandi A.; Dardick, William
2016-01-01
This article introduces an entropy-based measure of data-model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify…
Detecting Dementia Through Interactive Computer Avatars
Adachi, Hiroyoshi; Ukita, Norimichi; Ikeda, Manabu; Kazui, Hiroaki; Kudo, Takashi; Nakamura, Satoshi
2017-01-01
This paper proposes a new approach to automatically detect dementia. Even though some works have detected dementia from speech and language attributes, most have applied detection using picture descriptions, narratives, and cognitive tasks. In this paper, we propose a new computer avatar with spoken dialog functionalities that produces spoken queries based on the mini-mental state examination, the Wechsler memory scale-revised, and other related neuropsychological questions. We recorded the interactive data of spoken dialogues from 29 participants (14 dementia and 15 healthy controls) and extracted various audiovisual features. We tried to predict dementia using audiovisual features and two machine learning algorithms (support vector machines and logistic regression). Here, we show that the support vector machines outperformed logistic regression, and by using the extracted features they classified the participants into two groups with 0.93 detection performance, as measured by the areas under the receiver operating characteristic curve. We also newly identified some contributing features, e.g., gap before speaking, the variations of fundamental frequency, voice quality, and the ratio of smiling. We concluded that our system has the potential to detect dementia through spoken dialog systems and that the system can assist health care workers. In addition, these findings could help medical personnel detect signs of dementia. PMID:29018636
Mazzarini, Lorenzo; Kotzalidis, Georgios D; Piacentino, Daria; Rizzato, Salvatore; Angst, Jules; Azorin, Jean-Michel; Bowden, Charles L; Mosolov, Sergey; Young, Allan H; Vieta, Eduard; Girardi, Paolo; Perugi, Giulio
2018-03-15
Current classifications separate Bipolar (BD) from Major Depressive Disorder (MDD) based on polarity rather than recurrence. We aimed to determine bipolar/mixed feature frequency in a large MDD multinational sample with (High-Rec) and without (Low-Rec) >3 recurrences, comparing the two subsamples. We measured frequency of bipolarity/hypomanic features during current depressive episodes (MDEs) in 2347 MDD patients from the BRIDGE-II-mix database, comparing High-Rec with Low-Rec. We used Bonferroni-corrected Student's t-test for continuous, and chi-squared test, for categorical variables. Logistic regression estimated the size of the association between clinical characteristics and High-Rec MDD. Compared to Low-Rec (n = 1084, 46.2%), High-Rec patients (n = 1263, 53.8%) were older, with earlier depressive onset, had more family history of BD, more atypical features, suicide attempts, hospitalisations, and treatment resistance and (hypo)manic switches when treated with antidepressants, higher comorbidity with borderline personality disorder, and more hypomanic symptoms during current MDE, resulting in higher rates of mixed depression according to both DSM-5 and research-based diagnostic (RBDC) criteria. Logistic regression showed age at first symptoms < 30 years, current MDE duration ≤ 1 month, hypomania/mania among first-degree relatives, past suicide attempts, treatment-resistance, antidepressant-induced swings, and atypical, mixed, or psychotic features during MDE to associate with High-Rec. Number of MDEs for defining recurrence was arbitrary; cross-sectionality did not allow assessment of conversion from MDD to BD. High-Rec MDD differed from Low-Rec group for several clinical/epidemiological variables, including bipolar/mixed features. Bipolarity specifier and RBDC were more sensitive than DSM-5 criteria in detecting bipolar and mixed features in MDD. Copyright © 2017. Published by Elsevier B.V.
Tangen, C M; Koch, G G
1999-03-01
In the randomized clinical trial setting, controlling for covariates is expected to produce variance reduction for the treatment parameter estimate and to adjust for random imbalances of covariates between the treatment groups. However, for the logistic regression model, variance reduction is not obviously obtained. This can lead to concerns about the assumptions of the logistic model. We introduce a complementary nonparametric method for covariate adjustment. It provides results that are usually compatible with expectations for analysis of covariance. The only assumptions required are based on randomization and sampling arguments. The resulting treatment parameter is a (unconditional) population average log-odds ratio that has been adjusted for random imbalance of covariates. Data from a randomized clinical trial are used to compare results from the traditional maximum likelihood logistic method with those from the nonparametric logistic method. We examine treatment parameter estimates, corresponding standard errors, and significance levels in models with and without covariate adjustment. In addition, we discuss differences between unconditional population average treatment parameters and conditional subpopulation average treatment parameters. Additional features of the nonparametric method, including stratified (multicenter) and multivariate (multivisit) analyses, are illustrated. Extensions of this methodology to the proportional odds model are also made.
Feature Clustering for Accelerating Parallel Coordinate Descent
DOE Office of Scientific and Technical Information (OSTI.GOV)
Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh
2012-12-06
We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.
Differentially private distributed logistic regression using private and public data.
Ji, Zhanglong; Jiang, Xiaoqian; Wang, Shuang; Xiong, Li; Ohno-Machado, Lucila
2014-01-01
Privacy protecting is an important issue in medical informatics and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise and make outputs less useful. Given available public data in medical research (e.g. from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. In this paper, we modify the update step in Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. We try our algorithm on three different data sets, and show its advantage over: (1) a logistic regression model based solely on public data, and (2) a differentially private distributed logistic regression model based on private data under various scenarios. Logistic regression models built with our new algorithm based on both private and public datasets demonstrate better utility than models that trained on private or public datasets alone without sacrificing the rigorous privacy guarantee.
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring
ERIC Educational Resources Information Center
Haberman, Shelby J.; Sinharay, Sandip
2010-01-01
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anthony, G; Cunliffe, A; Armato, S
2015-06-15
Purpose: To determine whether the addition of standardized uptake value (SUV) statistical variables to CT lung texture features can improve a predictive model of radiation pneumonitis (RP) development in patients undergoing radiation therapy. Methods: Anonymized data from 96 esophageal cancer patients (18 RP-positive cases of Grade ≥ 2) were retrospectively collected including pre-therapy PET/CT scans, pre-/posttherapy diagnostic CT scans and RP status. Twenty texture features (firstorder, fractal, Laws’ filter and gray-level co-occurrence matrix) were calculated from diagnostic CT scans and compared in anatomically matched regions of the lung. The mean, maximum, standard deviation, and 50th–95th percentiles of the SUV valuesmore » for all lung voxels in the corresponding PET scans were acquired. For each texture feature, a logistic regression-based classifier consisting of (1) the average change in that texture feature value between the pre- and post-therapy CT scans and (2) the pre-therapy SUV standard deviation (SUV{sub SD}) was created. The RP-classification performance of each logistic regression model was compared to the performance of its texture feature alone by computing areas under the receiver operating characteristic curves (AUCs). T-tests were performed to determine whether the mean AUC across texture features changed significantly when SUV{sub SD} was added to the classifier. Results: The AUC for single-texturefeature classifiers ranged from 0.58–0.81 in high-dose (≥ 30 Gy) regions of the lungs and from 0.53–0.71 in low-dose (< 10 Gy) regions. Adding SUVSD in a logistic regression model using a 50/50 data partition for training and testing significantly increased the mean AUC by 0.08, 0.06 and 0.04 in the low-, medium- and high-dose regions, respectively. Conclusion: Addition of SUVSD from a pre-therapy PET scan to a single CT-based texture feature improves RP-classification performance on average. These findings demonstrate the potential for more accurate prediction of RP using information from multiple imaging modalities. Supported, in part, by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under grant number T32 EB002103; SGA receives royalties and licensing fees through the University of Chicago for computer-aided diagnosis technology. HA receives royalties through the University of Chicago for computer-aided diagnosis technology.« less
Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan
2015-03-15
Proteins located in appropriate cellular compartments are of paramount importance to exert their biological functions. Prediction of protein subcellular localization by computational methods is required in the post-genomic era. Recent studies have been focusing on predicting not only single-location proteins but also multi-location proteins. However, most of the existing predictors are far from effective for tackling the challenges of multi-label proteins. This article proposes an efficient multi-label predictor, namely mPLR-Loc, based on penalized logistic regression and adaptive decisions for predicting both single- and multi-location proteins. Specifically, for each query protein, mPLR-Loc exploits the information from the Gene Ontology (GO) database by using its accession number (AC) or the ACs of its homologs obtained via BLAST. The frequencies of GO occurrences are used to construct feature vectors, which are then classified by an adaptive decision-based multi-label penalized logistic regression classifier. Experimental results based on two recent stringent benchmark datasets (virus and plant) show that mPLR-Loc remarkably outperforms existing state-of-the-art multi-label predictors. In addition to being able to rapidly and accurately predict subcellular localization of single- and multi-label proteins, mPLR-Loc can also provide probabilistic confidence scores for the prediction decisions. For readers' convenience, the mPLR-Loc server is available online (http://bioinfo.eie.polyu.edu.hk/mPLRLocServer). Copyright © 2014 Elsevier Inc. All rights reserved.
An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression
Weiss, Brandi A.; Dardick, William
2015-01-01
This article introduces an entropy-based measure of data–model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify the quality of classification and separation of group membership. Entropy complements preexisting measures of data–model fit and provides unique information not contained in other measures. Hypothetical data scenarios, an applied example, and Monte Carlo simulation results are used to demonstrate the application of entropy in logistic regression. Entropy should be used in conjunction with other measures of data–model fit to assess how well logistic regression models classify cases into observed categories. PMID:29795897
An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression.
Weiss, Brandi A; Dardick, William
2016-12-01
This article introduces an entropy-based measure of data-model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify the quality of classification and separation of group membership. Entropy complements preexisting measures of data-model fit and provides unique information not contained in other measures. Hypothetical data scenarios, an applied example, and Monte Carlo simulation results are used to demonstrate the application of entropy in logistic regression. Entropy should be used in conjunction with other measures of data-model fit to assess how well logistic regression models classify cases into observed categories.
Wang, Jing-Jing; Wu, Hai-Feng; Sun, Tao; Li, Xia; Wang, Wei; Tao, Li-Xin; Huo, Da; Lv, Ping-Xin; He, Wen; Guo, Xiu-Hua
2013-01-01
Lung cancer, one of the leading causes of cancer-related deaths, usually appears as solitary pulmonary nodules (SPNs) which are hard to diagnose using the naked eye. In this paper, curvelet-based textural features and clinical parameters are used with three prediction models [a multilevel model, a least absolute shrinkage and selection operator (LASSO) regression method, and a support vector machine (SVM)] to improve the diagnosis of benign and malignant SPNs. Dimensionality reduction of the original curvelet-based textural features was achieved using principal component analysis. In addition, non-conditional logistical regression was used to find clinical predictors among demographic parameters and morphological features. The results showed that, combined with 11 clinical predictors, the accuracy rates using 12 principal components were higher than those using the original curvelet-based textural features. To evaluate the models, 10-fold cross validation and back substitution were applied. The results obtained, respectively, were 0.8549 and 0.9221 for the LASSO method, 0.9443 and 0.9831 for SVM, and 0.8722 and 0.9722 for the multilevel model. All in all, it was found that using curvelet-based textural features after dimensionality reduction and using clinical predictors, the highest accuracy rate was achieved with SVM. The method may be used as an auxiliary tool to differentiate between benign and malignant SPNs in CT images.
NASA Astrophysics Data System (ADS)
Yilmaz, Isik; Keskin, Inan; Marschalko, Marian; Bednarik, Martin
2010-05-01
This study compares the GIS based collapse susceptibility mapping methods such as; conditional probability (CP), logistic regression (LR) and artificial neural networks (ANN) applied in gypsum rock masses in Sivas basin (Turkey). Digital Elevation Model (DEM) was first constructed using GIS software. Collapse-related factors, directly or indirectly related to the causes of collapse occurrence, such as distance from faults, slope angle and aspect, topographical elevation, distance from drainage, topographic wetness index- TWI, stream power index- SPI, Normalized Difference Vegetation Index (NDVI) by means of vegetation cover, distance from roads and settlements were used in the collapse susceptibility analyses. In the last stage of the analyses, collapse susceptibility maps were produced from CP, LR and ANN models, and they were then compared by means of their validations. Area Under Curve (AUC) values obtained from all three methodologies showed that the map obtained from ANN model looks like more accurate than the other models, and the results also showed that the artificial neural networks is a usefull tool in preparation of collapse susceptibility map and highly compatible with GIS operating features. Key words: Collapse; doline; susceptibility map; gypsum; GIS; conditional probability; logistic regression; artificial neural networks.
Differentially private distributed logistic regression using private and public data
2014-01-01
Background Privacy protecting is an important issue in medical informatics and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise and make outputs less useful. Given available public data in medical research (e.g. from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. Methodology In this paper, we modify the update step in Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. Experiments and results We try our algorithm on three different data sets, and show its advantage over: (1) a logistic regression model based solely on public data, and (2) a differentially private distributed logistic regression model based on private data under various scenarios. Conclusion Logistic regression models built with our new algorithm based on both private and public datasets demonstrate better utility than models that trained on private or public datasets alone without sacrificing the rigorous privacy guarantee. PMID:25079786
Exploring students' patterns of reasoning
NASA Astrophysics Data System (ADS)
Matloob Haghanikar, Mojgan
As part of a collaborative study of the science preparation of elementary school teachers, we investigated the quality of students' reasoning and explored the relationship between sophistication of reasoning and the degree to which the courses were considered inquiry oriented. To probe students' reasoning, we developed open-ended written content questions with the distinguishing feature of applying recently learned concepts in a new context. We devised a protocol for developing written content questions that provided a common structure for probing and classifying students' sophistication level of reasoning. In designing our protocol, we considered several distinct criteria, and classified students' responses based on their performance for each criterion. First, we classified concepts into three types: Descriptive, Hypothetical, and Theoretical and categorized the abstraction levels of the responses in terms of the types of concepts and the inter-relationship between the concepts. Second, we devised a rubric based on Bloom's revised taxonomy with seven traits (both knowledge types and cognitive processes) and a defined set of criteria to evaluate each trait. Along with analyzing students' reasoning, we visited universities and observed the courses in which the students were enrolled. We used the Reformed Teaching Observation Protocol (RTOP) to rank the courses with respect to characteristics that are valued for the inquiry courses. We conducted logistic regression for a sample of 18courses with about 900 students and reported the results for performing logistic regression to estimate the relationship between traits of reasoning and RTOP score. In addition, we analyzed conceptual structure of students' responses, based on conceptual classification schemes, and clustered students' responses into six categories. We derived regression model, to estimate the relationship between the sophistication of the categories of conceptual structure and RTOP scores. However, the outcome variable with six categories required a more complicated regression model, known as multinomial logistic regression, generalized from binary logistic regression. With the large amount of collected data, we found that the likelihood of the higher cognitive processes were in favor of classes with higher measures on inquiry. However, the usage of more abstract concepts with higher order conceptual structures was less prevalent in higher RTOP courses.
NASA Astrophysics Data System (ADS)
Wang, Yunzhi; Qiu, Yuchen; Thai, Theresa; More, Kathleen; Ding, Kai; Liu, Hong; Zheng, Bin
2016-03-01
How to rationally identify epithelial ovarian cancer (EOC) patients who will benefit from bevacizumab or other antiangiogenic therapies is a critical issue in EOC treatments. The motivation of this study is to quantitatively measure adiposity features from CT images and investigate the feasibility of predicting potential benefit of EOC patients with or without receiving bevacizumab-based chemotherapy treatment using multivariate statistical models built based on quantitative adiposity image features. A dataset involving CT images from 59 advanced EOC patients were included. Among them, 32 patients received maintenance bevacizumab after primary chemotherapy and the remaining 27 patients did not. We developed a computer-aided detection (CAD) scheme to automatically segment subcutaneous fat areas (VFA) and visceral fat areas (SFA) and then extracted 7 adiposity-related quantitative features. Three multivariate data analysis models (linear regression, logistic regression and Cox proportional hazards regression) were performed respectively to investigate the potential association between the model-generated prediction results and the patients' progression-free survival (PFS) and overall survival (OS). The results show that using all 3 statistical models, a statistically significant association was detected between the model-generated results and both of the two clinical outcomes in the group of patients receiving maintenance bevacizumab (p<0.01), while there were no significant association for both PFS and OS in the group of patients without receiving maintenance bevacizumab. Therefore, this study demonstrated the feasibility of using quantitative adiposity-related CT image features based statistical prediction models to generate a new clinical marker and predict the clinical outcome of EOC patients receiving maintenance bevacizumab-based chemotherapy.
Siuly; Li, Yan; Paul Wen, Peng
2014-03-01
Motor imagery (MI) tasks classification provides an important basis for designing brain-computer interface (BCI) systems. If the MI tasks are reliably distinguished through identifying typical patterns in electroencephalography (EEG) data, a motor disabled people could communicate with a device by composing sequences of these mental states. In our earlier study, we developed a cross-correlation based logistic regression (CC-LR) algorithm for the classification of MI tasks for BCI applications, but its performance was not satisfactory. This study develops a modified version of the CC-LR algorithm exploring a suitable feature set that can improve the performance. The modified CC-LR algorithm uses the C3 electrode channel (in the international 10-20 system) as a reference channel for the cross-correlation (CC) technique and applies three diverse feature sets separately, as the input to the logistic regression (LR) classifier. The present algorithm investigates which feature set is the best to characterize the distribution of MI tasks based EEG data. This study also provides an insight into how to select a reference channel for the CC technique with EEG signals considering the anatomical structure of the human brain. The proposed algorithm is compared with eight of the most recently reported well-known methods including the BCI III Winner algorithm. The findings of this study indicate that the modified CC-LR algorithm has potential to improve the identification performance of MI tasks in BCI systems. The results demonstrate that the proposed technique provides a classification improvement over the existing methods tested. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Wang, Shuang; Jiang, Xiaoqian; Wu, Yuan; Cui, Lijuan; Cheng, Samuel; Ohno-Machado, Lucila
2013-01-01
We developed an EXpectation Propagation LOgistic REgRession (EXPLORER) model for distributed privacy-preserving online learning. The proposed framework provides a high level guarantee for protecting sensitive information, since the information exchanged between the server and the client is the encrypted posterior distribution of coefficients. Through experimental results, EXPLORER shows the same performance (e.g., discrimination, calibration, feature selection etc.) as the traditional frequentist Logistic Regression model, but provides more flexibility in model updating. That is, EXPLORER can be updated one point at a time rather than having to retrain the entire data set when new observations are recorded. The proposed EXPLORER supports asynchronized communication, which relieves the participants from coordinating with one another, and prevents service breakdown from the absence of participants or interrupted communications. PMID:23562651
Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning
ERIC Educational Resources Information Center
Li, Zhushan
2014-01-01
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
2013-01-01
Methods for analysis of network dynamics have seen great progress in the past decade. This article shows how Dynamic Network Logistic Regression techniques (a special case of the Temporal Exponential Random Graph Models) can be used to implement decision theoretic models for network dynamics in a panel data context. We also provide practical heuristics for model building and assessment. We illustrate the power of these techniques by applying them to a dynamic blog network sampled during the 2004 US presidential election cycle. This is a particularly interesting case because it marks the debut of Internet-based media such as blogs and social networking web sites as institutionally recognized features of the American political landscape. Using a longitudinal sample of all Democratic National Convention/Republican National Convention–designated blog citation networks, we are able to test the influence of various strategic, institutional, and balance-theoretic mechanisms as well as exogenous factors such as seasonality and political events on the propensity of blogs to cite one another over time. Using a combination of deviance-based model selection criteria and simulation-based model adequacy tests, we identify the combination of processes that best characterizes the choice behavior of the contending blogs. PMID:24143060
Ultrasound based computer-aided-diagnosis of kidneys for pediatric hydronephrosis
NASA Astrophysics Data System (ADS)
Cerrolaza, Juan J.; Peters, Craig A.; Martin, Aaron D.; Myers, Emmarie; Safdar, Nabile; Linguraru, Marius G.
2014-03-01
Ultrasound is the mainstay of imaging for pediatric hydronephrosis, though its potential as diagnostic tool is limited by its subjective assessment, and lack of correlation with renal function. Therefore, all cases showing signs of hydronephrosis undergo further invasive studies, like diuretic renogram, in order to assess the actual renal function. Under the hypothesis that renal morphology is correlated with renal function, a new ultrasound based computer-aided diagnosis (CAD) tool for pediatric hydronephrosis is presented. From 2D ultrasound, a novel set of morphological features of the renal collecting systems and the parenchyma, is automatically extracted using image analysis techniques. From the original set of features, including size, geometric and curvature descriptors, a subset of ten features are selected as predictive variables, combining a feature selection technique and area under the curve filtering. Using the washout half time (T1/2) as indicative of renal obstruction, two groups are defined. Those cases whose T1/2 is above 30 minutes are considered to be severe, while the rest would be in the safety zone, where diuretic renography could be avoided. Two different classification techniques are evaluated (logistic regression, and support vector machines). Adjusting the probability decision thresholds to operate at the point of maximum sensitivity, i.e., preventing any severe case be misclassified, specificities of 53%, and 75% are achieved, for the logistic regression and the support vector machine classifier, respectively. The proposed CAD system allows to establish a link between non-invasive non-ionizing imaging techniques and renal function, limiting the need for invasive and ionizing diuretic renography.
Eiff, M Patrice; Green, Larry A; Jones, Geoff; Devlaeminck, Alex Verdieck; Waller, Elaine; Dexter, Eve; Marino, Miguel; Carney, Patricia A
2017-03-01
Little is known about how the patient-centered medical home (PCMH) is being implemented in residency practices. We describe both the trends in implementation of PCMH features and the influence that working with PCMH features has on resident attitudes toward their importance in 14 family medicine residencies associated with the P4 Project. We assessed 24 residency continuity clinics annually between 2007-2011 on presence or absence of PCMH features. Annual resident surveys (n=690) assessed perceptions of importance of PCMH features using a 4-point scale (not at all important to very important). We used generalized estimating equations logistic regression to assess trends and ordinal-response proportional odds regression models to determine if resident ratings of importance were associated with working with those features during training. Implementation of electronic health record (EHR) features increased significantly from 2007-2011, such as email communication with patients (33% to 67%), preventive services registries (23% to 64%), chronic disease registries (63% to 82%), and population-based quality assurance (46% to 79%). Team-based care was the only process of care feature to change significantly (54% to 93%). Residents with any exposure to EHR-based features had higher odds of rating the features more important compared to those with no exposure. We observed consistently lower odds of the resident rating process of care features as more important with any exposure compared to no exposure. Residencies engaged in educational transformation were more successful in implementing EHR-based PCMH features, and exposure during training appears to positively influence resident ratings of importance, while exposure to process of care features are slower to implement with less influence on importance ratings.
NASA Astrophysics Data System (ADS)
Yao, W.; Poleswki, P.; Krzystek, P.
2016-06-01
The recent success of deep convolutional neural networks (CNN) on a large number of applications can be attributed to large amounts of available training data and increasing computing power. In this paper, a semantic pixel labelling scheme for urban areas using multi-resolution CNN and hand-crafted spatial-spectral features of airborne remotely sensed data is presented. Both CNN and hand-crafted features are applied to image/DSM patches to produce per-pixel class probabilities with a L1-norm regularized logistical regression classifier. The evidence theory infers a degree of belief for pixel labelling from different sources to smooth regions by handling the conflicts present in the both classifiers while reducing the uncertainty. The aerial data used in this study were provided by ISPRS as benchmark datasets for 2D semantic labelling tasks in urban areas, which consists of two data sources from LiDAR and color infrared camera. The test sites are parts of a city in Germany which is assumed to consist of typical object classes including impervious surfaces, trees, buildings, low vegetation, vehicles and clutter. The evaluation is based on the computation of pixel-based confusion matrices by random sampling. The performance of the strategy with respect to scene characteristics and method combination strategies is analyzed and discussed. The competitive classification accuracy could be not only explained by the nature of input data sources: e.g. the above-ground height of nDSM highlight the vertical dimension of houses, trees even cars and the nearinfrared spectrum indicates vegetation, but also attributed to decision-level fusion of CNN's texture-based approach with multichannel spatial-spectral hand-crafted features based on the evidence combination theory.
Yu, Yuanyuan; Li, Hongkai; Sun, Xiaoru; Su, Ping; Wang, Tingting; Liu, Yi; Yuan, Zhongshang; Liu, Yanxun; Xue, Fuzhong
2017-12-28
Confounders can produce spurious associations between exposure and outcome in observational studies. For majority of epidemiologists, adjusting for confounders using logistic regression model is their habitual method, though it has some problems in accuracy and precision. It is, therefore, important to highlight the problems of logistic regression and search the alternative method. Four causal diagram models were defined to summarize confounding equivalence. Both theoretical proofs and simulation studies were performed to verify whether conditioning on different confounding equivalence sets had the same bias-reducing potential and then to select the optimum adjusting strategy, in which logistic regression model and inverse probability weighting based marginal structural model (IPW-based-MSM) were compared. The "do-calculus" was used to calculate the true causal effect of exposure on outcome, then the bias and standard error were used to evaluate the performances of different strategies. Adjusting for different sets of confounding equivalence, as judged by identical Markov boundaries, produced different bias-reducing potential in the logistic regression model. For the sets satisfied G-admissibility, adjusting for the set including all the confounders reduced the equivalent bias to the one containing the parent nodes of the outcome, while the bias after adjusting for the parent nodes of exposure was not equivalent to them. In addition, all causal effect estimations through logistic regression were biased, although the estimation after adjusting for the parent nodes of exposure was nearest to the true causal effect. However, conditioning on different confounding equivalence sets had the same bias-reducing potential under IPW-based-MSM. Compared with logistic regression, the IPW-based-MSM could obtain unbiased causal effect estimation when the adjusted confounders satisfied G-admissibility and the optimal strategy was to adjust for the parent nodes of outcome, which obtained the highest precision. All adjustment strategies through logistic regression were biased for causal effect estimation, while IPW-based-MSM could always obtain unbiased estimation when the adjusted set satisfied G-admissibility. Thus, IPW-based-MSM was recommended to adjust for confounders set.
Tyagi, Neelam; Sutton, Elizabeth; Hunt, Margie; Zhang, Jing; Oh, Jung Hun; Apte, Aditya; Mechalakos, James; Wilgucki, Molly; Gelb, Emily; Mehrara, Babak; Matros, Evan; Ho, Alice
2017-02-01
Capsular contracture (CC) is a serious complication in patients receiving implant-based reconstruction for breast cancer. Currently, no objective methods are available for assessing CC. The goal of the present study was to identify image-based surrogates of CC using magnetic resonance imaging (MRI). We analyzed a retrospective data set of 50 patients who had undergone both a diagnostic MRI scan and a plastic surgeon's evaluation of the CC score (Baker's score) within a 6-month period after mastectomy and reconstructive surgery. The MRI scans were assessed for morphologic shape features of the implant and histogram features of the pectoralis muscle. The shape features, such as roundness, eccentricity, solidity, extent, and ratio length for the implant, were compared with the Baker score. For the pectoralis muscle, the muscle width and median, skewness, and kurtosis of the intensity were compared with the Baker score. Univariate analysis (UVA) using a Wilcoxon rank-sum test and multivariate analysis with the least absolute shrinkage and selection operator logistic regression was performed to determine significant differences in these features between the patient groups categorized according to their Baker's scores. UVA showed statistically significant differences between grade 1 and grade ≥2 for morphologic shape features and histogram features, except for volume and skewness. Only eccentricity, ratio length, and volume were borderline significant in differentiating grade ≤2 and grade ≥3. Features with P<.1 on UVA were used in the multivariate least absolute shrinkage and selection operator logistic regression analysis. Multivariate analysis showed a good level of predictive power for grade 1 versus grade ≥2 CC (area under the receiver operating characteristic curve 0.78, sensitivity 0.78, and specificity 0.82) and for grade ≤2 versus grade ≥3 CC (area under the receiver operating characteristic curve 0.75, sensitivity 0.75, and specificity 0.79). The morphologic shape features described on MR images were associated with the severity of CC. MRI has the potential to further improve the diagnostic ability of the Baker score in breast cancer patients who undergo implant reconstruction. Copyright © 2016 Elsevier Inc. All rights reserved.
Customer Churn Prediction for Broadband Internet Services
NASA Astrophysics Data System (ADS)
Huang, B. Q.; Kechadi, M.-T.; Buckley, B.
Although churn prediction has been an area of research in the voice branch of telecommunications services, more focused studies on the huge growth area of Broadband Internet services are limited. Therefore, this paper presents a new set of features for broadband Internet customer churn prediction, based on Henley segments, the broadband usage, dial types, the spend of dial-up, line-information, bill and payment information, account information. Then the four prediction techniques (Logistic Regressions, Decision Trees, Multilayer Perceptron Neural Networks and Support Vector Machines) are applied in customer churn, based on the new features. Finally, the evaluation of new features and a comparative analysis of the predictors are made for broadband customer churn prediction. The experimental results show that the new features with these four modelling techniques are efficient for customer churn prediction in the broadband service field.
NASA Astrophysics Data System (ADS)
Tan, Maxine; Emaminejad, Nastaran; Qian, Wei; Sun, Shenshen; Kang, Yan; Guan, Yubao; Lure, Fleming; Zheng, Bin
2014-03-01
Stage I non-small-cell lung cancers (NSCLC) usually have favorable prognosis. However, high percentage of NSCLC patients have cancer relapse after surgery. Accurately predicting cancer prognosis is important to optimally treat and manage the patients to minimize the risk of cancer relapse. Studies have shown that an excision repair crosscomplementing 1 (ERCC1) gene was a potentially useful genetic biomarker to predict prognosis of NSCLC patients. Meanwhile, studies also found that chronic obstructive pulmonary disease (COPD) was highly associated with lung cancer prognosis. In this study, we investigated and evaluated the correlations between COPD image features and ERCC1 gene expression. A database involving 106 NSCLC patients was used. Each patient had a thoracic CT examination and ERCC1 genetic test. We applied a computer-aided detection scheme to segment and quantify COPD image features. A logistic regression method and a multilayer perceptron network were applied to analyze the correlation between the computed COPD image features and ERCC1 protein expression. A multilayer perceptron network (MPN) was also developed to test performance of using COPD-related image features to predict ERCC1 protein expression. A nine feature based logistic regression analysis showed the average COPD feature values in the low and high ERCC1 protein expression groups are significantly different (p < 0.01). Using a five-fold cross validation method, the MPN yielded an area under ROC curve (AUC = 0.669±0.053) in classifying between the low and high ERCC1 expression cases. The study indicates that CT phenotype features are associated with the genetic tests, which may provide supplementary information to help improve accuracy in assessing prognosis of NSCLC patients.
Fuzzy multinomial logistic regression analysis: A multi-objective programming approach
NASA Astrophysics Data System (ADS)
Abdalla, Hesham A.; El-Sayed, Amany A.; Hamed, Ramadan
2017-05-01
Parameter estimation for multinomial logistic regression is usually based on maximizing the likelihood function. For large well-balanced datasets, Maximum Likelihood (ML) estimation is a satisfactory approach. Unfortunately, ML can fail completely or at least produce poor results in terms of estimated probabilities and confidence intervals of parameters, specially for small datasets. In this study, a new approach based on fuzzy concepts is proposed to estimate parameters of the multinomial logistic regression. The study assumes that the parameters of multinomial logistic regression are fuzzy. Based on the extension principle stated by Zadeh and Bárdossy's proposition, a multi-objective programming approach is suggested to estimate these fuzzy parameters. A simulation study is used to evaluate the performance of the new approach versus Maximum likelihood (ML) approach. Results show that the new proposed model outperforms ML in cases of small datasets.
Buri, Luigi; Hassan, Cesare; Bersani, Gianluca; Anti, Marcello; Bianco, Maria Antonietta; Cipolletta, Livio; Di Giulio, Emilio; Di Matteo, Giovanni; Familiari, Luigi; Ficano, Leonardo; Loriga, Pietro; Morini, Sergio; Pietropaolo, Vincenzo; Zambelli, Alessandro; Grossi, Enzo; Intraligi, Marco; Buscema, Massimo
2010-06-01
Selecting patients appropriately for upper endoscopy (EGD) is crucial for efficient use of endoscopy. The objective of this study was to compare different clinical strategies and statistical methods to select patients for EGD, namely appropriateness guidelines, age and/or alarm features, and multivariate and artificial neural network (ANN) models. A nationwide, multicenter, prospective study was undertaken in which consecutive patients referred for EGD during a 1-month period were enrolled. Before EGD, the endoscopist assessed referral appropriateness according to the American Society for Gastrointestinal Endoscopy (ASGE) guidelines, also collecting clinical and demographic variables. Outcomes of the study were detection of relevant findings and new diagnosis of malignancy at EGD. The accuracy of the following clinical strategies and predictive rules was compared: (i) ASGE appropriateness guidelines (indicated vs. not indicated), (ii) simplified rule (>or=45 years or alarm features vs. <45 years without alarm features), (iii) logistic regression model, and (iv) ANN models. A total of 8,252 patients were enrolled in 57 centers. Overall, 3,803 (46%) relevant findings and 132 (1.6%) new malignancies were detected. Sensitivity, specificity, and area under the receiver-operating characteristic curve (AUC) of the simplified rule were similar to that of the ASGE guidelines for both relevant findings (82%/26%/0.55 vs. 88%/27%/0.52) and cancer (97%/22%/0.58 vs. 98%/20%/0.58). Both logistic regression and ANN models seemed to be substantially more accurate in predicting new cases of malignancy, with an AUC of 0.82 and 0.87, respectively. A simple predictive rule based on age and alarm features is similarly effective to the more complex ASGE guidelines in selecting patients for EGD. Regression and ANN models may be useful in identifying a relatively small subgroup of patients at higher risk of cancer.
Should metacognition be measured by logistic regression?
Rausch, Manuel; Zehetleitner, Michael
2017-03-01
Are logistic regression slopes suitable to quantify metacognitive sensitivity, i.e. the efficiency with which subjective reports differentiate between correct and incorrect task responses? We analytically show that logistic regression slopes are independent from rating criteria in one specific model of metacognition, which assumes (i) that rating decisions are based on sensory evidence generated independently of the sensory evidence used for primary task responses and (ii) that the distributions of evidence are logistic. Given a hierarchical model of metacognition, logistic regression slopes depend on rating criteria. According to all considered models, regression slopes depend on the primary task criterion. A reanalysis of previous data revealed that massive numbers of trials are required to distinguish between hierarchical and independent models with tolerable accuracy. It is argued that researchers who wish to use logistic regression as measure of metacognitive sensitivity need to control the primary task criterion and rating criteria. Copyright © 2017 Elsevier Inc. All rights reserved.
Wang, Shuang; Jiang, Xiaoqian; Wu, Yuan; Cui, Lijuan; Cheng, Samuel; Ohno-Machado, Lucila
2013-06-01
We developed an EXpectation Propagation LOgistic REgRession (EXPLORER) model for distributed privacy-preserving online learning. The proposed framework provides a high level guarantee for protecting sensitive information, since the information exchanged between the server and the client is the encrypted posterior distribution of coefficients. Through experimental results, EXPLORER shows the same performance (e.g., discrimination, calibration, feature selection, etc.) as the traditional frequentist logistic regression model, but provides more flexibility in model updating. That is, EXPLORER can be updated one point at a time rather than having to retrain the entire data set when new observations are recorded. The proposed EXPLORER supports asynchronized communication, which relieves the participants from coordinating with one another, and prevents service breakdown from the absence of participants or interrupted communications. Copyright © 2013 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yan, H; Chen, Z; Nath, R
Purpose: kV fluoroscopic imaging combined with MV treatment beam imaging has been investigated for intrafractional motion monitoring and correction. It is, however, subject to additional kV imaging dose to normal tissue. To balance tracking accuracy and imaging dose, we previously proposed an adaptive imaging strategy to dynamically decide future imaging type and moments based on motion tracking uncertainty. kV imaging may be used continuously for maximal accuracy or only when the position uncertainty (probability of out of threshold) is high if a preset imaging dose limit is considered. In this work, we propose more accurate methods to estimate tracking uncertaintymore » through analyzing acquired data in real-time. Methods: We simulated motion tracking process based on a previously developed imaging framework (MV + initial seconds of kV imaging) using real-time breathing data from 42 patients. Motion tracking errors for each time point were collected together with the time point’s corresponding features, such as tumor motion speed and 2D tracking error of previous time points, etc. We tested three methods for error uncertainty estimation based on the features: conditional probability distribution, logistic regression modeling, and support vector machine (SVM) classification to detect errors exceeding a threshold. Results: For conditional probability distribution, polynomial regressions on three features (previous tracking error, prediction quality, and cosine of the angle between the trajectory and the treatment beam) showed strong correlation with the variation (uncertainty) of the mean 3D tracking error and its standard deviation: R-square = 0.94 and 0.90, respectively. The logistic regression and SVM classification successfully identified about 95% of tracking errors exceeding 2.5mm threshold. Conclusion: The proposed methods can reliably estimate the motion tracking uncertainty in real-time, which can be used to guide adaptive additional imaging to confirm the tumor is within the margin or initialize motion compensation if it is out of the margin.« less
Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model
NASA Astrophysics Data System (ADS)
Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami
2017-06-01
A regression model is the representation of relationship between independent variable and dependent variable. The dependent variable has categories used in the logistic regression model to calculate odds on. The logistic regression model for dependent variable has levels in the logistics regression model is ordinal. GWOLR model is an ordinal logistic regression model influenced the geographical location of the observation site. Parameters estimation in the model needed to determine the value of a population based on sample. The purpose of this research is to parameters estimation of GWOLR model using R software. Parameter estimation uses the data amount of dengue fever patients in Semarang City. Observation units used are 144 villages in Semarang City. The results of research get GWOLR model locally for each village and to know probability of number dengue fever patient categories.
NASA Astrophysics Data System (ADS)
Li, Zuhe; Fan, Yangyu; Liu, Weihua; Yu, Zeqi; Wang, Fengqin
2017-01-01
We aim to apply sparse autoencoder-based unsupervised feature learning to emotional semantic analysis for textile images. To tackle the problem of limited training data, we present a cross-domain feature learning scheme for emotional textile image classification using convolutional autoencoders. We further propose a correlation-analysis-based feature selection method for the weights learned by sparse autoencoders to reduce the number of features extracted from large size images. First, we randomly collect image patches on an unlabeled image dataset in the source domain and learn local features with a sparse autoencoder. We then conduct feature selection according to the correlation between different weight vectors corresponding to the autoencoder's hidden units. We finally adopt a convolutional neural network including a pooling layer to obtain global feature activations of textile images in the target domain and send these global feature vectors into logistic regression models for emotional image classification. The cross-domain unsupervised feature learning method achieves 65% to 78% average accuracy in the cross-validation experiments corresponding to eight emotional categories and performs better than conventional methods. Feature selection can reduce the computational cost of global feature extraction by about 50% while improving classification performance.
Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris
2016-09-01
Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification, challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression resulting in models that are easier to interpret by fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the use of a Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve-AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, interpretations of models are in accordance with existing medical understanding of pediatric readmission. Best performing models have similar performances reaching AUC values 0.783 and 0.779 for traditional Lasso and Tree-Lasso, respectfully. However, information loss of Lasso models is 0.35 bits higher compared to Tree-Lasso model. We propose a method for building predictive models applicable for the detection of readmission risk based on Electronic Health records. Integration of domain knowledge (in the form of ICD-9-CM taxonomy) and a data-driven, sparse predictive algorithm (Tree-Lasso Logistic Regression) resulted in an increase of interpretability of the resulting model. The models are interpreted for the readmission prediction problem in general pediatric population in California, as well as several important subpopulations, and the interpretations of models comply with existing medical understanding of pediatric readmission. Finally, quantitative assessment of the interpretability of the models is given, that is beyond simple counts of selected low-level features. Copyright © 2016 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Lin, Yingzhi; Deng, Xiangzheng; Li, Xing; Ma, Enjun
2014-12-01
Spatially explicit simulation of land use change is the basis for estimating the effects of land use and cover change on energy fluxes, ecology and the environment. At the pixel level, logistic regression is one of the most common approaches used in spatially explicit land use allocation models to determine the relationship between land use and its causal factors in driving land use change, and thereby to evaluate land use suitability. However, these models have a drawback in that they do not determine/allocate land use based on the direct relationship between land use change and its driving factors. Consequently, a multinomial logistic regression method was introduced to address this flaw, and thereby, judge the suitability of a type of land use in any given pixel in a case study area of the Jiangxi Province, China. A comparison of the two regression methods indicated that the proportion of correctly allocated pixels using multinomial logistic regression was 92.98%, which was 8.47% higher than that obtained using logistic regression. Paired t-test results also showed that pixels were more clearly distinguished by multinomial logistic regression than by logistic regression. In conclusion, multinomial logistic regression is a more efficient and accurate method for the spatial allocation of land use changes. The application of this method in future land use change studies may improve the accuracy of predicting the effects of land use and cover change on energy fluxes, ecology, and environment.
2016-01-01
Understanding the relationship between physiological measurements from human subjects and their demographic data is important within both the biometric and forensic domains. In this paper we explore the relationship between measurements of the human hand and a range of demographic features. We assess the ability of linear regression and machine learning classifiers to predict demographics from hand features, thereby providing evidence on both the strength of relationship and the key features underpinning this relationship. Our results show that we are able to predict sex, height, weight and foot size accurately within various data-range bin sizes, with machine learning classification algorithms out-performing linear regression in most situations. In addition, we identify the features used to provide these relationships applicable across multiple applications. PMID:27806075
Miguel-Hurtado, Oscar; Guest, Richard; Stevenage, Sarah V; Neil, Greg J; Black, Sue
2016-01-01
Understanding the relationship between physiological measurements from human subjects and their demographic data is important within both the biometric and forensic domains. In this paper we explore the relationship between measurements of the human hand and a range of demographic features. We assess the ability of linear regression and machine learning classifiers to predict demographics from hand features, thereby providing evidence on both the strength of relationship and the key features underpinning this relationship. Our results show that we are able to predict sex, height, weight and foot size accurately within various data-range bin sizes, with machine learning classification algorithms out-performing linear regression in most situations. In addition, we identify the features used to provide these relationships applicable across multiple applications.
Serum Leptin Is a Biomarker of Malnutrition in Decompensated Cirrhosis
Rachakonda, Vikrant; Borhani, Amir A.; Dunn, Michael A.; Andrzejewski, Margaret; Martin, Kelly; Behari, Jaideep
2016-01-01
Background and Aims Malnutrition is a leading cause of morbidity and mortality in cirrhosis. There is no consensus as to the optimal approach for identifying malnutrition in end-stage liver disease. The aim of this study was to measure biochemical, serologic, hormonal, radiographic, and anthropometric features in a cohort of hospitalized cirrhotic patients to characterize biomarkers for identification of malnutrition. Design In this prospective observational cohort study, 52 hospitalized cirrhotic patients were classified as malnourished (42.3%) or nourished (57.7%) based on mid-arm muscle circumference < 23 cm and dominant handgrip strength < 30 kg. Anthropometric measurements were obtained. Appetite was assessed using the Simplified Nutrition Appetite Questionnaire (SNAQ) score. Fasting levels of serum adipokines, cytokines, and hormones were determined using Luminex assays. Logistic regression analysis was used to determine features independently associated with malnutrition. Results Subjects with and without malnutrition differed in several key features of metabolic phenotype including wet and dry BMI, skeletal muscle index, visceral fat index and HOMA-IR. Serum leptin levels were lower and INR was higher in malnourished subjects. Serum leptin was significantly correlated with HOMA-IR, wet and dry BMI, mid-arm muscle circumference, skeletal muscle index, and visceral fat index. Logistic regression analysis revealed that INR and log-transformed leptin were independently associated with malnutrition. Conclusions Low serum leptin and elevated INR are associated with malnutrition in hospitalized patients with end-stage liver disease. PMID:27583675
Influences on choice of surgery as a career: a study of consecutive cohorts in a medical school.
Sobral, Dejano T
2006-06-01
To examine the differential impact of person-based and programme-related features on graduates' dichotomous choice between surgical or non-surgical field specialties for first-year residency. A 10-year cohort study was conducted, following 578 students (55.4% male) who graduated from a university medical school during 1994-2003. Data were collected as follows: at the beginning of medical studies, on career preference and learning frame; during medical studies, on academic achievement, cross-year peer tutoring and selective clinical traineeship, and at graduation, on the first-year residency selected. Contingency and logistic regression analyses were performed, with graduates grouped by the dichotomous choice of surgery or not. Overall, 23% of graduates selected a first-year residency in surgery. Seven time-steady features related to this choice: male sex, high self-confidence, option of surgery at admission, active learning style, preference for surgery after Year 1, peer tutoring on clinical surgery, and selective training in clinical surgery. Logistic regression analysis, including all features, predicted 87.1% of the graduates' choices. Male sex, updated preference, peer tutoring and selective training were the most significant predictors in the pathway to choice. The relative roles of person-based and programme-related factors in the choice process are discussed. The findings suggest that for most students the choice of surgery derives from a temporal summation of influences that encompass entry and post-entry factors blended in variable patterns. It is likely that sex-unbiased peer tutoring and selective training supported the students' search process for personal compatibility with specialty-related domains of content and process.
Discrete post-processing of total cloud cover ensemble forecasts
NASA Astrophysics Data System (ADS)
Hemri, Stephan; Haiden, Thomas; Pappenberger, Florian
2017-04-01
This contribution presents an approach to post-process ensemble forecasts for the discrete and bounded weather variable of total cloud cover. Two methods for discrete statistical post-processing of ensemble predictions are tested. The first approach is based on multinomial logistic regression, the second involves a proportional odds logistic regression model. Applying them to total cloud cover raw ensemble forecasts from the European Centre for Medium-Range Weather Forecasts improves forecast skill significantly. Based on station-wise post-processing of raw ensemble total cloud cover forecasts for a global set of 3330 stations over the period from 2007 to early 2014, the more parsimonious proportional odds logistic regression model proved to slightly outperform the multinomial logistic regression model. Reference Hemri, S., Haiden, T., & Pappenberger, F. (2016). Discrete post-processing of total cloud cover ensemble forecasts. Monthly Weather Review 144, 2565-2577.
Sakado, K; Sakado, M; Seki, T; Kuwabara, H; Kojima, M; Sato, T; Someya, T
2001-06-01
Although a number of studies have reported on the association between obsessional personality features as measured by the Munich Personality Test (MPT) "Rigidity" scale and depression, there has been no examination of these relationships in a non-clinical sample. The dimensional scores on the MPT were compared between subjects with and without lifetime depression, using a sample of employed Japanese adults. The odds ratio for suffering from lifetime depression was estimated by multiple logistic regression analysis. To diagnose a lifetime history of depression, the Inventory to Diagnose Depression, Lifetime version (IDDL) was used. The subjects with lifetime depression scored significantly higher on the "Rigidity" scale than the subjects without lifetime depression. In our logistic regression analysis, three risk factors were identified as each independently increasing a person's risk for suffering from lifetime depression: higher levels of "Rigidity", being of the female gender, and suffering from current depressive symptoms. The MPT "Rigidity" scale is a sensitive measure of personality features that occur with depression.
Bakal, Gokhan; Talari, Preetham; Kakani, Elijah V; Kavuluru, Ramakanth
2018-06-01
Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Since all candidate drugs cannot be tested with animal and clinical trials, in vitro approaches are first attempted to identify promising candidates. Likewise, identifying different causal relations between biomedical entities is also critical to understand biomedical processes. Generally, natural language processing (NLP) and machine learning are used to predict specific relations between any given pair of entities using the distant supervision approach. To build high accuracy supervised predictive models to predict previously unknown treatment and causative relations between biomedical entities based only on semantic graph pattern features extracted from biomedical knowledge graphs. We used 7000 treats and 2918 causes hand-curated relations from the UMLS Metathesaurus to train and test our models. Our graph pattern features are extracted from simple paths connecting biomedical entities in the SemMedDB graph (based on the well-known SemMedDB database made available by the U.S. National Library of Medicine). Using these graph patterns connecting biomedical entities as features of logistic regression and decision tree models, we computed mean performance measures (precision, recall, F-score) over 100 distinct 80-20% train-test splits of the datasets. For all experiments, we used a positive:negative class imbalance of 1:10 in the test set to model relatively more realistic scenarios. Our models predict treats and causes relations with high F-scores of 99% and 90% respectively. Logistic regression model coefficients also help us identify highly discriminative patterns that have an intuitive interpretation. We are also able to predict some new plausible relations based on false positives that our models scored highly based on our collaborations with two physician co-authors. Finally, our decision tree models are able to retrieve over 50% of treatment relations from a recently created external dataset. We employed semantic graph patterns connecting pairs of candidate biomedical entities in a knowledge graph as features to predict treatment/causative relations between them. We provide what we believe is the first evidence in direct prediction of biomedical relations based on graph features. Our work complements lexical pattern based approaches in that the graph patterns can be used as additional features for weakly supervised relation prediction. Copyright © 2018 Elsevier Inc. All rights reserved.
Mahrooghy, Majid; Ashraf, Ahmed B; Daye, Dania; Mies, Carolyn; Feldman, Michael; Rosen, Mark; Kontos, Despina
2013-01-01
Breast tumors are heterogeneous lesions. Intra-tumor heterogeneity presents a major challenge for cancer diagnosis and treatment. Few studies have worked on capturing tumor heterogeneity from imaging. Most studies to date consider aggregate measures for tumor characterization. In this work we capture tumor heterogeneity by partitioning tumor pixels into subregions and extracting heterogeneity wavelet kinetic (HetWave) features from breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) to obtain the spatiotemporal patterns of the wavelet coefficients and contrast agent uptake from each partition. Using a genetic algorithm for feature selection, and a logistic regression classifier with leave one-out cross validation, we tested our proposed HetWave features for the task of classifying breast cancer recurrence risk. The classifier based on our features gave an ROC AUC of 0.78, outperforming previously proposed kinetic, texture, and spatial enhancement variance features which give AUCs of 0.69, 0.64, and 0.65, respectively.
NASA Astrophysics Data System (ADS)
He, Ting; Fan, Ming; Zhang, Peng; Li, Hui; Zhang, Juan; Shao, Guoliang; Li, Lihua
2018-03-01
Breast cancer can be classified into four molecular subtypes of Luminal A, Luminal B, HER2 and Basal-like, which have significant differences in treatment and survival outcomes. We in this study aim to predict immunohistochemistry (IHC) determined molecular subtypes of breast cancer using image features derived from tumor and peritumoral stroma region based on diffusion weighted imaging (DWI). A dataset of 126 breast cancer patients were collected who underwent preoperative breast MRI with a 3T scanner. The apparent diffusion coefficients (ADCs) were recorded from DWI, and breast image was segmented into regions comprising the tumor and the surrounding stromal. Statistical characteristics in various breast tumor and peritumoral regions were computed, including mean, minimum, maximum, variance, interquartile range, range, skewness, and kurtosis of ADC values. Additionally, the difference of features between each two regions were also calculated. The univariate logistic based classifier was performed for evaluating the performance of the individual features for discriminating subtypes. For multi-class classification, multivariate logistic regression model was trained and validated. The results showed that the tumor boundary and proximal peritumoral stroma region derived features have a higher performance in classification compared to that of the other regions. Furthermore, the prediction model using statistical features, difference features and all the features combined from these regions generated AUC values of 0.774, 0.796 and 0.811, respectively. The results in this study indicate that ADC feature in tumor and peritumoral stromal region would be valuable for estimating the molecular subtype in breast cancer.
Folded concave penalized learning in identifying multimodal MRI marker for Parkinson’s disease
Liu, Hongcheng; Du, Guangwei; Zhang, Lijun; Lewis, Mechelle M.; Wang, Xue; Yao, Tao; Li, Runze; Huang, Xuemei
2016-01-01
Background Brain MRI holds promise to gauge different aspects of Parkinson’s disease (PD)-related pathological changes. Its analysis, however, is hindered by the high-dimensional nature of the data. New method This study introduces folded concave penalized (FCP) sparse logistic regression to identify biomarkers for PD from a large number of potential factors. The proposed statistical procedures target the challenges of high-dimensionality with limited data samples acquired. The maximization problem associated with the sparse logistic regression model is solved by local linear approximation. The proposed procedures then are applied to the empirical analysis of multimodal MRI data. Results From 45 features, the proposed approach identified 15 MRI markers and the UPSIT, which are known to be clinically relevant to PD. By combining the MRI and clinical markers, we can enhance substantially the specificity and sensitivity of the model, as indicated by the ROC curves. Comparison to existing methods We compare the folded concave penalized learning scheme with both the Lasso penalized scheme and the principle component analysis-based feature selection (PCA) in the Parkinson’s biomarker identification problem that takes into account both the clinical features and MRI markers. The folded concave penalty method demonstrates a substantially better clinical potential than both the Lasso and PCA in terms of specificity and sensitivity. Conclusions For the first time, we applied the FCP learning method to MRI biomarker discovery in PD. The proposed approach successfully identified MRI markers that are clinically relevant. Combining these biomarkers with clinical features can substantially enhance performance. PMID:27102045
NASA Astrophysics Data System (ADS)
Farag, A. Z. A.; Sultan, M.; Elkadiri, R.; Abdelhalim, A.
2014-12-01
An integrated approach using remote sensing, landscape analysis and statistical methods was conducted to assess the role of groundwater sapping in shaping the Saharan landscape. A GIS-based logistic regression model was constructed to automatically delineate the spatial distribution of the sapping features over areas occupied by the Nubian Sandstone Aquifer System (NSAS): (1) an inventory was compiled of known locations of sapping features identified either in the field or from satellite datasets (e.g. Orbview-3 and Google Earth Digital Globe imagery); (2) spatial analyses were conducted in a GIS environment and seven geomorphological and geological predisposing factors (i.e. slope, stream density, cross-sectional and profile curvature, minimum and maximum curvature, and lithology) were identified; (3) a binary logistic regression model was constructed, optimized and validated to describe the relationship between the sapping locations and the set of controlling factors and (4) the generated model (prediction accuracy: 90.1%) was used to produce a regional sapping map over the NSAS. Model outputs indicate: (1) groundwater discharge and structural control played an important role in excavating the Saharan natural depressions as evidenced by the wide distribution of sapping features (areal extent: 1180 km2) along the fault-controlled escarpments of the Libyan Plateau; (2) proximity of mapped sapping features to reported paleolake and tufa deposits suggesting a causal effect. Our preliminary observations (from satellite imagery) and statistical analyses together with previous studies in the North Western Sahara Aquifer System (North Africa), Sinai Peninsula, Negev Desert, and The Plateau of Najd (Saudi Arabia) indicate extensive occurrence of sapping features along the escarpments bordering the northern margins of the Saharan-Arabian Desert; these areas share similar hydrologic settings with the NSAS domains and they too witnessed wet climatic periods in the Mid-Late Quaternary.
NASA Astrophysics Data System (ADS)
Priya, Mallika; Rao, Bola Sadashiva Satish; Chandra, Subhash; Ray, Satadru; Mathew, Stanley; Datta, Anirbit; Nayak, Subramanya G.; Mahato, Krishna Kishore
2016-02-01
In spite of many efforts for early detection of breast cancer, there is still lack of technology for immediate implementation. In the present study, the potential photoacoustic spectroscopy was evaluated in discriminating breast cancer from normal, involving blood serum samples seeking early detection. Three photoacoustic spectra in time domain were recorded from each of 20 normal and 20 malignant samples at 281nm pulsed laser excitations and a total of 120 spectra were generated. The time domain spectra were then Fast Fourier Transformed into frequency domain and 116.5625 - 206.875 kHz region was selected for further analysis using a combinational approach of wavelet, PCA and logistic regression. Initially, wavelet analysis was performed on the FFT data and seven features (mean, median, area under the curve, variance, standard deviation, skewness and kurtosis) from each were extracted. PCA was then performed on the feature matrix (7x120) for discriminating malignant samples from the normal by plotting a decision boundary using logistic regression analysis. The unsupervised mode of classification used in the present study yielded specificity and sensitivity values of 100% in each respectively with a ROC - AUC value of 1. The results obtained have clearly demonstrated the capability of photoacoustic spectroscopy in discriminating cancer from the normal, suggesting its possible clinical implications.
Das, D K; Maiti, A K; Chakraborty, C
2015-03-01
In this paper, we propose a comprehensive image characterization cum classification framework for malaria-infected stage detection using microscopic images of thin blood smears. The methodology mainly includes microscopic imaging of Leishman stained blood slides, noise reduction and illumination correction, erythrocyte segmentation, feature selection followed by machine classification. Amongst three-image segmentation algorithms (namely, rule-based, Chan-Vese-based and marker-controlled watershed methods), marker-controlled watershed technique provides better boundary detection of erythrocytes specially in overlapping situations. Microscopic features at intensity, texture and morphology levels are extracted to discriminate infected and noninfected erythrocytes. In order to achieve subgroup of potential features, feature selection techniques, namely, F-statistic and information gain criteria are considered here for ranking. Finally, five different classifiers, namely, Naive Bayes, multilayer perceptron neural network, logistic regression, classification and regression tree (CART), RBF neural network have been trained and tested by 888 erythrocytes (infected and noninfected) for each features' subset. Performance evaluation of the proposed methodology shows that multilayer perceptron network provides higher accuracy for malaria-infected erythrocytes recognition and infected stage classification. Results show that top 90 features ranked by F-statistic (specificity: 98.64%, sensitivity: 100%, PPV: 99.73% and overall accuracy: 96.84%) and top 60 features ranked by information gain provides better results (specificity: 97.29%, sensitivity: 100%, PPV: 99.46% and overall accuracy: 96.73%) for malaria-infected stage classification. © 2014 The Authors Journal of Microscopy © 2014 Royal Microscopical Society.
Mielniczuk, Jan; Teisseyre, Paweł
2018-03-01
Detection of gene-gene interactions is one of the most important challenges in genome-wide case-control studies. Besides traditional logistic regression analysis, recently the entropy-based methods attracted a significant attention. Among entropy-based methods, interaction information is one of the most promising measures having many desirable properties. Although both logistic regression and interaction information have been used in several genome-wide association studies, the relationship between them has not been thoroughly investigated theoretically. The present paper attempts to fill this gap. We show that although certain connections between the two methods exist, in general they refer two different concepts of dependence and looking for interactions in those two senses leads to different approaches to interaction detection. We introduce ordering between interaction measures and specify conditions for independent and dependent genes under which interaction information is more discriminative measure than logistic regression. Moreover, we show that for so-called perfect distributions those measures are equivalent. The numerical experiments illustrate the theoretical findings indicating that interaction information and its modified version are more universal tools for detecting various types of interaction than logistic regression and linkage disequilibrium measures. © 2017 WILEY PERIODICALS, INC.
Detecting DIF in Polytomous Items Using MACS, IRT and Ordinal Logistic Regression
ERIC Educational Resources Information Center
Elosua, Paula; Wells, Craig
2013-01-01
The purpose of the present study was to compare the Type I error rate and power of two model-based procedures, the mean and covariance structure model (MACS) and the item response theory (IRT), and an observed-score based procedure, ordinal logistic regression, for detecting differential item functioning (DIF) in polytomous items. A simulation…
Steganalysis using logistic regression
NASA Astrophysics Data System (ADS)
Lubenko, Ivans; Ker, Andrew D.
2011-02-01
We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-art 686-dimensional SPAM feature set, in three image sets.
London Measure of Unplanned Pregnancy: guidance for its use as an outcome measure
Hall, Jennifer A; Barrett, Geraldine; Copas, Andrew; Stephenson, Judith
2017-01-01
Background The London Measure of Unplanned Pregnancy (LMUP) is a psychometrically validated measure of the degree of intention of a current or recent pregnancy. The LMUP is increasingly being used worldwide, and can be used to evaluate family planning or preconception care programs. However, beyond recommending the use of the full LMUP scale, there is no published guidance on how to use the LMUP as an outcome measure. Ordinal logistic regression has been recommended informally, but studies published to date have all used binary logistic regression and dichotomized the scale at different cut points. There is thus a need for evidence-based guidance to provide a standardized methodology for multivariate analysis and to enable comparison of results. This paper makes recommendations for the regression method for analysis of the LMUP as an outcome measure. Materials and methods Data collected from 4,244 pregnant women in Malawi were used to compare five regression methods: linear, logistic with two cut points, and ordinal logistic with either the full or grouped LMUP score. The recommendations were then tested on the original UK LMUP data. Results There were small but no important differences in the findings across the regression models. Logistic regression resulted in the largest loss of information, and assumptions were violated for the linear and ordinal logistic regression. Consequently, robust standard errors were used for linear regression and a partial proportional odds ordinal logistic regression model attempted. The latter could only be fitted for grouped LMUP score. Conclusion We recommend the linear regression model with robust standard errors to make full use of the LMUP score when analyzed as an outcome measure. Ordinal logistic regression could be considered, but a partial proportional odds model with grouped LMUP score may be required. Logistic regression is the least-favored option, due to the loss of information. For logistic regression, the cut point for un/planned pregnancy should be between nine and ten. These recommendations will standardize the analysis of LMUP data and enhance comparability of results across studies. PMID:28435343
Assessing the Effects of Software Platforms on Volumetric Segmentation of Glioblastoma
Dunn, William D.; Aerts, Hugo J.W.L.; Cooper, Lee A.; Holder, Chad A.; Hwang, Scott N.; Jaffe, Carle C.; Brat, Daniel J.; Jain, Rajan; Flanders, Adam E.; Zinn, Pascal O.; Colen, Rivka R.; Gutman, David A.
2017-01-01
Background Radiological assessments of biologically relevant regions in glioblastoma have been associated with genotypic characteristics, implying a potential role in personalized medicine. Here, we assess the reproducibility and association with survival of two volumetric segmentation platforms and explore how methodology could impact subsequent interpretation and analysis. Methods Post-contrast T1- and T2-weighted FLAIR MR images of 67 TCGA patients were segmented into five distinct compartments (necrosis, contrast-enhancement, FLAIR, post contrast abnormal, and total abnormal tumor volumes) by two quantitative image segmentation platforms - 3D Slicer and a method based on Velocity AI and FSL. We investigated the internal consistency of each platform by correlation statistics, association with survival, and concordance with consensus neuroradiologist ratings using ordinal logistic regression. Results We found high correlations between the two platforms for FLAIR, post contrast abnormal, and total abnormal tumor volumes (spearman’s r(67) = 0.952, 0.959, and 0.969 respectively). Only modest agreement was observed for necrosis and contrast-enhancement volumes (r(67) = 0.693 and 0.773 respectively), likely arising from differences in manual and automated segmentation methods of these regions by 3D Slicer and Velocity AI/FSL, respectively. Survival analysis based on AUC revealed significant predictive power of both platforms for the following volumes: contrast-enhancement, post contrast abnormal, and total abnormal tumor volumes. Finally, ordinal logistic regression demonstrated correspondence to manual ratings for several features. Conclusion Tumor volume measurements from both volumetric platforms produced highly concordant and reproducible estimates across platforms for general features. As automated or semi-automated volumetric measurements replace manual linear or area measurements, it will become increasingly important to keep in mind that measurement differences between segmentation platforms for more detailed features could influence downstream survival or radio genomic analyses. PMID:29600296
Rank-Optimized Logistic Matrix Regression toward Improved Matrix Data Classification.
Zhang, Jianguang; Jiang, Jianmin
2018-02-01
While existing logistic regression suffers from overfitting and often fails in considering structural information, we propose a novel matrix-based logistic regression to overcome the weakness. In the proposed method, 2D matrices are directly used to learn two groups of parameter vectors along each dimension without vectorization, which allows the proposed method to fully exploit the underlying structural information embedded inside the 2D matrices. Further, we add a joint [Formula: see text]-norm on two parameter matrices, which are organized by aligning each group of parameter vectors in columns. This added co-regularization term has two roles-enhancing the effect of regularization and optimizing the rank during the learning process. With our proposed fast iterative solution, we carried out extensive experiments. The results show that in comparison to both the traditional tensor-based methods and the vector-based regression methods, our proposed solution achieves better performance for matrix data classifications.
Szekér, Szabolcs; Vathy-Fogarassy, Ágnes
2018-01-01
Logistic regression based propensity score matching is a widely used method in case-control studies to select the individuals of the control group. This method creates a suitable control group if all factors affecting the output variable are known. However, if relevant latent variables exist as well, which are not taken into account during the calculations, the quality of the control group is uncertain. In this paper, we present a statistics-based research in which we try to determine the relationship between the accuracy of the logistic regression model and the uncertainty of the dependent variable of the control group defined by propensity score matching. Our analyses show that there is a linear correlation between the fit of the logistic regression model and the uncertainty of the output variable. In certain cases, a latent binary explanatory variable can result in a relative error of up to 70% in the prediction of the outcome variable. The observed phenomenon calls the attention of analysts to an important point, which must be taken into account when deducting conclusions.
Mixed features in patients with a major depressive episode: the BRIDGE-II-MIX study.
Perugi, Giulio; Angst, Jules; Azorin, Jean-Michel; Bowden, Charles L; Mosolov, Sergey; Reis, Joao; Vieta, Eduard; Young, Allan H
2015-03-01
To estimate the frequency of mixed states in patients diagnosed with major depressive episode (MDE) according to conceptually different definitions and to compare their clinical validity. This multicenter, multinational cross-sectional Bipolar Disorders: Improving Diagnosis, Guidance and Education (BRIDGE)-II-MIX study enrolled 2,811 adult patients experiencing an MDE. Data were collected per protocol on sociodemographic variables, current and past psychiatric symptoms, and clinical variables that are risk factors for bipolar disorder. The frequency of mixed features was determined by applying both DSM-5 criteria and a priori described Research-Based Diagnostic Criteria (RBDC). Clinical variables associated with mixed features were assessed using logistic regression. Overall, 212 patients (7.5%) fulfilled DSM-5 criteria for MDE with mixed features (DSM-5-MXS), and 818 patients (29.1%) fulfilled diagnostic criteria for a predefined RBDC depressive mixed state (RBDC-MXS). The most frequent manic/hypomanic symptoms were irritable mood (32.6%), emotional/mood lability (29.8%), distractibility (24.4%), psychomotor agitation (16.1%), impulsivity (14.5%), aggression (14.2%), racing thoughts (11.8%), and pressure to keep talking (11.4%). Euphoria (4.6%), grandiosity (3.7%), and hypersexuality (2.6%) were less represented. In multivariate logistic regression analysis, RBDC-MXS was associated with the largest number of variables including diagnosis of bipolar disorder, family history of mania, lifetime suicide attempts, duration of the current episode > 1 month, atypical features, early onset, history of antidepressant-induced mania/hypomania, and lifetime comorbidity with anxiety, alcohol and substance use disorders, attention-deficit/hyperactivity disorder, and borderline personality disorder. Depressive mixed state, defined as the presence of 3 or more manic/hypomanic features, was present in around one-third of patients experiencing an MDE. The valid symptom, illness course and family history RBDC criteria we assessed identified 4 times more MDE patients as having mixed features and yielded statistically more robust associations with several illness characteristics of bipolar disorder than did DSM-5 criteria. © Copyright 2015 Physicians Postgraduate Press, Inc.
Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W
2015-08-01
Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.
ERIC Educational Resources Information Center
Gruneir, Andrea; Miller, Susan C.; Intrator, Orna; Mor, Vincent
2007-01-01
Purpose: The purpose of this study was to quantify the effect of specific nursing home features and state Medicaid policies on the risk of hospitalization among cognitively impaired nursing home residents. Design and Methods: We used multilevel logistic regression to estimate the odds of hospitalization among long-stay (greater than 90 days)…
Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi
2017-06-01
Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB ( p < 0.05). Identifying and training mothers at risk as well as improving prenatal care may reduce the PTB rate. We also recommend that statisticians utilize the logistic regression model for the classification of risk groups for PTB.
Nonconvex Sparse Logistic Regression With Weakly Convex Regularization
NASA Astrophysics Data System (ADS)
Shen, Xinyue; Gu, Yuantao
2018-06-01
In this work we propose to fit a sparse logistic regression model by a weakly convex regularized nonconvex optimization problem. The idea is based on the finding that a weakly convex function as an approximation of the $\\ell_0$ pseudo norm is able to better induce sparsity than the commonly used $\\ell_1$ norm. For a class of weakly convex sparsity inducing functions, we prove the nonconvexity of the corresponding sparse logistic regression problem, and study its local optimality conditions and the choice of the regularization parameter to exclude trivial solutions. Despite the nonconvexity, a method based on proximal gradient descent is used to solve the general weakly convex sparse logistic regression, and its convergence behavior is studied theoretically. Then the general framework is applied to a specific weakly convex function, and a necessary and sufficient local optimality condition is provided. The solution method is instantiated in this case as an iterative firm-shrinkage algorithm, and its effectiveness is demonstrated in numerical experiments by both randomly generated and real datasets.
Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V
2012-01-01
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
Deletion Diagnostics for Alternating Logistic Regressions
Preisser, John S.; By, Kunthel; Perin, Jamie; Qaqish, Bahjat F.
2013-01-01
Deletion diagnostics are introduced for the regression analysis of clustered binary outcomes estimated with alternating logistic regressions, an implementation of generalized estimating equations (GEE) that estimates regression coefficients in a marginal mean model and in a model for the intracluster association given by the log odds ratio. The diagnostics are developed within an estimating equations framework that recasts the estimating functions for association parameters based upon conditional residuals into equivalent functions based upon marginal residuals. Extensions of earlier work on GEE diagnostics follow directly, including computational formulae for one-step deletion diagnostics that measure the influence of a cluster of observations on the estimated regression parameters and on the overall marginal mean or association model fit. The diagnostic formulae are evaluated with simulations studies and with an application concerning an assessment of factors associated with health maintenance visits in primary care medical practices. The application and the simulations demonstrate that the proposed cluster-deletion diagnostics for alternating logistic regressions are good approximations of their exact fully iterated counterparts. PMID:22777960
Characterization and machine learning prediction of allele-specific DNA methylation.
He, Jianlin; Sun, Ming-an; Wang, Zhong; Wang, Qianfei; Li, Qing; Xie, Hehuang
2015-12-01
A large collection of Single Nucleotide Polymorphisms (SNPs) has been identified in the human genome. Currently, the epigenetic influences of SNPs on their neighboring CpG sites remain elusive. A growing body of evidence suggests that locus-specific information, including genomic features and local epigenetic state, may play important roles in the epigenetic readout of SNPs. In this study, we made use of mouse methylomes with known SNPs to develop statistical models for the prediction of SNP associated allele-specific DNA methylation (ASM). ASM has been classified into parent-of-origin dependent ASM (P-ASM) and sequence-dependent ASM (S-ASM), which comprises scattered-S-ASM (sS-ASM) and clustered-S-ASM (cS-ASM). We found that P-ASM and cS-ASM CpG sites are both enriched in CpG rich regions, promoters and exons, while sS-ASM CpG sites are enriched in simple repeat and regions with high frequent SNP occurrence. Using Lasso-grouped Logistic Regression (LGLR), we selected 21 out of 282 genomic and methylation related features that are powerful in distinguishing cS-ASM CpG sites and trained the classifiers with machine learning techniques. Based on 5-fold cross-validation, the logistic regression classifier was found to be the best for cS-ASM prediction with an ACC of 0.77, an AUC of 0.84 and an MCC of 0.54. Lastly, we applied the logistic regression classifier on human brain methylome and predicted 608 genes associated with cS-ASM. Gene ontology term enrichment analysis indicated that these cS-ASM associated genes are significantly enriched in the category coding for transcripts with alternative splicing forms. In summary, this study provided an analytical procedure for cS-ASM prediction and shed new light on the understanding of different types of ASM events. Published by Elsevier Inc.
NASA Astrophysics Data System (ADS)
Zhang, Hong; Hou, Rui; Yi, Lei; Meng, Juan; Pan, Zhisong; Zhou, Yuhuan
2016-07-01
The accurate identification of encrypted data stream helps to regulate illegal data, detect network attacks and protect users' information. In this paper, a novel encrypted data stream identification algorithm is introduced. The proposed method is based on randomness characteristics of encrypted data stream. We use a l1-norm regularized logistic regression to improve sparse representation of randomness features and Fuzzy Gaussian Mixture Model (FGMM) to improve identification accuracy. Experimental results demonstrate that the method can be adopted as an effective technique for encrypted data stream identification.
glmnetLRC f/k/a lrc package: Logistic Regression Classification
DOE Office of Scientific and Technical Information (OSTI.GOV)
2016-06-09
Methods for fitting and predicting logistic regression classifiers (LRC) with an arbitrary loss function using elastic net or best subsets. This package adds additional model fitting features to the existing glmnet and bestglm R packages. This package was created to perform the analyses described in Amidan BG, Orton DJ, LaMarche BL, et al. 2014. Signatures for Mass Spectrometry Data Quality. Journal of Proteome Research. 13(4), 2215-2222. It makes the model fitting available in the glmnet and bestglm packages more general by identifying optimal model parameters via cross validation with an customizable loss function. It also identifies the optimal threshold formore » binary classification.« less
Hill, Andrew; Loh, Po-Ru; Bharadwaj, Ragu B.; Pons, Pascal; Shang, Jingbo; Guinan, Eva; Lakhani, Karim; Kilty, Iain
2017-01-01
Abstract Background: The association of differing genotypes with disease-related phenotypic traits offers great potential to both help identify new therapeutic targets and support stratification of patients who would gain the greatest benefit from specific drug classes. Development of low-cost genotyping and sequencing has made collecting large-scale genotyping data routine in population and therapeutic intervention studies. In addition, a range of new technologies is being used to capture numerous new and complex phenotypic descriptors. As a result, genotype and phenotype datasets have grown exponentially. Genome-wide association studies associate genotypes and phenotypes using methods such as logistic regression. As existing tools for association analysis limit the efficiency by which value can be extracted from increasing volumes of data, there is a pressing need for new software tools that can accelerate association analyses on large genotype-phenotype datasets. Results: Using open innovation (OI) and contest-based crowdsourcing, the logistic regression analysis in a leading, community-standard genetics software package (PLINK 1.07) was substantially accelerated. OI allowed us to do this in <6 months by providing rapid access to highly skilled programmers with specialized, difficult-to-find skill sets. Through a crowd-based contest a combination of computational, numeric, and algorithmic approaches was identified that accelerated the logistic regression in PLINK 1.07 by 18- to 45-fold. Combining contest-derived logistic regression code with coarse-grained parallelization, multithreading, and associated changes to data initialization code further developed through distributed innovation, we achieved an end-to-end speedup of 591-fold for a data set size of 6678 subjects by 645 863 variants, compared to PLINK 1.07's logistic regression. This represents a reduction in run time from 4.8 hours to 29 seconds. Accelerated logistic regression code developed in this project has been incorporated into the PLINK2 project. Conclusions: Using iterative competition-based OI, we have developed a new, faster implementation of logistic regression for genome-wide association studies analysis. We present lessons learned and recommendations on running a successful OI process for bioinformatics. PMID:28327993
Hill, Andrew; Loh, Po-Ru; Bharadwaj, Ragu B; Pons, Pascal; Shang, Jingbo; Guinan, Eva; Lakhani, Karim; Kilty, Iain; Jelinsky, Scott A
2017-05-01
The association of differing genotypes with disease-related phenotypic traits offers great potential to both help identify new therapeutic targets and support stratification of patients who would gain the greatest benefit from specific drug classes. Development of low-cost genotyping and sequencing has made collecting large-scale genotyping data routine in population and therapeutic intervention studies. In addition, a range of new technologies is being used to capture numerous new and complex phenotypic descriptors. As a result, genotype and phenotype datasets have grown exponentially. Genome-wide association studies associate genotypes and phenotypes using methods such as logistic regression. As existing tools for association analysis limit the efficiency by which value can be extracted from increasing volumes of data, there is a pressing need for new software tools that can accelerate association analyses on large genotype-phenotype datasets. Using open innovation (OI) and contest-based crowdsourcing, the logistic regression analysis in a leading, community-standard genetics software package (PLINK 1.07) was substantially accelerated. OI allowed us to do this in <6 months by providing rapid access to highly skilled programmers with specialized, difficult-to-find skill sets. Through a crowd-based contest a combination of computational, numeric, and algorithmic approaches was identified that accelerated the logistic regression in PLINK 1.07 by 18- to 45-fold. Combining contest-derived logistic regression code with coarse-grained parallelization, multithreading, and associated changes to data initialization code further developed through distributed innovation, we achieved an end-to-end speedup of 591-fold for a data set size of 6678 subjects by 645 863 variants, compared to PLINK 1.07's logistic regression. This represents a reduction in run time from 4.8 hours to 29 seconds. Accelerated logistic regression code developed in this project has been incorporated into the PLINK2 project. Using iterative competition-based OI, we have developed a new, faster implementation of logistic regression for genome-wide association studies analysis. We present lessons learned and recommendations on running a successful OI process for bioinformatics. © The Author 2017. Published by Oxford University Press.
Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula
2011-01-01
Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
Unconditional or Conditional Logistic Regression Model for Age-Matched Case-Control Data?
Kuo, Chia-Ling; Duan, Yinghui; Grady, James
2018-01-01
Matching on demographic variables is commonly used in case-control studies to adjust for confounding at the design stage. There is a presumption that matched data need to be analyzed by matched methods. Conditional logistic regression has become a standard for matched case-control data to tackle the sparse data problem. The sparse data problem, however, may not be a concern for loose-matching data when the matching between cases and controls is not unique, and one case can be matched to other controls without substantially changing the association. Data matched on a few demographic variables are clearly loose-matching data, and we hypothesize that unconditional logistic regression is a proper method to perform. To address the hypothesis, we compare unconditional and conditional logistic regression models by precision in estimates and hypothesis testing using simulated matched case-control data. Our results support our hypothesis; however, the unconditional model is not as robust as the conditional model to the matching distortion that the matching process not only makes cases and controls similar for matching variables but also for the exposure status. When the study design involves other complex features or the computational burden is high, matching in loose-matching data can be ignored for negligible loss in testing and estimation if the distributions of matching variables are not extremely different between cases and controls.
Unconditional or Conditional Logistic Regression Model for Age-Matched Case–Control Data?
Kuo, Chia-Ling; Duan, Yinghui; Grady, James
2018-01-01
Matching on demographic variables is commonly used in case–control studies to adjust for confounding at the design stage. There is a presumption that matched data need to be analyzed by matched methods. Conditional logistic regression has become a standard for matched case–control data to tackle the sparse data problem. The sparse data problem, however, may not be a concern for loose-matching data when the matching between cases and controls is not unique, and one case can be matched to other controls without substantially changing the association. Data matched on a few demographic variables are clearly loose-matching data, and we hypothesize that unconditional logistic regression is a proper method to perform. To address the hypothesis, we compare unconditional and conditional logistic regression models by precision in estimates and hypothesis testing using simulated matched case–control data. Our results support our hypothesis; however, the unconditional model is not as robust as the conditional model to the matching distortion that the matching process not only makes cases and controls similar for matching variables but also for the exposure status. When the study design involves other complex features or the computational burden is high, matching in loose-matching data can be ignored for negligible loss in testing and estimation if the distributions of matching variables are not extremely different between cases and controls. PMID:29552553
Accurate Diabetes Risk Stratification Using Machine Learning: Role of Missing Value and Outliers.
Maniruzzaman, Md; Rahman, Md Jahanur; Al-MehediHasan, Md; Suri, Harman S; Abedin, Md Menhazul; El-Baz, Ayman; Suri, Jasjit S
2018-04-10
Diabetes mellitus is a group of metabolic diseases in which blood sugar levels are too high. About 8.8% of the world was diabetic in 2017. It is projected that this will reach nearly 10% by 2045. The major challenge is that when machine learning-based classifiers are applied to such data sets for risk stratification, leads to lower performance. Thus, our objective is to develop an optimized and robust machine learning (ML) system under the assumption that missing values or outliers if replaced by a median configuration will yield higher risk stratification accuracy. This ML-based risk stratification is designed, optimized and evaluated, where: (i) the features are extracted and optimized from the six feature selection techniques (random forest, logistic regression, mutual information, principal component analysis, analysis of variance, and Fisher discriminant ratio) and combined with ten different types of classifiers (linear discriminant analysis, quadratic discriminant analysis, naïve Bayes, Gaussian process classification, support vector machine, artificial neural network, Adaboost, logistic regression, decision tree, and random forest) under the hypothesis that both missing values and outliers when replaced by computed medians will improve the risk stratification accuracy. Pima Indian diabetic dataset (768 patients: 268 diabetic and 500 controls) was used. Our results demonstrate that on replacing the missing values and outliers by group median and median values, respectively and further using the combination of random forest feature selection and random forest classification technique yields an accuracy, sensitivity, specificity, positive predictive value, negative predictive value and area under the curve as: 92.26%, 95.96%, 79.72%, 91.14%, 91.20%, and 0.93, respectively. This is an improvement of 10% over previously developed techniques published in literature. The system was validated for its stability and reliability. RF-based model showed the best performance when outliers are replaced by median values.
Determining factors influencing survival of breast cancer by fuzzy logistic regression model.
Nikbakht, Roya; Bahrampour, Abbas
2017-01-01
Fuzzy logistic regression model can be used for determining influential factors of disease. This study explores the important factors of actual predictive survival factors of breast cancer's patients. We used breast cancer data which collected by cancer registry of Kerman University of Medical Sciences during the period of 2000-2007. The variables such as morphology, grade, age, and treatments (surgery, radiotherapy, and chemotherapy) were applied in the fuzzy logistic regression model. Performance of model was determined in terms of mean degree of membership (MDM). The study results showed that almost 41% of patients were in neoplasm and malignant group and more than two-third of them were still alive after 5-year follow-up. Based on the fuzzy logistic model, the most important factors influencing survival were chemotherapy, morphology, and radiotherapy, respectively. Furthermore, the MDM criteria show that the fuzzy logistic regression have a good fit on the data (MDM = 0.86). Fuzzy logistic regression model showed that chemotherapy is more important than radiotherapy in survival of patients with breast cancer. In addition, another ability of this model is calculating possibilistic odds of survival in cancer patients. The results of this study can be applied in clinical research. Furthermore, there are few studies which applied the fuzzy logistic models. Furthermore, we recommend using this model in various research areas.
Practical Session: Logistic Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
An exercise is proposed to illustrate the logistic regression. One investigates the different risk factors in the apparition of coronary heart disease. It has been proposed in Chapter 5 of the book of D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science Business Media, LLC (2010) and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr341.pdf). This example is based on data given in the file evans.txt coming from http://www.sph.emory.edu/dkleinb/logreg3.htm#data.
Automatic annotation of protein motif function with Gene Ontology terms.
Lu, Xinghua; Zhai, Chengxiang; Gopalakrishnan, Vanathi; Buchanan, Bruce G
2004-09-02
Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist foridentifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base. This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for mode training and functional prediction. The mutual information of a motif and aGO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical result demonstrated that the methods have a great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
Kesselmeier, Miriam; Lorenzo Bermejo, Justo
2017-11-01
Logistic regression is the most common technique used for genetic case-control association studies. A disadvantage of standard maximum likelihood estimators of the genotype relative risk (GRR) is their strong dependence on outlier subjects, for example, patients diagnosed at unusually young age. Robust methods are available to constrain outlier influence, but they are scarcely used in genetic studies. This article provides a non-intimidating introduction to robust logistic regression, and investigates its benefits and limitations in genetic association studies. We applied the bounded Huber and extended the R package 'robustbase' with the re-descending Hampel functions to down-weight outlier influence. Computer simulations were carried out to assess the type I error rate, mean squared error (MSE) and statistical power according to major characteristics of the genetic study and investigated markers. Simulations were complemented with the analysis of real data. Both standard and robust estimation controlled type I error rates. Standard logistic regression showed the highest power but standard GRR estimates also showed the largest bias and MSE, in particular for associated rare and recessive variants. For illustration, a recessive variant with a true GRR=6.32 and a minor allele frequency=0.05 investigated in a 1000 case/1000 control study by standard logistic regression resulted in power=0.60 and MSE=16.5. The corresponding figures for Huber-based estimation were power=0.51 and MSE=0.53. Overall, Hampel- and Huber-based GRR estimates did not differ much. Robust logistic regression may represent a valuable alternative to standard maximum likelihood estimation when the focus lies on risk prediction rather than identification of susceptibility variants. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Identifying the optimal segmentors for mass classification in mammograms
NASA Astrophysics Data System (ADS)
Zhang, Yu; Tomuro, Noriko; Furst, Jacob; Raicu, Daniela S.
2015-03-01
In this paper, we present the results of our investigation on identifying the optimal segmentor(s) from an ensemble of weak segmentors, used in a Computer-Aided Diagnosis (CADx) system which classifies suspicious masses in mammograms as benign or malignant. This is an extension of our previous work, where we used various parameter settings of image enhancement techniques to each suspicious mass (region of interest (ROI)) to obtain several enhanced images, then applied segmentation to each image to obtain several contours of a given mass. Each segmentation in this ensemble is essentially a "weak segmentor" because no single segmentation can produce the optimal result for all images. Then after shape features are computed from the segmented contours, the final classification model was built using logistic regression. The work in this paper focuses on identifying the optimal segmentor(s) from an ensemble mix of weak segmentors. For our purpose, optimal segmentors are those in the ensemble mix which contribute the most to the overall classification rather than the ones that produced high precision segmentation. To measure the segmentors' contribution, we examined weights on the features in the derived logistic regression model and computed the average feature weight for each segmentor. The result showed that, while in general the segmentors with higher segmentation success rates had higher feature weights, some segmentors with lower segmentation rates had high classification feature weights as well.
A Solution to Separation and Multicollinearity in Multiple Logistic Regression
Shen, Jianzhao; Gao, Sujuan
2010-01-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27–38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. The ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither solves the problems for each other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth’s penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using a current screening data from a community-based dementia study. PMID:20376286
A Solution to Separation and Multicollinearity in Multiple Logistic Regression.
Shen, Jianzhao; Gao, Sujuan
2008-10-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. The ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither solves the problems for each other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using a current screening data from a community-based dementia study.
Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon
2015-01-01
Background: Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. Methods: In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. Results: The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Conclusion: Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended. PMID:26793655
Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon
2015-01-01
Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended.
Estimating the Probability of Rare Events Occurring Using a Local Model Averaging.
Chen, Jin-Hua; Chen, Chun-Shu; Huang, Meng-Fan; Lin, Hung-Chih
2016-10-01
In statistical applications, logistic regression is a popular method for analyzing binary data accompanied by explanatory variables. But when one of the two outcomes is rare, the estimation of model parameters has been shown to be severely biased and hence estimating the probability of rare events occurring based on a logistic regression model would be inaccurate. In this article, we focus on estimating the probability of rare events occurring based on logistic regression models. Instead of selecting a best model, we propose a local model averaging procedure based on a data perturbation technique applied to different information criteria to obtain different probability estimates of rare events occurring. Then an approximately unbiased estimator of Kullback-Leibler loss is used to choose the best one among them. We design complete simulations to show the effectiveness of our approach. For illustration, a necrotizing enterocolitis (NEC) data set is analyzed. © 2016 Society for Risk Analysis.
Habitat patch size and nesting success of yellow-breasted chats
Dick E. Burhans; Frank R. Thompson III
1999-01-01
We measured vegetation at shrub patches used for nesting by Yellow-breasted Chats (Icteria virens) to evaluate the importance of nesting habitat patch features on nest predation, cowbird parasitism, and nest site selection. Logistic regression models indicated that nests in small patches (average diameter
NASA Astrophysics Data System (ADS)
Schaeben, Helmut; Semmler, Georg
2016-09-01
The objective of prospectivity modeling is prediction of the conditional probability of the presence T = 1 or absence T = 0 of a target T given favorable or prohibitive predictors B, or construction of a two classes 0,1 classification of T. A special case of logistic regression called weights-of-evidence (WofE) is geologists' favorite method of prospectivity modeling due to its apparent simplicity. However, the numerical simplicity is deceiving as it is implied by the severe mathematical modeling assumption of joint conditional independence of all predictors given the target. General weights of evidence are explicitly introduced which are as simple to estimate as conventional weights, i.e., by counting, but do not require conditional independence. Complementary to the regression view is the classification view on prospectivity modeling. Boosting is the construction of a strong classifier from a set of weak classifiers. From the regression point of view it is closely related to logistic regression. Boost weights-of-evidence (BoostWofE) was introduced into prospectivity modeling to counterbalance violations of the assumption of conditional independence even though relaxation of modeling assumptions with respect to weak classifiers was not the (initial) purpose of boosting. In the original publication of BoostWofE a fabricated dataset was used to "validate" this approach. Using the same fabricated dataset it is shown that BoostWofE cannot generally compensate lacking conditional independence whatever the consecutively processing order of predictors. Thus the alleged features of BoostWofE are disproved by way of counterexamples, while theoretical findings are confirmed that logistic regression including interaction terms can exactly compensate violations of joint conditional independence if the predictors are indicators.
Computational approaches for predicting biomedical research collaborations.
Zhang, Qing; Yu, Hong
2014-01-01
Biomedical research is increasingly collaborative, and successful collaborations often produce high impact work. Computational approaches can be developed for automatically predicting biomedical research collaborations. Previous works of collaboration prediction mainly explored the topological structures of research collaboration networks, leaving out rich semantic information from the publications themselves. In this paper, we propose supervised machine learning approaches to predict research collaborations in the biomedical field. We explored both the semantic features extracted from author research interest profile and the author network topological features. We found that the most informative semantic features for author collaborations are related to research interest, including similarity of out-citing citations, similarity of abstracts. Of the four supervised machine learning models (naïve Bayes, naïve Bayes multinomial, SVMs, and logistic regression), the best performing model is logistic regression with an ROC ranging from 0.766 to 0.980 on different datasets. To our knowledge we are the first to study in depth how research interest and productivities can be used for collaboration prediction. Our approach is computationally efficient, scalable and yet simple to implement. The datasets of this study are available at https://github.com/qingzhanggithub/medline-collaboration-datasets.
Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H
2012-01-01
Background: The purpose of this investigation was to compare empirically predictive ability of an artificial neural network with a logistic regression in prediction of low back pain. Methods: Data from the second national health survey were considered in this investigation. This data includes the information of low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a set of 17294 data and they were validated in a test set of 17295 data. Hosmer and Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 inputs, 3 hidden and 1 output neurons was employed. The efficiency of two models was compared by receiver operating characteristic analysis, root mean square and -2 Loglikelihood criteria. Results: The area under the ROC curve (SE), root mean square and -2Loglikelihood of the logistic regression was 0.752 (0.004), 0.3832 and 14769.2, respectively. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the artificial neural network was 0.754 (0.004), 0.3770 and 14757.6, respectively. Conclusions: Based on these three criteria, artificial neural network would give better performance than logistic regression. Although, the difference is statistically significant, it does not seem to be clinically significant. PMID:23113198
Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H
2012-01-01
The purpose of this investigation was to compare empirically predictive ability of an artificial neural network with a logistic regression in prediction of low back pain. Data from the second national health survey were considered in this investigation. This data includes the information of low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a set of 17294 data and they were validated in a test set of 17295 data. Hosmer and Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 inputs, 3 hidden and 1 output neurons was employed. The efficiency of two models was compared by receiver operating characteristic analysis, root mean square and -2 Loglikelihood criteria. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the logistic regression was 0.752 (0.004), 0.3832 and 14769.2, respectively. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the artificial neural network was 0.754 (0.004), 0.3770 and 14757.6, respectively. Based on these three criteria, artificial neural network would give better performance than logistic regression. Although, the difference is statistically significant, it does not seem to be clinically significant.
Advanced colorectal neoplasia risk stratification by penalized logistic regression.
Lin, Yunzhi; Yu, Menggang; Wang, Sijian; Chappell, Richard; Imperiale, Thomas F
2016-08-01
Colorectal cancer is the second leading cause of death from cancer in the United States. To facilitate the efficiency of colorectal cancer screening, there is a need to stratify risk for colorectal cancer among the 90% of US residents who are considered "average risk." In this article, we investigate such risk stratification rules for advanced colorectal neoplasia (colorectal cancer and advanced, precancerous polyps). We use a recently completed large cohort study of subjects who underwent a first screening colonoscopy. Logistic regression models have been used in the literature to estimate the risk of advanced colorectal neoplasia based on quantifiable risk factors. However, logistic regression may be prone to overfitting and instability in variable selection. Since most of the risk factors in our study have several categories, it was tempting to collapse these categories into fewer risk groups. We propose a penalized logistic regression method that automatically and simultaneously selects variables, groups categories, and estimates their coefficients by penalizing the [Formula: see text]-norm of both the coefficients and their differences. Hence, it encourages sparsity in the categories, i.e. grouping of the categories, and sparsity in the variables, i.e. variable selection. We apply the penalized logistic regression method to our data. The important variables are selected, with close categories simultaneously grouped, by penalized regression models with and without the interactions terms. The models are validated with 10-fold cross-validation. The receiver operating characteristic curves of the penalized regression models dominate the receiver operating characteristic curve of naive logistic regressions, indicating a superior discriminative performance. © The Author(s) 2013.
Binary logistic regression-Instrument for assessing museum indoor air impact on exhibits.
Bucur, Elena; Danet, Andrei Florin; Lehr, Carol Blaziu; Lehr, Elena; Nita-Lazar, Mihai
2017-04-01
This paper presents a new way to assess the environmental impact on historical artifacts using binary logistic regression. The prediction of the impact on the exhibits during certain pollution scenarios (environmental impact) was calculated by a mathematical model based on the binary logistic regression; it allows the identification of those environmental parameters from a multitude of possible parameters with a significant impact on exhibitions and ranks them according to their severity effect. Air quality (NO 2 , SO 2 , O 3 and PM 2.5 ) and microclimate parameters (temperature, humidity) monitoring data from a case study conducted within exhibition and storage spaces of the Romanian National Aviation Museum Bucharest have been used for developing and validating the binary logistic regression method and the mathematical model. The logistic regression analysis was used on 794 data combinations (715 to develop of the model and 79 to validate it) by a Statistical Package for Social Sciences (SPSS 20.0). The results from the binary logistic regression analysis demonstrated that from six parameters taken into consideration, four of them present a significant effect upon exhibits in the following order: O 3 >PM 2.5 >NO 2 >humidity followed at a significant distance by the effects of SO 2 and temperature. The mathematical model, developed in this study, correctly predicted 95.1 % of the cumulated effect of the environmental parameters upon the exhibits. Moreover, this model could also be used in the decisional process regarding the preventive preservation measures that should be implemented within the exhibition space. The paper presents a new way to assess the environmental impact on historical artifacts using binary logistic regression. The mathematical model developed on the environmental parameters analyzed by the binary logistic regression method could be useful in a decision-making process establishing the best measures for pollution reduction and preventive preservation of exhibits.
Radiomorphometric analysis of frontal sinus for sex determination.
Verma, Saumya; Mahima, V G; Patil, Karthikeya
2014-09-01
Sex determination of unknown individuals carries crucial significance in forensic research, in cases where fragments of skull persist with no likelihood of identification based on dental arch. In these instances sex determination becomes important to rule out certain number of possibilities instantly and helps in establishing a biological profile of human remains. The aim of the study is to evaluate a mathematical method based on logistic regression analysis capable of ascertaining the sex of individuals in the South Indian population. The study was conducted in the department of Oral Medicine and Radiology. The right and left areas, maximum height, width of frontal sinus were determined in 100 Caldwell views of 50 women and 50 men aged 20 years and above, with the help of Vernier callipers and a square grid with 1 square measuring 1mm(2) in area. Student's t-test, logistic regression analysis. The mean values of variables were greater in men, based on Student's t-test at 5% level of significance. The mathematical model based on logistic regression analysis gave percentage agreement of total area to correctly predict the female gender as 55.2%, of right area as 60.9% and of left area as 55.2%. The areas of the frontal sinus and the logistic regression proved to be unreliable in sex determination. (Logit = 0.924 - 0.00217 × right area).
Risk factors for autistic regression: results of an ambispective cohort study.
Zhang, Ying; Xu, Qiong; Liu, Jing; Li, She-chang; Xu, Xiu
2012-08-01
A subgroup of children diagnosed with autism experience developmental regression featured by a loss of previously acquired abilities. The pathogeny of autistic regression is unknown, although many risk factors likely exist. To better characterize autistic regression and investigate the association between autistic regression and potential influencing factors in Chinese autistic children, we conducted an ambispective study with a cohort of 170 autistic subjects. Analyses by multiple logistic regression showed significant correlations between autistic regression and febrile seizures (OR = 3.53, 95% CI = 1.17-10.65, P = .025), as well as with a family history of neuropsychiatric disorders (OR = 3.62, 95% CI = 1.35-9.71, P = .011). This study suggests that febrile seizures and family history of neuropsychiatric disorders are correlated with autistic regression.
Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression
NASA Astrophysics Data System (ADS)
Khikmah, L.; Wijayanto, H.; Syafitri, U. D.
2017-04-01
The problem often encounters in logistic regression modeling are multicollinearity problems. Data that have multicollinearity between explanatory variables with the result in the estimation of parameters to be bias. Besides, the multicollinearity will result in error in the classification. In general, to overcome multicollinearity in regression used stepwise regression. They are also another method to overcome multicollinearity which involves all variable for prediction. That is Principal Component Analysis (PCA). However, classical PCA in only for numeric data. Its data are categorical, one method to solve the problems is Categorical Principal Component Analysis (CATPCA). Data were used in this research were a part of data Demographic and Population Survey Indonesia (IDHS) 2012. This research focuses on the characteristic of women of using the contraceptive methods. Classification results evaluated using Area Under Curve (AUC) values. The higher the AUC value, the better. Based on AUC values, the classification of the contraceptive method using stepwise method (58.66%) is better than the logistic regression model (57.39%) and CATPCA (57.39%). Evaluation of the results of logistic regression using sensitivity, shows the opposite where CATPCA method (99.79%) is better than logistic regression method (92.43%) and stepwise (92.05%). Therefore in this study focuses on major class classification (using a contraceptive method), then the selected model is CATPCA because it can raise the level of the major class model accuracy.
New robust statistical procedures for the polytomous logistic regression models.
Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro
2018-05-17
This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article are further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.
Spashett, Renee; Fernie, Gordon; Reid, Ian C; Cameron, Isobel M
2014-09-01
This study aimed to explore the relationship of Montgomery-Åsberg Depression Rating Scale (MADRS) symptom subtypes with response to electroconvulsive therapy (ECT) and subsequent ECT treatment within 12 months. A consecutive sample of 414 patients with depression receiving ECT in the North East of Scotland was assessed by retrospective chart review. Response rate was defined as greater than or equal to 50% decrease in pretreatment total MADRS score or a posttreatment total MADRS less than or equal to 10. Principal component analyses were conducted on a sample with psychotic features (n = 124) and a sample without psychotic features (n = 290). Scores on extracted factor subscales, clinical and demographic characteristics were assessed for association with response and subsequent ECT treatment within 12 months. Where more than 1 variable was associated with response or subsequent ECT, logistic regression analysis was applied. MADRS symptom subtypes formed 3 separate factors in both samples. Logistic regression revealed older age and high "Despondency" subscale score predicted response in the nonpsychotic group. Older age alone predicted response in the group with psychotic features. Nonpsychotic patients subsequently re-treated with ECT were older than those not prescribed subsequent ECT. No association of variables emerged with subsequent ECT treatment in the group with psychotic features. Being of older age and the presence of psychotic features predicted response. Presence of psychotic features alone predicted subsequent retreatment. Subscale scores of the MADRS are of limited use in predicting which patients with depression will respond to ECT, with the exception of "Despondency" subscale scores in patients without psychotic features.
2011-01-01
Background The relationship between asthma and traffic-related pollutants has received considerable attention. The use of individual-level exposure measures, such as residence location or proximity to emission sources, may avoid ecological biases. Method This study focused on the pediatric Medicaid population in Detroit, MI, a high-risk population for asthma-related events. A population-based matched case-control analysis was used to investigate associations between acute asthma outcomes and proximity of residence to major roads, including freeways. Asthma cases were identified as all children who made at least one asthma claim, including inpatient and emergency department visits, during the three-year study period, 2004-06. Individually matched controls were randomly selected from the rest of the Medicaid population on the basis of non-respiratory related illness. We used conditional logistic regression with distance as both categorical and continuous variables, and examined non-linear relationships with distance using polynomial splines. The conditional logistic regression models were then extended by considering multiple asthma states (based on the frequency of acute asthma outcomes) using polychotomous conditional logistic regression. Results Asthma events were associated with proximity to primary roads with an odds ratio of 0.97 (95% CI: 0.94, 0.99) for a 1 km increase in distance using conditional logistic regression, implying that asthma events are less likely as the distance between the residence and a primary road increases. Similar relationships and effect sizes were found using polychotomous conditional logistic regression. Another plausible exposure metric, a reduced form response surface model that represents atmospheric dispersion of pollutants from roads, was not associated under that exposure model. Conclusions There is moderately strong evidence of elevated risk of asthma close to major roads based on the results obtained in this population-based matched case-control study. PMID:21513554
Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis
ERIC Educational Resources Information Center
Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John
2012-01-01
Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rios Velazquez, E; Parmar, C; Narayan, V
Purpose: To compare the complementary value of quantitative radiomic features to that of radiologist-annotated semantic features in predicting EGFR mutations in lung adenocarcinomas. Methods: Pre-operative CT images of 258 lung adenocarcinoma patients were available. Tumors were segmented using the sing-click ensemble segmentation algorithm. A set of radiomic features was extracted using 3D-Slicer. Test-retest reproducibility and unsupervised dimensionality reduction were applied to select a subset of reproducible and independent radiomic features. Twenty semantic annotations were scored by an expert radiologist, describing the tumor, surrounding tissue and associated findings. Minimum-redundancy-maximum-relevance (MRMR) was used to identify the most informative radiomic and semantic featuresmore » in 172 patients (training-set, temporal split). Radiomic, semantic and combined radiomic-semantic logistic regression models to predict EGFR mutations were evaluated in and independent validation dataset of 86 patients using the area under the receiver operating curve (AUC). Results: EGFR mutations were found in 77/172 (45%) and 39/86 (45%) of the training and validation sets, respectively. Univariate AUCs showed a similar range for both feature types: radiomics median AUC = 0.57 (range: 0.50 – 0.62); semantic median AUC = 0.53 (range: 0.50 – 0.64, Wilcoxon p = 0.55). After MRMR feature selection, the best-performing radiomic, semantic, and radiomic-semantic logistic regression models, for EGFR mutations, showed a validation AUC of 0.56 (p = 0.29), 0.63 (p = 0.063) and 0.67 (p = 0.004), respectively. Conclusion: Quantitative volumetric and textural Radiomic features complement the qualitative and semi-quantitative radiologist annotations. The prognostic value of informative qualitative semantic features such as cavitation and lobulation is increased with the addition of quantitative textural features from the tumor region.« less
Tsunoda, Naoko; Hashimoto, Mamoru; Ishikawa, Tomohisa; Fukuhara, Ryuji; Yuki, Seiji; Tanaka, Hibiki; Hatada, Yutaka; Miyagawa, Yusuke; Ikeda, Manabu
2018-05-08
Auditory hallucinations are an important symptom for diagnosing dementia with Lewy bodies (DLB), yet they have received less attention than visual hallucinations. We investigated the clinical features of auditory hallucinations and the possible mechanisms by which they arise in patients with DLB. We recruited 124 consecutive patients with probable DLB (diagnosis based on the DLB International Workshop 2005 criteria; study period: June 2007-January 2015) from the dementia referral center of Kumamoto University Hospital. We used the Neuropsychiatric Inventory to assess the presence of auditory hallucinations, visual hallucinations, and other neuropsychiatric symptoms. We reviewed all available clinical records of patients with auditory hallucinations to assess their clinical features. We performed multiple logistic regression analysis to identify significant independent predictors of auditory hallucinations. Of the 124 patients, 44 (35.5%) had auditory hallucinations and 75 (60.5%) had visual hallucinations. The majority of patients (90.9%) with auditory hallucinations also had visual hallucinations. Auditory hallucinations consisted mostly of human voices, and 90% of patients described them as like hearing a soundtrack of the scene. Multiple logistic regression showed that the presence of auditory hallucinations was significantly associated with female sex (P = .04) and hearing impairment (P = .004). The analysis also revealed independent correlations between the presence of auditory hallucinations and visual hallucinations (P < .001), phantom boarder delusions (P = .001), and depression (P = .038). Auditory hallucinations are common neuropsychiatric symptoms in DLB and usually appear as a background soundtrack accompanying visual hallucinations. Auditory hallucinations in patients with DLB are more likely to occur in women and those with impaired hearing, depression, delusions, or visual hallucinations. © Copyright 2018 Physicians Postgraduate Press, Inc.
Schell, Greggory J; Lavieri, Mariel S; Stein, Joshua D; Musch, David C
2013-12-21
Open-angle glaucoma (OAG) is a prevalent, degenerate ocular disease which can lead to blindness without proper clinical management. The tests used to assess disease progression are susceptible to process and measurement noise. The aim of this study was to develop a methodology which accounts for the inherent noise in the data and improve significant disease progression identification. Longitudinal observations from the Collaborative Initial Glaucoma Treatment Study (CIGTS) were used to parameterize and validate a Kalman filter model and logistic regression function. The Kalman filter estimates the true value of biomarkers associated with OAG and forecasts future values of these variables. We develop two logistic regression models via generalized estimating equations (GEE) for calculating the probability of experiencing significant OAG progression: one model based on the raw measurements from CIGTS and another model based on the Kalman filter estimates of the CIGTS data. Receiver operating characteristic (ROC) curves and associated area under the ROC curve (AUC) estimates are calculated using cross-fold validation. The logistic regression model developed using Kalman filter estimates as data input achieves higher sensitivity and specificity than the model developed using raw measurements. The mean AUC for the Kalman filter-based model is 0.961 while the mean AUC for the raw measurements model is 0.889. Hence, using the probability function generated via Kalman filter estimates and GEE for logistic regression, we are able to more accurately classify patients and instances as experiencing significant OAG progression. A Kalman filter approach for estimating the true value of OAG biomarkers resulted in data input which improved the accuracy of a logistic regression classification model compared to a model using raw measurements as input. This methodology accounts for process and measurement noise to enable improved discrimination between progression and nonprogression in chronic diseases.
Automatic seed selection for segmentation of liver cirrhosis in laparoscopic sequences
NASA Astrophysics Data System (ADS)
Sinha, Rahul; Marcinczak, Jan Marek; Grigat, Rolf-Rainer
2014-03-01
For computer aided diagnosis based on laparoscopic sequences, image segmentation is one of the basic steps which define the success of all further processing. However, many image segmentation algorithms require prior knowledge which is given by interaction with the clinician. We propose an automatic seed selection algorithm for segmentation of liver cirrhosis in laparoscopic sequences which assigns each pixel a probability of being cirrhotic liver tissue or background tissue. Our approach is based on a trained classifier using SIFT and RGB features with PCA. Due to the unique illumination conditions in laparoscopic sequences of the liver, a very low dimensional feature space can be used for classification via logistic regression. The methodology is evaluated on 718 cirrhotic liver and background patches that are taken from laparoscopic sequences of 7 patients. Using a linear classifier we achieve a precision of 91% in a leave-one-patient-out cross-validation. Furthermore, we demonstrate that with logistic probability estimates, seeds with high certainty of being cirrhotic liver tissue can be obtained. For example, our precision of liver seeds increases to 98.5% if only seeds with more than 95% probability of being liver are used. Finally, these automatically selected seeds can be used as priors in Graph Cuts which is demonstrated in this paper.
Automatic prediction of solar flares and super geomagnetic storms
NASA Astrophysics Data System (ADS)
Song, Hui
Space weather is the response of our space environment to the constantly changing Sun. As the new technology advances, mankind has become more and more dependent on space system, satellite-based services. A geomagnetic storm, a disturbance in Earth's magnetosphere, may produce many harmful effects on Earth. Solar flares and Coronal Mass Ejections (CMEs) are believed to be the major causes of geomagnetic storms. Thus, establishing a real time forecasting method for them is very important in space weather study. The topics covered in this dissertation are: the relationship between magnetic gradient and magnetic shear of solar active regions; the relationship between solar flare index and magnetic features of solar active regions; based on these relationships a statistical ordinal logistic regression model is developed to predict the probability of solar flare occurrences in the next 24 hours; and finally the relationship between magnetic structures of CME source regions and geomagnetic storms, in particular, the super storms when the D st index decreases below -200 nT is studied and proved to be able to predict those super storms. The results are briefly summarized as follows: (1) There is a significant correlation between magnetic gradient and magnetic shear of active region. Furthermore, compared with magnetic shear, magnetic gradient might be a better proxy to locate where a large flare occurs. It appears to be more accurate in identification of sources of X-class flares than M-class flares; (2) Flare index, defined by weighting the SXR flares, is proved to have positive correlation with three magnetic features of active region; (3) A statistical ordinal logistic regression model is proposed for solar flare prediction. The results are much better than those data published in the NASA/SDAC service, and comparable to the data provided by the NOAA/SEC complicated expert system. To our knowledge, this is the first time that logistic regression model has been applied in solar physics to predict flare occurrences; (4) The magnetic orientation angle [straight theta], determined from a potential field model, is proved to be able to predict the probability of super geomagnetic storms (D= st <=-200nT). The results show that those active regions associated with | [straight theta]| < 90° are more likely to cause a super geomagnetic storm.
[Optimization of diagnosis indicator selection and inspection plan by 3.0T MRI in breast cancer].
Jiang, Zhongbiao; Wang, Yunhua; He, Zhong; Zhang, Lejun; Zheng, Kai
2013-08-01
To optimize 3.0T MRI diagnosis indicator in breast cancer and to select the best MRI scan program. Totally 45 patients with breast cancers were collected, and another 35 patients with benign breast tumor served as the control group. All patients underwent 3.0T MRI, including T1- weighted imaging (T1WI), fat suppression of the T2-weighted imaging (T2WI), diffusion weighted imaging (DWI), 1H magnetic resonance spectroscopy (1H-MRS) and dynamic contrast enhanced (DCE) sequence. With operation pathology results as the gold standard in the diagnosis of breast diseases, the pathological results of benign and malignant served as dependent variables, and the diagnostic indicators of MRI were taken as independent variables. We put all the indicators of MRI examination under Logistic regression analysis, established the Logistic model, and optimized the diagnosis indicators of MRI examination to further improve MRI scan of breast cancer. By Logistic regression analysis, some indicators were selected in the equation, including the edge feature of the tumor, the time-signal intensity curve (TIC) type and the apparent diffusion coefficient (ADC) value when b=500 s/mm2. The regression equation was Logit (P)=-21.936+20.478X6+3.267X7+ 21.488X3. Valuable indicators in the diagnosis of breast cancer are the edge feature of the tumor, the TIC type and the ADC value when b=500 s/mm2. Combining conventional MRI scan, DWI and dynamic enhanced MRI is a better examination program, while MRS is the complementary program when diagnosis is difficult.
NASA Astrophysics Data System (ADS)
Ariffin, Syaiba Balqish; Midi, Habshah
2014-06-01
This article is concerned with the performance of logistic ridge regression estimation technique in the presence of multicollinearity and high leverage points. In logistic regression, multicollinearity exists among predictors and in the information matrix. The maximum likelihood estimator suffers a huge setback in the presence of multicollinearity which cause regression estimates to have unduly large standard errors. To remedy this problem, a logistic ridge regression estimator is put forward. It is evident that the logistic ridge regression estimator outperforms the maximum likelihood approach for handling multicollinearity. The effect of high leverage points are then investigated on the performance of the logistic ridge regression estimator through real data set and simulation study. The findings signify that logistic ridge regression estimator fails to provide better parameter estimates in the presence of both high leverage points and multicollinearity.
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict what patients are at risk to be readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex to understand among the hospital practitioners. Explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaning decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of more than 10% over the standard classification models, which can be translated to correct labeling of additional 400 - 500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictor identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise the awareness of collecting data on additional markers and developing necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
Beukinga, Roelof J; Hulshoff, Jan B; van Dijk, Lisanne V; Muijs, Christina T; Burgerhof, Johannes G M; Kats-Ugurlu, Gursah; Slart, Riemer H J A; Slump, Cornelis H; Mul, Véronique E M; Plukker, John Th M
2017-05-01
Adequate prediction of tumor response to neoadjuvant chemoradiotherapy (nCRT) in esophageal cancer (EC) patients is important in a more personalized treatment. The current best clinical method to predict pathologic complete response is SUV max in 18 F-FDG PET/CT imaging. To improve the prediction of response, we constructed a model to predict complete response to nCRT in EC based on pretreatment clinical parameters and 18 F-FDG PET/CT-derived textural features. Methods: From a prospectively maintained single-institution database, we reviewed 97 consecutive patients with locally advanced EC and a pretreatment 18 F-FDG PET/CT scan between 2009 and 2015. All patients were treated with nCRT (carboplatin/paclitaxel/41.4 Gy) followed by esophagectomy. We analyzed clinical, geometric, and pretreatment textural features extracted from both 18 F-FDG PET and CT. The current most accurate prediction model with SUV max as a predictor variable was compared with 6 different response prediction models constructed using least absolute shrinkage and selection operator regularized logistic regression. Internal validation was performed to estimate the model's performances. Pathologic response was defined as complete versus incomplete response (Mandard tumor regression grade system 1 vs. 2-5). Results: Pathologic examination revealed 19 (19.6%) complete and 78 (80.4%) incomplete responders. Least absolute shrinkage and selection operator regularization selected the clinical parameters: histologic type and clinical T stage, the 18 F-FDG PET-derived textural feature long run low gray level emphasis, and the CT-derived textural feature run percentage. Introducing these variables to a logistic regression analysis showed areas under the receiver-operating-characteristic curve (AUCs) of 0.78 compared with 0.58 in the SUV max model. The discrimination slopes were 0.17 compared with 0.01, respectively. After internal validation, the AUCs decreased to 0.74 and 0.54, respectively. Conclusion: The predictive values of the constructed models were superior to the standard method (SUV max ). These results can be considered as an initial step in predicting tumor response to nCRT in locally advanced EC. Further research in refining the predictive value of these models is needed to justify omission of surgery. © 2017 by the Society of Nuclear Medicine and Molecular Imaging.
Research and design of logistical information system based on SOA
NASA Astrophysics Data System (ADS)
Zhang, Bo
2013-03-01
Through the study on the existing logistics information systems and SOA technology, based on the current situation of enterprise logistics management and business features, this paper puts forward a SOA-based logistics system design program. This program is made in the WCF framework, with the combination of SOA and the actual characteristics of logistics enterprises, is simple to realize, easy to operate, and has strong expansion characteristic, therefore has high practical value.
Regression analysis for solving diagnosis problem of children's health
NASA Astrophysics Data System (ADS)
Cherkashina, Yu A.; Gerget, O. M.
2016-04-01
The paper includes results of scientific researches. These researches are devoted to the application of statistical techniques, namely, regression analysis, to assess the health status of children in the neonatal period based on medical data (hemostatic parameters, parameters of blood tests, the gestational age, vascular-endothelial growth factor) measured at 3-5 days of children's life. In this paper a detailed description of the studied medical data is given. A binary logistic regression procedure is discussed in the paper. Basic results of the research are presented. A classification table of predicted values and factual observed values is shown, the overall percentage of correct recognition is determined. Regression equation coefficients are calculated, the general regression equation is written based on them. Based on the results of logistic regression, ROC analysis was performed, sensitivity and specificity of the model are calculated and ROC curves are constructed. These mathematical techniques allow carrying out diagnostics of health of children providing a high quality of recognition. The results make a significant contribution to the development of evidence-based medicine and have a high practical importance in the professional activity of the author.
ERIC Educational Resources Information Center
Le, Huy; Marcus, Justin
2012-01-01
This study used Monte Carlo simulation to examine the properties of the overall odds ratio (OOR), which was recently introduced as an index for overall effect size in multiple logistic regression. It was found that the OOR was relatively independent of study base rate and performed better than most commonly used R-square analogs in indexing model…
Sample size determination for logistic regression on a logit-normal distribution.
Kim, Seongho; Heath, Elisabeth; Heilbrun, Lance
2017-06-01
Although the sample size for simple logistic regression can be readily determined using currently available methods, the sample size calculation for multiple logistic regression requires some additional information, such as the coefficient of determination ([Formula: see text]) of a covariate of interest with other covariates, which is often unavailable in practice. The response variable of logistic regression follows a logit-normal distribution which can be generated from a logistic transformation of a normal distribution. Using this property of logistic regression, we propose new methods of determining the sample size for simple and multiple logistic regressions using a normal transformation of outcome measures. Simulation studies and a motivating example show several advantages of the proposed methods over the existing methods: (i) no need for [Formula: see text] for multiple logistic regression, (ii) available interim or group-sequential designs, and (iii) much smaller required sample size.
Kim, Sun Mi; Kim, Yongdai; Jeong, Kuhwan; Jeong, Heeyeong; Kim, Jiyoung
2018-01-01
The aim of this study was to compare the performance of image analysis for predicting breast cancer using two distinct regression models and to evaluate the usefulness of incorporating clinical and demographic data (CDD) into the image analysis in order to improve the diagnosis of breast cancer. This study included 139 solid masses from 139 patients who underwent a ultrasonography-guided core biopsy and had available CDD between June 2009 and April 2010. Three breast radiologists retrospectively reviewed 139 breast masses and described each lesion using the Breast Imaging Reporting and Data System (BI-RADS) lexicon. We applied and compared two regression methods-stepwise logistic (SL) regression and logistic least absolute shrinkage and selection operator (LASSO) regression-in which the BI-RADS descriptors and CDD were used as covariates. We investigated the performances of these regression methods and the agreement of radiologists in terms of test misclassification error and the area under the curve (AUC) of the tests. Logistic LASSO regression was superior (P<0.05) to SL regression, regardless of whether CDD was included in the covariates, in terms of test misclassification errors (0.234 vs. 0.253, without CDD; 0.196 vs. 0.258, with CDD) and AUC (0.785 vs. 0.759, without CDD; 0.873 vs. 0.735, with CDD). However, it was inferior (P<0.05) to the agreement of three radiologists in terms of test misclassification errors (0.234 vs. 0.168, without CDD; 0.196 vs. 0.088, with CDD) and the AUC without CDD (0.785 vs. 0.844, P<0.001), but was comparable to the AUC with CDD (0.873 vs. 0.880, P=0.141). Logistic LASSO regression based on BI-RADS descriptors and CDD showed better performance than SL in predicting the presence of breast cancer. The use of CDD as a supplement to the BI-RADS descriptors significantly improved the prediction of breast cancer using logistic LASSO regression.
Staley, James R; Jones, Edmund; Kaptoge, Stephen; Butterworth, Adam S; Sweeting, Michael J; Wood, Angela M; Howson, Joanna M M
2017-06-01
Logistic regression is often used instead of Cox regression to analyse genome-wide association studies (GWAS) of single-nucleotide polymorphisms (SNPs) and disease outcomes with cohort and case-cohort designs, as it is less computationally expensive. Although Cox and logistic regression models have been compared previously in cohort studies, this work does not completely cover the GWAS setting nor extend to the case-cohort study design. Here, we evaluated Cox and logistic regression applied to cohort and case-cohort genetic association studies using simulated data and genetic data from the EPIC-CVD study. In the cohort setting, there was a modest improvement in power to detect SNP-disease associations using Cox regression compared with logistic regression, which increased as the disease incidence increased. In contrast, logistic regression had more power than (Prentice weighted) Cox regression in the case-cohort setting. Logistic regression yielded inflated effect estimates (assuming the hazard ratio is the underlying measure of association) for both study designs, especially for SNPs with greater effect on disease. Given logistic regression is substantially more computationally efficient than Cox regression in both settings, we propose a two-step approach to GWAS in cohort and case-cohort studies. First to analyse all SNPs with logistic regression to identify associated variants below a pre-defined P-value threshold, and second to fit Cox regression (appropriately weighted in case-cohort studies) to those identified SNPs to ensure accurate estimation of association with disease.
Austin, Peter C
2010-04-22
Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.
Zhou, Jinzhe; Zhou, Yanbing; Cao, Shougen; Li, Shikuan; Wang, Hao; Niu, Zhaojian; Chen, Dong; Wang, Dongsheng; Lv, Liang; Zhang, Jian; Li, Yu; Jiao, Xuelong; Tan, Xiaojie; Zhang, Jianli; Wang, Haibo; Zhang, Bingyuan; Lu, Yun; Sun, Zhenqing
2016-01-01
Reporting of surgical complications is common, but few provide information about the severity and estimate risk factors of complications. If have, but lack of specificity. We retrospectively analyzed data on 2795 gastric cancer patients underwent surgical procedure at the Affiliated Hospital of Qingdao University between June 2007 and June 2012, established multivariate logistic regression model to predictive risk factors related to the postoperative complications according to the Clavien-Dindo classification system. Twenty-four out of 86 variables were identified statistically significant in univariate logistic regression analysis, 11 significant variables entered multivariate analysis were employed to produce the risk model. Liver cirrhosis, diabetes mellitus, Child classification, invasion of neighboring organs, combined resection, introperative transfusion, Billroth II anastomosis of reconstruction, malnutrition, surgical volume of surgeons, operating time and age were independent risk factors for postoperative complications after gastrectomy. Based on logistic regression equation, p=Exp∑BiXi / (1+Exp∑BiXi), multivariate logistic regression predictive model that calculated the risk of postoperative morbidity was developed, p = 1/(1 + e((4.810-1.287X1-0.504X2-0.500X3-0.474X4-0.405X5-0.318X6-0.316X7-0.305X8-0.278X9-0.255X10-0.138X11))). The accuracy, sensitivity and specificity of the model to predict the postoperative complications were 86.7%, 76.2% and 88.6%, respectively. This risk model based on Clavien-Dindo grading severity of complications system and logistic regression analysis can predict severe morbidity specific to an individual patient's risk factors, estimate patients' risks and benefits of gastric surgery as an accurate decision-making tool and may serve as a template for the development of risk models for other surgical groups.
Modeling health survey data with excessive zero and K responses.
Lin, Ting Hsiang; Tsai, Min-Hsiao
2013-04-30
Zero-inflated Poisson regression is a popular tool used to analyze data with excessive zeros. Although much work has already been performed to fit zero-inflated data, most models heavily depend on special features of the individual data. To be specific, this means that there is a sizable group of respondents who endorse the same answers making the data have peaks. In this paper, we propose a new model with the flexibility to model excessive counts other than zero, and the model is a mixture of multinomial logistic and Poisson regression, in which the multinomial logistic component models the occurrence of excessive counts, including zeros, K (where K is a positive integer) and all other values. The Poisson regression component models the counts that are assumed to follow a Poisson distribution. Two examples are provided to illustrate our models when the data have counts containing many ones and sixes. As a result, the zero-inflated and K-inflated models exhibit a better fit than the zero-inflated Poisson and standard Poisson regressions. Copyright © 2012 John Wiley & Sons, Ltd.
The cross-validated AUC for MCP-logistic regression with high-dimensional data.
Jiang, Dingfeng; Huang, Jian; Zhang, Ying
2013-10-01
We propose a cross-validated area under the receiving operator characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and its comparison with the existing methods including the Akaike information criterion (AIC), Bayesian information criterion (BIC) or Extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and smaller classification error than those with tuning parameters selected using the AIC, BIC or EBIC. We illustrate the application of the MCP-logistic regression with the CV-AUC criterion on three microarray datasets from the studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
Jiang, Yanlin; Xu, Hong; Zhang, Hao; Ou, Xunyan; Xu, Zhen; Ai, Liping; Sun, Lisha; Liu, Caigang
2017-09-22
The current management of the axilla in level 1 node-positive breast cancer patients is axillary lymph node dissection regardless of the status of the level 2 axillary lymph nodes. The goal of this study was to develop a nomogram predicting the probability of level 2 axillary lymph node metastasis (L-2-ALNM) in patients with level 1 axillary node-positive breast cancer. We reviewed the records of 974 patients with pathology-confirmed level 1 node-positive breast cancer between 2010 and 2014 at the Liaoning Cancer Hospital and Institute. The patients were randomized 1:1 and divided into a modeling group and a validation group. Clinical and pathological features of the patients were assessed with uni- and multivariate logistic regression. A nomogram based on independent predictors for the L-2-ALNM identified by multivariate logistic regression was constructed. Independent predictors of L-2-ALNM by the multivariate logistic regression analysis included tumor size, Ki-67 status, histological grade, and number of positive level 1 axillary lymph nodes. The areas under the receiver operating characteristic curve of the modeling set and the validation set were 0.828 and 0.816, respectively. The false-negative rates of the L-2-ALNM nomogram were 1.82% and 7.41% for the predicted probability cut-off points of < 6% and < 10%, respectively, when applied to the validation group. Our nomogram could help predict L-2-ALNM in patients with level 1 axillary lymph node metastasis. Patients with a low probability of L-2-ALNM could be spared level 2 axillary lymph node dissection, thereby reducing postoperative morbidity.
NASA Astrophysics Data System (ADS)
Demir, Gökhan; aytekin, mustafa; banu ikizler, sabriye; angın, zekai
2013-04-01
The North Anatolian Fault is know as one of the most active and destructive fault zone which produced many earthquakes with high magnitudes. Along this fault zone, the morphology and the lithological features are prone to landsliding. However, many earthquake induced landslides were recorded by several studies along this fault zone, and these landslides caused both injuiries and live losts. Therefore, a detailed landslide susceptibility assessment for this area is indispancable. In this context, a landslide susceptibility assessment for the 1445 km2 area in the Kelkit River valley a part of North Anatolian Fault zone (Eastern Black Sea region of Turkey) was intended with this study, and the results of this study are summarized here. For this purpose, geographical information system (GIS) and a bivariate statistical model were used. Initially, Landslide inventory maps are prepared by using landslide data determined by field surveys and landslide data taken from General Directorate of Mineral Research and Exploration. The landslide conditioning factors are considered to be lithology, slope gradient, slope aspect, topographical elevation, distance to streams, distance to roads and distance to faults, drainage density and fault density. ArcGIS package was used to manipulate and analyze all the collected data Logistic regression method was applied to create a landslide susceptibility map. Landslide susceptibility maps were divided into five susceptibility regions such as very low, low, moderate, high and very high. The result of the analysis was verified using the inventoried landslide locations and compared with the produced probability model. For this purpose, Area Under Curvature (AUC) approach was applied, and a AUC value was obtained. Based on this AUC value, the obtained landslide susceptibility map was concluded as satisfactory. Keywords: North Anatolian Fault Zone, Landslide susceptibility map, Geographical Information Systems, Logistic Regression Analysis.
NASA Astrophysics Data System (ADS)
Mazza, F.; Da Silva, M. P.; Le Callet, P.; Heynderickx, I. E. J.
2015-03-01
Multimedia quality assessment has been an important research topic during the last decades. The original focus on artifact visibility has been extended during the years to aspects as image aesthetics, interestingness and memorability. More recently, Fedorovskaya proposed the concept of 'image psychology': this concept focuses on additional quality dimensions related to human content processing. While these additional dimensions are very valuable in understanding preferences, it is very hard to define, isolate and measure their effect on quality. In this paper we continue our research on face pictures investigating which image factors influence context perception. We collected perceived fit of a set of images to various content categories. These categories were selected based on current typologies in social networks. Logistic regression was adopted to model category fit based on images features. In this model we used both low level and high level features, the latter focusing on complex features related to image content. In order to extract these high level features, we relied on crowdsourcing, since computer vision algorithms are not yet sufficiently accurate for the features we needed. Our results underline the importance of some high level content features, e.g. the dress of the portrayed person and scene setting, in categorizing image.
Classification of vegetation types in military region
NASA Astrophysics Data System (ADS)
Gonçalves, Miguel; Silva, Jose Silvestre; Bioucas-Dias, Jose
2015-10-01
In decision-making process regarding planning and execution of military operations, the terrain is a determining factor. Aerial photographs are a source of vital information for the success of an operation in hostile region, namely when the cartographic information behind enemy lines is scarce or non-existent. The objective of present work is the development of a tool capable of processing aerial photos. The methodology implemented starts with feature extraction, followed by the application of an automatic selector of features. The next step, using the k-fold cross validation technique, estimates the input parameters for the following classifiers: Sparse Multinomial Logist Regression (SMLR), K Nearest Neighbor (KNN), Linear Classifier using Principal Component Expansion on the Joint Data (PCLDC) and Multi-Class Support Vector Machine (MSVM). These classifiers were used in two different studies with distinct objectives: discrimination of vegetation's density and identification of vegetation's main components. It was found that the best classifier on the first approach is the Sparse Logistic Multinomial Regression (SMLR). On the second approach, the implemented methodology applied to high resolution images showed that the better performance was achieved by KNN classifier and PCLDC. Comparing the two approaches there is a multiscale issue, in which for different resolutions, the best solution to the problem requires different classifiers and the extraction of different features.
The crux of the method: assumptions in ordinary least squares and logistic regression.
Long, Rebecca G
2008-10-01
Logistic regression has increasingly become the tool of choice when analyzing data with a binary dependent variable. While resources relating to the technique are widely available, clear discussions of why logistic regression should be used in place of ordinary least squares regression are difficult to find. The current paper compares and contrasts the assumptions of ordinary least squares with those of logistic regression and explains why logistic regression's looser assumptions make it adept at handling violations of the more important assumptions in ordinary least squares.
Pfeiffer, R M; Riedl, R
2015-08-15
We assess the asymptotic bias of estimates of exposure effects conditional on covariates when summary scores of confounders, instead of the confounders themselves, are used to analyze observational data. First, we study regression models for cohort data that are adjusted for summary scores. Second, we derive the asymptotic bias for case-control studies when cases and controls are matched on a summary score, and then analyzed either using conditional logistic regression or by unconditional logistic regression adjusted for the summary score. Two scores, the propensity score (PS) and the disease risk score (DRS) are studied in detail. For cohort analysis, when regression models are adjusted for the PS, the estimated conditional treatment effect is unbiased only for linear models, or at the null for non-linear models. Adjustment of cohort data for DRS yields unbiased estimates only for linear regression; all other estimates of exposure effects are biased. Matching cases and controls on DRS and analyzing them using conditional logistic regression yields unbiased estimates of exposure effect, whereas adjusting for the DRS in unconditional logistic regression yields biased estimates, even under the null hypothesis of no association. Matching cases and controls on the PS yield unbiased estimates only under the null for both conditional and unconditional logistic regression, adjusted for the PS. We study the bias for various confounding scenarios and compare our asymptotic results with those from simulations with limited sample sizes. To create realistic correlations among multiple confounders, we also based simulations on a real dataset. Copyright © 2015 John Wiley & Sons, Ltd.
Detrended fluctuation analysis for major depressive disorder.
Mumtaz, Wajid; Malik, Aamir Saeed; Ali, Syed Saad Azhar; Yasin, Mohd Azhar Mohd; Amin, Hafeezullah
2015-01-01
Clinical utility of Electroencephalography (EEG) based diagnostic studies is less clear for major depressive disorder (MDD). In this paper, a novel machine learning (ML) scheme was presented to discriminate the MDD patients and healthy controls. The proposed method inherently involved feature extraction, selection, classification and validation. The EEG data acquisition involved eyes closed (EC) and eyes open (EO) conditions. At feature extraction stage, the de-trended fluctuation analysis (DFA) was performed, based on the EEG data, to achieve scaling exponents. The DFA was performed to analyzes the presence or absence of long-range temporal correlations (LRTC) in the recorded EEG data. The scaling exponents were used as input features to our proposed system. At feature selection stage, 3 different techniques were used for comparison purposes. Logistic regression (LR) classifier was employed. The method was validated by a 10-fold cross-validation. As results, we have observed that the effect of 3 different reference montages on the computed features. The proposed method employed 3 different types of feature selection techniques for comparison purposes as well. The results show that the DFA analysis performed better in LE data compared with the IR and AR data. In addition, during Wilcoxon ranking, the AR performed better than LE and IR. Based on the results, it was concluded that the DFA provided useful information to discriminate the MDD patients and with further validation can be employed in clinics for diagnosis of MDD.
Using Dominance Analysis to Determine Predictor Importance in Logistic Regression
ERIC Educational Resources Information Center
Azen, Razia; Traxel, Nicole
2009-01-01
This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R[superscript 2] analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…
Use and interpretation of logistic regression in habitat-selection studies
Keating, Kim A.; Cherry, Steve
2004-01-01
Logistic regression is an important tool for wildlife habitat-selection studies, but the method frequently has been misapplied due to an inadequate understanding of the logistic model, its interpretation, and the influence of sampling design. To promote better use of this method, we review its application and interpretation under 3 sampling designs: random, case-control, and use-availability. Logistic regression is appropriate for habitat use-nonuse studies employing random sampling and can be used to directly model the conditional probability of use in such cases. Logistic regression also is appropriate for studies employing case-control sampling designs, but careful attention is required to interpret results correctly. Unless bias can be estimated or probability of use is small for all habitats, results of case-control studies should be interpreted as odds ratios, rather than probability of use or relative probability of use. When data are gathered under a use-availability design, logistic regression can be used to estimate approximate odds ratios if probability of use is small, at least on average. More generally, however, logistic regression is inappropriate for modeling habitat selection in use-availability studies. In particular, using logistic regression to fit the exponential model of Manly et al. (2002:100) does not guarantee maximum-likelihood estimates, valid probabilities, or valid likelihoods. We show that the resource selection function (RSF) commonly used for the exponential model is proportional to a logistic discriminant function. Thus, it may be used to rank habitats with respect to probability of use and to identify important habitat characteristics or their surrogates, but it is not guaranteed to be proportional to probability of use. Other problems associated with the exponential model also are discussed. We describe an alternative model based on Lancaster and Imbens (1996) that offers a method for estimating conditional probability of use in use-availability studies. Although promising, this model fails to converge to a unique solution in some important situations. Further work is needed to obtain a robust method that is broadly applicable to use-availability studies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yahya, Noorazrul, E-mail: noorazrul.yahya@research.uwa.edu.au; Ebert, Martin A.; Bulsara, Max
Purpose: Given the paucity of available data concerning radiotherapy-induced urinary toxicity, it is important to ensure derivation of the most robust models with superior predictive performance. This work explores multiple statistical-learning strategies for prediction of urinary symptoms following external beam radiotherapy of the prostate. Methods: The performance of logistic regression, elastic-net, support-vector machine, random forest, neural network, and multivariate adaptive regression splines (MARS) to predict urinary symptoms was analyzed using data from 754 participants accrued by TROG03.04-RADAR. Predictive features included dose-surface data, comorbidities, and medication-intake. Four symptoms were analyzed: dysuria, haematuria, incontinence, and frequency, each with three definitions (grade ≥more » 1, grade ≥ 2 and longitudinal) with event rate between 2.3% and 76.1%. Repeated cross-validations producing matched models were implemented. A synthetic minority oversampling technique was utilized in endpoints with rare events. Parameter optimization was performed on the training data. Area under the receiver operating characteristic curve (AUROC) was used to compare performance using sample size to detect differences of ≥0.05 at the 95% confidence level. Results: Logistic regression, elastic-net, random forest, MARS, and support-vector machine were the highest-performing statistical-learning strategies in 3, 3, 3, 2, and 1 endpoints, respectively. Logistic regression, MARS, elastic-net, random forest, neural network, and support-vector machine were the best, or were not significantly worse than the best, in 7, 7, 5, 5, 3, and 1 endpoints. The best-performing statistical model was for dysuria grade ≥ 1 with AUROC ± standard deviation of 0.649 ± 0.074 using MARS. For longitudinal frequency and dysuria grade ≥ 1, all strategies produced AUROC>0.6 while all haematuria endpoints and longitudinal incontinence models produced AUROC<0.6. Conclusions: Logistic regression and MARS were most likely to be the best-performing strategy for the prediction of urinary symptoms with elastic-net and random forest producing competitive results. The predictive power of the models was modest and endpoint-dependent. New features, including spatial dose maps, may be necessary to achieve better models.« less
NASA Astrophysics Data System (ADS)
Co, Noelle Easter C.; Brown, Donald E.; Burns, James T.
2018-05-01
This study applies data science approaches (random forest and logistic regression) to determine the extent to which macro-scale corrosion damage features govern the crack formation behavior in AA7050-T7451. Each corrosion morphology has a set of corresponding predictor variables (pit depth, volume, area, diameter, pit density, total fissure length, surface roughness metrics, etc.) describing the shape of the corrosion damage. The values of the predictor variables are obtained from white light interferometry, x-ray tomography, and scanning electron microscope imaging of the corrosion damage. A permutation test is employed to assess the significance of the logistic and random forest model predictions. Results indicate minimal relationship between the macro-scale corrosion feature predictor variables and fatigue crack initiation. These findings suggest that the macro-scale corrosion features and their interactions do not solely govern the crack formation behavior. While these results do not imply that the macro-features have no impact, they do suggest that additional parameters must be considered to rigorously inform the crack formation location.
NASA Astrophysics Data System (ADS)
Mahrooghy, Majid; Ashraf, Ahmed B.; Daye, Dania; Mies, Carolyn; Rosen, Mark; Feldman, Michael; Kontos, Despina
2014-03-01
We evaluate the prognostic value of sparse representation-based features by applying the K-SVD algorithm on multiparametric kinetic, textural, and morphologic features in breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). K-SVD is an iterative dimensionality reduction method that optimally reduces the initial feature space by updating the dictionary columns jointly with the sparse representation coefficients. Therefore, by using K-SVD, we not only provide sparse representation of the features and condense the information in a few coefficients but also we reduce the dimensionality. The extracted K-SVD features are evaluated by a machine learning algorithm including a logistic regression classifier for the task of classifying high versus low breast cancer recurrence risk as determined by a validated gene expression assay. The features are evaluated using ROC curve analysis and leave one-out cross validation for different sparse representation and dimensionality reduction numbers. Optimal sparse representation is obtained when the number of dictionary elements is 4 (K=4) and maximum non-zero coefficients is 2 (L=2). We compare K-SVD with ANOVA based feature selection for the same prognostic features. The ROC results show that the AUC of the K-SVD based (K=4, L=2), the ANOVA based, and the original features (i.e., no dimensionality reduction) are 0.78, 0.71. and 0.68, respectively. From the results, it can be inferred that by using sparse representation of the originally extracted multi-parametric, high-dimensional data, we can condense the information on a few coefficients with the highest predictive value. In addition, the dimensionality reduction introduced by K-SVD can prevent models from over-fitting.
Li, Yi; Tseng, Yufeng J.; Pan, Dahua; Liu, Jianzhong; Kern, Petra S.; Gerberick, G. Frank; Hopfinger, Anton J.
2008-01-01
Currently, the only validated methods to identify skin sensitization effects are in vivo models, such as the Local Lymph Node Assay (LLNA) and guinea pig studies. There is a tremendous need, in particular due to novel legislation, to develop animal alternatives, eg. Quantitative Structure-Activity Relationship (QSAR) models. Here, QSAR models for skin sensitization using LLNA data have been constructed. The descriptors used to generate these models are derived from the 4D-molecular similarity paradigm and are referred to as universal 4D-fingerprints. A training set of 132 structurally diverse compounds and a test set of 15 structurally diverse compounds were used in this study. The statistical methodologies used to build the models are logistic regression (LR), and partial least square coupled logistic regression (PLS-LR), which prove to be effective tools for studying skin sensitization measures expressed in the two categorical terms of sensitizer and non-sensitizer. QSAR models with low values of the Hosmer-Lemeshow goodness-of-fit statistic, χHL2, are significant and predictive. For the training set, the cross-validated prediction accuracy of the logistic regression models ranges from 77.3% to 78.0%, while that of PLS-logistic regression models ranges from 87.1% to 89.4%. For the test set, the prediction accuracy of logistic regression models ranges from 80.0%-86.7%, while that of PLS-logistic regression models ranges from 73.3%-80.0%. The QSAR models are made up of 4D-fingerprints related to aromatic atoms, hydrogen bond acceptors and negatively partially charged atoms. PMID:17226934
Hierarchical Bayesian Logistic Regression to forecast metabolic control in type 2 DM patients.
Dagliati, Arianna; Malovini, Alberto; Decata, Pasquale; Cogni, Giulia; Teliti, Marsida; Sacchi, Lucia; Cerra, Carlo; Chiovato, Luca; Bellazzi, Riccardo
2016-01-01
In this work we present our efforts in building a model able to forecast patients' changes in clinical conditions when repeated measurements are available. In this case the available risk calculators are typically not applicable. We propose a Hierarchical Bayesian Logistic Regression model, which allows taking into account individual and population variability in model parameters estimate. The model is used to predict metabolic control and its variation in type 2 diabetes mellitus. In particular we have analyzed a population of more than 1000 Italian type 2 diabetic patients, collected within the European project Mosaic. The results obtained in terms of Matthews Correlation Coefficient are significantly better than the ones gathered with standard logistic regression model, based on data pooling.
Chan, Siew Foong; Deeks, Jonathan J; Macaskill, Petra; Irwig, Les
2008-01-01
To compare three predictive models based on logistic regression to estimate adjusted likelihood ratios allowing for interdependency between diagnostic variables (tests). This study was a review of the theoretical basis, assumptions, and limitations of published models; and a statistical extension of methods and application to a case study of the diagnosis of obstructive airways disease based on history and clinical examination. Albert's method includes an offset term to estimate an adjusted likelihood ratio for combinations of tests. Spiegelhalter and Knill-Jones method uses the unadjusted likelihood ratio for each test as a predictor and computes shrinkage factors to allow for interdependence. Knottnerus' method differs from the other methods because it requires sequencing of tests, which limits its application to situations where there are few tests and substantial data. Although parameter estimates differed between the models, predicted "posttest" probabilities were generally similar. Construction of predictive models using logistic regression is preferred to the independence Bayes' approach when it is important to adjust for dependency of tests errors. Methods to estimate adjusted likelihood ratios from predictive models should be considered in preference to a standard logistic regression model to facilitate ease of interpretation and application. Albert's method provides the most straightforward approach.
NASA Astrophysics Data System (ADS)
Daye, Dania; Bobo, Ezra; Baumann, Bethany; Ioannou, Antonios; Conant, Emily F.; Maidment, Andrew D. A.; Kontos, Despina
2011-03-01
Mammographic parenchymal texture patterns have been shown to be related to breast cancer risk. Yet, little is known about the biological basis underlying this association. Here, we investigate the potential of mammographic parenchymal texture patterns as an inherent phenotypic imaging marker of endogenous hormonal exposure of the breast tissue. Digital mammographic (DM) images in the cranio-caudal (CC) view of the unaffected breast from 138 women diagnosed with unilateral breast cancer were retrospectively analyzed. Menopause status was used as a surrogate marker of endogenous hormonal activity. Retroareolar 2.5cm2 ROIs were segmented from the post-processed DM images using an automated algorithm. Parenchymal texture features of skewness, coarseness, contrast, energy, homogeneity, grey-level spatial correlation, and fractal dimension were computed. Receiver operating characteristic (ROC) curve analysis was performed to evaluate feature classification performance in distinguishing between 72 pre- and 66 post-menopausal women. Logistic regression was performed to assess the independent effect of each texture feature in predicting menopause status. ROC analysis showed that texture features have inherent capacity to distinguish between pre- and post-menopausal statuses (AUC>0.5, p<0.05). Logistic regression including all texture features yielded an ROC curve with an AUC of 0.76. Addition of age at menarche, ethnicity, contraception use and hormonal replacement therapy (HRT) use lead to a modest model improvement (AUC=0.78) while texture features maintained significant contribution (p<0.05). The observed differences in parenchymal texture features between pre- and post- menopausal women suggest that mammographic texture can potentially serve as a surrogate imaging marker of endogenous hormonal activity.
Application of L1/2 regularization logistic method in heart disease diagnosis.
Zhang, Bowen; Chai, Hua; Yang, Ziyi; Liang, Yong; Chu, Gejin; Liu, Xiaoying
2014-01-01
Heart disease has become the number one killer of human health, and its diagnosis depends on many features, such as age, blood pressure, heart rate and other dozens of physiological indicators. Although there are so many risk factors, doctors usually diagnose the disease depending on their intuition and experience, which requires a lot of knowledge and experience for correct determination. To find the hidden medical information in the existing clinical data is a noticeable and powerful approach in the study of heart disease diagnosis. In this paper, sparse logistic regression method is introduced to detect the key risk factors using L(1/2) regularization on the real heart disease data. Experimental results show that the sparse logistic L(1/2) regularization method achieves fewer but informative key features than Lasso, SCAD, MCP and Elastic net regularization approaches. Simultaneously, the proposed method can cut down the computational complexity, save cost and time to undergo medical tests and checkups, reduce the number of attributes needed to be taken from patients.
Ameye, Lieveke; Fischerova, Daniela; Epstein, Elisabeth; Melis, Gian Benedetto; Guerriero, Stefano; Van Holsbeke, Caroline; Savelli, Luca; Fruscio, Robert; Lissoni, Andrea Alberto; Testa, Antonia Carla; Veldman, Joan; Vergote, Ignace; Van Huffel, Sabine; Bourne, Tom; Valentin, Lil
2010-01-01
Objectives To prospectively assess the diagnostic performance of simple ultrasound rules to predict benignity/malignancy in an adnexal mass and to test the performance of the risk of malignancy index, two logistic regression models, and subjective assessment of ultrasonic findings by an experienced ultrasound examiner in adnexal masses for which the simple rules yield an inconclusive result. Design Prospective temporal and external validation of simple ultrasound rules to distinguish benign from malignant adnexal masses. The rules comprised five ultrasonic features (including shape, size, solidity, and results of colour Doppler examination) to predict a malignant tumour (M features) and five to predict a benign tumour (B features). If one or more M features were present in the absence of a B feature, the mass was classified as malignant. If one or more B features were present in the absence of an M feature, it was classified as benign. If both M features and B features were present, or if none of the features was present, the simple rules were inconclusive. Setting 19 ultrasound centres in eight countries. Participants 1938 women with an adnexal mass examined with ultrasound by the principal investigator at each centre with a standardised research protocol. Reference standard Histological classification of the excised adnexal mass as benign or malignant. Main outcome measures Diagnostic sensitivity and specificity. Results Of the 1938 patients with an adnexal mass, 1396 (72%) had benign tumours, 373 (19.2%) had primary invasive tumours, 111 (5.7%) had borderline malignant tumours, and 58 (3%) had metastatic tumours in the ovary. The simple rules yielded a conclusive result in 1501 (77%) masses, for which they resulted in a sensitivity of 92% (95% confidence interval 89% to 94%) and a specificity of 96% (94% to 97%). The corresponding sensitivity and specificity of subjective assessment were 91% (88% to 94%) and 96% (94% to 97%). In the 357 masses for which the simple rules yielded an inconclusive result and with available results of CA-125 measurements, the sensitivities were 89% (83% to 93%) for subjective assessment, 50% (42% to 58%) for the risk of malignancy index, 89% (83% to 93%) for logistic regression model 1, and 82% (75% to 87%) for logistic regression model 2; the corresponding specificities were 78% (72% to 83%), 84% (78% to 88%), 44% (38% to 51%), and 48% (42% to 55%). Use of the simple rules as a triage test and subjective assessment for those masses for which the simple rules yielded an inconclusive result gave a sensitivity of 91% (88% to 93%) and a specificity of 93% (91% to 94%), compared with a sensitivity of 90% (88% to 93%) and a specificity of 93% (91% to 94%) when subjective assessment was used in all masses. Conclusions The use of the simple rules has the potential to improve the management of women with adnexal masses. In adnexal masses for which the rules yielded an inconclusive result, subjective assessment of ultrasonic findings by an experienced ultrasound examiner was the most accurate diagnostic test; the risk of malignancy index and the two regression models were not useful. PMID:21156740
Applying Kaplan-Meier to Item Response Data
ERIC Educational Resources Information Center
McNeish, Daniel
2018-01-01
Some IRT models can be equivalently modeled in alternative frameworks such as logistic regression. Logistic regression can also model time-to-event data, which concerns the probability of an event occurring over time. Using the relation between time-to-event models and logistic regression and the relation between logistic regression and IRT, this…
Tanaka, N; Kunihiro, Y; Kubo, M; Kawano, R; Oishi, K; Ueda, K; Gondo, T
2018-05-29
To identify characteristic high-resolution computed tomography (CT) findings for individual collagen vascular disease (CVD)-related interstitial pneumonias (IPs). The HRCT findings of 187 patients with CVD, including 55 patients with rheumatoid arthritis (RA), 50 with systemic sclerosis (SSc), 46 with polymyositis/dermatomyositis (PM/DM), 15 with mixed connective tissue disease, 11 with primary Sjögren's syndrome, and 10 with systemic lupus erythematosus, were evaluated. Lung parenchymal abnormalities were compared among CVDs using χ 2 test, Kruskal-Wallis test, and multiple logistic regression analysis. A CT-pathology correlation was performed in 23 patients. In RA-IP, honeycombing was identified as the significant indicator based on multiple logistic regression analyses. Traction bronchiectasis (81.8%) was further identified as the most frequent finding based on χ 2 test. In SSc IP, lymph node enlargement and oesophageal dilatation were identified as the indicators based on multiple logistic regression analyses, and ground-glass opacity (GGO) was the most extensive based on Kruskal-Wallis test, which reflects the higher frequency of the pathological nonspecific interstitial pneumonia (NSIP) pattern present in the CT-pathology correlation. In PM/DM IP, airspace consolidation and the absence of honeycombing were identified as the indicators based on multiple logistic regression analyses, and predominance of consolidation over GGO (32.6%) and predominant subpleural distribution of GGO/consolidation (41.3%) were further identified as the most frequent findings based on χ 2 test, which reflects the higher frequency of the pathological NSIP and/or the organising pneumonia patterns present in the CT-pathology correlation. Several characteristic high-resolution CT findings with utility for estimating underlying CVD were identified. Copyright © 2018 The Royal College of Radiologists. Published by Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Edens, John F.; Ruiz, Mark A.
2006-01-01
This study examined the effects of defensive responding on the prediction of institutional misconduct among male inmates (N = 349) who completed the Personality Assessment Inventory (L. C. Morey, 1991). Hierarchical logistic regression analyses demonstrated significant main effects for the Antisocial Features (ANT) scale as well as main effects…
Evolution of accelerometer methods for physical activity research.
Troiano, Richard P; McClain, James J; Brychta, Robert J; Chen, Kong Y
2014-07-01
The technology and application of current accelerometer-based devices in physical activity (PA) research allow the capture and storage or transmission of large volumes of raw acceleration signal data. These rich data not only provide opportunities to improve PA characterisation, but also bring logistical and analytic challenges. We discuss how researchers and developers from multiple disciplines are responding to the analytic challenges and how advances in data storage, transmission and big data computing will minimise logistical challenges. These new approaches also bring the need for several paradigm shifts for PA researchers, including a shift from count-based approaches and regression calibrations for PA energy expenditure (PAEE) estimation to activity characterisation and EE estimation based on features extracted from raw acceleration signals. Furthermore, a collaborative approach towards analytic methods is proposed to facilitate PA research, which requires a shift away from multiple independent calibration studies. Finally, we make the case for a distinction between PA represented by accelerometer-based devices and PA assessed by self-report. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Sampson, Maureen L; Gounden, Verena; van Deventer, Hendrik E; Remaley, Alan T
2016-02-01
The main drawback of the periodic analysis of quality control (QC) material is that test performance is not monitored in time periods between QC analyses, potentially leading to the reporting of faulty test results. The objective of this study was to develop a patient based QC procedure for the more timely detection of test errors. Results from a Chem-14 panel measured on the Beckman LX20 analyzer were used to develop the model. Each test result was predicted from the other 13 members of the panel by multiple regression, which resulted in correlation coefficients between the predicted and measured result of >0.7 for 8 of the 14 tests. A logistic regression model, which utilized the measured test result, the predicted test result, the day of the week and time of day, was then developed for predicting test errors. The output of the logistic regression was tallied by a daily CUSUM approach and used to predict test errors, with a fixed specificity of 90%. The mean average run length (ARL) before error detection by CUSUM-Logistic Regression (CSLR) was 20 with a mean sensitivity of 97%, which was considerably shorter than the mean ARL of 53 (sensitivity 87.5%) for a simple prediction model that only used the measured result for error detection. A CUSUM-Logistic Regression analysis of patient laboratory data can be an effective approach for the rapid and sensitive detection of clinical laboratory errors. Published by Elsevier Inc.
Classifying machinery condition using oil samples and binary logistic regression
NASA Astrophysics Data System (ADS)
Phillips, J.; Cripps, E.; Lau, John W.; Hodkiewicz, M. R.
2015-08-01
The era of big data has resulted in an explosion of condition monitoring information. The result is an increasing motivation to automate the costly and time consuming human elements involved in the classification of machine health. When working with industry it is important to build an understanding and hence some trust in the classification scheme for those who use the analysis to initiate maintenance tasks. Typically "black box" approaches such as artificial neural networks (ANN) and support vector machines (SVM) can be difficult to provide ease of interpretability. In contrast, this paper argues that logistic regression offers easy interpretability to industry experts, providing insight to the drivers of the human classification process and to the ramifications of potential misclassification. Of course, accuracy is of foremost importance in any automated classification scheme, so we also provide a comparative study based on predictive performance of logistic regression, ANN and SVM. A real world oil analysis data set from engines on mining trucks is presented and using cross-validation we demonstrate that logistic regression out-performs the ANN and SVM approaches in terms of prediction for healthy/not healthy engines.
Lee, Seokho; Shin, Hyejin; Lee, Sang Han
2016-12-01
Alzheimer's disease (AD) is usually diagnosed by clinicians through cognitive and functional performance test with a potential risk of misdiagnosis. Since the progression of AD is known to cause structural changes in the corpus callosum (CC), the CC thickness can be used as a functional covariate in AD classification problem for a diagnosis. However, misclassified class labels negatively impact the classification performance. Motivated by AD-CC association studies, we propose a logistic regression for functional data classification that is robust to misdiagnosis or label noise. Specifically, our logistic regression model is constructed by adopting individual intercepts to functional logistic regression model. This approach enables to indicate which observations are possibly mislabeled and also lead to a robust and efficient classifier. An effective algorithm using MM algorithm provides simple closed-form update formulas. We test our method using synthetic datasets to demonstrate its superiority over an existing method, and apply it to differentiating patients with AD from healthy normals based on CC from MRI. © 2016, The International Biometric Society.
Reducing the number of reconstructions needed for estimating channelized observer performance
NASA Astrophysics Data System (ADS)
Pineda, Angel R.; Miedema, Hope; Brenner, Melissa; Altaf, Sana
2018-03-01
A challenge for task-based optimization is the time required for each reconstructed image in applications where reconstructions are time consuming. Our goal is to reduce the number of reconstructions needed to estimate the area under the receiver operating characteristic curve (AUC) of the infinitely-trained optimal channelized linear observer. We explore the use of classifiers which either do not invert the channel covariance matrix or do feature selection. We also study the assumption that multiple low contrast signals in the same image of a non-linear reconstruction do not significantly change the estimate of the AUC. We compared the AUC of several classifiers (Hotelling, logistic regression, logistic regression using Firth bias reduction and the least absolute shrinkage and selection operator (LASSO)) with a small number of observations both for normal simulated data and images from a total variation reconstruction in magnetic resonance imaging (MRI). We used 10 Laguerre-Gauss channels and the Mann-Whitney estimator for AUC. For this data, our results show that at small sample sizes feature selection using the LASSO technique can decrease bias of the AUC estimation with increased variance and that for large sample sizes the difference between these classifiers is small. We also compared the use of multiple signals in a single reconstructed image to reduce the number of reconstructions in a total variation reconstruction for accelerated imaging in MRI. We found that AUC estimation using multiple low contrast signals in the same image resulted in similar AUC estimates as doing a single reconstruction per signal leading to a 13x reduction in the number of reconstructions needed.
Prediction of cold and heat patterns using anthropometric measures based on machine learning.
Lee, Bum Ju; Lee, Jae Chul; Nam, Jiho; Kim, Jong Yeol
2018-01-01
To examine the association of body shape with cold and heat patterns, to determine which anthropometric measure is the best indicator for discriminating between the two patterns, and to investigate whether using a combination of measures can improve the predictive power to diagnose these patterns. Based on a total of 4,859 subjects (3,000 women and 1,859 men), statistical analyses using binary logistic regression were performed to assess the significance of the difference and the predictive power of each anthropometric measure, and binary logistic regression and Naive Bayes with the variable selection technique were used to assess the improvement in the predictive power of the patterns using the combined measures. In women, the strongest indicators for determining the cold and heat patterns among anthropometric measures were body mass index (BMI) and rib circumference; in men, the best indicator was BMI. In experiments using a combination of measures, the values of the area under the receiver operating characteristic curve in women were 0.776 by Naive Bayes and 0.772 by logistic regression, and the values in men were 0.788 by Naive Bayes and 0.779 by logistic regression. Individuals with a higher BMI have a tendency toward a heat pattern in both women and men. The use of a combination of anthropometric measures can slightly improve the diagnostic accuracy. Our findings can provide fundamental information for the diagnosis of cold and heat patterns based on body shape for personalized medicine.
Teng, Ju-Hsi; Lin, Kuan-Chia; Ho, Bin-Shenq
2007-10-01
A community-based aboriginal study was conducted and analysed to explore the application of classification tree and logistic regression. A total of 1066 aboriginal residents in Yilan County were screened during 2003-2004. The independent variables include demographic characteristics, physical examinations, geographic location, health behaviours, dietary habits and family hereditary diseases history. Risk factors of cardiovascular diseases were selected as the dependent variables in further analysis. The completion rate for heath interview is 88.9%. The classification tree results find that if body mass index is higher than 25.72 kg m(-2) and the age is above 51 years, the predicted probability for number of cardiovascular risk factors > or =3 is 73.6% and the population is 322. If body mass index is higher than 26.35 kg m(-2) and geographical latitude of the village is lower than 24 degrees 22.8', the predicted probability for number of cardiovascular risk factors > or =4 is 60.8% and the population is 74. As the logistic regression results indicate that body mass index, drinking habit and menopause are the top three significant independent variables. The classification tree model specifically shows the discrimination paths and interactions between the risk groups. The logistic regression model presents and analyses the statistical independent factors of cardiovascular risks. Applying both models to specific situations will provide a different angle for the design and management of future health intervention plans after community-based study.
An Event-Triggered Machine Learning Approach for Accelerometer-Based Fall Detection.
Putra, I Putu Edy Suardiyana; Brusey, James; Gaura, Elena; Vesilo, Rein
2017-12-22
The fixed-size non-overlapping sliding window (FNSW) and fixed-size overlapping sliding window (FOSW) approaches are the most commonly used data-segmentation techniques in machine learning-based fall detection using accelerometer sensors. However, these techniques do not segment by fall stages (pre-impact, impact, and post-impact) and thus useful information is lost, which may reduce the detection rate of the classifier. Aligning the segment with the fall stage is difficult, as the segment size varies. We propose an event-triggered machine learning (EvenT-ML) approach that aligns each fall stage so that the characteristic features of the fall stages are more easily recognized. To evaluate our approach, two publicly accessible datasets were used. Classification and regression tree (CART), k -nearest neighbor ( k -NN), logistic regression (LR), and the support vector machine (SVM) were used to train the classifiers. EvenT-ML gives classifier F-scores of 98% for a chest-worn sensor and 92% for a waist-worn sensor, and significantly reduces the computational cost compared with the FNSW- and FOSW-based approaches, with reductions of up to 8-fold and 78-fold, respectively. EvenT-ML achieves a significantly better F-score than existing fall detection approaches. These results indicate that aligning feature segments with fall stages significantly increases the detection rate and reduces the computational cost.
Viswanathan, M; Pearl, D L; Taboada, E N; Parmley, E J; Mutschall, S K; Jardine, C M
2017-05-01
Using data collected from a cross-sectional study of 25 farms (eight beef, eight swine and nine dairy) in 2010, we assessed clustering of molecular subtypes of C. jejuni based on a Campylobacter-specific 40 gene comparative genomic fingerprinting assay (CGF40) subtypes, using unweighted pair-group method with arithmetic mean (UPGMA) analysis, and multiple correspondence analysis. Exact logistic regression was used to determine which genes differentiate wildlife and livestock subtypes in our study population. A total of 33 bovine livestock (17 beef and 16 dairy), 26 wildlife (20 raccoon (Procyon lotor), five skunk (Mephitis mephitis) and one mouse (Peromyscus spp.) C. jejuni isolates were subtyped using CGF40. Dendrogram analysis, based on UPGMA, showed distinct branches separating bovine livestock and mammalian wildlife isolates. Furthermore, two-dimensional multiple correspondence analysis was highly concordant with dendrogram analysis showing clear differentiation between livestock and wildlife CGF40 subtypes. Based on multilevel logistic regression models with a random intercept for farm of origin, we found that isolates in general, and raccoons more specifically, were significantly more likely to be part of the wildlife branch. Exact logistic regression conducted gene by gene revealed 15 genes that were predictive of whether an isolate was of wildlife or bovine livestock isolate origin. Both multiple correspondence analysis and exact logistic regression revealed that in most cases, the presence of a particular gene (13 of 15) was associated with an isolate being of livestock rather than wildlife origin. In conclusion, the evidence gained from dendrogram analysis, multiple correspondence analysis and exact logistic regression indicates that mammalian wildlife carry CGF40 subtypes of C. jejuni distinct from those carried by bovine livestock. Future studies focused on source attribution of C. jejuni in human infections will help determine whether wildlife transmit Campylobacter jejuni directly to humans. © 2016 Blackwell Verlag GmbH.
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
GIS-based rare events logistic regression for mineral prospectivity mapping
NASA Astrophysics Data System (ADS)
Xiong, Yihui; Zuo, Renguang
2018-02-01
Mineralization is a special type of singularity event, and can be considered as a rare event, because within a specific study area the number of prospective locations (1s) are considerably fewer than the number of non-prospective locations (0s). In this study, GIS-based rare events logistic regression (RELR) was used to map the mineral prospectivity in the southwestern Fujian Province, China. An odds ratio was used to measure the relative importance of the evidence variables with respect to mineralization. The results suggest that formations, granites, and skarn alterations, followed by faults and aeromagnetic anomaly are the most important indicators for the formation of Fe-related mineralization in the study area. The prediction rate and the area under the curve (AUC) values show that areas with higher probability have a strong spatial relationship with the known mineral deposits. Comparing the results with original logistic regression (OLR) demonstrates that the GIS-based RELR performs better than OLR. The prospectivity map obtained in this study benefits the search for skarn Fe-related mineralization in the study area.
Rosenkrantz, Andrew B; Doshi, Ankur M; Ginocchio, Luke A; Aphinyanaphongs, Yindalon
2016-12-01
This study aimed to assess the performance of a text classification machine-learning model in predicting highly cited articles within the recent radiological literature and to identify the model's most influential article features. We downloaded from PubMed the title, abstract, and medical subject heading terms for 10,065 articles published in 25 general radiology journals in 2012 and 2013. Three machine-learning models were applied to predict the top 10% of included articles in terms of the number of citations to the article in 2014 (reflecting the 2-year time window in conventional impact factor calculations). The model having the highest area under the curve was selected to derive a list of article features (words) predicting high citation volume, which was iteratively reduced to identify the smallest possible core feature list maintaining predictive power. Overall themes were qualitatively assigned to the core features. The regularized logistic regression (Bayesian binary regression) model had highest performance, achieving an area under the curve of 0.814 in predicting articles in the top 10% of citation volume. We reduced the initial 14,083 features to 210 features that maintain predictivity. These features corresponded with topics relating to various imaging techniques (eg, diffusion-weighted magnetic resonance imaging, hyperpolarized magnetic resonance imaging, dual-energy computed tomography, computed tomography reconstruction algorithms, tomosynthesis, elastography, and computer-aided diagnosis), particular pathologies (prostate cancer; thyroid nodules; hepatic adenoma, hepatocellular carcinoma, non-alcoholic fatty liver disease), and other topics (radiation dose, electroporation, education, general oncology, gadolinium, statistics). Machine learning can be successfully applied to create specific feature-based models for predicting articles likely to achieve high influence within the radiological literature. Copyright © 2016 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
Classification of older adults with/without a fall history using machine learning methods.
Lin Zhang; Ou Ma; Fabre, Jennifer M; Wood, Robert H; Garcia, Stephanie U; Ivey, Kayla M; McCann, Evan D
2015-01-01
Falling is a serious problem in an aged society such that assessment of the risk of falls for individuals is imperative for the research and practice of falls prevention. This paper introduces an application of several machine learning methods for training a classifier which is capable of classifying individual older adults into a high risk group and a low risk group (distinguished by whether or not the members of the group have a recent history of falls). Using a 3D motion capture system, significant gait features related to falls risk are extracted. By training these features, classification hypotheses are obtained based on machine learning techniques (K Nearest-neighbour, Naive Bayes, Logistic Regression, Neural Network, and Support Vector Machine). Training and test accuracies with sensitivity and specificity of each of these techniques are assessed. The feature adjustment and tuning of the machine learning algorithms are discussed. The outcome of the study will benefit the prediction and prevention of falls.
NASA Technical Reports Server (NTRS)
Smith, Kelly M.; Gay, Robert S.; Stachowiak, Susan J.
2013-01-01
In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter. In order to increase overall robustness, the vehicle also has an alternate method of triggering the drogue parachute deployment based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this velocity-based trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers excellent performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.
Effect of folic acid on appetite in children: ordinal logistic and fuzzy logistic regressions.
Namdari, Mahshid; Abadi, Alireza; Taheri, S Mahmoud; Rezaei, Mansour; Kalantari, Naser; Omidvar, Nasrin
2014-03-01
Reduced appetite and low food intake are often a concern in preschool children, since it can lead to malnutrition, a leading cause of impaired growth and mortality in childhood. It is occasionally considered that folic acid has a positive effect on appetite enhancement and consequently growth in children. The aim of this study was to assess the effect of folic acid on the appetite of preschool children 3 to 6 y old. The study sample included 127 children ages 3 to 6 who were randomly selected from 20 preschools in the city of Tehran in 2011. Since appetite was measured by linguistic terms, a fuzzy logistic regression was applied for modeling. The obtained results were compared with a statistical ordinal logistic model. After controlling for the potential confounders, in a statistical ordinal logistic model, serum folate showed a significantly positive effect on appetite. A small but positive effect of folate was detected by fuzzy logistic regression. Based on fuzzy regression, the risk for poor appetite in preschool children was related to the employment status of their mothers. In this study, a positive association was detected between the levels of serum folate and improved appetite. For further investigation, a randomized controlled, double-blind clinical trial could be helpful to address causality. Copyright © 2014 Elsevier Inc. All rights reserved.
Computer-aided Classification of Mammographic Masses Using Visually Sensitive Image Features
Wang, Yunzhi; Aghaei, Faranak; Zarafshani, Ali; Qiu, Yuchen; Qian, Wei; Zheng, Bin
2017-01-01
Purpose To develop a new computer-aided diagnosis (CAD) scheme that computes visually sensitive image features routinely used by radiologists to develop a machine learning classifier and distinguish between the malignant and benign breast masses detected from digital mammograms. Methods An image dataset including 301 breast masses was retrospectively selected. From each segmented mass region, we computed image features that mimic five categories of visually sensitive features routinely used by radiologists in reading mammograms. We then selected five optimal features in the five feature categories and applied logistic regression models for classification. A new CAD interface was also designed to show lesion segmentation, computed feature values and classification score. Results Areas under ROC curves (AUC) were 0.786±0.026 and 0.758±0.027 when to classify mass regions depicting on two view images, respectively. By fusing classification scores computed from two regions, AUC increased to 0.806±0.025. Conclusion This study demonstrated a new approach to develop CAD scheme based on 5 visually sensitive image features. Combining with a “visual aid” interface, CAD results may be much more easily explainable to the observers and increase their confidence to consider CAD generated classification results than using other conventional CAD approaches, which involve many complicated and visually insensitive texture features. PMID:27911353
Ren, Jiliang; Yuan, Ying; Wu, Yingwei; Tao, Xiaofeng
2018-05-02
The overlap of morphological feature and mean ADC value restricted clinical application of MRI in the differential diagnosis of orbital lymphoma and idiopathic orbital inflammatory pseudotumor (IOIP). In this paper, we aimed to retrospectively evaluate the combined diagnostic value of conventional magnetic resonance imaging (MRI) and whole-tumor histogram analysis of apparent diffusion coefficient (ADC) maps in the differentiation of the two lesions. In total, 18 patients with orbital lymphoma and 22 patients with IOIP were included, who underwent both conventional MRI and diffusion weighted imaging before treatment. Conventional MRI features and histogram parameters derived from ADC maps, including mean ADC (ADC mean ), median ADC (ADC median ), skewness, kurtosis, 10th, 25th, 75th and 90th percentiles of ADC (ADC 10 , ADC 25 , ADC 75 , ADC 90 ) were evaluated and compared between orbital lymphoma and IOIP. Multivariate logistic regression analysis was used to identify the most valuable variables for discriminating. Differential model was built upon the selected variables and receiver operating characteristic (ROC) analysis was also performed to determine the differential ability of the model. Multivariate logistic regression showed ADC 10 (P = 0.023) and involvement of orbit preseptal space (P = 0.029) were the most promising indexes in the discrimination of orbital lymphoma and IOIP. The logistic model defined by ADC 10 and involvement of orbit preseptal space was built, which achieved an AUC of 0.939, with sensitivity of 77.30% and specificity of 94.40%. Conventional MRI feature of involvement of orbit preseptal space and ADC histogram parameter of ADC 10 are valuable in differential diagnosis of orbital lymphoma and IOIP.
ERIC Educational Resources Information Center
Almutairi, Mashal
2013-01-01
The main purpose of this research was to survey the literature about the U.S. education system and synthesize the important conclusions that could be identified as the main features of the education system in general as they relate to student achievement. The criteria were set and the meta-analysis procedures were carefully followed. This process…
Zlotnik, Alexander; Alfaro, Miguel Cuchí; Pérez, María Carmen Pérez; Gallardo-Antolín, Ascensión; Martínez, Juan Manuel Montero
2016-05-01
The usage of decision support tools in emergency departments, based on predictive models, capable of estimating the probability of admission for patients in the emergency department may give nursing staff the possibility of allocating resources in advance. We present a methodology for developing and building one such system for a large specialized care hospital using a logistic regression and an artificial neural network model using nine routinely collected variables available right at the end of the triage process.A database of 255.668 triaged nonobstetric emergency department presentations from the Ramon y Cajal University Hospital of Madrid, from January 2011 to December 2012, was used to develop and test the models, with 66% of the data used for derivation and 34% for validation, with an ordered nonrandom partition. On the validation dataset areas under the receiver operating characteristic curve were 0.8568 (95% confidence interval, 0.8508-0.8583) for the logistic regression model and 0.8575 (95% confidence interval, 0.8540-0. 8610) for the artificial neural network model. χ Values for Hosmer-Lemeshow fixed "deciles of risk" were 65.32 for the logistic regression model and 17.28 for the artificial neural network model. A nomogram was generated upon the logistic regression model and an automated software decision support system with a Web interface was built based on the artificial neural network model.
Standards for Standardized Logistic Regression Coefficients
ERIC Educational Resources Information Center
Menard, Scott
2011-01-01
Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…
Weapon carrying and psychopathic-like features in a population-based sample of Finnish adolescents.
Saukkonen, Suvi; Laajasalo, Taina; Jokela, Markus; Kivivuori, Janne; Salmi, Venla; Aronen, Eeva T
2016-02-01
We investigated the prevalence of juvenile weapon carrying and psychosocial and personality-related risk factors for carrying different types of weapons in a nationally representative, population-based sample of Finnish adolescents. Specifically, we aimed to investigate psychopathic-like personality features as a risk factor for weapon carrying. The participants were 15-16-year-old adolescents from the Finnish self-report delinquency study (n = 4855). Four different groups were formed based on self-reported weapon carrying: no weapon carrying, carrying knife, gun or other weapon. The associations between psychosocial factors, psychopathic-like features and weapon carrying were examined with multinomial logistic regression analysis. 9% of the participants had carried a weapon in the past 12 months. Adolescents with a history of delinquency, victimization and antisocial friends were more likely to carry weapons in general; however, delinquency and victimization were most strongly related to gun carrying, while perceived peer delinquency (antisocial friends) was most strongly related to carrying a knife. Better academic performance was associated with a reduced likelihood of carrying a gun and knife, while feeling secure correlated with a reduced likelihood of gun carrying only. Psychopathic-like features were related to a higher likelihood of weapon carrying, even after adjusting for other risk factors. The findings of the study suggest that adolescents carrying a weapon have a large cluster of problems in their lives, which may vary based on the type of weapon carried. Furthermore, psychopathic-like features strongly relate to a higher risk of carrying a weapon.
Workers' perceptions of how jobs affect health: a social ecological perspective.
Ettner, S L; Grzywacz, J G
2001-04-01
A national sample of 2,048 workers was asked to rate the impact of their job on their physical and mental health. Ordered logistic regression analyses based on social ecology theory showed that the workers' responses were significantly correlated with objective and subjective features of their jobs, in addition to personality characteristics. Workers who had higher levels of perceived constraints and neuroticism, worked nights or overtime, or reported serious ongoing stress at work or higher job pressure reported more negative effects. Respondents who had a higher level of extraversion, were self-employed, or worked part time or reported greater decision latitude or use of skills on the job reported more positive effects. These findings suggest that malleable features of the work environment are associated with perceived effects of work on health, even after controlling for personality traits and other sources of reporting bias.
Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E
2013-06-01
Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework. © 2013, The International Biometric Society.
Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson
2010-01-01
Summary Objective Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this Review was to assess machine learning alternatives to logistic regression which may accomplish the same goals but with fewer assumptions or greater accuracy. Study Design and Setting We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. Results We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (CART), and meta-classifiers (in particular, boosting). Conclusion While the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and to a lesser extent decision trees (particularly CART) appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. PMID:20630332
Fungible weights in logistic regression.
Jones, Jeff A; Waller, Niels G
2016-06-01
In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Zhan, Liang; Liu, Yashu; Wang, Yalin; Zhou, Jiayu; Jahanshad, Neda; Ye, Jieping; Thompson, Paul M.
2015-01-01
Alzheimer's disease (AD) is a progressive brain disease. Accurate detection of AD and its prodromal stage, mild cognitive impairment (MCI), are crucial. There is also a growing interest in identifying brain imaging biomarkers that help to automatically differentiate stages of Alzheimer's disease. Here, we focused on brain structural networks computed from diffusion MRI and proposed a new feature extraction and classification framework based on higher order singular value decomposition and sparse logistic regression. In tests on publicly available data from the Alzheimer's Disease Neuroimaging Initiative, our proposed framework showed promise in detecting brain network differences that help in classifying different stages of Alzheimer's disease. PMID:26257601
Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson
2010-08-01
Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. Copyright (c) 2010 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Kneringer, Philipp; Dietz, Sebastian; Mayr, Georg J.; Zeileis, Achim
2017-04-01
Low-visibility conditions have a large impact on aviation safety and economic efficiency of airports and airlines. To support decision makers, we develop a statistical probabilistic nowcasting tool for the occurrence of capacity-reducing operations related to low visibility. The probabilities of four different low visibility classes are predicted with an ordered logistic regression model based on time series of meteorological point measurements. Potential predictor variables for the statistical models are visibility, humidity, temperature and wind measurements at several measurement sites. A stepwise variable selection method indicates that visibility and humidity measurements are the most important model inputs. The forecasts are tested with a 30 minute forecast interval up to two hours, which is a sufficient time span for tactical planning at Vienna Airport. The ordered logistic regression models outperform persistence and are competitive with human forecasters.
NASA Technical Reports Server (NTRS)
Smith, Kelly M.; Gay, Robert S.; Stachowiak, Susan J.
2013-01-01
In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.
NASA Technical Reports Server (NTRS)
Smith, Kelly; Gay, Robert; Stachowiak, Susan
2013-01-01
In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles
Bayesian logistic regression in detection of gene-steroid interaction for cancer at PDLIM5 locus.
Wang, Ke-Sheng; Owusu, Daniel; Pan, Yue; Xie, Changchun
2016-06-01
The PDZ and LIM domain 5 (PDLIM5) gene may play a role in cancer, bipolar disorder, major depression, alcohol dependence and schizophrenia; however, little is known about the interaction effect of steroid and PDLIM5 gene on cancer. This study examined 47 single-nucleotide polymorphisms (SNPs) within the PDLIM5 gene in the Marshfield sample with 716 cancer patients (any diagnosed cancer, excluding minor skin cancer) and 2848 noncancer controls. Multiple logistic regression model in PLINK software was used to examine the association of each SNP with cancer. Bayesian logistic regression in PROC GENMOD in SAS statistical software, ver. 9.4 was used to detect gene- steroid interactions influencing cancer. Single marker analysis using PLINK identified 12 SNPs associated with cancer (P< 0.05); especially, SNP rs6532496 revealed the strongest association with cancer (P = 6.84 × 10⁻³); while the next best signal was rs951613 (P = 7.46 × 10⁻³). Classic logistic regression in PROC GENMOD showed that both rs6532496 and rs951613 revealed strong gene-steroid interaction effects (OR=2.18, 95% CI=1.31-3.63 with P = 2.9 × 10⁻³ for rs6532496 and OR=2.07, 95% CI=1.24-3.45 with P = 5.43 × 10⁻³ for rs951613, respectively). Results from Bayesian logistic regression showed stronger interaction effects (OR=2.26, 95% CI=1.2-3.38 for rs6532496 and OR=2.14, 95% CI=1.14-3.2 for rs951613, respectively). All the 12 SNPs associated with cancer revealed significant gene-steroid interaction effects (P < 0.05); whereas 13 SNPs showed gene-steroid interaction effects without main effect on cancer. SNP rs4634230 revealed the strongest gene-steroid interaction effect (OR=2.49, 95% CI=1.5-4.13 with P = 4.0 × 10⁻⁴ based on the classic logistic regression and OR=2.59, 95% CI=1.4-3.97 from Bayesian logistic regression; respectively). This study provides evidence of common genetic variants within the PDLIM5 gene and interactions between PLDIM5 gene polymorphisms and steroid use influencing cancer.
Fei, Yang; Hu, Jian; Gao, Kun; Tu, Jianfeng; Li, Wei-Qin; Wang, Wei
2017-06-01
To construct a radical basis function (RBF) artificial neural networks (ANNs) model to predict the incidence of acute pancreatitis (AP)-induced portal vein thrombosis. The analysis included 353 patients with AP who had admitted between January 2011 and December 2015. RBF ANNs model and logistic regression model were constructed based on eleven factors relevant to AP respectively. Statistical indexes were used to evaluate the value of the prediction in two models. The predict sensitivity, specificity, positive predictive value, negative predictive value and accuracy by RBF ANNs model for PVT were 73.3%, 91.4%, 68.8%, 93.0% and 87.7%, respectively. There were significant differences between the RBF ANNs and logistic regression models in these parameters (P<0.05). In addition, a comparison of the area under receiver operating characteristic curves of the two models showed a statistically significant difference (P<0.05). The RBF ANNs model is more likely to predict the occurrence of PVT induced by AP than logistic regression model. D-dimer, AMY, Hct and PT were important prediction factors of approval for AP-induced PVT. Copyright © 2017 Elsevier Inc. All rights reserved.
Variable Selection for Road Segmentation in Aerial Images
NASA Astrophysics Data System (ADS)
Warnke, S.; Bulatov, D.
2017-05-01
For extraction of road pixels from combined image and elevation data, Wegner et al. (2015) proposed classification of superpixels into road and non-road, after which a refinement of the classification results using minimum cost paths and non-local optimization methods took place. We believed that the variable set used for classification was to a certain extent suboptimal, because many variables were redundant while several features known as useful in Photogrammetry and Remote Sensing are missed. This motivated us to implement a variable selection approach which builds a model for classification using portions of training data and subsets of features, evaluates this model, updates the feature set, and terminates when a stopping criterion is satisfied. The choice of classifier is flexible; however, we tested the approach with Logistic Regression and Random Forests, and taylored the evaluation module to the chosen classifier. To guarantee a fair comparison, we kept the segment-based approach and most of the variables from the related work, but we extended them by additional, mostly higher-level features. Applying these superior features, removing the redundant ones, as well as using more accurately acquired 3D data allowed to keep stable or even to reduce the misclassification error in a challenging dataset.
Logistic models--an odd(s) kind of regression.
Jupiter, Daniel C
2013-01-01
The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression. Copyright © 2013 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
Burnside, Elizabeth S.; Liu, Jie; Wu, Yirong; Onitilo, Adedayo A.; McCarty, Catherine; Page, C. David; Peissig, Peggy; Trentham-Dietz, Amy; Kitchner, Terrie; Fan, Jun; Yuan, Ming
2015-01-01
Rationale and Objectives The discovery of germline genetic variants associated with breast cancer has engendered interest in risk stratification for improved, targeted detection and diagnosis. However, there has yet to be a comparison of the predictive ability of these genetic variants with mammography abnormality descriptors. Materials and Methods Our IRB-approved, HIPAA-compliant study utilized a personalized medicine registry in which participants consented to provide a DNA sample and participate in longitudinal follow-up. In our retrospective, age-matched, case-controlled study of 373 cases and 395 controls who underwent breast biopsy, we collected risk factors selected a priori based on the literature including: demographic variables based on the Gail model, common germline genetic variants, and diagnostic mammography findings according to BI-RADS. We developed predictive models using logistic regression to determine the predictive ability of: 1) demographic variables, 2) 10 selected genetic variants, or 3) mammography BI-RADS features. We evaluated each model in turn by calculating a risk score for each patient using 10-fold cross validation; used this risk estimate to construct ROC curves; and compared the AUC of each using the DeLong method. Results The performance of the regression model using demographic risk factors was not statistically different from the model using genetic variants (p=0.9). The model using mammography features (AUC = 0.689) was superior to both the demographic model (AUC = .598; p<0.001) and the genetic model (AUC = .601; p<0.001). Conclusion BI-RADS features exceeded the ability of demographic and 10 selected germline genetic variants to predict breast cancer in women recommended for biopsy. PMID:26514439
Flash Flood Type Identification within Catchments in Beijing Mountainous Area
NASA Astrophysics Data System (ADS)
Nan, W.
2017-12-01
Flash flood is a common type of disaster in mountainous area, Flash flood with the feature of large flow rate, strong flushing force, destructive power, has periodically caused loss to life and destruction to infrastructure in mountainous area. Beijing as China's political, economic and cultural center, the disaster prevention and control work in Beijing mountainous area has always been concerned widely. According to the transport mechanism, sediment concentration and density, the flash flood type identification within catchment can provide basis for making the hazards prevention and mitigation policy. Taking Beijing as the study area, this paper extracted parameters related to catchment morphological and topography features respectively. By using Bayes discriminant, Logistic regression and Random forest, the catchments in Beijing mountainous area were divided into water floods process, fluvial sediment transport process and debris flows process. The results found that Logistic regression analysis showed the highest accuracy, with the overall accuracy of 88.2%. Bayes discriminant and Random forest had poor prediction effects. This study confirmed the ability of morphological and topography features to identify flash flood process. The circularity ratio, elongation ratio and roughness index can be used to explain the flash flood types effectively, and the Melton ratio and elevation relief ratio also did a good job during the identification, whereas the drainage density seemed not to be an issue at this level of detail. Based on the analysis of spatial patterns of flash flood types, fluvial sediment transport process and debris flow process were the dominant hazards, while the pure water flood process was much less. The catchments dominated by fluvial sediment transport process were mainly distributed in the Yan Mountain region, where the fault belts were relatively dense. The debris flow process prone to occur in the Taihang Mountain region thanks to the abundant coal gangues. The pure water flood process catchments were mainly distributed in the transitional mountain front.
Predicting nest success from habitat features in aspen forests of the central Rocky Mountains
Heather M. Struempf; Deborah M. Finch; Gregory Hayward; Stanley Anderson
2001-01-01
We collected nesting data on bird use of aspen stands in the Routt and Medicine Bow National Forests between 1987 and 1989. We found active nest sites of 28 species of small nongame birds on nine study plots in undisturbed aspen forests. We compared logistic regression models predicting nest success (at least one nestling) from nest-site or stand-level habitat...
Gao, Yingwang; Geng, Jinfeng; Rao, Xiuqin; Ying, Yibin
2016-01-01
Skinning injury on potato tubers is a kind of superficial wound that is generally inflicted by mechanical forces during harvest and postharvest handling operations. Though skinning injury is pervasive and obstructive, its detection is very limited. This study attempted to identify injured skin using two CCD (Charge Coupled Device) sensor-based machine vision technologies, i.e., visible imaging and biospeckle imaging. The identification of skinning injury was realized via exploiting features extracted from varied ROIs (Region of Interests). The features extracted from visible images were pixel-wise color and texture features, while region-wise BA (Biospeckle Activity) was calculated from biospeckle imaging. In addition, the calculation of BA using varied numbers of speckle patterns were compared. Finally, extracted features were implemented into classifiers of LS-SVM (Least Square Support Vector Machine) and BLR (Binary Logistic Regression), respectively. Results showed that color features performed better than texture features in classifying sound skin and injured skin, especially for injured skin stored no less than 1 day, with the average classification accuracy of 90%. Image capturing and processing efficiency can be speeded up in biospeckle imaging, with captured 512 frames reduced to 125 frames. Classification results obtained based on the feature of BA were acceptable for early skinning injury stored within 1 day, with the accuracy of 88.10%. It is concluded that skinning injury can be recognized by visible and biospeckle imaging during different stages. Visible imaging has the aptitude in recognizing stale skinning injury, while fresh injury can be discriminated by biospeckle imaging. PMID:27763555
Gao, Yingwang; Geng, Jinfeng; Rao, Xiuqin; Ying, Yibin
2016-10-18
Skinning injury on potato tubers is a kind of superficial wound that is generally inflicted by mechanical forces during harvest and postharvest handling operations. Though skinning injury is pervasive and obstructive, its detection is very limited. This study attempted to identify injured skin using two CCD (Charge Coupled Device) sensor-based machine vision technologies, i.e., visible imaging and biospeckle imaging. The identification of skinning injury was realized via exploiting features extracted from varied ROIs (Region of Interests). The features extracted from visible images were pixel-wise color and texture features, while region-wise BA (Biospeckle Activity) was calculated from biospeckle imaging. In addition, the calculation of BA using varied numbers of speckle patterns were compared. Finally, extracted features were implemented into classifiers of LS-SVM (Least Square Support Vector Machine) and BLR (Binary Logistic Regression), respectively. Results showed that color features performed better than texture features in classifying sound skin and injured skin, especially for injured skin stored no less than 1 day, with the average classification accuracy of 90%. Image capturing and processing efficiency can be speeded up in biospeckle imaging, with captured 512 frames reduced to 125 frames. Classification results obtained based on the feature of BA were acceptable for early skinning injury stored within 1 day, with the accuracy of 88.10%. It is concluded that skinning injury can be recognized by visible and biospeckle imaging during different stages. Visible imaging has the aptitude in recognizing stale skinning injury, while fresh injury can be discriminated by biospeckle imaging.
NASA Astrophysics Data System (ADS)
Maas, A.; Alrajhi, M.; Alobeid, A.; Heipke, C.
2017-05-01
Updating topographic geospatial databases is often performed based on current remotely sensed images. To automatically extract the object information (labels) from the images, supervised classifiers are being employed. Decisions to be taken in this process concern the definition of the classes which should be recognised, the features to describe each class and the training data necessary in the learning part of classification. With a view to large scale topographic databases for fast developing urban areas in the Kingdom of Saudi Arabia we conducted a case study, which investigated the following two questions: (a) which set of features is best suitable for the classification?; (b) what is the added value of height information, e.g. derived from stereo imagery? Using stereoscopic GeoEye and Ikonos satellite data we investigate these two questions based on our research on label tolerant classification using logistic regression and partly incorrect training data. We show that in between five and ten features can be recommended to obtain a stable solution, that height information consistently yields an improved overall classification accuracy of about 5%, and that label noise can be successfully modelled and thus only marginally influences the classification results.
Ultrasound features of purulent skin and soft tissue infection without abscess.
Nelson, Courtney E; Chen, Aaron E; Bellah, Richard D; Biko, David M; Ho-Fung, Victor M; Francavilla, Michael L; Xiao, Rui; Kaplan, Summer L
2018-06-06
Ultrasound (US) aids clinical management of skin and soft tissue infection (SSTI) by differentiating non-purulent cellulitis from abscess. However, purulent SSTI may be present without abscess. Guidelines recommend incision and drainage (I & D) for purulent SSTI, but US descriptions of purulent SSTI without abscess are lacking. We retrospectively reviewed pediatric emergency department patients with US of the buttock read as negative for abscess. We identified US features of SSTI with adequate interobserver agreement (kappa > 0.45). Six independent observers then ranked presence or absence of these features on US exams. We studied association between US features and positive wound culture using logistic regression models (significance at p < 0.05). Of 217 children, 35 patients (16%) had cultures positive for pathogens by 8 h after US and 61 patients (32%) had cultures positive by 48 h after US. We found kappa > 0.45 for focal collection > 1.0 cm (κ = 0.57), hyperemia (κ = 0.57), swirling with compression (κ = 0.52), posterior acoustic enhancement (κ = 0.47), and cobblestoning or branching interstitial fluid (κ = 0.45). Only cobblestoning or interstitial fluid was associated with positive wound cultures in logistic regression models at 8 and 48 h. Cobblestoning or interstitial fluid on US may indicate presence of culture-positive, purulent SSTI in patients without US appearance of abscess. Although our study has limitations due to its retrospective design, this US appearance should alert imagers that the patient may benefit from early I & D.
Evaluating the Locational Attributes of Education Management Organizations (EMOs)
ERIC Educational Resources Information Center
Gulosino, Charisse; Miron, Gary
2017-01-01
This study uses logistic and multinomial logistic regression models to analyze neighborhood factors affecting EMO (Education Management Organization)-operated schools' locational attributes (using census tracts) in 41 states for the 2014-2015 school year. Our research combines market-based school reform, institutional theory, and resource…
Piguet, Olivier; Leyton, Cristian E; Gleeson, Liam D; Hoon, Chris; Hodges, John R
2015-01-01
The two non-semantic variants of primary progressive aphasia (PPA), nonfluent/agrammatic PPA (nfv-PPA) and logopenic variant PPA (lv-PPA), share language features despite their different underlying pathology, and may be difficult to distinguish for non-language experts. To improve diagnostic accuracy of nfv-PPA and lv-PPA using tasks measuring non-language cognition and emotion processing. Thirty-eight dementia patients meeting diagnostic criteria for PPA (nfv-PPA 20, lv-PPA 18) and 21 matched healthy Controls underwent a comprehensive assessment of cognition and emotion processing, as well as a high-resolution structural MRI and a PiB-PET scan, a putative biomarker of Alzheimer's disease. Task performances were compared between the groups and those found to differ significantly were entered into a logistic regression analysis. Analyses revealed a double dissociation between nfv-PPA and lv-PPA. nfv-PPA exhibited significant emotion processing disturbance compared to lv-PPA and Controls. In contrast, only the lv-PPA group was significantly impaired on tasks of episodic memory. Logistic regression analyses showed that 87% of patients were correctly classified using emotion processing and episodic memory composite scores, together with a measure of visuospatial ability. Non-language presenting features can help differentiate between the two non-semantic PPA syndromes, with a double dissociation observed on tasks of episodic memory and emotion processing. Based on performance on these tasks, we propose a decision tree as a complementary method to differentiate between the two non-semantic variants. These findings have important clinical implications, with identification of patients who may potentially benefit existing therapeutic interventions currently available for Alzheimer's disease.
Accurate estimation of short read mapping quality for next-generation genome sequencing
Ruffalo, Matthew; Koyutürk, Mehmet; Ray, Soumya; LaFramboise, Thomas
2012-01-01
Motivation: Several software tools specialize in the alignment of short next-generation sequencing reads to a reference sequence. Some of these tools report a mapping quality score for each alignment—in principle, this quality score tells researchers the likelihood that the alignment is correct. However, the reported mapping quality often correlates weakly with actual accuracy and the qualities of many mappings are underestimated, encouraging the researchers to discard correct mappings. Further, these low-quality mappings tend to correlate with variations in the genome (both single nucleotide and structural), and such mappings are important in accurately identifying genomic variants. Approach: We develop a machine learning tool, LoQuM (LOgistic regression tool for calibrating the Quality of short read mappings, to assign reliable mapping quality scores to mappings of Illumina reads returned by any alignment tool. LoQuM uses statistics on the read (base quality scores reported by the sequencer) and the alignment (number of matches, mismatches and deletions, mapping quality score returned by the alignment tool, if available, and number of mappings) as features for classification and uses simulated reads to learn a logistic regression model that relates these features to actual mapping quality. Results: We test the predictions of LoQuM on an independent dataset generated by the ART short read simulation software and observe that LoQuM can ‘resurrect’ many mappings that are assigned zero quality scores by the alignment tools and are therefore likely to be discarded by researchers. We also observe that the recalibration of mapping quality scores greatly enhances the precision of called single nucleotide polymorphisms. Availability: LoQuM is available as open source at http://compbio.case.edu/loqum/. Contact: matthew.ruffalo@case.edu. PMID:22962451
Determination of riverbank erosion probability using Locally Weighted Logistic Regression
NASA Astrophysics Data System (ADS)
Ioannidou, Elena; Flori, Aikaterini; Varouchakis, Emmanouil A.; Giannakis, Georgios; Vozinaki, Anthi Eirini K.; Karatzas, George P.; Nikolaidis, Nikolaos
2015-04-01
Riverbank erosion is a natural geomorphologic process that affects the fluvial environment. The most important issue concerning riverbank erosion is the identification of the vulnerable locations. An alternative to the usual hydrodynamic models to predict vulnerable locations is to quantify the probability of erosion occurrence. This can be achieved by identifying the underlying relations between riverbank erosion and the geomorphological or hydrological variables that prevent or stimulate erosion. Thus, riverbank erosion can be determined by a regression model using independent variables that are considered to affect the erosion process. The impact of such variables may vary spatially, therefore, a non-stationary regression model is preferred instead of a stationary equivalent. Locally Weighted Regression (LWR) is proposed as a suitable choice. This method can be extended to predict the binary presence or absence of erosion based on a series of independent local variables by using the logistic regression model. It is referred to as Locally Weighted Logistic Regression (LWLR). Logistic regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable (e.g. binary response) based on one or more predictor variables. The method can be combined with LWR to assign weights to local independent variables of the dependent one. LWR allows model parameters to vary over space in order to reflect spatial heterogeneity. The probabilities of the possible outcomes are modelled as a function of the independent variables using a logistic function. Logistic regression measures the relationship between a categorical dependent variable and, usually, one or several continuous independent variables by converting the dependent variable to probability scores. Then, a logistic regression is formed, which predicts success or failure of a given binary variable (e.g. erosion presence or absence) for any value of the independent variables. The erosion occurrence probability can be calculated in conjunction with the model deviance regarding the independent variables tested. The most straightforward measure for goodness of fit is the G statistic. It is a simple and effective way to study and evaluate the Logistic Regression model efficiency and the reliability of each independent variable. The developed statistical model is applied to the Koiliaris River Basin on the island of Crete, Greece. Two datasets of river bank slope, river cross-section width and indications of erosion were available for the analysis (12 and 8 locations). Two different types of spatial dependence functions, exponential and tricubic, were examined to determine the local spatial dependence of the independent variables at the measurement locations. The results show a significant improvement when the tricubic function is applied as the erosion probability is accurately predicted at all eight validation locations. Results for the model deviance show that cross-section width is more important than bank slope in the estimation of erosion probability along the Koiliaris riverbanks. The proposed statistical model is a useful tool that quantifies the erosion probability along the riverbanks and can be used to assist managing erosion and flooding events. Acknowledgements This work is part of an on-going THALES project (CYBERSENSORS - High Frequency Monitoring System for Integrated Water Resources Management of Rivers). The project has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES. Investing in knowledge society through the European Social Fund.
Reboussin, Beth A; Preisser, John S; Song, Eun-Young; Wolfson, Mark
2012-07-01
Under-age drinking is an enormous public health issue in the USA. Evidence that community level structures may impact on under-age drinking has led to a proliferation of efforts to change the environment surrounding the use of alcohol. Although the focus of these efforts is to reduce drinking by individual youths, environmental interventions are typically implemented at the community level with entire communities randomized to the same intervention condition. A distinct feature of these trials is the tendency of the behaviours of individuals residing in the same community to be more alike than that of others residing in different communities, which is herein called 'clustering'. Statistical analyses and sample size calculations must account for this clustering to avoid type I errors and to ensure an appropriately powered trial. Clustering itself may also be of scientific interest. We consider the alternating logistic regressions procedure within the population-averaged modelling framework to estimate the effect of a law enforcement intervention on the prevalence of under-age drinking behaviours while modelling the clustering at multiple levels, e.g. within communities and within neighbourhoods nested within communities, by using pairwise odds ratios. We then derive sample size formulae for estimating intervention effects when planning a post-test-only or repeated cross-sectional community-randomized trial using the alternating logistic regressions procedure.
Evaluating the perennial stream using logistic regression in central Taiwan
NASA Astrophysics Data System (ADS)
Ruljigaljig, T.; Cheng, Y. S.; Lin, H. I.; Lee, C. H.; Yu, T. T.
2014-12-01
This study produces a perennial stream head potential map, based on a logistic regression method with a Geographic Information System (GIS). Perennial stream initiation locations, indicates the location of the groundwater and surface contact, were identified in the study area from field survey. The perennial stream potential map in central Taiwan was constructed using the relationship between perennial stream and their causative factors, such as Catchment area, slope gradient, aspect, elevation, groundwater recharge and precipitation. Here, the field surveys of 272 streams were determined in the study area. The areas under the curve for logistic regression methods were calculated as 0.87. The results illustrate the importance of catchment area and groundwater recharge as key factors within the model. The results obtained from the model within the GIS were then used to produce a map of perennial stream and estimate the location of perennial stream head.
Neural network modeling for surgical decisions on traumatic brain injury patients.
Li, Y C; Liu, L; Chiu, W T; Jian, W S
2000-01-01
Computerized medical decision support systems have been a major research topic in recent years. Intelligent computer programs were implemented to aid physicians and other medical professionals in making difficult medical decisions. This report compares three different mathematical models for building a traumatic brain injury (TBI) medical decision support system (MDSS). These models were developed based on a large TBI patient database. This MDSS accepts a set of patient data such as the types of skull fracture, Glasgow Coma Scale (GCS), episode of convulsion and return the chance that a neurosurgeon would recommend an open-skull surgery for this patient. The three mathematical models described in this report including a logistic regression model, a multi-layer perceptron (MLP) neural network and a radial-basis-function (RBF) neural network. From the 12,640 patients selected from the database. A randomly drawn 9480 cases were used as the training group to develop/train our models. The other 3160 cases were in the validation group which we used to evaluate the performance of these models. We used sensitivity, specificity, areas under receiver-operating characteristics (ROC) curve and calibration curves as the indicator of how accurate these models are in predicting a neurosurgeon's decision on open-skull surgery. The results showed that, assuming equal importance of sensitivity and specificity, the logistic regression model had a (sensitivity, specificity) of (73%, 68%), compared to (80%, 80%) from the RBF model and (88%, 80%) from the MLP model. The resultant areas under ROC curve for logistic regression, RBF and MLP neural networks are 0.761, 0.880 and 0.897, respectively (P < 0.05). Among these models, the logistic regression has noticeably poorer calibration. This study demonstrated the feasibility of applying neural networks as the mechanism for TBI decision support systems based on clinical databases. The results also suggest that neural networks may be a better solution for complex, non-linear medical decision support systems than conventional statistical techniques such as logistic regression.
The purpose of this report is to provide a reference manual that could be used by investigators for making informed use of logistic regression using two methods (standard logistic regression and MARS). The details for analyses of relationships between a dependent binary response ...
Predicting U.S. Army Reserve Unit Manning Using Market Demographics
2015-06-01
develops linear regression , classification tree, and logistic regression models to determine the ability of the location to support manning requirements... logistic regression model delivers predictive results that allow decision-makers to identify locations with a high probability of meeting unit...manning requirements. The recommendation of this thesis is that the USAR implement the logistic regression model. 14. SUBJECT TERMS U.S
ERIC Educational Resources Information Center
Chen, Chau-Kuang
2005-01-01
Logistic and Cox regression methods are practical tools used to model the relationships between certain student learning outcomes and their relevant explanatory variables. The logistic regression model fits an S-shaped curve into a binary outcome with data points of zero and one. The Cox regression model allows investigators to study the duration…
Mahrooghy, Majid; Ashraf, Ahmed B; Daye, Dania; McDonald, Elizabeth S; Rosen, Mark; Mies, Carolyn; Feldman, Michael; Kontos, Despina
2015-06-01
Heterogeneity in cancer can affect response to therapy and patient prognosis. Histologic measures have classically been used to measure heterogeneity, although a reliable noninvasive measurement is needed both to establish baseline risk of recurrence and monitor response to treatment. Here, we propose using spatiotemporal wavelet kinetic features from dynamic contrast-enhanced magnetic resonance imaging to quantify intratumor heterogeneity in breast cancer. Tumor pixels are first partitioned into homogeneous subregions using pharmacokinetic measures. Heterogeneity wavelet kinetic (HetWave) features are then extracted from these partitions to obtain spatiotemporal patterns of the wavelet coefficients and the contrast agent uptake. The HetWave features are evaluated in terms of their prognostic value using a logistic regression classifier with genetic algorithm wrapper-based feature selection to classify breast cancer recurrence risk as determined by a validated gene expression assay. Receiver operating characteristic analysis and area under the curve (AUC) are computed to assess classifier performance using leave-one-out cross validation. The HetWave features outperform other commonly used features (AUC = 0.88 HetWave versus 0.70 standard features). The combination of HetWave and standard features further increases classifier performance (AUCs 0.94). The rate of the spatial frequency pattern over the pharmacokinetic partitions can provide valuable prognostic information. HetWave could be a powerful feature extraction approach for characterizing tumor heterogeneity, providing valuable prognostic information.
Yusuf, O B; Bamgboye, E A; Afolabi, R F; Shodimu, M A
2014-09-01
Logistic regression model is widely used in health research for description and predictive purposes. Unfortunately, most researchers are sometimes not aware that the underlying principles of the techniques have failed when the algorithm for maximum likelihood does not converge. Young researchers particularly postgraduate students may not know why separation problem whether quasi or complete occurs, how to identify it and how to fix it. This study was designed to critically evaluate convergence issues in articles that employed logistic regression analysis published in an African Journal of Medicine and medical sciences between 2004 and 2013. Problems of quasi or complete separation were described and were illustrated with the National Demographic and Health Survey dataset. A critical evaluation of articles that employed logistic regression was conducted. A total of 581 articles was reviewed, of which 40 (6.9%) used binary logistic regression. Twenty-four (60.0%) stated the use of logistic regression model in the methodology while none of the articles assessed model fit. Only 3 (12.5%) properly described the procedures. Of the 40 that used the logistic regression model, the problem of convergence occurred in 6 (15.0%) of the articles. Logistic regression tends to be poorly reported in studies published between 2004 and 2013. Our findings showed that the procedure may not be well understood by researchers since very few described the process in their reports and may be totally unaware of the problem of convergence or how to deal with it.
Logistic Regression: Concept and Application
ERIC Educational Resources Information Center
Cokluk, Omay
2010-01-01
The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
Visual attention based bag-of-words model for image classification
NASA Astrophysics Data System (ADS)
Wang, Qiwei; Wan, Shouhong; Yue, Lihua; Wang, Che
2014-04-01
Bag-of-words is a classical method for image classification. The core problem is how to count the frequency of the visual words and what visual words to select. In this paper, we propose a visual attention based bag-of-words model (VABOW model) for image classification task. The VABOW model utilizes visual attention method to generate a saliency map, and uses the saliency map as a weighted matrix to instruct the statistic process for the frequency of the visual words. On the other hand, the VABOW model combines shape, color and texture cues and uses L1 regularization logistic regression method to select the most relevant and most efficient features. We compare our approach with traditional bag-of-words based method on two datasets, and the result shows that our VABOW model outperforms the state-of-the-art method for image classification.
Modeling of geogenic radon in Switzerland based on ordered logistic regression.
Kropat, Georg; Bochud, François; Murith, Christophe; Palacios Gruson, Martha; Baechler, Sébastien
2017-01-01
The estimation of the radon hazard of a future construction site should ideally be based on the geogenic radon potential (GRP), since this estimate is free of anthropogenic influences and building characteristics. The goal of this study was to evaluate terrestrial gamma dose rate (TGD), geology, fault lines and topsoil permeability as predictors for the creation of a GRP map based on logistic regression. Soil gas radon measurements (SRC) are more suited for the estimation of GRP than indoor radon measurements (IRC) since the former do not depend on ventilation and heating habits or building characteristics. However, SRC have only been measured at a few locations in Switzerland. In former studies a good correlation between spatial aggregates of IRC and SRC has been observed. That's why we used IRC measurements aggregated on a 10 km × 10 km grid to calibrate an ordered logistic regression model for geogenic radon potential (GRP). As predictors we took into account terrestrial gamma doserate, regrouped geological units, fault line density and the permeability of the soil. The classification success rate of the model results to 56% in case of the inclusion of all 4 predictor variables. Our results suggest that terrestrial gamma doserate and regrouped geological units are more suited to model GRP than fault line density and soil permeability. Ordered logistic regression is a promising tool for the modeling of GRP maps due to its simplicity and fast computation time. Future studies should account for additional variables to improve the modeling of high radon hazard in the Jura Mountains of Switzerland. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.
Prediction model for the return to work of workers with injuries in Hong Kong.
Xu, Yanwen; Chan, Chetwyn C H; Lo, Karen Hui Yu-Ling; Tang, Dan
2008-01-01
This study attempts to formulate a prediction model of return to work for a group of workers who have been suffering from chronic pain and physical injury while also being out of work in Hong Kong. The study used Case-based Reasoning (CBR) method, and compared the result with the statistical method of logistic regression model. The database of the algorithm of CBR was composed of 67 cases who were also used in the logistic regression model. The testing cases were 32 participants who had a similar background and characteristics to those in the database. The methods of setting constraints and Euclidean distance metric were used in CBR to search the closest cases to the trial case based on the matrix. The usefulness of the algorithm was tested on 32 new participants, and the accuracy of predicting return to work outcomes was 62.5%, which was no better than the 71.2% accuracy derived from the logistic regression model. The results of the study would enable us to have a better understanding of the CBR applied in the field of occupational rehabilitation by comparing with the conventional regression analysis. The findings would also shed light on the development of relevant interventions for the return-to-work process of these workers.
Actively learning to distinguish suspicious from innocuous anomalies in a batch of vehicle tracks
NASA Astrophysics Data System (ADS)
Qiu, Zhicong; Miller, David J.; Stieber, Brian; Fair, Tim
2014-06-01
We investigate the problem of actively learning to distinguish between two sets of anomalous vehicle tracks, innocuous" and suspicious", starting from scratch, without any initial examples of suspicious" and with no prior knowledge of what an operator would deem suspicious. This two-class problem is challenging because it is a priori unknown which track features may characterize the suspicious class. Furthermore, there is inherent imbalance in the sizes of the labeled innocuous" and suspicious" sets, even after some suspicious examples are identified. We present a comprehensive solution wherein a classifier learns to discriminate suspicious from innocuous based on derived p-value track features. Through active learning, our classifier thus learns the types of anomalies on which to base its discrimination. Our solution encompasses: i) judicious choice of kinematic p-value based features conditioned on the road of origin, along with more explicit features that capture unique vehicle behavior (e.g. U-turns); ii) novel semi-supervised learning that exploits information in the unlabeled (test batch) tracks, and iii) evaluation of several classifier models (logistic regression, SVMs). We find that two active labeling streams are necessary in practice in order to have efficient classifier learning while also forwarding (for labeling) the most actionable tracks. Experiments on wide-area motion imagery (WAMI) tracks, extracted via a system developed by Toyon Research Corporation, demonstrate the strong ROC AUC performance of our system, with sparing use of operator-based active labeling.
Logistic regression applied to natural hazards: rare event logistic regression with replications
NASA Astrophysics Data System (ADS)
Guns, M.; Vanacker, V.
2012-06-01
Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.
Color normalization for robust evaluation of microscopy images
NASA Astrophysics Data System (ADS)
Švihlík, Jan; Kybic, Jan; Habart, David
2015-09-01
This paper deals with color normalization of microscopy images of Langerhans islets in order to increase robustness of the islet segmentation to illumination changes. The main application is automatic quantitative evaluation of the islet parameters, useful for determining the feasibility of islet transplantation in diabetes. First, background illumination inhomogeneity is compensated and a preliminary foreground/background segmentation is performed. The color normalization itself is done in either lαβ or logarithmic RGB color spaces, by comparison with a reference image. The color-normalized images are segmented using color-based features and pixel-wise logistic regression, trained on manually labeled images. Finally, relevant statistics such as the total islet area are evaluated in order to determine the success likelihood of the transplantation.
Development and evaluation of an ambulatory stress monitor based on wearable sensors.
Choi, Jongyoon; Ahmed, Beena; Gutierrez-Osuna, Ricardo
2012-03-01
Chronic stress is endemic to modern society. However, as it is unfeasible for physicians to continuously monitor stress levels, its diagnosis is nontrivial. Wireless body sensor networks offer opportunities to ubiquitously detect and monitor mental stress levels, enabling improved diagnosis, and early treatment. This article describes the development of a wearable sensor platform to monitor a number of physiological correlates of mental stress. We discuss tradeoffs in both system design and sensor selection to balance information content and wearability. Using experimental signals collected from the wearable sensor, we describe a selected number of physiological features that show good correlation with mental stress. In particular, we propose a new spectral feature that estimates the balance of the autonomic nervous system by combining information from the power spectral density of respiration and heart rate variability. We validate the effectiveness of our approach on a binary discrimination problem when subjects are placed under two psychophysiological conditions: mental stress and relaxation. When used in a logistic regression model, our feature set is able to discriminate between these two mental states with a success rate of 81% across subjects. © 2012 IEEE
Ohshige, K; Morio, S; Mizushima, S; Kitamura, K; Tajima, K; Ito, A; Suyama, A; Usuku, S; Phalla, T; Leng, H B; Sopheab, H; Eab, B; Soda, K
1999-01-01
To describe epidemiological features of HIV prevalence among female commercial sex workers (CSWs) in Cambodia, a cross-sectional study using a questionnaire study and serological tests was carried out from December 1997 to January 1998. We report the main results of the analyses of serological tests in this article. Two hundred ninety six CSWs working in Sisophon and Poi Pet, located in northwest Cambodia, Bantey Mean Chey province, were recruited for interview based on a questionnaire on sexual behavior, and serological tests. The blood samples were examined for HIV antibody, Chlamydia trachomatis IgG antibody, TPHA, Hepatitis B surface antigen, and Hepatitis B surface antibody. The relationship between HIV and the other STD's was analyzed by using logistic regression analysis. The HIV seroprevalence rate was 43.9% (130 out of 296). The seropositive rate of Chlamydia trachomatis IgG antibody (C.T.-IgG-Ab) was 73.3% (217 out of 296). Logistic regression analysis showed a significant association between C.T.-IgG-Ab positive and HIV prevalence. (Odds Ratio: 5.33; 95% Confidence Interval, 2.82-10.07). This study suggests that the existence of Chlamydia trachomatis is closely related with HIV prevalence among CSWs in Cambodia. Other STDs may also increase susceptibility to male-to-female sexual transmission of HIV. This suggests that appropriate prevention against STDs will be needed for the control of HIV prevalence in Cambodia.
Identification of Alfalfa Leaf Diseases Using Image Recognition Technology
Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang
2016-01-01
Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by artificial cutting from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model. For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease. PMID:27977767
Identification of Alfalfa Leaf Diseases Using Image Recognition Technology.
Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang
2016-01-01
Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by artificial cutting from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model. For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease.
Association between mammogram density and background parenchymal enhancement of breast MRI
NASA Astrophysics Data System (ADS)
Aghaei, Faranak; Danala, Gopichandh; Wang, Yunzhi; Zarafshani, Ali; Qian, Wei; Liu, Hong; Zheng, Bin
2018-02-01
Breast density has been widely considered as an important risk factor for breast cancer. The purpose of this study is to examine the association between mammogram density results and background parenchymal enhancement (BPE) of breast MRI. A dataset involving breast MR images was acquired from 65 high-risk women. Based on mammography density (BIRADS) results, the dataset was divided into two groups of low and high breast density cases. The Low-Density group has 15 cases with mammographic density (BIRADS 1 and 2), while the High-density group includes 50 cases, which were rated by radiologists as mammographic density BIRADS 3 and 4. A computer-aided detection (CAD) scheme was applied to segment and register breast regions depicted on sequential images of breast MRI scans. CAD scheme computed 20 global BPE features from the entire two breast regions, separately from the left and right breast region, as well as from the bilateral difference between left and right breast regions. An image feature selection method namely, CFS method, was applied to remove the most redundant features and select optimal features from the initial feature pool. Then, a logistic regression classifier was built using the optimal features to predict the mammogram density from the BPE features. Using a leave-one-case-out validation method, the classifier yields the accuracy of 82% and area under ROC curve, AUC=0.81+/-0.09. Also, the box-plot based analysis shows a negative association between mammogram density results and BPE features in the MRI images. This study demonstrated a negative association between mammogram density and BPE of breast MRI images.
Elahian, Bahareh; Yeasin, Mohammed; Mudigoudar, Basanagoud; Wheless, James W; Babajani-Feremi, Abbas
2017-10-01
Using a novel technique based on phase locking value (PLV), we investigated the potential for features extracted from electrocorticographic (ECoG) recordings to serve as biomarkers to identify the seizure onset zone (SOZ). We computed the PLV between the phase of the amplitude of high gamma activity (80-150Hz) and the phase of lower frequency rhythms (4-30Hz) from ECoG recordings obtained from 10 patients with epilepsy (21 seizures). We extracted five features from the PLV and used a machine learning approach based on logistic regression to build a model that classifies electrodes as SOZ or non-SOZ. More than 96% of electrodes identified as the SOZ by our algorithm were within the resected area in six seizure-free patients. In four non-seizure-free patients, more than 31% of the identified SOZ electrodes by our algorithm were outside the resected area. In addition, we observed that the seizure outcome in non-seizure-free patients correlated with the number of non-resected SOZ electrodes identified by our algorithm. This machine learning approach, based on features extracted from the PLV, effectively identified electrodes within the SOZ. The approach has the potential to assist clinicians in surgical decision-making when pre-surgical intracranial recordings are utilized. Copyright © 2017 British Epilepsy Association. Published by Elsevier Ltd. All rights reserved.
Risk estimation using probability machines
2014-01-01
Background Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios. Results We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented. Conclusions The models we propose make no assumptions about the data structure, and capture the patterns in the data by just specifying the predictors involved and not any particular model structure. So they do not run the same risks of model mis-specification and the resultant estimation biases as a logistic model. This methodology, which we call a “risk machine”, will share properties from the statistical machine that it is derived from. PMID:24581306
Risk estimation using probability machines.
Dasgupta, Abhijit; Szymczak, Silke; Moore, Jason H; Bailey-Wilson, Joan E; Malley, James D
2014-03-01
Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios. We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented. The models we propose make no assumptions about the data structure, and capture the patterns in the data by just specifying the predictors involved and not any particular model structure. So they do not run the same risks of model mis-specification and the resultant estimation biases as a logistic model. This methodology, which we call a "risk machine", will share properties from the statistical machine that it is derived from.
ERIC Educational Resources Information Center
Drabinová, Adéla; Martinková, Patrícia
2017-01-01
In this article we present a general approach not relying on item response theory models (non-IRT) to detect differential item functioning (DIF) in dichotomous items with presence of guessing. The proposed nonlinear regression (NLR) procedure for DIF detection is an extension of method based on logistic regression. As a non-IRT approach, NLR can…
NASA Astrophysics Data System (ADS)
Esquinca, Alberto
This is a study of language use in the context of an inquiry-based science curriculum in which conceptual understanding ratings are used split texts into groups of "successful" and "unsuccessful" texts. "Successful" texts could include known features of science language. 420 texts generated by students in 14 classrooms from three school districts, culled from a prior study on the effectiveness of science notebooks to assess understanding, in addition to the aforementioned ratings are the data sources. In science notebooks, students write in the process of learning (here, a unit on electricity). The analytical framework is systemic functional linguistics (Halliday and Matthiessen, 2004; Eggins, 2004), specifically the concepts of genre, register and nominalization. Genre classification involves an analysis of the purpose and register features in the text (Schleppegrell, 2004). The use of features of the scientific academic register, namely the use relational processes and nominalization (Halliday and Martin, 1993), requires transitivity analysis and noun analysis. Transitivity analysis, consisting of the identification of the process type, is conducted on 4737 ranking clauses. A manual count of each noun used in the corpus allows for a typology of nouns. Four school science genres, procedures, procedural recounts reports and explanations, are found. Most texts (85.4%) are factual, and 14.1% are classified as explanations, the analytical genre. Logistic regression analysis indicates that there is no significant probability that the texts classified as explanation are placed in the group of "successful" texts. In addition, material process clauses predominate in the corpus, followed by relational process clauses. Results of a logistic regression analysis indicate that there is a significant probability (Chi square = 15.23, p < .0001) that texts with a high rate of relational processes are placed in the group of "successful" texts. In addition, 59.5% of 6511 nouns are references to physical materials, followed by references to abstract concepts (35.54%). Only two of the concept nouns were found to be nominalized referents in definition model sentences. In sum, the corpus has recognizable genres and features science language, and relational processes are more prevalent in "successful" texts. However, the pervasive feature of science language, nominalization, is scarce.
A Methodology for Generating Placement Rules that Utilizes Logistic Regression
ERIC Educational Resources Information Center
Wurtz, Keith
2008-01-01
The purpose of this article is to provide the necessary tools for institutional researchers to conduct a logistic regression analysis and interpret the results. Aspects of the logistic regression procedure that are necessary to evaluate models are presented and discussed with an emphasis on cutoff values and choosing the appropriate number of…
John Hogland; Nedret Billor; Nathaniel Anderson
2013-01-01
Discriminant analysis, referred to as maximum likelihood classification within popular remote sensing software packages, is a common supervised technique used by analysts. Polytomous logistic regression (PLR), also referred to as multinomial logistic regression, is an alternative classification approach that is less restrictive, more flexible, and easy to interpret. To...
Jiang, Feng; Han, Ji-zhong
2018-01-01
Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted as the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods. PMID:29623088
Yu, Xu; Lin, Jun-Yu; Jiang, Feng; Du, Jun-Wei; Han, Ji-Zhong
2018-01-01
Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted as the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods.
Killeen, Peter R
2015-07-01
The generalized matching law (GML) is reconstructed as a logistic regression equation that privileges no particular value of the sensitivity parameter, a. That value will often approach 1 due to the feedback that drives switching that is intrinsic to most concurrent schedules. A model of that feedback reproduced some features of concurrent data. The GML is a law only in the strained sense that any equation that maps data is a law. The machine under the hood of matching is in all likelihood the very law that was displaced by the Matching Law. It is now time to return the Law of Effect to centrality in our science. © Society for the Experimental Analysis of Behavior.
A random forest model based classification scheme for neonatal amplitude-integrated EEG.
Chen, Weiting; Wang, Yu; Cao, Guitao; Chen, Guoqiang; Gu, Qiufang
2014-01-01
Modern medical advances have greatly increased the survival rate of infants, while they remain in the higher risk group for neurological problems later in life. For the infants with encephalopathy or seizures, identification of the extent of brain injury is clinically challenging. Continuous amplitude-integrated electroencephalography (aEEG) monitoring offers a possibility to directly monitor the brain functional state of the newborns over hours, and has seen an increasing application in neonatal intensive care units (NICUs). This paper presents a novel combined feature set of aEEG and applies random forest (RF) method to classify aEEG tracings. To that end, a series of experiments were conducted on 282 aEEG tracing cases (209 normal and 73 abnormal ones). Basic features, statistic features and segmentation features were extracted from both the tracing as a whole and the segmented recordings, and then form a combined feature set. All the features were sent to a classifier afterwards. The significance of feature, the data segmentation, the optimization of RF parameters, and the problem of imbalanced datasets were examined through experiments. Experiments were also done to evaluate the performance of RF on aEEG signal classifying, compared with several other widely used classifiers including SVM-Linear, SVM-RBF, ANN, Decision Tree (DT), Logistic Regression(LR), ML, and LDA. The combined feature set can better characterize aEEG signals, compared with basic features, statistic features and segmentation features respectively. With the combined feature set, the proposed RF-based aEEG classification system achieved a correct rate of 92.52% and a high F1-score of 95.26%. Among all of the seven classifiers examined in our work, the RF method got the highest correct rate, sensitivity, specificity, and F1-score, which means that RF outperforms all of the other classifiers considered here. The results show that the proposed RF-based aEEG classification system with the combined feature set is efficient and helpful to better detect the brain disorders in newborns.
Cross-sectional study on risk factors of HIV among female commercial sex workers in Cambodia.
Ohshige, K.; Morio, S.; Mizushima, S.; Kitamura, K.; Tajima, K.; Ito, A.; Suyama, A.; Usuku, S.; Saphonn, V.; Heng, S.; Hor, L. B.; Tia, P.; Soda, K.
2000-01-01
To describe epidemiological features on HIV prevalence among female commercial sex workers (CSWs), a cross-sectional study on sexual behaviour and serological prevalence was carried out in Cambodia. The CSWs were interviewed on their demographic characters and behaviour and their blood samples were taken for testing on sexually transmitted diseases, including HIV, Chlamydia trachomatis, syphilis, and hepatitis B. Associations between risk factors and HIV seropositivity were analysed. High seroprevalence of HIV and Chlamydia trachomatis IgG antibody (CT-IgG-Ab) was shown among the CSWs (54 and 81.7%, respectively). Univariate logistic regression analyses showed an association between HIV seropositivity and age, duration of prostitution, the number of clients per day and CT-IgG-Ab. Especially, high-titre chlamydial seropositivity showed a strong significant association with HIV prevalence. In multiple logistic regression analyses, CT-IgG-Ab with higher titre was significantly independently related to HIV infection. These suggest that existence of Chlamydia trachomatis is highly related to HIV prevalence. PMID:10722142
2013-01-01
Background Machine learning techniques are becoming useful as an alternative approach to conventional medical diagnosis or prognosis as they are good for handling noisy and incomplete data, and significant results can be attained despite a small sample size. Traditionally, clinicians make prognostic decisions based on clinicopathologic markers. However, it is not easy for the most skilful clinician to come out with an accurate prognosis by using these markers alone. Thus, there is a need to use genomic markers to improve the accuracy of prognosis. The main aim of this research is to apply a hybrid of feature selection and machine learning methods in oral cancer prognosis based on the parameters of the correlation of clinicopathologic and genomic markers. Results In the first stage of this research, five feature selection methods have been proposed and experimented on the oral cancer prognosis dataset. In the second stage, the model with the features selected from each feature selection methods are tested on the proposed classifiers. Four types of classifiers are chosen; these are namely, ANFIS, artificial neural network, support vector machine and logistic regression. A k-fold cross-validation is implemented on all types of classifiers due to the small sample size. The hybrid model of ReliefF-GA-ANFIS with 3-input features of drink, invasion and p63 achieved the best accuracy (accuracy = 93.81%; AUC = 0.90) for the oral cancer prognosis. Conclusions The results revealed that the prognosis is superior with the presence of both clinicopathologic and genomic markers. The selected features can be investigated further to validate the potential of becoming as significant prognostic signature in the oral cancer studies. PMID:23725313
Goltz, Annemarie; Janowitz, Deborah; Hannemann, Anke; Nauck, Matthias; Hoffmann, Johanna; Seyfart, Tom; Völzke, Henry; Terock, Jan; Grabe, Hans Jörgen
2018-06-19
Depression and obesity are widespread and closely linked. Brain-derived neurotrophic factor (BDNF) and vitamin D are both assumed to be associated with depression and obesity. Little is known about the interplay between vitamin D and BDNF. We explored the putative associations and interactions between serum BDNF and vitamin D levels with depressive symptoms and abdominal obesity in a large population-based cohort. Data were obtained from the population-based Study of Health in Pomerania (SHIP)-Trend (n = 3,926). The associations of serum BDNF and vitamin D levels with depressive symptoms (measured using the Patient Health Questionnaire) were assessed with binary and multinomial logistic regression models. The associations of serum BDNF and vitamin D levels with obesity (measured by the waist-to-hip ratio [WHR]) were assessed with binary logistic and linear regression models with restricted cubic splines. Logistic regression models revealed inverse associations of vitamin D with depression (OR = 0.966; 95% CI 0.951-0.981) and obesity (OR = 0.976; 95% CI 0.967-0.985). No linear association of serum BDNF with depression or obesity was found. However, linear regression models revealed a U-shaped association of BDNF with WHR (p < 0.001). Vitamin D was inversely associated with depression and obesity. BDNF was associated with abdominal obesity, but not with depression. At the population level, our results support the relevant roles of vitamin D and BDNF in mental and physical health-related outcomes. © 2018 S. Karger AG, Basel.
DeLisi, Matt; Angton, Alexia; Vaughn, Michael G; Trulson, Chad R; Caudill, Jonathan W; Beaver, Kevin M
2014-12-01
The association between psychopathy and crime is established, but the specific components of the personality disorders that most contribute to crime are largely unknown. Drawing on data from 723 confined delinquents in Missouri, the present study delved into the eight subscales of the Psychopathic Personality Inventory-Short Form to empirically assess the specific aspects of the disorder that are most responsible for explaining variation in career delinquency. Blame externalization emerged as the strongest predictor of career delinquency in ordinary least squares regression, logistic regression, and t-test models. Fearlessness and carefree nonplanfulness were also significant in all models. Other features of psychopathy, such as stress immunity, social potency, and coldheartedness were weakly and inconsistently predictive of career delinquency. Implications of these findings for the study of psychopathy and delinquent careers are discussed in this article. © The Author(s) 2013.
A wavelet-based technique to predict treatment outcome for Major Depressive Disorder.
Mumtaz, Wajid; Xia, Likun; Mohd Yasin, Mohd Azhar; Azhar Ali, Syed Saad; Malik, Aamir Saeed
2017-01-01
Treatment management for Major Depressive Disorder (MDD) has been challenging. However, electroencephalogram (EEG)-based predictions of antidepressant's treatment outcome may help during antidepressant's selection and ultimately improve the quality of life for MDD patients. In this study, a machine learning (ML) method involving pretreatment EEG data was proposed to perform such predictions for Selective Serotonin Reuptake Inhibitor (SSRIs). For this purpose, the acquisition of experimental data involved 34 MDD patients and 30 healthy controls. Consequently, a feature matrix was constructed involving time-frequency decomposition of EEG data based on wavelet transform (WT) analysis, termed as EEG data matrix. However, the resultant EEG data matrix had high dimensionality. Therefore, dimension reduction was performed based on a rank-based feature selection method according to a criterion, i.e., receiver operating characteristic (ROC). As a result, the most significant features were identified and further be utilized during the training and testing of a classification model, i.e., the logistic regression (LR) classifier. Finally, the LR model was validated with 100 iterations of 10-fold cross-validation (10-CV). The classification results were compared with short-time Fourier transform (STFT) analysis, and empirical mode decompositions (EMD). The wavelet features extracted from frontal and temporal EEG data were found statistically significant. In comparison with other time-frequency approaches such as the STFT and EMD, the WT analysis has shown highest classification accuracy, i.e., accuracy = 87.5%, sensitivity = 95%, and specificity = 80%. In conclusion, significant wavelet coefficients extracted from frontal and temporal pre-treatment EEG data involving delta and theta frequency bands may predict antidepressant's treatment outcome for the MDD patients.
Choi, Young Jun; Baek, Jung Hwan; Shin, Jung Hee; Shim, Woo Hyun; Kim, Seon-Ok; Lee, Won-Hong; Song, Dong Eun; Kim, Tae Yong; Chung, Ki-Wook; Lee, Jeong Hyun
2018-05-13
The purpose of this study was to construct a web-based predictive model using ultrasound characteristics and subcategorized biopsy results for thyroid nodules of atypia of undetermined significance/follicular lesion of undetermined significance (AUS/FLUS) to stratify the risk of malignancy. Data included 672 thyroid nodules from 656 patients from a historical cohort. We analyzed ultrasound images of thyroid nodules and biopsy results according to nuclear atypia and architectural atypia. Multivariate logistic regression analysis was performed to predict whether nodules were diagnosed as malignant or benign. The ultrasound features, including spiculated margin, marked hypoechogenicity, calcifications, biopsy results, and cytologic atypia, showed significant differences between groups. A 13-point risk scoring system was developed, and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve of the development and validation sets were 0.837 and 0.830, respectively (http://www.gap.kr/thyroidnodule_b3.php). We devised a web-based predictive model using the combined information of ultrasound characteristics and biopsy results for AUS/FLUS thyroid nodules to stratify the malignant risk. © 2018 Wiley Periodicals, Inc.
A probabilistic model of emphysema based on granulometry analysis
NASA Astrophysics Data System (ADS)
Marcos, J. V.; Nava, R.; Cristobal, G.; Munoz-Barrutia, A.; Escalante-Ramírez, B.; Ortiz-de-Solórzano, C.
2013-11-01
Emphysema is associated with the destruction of lung parenchyma, resulting in abnormal enlargement of airspaces. Accurate quantification of emphysema is required for a better understanding of the disease as well as for the assessment of drugs and treatments. In the present study, a novel method for emphysema characterization from histological lung images is proposed. Elastase-induced mice were used to simulate the effect of emphysema on the lungs. A database composed of 50 normal and 50 emphysematous lung patches of size 512 x 512 pixels was used in our experiments. The purpose is to automatically identify those patches containing emphysematous tissue. The proposed approach is based on the use of granulometry analysis, which provides the pattern spectrum describing the distribution of airspaces in the lung region under evaluation. The profile of the spectrum was summarized by a set of statistical features. A logistic regression model was then used to estimate the probability for a patch to be emphysematous from this feature set. An accuracy of 87% was achieved by our method in the classification between normal and emphysematous samples. This result shows the utility of our granulometry-based method to quantify the lesions due to emphysema.
Ren, Yilong; Wang, Yunpeng; Wu, Xinkai; Yu, Guizhen; Ding, Chuan
2016-10-01
Red light running (RLR) has become a major safety concern at signalized intersection. To prevent RLR related crashes, it is critical to identify the factors that significantly impact the drivers' behaviors of RLR, and to predict potential RLR in real time. In this research, 9-month's RLR events extracted from high-resolution traffic data collected by loop detectors from three signalized intersections were applied to identify the factors that significantly affect RLR behaviors. The data analysis indicated that occupancy time, time gap, used yellow time, time left to yellow start, whether the preceding vehicle runs through the intersection during yellow, and whether there is a vehicle passing through the intersection on the adjacent lane were significantly factors for RLR behaviors. Furthermore, due to the rare events nature of RLR, a modified rare events logistic regression model was developed for RLR prediction. The rare events logistic regression method has been applied in many fields for rare events studies and shows impressive performance, but so far none of previous research has applied this method to study RLR. The results showed that the rare events logistic regression model performed significantly better than the standard logistic regression model. More importantly, the proposed RLR prediction method is purely based on loop detector data collected from a single advance loop detector located 400 feet away from stop-bar. This brings great potential for future field applications of the proposed method since loops have been widely implemented in many intersections and can collect data in real time. This research is expected to contribute to the improvement of intersection safety significantly. Copyright © 2016 Elsevier Ltd. All rights reserved.
What Are the Odds of that? A Primer on Understanding Logistic Regression
ERIC Educational Resources Information Center
Huang, Francis L.; Moon, Tonya R.
2013-01-01
The purpose of this Methodological Brief is to present a brief primer on logistic regression, a commonly used technique when modeling dichotomous outcomes. Using data from the National Education Longitudinal Study of 1988 (NELS:88), logistic regression techniques were used to investigate student-level variables in eighth grade (i.e., enrolled in a…
Peterson, Erin L; Carlson, Susan A; Schmid, Thomas L; Brown, David R; Galuska, Deborah A
2018-01-01
The purpose of this study was to examine the association between the presence of supportive community planning documents in US municipalities with design standards and requirements supportive of active living. Cross-sectional study using data from the 2014 National Survey of Community-Based Policy and Environmental Supports for Healthy Eating and Active Living. Nationally representative sample of US municipalities. Respondents are 2005 local officials. Assessed: (1) The presence of design standards and feature requirements and (2) the association between planning documents and design standards and feature requirements supportive of active living in policies for development. Using logistic regression, significant trends were identified in the presence of design standards and feature requirements by plan and number of supportive objectives present. Prevalence of design standards ranged from 19% (developer dedicated right-of-way for bicycle infrastructure development) to 50% (traffic-calming features in areas with high pedestrian and bicycle volume). Features required in policies for development ranged from 14% (short/medium pedestrian-scale block sizes) to 44% (minimum sidewalk widths of 5 feet) of municipalities. As the number of objectives in municipal plans increased, there was a significant and positive trend ( P < .05) in the prevalence of each design standard and requirement. Municipal planning documents containing objectives supportive of physical activity are associated with design standards and feature requirements supportive of activity-friendly communities.
Kupek, Emil
2006-03-15
Structural equation modelling (SEM) has been increasingly used in medical statistics for solving a system of related regression equations. However, a great obstacle for its wider use has been its difficulty in handling categorical variables within the framework of generalised linear models. A large data set with a known structure among two related outcomes and three independent variables was generated to investigate the use of Yule's transformation of odds ratio (OR) into Q-metric by (OR-1)/(OR+1) to approximate Pearson's correlation coefficients between binary variables whose covariance structure can be further analysed by SEM. Percent of correctly classified events and non-events was compared with the classification obtained by logistic regression. The performance of SEM based on Q-metric was also checked on a small (N = 100) random sample of the data generated and on a real data set. SEM successfully recovered the generated model structure. SEM of real data suggested a significant influence of a latent confounding variable which would have not been detectable by standard logistic regression. SEM classification performance was broadly similar to that of the logistic regression. The analysis of binary data can be greatly enhanced by Yule's transformation of odds ratios into estimated correlation matrix that can be further analysed by SEM. The interpretation of results is aided by expressing them as odds ratios which are the most frequently used measure of effect in medical statistics.
On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis
ERIC Educational Resources Information Center
Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas
2011-01-01
The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…
Guo, Yanyong; Li, Zhibin; Wu, Yao; Xu, Chengcheng
2018-06-01
Bicyclists running the red light at crossing facilities increase the potential of colliding with motor vehicles. Exploring the contributing factors could improve the prediction of running red-light probability and develop countermeasures to reduce such behaviors. However, individuals could have unobserved heterogeneities in running a red light, which make the accurate prediction more challenging. Traditional models assume that factor parameters are fixed and cannot capture the varying impacts on red-light running behaviors. In this study, we employed the full Bayesian random parameters logistic regression approach to account for the unobserved heterogeneous effects. Two types of crossing facilities were considered which were the signalized intersection crosswalks and the road segment crosswalks. Electric and conventional bikes were distinguished in the modeling. Data were collected from 16 crosswalks in urban area of Nanjing, China. Factors such as individual characteristics, road geometric design, environmental features, and traffic variables were examined. Model comparison indicates that the full Bayesian random parameters logistic regression approach is statistically superior to the standard logistic regression model. More red-light runners are predicted at signalized intersection crosswalks than at road segment crosswalks. Factors affecting red-light running behaviors are gender, age, bike type, road width, presence of raised median, separation width, signal type, green ratio, bike and vehicle volume, and average vehicle speed. Factors associated with the unobserved heterogeneity are gender, bike type, signal type, separation width, and bike volume. Copyright © 2018 Elsevier Ltd. All rights reserved.
Valle, Denis; Lima, Joanna M Tucker; Millar, Justin; Amratia, Punam; Haque, Ubydul
2015-11-04
Logistic regression is a statistical model widely used in cross-sectional and cohort studies to identify and quantify the effects of potential disease risk factors. However, the impact of imperfect tests on adjusted odds ratios (and thus on the identification of risk factors) is under-appreciated. The purpose of this article is to draw attention to the problem associated with modelling imperfect diagnostic tests, and propose simple Bayesian models to adequately address this issue. A systematic literature review was conducted to determine the proportion of malaria studies that appropriately accounted for false-negatives/false-positives in a logistic regression setting. Inference from the standard logistic regression was also compared with that from three proposed Bayesian models using simulations and malaria data from the western Brazilian Amazon. A systematic literature review suggests that malaria epidemiologists are largely unaware of the problem of using logistic regression to model imperfect diagnostic test results. Simulation results reveal that statistical inference can be substantially improved when using the proposed Bayesian models versus the standard logistic regression. Finally, analysis of original malaria data with one of the proposed Bayesian models reveals that microscopy sensitivity is strongly influenced by how long people have lived in the study region, and an important risk factor (i.e., participation in forest extractivism) is identified that would have been missed by standard logistic regression. Given the numerous diagnostic methods employed by malaria researchers and the ubiquitous use of logistic regression to model the results of these diagnostic tests, this paper provides critical guidelines to improve data analysis practice in the presence of misclassification error. Easy-to-use code that can be readily adapted to WinBUGS is provided, enabling straightforward implementation of the proposed Bayesian models.
Automatically Detecting Likely Edits in Clinical Notes Created Using Automatic Speech Recognition
Lybarger, Kevin; Ostendorf, Mari; Yetisgen, Meliha
2017-01-01
The use of automatic speech recognition (ASR) to create clinical notes has the potential to reduce costs associated with note creation for electronic medical records, but at current system accuracy levels, post-editing by practitioners is needed to ensure note quality. Aiming to reduce the time required to edit ASR transcripts, this paper investigates novel methods for automatic detection of edit regions within the transcripts, including both putative ASR errors but also regions that are targets for cleanup or rephrasing. We create detection models using logistic regression and conditional random field models, exploring a variety of text-based features that consider the structure of clinical notes and exploit the medical context. Different medical text resources are used to improve feature extraction. Experimental results on a large corpus of practitioner-edited clinical notes show that 67% of sentence-level edits and 45% of word-level edits can be detected with a false detection rate of 15%. PMID:29854187
Automated robot-assisted surgical skill evaluation: Predictive analytics approach.
Fard, Mahtab J; Ameri, Sattar; Darin Ellis, R; Chinnam, Ratna B; Pandya, Abhilash K; Klein, Michael D
2018-02-01
Surgical skill assessment has predominantly been a subjective task. Recently, technological advances such as robot-assisted surgery have created great opportunities for objective surgical evaluation. In this paper, we introduce a predictive framework for objective skill assessment based on movement trajectory data. Our aim is to build a classification framework to automatically evaluate the performance of surgeons with different levels of expertise. Eight global movement features are extracted from movement trajectory data captured by a da Vinci robot for surgeons with two levels of expertise - novice and expert. Three classification methods - k-nearest neighbours, logistic regression and support vector machines - are applied. The result shows that the proposed framework can classify surgeons' expertise as novice or expert with an accuracy of 82.3% for knot tying and 89.9% for a suturing task. This study demonstrates and evaluates the ability of machine learning methods to automatically classify expert and novice surgeons using global movement features. Copyright © 2017 John Wiley & Sons, Ltd.
Park, Hea Ree; Youn, Jinyoung; Cho, Jin Whan; Oh, Eung-Seok; Kim, Ji Sun; Park, Suyeon; Jang, Wooyoung; Park, Jin Se
2018-01-01
Unlike young-onset Parkinson disease (YOPD), characteristics of late-onset PD (LOPD) have not yet been clearly elucidated. We investigated characteristic features and symptoms related to quality of life (QoL) in LOPD patients. We recruited drug-naïve, early PD patients. The patient cohort was divided into 3 subgroups based on patient age at onset (AAO): the YOPD group (AAO <50 years), the middle-onset PD (MOPD) group, and the LOPD group (AAO ≥70 years). Using various scales for motor symptoms (MS) and non-MS (NMS) and QoL, we compared the clinical features and impact on QoL. Of the 132 enrolled patients, 26 were in the YOPD group, 74 in the MOPD group, and 32 in the LOPD group. Among parkinsonian symptoms, patients in the LOPD group had a lower score on the Korean version of the Montreal Cognitive Assessment than the other groups. Logistic regression analysis showed genitourinary symptoms were related to the LOPD group. Linear regression analysis showed both MS and NMS were correlated with QoL in the MOPD group, but only NMS were correlated with QoL in the LOPD group. Particularly, anxiety and fatigue affected QoL in the LOPD group. LOPD patients showed different characteristic clinical features, and different symptoms were related with QoL for LOPD than YOPD and MOPD patients. © 2018 S. Karger AG, Basel.
Computational tools for exact conditional logistic regression.
Corcoran, C; Mehta, C; Patel, N; Senchaudhuri, P
Logistic regression analyses are often challenged by the inability of unconditional likelihood-based approximations to yield consistent, valid estimates and p-values for model parameters. This can be due to sparseness or separability in the data. Conditional logistic regression, though useful in such situations, can also be computationally unfeasible when the sample size or number of explanatory covariates is large. We review recent developments that allow efficient approximate conditional inference, including Monte Carlo sampling and saddlepoint approximations. We demonstrate through real examples that these methods enable the analysis of significantly larger and more complex data sets. We find in this investigation that for these moderately large data sets Monte Carlo seems a better alternative, as it provides unbiased estimates of the exact results and can be executed in less CPU time than can the single saddlepoint approximation. Moreover, the double saddlepoint approximation, while computationally the easiest to obtain, offers little practical advantage. It produces unreliable results and cannot be computed when a maximum likelihood solution does not exist. Copyright 2001 John Wiley & Sons, Ltd.
Logistic regression for risk factor modelling in stuttering research.
Reed, Phil; Wu, Yaqionq
2013-06-01
To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed are demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.
Austin, Peter C.; Tu, Jack V.; Ho, Jennifer E.; Levy, Daniel; Lee, Douglas S.
2014-01-01
Objective Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study design and Setting We compared the performance of these classification methods with those of conventional classification trees to classify patients with heart failure according to the following sub-types: heart failure with preserved ejection fraction (HFPEF) vs. heart failure with reduced ejection fraction (HFREF). We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results We found that modern, flexible tree-based methods from the data mining literature offer substantial improvement in prediction and classification of heart failure sub-type compared to conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared to the methods proposed in the data mining literature. Conclusion The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying heart failure subtypes in a population-based sample of patients from Ontario. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF. PMID:23384592
Addressing data privacy in matched studies via virtual pooling.
Saha-Chaudhuri, P; Weinberg, C R
2017-09-07
Data confidentiality and shared use of research data are two desirable but sometimes conflicting goals in research with multi-center studies and distributed data. While ideal for straightforward analysis, confidentiality restrictions forbid creation of a single dataset that includes covariate information of all participants. Current approaches such as aggregate data sharing, distributed regression, meta-analysis and score-based methods can have important limitations. We propose a novel application of an existing epidemiologic tool, specimen pooling, to enable confidentiality-preserving analysis of data arising from a matched case-control, multi-center design. Instead of pooling specimens prior to assay, we apply the methodology to virtually pool (aggregate) covariates within nodes. Such virtual pooling retains most of the information used in an analysis with individual data and since individual participant data is not shared externally, within-node virtual pooling preserves data confidentiality. We show that aggregated covariate levels can be used in a conditional logistic regression model to estimate individual-level odds ratios of interest. The parameter estimates from the standard conditional logistic regression are compared to the estimates based on a conditional logistic regression model with aggregated data. The parameter estimates are shown to be similar to those without pooling and to have comparable standard errors and confidence interval coverage. Virtual data pooling can be used to maintain confidentiality of data from multi-center study and can be particularly useful in research with large-scale distributed data.
The Research on the Loan-to-Value of Inventory Pledge Loan Based Upon the Unified Credit Mode
NASA Astrophysics Data System (ADS)
Peng, Yang
This paper focus on loan limit indicator of seasonal inventory financing in supply chain financial innovation based on the logistics features of unified credit mode. According to the "corporate and debt" method in trade credit, this paper analyzes the cash flow properties of borrowing firm and the profit level of logistics enterprise, then it assumes downside-risk-averse logistics enterprise instead of risk-neutral logistics enterprise and takes the method of VaR to figure out the maximum loan-to-value ratio of inventory which is in accord with the risk tolerance level of logistics enterprise in seasonal inventory impawn financing.
Travis Woolley; David C. Shaw; Lisa M. Ganio; Stephen Fitzgerald
2012-01-01
Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed bums and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate...
Preserving Institutional Privacy in Distributed binary Logistic Regression.
Wu, Yuan; Jiang, Xiaoqian; Ohno-Machado, Lucila
2012-01-01
Privacy is becoming a major concern when sharing biomedical data across institutions. Although methods for protecting privacy of individual patients have been proposed, it is not clear how to protect the institutional privacy, which is many times a critical concern of data custodians. Built upon our previous work, Grid Binary LOgistic REgression (GLORE)1, we developed an Institutional Privacy-preserving Distributed binary Logistic Regression model (IPDLR) that considers both individual and institutional privacy for building a logistic regression model in a distributed manner. We tested our method using both simulated and clinical data, showing how it is possible to protect the privacy of individuals and of institutions using a distributed strategy.
Covariate Imbalance and Adjustment for Logistic Regression Analysis of Clinical Trial Data
Ciolino, Jody D.; Martin, Reneé H.; Zhao, Wenle; Jauch, Edward C.; Hill, Michael D.; Palesch, Yuko Y.
2014-01-01
In logistic regression analysis for binary clinical trial data, adjusted treatment effect estimates are often not equivalent to unadjusted estimates in the presence of influential covariates. This paper uses simulation to quantify the benefit of covariate adjustment in logistic regression. However, International Conference on Harmonization guidelines suggest that covariate adjustment be pre-specified. Unplanned adjusted analyses should be considered secondary. Results suggest that that if adjustment is not possible or unplanned in a logistic setting, balance in continuous covariates can alleviate some (but never all) of the shortcomings of unadjusted analyses. The case of log binomial regression is also explored. PMID:24138438
Ardoino, Ilaria; Lanzoni, Monica; Marano, Giuseppe; Boracchi, Patrizia; Sagrini, Elisabetta; Gianstefani, Alice; Piscaglia, Fabio; Biganzoli, Elia M
2017-04-01
The interpretation of regression models results can often benefit from the generation of nomograms, 'user friendly' graphical devices especially useful for assisting the decision-making processes. However, in the case of multinomial regression models, whenever categorical responses with more than two classes are involved, nomograms cannot be drawn in the conventional way. Such a difficulty in managing and interpreting the outcome could often result in a limitation of the use of multinomial regression in decision-making support. In the present paper, we illustrate the derivation of a non-conventional nomogram for multinomial regression models, intended to overcome this issue. Although it may appear less straightforward at first sight, the proposed methodology allows an easy interpretation of the results of multinomial regression models and makes them more accessible for clinicians and general practitioners too. Development of prediction model based on multinomial logistic regression and of the pertinent graphical tool is illustrated by means of an example involving the prediction of the extent of liver fibrosis in hepatitis C patients by routinely available markers.
Sepsis mortality prediction with the Quotient Basis Kernel.
Ribas Ripoll, Vicent J; Vellido, Alfredo; Romero, Enrique; Ruiz-Rodríguez, Juan Carlos
2014-05-01
This paper presents an algorithm to assess the risk of death in patients with sepsis. Sepsis is a common clinical syndrome in the intensive care unit (ICU) that can lead to severe sepsis, a severe state of septic shock or multi-organ failure. The proposed algorithm may be implemented as part of a clinical decision support system that can be used in combination with the scores deployed in the ICU to improve the accuracy, sensitivity and specificity of mortality prediction for patients with sepsis. In this paper, we used the Simplified Acute Physiology Score (SAPS) for ICU patients and the Sequential Organ Failure Assessment (SOFA) to build our kernels and algorithms. In the proposed method, we embed the available data in a suitable feature space and use algorithms based on linear algebra, geometry and statistics for inference. We present a simplified version of the Fisher kernel (practical Fisher kernel for multinomial distributions), as well as a novel kernel that we named the Quotient Basis Kernel (QBK). These kernels are used as the basis for mortality prediction using soft-margin support vector machines. The two new kernels presented are compared against other generative kernels based on the Jensen-Shannon metric (centred, exponential and inverse) and other widely used kernels (linear, polynomial and Gaussian). Clinical relevance is also evaluated by comparing these results with logistic regression and the standard clinical prediction method based on the initial SAPS score. As described in this paper, we tested the new methods via cross-validation with a cohort of 400 test patients. The results obtained using our methods compare favourably with those obtained using alternative kernels (80.18% accuracy for the QBK) and the standard clinical prediction method, which are based on the basal SAPS score or logistic regression (71.32% and 71.55%, respectively). The QBK presented a sensitivity and specificity of 79.34% and 83.24%, which outperformed the other kernels analysed, logistic regression and the standard clinical prediction method based on the basal SAPS score. Several scoring systems for patients with sepsis have been introduced and developed over the last 30 years. They allow for the assessment of the severity of disease and provide an estimate of in-hospital mortality. Physiology-based scoring systems are applied to critically ill patients and have a number of advantages over diagnosis-based systems. Severity score systems are often used to stratify critically ill patients for possible inclusion in clinical trials. In this paper, we present an effective algorithm that combines both scoring methodologies for the assessment of death in patients with sepsis that can be used to improve the sensitivity and specificity of the currently available methods. Copyright © 2014 Elsevier B.V. All rights reserved.
A computational approach to compare regression modelling strategies in prediction research.
Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H
2016-08-25
It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9 to 94.9 %, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.
Hein, R; Abbas, S; Seibold, P; Salazar, R; Flesch-Janys, D; Chang-Claude, J
2012-01-01
Menopausal hormone therapy (MHT) is associated with an increased breast cancer risk in postmenopausal women, with combined estrogen-progestagen therapy posing a greater risk than estrogen monotherapy. However, few studies focused on potential effect modification of MHT-associated breast cancer risk by genetic polymorphisms in the progesterone metabolism. We assessed effect modification of MHT use by five coding single nucleotide polymorphisms (SNPs) in the progesterone metabolizing enzymes AKR1C3 (rs7741), AKR1C4 (rs3829125, rs17134592), and SRD5A1 (rs248793, rs3736316) using a two-center population-based case-control study from Germany with 2,502 postmenopausal breast cancer patients and 4,833 matched controls. An empirical-Bayes procedure that tests for interaction using a weighted combination of the prospective and the retrospective case-control estimators as well as standard prospective logistic regression were applied to assess multiplicative statistical interaction between polymorphisms and duration of MHT use with regard to breast cancer risk assuming a log-additive mode of inheritance. No genetic marginal effects were observed. Breast cancer risk associated with duration of combined therapy was significantly modified by SRD5A1_rs3736316, showing a reduced risk elevation in carriers of the minor allele (p (interaction,empirical-Bayes) = 0.006 using the empirical-Bayes method, p (interaction,logistic regression) = 0.013 using logistic regression). The risk associated with duration of use of monotherapy was increased by AKR1C3_rs7741 in minor allele carriers (p (interaction,empirical-Bayes) = 0.083, p (interaction,logistic regression) = 0.029) and decreased in minor allele carriers of two SNPs in AKR1C4 (rs3829125: p (interaction,empirical-Bayes) = 0.07, p (interaction,logistic regression) = 0.021; rs17134592: p (interaction,empirical-Bayes) = 0.101, p (interaction,logistic regression) = 0.038). After Bonferroni correction for multiple testing only SRD5A1_rs3736316 assessed using the empirical-Bayes method remained significant. Postmenopausal breast cancer risk associated with combined therapy may be modified by genetic variation in SRD5A1. Further well-powered studies are, however, required to replicate our finding.
Wang, Haijun; Zhan, Xianghong; Liu, Hongqi; Wang, Xiaoyun; Li, Xia; Wang, Xiaoru; Wu, Jibiao
2017-01-01
We performed an epidemiological investigation of subjects with premenstrual dysphoric disorder (PMDD) to identify the clinical distribution of the major syndromes and symptoms. The pathogenesis of PMDD mainly involves the dysfunction of liver conveyance and dispersion. Excessive liver conveyance and dispersion are associated with liver-qi invasion syndrome, while insufficient liver conveyance and dispersion are expressed as liver-qi depression syndrome. Additionally, a nonconditional logistic regression was performed to analyze the symptomatic features of liver-qi invasion and liver-qi depression. As a result of this analysis, two subtypes of PMDD are proposed, namely, excessive liver conveyance and dispersion (liver-qi invasion syndrome) and insufficient liver conveyance and dispersion (liver-qi depression syndrome). Our findings provide an epidemiological foundation for the clinical diagnosis and treatment of PMDD based on the identification of different types. PMID:28698873
Lyubchenko, Taras; Zerbe, Gary O.
2014-01-01
This study examines the loss of peripherally induced B cell immune tolerance in Rheumatoid arthritis (RA) and establishes a novel signaling-based measure of activation in a subset of autoreactive B cells - the Induced tolerance status index (ITSI). Naturally occurring naïve autoreactive B cells can escape the “classical” tolerogenic mechanisms of clonal deletion and receptor editing, but remain peripherally tolerized through B cell receptor (BCR) signaling inhibition (postdevelopmental “receptor tuning” or anergy). ITSI is a statistical index that numerically determines the level of homology between activation patterns of BCR signaling intermediaries in B cells that are either tolerized or activated by auto antigen exposure, and thus quantifies the level of peripheral immune tolerance. The index is based on the logistic regression analysis of phosphorylation levels in a panel of BCR signaling proteins. Our results demonstrate a new approach to identifying autoreactive B cells based on their BCR signaling features. PMID:25057856
Product unit neural network models for predicting the growth limits of Listeria monocytogenes.
Valero, A; Hervás, C; García-Gimeno, R M; Zurera, G
2007-08-01
A new approach to predict the growth/no growth interface of Listeria monocytogenes as a function of storage temperature, pH, citric acid (CA) and ascorbic acid (AA) is presented. A linear logistic regression procedure was performed and a non-linear model was obtained by adding new variables by means of a Neural Network model based on Product Units (PUNN). The classification efficiency of the training data set and the generalization data of the new Logistic Regression PUNN model (LRPU) were compared with Linear Logistic Regression (LLR) and Polynomial Logistic Regression (PLR) models. 92% of the total cases from the LRPU model were correctly classified, an improvement on the percentage obtained using the PLR model (90%) and significantly higher than the results obtained with the LLR model, 80%. On the other hand predictions of LRPU were closer to data observed which permits to design proper formulations in minimally processed foods. This novel methodology can be applied to predictive microbiology for describing growth/no growth interface of food-borne microorganisms such as L. monocytogenes. The optimal balance is trying to find models with an acceptable interpretation capacity and with good ability to fit the data on the boundaries of variable range. The results obtained conclude that these kinds of models might well be very a valuable tool for mathematical modeling.
NASA Astrophysics Data System (ADS)
Ozdemir, Adnan
2011-07-01
SummaryThe purpose of this study is to produce a groundwater spring potential map of the Sultan Mountains in central Turkey, based on a logistic regression method within a Geographic Information System (GIS) environment. Using field surveys, the locations of the springs (440 springs) were determined in the study area. In this study, 17 spring-related factors were used in the analysis: geology, relative permeability, land use/land cover, precipitation, elevation, slope, aspect, total curvature, plan curvature, profile curvature, wetness index, stream power index, sediment transport capacity index, distance to drainage, distance to fault, drainage density, and fault density map. The coefficients of the predictor variables were estimated using binary logistic regression analysis and were used to calculate the groundwater spring potential for the entire study area. The accuracy of the final spring potential map was evaluated based on the observed springs. The accuracy of the model was evaluated by calculating the relative operating characteristics. The area value of the relative operating characteristic curve model was found to be 0.82. These results indicate that the model is a good estimator of the spring potential in the study area. The spring potential map shows that the areas of very low, low, moderate and high groundwater spring potential classes are 105.586 km 2 (28.99%), 74.271 km 2 (19.906%), 101.203 km 2 (27.14%), and 90.05 km 2 (24.671%), respectively. The interpretations of the potential map showed that stream power index, relative permeability of lithologies, geology, elevation, aspect, wetness index, plan curvature, and drainage density play major roles in spring occurrence and distribution in the Sultan Mountains. The logistic regression approach has not yet been used to delineate groundwater potential zones. In this study, the logistic regression method was used to locate potential zones for groundwater springs in the Sultan Mountains. The evolved model was found to be in strong agreement with the available groundwater spring test data. Hence, this method can be used routinely in groundwater exploration under favourable conditions.
NASA Astrophysics Data System (ADS)
Wilson, Seth Mark
Grizzly bear (Ursus arctos) deaths in the US tend to be concentrated on the periphery of core habitats. These deaths were often preceded by conflicts with humans. Management removals of "nuisance" and or habituated grizzly bears are a leading cause of death in many populations. This exploratory study focuses on the conditions that lead to human-grizzly bear conflicts on private lands near core habitat. I examined spatial associations among reported human-grizzly bear conflicts during 1986--2001, landscape features, and agricultural-attractants in north-central Montana. I surveyed 61 of a possible 64 active livestock related land users and I used geographic information system (GIS) techniques to collect information on cattle and sheep pasture locations, seasons of use, and bone yard (carcass dumps) and beehive locations. I used GIS spatial analyses, univariate tests, and logistic regression models to explore the associations among conflicts, landscape features, and attractants. A majority (75%) of conflicts were found in distinct seasonal conflict hotspots. Conflict hotspots with spatial overlap were associated with riparian vegetation, bone yards, and beehives in close proximity to one another and accounted for 62% of all conflicts. Consistently available seasonal attractants in overlapping hotspots such as calving areas, sheep lambing areas and spring, summer, and fall sheep and cattle pastures appear to perpetuate the occurrence of conflicts. I found that lambing areas and spring and summer sheep pastures were strongly associated with conflict locations as were cattle calving areas, spring cow/calf pastures, fall pastures, and bone yards. Logistic regression modeling revealed that the presence of riparian vegetation within a 1.6 km search radius strongly influenced the likelihood of conflict. After controlling for riparian vegetation, I found that unmanaged bone yards, unfenced and fenced beehives, all increased the odds of conflict. For every 1 km moved away from spring, summer, and fall sheep and cattle pastures, the odds of conflict decreased. The model confirmed the existence of conflict hotspots and illustrated that a collection of attractants beyond the effects of riparian vegetation were associated with conflicts. Contour probability plots of logistic regression models showed good predictive capacity. We discuss these findings and offer management recommendations.
Rispo, Antonio; Imperatore, Nicola; Testa, Anna; Bucci, Luigi; Luglio, Gaetano; De Palma, Giovanni Domenico; Rea, Matilde; Nardone, Olga Maria; Caporaso, Nicola; Castiglione, Fabiana
2018-03-08
In the management of Crohn's Disease (CD) patients, having a simple score combining clinical, endoscopic and imaging features to predict the risk of surgery could help to tailor treatment more effectively. AIMS: to prospectively evaluate the one-year risk factors for surgery in refractory/severe CD and to generate a risk matrix for predicting the probability of surgery at one year. CD patients needing a disease re-assessment at our tertiary IBD centre underwent clinical, laboratory, endoscopy and bowel sonography (BS) examinations within one week. The optimal cut-off values in predicting surgery were identified using ROC curves for Simple Endoscopic Score for CD (SES-CD), bowel wall thickness (BWT) at BS, and small bowel CD extension at BS. Binary logistic regression and Cox's regression were then carried out. Finally, the probabilities of surgery were calculated for selected baseline levels of covariates and results were arranged in a prediction matrix. Of 100 CD patients, 30 underwent surgery within one year. SES-CD©9 (OR 15.3; p<0.001), BWT©7 mm (OR 15.8; p<0.001), small bowel CD extension at BS©33 cm (OR 8.23; p<0.001) and stricturing/penetrating behavior (OR 4.3; p<0.001) were the only independent factors predictive of surgery at one-year based on binary logistic and Cox's regressions. Our matrix model combined these risk factors and the probability of surgery ranged from 0.48% to 87.5% (sixteen combinations). Our risk matrix combining clinical, endoscopic and ultrasonographic findings can accurately predict the one-year risk of surgery in patients with severe/refractory CD requiring a disease re-evaluation. This tool could be of value in clinical practice, serving as the basis for a tailored management of CD patients.
Kattimani, Shivanand; Menon, Vikas; Sarkar, Siddharth; Arun, Anand Babu; Venkatalakshmi, Penchilaiya
2016-01-01
Identifying those who are likely to make suicide attempts under alcohol intoxication has important implications for management and prevention of further suicidal behavior. To identify the frequency of suicide attempts made under the influence of alcohol and the percentage of impulsive suicide attempts among them. We also aimed to identify predictors of attempted suicide under intoxication with alcohol. Record-based study carried out at a tertiary care hospital. The clinical charts of consecutive suicide attempters ( n = 147) who presented to the crisis intervention clinic from July 2013 to June 2014 were reviewed, and relevant data were extracted. The participants were divided into three groups - nonusers of alcohol ( n = 85), alcohol users who did not attempt under intoxication ( n = 31) and alcohol users who attempted under intoxication ( n = 31). These groups were compared on various sociodemographic and clinical variables. Logistic regression was done to identify predictors of suicide attempt under intoxication. Chi-square (χ 2 ) test, one-way ANOVA (F) test and backward stepwise logistic regression. About 21.08% of all suicide attempts occurred under alcohol intoxication. Such subjects were more likely to be older ( F = 12.428, P < 0.001), male (χ 2 = 87.367, P < 0.001), married (χ 2 = 6.787, P = 0.034), employed (χ 2 = 41.778, P < 0.001), and fewer years of formal schooling ( F = 3.312, P = 0.039). Physical methods (hanging) were used more often in this group (χ 2 = 19.510, P = 0.012). In regression analysis, only marital status and living condition emerged as predictors of attempt under intoxication (odds ratios 4.52 [confidence interval (CI) 1.34-15.24, P = 0.015] and 5.67 [CI 1.17-27.39, P = 0.031] respectively). Certain demographic features may help us in identifying those who are more likely to make attempts under intoxication. The role of personality factors as potential mediators of such behavior needs further exploration.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, J; Chuong, M; Choi, W
Purpose: To identify PET/CT based imaging predictors of anal cancer recurrence and evaluate baseline vs. mid-treatment vs. post-treatment PET/CT scans in the tumor recurrence prediction. Methods: FDG-PET/CT scans were obtained at baseline, during chemoradiotherapy (CRT, midtreatment), and after CRT (post-treatment) in 17 patients of anal cancer. Four patients had tumor recurrence. For each patient, the mid-treatment and post-treatment scans were respectively aligned to the baseline scan by a rigid registration followed by a deformable registration. PET/CT image features were computed within the manually delineated tumor volume of each scan to characterize the intensity histogram, spatial patterns (texture), and shape ofmore » the tumors, as well as the changes of these features resulting from CRT. A total of 335 image features were extracted. An Exact Logistic Regression model was employed to analyze these PET/CT image features in order to identify potential predictors for tumor recurrence. Results: Eleven potential predictors of cancer recurrence were identified with p < 0.10, including five shape features, five statistical texture features, and one CT intensity histogram feature. Six features were indentified from posttreatment scans, 3 from mid-treatment scans, and 2 from baseline scans. These features indicated that there were differences in shape, intensity, and spatial pattern between tumors with and without recurrence. Recurrent tumors tended to have more compact shape (higher roundness and lower elongation) and larger intensity difference between baseline and follow-up scans, compared to non-recurrent tumors. Conclusion: PET/CT based anal cancer recurrence predictors were identified. The post-CRT PET/CT is the most important scan for the prediction of cancer recurrence. The baseline and mid-CRT PET/CT also showed value in the prediction and would be more useful for the predication of tumor recurrence in early stage of CRT. This work was supported in part by the National Cancer Institute Grant R01CA172638.« less
Steen, Paul J.; Passino-Reader, Dora R.; Wiley, Michael J.
2006-01-01
As a part of the Great Lakes Regional Aquatic Gap Analysis Project, we evaluated methodologies for modeling associations between fish species and habitat characteristics at a landscape scale. To do this, we created brook trout Salvelinus fontinalis presence and absence models based on four different techniques: multiple linear regression, logistic regression, neural networks, and classification trees. The models were tested in two ways: by application to an independent validation database and cross-validation using the training data, and by visual comparison of statewide distribution maps with historically recorded occurrences from the Michigan Fish Atlas. Although differences in the accuracy of our models were slight, the logistic regression model predicted with the least error, followed by multiple regression, then classification trees, then the neural networks. These models will provide natural resource managers a way to identify habitats requiring protection for the conservation of fish species.
America's Democracy Colleges: The Civic Engagement of Community College Students
ERIC Educational Resources Information Center
Angeli Newell, Mallory
2014-01-01
This study explored the civic engagement of current two- and four-year students to explore whether differences exist between the groups and what may explain the differences. Using binary logistic regression and Ordinary Least Squares regression it was found that community-based engagement was lower for two- than four-year students, though…
Random Forest as a Predictive Analytics Alternative to Regression in Institutional Research
ERIC Educational Resources Information Center
He, Lingjun; Levine, Richard A.; Fan, Juanjuan; Beemer, Joshua; Stronach, Jeanne
2018-01-01
In institutional research, modern data mining approaches are seldom considered to address predictive analytics problems. The goal of this paper is to highlight the advantages of tree-based machine learning algorithms over classic (logistic) regression methods for data-informed decision making in higher education problems, and stress the success of…
Beukinga, Roelof J; Hulshoff, Jan Binne; Mul, Véronique E M; Noordzij, Walter; Kats-Ugurlu, Gursah; Slart, Riemer H J A; Plukker, John T M
2018-06-01
Purpose To assess the value of baseline and restaging fluorine 18 ( 18 F) fluorodeoxyglucose (FDG) positron emission tomography (PET) radiomics in predicting pathologic complete response to neoadjuvant chemotherapy and radiation therapy (NCRT) in patients with locally advanced esophageal cancer. Materials and Methods In this retrospective study, 73 patients with histologic analysis-confirmed T1/N1-3/M0 or T2-4a/N0-3/M0 esophageal cancer were treated with NCRT followed by surgery (Chemoradiotherapy for Esophageal Cancer followed by Surgery Study regimen) between October 2014 and August 2017. Clinical variables and radiomic features from baseline and restaging 18 F-FDG PET were selected by univariable logistic regression and least absolute shrinkage and selection operator. The selected variables were used to fit a multivariable logistic regression model, which was internally validated by using bootstrap resampling with 20 000 replicates. The performance of this model was compared with reference prediction models composed of maximum standardized uptake value metrics, clinical variables, and maximum standardized uptake value at baseline NCRT radiomic features. Outcome was defined as complete versus incomplete pathologic response (tumor regression grade 1 vs 2-5 according to the Mandard classification). Results Pathologic response was complete in 16 patients (21.9%) and incomplete in 57 patients (78.1%). A prediction model combining clinical T-stage and restaging NCRT (post-NCRT) joint maximum (quantifying image orderliness) yielded an optimism-corrected area under the receiver operating characteristics curve of 0.81. Post-NCRT joint maximum was replaceable with five other redundant post-NCRT radiomic features that provided equal model performance. All reference prediction models exhibited substantially lower discriminatory accuracy. Conclusion The combination of clinical T-staging and quantitative assessment of post-NCRT 18 F-FDG PET orderliness (joint maximum) provided high discriminatory accuracy in predicting pathologic complete response in patients with esophageal cancer. © RSNA, 2018 Online supplemental material is available for this article.
Why credit risk markets are predestined for exhibiting log-periodic power law structures
NASA Astrophysics Data System (ADS)
Wosnitza, Jan Henrik; Leker, Jens
2014-01-01
Recent research has established the existence of log-periodic power law (LPPL) patterns in financial institutions’ credit default swap (CDS) spreads. The main purpose of this paper is to clarify why credit risk markets are predestined for exhibiting LPPL structures. To this end, the credit risk prediction of two variants of logistic regression, i.e. polynomial logistic regression (PLR) and kernel logistic regression (KLR), are firstly compared to the standard logistic regression (SLR). In doing so, the question whether the performances of rating systems based on balance sheet ratios can be improved by nonlinear transformations of the explanatory variables is resolved. Building on the result that nonlinear balance sheet ratio transformations hardly improve the SLR’s predictive power in our case, we secondly compare the classification performance of a multivariate SLR to the discriminative powers of probabilities of default derived from three different capital market data, namely bonds, CDSs, and stocks. Benefiting from the prompt inclusion of relevant information, the capital market data in general and CDSs in particular increasingly outperform the SLR while approaching the time of the credit event. Due to the higher classification performances, it seems plausible for creditors to align their investment decisions with capital market-based default indicators, i.e., to imitate the aggregate opinion of the market participants. Since imitation is considered to be the source of LPPL structures in financial time series, it is highly plausible to scan CDS spread developments for LPPL patterns. By establishing LPPL patterns in governmental CDS spread trajectories of some European crisis countries, the LPPL’s application to credit risk markets is extended. This novel piece of evidence further strengthens the claim that credit risk markets are adequate breeding grounds for LPPL patterns.
The use of generalized estimating equations in the analysis of motor vehicle crash data.
Hutchings, Caroline B; Knight, Stacey; Reading, James C
2003-01-01
The purpose of this study was to determine if it is necessary to use generalized estimating equations (GEEs) in the analysis of seat belt effectiveness in preventing injuries in motor vehicle crashes. The 1992 Utah crash dataset was used, excluding crash participants where seat belt use was not appropriate (n=93,633). The model used in the 1996 Report to Congress [Report to congress on benefits of safety belts and motorcycle helmets, based on data from the Crash Outcome Data Evaluation System (CODES). National Center for Statistics and Analysis, NHTSA, Washington, DC, February 1996] was analyzed for all occupants with logistic regression, one level of nesting (occupants within crashes), and two levels of nesting (occupants within vehicles within crashes) to compare the use of GEEs with logistic regression. When using one level of nesting compared to logistic regression, 13 of 16 variance estimates changed more than 10%, and eight of 16 parameter estimates changed more than 10%. In addition, three of the independent variables changed from significant to insignificant (alpha=0.05). With the use of two levels of nesting, two of 16 variance estimates and three of 16 parameter estimates changed more than 10% from the variance and parameter estimates in one level of nesting. One of the independent variables changed from insignificant to significant (alpha=0.05) in the two levels of nesting model; therefore, only two of the independent variables changed from significant to insignificant when the logistic regression model was compared to the two levels of nesting model. The odds ratio of seat belt effectiveness in preventing injuries was 12% lower when a one-level nested model was used. Based on these results, we stress the need to use a nested model and GEEs when analyzing motor vehicle crash data.
Zhan, L.; Liu, Y.; Zhou, J.; Ye, J.; Thompson, P.M.
2015-01-01
Mild cognitive impairment (MCI) is an intermediate stage between normal aging and Alzheimer's disease (AD), and around 10-15% of people with MCI develop AD each year. More recently, MCI has been further subdivided into early and late stages, and there is interest in identifying sensitive brain imaging biomarkers that help to differentiate stages of MCI. Here, we focused on anatomical brain networks computed from diffusion MRI and proposed a new feature extraction and classification framework based on higher order singular value decomposition and sparse logistic regression. In tests on publicly available data from the Alzheimer's Disease Neuroimaging Initiative, our proposed framework showed promise in detecting brain network differences that help in classifying early versus late MCI. PMID:26413202
Real estate value prediction using multivariate regression models
NASA Astrophysics Data System (ADS)
Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav
2017-11-01
The real estate market is one of the most competitive in terms of pricing and the same tends to vary significantly based on a lot of factors, hence it becomes one of the prime fields to apply the concepts of machine learning to optimize and predict the prices with high accuracy. Therefore in this paper, we present various important features to use while predicting housing prices with good accuracy. We have described regression models, using various features to have lower Residual Sum of Squares error. While using features in a regression model some feature engineering is required for better prediction. Often a set of features (multiple regressions) or polynomial regression (applying a various set of powers in the features) is used for making better model fit. For these models are expected to be susceptible towards over fitting ridge regression is used to reduce it. This paper thus directs to the best application of regression models in addition to other techniques to optimize the result.
NASA Astrophysics Data System (ADS)
Shafizadeh-Moghadam, Hossein; Helbich, Marco
2015-03-01
The rapid growth of megacities requires special attention among urban planners worldwide, and particularly in Mumbai, India, where growth is very pronounced. To cope with the planning challenges this will bring, developing a retrospective understanding of urban land-use dynamics and the underlying driving-forces behind urban growth is a key prerequisite. This research uses regression-based land-use change models - and in particular non-spatial logistic regression models (LR) and auto-logistic regression models (ALR) - for the Mumbai region over the period 1973-2010, in order to determine the drivers behind spatiotemporal urban expansion. Both global models are complemented by a local, spatial model, the so-called geographically weighted logistic regression (GWLR) model, one that explicitly permits variations in driving-forces across space. The study comes to two main conclusions. First, both global models suggest similar driving-forces behind urban growth over time, revealing that LRs and ALRs result in estimated coefficients with comparable magnitudes. Second, all the local coefficients show distinctive temporal and spatial variations. It is therefore concluded that GWLR aids our understanding of urban growth processes, and so can assist context-related planning and policymaking activities when seeking to secure a sustainable urban future.
Yang, Fuzhong; Li, Yihan; Xie, Dong; Shao, Chunhong; Ren, Jianer; Wu, Wenyuan; Zhang, Ning; Zhang, Zhen; Zou, Ying; Zhang, Jiulong; Qiao, Dongdong; Gao, Chengge; Li, Youhui; Hu, Jian; Deng, Hong; Wang, Gang; Du, Bo; Wang, Xumei; Liu, Tiebang; Gan, Zhaoyu; Peng, Juyi; Wei, Bo; Pan, Jiyang; Chen, Honghui; Sun, Shufan; Jia, Hong; Liu, Ying; Chen, Qiaoling; Wang, Xueyi; Cao, Juling; Lv, Luxian; Chen, Yunchun; Ha, Baowei; Ning, Yuping; Chen, YiPing; Kendler, Kenneth S.; Flint, Jonathan; Shi, Shenxun
2011-01-01
Background Individuals with early-onset depression may be a clinically distinct group with particular symptom patterns, illness course, comorbidity and family history. This question has not been previously investigated in a Han Chinese population. Methods We examined the clinical features of 1970 Han Chinese women with DSM-IV major depressive disorder (MDD) between 30 and 60 years of age across China. Analysis of linear, logistic and multiple logistic regression models was used to determine the association between age at onset (AAO) with continuous, binary and discrete characteristic clinical features of MDD. Results Earlier AAO was associated with more suicidal ideation and attempts and higher neuroticism, but fewer sleep, appetite and weight changes. Patients with an earlier AAO were more likely to suffer a chronic course (longer illness duration, more MDD episodes and longer index episode), increased rates of MDD in their parents and a lower likelihood of marriage. They tend to have higher comorbidity with anxiety disorders (general anxiety disorder, social phobia and agoraphobia) and dysthymia. Conclusions Early AAO in MDD may be an index of a more severe, highly comorbid and familial disorder. Our findings indicate that the features of MDD in China are similar to those reported elsewhere in the world. PMID:21782247
Yang, Fuzhong; Li, Yihan; Xie, Dong; Shao, Chunhong; Ren, Jianer; Wu, Wenyuan; Zhang, Ning; Zhang, Zhen; Zou, Ying; Zhang, Jiulong; Qiao, Dongdong; Gao, Chengge; Li, Youhui; Hu, Jian; Deng, Hong; Wang, Gang; Du, Bo; Wang, Xumei; Liu, Tiebang; Gan, Zhaoyu; Peng, Juyi; Wei, Bo; Pan, Jiyang; Chen, Honghui; Sun, Shufan; Jia, Hong; Liu, Ying; Chen, Qiaoling; Wang, Xueyi; Cao, Juling; Lv, Luxian; Chen, Yunchun; Ha, Baowei; Ning, Yuping; Chen, Yiping; Kendler, Kenneth S; Flint, Jonathan; Shi, Shenxun
2011-12-01
Individuals with early-onset depression may be a clinically distinct group with particular symptom patterns, illness course, comorbidity and family history. This question has not been previously investigated in a Han Chinese population. We examined the clinical features of 1970 Han Chinese women with DSM-IV major depressive disorder (MDD) between 30 and 60 years of age across China. Analysis of linear, logistic and multiple logistic regression models was used to determine the association between age at onset (AAO) with continuous, binary and discrete characteristic clinical features of MDD. Earlier AAO was associated with more suicidal ideation and attempts and higher neuroticism, but fewer sleep, appetite and weight changes. Patients with an earlier AAO were more likely to suffer a chronic course (longer illness duration, more MDD episodes and longer index episode), increased rates of MDD in their parents and a lower likelihood of marriage. They tend to have higher comorbidity with anxiety disorders (general anxiety disorder, social phobia and agoraphobia) and dysthymia. Early AAO in MDD may be an index of a more severe, highly comorbid and familial disorder. Our findings indicate that the features of MDD in China are similar to those reported elsewhere in the world. Copyright © 2011 Elsevier B.V. All rights reserved.
Logistic regression for dichotomized counts.
Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W
2016-12-01
Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.
The intermediate endpoint effect in logistic and probit regression
MacKinnon, DP; Lockwood, CM; Brown, CH; Wang, W; Hoffman, JM
2010-01-01
Background An intermediate endpoint is hypothesized to be in the middle of the causal sequence relating an independent variable to a dependent variable. The intermediate variable is also called a surrogate or mediating variable and the corresponding effect is called the mediated, surrogate endpoint, or intermediate endpoint effect. Clinical studies are often designed to change an intermediate or surrogate endpoint and through this intermediate change influence the ultimate endpoint. In many intermediate endpoint clinical studies the dependent variable is binary, and logistic or probit regression is used. Purpose The purpose of this study is to describe a limitation of a widely used approach to assessing intermediate endpoint effects and to propose an alternative method, based on products of coefficients, that yields more accurate results. Methods The intermediate endpoint model for a binary outcome is described for a true binary outcome and for a dichotomization of a latent continuous outcome. Plots of true values and a simulation study are used to evaluate the different methods. Results Distorted estimates of the intermediate endpoint effect and incorrect conclusions can result from the application of widely used methods to assess the intermediate endpoint effect. The same problem occurs for the proportion of an effect explained by an intermediate endpoint, which has been suggested as a useful measure for identifying intermediate endpoints. A solution to this problem is given based on the relationship between latent variable modeling and logistic or probit regression. Limitations More complicated intermediate variable models are not addressed in the study, although the methods described in the article can be extended to these more complicated models. Conclusions Researchers are encouraged to use an intermediate endpoint method based on the product of regression coefficients. A common method based on difference in coefficient methods can lead to distorted conclusions regarding the intermediate effect. PMID:17942466
Mocellin, Simone; Ambrosi, Alessandro; Montesco, Maria Cristina; Foletto, Mirto; Zavagno, Giorgio; Nitti, Donato; Lise, Mario; Rossi, Carlo Riccardo
2006-08-01
Currently, approximately 80% of melanoma patients undergoing sentinel node biopsy (SNB) have negative sentinel lymph nodes (SLNs), and no prediction system is reliable enough to be implemented in the clinical setting to reduce the number of SNB procedures. In this study, the predictive power of support vector machine (SVM)-based statistical analysis was tested. The clinical records of 246 patients who underwent SNB at our institution were used for this analysis. The following clinicopathologic variables were considered: the patient's age and sex and the tumor's histological subtype, Breslow thickness, Clark level, ulceration, mitotic index, lymphocyte infiltration, regression, angiolymphatic invasion, microsatellitosis, and growth phase. The results of SVM-based prediction of SLN status were compared with those achieved with logistic regression. The SLN positivity rate was 22% (52 of 234). When the accuracy was > or = 80%, the negative predictive value, positive predictive value, specificity, and sensitivity were 98%, 54%, 94%, and 77% and 82%, 41%, 69%, and 93% by using SVM and logistic regression, respectively. Moreover, SVM and logistic regression were associated with a diagnostic error and an SNB percentage reduction of (1) 1% and 60% and (2) 15% and 73%, respectively. The results from this pilot study suggest that SVM-based prediction of SLN status might be evaluated as a prognostic method to avoid the SNB procedure in 60% of patients currently eligible, with a very low error rate. If validated in larger series, this strategy would lead to obvious advantages in terms of both patient quality of life and costs for the health care system.
Sharma, Ashok K; Srivastava, Gopal N; Roy, Ankita; Sharma, Vineet K
2017-01-01
The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84-0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better ( R 2 = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better ( R 2 = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules.
Sharma, Ashok K.; Srivastava, Gopal N.; Roy, Ankita; Sharma, Vineet K.
2017-01-01
The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84–0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better (R2 = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better (R2 = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules. PMID:29249969
Interpretation of commonly used statistical regression models.
Kasza, Jessica; Wolfe, Rory
2014-01-01
A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
Ohno, Yoshiharu; Fujisawa, Yasuko; Takenaka, Daisuke; Kaminaga, Shigeo; Seki, Shinichiro; Sugihara, Naoki; Yoshikawa, Takeshi
2018-02-01
The objective of this study was to compare the capability of xenon-enhanced area-detector CT (ADCT) performed with a subtraction technique and coregistered 81m Kr-ventilation SPECT/CT for the assessment of pulmonary functional loss and disease severity in smokers. Forty-six consecutive smokers (32 men and 14 women; mean age, 67.0 years) underwent prospective unenhanced and xenon-enhanced ADCT, 81m Kr-ventilation SPECT/CT, and pulmonary function tests. Disease severity was evaluated according to the Global Initiative for Chronic Obstructive Lung Disease (GOLD) classification. CT-based functional lung volume (FLV), the percentage of wall area to total airway area (WA%), and ventilated FLV on xenon-enhanced ADCT and SPECT/CT were calculated for each smoker. All indexes were correlated with percentage of forced expiratory volume in 1 second (%FEV 1 ) using step-wise regression analyses, and univariate and multivariate logistic regression analyses were performed. In addition, the diagnostic accuracy of the proposed model was compared with that of each radiologic index by means of McNemar analysis. Multivariate logistic regression showed that %FEV 1 was significantly affected (r = 0.77, r 2 = 0.59) by two factors: the first factor, ventilated FLV on xenon-enhanced ADCT (p < 0.0001); and the second factor, WA% (p = 0.004). Univariate logistic regression analyses indicated that all indexes significantly affected GOLD classification (p < 0.05). Multivariate logistic regression analyses revealed that ventilated FLV on xenon-enhanced ADCT and CT-based FLV significantly influenced GOLD classification (p < 0.0001). The diagnostic accuracy of the proposed model was significantly higher than that of ventilated FLV on SPECT/CT (p = 0.03) and WA% (p = 0.008). Xenon-enhanced ADCT is more effective than 81m Kr-ventilation SPECT/CT for the assessment of pulmonary functional loss and disease severity.
Secure Logistic Regression Based on Homomorphic Encryption: Design and Evaluation
Song, Yongsoo; Wang, Shuang; Xia, Yuhou; Jiang, Xiaoqian
2018-01-01
Background Learning a model without accessing raw data has been an intriguing idea to security and machine learning researchers for years. In an ideal setting, we want to encrypt sensitive data to store them on a commercial cloud and run certain analyses without ever decrypting the data to preserve privacy. Homomorphic encryption technique is a promising candidate for secure data outsourcing, but it is a very challenging task to support real-world machine learning tasks. Existing frameworks can only handle simplified cases with low-degree polynomials such as linear means classifier and linear discriminative analysis. Objective The goal of this study is to provide a practical support to the mainstream learning models (eg, logistic regression). Methods We adapted a novel homomorphic encryption scheme optimized for real numbers computation. We devised (1) the least squares approximation of the logistic function for accuracy and efficiency (ie, reduce computation cost) and (2) new packing and parallelization techniques. Results Using real-world datasets, we evaluated the performance of our model and demonstrated its feasibility in speed and memory consumption. For example, it took approximately 116 minutes to obtain the training model from the homomorphically encrypted Edinburgh dataset. In addition, it gives fairly accurate predictions on the testing dataset. Conclusions We present the first homomorphically encrypted logistic regression outsourcing model based on the critical observation that the precision loss of classification models is sufficiently small so that the decision plan stays still. PMID:29666041
NASA Astrophysics Data System (ADS)
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
2015-11-01
The aim of this work is to define reliable susceptibility models for shallow landslides using Logistic Regression and Random Forests multivariate statistical techniques. The study area, located in North-East Sicily, was hit on October 1st 2009 by a severe rainstorm (225 mm of cumulative rainfall in 7 h) which caused flash floods and more than 1000 landslides. Several small villages, such as Giampilieri, were hit with 31 fatalities, 6 missing persons and damage to buildings and transportation infrastructures. Landslides, mainly types such as earth and debris translational slides evolving into debris flows, were triggered on steep slopes and involved colluvium and regolith materials which cover the underlying metamorphic bedrock. The work has been carried out with the following steps: i) realization of a detailed event landslide inventory map through field surveys coupled with observation of high resolution aerial colour orthophoto; ii) identification of landslide source areas; iii) data preparation of landslide controlling factors and descriptive statistics based on a bivariate method (Frequency Ratio) to get an initial overview on existing relationships between causative factors and shallow landslide source areas; iv) choice of criteria for the selection and sizing of the mapping unit; v) implementation of 5 multivariate statistical susceptibility models based on Logistic Regression and Random Forests techniques and focused on landslide source areas; vi) evaluation of the influence of sample size and type of sampling on results and performance of the models; vii) evaluation of the predictive capabilities of the models using ROC curve, AUC and contingency tables; viii) comparison of model results and obtained susceptibility maps; and ix) analysis of temporal variation of landslide susceptibility related to input parameter changes. Models based on Logistic Regression and Random Forests have demonstrated excellent predictive capabilities. Land use and wildfire variables were found to have a strong control on the occurrence of very rapid shallow landslides.
Choi, Seung Hoan; Labadorf, Adam T; Myers, Richard H; Lunetta, Kathryn L; Dupuis, Josée; DeStefano, Anita L
2017-02-06
Next generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate the data adaptive method in order to compare false positive rates. When the sample size is small or the expression levels of a gene are highly dispersed, the NB regression shows inflated Type-I error rates but the Classical logistic and Bayes logistic (BL) regressions are conservative. Firth's logistic (FL) regression performs well or is slightly conservative. Large sample size and low dispersion generally make Type-I error rates of all methods close to nominal alpha levels of 0.05 and 0.01. However, Type-I error rates are controlled after applying the data adaptive method. The NB, BL, and FL regressions gain increased power with large sample size, large log2 fold-change, and low dispersion. The FL regression has comparable power to NB regression. We conclude that implementing the data adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth's logistic regression provides a concise statistical inference process and reduces spurious associations from inaccurately estimated dispersion parameters in the negative binomial framework.
NASA Astrophysics Data System (ADS)
Ma, Chuang; Bao, Zhong-Kui; Zhang, Hai-Feng
2017-10-01
So far, many network-structure-based link prediction methods have been proposed. However, these methods only highlight one or two structural features of networks, and then use the methods to predict missing links in different networks. The performances of these existing methods are not always satisfied in all cases since each network has its unique underlying structural features. In this paper, by analyzing different real networks, we find that the structural features of different networks are remarkably different. In particular, even in the same network, their inner structural features are utterly different. Therefore, more structural features should be considered. However, owing to the remarkably different structural features, the contributions of different features are hard to be given in advance. Inspired by these facts, an adaptive fusion model regarding link prediction is proposed to incorporate multiple structural features. In the model, a logistic function combing multiple structural features is defined, then the weight of each feature in the logistic function is adaptively determined by exploiting the known structure information. Last, we use the "learnt" logistic function to predict the connection probabilities of missing links. According to our experimental results, we find that the performance of our adaptive fusion model is better than many similarity indices.
Cascaded face alignment via intimacy definition feature
NASA Astrophysics Data System (ADS)
Li, Hailiang; Lam, Kin-Man; Chiu, Man-Yau; Wu, Kangheng; Lei, Zhibin
2017-09-01
Recent years have witnessed the emerging popularity of regression-based face aligners, which directly learn mappings between facial appearance and shape-increment manifolds. We propose a random-forest based, cascaded regression model for face alignment by using a locally lightweight feature, namely intimacy definition feature. This feature is more discriminative than the pose-indexed feature, more efficient than the histogram of oriented gradients feature and the scale-invariant feature transform feature, and more compact than the local binary feature (LBF). Experimental validation of our algorithm shows that our approach achieves state-of-the-art performance when testing on some challenging datasets. Compared with the LBF-based algorithm, our method achieves about twice the speed, 20% improvement in terms of alignment accuracy and saves an order of magnitude on memory requirement.
Park, Ji Hyun; Kim, Hyeon-Young; Lee, Hanna; Yun, Eun Kyoung
2015-12-01
This study compares the performance of the logistic regression and decision tree analysis methods for assessing the risk factors for infection in cancer patients undergoing chemotherapy. The subjects were 732 cancer patients who were receiving chemotherapy at K university hospital in Seoul, Korea. The data were collected between March 2011 and February 2013 and were processed for descriptive analysis, logistic regression and decision tree analysis using the IBM SPSS Statistics 19 and Modeler 15.1 programs. The most common risk factors for infection in cancer patients receiving chemotherapy were identified as alkylating agents, vinca alkaloid and underlying diabetes mellitus. The logistic regression explained 66.7% of the variation in the data in terms of sensitivity and 88.9% in terms of specificity. The decision tree analysis accounted for 55.0% of the variation in the data in terms of sensitivity and 89.0% in terms of specificity. As for the overall classification accuracy, the logistic regression explained 88.0% and the decision tree analysis explained 87.2%. The logistic regression analysis showed a higher degree of sensitivity and classification accuracy. Therefore, logistic regression analysis is concluded to be the more effective and useful method for establishing an infection prediction model for patients undergoing chemotherapy. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Mei, Zhixiong; Wu, Hao; Li, Shiyun
2018-06-01
The Conversion of Land Use and its Effects at Small regional extent (CLUE-S), which is a widely used model for land-use simulation, utilizes logistic regression to estimate the relationships between land use and its drivers, and thus, predict land-use change probabilities. However, logistic regression disregards possible spatial autocorrelation and self-organization in land-use data. Autologistic regression can depict spatial autocorrelation but cannot address self-organization, while logistic regression by considering only self-organization (NElogistic regression) fails to capture spatial autocorrelation. Therefore, this study developed a regression (NE-autologistic regression) method, which incorporated both spatial autocorrelation and self-organization, to improve CLUE-S. The Zengcheng District of Guangzhou, China was selected as the study area. The land-use data of 2001, 2005, and 2009, as well as 10 typical driving factors, were used to validate the proposed regression method and the improved CLUE-S model. Then, three future land-use scenarios in 2020: the natural growth scenario, ecological protection scenario, and economic development scenario, were simulated using the improved model. Validation results showed that NE-autologistic regression performed better than logistic regression, autologistic regression, and NE-logistic regression in predicting land-use change probabilities. The spatial allocation accuracy and kappa values of NE-autologistic-CLUE-S were higher than those of logistic-CLUE-S, autologistic-CLUE-S, and NE-logistic-CLUE-S for the simulations of two periods, 2001-2009 and 2005-2009, which proved that the improved CLUE-S model achieved the best simulation and was thereby effective to a certain extent. The scenario simulation results indicated that under all three scenarios, traffic land and residential/industrial land would increase, whereas arable land and unused land would decrease during 2009-2020. Apparent differences also existed in the simulated change sizes and locations of each land-use type under different scenarios. The results not only demonstrate the validity of the improved model but also provide a valuable reference for relevant policy-makers.
Child sex tourism - prevalence of and risk factors for its use in a German community sample.
Koops, Thula; Turner, Daniel; Neutze, Janina; Briken, Peer
2017-04-20
To investigate the prevalence of child sex tourism (CST) in a large German community sample, and to compare those who made use of CST with other child sexual abusers regarding established characteristics and risk factors for child sexual abuse. Adult German men were recruited through a German market research panel and questioned by means of an anonymous online survey. Group assignment was accomplished based on information on previous sexual contacts with children and previous use of CST. Characteristics and risk factors were compared between the groups using t- and Chi-square tests. Binary logistic regression analysis was performed to predict CST. Data collection was conducted in 2013, data analysis in January 2015. Out of 8718 men, 36 (0.4%) reported CST use. The CST group differed from the nonCST group (n = 96; 1.1%) with regard to pedophilic sexual and antisocial behaviors as well as own experiences of sexual abuse. Social difficulties, pedophilic sexual interests, and hypersexuality were not distinct features in the CST group. Own experiences of sexual abuse, child prostitution use, and previous conviction for a violent offense predicted CST in a logistic regression model. This study is a first step to gain insight into the prevalence and characteristics of men using CST. Findings could help to augment prevention strategies against commercial forms of sexual abuse in developed as well as in developing countries by fostering the knowledge about the characteristics of perpetrators.
Sreeramareddy, Chandrashekhar T; Panduru, Kishore V; Verma, Sharat C; Joshi, Hari S; Bates, Michael N
2008-01-24
Studies from developed countries have reported on host-related risk factors for extra-pulmonary tuberculosis (EPTB). However, similar studies from high-burden countries like Nepal are lacking. Therefore, we carried out this study to compare demographic, life-style and clinical characteristics between EPTB and PTB patients. A retrospective analysis was carried out on 474 Tuberculosis (TB) patients diagnosed in a tertiary care hospital in western Nepal. Characteristics of demography, life-style and clinical features were obtained from medical case records. Risk factors for being an EPTB patient relative to a PTB patient were identified using logistic regression analysis. The age distribution of the TB patients had a bimodal distribution. The male to female ratio for PTB was 2.29. EPTB was more common at younger ages (< 25 years) and in females. Common sites for EPTB were lymph nodes (42.6%) and peritoneum and/or intestines (14.8%). By logistic regression analysis, age less than 25 years (OR 2.11 95% CI 1.12-3.68) and female gender (OR 1.69, 95% CI 1.12-2.56) were associated with EPTB. Smoking, use of immunosuppressive drugs/steroids, diabetes and past history of TB were more likely to be associated with PTB. Results suggest that younger age and female gender may be independent risk factors for EPTB in a high-burden country like Nepal. TB control programmes may target young and female populations for EPTB case-finding. Further studies are necessary in other high-burden countries to confirm our findings.
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Aided diagnosis methods of breast cancer based on machine learning
NASA Astrophysics Data System (ADS)
Zhao, Yue; Wang, Nian; Cui, Xiaoyu
2017-08-01
In the field of medicine, quickly and accurately determining whether the patient is malignant or benign is the key to treatment. In this paper, K-Nearest Neighbor, Linear Discriminant Analysis, Logistic Regression were applied to predict the classification of thyroid,Her-2,PR,ER,Ki67,metastasis and lymph nodes in breast cancer, in order to recognize the benign and malignant breast tumors and achieve the purpose of aided diagnosis of breast cancer. The results showed that the highest classification accuracy of LDA was 88.56%, while the classification effect of KNN and Logistic Regression were better than that of LDA, the best accuracy reached 96.30%.
Logistic Regression and Path Analysis Method to Analyze Factors influencing Students’ Achievement
NASA Astrophysics Data System (ADS)
Noeryanti, N.; Suryowati, K.; Setyawan, Y.; Aulia, R. R.
2018-04-01
Students' academic achievement cannot be separated from the influence of two factors namely internal and external factors. The first factors of the student (internal factors) consist of intelligence (X1), health (X2), interest (X3), and motivation of students (X4). The external factors consist of family environment (X5), school environment (X6), and society environment (X7). The objects of this research are eighth grade students of the school year 2016/2017 at SMPN 1 Jiwan Madiun sampled by using simple random sampling. Primary data are obtained by distributing questionnaires. The method used in this study is binary logistic regression analysis that aims to identify internal and external factors that affect student’s achievement and how the trends of them. Path Analysis was used to determine the factors that influence directly, indirectly or totally on student’s achievement. Based on the results of binary logistic regression, variables that affect student’s achievement are interest and motivation. And based on the results obtained by path analysis, factors that have a direct impact on student’s achievement are students’ interest (59%) and students’ motivation (27%). While the factors that have indirect influences on students’ achievement, are family environment (97%) and school environment (37).
Worku, Yohannes; Muchie, Mammo
2012-01-01
Objective. The objective was to investigate factors that affect the efficient management of solid waste produced by commercial businesses operating in the city of Pretoria, South Africa. Methods. Data was gathered from 1,034 businesses. Efficiency in solid waste management was assessed by using a structural time-based model designed for evaluating efficiency as a function of the length of time required to manage waste. Data analysis was performed using statistical procedures such as frequency tables, Pearson's chi-square tests of association, and binary logistic regression analysis. Odds ratios estimated from logistic regression analysis were used for identifying key factors that affect efficiency in the proper disposal of waste. Results. The study showed that 857 of the 1,034 businesses selected for the study (83%) were found to be efficient enough with regards to the proper collection and disposal of solid waste. Based on odds ratios estimated from binary logistic regression analysis, efficiency in the proper management of solid waste was significantly influenced by 4 predictor variables. These 4 influential predictor variables are lack of adherence to waste management regulations, wrong perception, failure to provide customers with enough trash cans, and operation of businesses by employed managers, in a decreasing order of importance. PMID:23209483
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, R; Aguilera, T; Shultz, D
2014-06-15
Purpose: This study aims to develop predictive models of patient outcome by extracting advanced imaging features (i.e., Radiomics) from FDG-PET images. Methods: We acquired pre-treatment PET scans for 51 stage I NSCLC patients treated with SABR. We calculated 139 quantitative features from each patient PET image, including 5 morphological features, 8 statistical features, 27 texture features, and 100 features from the intensity-volume histogram. Based on the imaging features, we aim to distinguish between 2 risk groups of patients: those with regional failure or distant metastasis versus those without. We investigated 3 pattern classification algorithms: linear discriminant analysis (LDA), naive Bayesmore » (NB), and logistic regression (LR). To avoid the curse of dimensionality, we performed feature selection by first removing redundant features and then applying sequential forward selection using the wrapper approach. To evaluate the predictive performance, we performed 10-fold cross validation with 1000 random splits of the data and calculated the area under the ROC curve (AUC). Results: Feature selection identified 2 texture features (homogeneity and/or wavelet decompositions) for NB and LR, while for LDA SUVmax and one texture feature (correlation) were identified. All 3 classifiers achieved statistically significant improvements over conventional PET imaging metrics such as tumor volume (AUC = 0.668) and SUVmax (AUC = 0.737). Overall, NB achieved the best predictive performance (AUC = 0.806). This also compares favorably with MTV using the best threshold at an SUV of 11.6 (AUC = 0.746). At a sensitivity of 80%, NB achieved 69% specificity, while SUVmax and tumor volume only had 36% and 47% specificity. Conclusion: Through a systematic analysis of advanced PET imaging features, we are able to build models with improved predictive value over conventional imaging metrics. If validated in a large independent cohort, the proposed techniques could potentially aid in identifying patients who might benefit from adjuvant therapy.« less
A wavelet-based technique to predict treatment outcome for Major Depressive Disorder
Xia, Likun; Mohd Yasin, Mohd Azhar; Azhar Ali, Syed Saad
2017-01-01
Treatment management for Major Depressive Disorder (MDD) has been challenging. However, electroencephalogram (EEG)-based predictions of antidepressant’s treatment outcome may help during antidepressant’s selection and ultimately improve the quality of life for MDD patients. In this study, a machine learning (ML) method involving pretreatment EEG data was proposed to perform such predictions for Selective Serotonin Reuptake Inhibitor (SSRIs). For this purpose, the acquisition of experimental data involved 34 MDD patients and 30 healthy controls. Consequently, a feature matrix was constructed involving time-frequency decomposition of EEG data based on wavelet transform (WT) analysis, termed as EEG data matrix. However, the resultant EEG data matrix had high dimensionality. Therefore, dimension reduction was performed based on a rank-based feature selection method according to a criterion, i.e., receiver operating characteristic (ROC). As a result, the most significant features were identified and further be utilized during the training and testing of a classification model, i.e., the logistic regression (LR) classifier. Finally, the LR model was validated with 100 iterations of 10-fold cross-validation (10-CV). The classification results were compared with short-time Fourier transform (STFT) analysis, and empirical mode decompositions (EMD). The wavelet features extracted from frontal and temporal EEG data were found statistically significant. In comparison with other time-frequency approaches such as the STFT and EMD, the WT analysis has shown highest classification accuracy, i.e., accuracy = 87.5%, sensitivity = 95%, and specificity = 80%. In conclusion, significant wavelet coefficients extracted from frontal and temporal pre-treatment EEG data involving delta and theta frequency bands may predict antidepressant’s treatment outcome for the MDD patients. PMID:28152063
Jahandideh, Samad; Srinivasasainagendra, Vinodh; Zhi, Degui
2012-11-07
RNA-protein interaction plays an important role in various cellular processes, such as protein synthesis, gene regulation, post-transcriptional gene regulation, alternative splicing, and infections by RNA viruses. In this study, using Gene Ontology Annotated (GOA) and Structural Classification of Proteins (SCOP) databases an automatic procedure was designed to capture structurally solved RNA-binding protein domains in different subclasses. Subsequently, we applied tuned multi-class SVM (TMCSVM), Random Forest (RF), and multi-class ℓ1/ℓq-regularized logistic regression (MCRLR) for analysis and classifying RNA-binding protein domains based on a comprehensive set of sequence and structural features. In this study, we compared prediction accuracy of three different state-of-the-art predictor methods. From our results, TMCSVM outperforms the other methods and suggests the potential of TMCSVM as a useful tool for facilitating the multi-class prediction of RNA-binding protein domains. On the other hand, MCRLR by elucidating importance of features for their contribution in predictive accuracy of RNA-binding protein domains subclasses, helps us to provide some biological insights into the roles of sequences and structures in protein-RNA interactions.
Detection of chewing from piezoelectric film sensor signals using ensemble classifiers.
Farooq, Muhammad; Sazonov, Edward
2016-08-01
Selection and use of pattern recognition algorithms is application dependent. In this work, we explored the use of several ensembles of weak classifiers to classify signals captured from a wearable sensor system to detect food intake based on chewing. Three sensor signals (Piezoelectric sensor, accelerometer, and hand to mouth gesture) were collected from 12 subjects in free-living conditions for 24 hrs. Sensor signals were divided into 10 seconds epochs and for each epoch combination of time and frequency domain features were computed. In this work, we present a comparison of three different ensemble techniques: boosting (AdaBoost), bootstrap aggregation (bagging) and stacking, each trained with 3 different weak classifiers (Decision Trees, Linear Discriminant Analysis (LDA) and Logistic Regression). Type of feature normalization used can also impact the classification results. For each ensemble method, three feature normalization techniques: (no-normalization, z-score normalization, and minmax normalization) were tested. A 12 fold cross-validation scheme was used to evaluate the performance of each model where the performance was evaluated in terms of precision, recall, and accuracy. Best results achieved here show an improvement of about 4% over our previous algorithms.
Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036
ERIC Educational Resources Information Center
Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.
2014-01-01
This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the more difficult to interpret logistic regression model. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…
Javaid, MK; Kiran, A; Guermazi, A; Kwoh, K; Zaim, S; Carbone, L; Harris, T.; McCulloch, C.E.; Arden, NK; Lane, NE; Felson, D; Nevitt, M
2012-01-01
Strong associations between radiographic features of knee OA and pain have been demonstrated in persons with unilateral knee symptoms. Our objectives were to compare radiographic with MRI features of knee OA and assess the discrimination between painful and non-painful knees in persons with unilateral symptoms. 283 individuals with unilateral knee pain aged 71 to 80 years from Health ABC, a study of weight-related diseases and mobility, had bilateral knee radiographs, read for KL grade and individual radiographic features, and 1.5T MRIs, read using WORMS. The association of structural features with pain was assessed using a within-person case/control design and conditional logistic regression. Receiver operator characteristics (ROC) were then used to test the discriminatory performance of structural features. In conditional logistic analyses, knee pain was significantly associated with both radiographic (any JSN grade >=1: OR 3.20 (1.79 – 5.71) and MRI (any cartilage defect:>=2: OR 3.67 (1.49 – 9.04)) features. However, most subjects had MR detected osteophytes, cartilage and bone marrow lesions in both knees and no individual structural feature discriminated well between painful and non-painful knees using ROC. The best performing MRI feature (synovitis/effusion) was not significantly more informative than KL grade >=2 (p=0.42). In persons with unilateral knee pain, MR and radiographic features were associated with knee pain confirming an important role in the etiology of pain. However, no single MRI or radiographic finding performed well in discriminating painful from non-painful knees. Further work is needed to examine how structural and non-structural factors influence knee pain. PMID:22736267
NASA Astrophysics Data System (ADS)
Keller, Brad M.; Chen, Jinbo; Conant, Emily F.; Kontos, Despina
2014-03-01
Accurate assessment of a woman's risk to develop specific subtypes of breast cancer is critical for appropriate utilization of chemopreventative measures, such as with tamoxifen in preventing estrogen-receptor positive breast cancer. In this context, we investigate quantitative measures of breast density and parenchymal texture, measures of glandular tissue content and tissue structure, as risk factors for estrogen-receptor positive (ER+) breast cancer. Mediolateral oblique (MLO) view digital mammograms of the contralateral breast from 106 women with unilateral invasive breast cancer were retrospectively analyzed. Breast density and parenchymal texture were analyzed via fully-automated software. Logistic regression with feature selection and was performed to predict ER+ versus ER- cancer status. A combined model considering all imaging measures extracted was compared to baseline models consisting of density-alone and texture-alone features. Area under the curve (AUC) of the receiver operating characteristic (ROC) and Delong's test were used to compare the models' discriminatory capacity for receptor status. The density-alone model had a discriminatory capacity of 0.62 AUC (p=0.05). The texture-alone model had a higher discriminatory capacity of 0.70 AUC (p=0.001), which was not significantly different compared to the density-alone model (p=0.37). In contrast the combined density-texture logistic regression model had a discriminatory capacity of 0.82 AUC (p<0.001), which was statistically significantly higher than both the density-alone (p<0.001) and texture-alone regression models (p=0.04). The combination of breast density and texture measures may have the potential to identify women specifically at risk for estrogen-receptor positive breast cancer and could be useful in triaging women into appropriate risk-reduction strategies.
Mixed conditional logistic regression for habitat selection studies.
Duchesne, Thierry; Fortin, Daniel; Courbin, Nicolas
2010-05-01
1. Resource selection functions (RSFs) are becoming a dominant tool in habitat selection studies. RSF coefficients can be estimated with unconditional (standard) and conditional logistic regressions. While the advantage of mixed-effects models is recognized for standard logistic regression, mixed conditional logistic regression remains largely overlooked in ecological studies. 2. We demonstrate the significance of mixed conditional logistic regression for habitat selection studies. First, we use spatially explicit models to illustrate how mixed-effects RSFs can be useful in the presence of inter-individual heterogeneity in selection and when the assumption of independence from irrelevant alternatives (IIA) is violated. The IIA hypothesis states that the strength of preference for habitat type A over habitat type B does not depend on the other habitat types also available. Secondly, we demonstrate the significance of mixed-effects models to evaluate habitat selection of free-ranging bison Bison bison. 3. When movement rules were homogeneous among individuals and the IIA assumption was respected, fixed-effects RSFs adequately described habitat selection by simulated animals. In situations violating the inter-individual homogeneity and IIA assumptions, however, RSFs were best estimated with mixed-effects regressions, and fixed-effects models could even provide faulty conclusions. 4. Mixed-effects models indicate that bison did not select farmlands, but exhibited strong inter-individual variations in their response to farmlands. Less than half of the bison preferred farmlands over forests. Conversely, the fixed-effect model simply suggested an overall selection for farmlands. 5. Conditional logistic regression is recognized as a powerful approach to evaluate habitat selection when resource availability changes. This regression is increasingly used in ecological studies, but almost exclusively in the context of fixed-effects models. Fitness maximization can imply differences in trade-offs among individuals, which can yield inter-individual differences in selection and lead to departure from IIA. These situations are best modelled with mixed-effects models. Mixed-effects conditional logistic regression should become a valuable tool for ecological research.
Exploring Sampling in the Detection of Multicategory EEG Signals
Siuly, Siuly; Kabir, Enamul; Wang, Hua; Zhang, Yanchun
2015-01-01
The paper presents a structure based on samplings and machine leaning techniques for the detection of multicategory EEG signals where random sampling (RS) and optimal allocation sampling (OS) are explored. In the proposed framework, before using the RS and OS scheme, the entire EEG signals of each class are partitioned into several groups based on a particular time period. The RS and OS schemes are used in order to have representative observations from each group of each category of EEG data. Then all of the selected samples by the RS from the groups of each category are combined in a one set named RS set. In the similar way, for the OS scheme, an OS set is obtained. Then eleven statistical features are extracted from the RS and OS set, separately. Finally this study employs three well-known classifiers: k-nearest neighbor (k-NN), multinomial logistic regression with a ridge estimator (MLR), and support vector machine (SVM) to evaluate the performance for the RS and OS feature set. The experimental outcomes demonstrate that the RS scheme well represents the EEG signals and the k-NN with the RS is the optimum choice for detection of multicategory EEG signals. PMID:25977705
Cheung, Li C; Pan, Qing; Hyun, Noorie; Schiffman, Mark; Fetterman, Barbara; Castle, Philip E; Lorey, Thomas; Katki, Hormuzd A
2017-09-30
For cost-effectiveness and efficiency, many large-scale general-purpose cohort studies are being assembled within large health-care providers who use electronic health records. Two key features of such data are that incident disease is interval-censored between irregular visits and there can be pre-existing (prevalent) disease. Because prevalent disease is not always immediately diagnosed, some disease diagnosed at later visits are actually undiagnosed prevalent disease. We consider prevalent disease as a point mass at time zero for clinical applications where there is no interest in time of prevalent disease onset. We demonstrate that the naive Kaplan-Meier cumulative risk estimator underestimates risks at early time points and overestimates later risks. We propose a general family of mixture models for undiagnosed prevalent disease and interval-censored incident disease that we call prevalence-incidence models. Parameters for parametric prevalence-incidence models, such as the logistic regression and Weibull survival (logistic-Weibull) model, are estimated by direct likelihood maximization or by EM algorithm. Non-parametric methods are proposed to calculate cumulative risks for cases without covariates. We compare naive Kaplan-Meier, logistic-Weibull, and non-parametric estimates of cumulative risk in the cervical cancer screening program at Kaiser Permanente Northern California. Kaplan-Meier provided poor estimates while the logistic-Weibull model was a close fit to the non-parametric. Our findings support our use of logistic-Weibull models to develop the risk estimates that underlie current US risk-based cervical cancer screening guidelines. Published 2017. This article has been contributed to by US Government employees and their work is in the public domain in the USA. Published 2017. This article has been contributed to by US Government employees and their work is in the public domain in the USA.
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.
2003-01-01
Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity in each basin, particle size sorting, average storm intensity (millimeters per hour), soil organic matter content, soil permeability, and soil drainage. The results of this study demonstrate that logistic regression is a valuable tool for predicting the probability of debris flows occurring in recently-burned landscapes.
NASA Astrophysics Data System (ADS)
Sánchez, Clara I.; Hornero, Roberto; Mayo, Agustín; García, María
2009-02-01
Diabetic Retinopathy is one of the leading causes of blindness and vision defects in developed countries. An early detection and diagnosis is crucial to avoid visual complication. Microaneurysms are the first ocular signs of the presence of this ocular disease. Their detection is of paramount importance for the development of a computer-aided diagnosis technique which permits a prompt diagnosis of the disease. However, the detection of microaneurysms in retinal images is a difficult task due to the wide variability that these images usually present in screening programs. We propose a statistical approach based on mixture model-based clustering and logistic regression which is robust to the changes in the appearance of retinal fundus images. The method is evaluated on the public database proposed by the Retinal Online Challenge in order to obtain an objective performance measure and to allow a comparative study with other proposed algorithms.
Predicting β-Turns in Protein Using Kernel Logistic Regression
Elbashir, Murtada Khalafallah; Sheng, Yu; Wang, Jianxin; Wu, FangXiang; Li, Min
2013-01-01
A β-turn is a secondary protein structure type that plays a significant role in protein configuration and function. On average 25% of amino acids in protein structures are located in β-turns. It is very important to develope an accurate and efficient method for β-turns prediction. Most of the current successful β-turns prediction methods use support vector machines (SVMs) or neural networks (NNs). The kernel logistic regression (KLR) is a powerful classification technique that has been applied successfully in many classification problems. However, it is often not found in β-turns classification, mainly because it is computationally expensive. In this paper, we used KLR to obtain sparse β-turns prediction in short evolution time. Secondary structure information and position-specific scoring matrices (PSSMs) are utilized as input features. We achieved Q total of 80.7% and MCC of 50% on BT426 dataset. These results show that KLR method with the right algorithm can yield performance equivalent to or even better than NNs and SVMs in β-turns prediction. In addition, KLR yields probabilistic outcome and has a well-defined extension to multiclass case. PMID:23509793
Predicting β-turns in protein using kernel logistic regression.
Elbashir, Murtada Khalafallah; Sheng, Yu; Wang, Jianxin; Wu, Fangxiang; Li, Min
2013-01-01
A β-turn is a secondary protein structure type that plays a significant role in protein configuration and function. On average 25% of amino acids in protein structures are located in β-turns. It is very important to develope an accurate and efficient method for β-turns prediction. Most of the current successful β-turns prediction methods use support vector machines (SVMs) or neural networks (NNs). The kernel logistic regression (KLR) is a powerful classification technique that has been applied successfully in many classification problems. However, it is often not found in β-turns classification, mainly because it is computationally expensive. In this paper, we used KLR to obtain sparse β-turns prediction in short evolution time. Secondary structure information and position-specific scoring matrices (PSSMs) are utilized as input features. We achieved Q total of 80.7% and MCC of 50% on BT426 dataset. These results show that KLR method with the right algorithm can yield performance equivalent to or even better than NNs and SVMs in β-turns prediction. In addition, KLR yields probabilistic outcome and has a well-defined extension to multiclass case.
Rashid, Nasir; Iqbal, Javaid; Javed, Amna; Tiwana, Mohsin I; Khan, Umar Shahbaz
2018-01-01
Brain Computer Interface (BCI) determines the intent of the user from a variety of electrophysiological signals. These signals, Slow Cortical Potentials, are recorded from scalp, and cortical neuronal activity is recorded by implanted electrodes. This paper is focused on design of an embedded system that is used to control the finger movements of an upper limb prosthesis using Electroencephalogram (EEG) signals. This is a follow-up of our previous research which explored the best method to classify three movements of fingers (thumb movement, index finger movement, and first movement). Two-stage logistic regression classifier exhibited the highest classification accuracy while Power Spectral Density (PSD) was used as a feature of the filtered signal. The EEG signal data set was recorded using a 14-channel electrode headset (a noninvasive BCI system) from right-handed, neurologically intact volunteers. Mu (commonly known as alpha waves) and Beta Rhythms (8-30 Hz) containing most of the movement data were retained through filtering using "Arduino Uno" microcontroller followed by 2-stage logistic regression to obtain a mean classification accuracy of 70%.
A statistical method for predicting seizure onset zones from human single-neuron recordings
NASA Astrophysics Data System (ADS)
Valdez, André B.; Hickman, Erin N.; Treiman, David M.; Smith, Kris A.; Steinmetz, Peter N.
2013-02-01
Objective. Clinicians often use depth-electrode recordings to localize human epileptogenic foci. To advance the diagnostic value of these recordings, we applied logistic regression models to single-neuron recordings from depth-electrode microwires to predict seizure onset zones (SOZs). Approach. We collected data from 17 epilepsy patients at the Barrow Neurological Institute and developed logistic regression models to calculate the odds of observing SOZs in the hippocampus, amygdala and ventromedial prefrontal cortex, based on statistics such as the burst interspike interval (ISI). Main results. Analysis of these models showed that, for a single-unit increase in burst ISI ratio, the left hippocampus was approximately 12 times more likely to contain a SOZ; and the right amygdala, 14.5 times more likely. Our models were most accurate for the hippocampus bilaterally (at 85% average sensitivity), and performance was comparable with current diagnostics such as electroencephalography. Significance. Logistic regression models can be combined with single-neuron recording to predict likely SOZs in epilepsy patients being evaluated for resective surgery, providing an automated source of clinically useful information.
Wang, Shuang; Zhang, Yuchen; Dai, Wenrui; Lauter, Kristin; Kim, Miran; Tang, Yuzhe; Xiong, Hongkai; Jiang, Xiaoqian
2016-01-01
Motivation: Genome-wide association studies (GWAS) have been widely used in discovering the association between genotypes and phenotypes. Human genome data contain valuable but highly sensitive information. Unprotected disclosure of such information might put individual’s privacy at risk. It is important to protect human genome data. Exact logistic regression is a bias-reduction method based on a penalized likelihood to discover rare variants that are associated with disease susceptibility. We propose the HEALER framework to facilitate secure rare variants analysis with a small sample size. Results: We target at the algorithm design aiming at reducing the computational and storage costs to learn a homomorphic exact logistic regression model (i.e. evaluate P-values of coefficients), where the circuit depth is proportional to the logarithmic scale of data size. We evaluate the algorithm performance using rare Kawasaki Disease datasets. Availability and implementation: Download HEALER at http://research.ucsd-dbmi.org/HEALER/ Contact: shw070@ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26446135
Testing Gene-Gene Interactions in the Case-Parents Design
Yu, Zhaoxia
2011-01-01
The case-parents design has been widely used to detect genetic associations as it can prevent spurious association that could occur in population-based designs. When examining the effect of an individual genetic locus on a disease, logistic regressions developed by conditioning on parental genotypes provide complete protection from spurious association caused by population stratification. However, when testing gene-gene interactions, it is unknown whether conditional logistic regressions are still robust. Here we evaluate the robustness and efficiency of several gene-gene interaction tests that are derived from conditional logistic regressions. We found that in the presence of SNP genotype correlation due to population stratification or linkage disequilibrium, tests with incorrectly specified main-genetic-effect models can lead to inflated type I error rates. We also found that a test with fully flexible main genetic effects always maintains correct test size and its robustness can be achieved with negligible sacrifice of its power. When testing gene-gene interactions is the focus, the test allowing fully flexible main effects is recommended to be used. PMID:21778736
Li, Saijiao; He, Aiyan; Yang, Jing; Yin, TaiLang; Xu, Wangming
2011-01-01
To investigate factors that can affect compliance with treatment of polycystic ovary syndrome (PCOS) in infertile patients and to provide a basis for clinical treatment, specialist consultation and health education. Patient compliance was assessed via a questionnaire based on the Morisky-Green test and the treatment principles of PCOS. Then interviews were conducted with 99 infertile patients diagnosed with PCOS at Renmin Hospital of Wuhan University in China, from March to September 2009. Finally, these data were analyzed using logistic regression analysis. Logistic regression analysis revealed that a total of 23 (25.6%) of the participants showed good compliance. Factors that significantly (p < 0.05) affected compliance with treatment were the patient's body mass index, convenience of medical treatment and concerns about adverse drug reactions. Patients who are obese, experience inconvenient medical treatment or are concerned about adverse drug reactions are more likely to exhibit noncompliance. Treatment education and intervention aimed at these patients should be strengthened in the clinic to improve treatment compliance. Further research is needed to better elucidate the compliance behavior of patients with PCOS.
Providing written language services in the schools: the time is now.
Fallon, Karen A; Katz, Lauren A
2011-01-01
The current study was conducted to investigate the provision of written language services by school-based speech-language pathologists (SLPs). Specifically, the study examined SLPs' knowledge, attitudes, and collaborative practices in the area of written language services as well as the variables that impact provision of these services. Public school-based SLPs from across the country were solicited for participation in an online, Web-based survey. Data from 645 full-time SLPs from 49 states were evaluated using descriptive statistics and logistic regression. Many school-based SLPs reported not providing any services in the area of written language to students with written language weaknesses. Knowledge, attitudes, and collaborative practices were mixed. A logistic regression revealed three variables likely to predict high levels of service provision in the area of written language. Data from the current study revealed that many struggling readers and writers on school-based SLPs' caseloads are not receiving services from their SLPs. Implications for SLPs' preservice preparation, continuing education, and doctoral preparation are discussed.
Kempe, P T; van Oppen, P; de Haan, E; Twisk, J W R; Sluis, A; Smit, J H; van Dyck, R; van Balkom, A J L M
2007-09-01
Two methods for predicting remissions in obsessive-compulsive disorder (OCD) treatment are evaluated. Y-BOCS measurements of 88 patients with a primary OCD (DSM-III-R) diagnosis were performed over a 16-week treatment period, and during three follow-ups. Remission at any measurement was defined as a Y-BOCS score lower than thirteen combined with a reduction of seven points when compared with baseline. Logistic regression models were compared with a Cox regression for recurrent events model. Logistic regression yielded different models at different evaluation times. The recurrent events model remained stable when fewer measurements were used. Higher baseline levels of neuroticism and more severe OCD symptoms were associated with a lower chance of remission, early age of onset and more depressive symptoms with a higher chance. Choice of outcome time affects logistic regression prediction models. Recurrent events analysis uses all information on remissions and relapses. Short- and long-term predictors for OCD remission show overlap.
Estimating the exceedance probability of rain rate by logistic regression
NASA Technical Reports Server (NTRS)
Chiu, Long S.; Kedem, Benjamin
1990-01-01
Recent studies have shown that the fraction of an area with rain intensity above a fixed threshold is highly correlated with the area-averaged rain rate. To estimate the fractional rainy area, a logistic regression model, which estimates the conditional probability that rain rate over an area exceeds a fixed threshold given the values of related covariates, is developed. The problem of dependency in the data in the estimation procedure is bypassed by the method of partial likelihood. Analyses of simulated scanning multichannel microwave radiometer and observed electrically scanning microwave radiometer data during the Global Atlantic Tropical Experiment period show that the use of logistic regression in pixel classification is superior to multiple regression in predicting whether rain rate at each pixel exceeds a given threshold, even in the presence of noisy data. The potential of the logistic regression technique in satellite rain rate estimation is discussed.
Automated identification of diagnosis and co-morbidity in clinical records.
Cano, C; Blanco, A; Peshkin, L
2009-01-01
Automated understanding of clinical records is a challenging task involving various legal and technical difficulties. Clinical free text is inherently redundant, unstructured, and full of acronyms, abbreviations and domain-specific language which make it challenging to mine automatically. There is much effort in the field focused on creating specialized ontology, lexicons and heuristics based on expert knowledge of the domain. However, ad-hoc solutions poorly generalize across diseases or diagnoses. This paper presents a successful approach for a rapid prototyping of a diagnosis classifier based on a popular computational linguistics platform. The corpus consists of several hundred of full length discharge summaries provided by Partners Healthcare. The goal is to identify a diagnosis and assign co-morbidi-ty. Our approach is based on the rapid implementation of a logistic regression classifier using an existing toolkit: LingPipe (http://alias-i.com/lingpipe). We implement and compare three different classifiers. The baseline approach uses character 5-grams as features. The second approach uses a bag-of-words representation enriched with a small additional set of features. The third approach reduces a feature set to the most informative features according to the information content. The proposed systems achieve high performance (average F-micro 0.92) for the task. We discuss the relative merit of the three classifiers. Supplementary material with detailed results is available at: http:// decsai.ugr.es/~ccano/LR/supplementary_ material/ We show that our methodology for rapid prototyping of a domain-unaware system is effective for building an accurate classifier for clinical records.
NASA Astrophysics Data System (ADS)
Aghaei, Faranak; Mirniaharikandehei, Seyedehnafiseh; Hollingsworth, Alan B.; Stoug, Rebecca G.; Pearce, Melanie; Liu, Hong; Zheng, Bin
2018-03-01
Although breast magnetic resonance imaging (MRI) has been used as a breast cancer screening modality for high-risk women, its cancer detection yield remains low (i.e., <= 3%). Thus, increasing breast MRI screening efficacy and cancer detection yield is an important clinical issue in breast cancer screening. In this study, we investigated association between the background parenchymal enhancement (BPE) of breast MRI and the change of diagnostic (BIRADS) status in the next subsequent breast MRI screening. A dataset with 65 breast MRI screening cases was retrospectively assembled. All cases were rated BIRADS-2 (benign findings). In the subsequent screening, 4 cases were malignant (BIRADS-6), 48 remained BIRADS-2 and 13 were downgraded to negative (BIRADS-1). A computer-aided detection scheme was applied to process images of the first set of breast MRI screening. Total of 33 features were computed including texture feature and global BPE features. Texture features were computed from either a gray-level co-occurrence matrix or a gray level run length matrix. Ten global BPE features were also initially computed from two breast regions and bilateral difference between the left and right breasts. Box-plot based analysis shows positive association between texture features and BIRADS rating levels in the second screening. Furthermore, a logistic regression model was built using optimal features selected by a CFS based feature selection method. Using a leave-one-case-out based cross-validation method, classification yielded an overall 75% accuracy in predicting the improvement (or downgrade) of diagnostic status (to BIRAD-1) in the subsequent breast MRI screening. This study demonstrated potential of developing a new quantitative imaging marker to predict diagnostic status change in the short-term, which may help eliminate a high fraction of unnecessary repeated breast MRI screenings and increase the cancer detection yield.
NASA Astrophysics Data System (ADS)
Belciug, Smaranda; Serbanescu, Mircea-Sebastian
2015-09-01
Feature selection is considered a key factor in classifications/decision problems. It is currently used in designing intelligent decision systems to choose the best features which allow the best performance. This paper proposes a regression-based approach to select the most important predictors to significantly increase the classification performance. Application to breast cancer detection and recurrence using publically available datasets proved the efficiency of this technique.
Wang, Qingliang; Li, Xiaojie; Hu, Kunpeng; Zhao, Kun; Yang, Peisheng; Liu, Bo
2015-05-12
To explore the risk factors of portal hypertensive gastropathy (PHG) in patients with hepatitis B associated cirrhosis and establish a Logistic regression model of noninvasive prediction. The clinical data of 234 hospitalized patients with hepatitis B associated cirrhosis from March 2012 to March 2014 were analyzed retrospectively. The dependent variable was the occurrence of PHG while the independent variables were screened by binary Logistic analysis. Multivariate Logistic regression was used for further analysis of significant noninvasive independent variables. Logistic regression model was established and odds ratio was calculated for each factor. The accuracy, sensitivity and specificity of model were evaluated by the curve of receiver operating characteristic (ROC). According to univariate Logistic regression, the risk factors included hepatic dysfunction, albumin (ALB), bilirubin (TB), prothrombin time (PT), platelet (PLT), white blood cell (WBC), portal vein diameter, spleen index, splenic vein diameter, diameter ratio, PLT to spleen volume ratio, esophageal varices (EV) and gastric varices (GV). Multivariate analysis showed that hepatic dysfunction (X1), TB (X2), PLT (X3) and splenic vein diameter (X4) were the major occurring factors for PHG. The established regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4. The accuracy of model for PHG was 79.1% with a sensitivity of 77.2% and a specificity of 80.8%. Hepatic dysfunction, TB, PLT and splenic vein diameter are risk factors for PHG and the noninvasive predicted Logistic regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4.
NASA Astrophysics Data System (ADS)
Oustimov, Andrew; Gastounioti, Aimilia; Hsieh, Meng-Kang; Pantalone, Lauren; Conant, Emily F.; Kontos, Despina
2017-03-01
We assess the feasibility of a parenchymal texture feature fusion approach, utilizing a convolutional neural network (ConvNet) architecture, to benefit breast cancer risk assessment. Hypothesizing that by capturing sparse, subtle interactions between localized motifs present in two-dimensional texture feature maps derived from mammographic images, a multitude of texture feature descriptors can be optimally reduced to five meta-features capable of serving as a basis on which a linear classifier, such as logistic regression, can efficiently assess breast cancer risk. We combine this methodology with our previously validated lattice-based strategy for parenchymal texture analysis and we evaluate the feasibility of this approach in a case-control study with 424 digital mammograms. In a randomized split-sample setting, we optimize our framework in training/validation sets (N=300) and evaluate its descriminatory performance in an independent test set (N=124). The discriminatory capacity is assessed in terms of the the area under the curve (AUC) of the receiver operator characteristic (ROC). The resulting meta-features exhibited strong classification capability in the test dataset (AUC = 0.90), outperforming conventional, non-fused, texture analysis which previously resulted in an AUC=0.85 on the same case-control dataset. Our results suggest that informative interactions between localized motifs exist and can be extracted and summarized via a fairly simple ConvNet architecture.
Variable Selection in Logistic Regression.
1987-06-01
23 %. AUTIOR(.) S. CONTRACT OR GRANT NUMBE Rf.i %Z. D. Bai, P. R. Krishnaiah and . C. Zhao F49620-85- C-0008 " PERFORMING ORGANIZATION NAME AND AOORESS...d I7 IOK-TK- d 7 -I0 7’ VARIABLE SELECTION IN LOGISTIC REGRESSION Z. D. Bai, P. R. Krishnaiah and L. C. Zhao Center for Multivariate Analysis...University of Pittsburgh Center for Multivariate Analysis University of Pittsburgh Y !I VARIABLE SELECTION IN LOGISTIC REGRESSION Z- 0. Bai, P. R. Krishnaiah
NASA Astrophysics Data System (ADS)
Madhu, B.; Ashok, N. C.; Balasubramanian, S.
2014-11-01
Multinomial logistic regression analysis was used to develop statistical model that can predict the probability of breast cancer in Southern Karnataka using the breast cancer occurrence data during 2007-2011. Independent socio-economic variables describing the breast cancer occurrence like age, education, occupation, parity, type of family, health insurance coverage, residential locality and socioeconomic status of each case was obtained. The models were developed as follows: i) Spatial visualization of the Urban- rural distribution of breast cancer cases that were obtained from the Bharat Hospital and Institute of Oncology. ii) Socio-economic risk factors describing the breast cancer occurrences were complied for each case. These data were then analysed using multinomial logistic regression analysis in a SPSS statistical software and relations between the occurrence of breast cancer across the socio-economic status and the influence of other socio-economic variables were evaluated and multinomial logistic regression models were constructed. iii) the model that best predicted the occurrence of breast cancer were identified. This multivariate logistic regression model has been entered into a geographic information system and maps showing the predicted probability of breast cancer occurrence in Southern Karnataka was created. This study demonstrates that Multinomial logistic regression is a valuable tool for developing models that predict the probability of breast cancer Occurrence in Southern Karnataka.
NASA Astrophysics Data System (ADS)
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
Decoding memory features from hippocampal spiking activities using sparse classification models.
Dong Song; Hampson, Robert E; Robinson, Brian S; Marmarelis, Vasilis Z; Deadwyler, Sam A; Berger, Theodore W
2016-08-01
To understand how memory information is encoded in the hippocampus, we build classification models to decode memory features from hippocampal CA3 and CA1 spatio-temporal patterns of spikes recorded from epilepsy patients performing a memory-dependent delayed match-to-sample task. The classification model consists of a set of B-spline basis functions for extracting memory features from the spike patterns, and a sparse logistic regression classifier for generating binary categorical output of memory features. Results show that classification models can extract significant amount of memory information with respects to types of memory tasks and categories of sample images used in the task, despite the high level of variability in prediction accuracy due to the small sample size. These results support the hypothesis that memories are encoded in the hippocampal activities and have important implication to the development of hippocampal memory prostheses.
Hooper, Paula; Knuiman, Matthew; Foster, Sarah; Giles-Corti, Billie
2015-11-01
Planning policy makers are requesting clearer guidance on the key design features required to build neighbourhoods that promote active living. Using a backwards stepwise elimination procedure (logistic regression with generalised estimating equations adjusting for demographic characteristics, self-selection factors, stage of construction and scale of development) this study identified specific design features (n=16) from an operational planning policy ("Liveable Neighbourhoods") that showed the strongest associations with walking behaviours (measured using the Neighbourhood Physical Activity Questionnaire). The interacting effects of design features on walking behaviours were also investigated. The urban design features identified were grouped into the "building blocks of a Liveable Neighbourhood", reflecting the scale, importance and sequencing of the design and implementation phases required to create walkable, pedestrian friendly developments. Copyright © 2015 Elsevier Ltd. All rights reserved.
FitzGerald, Liesel M.; Kwon, Erika M.; Koopmeiners, Joseph S.; Salinas, Claudia A.; Stanford, Janet L.; Ostrander, Elaine A.
2009-01-01
Purpose Two recent genome-wide association studies have highlighted several SNPs purported to be associated with prostate cancer risk. We investigated the significance of these SNPs in a population-based study of Caucasian men, testing the effects of each SNP in relation to family history of prostate cancer and clinicopathological features of disease. Experimental Design We genotyped 13 SNPs in 1,308 prostate cancer patients and 1,267 unaffected controls frequency matched to cases by five-year age groups. The association of each SNP with disease risk and stratified by family history of prostate cancer and clinicopathological features of disease was calculated using logistic and polytomous regression. Results These results confirm the importance of multiple previously reported SNPs in relation to prostate cancer susceptibility; 11 of the 13 SNPs were significantly associated with risk of developing prostate cancer. However, none of the SNP associations were of comparable magnitude to that associated with having a first-degree family history of the disease. Risk estimates associated with SNPs rs4242382 and rs2735839 varied by family history, while risk estimates for rs10993994 and rs5945619 varied by Gleason score. Conclusions Our results confirm that several recently identified SNPs are associated with prostate cancer risk; however the variant alleles only confer a low to moderate relative risk of disease and are generally not associated with more aggressive disease features. PMID:19366831
Understanding logistic regression analysis.
Sperandei, Sandro
2014-01-01
Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.
NASA Astrophysics Data System (ADS)
Gao, Wei; Fan, Ming; Zhao, Weijie; Zheng, Bin; Li, Lihua
2017-03-01
This study developed and tested a multi-probe resonance-frequency-based electrical impedance spectroscopy (REIS) system aimed at detection of breast cancer. The REIS system consists of specially designed mechanical supporting device that can be easily lifted to fit women of different height, a seven probe sensor cup, and a computer providing software for system control and management. The sensor cup includes one central probe for direct contact with the nipple, and other six probes uniformly distributed at a distance of 35mm away from the center probe to enable contact with breast skin surface. It takes about 18 seconds for this system to complete a data acquisition process. We utilized this system for examination of breast cancer, collecting a dataset of 289 cases including biopsy verified 74 malignant and 215 benign tumors. After that, 23 REIS based features, including seven frequency, fifteen magnitude features were extracted, and an age feature. To reduce redundancy we selected 6 features using the evolutionary algorithm for classification. The area under a receiver operating characteristic curve (AUC) was computed to assess classifier performance. A multivariable logistic regression method was performed for detection of the tumors. The results of our study showed for the 23 REIS features AUC and ACC, Sensitivity and Specificity of 0.796, 0.727, 0.731 and 0.726, respectively. The AUC and ACC, Sensitivity and Specificity for the 6 REIS features of 0.840, 0.80, 0.703 and 0.833, respectively, and AUC of 0.662 and 0.619 for the frequency and magnitude based REIS features, respectively. The performance of the classifiers using all the 6 features was significantly better than solely using magnitude features (p=3.29e-08) and frequency features (5.61e-07). Smote algorithm was used to expand small samples to balance the dataset, the AUC after data balance of 0.846 increased than the original data classification performance. The results indicated that the REIS system is a promising tool for detection of breast cancer and may be acceptable for clinical implementation.
Spectral Regression Based Fault Feature Extraction for Bearing Accelerometer Sensor Signals
Xia, Zhanguo; Xia, Shixiong; Wan, Ling; Cai, Shiyu
2012-01-01
Bearings are not only the most important element but also a common source of failures in rotary machinery. Bearing fault prognosis technology has been receiving more and more attention recently, in particular because it plays an increasingly important role in avoiding the occurrence of accidents. Therein, fault feature extraction (FFE) of bearing accelerometer sensor signals is essential to highlight representative features of bearing conditions for machinery fault diagnosis and prognosis. This paper proposes a spectral regression (SR)-based approach for fault feature extraction from original features including time, frequency and time-frequency domain features of bearing accelerometer sensor signals. SR is a novel regression framework for efficient regularized subspace learning and feature extraction technology, and it uses the least squares method to obtain the best projection direction, rather than computing the density matrix of features, so it also has the advantage in dimensionality reduction. The effectiveness of the SR-based method is validated experimentally by applying the acquired vibration signals data to bearings. The experimental results indicate that SR can reduce the computation cost and preserve more structure information about different bearing faults and severities, and it is demonstrated that the proposed feature extraction scheme has an advantage over other similar approaches. PMID:23202017
Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M
2014-12-01
Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out by 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, which also showed a better predictability of the functional inner regression tree model. The model built in the present study exhibits a reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed. © 2014 SETAC.
Fan, L; Liu, S-Y; Li, Q-C; Yu, H; Xiao, X-S
2012-01-01
Objective To evaluate different features between benign and malignant pulmonary focal ground-glass opacity (fGGO) on multidetector CT (MDCT). Methods 82 pathologically or clinically confirmed fGGOs were retrospectively analysed with regard to demographic data, lesion size and location, attenuation value and MDCT features including shape, margin, interface, internal characteristics and adjacent structure. Differences between benign and malignant fGGOs were analysed using a χ2 test, Fisher's exact test or Mann–Whitney U-test. Morphological characteristics were analysed by binary logistic regression analysis to estimate the likelihood of malignancy. Results There were 21 benign and 61 malignant lesions. No statistical differences were found between benign and malignant fGGOs in terms of demographic data, size, location and attenuation value. The frequency of lobulation (p=0.000), spiculation (p=0.008), spine-like process (p=0.004), well-defined but coarse interface (p=0.000), bronchus cut-off (p=0.003), other air-containing space (p=0.000), pleural indentation (p=0.000) and vascular convergence (p=0.006) was significantly higher in malignant fGGOs than that in benign fGGOs. Binary logistic regression analysis showed that lobulation, interface and pleural indentation were important indicators for malignant diagnosis of fGGO, with the corresponding odds ratios of 8.122, 3.139 and 9.076, respectively. In addition, a well-defined but coarse interface was the most important indicator of malignancy among all interface types. With all three important indicators considered, the diagnostic sensitivity, specificity and accuracy were 93.4%, 66.7% and 86.6%, respectively. Conclusion An fGGO with lobulation, a well-defined but coarse interface and pleural indentation gives a greater than average likelihood of being malignant. PMID:22128130
Sturiale, C L; Rigante, L; Puca, A; Di Lella, G; Albanese, A; Marchese, E; Di Rocco, C; Maira, G; Colicchio, G
2013-05-01
Epileptic seizures account for 24-40% of all clinical onsets in patients with brain arteriovenous malformations (AVMs). We retrospectively reviewed the angioarchitectural features of AVMs associated with seizures in 168 patients admitted to our Department from 1997 to 2012. Patients were dichotomized according to demographic characteristics, type of treatment, bleeding occurrence, and morphological and topographic features. Clinical status at admission and discharge was also recorded. The association of each one of these variables with seizures occurrence was statistically tested. Continuous variables and outcome were compared with Student's t-test, whereas categorical ones were compared using Fisher's exact test. The independent contribution of some seizures predictors was assessed with a logistic regression model. Associations were considered significant for P < 0.05. About 29% patients showed seizures and 47% bleeding. No significant difference in age and sex was observed between patients with and without seizures. AVMs > 4 cm (P = 0.001) and those fed by dilated arterial feeders (P = 0.02) were associated with increased risk of seizures. A higher risk of seizures occurrence was also observed in cortical AVMs compared with deeper ones (75.5% vs. 55.4%; P = 0.01), and in AVMs fed by middle and posterior cerebral arteries branches compared with the other vessels (81.6% vs. 45.3%; P < 0.001 and 48.9% vs. 23.5%; P = 0.002, respectively). No lobar predisposition was observed. A nidus > 4 cm also appeared as an independent risk factor of seizures occurrence (OR 2.82; 95% CI, 1.26-6.31; P = 0.009) at logistic regression analysis. AVM morphology, especially nidus dimension, appeared to more significantly influence seizures occurrence than their topography. © 2013 The Author(s) European Journal of Neurology © 2013 EFNS.
Kaur, Ravneet; Albano, Peter P.; Cole, Justin G.; Hagerty, Jason; LeAnder, Robert W.; Moss, Randy H.; Stoecker, William V.
2015-01-01
Background/Purpose Early detection of malignant melanoma is an important public health challenge. In the USA, dermatologists are seeing more melanomas at an early stage, before classic melanoma features have become apparent. Pink color is a feature of these early melanomas. If rapid and accurate automatic detection of pink color in these melanomas could be accomplished, there could be significant public health benefits. Methods Detection of three shades of pink (light pink, dark pink, and orange pink) was accomplished using color analysis techniques in five color planes (red, green, blue, hue and saturation). Color shade analysis was performed using a logistic regression model trained with an image set of 60 dermoscopic images of melanoma that contained pink areas. Detected pink shade areas were further analyzed with regard to the location within the lesion, average color parameters over the detected areas, and histogram texture features. Results Logistic regression analysis of a separate set of 128 melanomas and 128 benign images resulted in up to 87.9% accuracy in discriminating melanoma from benign lesions measured using area under the receiver operating characteristic curve. The accuracy in this model decreased when parameters for individual shades, texture, or shade location within the lesion were omitted. Conclusion Texture, color, and lesion location analysis applied to multiple shades of pink can assist in melanoma detection. When any of these three details: color location, shade analysis, or texture analysis were omitted from the model, accuracy in separating melanoma from benign lesions was lowered. Separation of colors into shades and further details that enhance the characterization of these color shades are needed for optimal discrimination of melanoma from benign lesions. PMID:25809473
Dell'Osso, Bernardo; Benatti, Beatrice; Arici, Chiara; Palazzo, Carlotta; Altamura, A Carlo; Hollander, Eric; Fineberg, Naomi; Stein, Dan J; Nicolini, Humberto; Lanzagorta, Nuria; Marazziti, Donatella; Pallanti, Stefano; van Ameringen, Michael; Lochner, Christine; Karamustafalioglu, Oguz; Hranov, Luchezar; Figee, Martijn; Drummond, Lynne; Rodriguez, Carolyn I; Grant, John; Denys, Damiaan; Menchon, Jose M; Zohar, Joseph
2018-02-01
Obsessive-compulsive disorder (OCD) is associated with variable risk of suicide and prevalence of suicide attempt (SA). The present study aimed to assess the prevalence of SA and associated sociodemographic and clinical features in a large international sample of OCD patients. A total of 425 OCD outpatients, recruited through the International College of Obsessive-Compulsive Spectrum Disorders (ICOCS) network, were assessed and categorized in groups with or without a history of SA, and their sociodemographic and clinical features compared through Pearson's chi-squared and t tests. Logistic regression was performed to assess the impact of the collected data on the SA variable. 14.6% of our sample reported at least one SA during their lifetime. Patients with an SA had significantly higher rates of comorbid psychiatric disorders (60 vs. 17%, p<0.001; particularly tic disorder), medical disorders (51 vs. 15%, p<0.001), and previous hospitalizations (62 vs. 11%, p<0.001) than patients with no history of SA. With respect to geographical differences, European and South African patients showed significantly higher rates of SA history (40 and 39%, respectively) compared to North American and Middle-Eastern individuals (13 and 8%, respectively) (χ2=11.4, p<0.001). The logistic regression did not show any statistically significant predictor of SA among selected independent variables. Our international study found a history of SA prevalence of ~15% in OCD patients, with higher rates of psychiatric and medical comorbidities and previous hospitalizations in patients with a previous SA. Along with potential geographical influences, the presence of the abovementioned features should recommend additional caution in the assessment of suicide risk in OCD patients.
Kaur, R; Albano, P P; Cole, J G; Hagerty, J; LeAnder, R W; Moss, R H; Stoecker, W V
2015-11-01
Early detection of malignant melanoma is an important public health challenge. In the USA, dermatologists are seeing more melanomas at an early stage, before classic melanoma features have become apparent. Pink color is a feature of these early melanomas. If rapid and accurate automatic detection of pink color in these melanomas could be accomplished, there could be significant public health benefits. Detection of three shades of pink (light pink, dark pink, and orange pink) was accomplished using color analysis techniques in five color planes (red, green, blue, hue, and saturation). Color shade analysis was performed using a logistic regression model trained with an image set of 60 dermoscopic images of melanoma that contained pink areas. Detected pink shade areas were further analyzed with regard to the location within the lesion, average color parameters over the detected areas, and histogram texture features. Logistic regression analysis of a separate set of 128 melanomas and 128 benign images resulted in up to 87.9% accuracy in discriminating melanoma from benign lesions measured using area under the receiver operating characteristic curve. The accuracy in this model decreased when parameters for individual shades, texture, or shade location within the lesion were omitted. Texture, color, and lesion location analysis applied to multiple shades of pink can assist in melanoma detection. When any of these three details: color location, shade analysis, or texture analysis were omitted from the model, accuracy in separating melanoma from benign lesions was lowered. Separation of colors into shades and further details that enhance the characterization of these color shades are needed for optimal discrimination of melanoma from benign lesions. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
A Profile of Latino School-Based Extracurricular Activity Involvement
ERIC Educational Resources Information Center
Peguero, Anthony A.
2010-01-01
Participation in school-based extracurricular activities influences educational success. Thus, it is important to depict a profile of school-based extracurricular activity involvement for a Latino student population that is marginalized in schools. This research uses the Educational Longitudinal Study of 2002 and logistic regression analyses to…
ERIC Educational Resources Information Center
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
Friedrich, Nele; Schneider, Harald J; Spielhagen, Christin; Markus, Marcello Ricardo Paulista; Haring, Robin; Grabe, Hans J; Buchfelder, Michael; Wallaschofski, Henri; Nauck, Matthias
2011-10-01
Prolactin (PRL) is involved in immune regulation and may contribute to an atherogenic phenotype. Previous results on the association of PRL with inflammatory biomarkers have been conflicting and limited by small patient studies. Therefore, we used data from a large population-based sample to assess the cross-sectional associations between serum PRL concentration and high-sensitivity C-reactive protein (hsCRP), fibrinogen, interleukin-6 (IL-6), and white blood cell (WBC) count. From the population-based Study of Health in Pomerania (SHIP), a total of 3744 subjects were available for the present analyses. PRL and inflammatory biomarkers were measured. Linear and logistic regression models adjusted for age, sex, body-mass-index, total cholesterol and glucose were analysed. Multivariable linear regression models revealed a positive association of PRL with WBC. Multivariable logistic regression analyses showed a significant association of PRL with increased IL-6 in non-smokers [highest vs lowest quintile: odds ratio 1·69 (95% confidence interval 1·10-2·58), P = 0·02] and smokers [OR 2·06 (95%-CI 1·10-3·89), P = 0·02]. Similar results were found for WBC in non-smokers [highest vs lowest quintile: OR 2·09 (95%-CI 1·21-3·61), P = 0·01)] but not in smokers. Linear and logistic regression analyses revealed no significant associations of PRL with hsCRP or fibrinogen. Serum PRL concentrations are associated with inflammatory biomarkers including IL-6 and WBC, but not hsCRP or fibrinogen. The suggested role of PRL in inflammation needs further investigation in future prospective studies. © 2011 Blackwell Publishing Ltd.
Secure Logistic Regression Based on Homomorphic Encryption: Design and Evaluation.
Kim, Miran; Song, Yongsoo; Wang, Shuang; Xia, Yuhou; Jiang, Xiaoqian
2018-04-17
Learning a model without accessing raw data has been an intriguing idea to security and machine learning researchers for years. In an ideal setting, we want to encrypt sensitive data to store them on a commercial cloud and run certain analyses without ever decrypting the data to preserve privacy. Homomorphic encryption technique is a promising candidate for secure data outsourcing, but it is a very challenging task to support real-world machine learning tasks. Existing frameworks can only handle simplified cases with low-degree polynomials such as linear means classifier and linear discriminative analysis. The goal of this study is to provide a practical support to the mainstream learning models (eg, logistic regression). We adapted a novel homomorphic encryption scheme optimized for real numbers computation. We devised (1) the least squares approximation of the logistic function for accuracy and efficiency (ie, reduce computation cost) and (2) new packing and parallelization techniques. Using real-world datasets, we evaluated the performance of our model and demonstrated its feasibility in speed and memory consumption. For example, it took approximately 116 minutes to obtain the training model from the homomorphically encrypted Edinburgh dataset. In addition, it gives fairly accurate predictions on the testing dataset. We present the first homomorphically encrypted logistic regression outsourcing model based on the critical observation that the precision loss of classification models is sufficiently small so that the decision plan stays still. ©Miran Kim, Yongsoo Song, Shuang Wang, Yuhou Xia, Xiaoqian Jiang. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 17.04.2018.
Kasthurirathne, Suranga N; Dixon, Brian E; Gichoya, Judy; Xu, Huiping; Xia, Yuni; Mamlin, Burke; Grannis, Shaun J
2016-04-01
Increased adoption of electronic health records has resulted in increased availability of free text clinical data for secondary use. A variety of approaches to obtain actionable information from unstructured free text data exist. These approaches are resource intensive, inherently complex and rely on structured clinical data and dictionary-based approaches. We sought to evaluate the potential to obtain actionable information from free text pathology reports using routinely available tools and approaches that do not depend on dictionary-based approaches. We obtained pathology reports from a large health information exchange and evaluated the capacity to detect cancer cases from these reports using 3 non-dictionary feature selection approaches, 4 feature subset sizes, and 5 clinical decision models: simple logistic regression, naïve bayes, k-nearest neighbor, random forest, and J48 decision tree. The performance of each decision model was evaluated using sensitivity, specificity, accuracy, positive predictive value, and area under the receiver operating characteristics (ROC) curve. Decision models parameterized using automated, informed, and manual feature selection approaches yielded similar results. Furthermore, non-dictionary classification approaches identified cancer cases present in free text reports with evaluation measures approaching and exceeding 80-90% for most metrics. Our methods are feasible and practical approaches for extracting substantial information value from free text medical data, and the results suggest that these methods can perform on par, if not better, than existing dictionary-based approaches. Given that public health agencies are often under-resourced and lack the technical capacity for more complex methodologies, these results represent potentially significant value to the public health field. Copyright © 2016 Elsevier Inc. All rights reserved.
Dixon, Helen; Scully, Maree; Kelly, Bridget; Donovan, Robert; Chapman, Kathy; Wakefield, Melanie
2014-01-01
Assess the effect of counter-advertisements on parents' appraisals of unhealthy foods featuring front-of-package promotions (FOPPs). A 2 × 2 × 5 between-subjects Web-based experiment. Parents were randomly shown an advertisement (counter-advertisement challenging FOPP/control advertisement) and then a pair of food products from the same category: an unhealthy product featuring an FOPP (nutrient content claim/sports celebrity endorsement) and a healthier control product with no FOPP. Australia. A total of 1,269 Australian-based parents of children aged 5-12 years recruited from an online panel. Parents nominated which product they would prefer to buy and which they thought was healthier, then rated the unhealthy product and FOPP on various characteristics. Differences between advertisement conditions were assessed using logistic regression (product choice tasks) and analysis of variance tests (ratings of unhealthy product and FOPP). Compared with parents who saw a control advertisement, parents who saw a counter-advertisement perceived unhealthy products featuring FOPPs as less healthy, expressed weaker intentions for buying such products, and were more likely to read the nutrition facts panel before nominating choices (all P < .001). Counter-advertising may help reduce the misleading influence of unhealthy food marketing and improve the accuracy of parents' evaluations of how nutritious promoted food products are. Copyright © 2014 Society for Nutrition Education and Behavior. Published by Elsevier Inc. All rights reserved.
Caldieraro, Marco Antonio; Walsh, Samantha; Deckersbach, Thilo; Bobo, William V; Gao, Keming; Ketter, Terence A; Shelton, Richard C; Reilly-Harrington, Noreen A; Tohen, Mauricio; Calabrese, Joseph R; Thase, Michael E; Kocsis, James H; Sylvia, Louisa G; Nierenberg, Andrew A
2017-11-01
Activation encompasses energy and activity and is a central feature of bipolar disorder. However, the impact of activation on treatment response of bipolar depression requires further exploration. The aims of this study were to assess the association of decreased activation and sustained remission in bipolar depression and test for factors that could affect this association. We assessed participants with Diagnostic and Statistical Manual of Mental Disorders (4th ed) bipolar depression ( n = 303) included in a comparative effectiveness study of lithium- and quetiapine-based treatments (the Bipolar CHOICE study). Activation was evaluated using items from the Bipolar Inventory of Symptoms Scale. The selection of these items was based on a dimension of energy and interest symptoms associated with poorer treatment response in major depression. Decreased activation was associated with lower remission rates in the raw analyses and in a logistic regression model adjusted for baseline severity and subsyndromal manic symptoms (odds ratio = 0.899; p = 0.015). The manic features also predicted lower remission (odds ratio = 0.934; p < 0.001). Remission rates were similar in the two treatment groups. Decreased activation and subsyndromal manic symptoms predict lower remission rates in bipolar depression. Patients with these features may require specific treatment approaches, but new studies are necessary to identify treatments that could improve outcomes in this population.
2017-03-23
PUBLIC RELEASE; DISTRIBUTION UNLIMITED Using Multiple and Logistic Regression to Estimate the Median Will- Cost and Probability of Cost and... Cost and Probability of Cost and Schedule Overrun for Program Managers Ryan C. Trudelle Follow this and additional works at: https://scholar.afit.edu...afit.edu. Recommended Citation Trudelle, Ryan C., "Using Multiple and Logistic Regression to Estimate the Median Will- Cost and Probability of Cost and
2013-11-01
Ptrend 0.78 0.62 0.75 Unconditional logistic regression was used to estimate odds ratios (OR) and 95 % confidence intervals (CI) for risk of node...Ptrend 0.71 0.67 Unconditional logistic regression was used to estimate odds ratios (OR) and 95 % confidence intervals (CI) for risk of high-grade tumors... logistic regression was used to estimate odds ratios (OR) and 95 % confidence intervals (CI) for the associations between each of the seven SNPs and
Rhodes, Darson L; Kirchofer, Gregg; Hammig, Bart J; Ogletree, Roberta J
2013-05-01
This study examined the impact of professional preparation and class structure on sexuality topics taught and use of practice-based instructional strategies in US middle and high school health classes. Data from the classroom-level file of the 2006 School Health Policies and Programs were used. A series of multivariable logistic regression models were employed to determine if sexuality content taught was dependent on professional preparation and /or class structure (HE only versus HE/another subject combined). Additional multivariable logistic regression models were employed to determine if use of practice-based instructional strategies was dependent upon professional preparation and/or class structure. Years of teaching health topics and size of the school district were included as covariates in the multivariable logistic regression models. Findings indicated professionally prepared health educators were significantly more likely to teach 7 of the 13 sexuality topics as compared to nonprofessionally prepared health educators. There was no statistically significant difference in the instructional strategies used by professionally prepared and nonprofessionally prepared health educators. Exclusively health education classes versus combined classes were significantly more likely to have included 6 of the 13 topics and to have incorporated practice-based instructional strategies in the curricula. This study indicated professional preparation and class structure impacted sexuality content taught. Class structure also impacted whether opportunities for students to practice skills were made available. Results support the need for continued advocacy for professionally prepared health educators and health only courses. © 2013, American School Health Association.
Stata Modules for Calculating Novel Predictive Performance Indices for Logistic Models.
Barkhordari, Mahnaz; Padyab, Mojgan; Hadaegh, Farzad; Azizi, Fereidoun; Bozorgmanesh, Mohammadreza
2016-01-01
Prediction is a fundamental part of prevention of cardiovascular diseases (CVD). The development of prediction algorithms based on the multivariate regression models loomed several decades ago. Parallel with predictive models development, biomarker researches emerged in an impressively great scale. The key question is how best to assess and quantify the improvement in risk prediction offered by new biomarkers or more basically how to assess the performance of a risk prediction model. Discrimination, calibration, and added predictive value have been recently suggested to be used while comparing the predictive performances of the predictive models' with and without novel biomarkers. Lack of user-friendly statistical software has restricted implementation of novel model assessment methods while examining novel biomarkers. We intended, thus, to develop a user-friendly software that could be used by researchers with few programming skills. We have written a Stata command that is intended to help researchers obtain cut point-free and cut point-based net reclassification improvement index and (NRI) and relative and absolute Integrated discriminatory improvement index (IDI) for logistic-based regression analyses.We applied the commands to a real data on women participating the Tehran lipid and glucose study (TLGS) to examine if information of a family history of premature CVD, waist circumference, and fasting plasma glucose can improve predictive performance of the Framingham's "general CVD risk" algorithm. The command is addpred for logistic regression models. The Stata package provided herein can encourage the use of novel methods in examining predictive capacity of ever-emerging plethora of novel biomarkers.
Hao, Chen; LiJun, Chen; Albright, Thomas P.
2007-01-01
Invasive exotic species pose a growing threat to the economy, public health, and ecological integrity of nations worldwide. Explaining and predicting the spatial distribution of invasive exotic species is of great importance to prevention and early warning efforts. We are investigating the potential distribution of invasive exotic species, the environmental factors that influence these distributions, and the ability to predict them using statistical and information-theoretic approaches. For some species, detailed presence/absence occurrence data are available, allowing the use of a variety of standard statistical techniques. However, for most species, absence data are not available. Presented with the challenge of developing a model based on presence-only information, we developed an improved logistic regression approach using Information Theory and Frequency Statistics to produce a relative suitability map. This paper generated a variety of distributions of ragweed (Ambrosia artemisiifolia L.) from logistic regression models applied to herbarium specimen location data and a suite of GIS layers including climatic, topographic, and land cover information. Our logistic regression model was based on Akaike's Information Criterion (AIC) from a suite of ecologically reasonable predictor variables. Based on the results we provided a new Frequency Statistical method to compartmentalize habitat-suitability in the native range. Finally, we used the model and the compartmentalized criterion developed in native ranges to "project" a potential distribution onto the exotic ranges to build habitat-suitability maps. ?? Science in China Press 2007.
Mumtaz, Wajid; Ali, Syed Saad Azhar; Yasin, Mohd Azhar Mohd; Malik, Aamir Saeed
2018-02-01
Major depressive disorder (MDD), a debilitating mental illness, could cause functional disabilities and could become a social problem. An accurate and early diagnosis for depression could become challenging. This paper proposed a machine learning framework involving EEG-derived synchronization likelihood (SL) features as input data for automatic diagnosis of MDD. It was hypothesized that EEG-based SL features could discriminate MDD patients and healthy controls with an acceptable accuracy better than measures such as interhemispheric coherence and mutual information. In this work, classification models such as support vector machine (SVM), logistic regression (LR) and Naïve Bayesian (NB) were employed to model relationship between the EEG features and the study groups (MDD patient and healthy controls) and ultimately achieved discrimination of study participants. The results indicated that the classification rates were better than chance. More specifically, the study resulted into SVM classification accuracy = 98%, sensitivity = 99.9%, specificity = 95% and f-measure = 0.97; LR classification accuracy = 91.7%, sensitivity = 86.66%, specificity = 96.6% and f-measure = 0.90; NB classification accuracy = 93.6%, sensitivity = 100%, specificity = 87.9% and f-measure = 0.95. In conclusion, SL could be a promising method for diagnosing depression. The findings could be generalized to develop a robust CAD-based tool that may help for clinical purposes.
Feature Selection for Ridge Regression with Provable Guarantees.
Paul, Saurabh; Drineas, Petros
2016-04-01
We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets; a subset of TechTC-300 data sets, to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lafata, K; Ren, L; Wu, Q
Purpose: To develop a data-mining methodology based on quantum clustering and machine learning to predict expected dosimetric endpoints for lung SBRT applications based on patient-specific anatomic features. Methods: Ninety-three patients who received lung SBRT at our clinic from 2011–2013 were retrospectively identified. Planning information was acquired for each patient, from which various features were extracted using in-house semi-automatic software. Anatomic features included tumor-to-OAR distances, tumor location, total-lung-volume, GTV and ITV. Dosimetric endpoints were adopted from RTOG-0195 recommendations, and consisted of various OAR-specific partial-volume doses and maximum point-doses. First, PCA analysis and unsupervised quantum-clustering was used to explore the feature-space tomore » identify potentially strong classifiers. Secondly, a multi-class logistic regression algorithm was developed and trained to predict dose-volume endpoints based on patient-specific anatomic features. Classes were defined by discretizing the dose-volume data, and the feature-space was zero-mean normalized. Fitting parameters were determined by minimizing a regularized cost function, and optimization was performed via gradient descent. As a pilot study, the model was tested on two esophageal dosimetric planning endpoints (maximum point-dose, dose-to-5cc), and its generalizability was evaluated with leave-one-out cross-validation. Results: Quantum-Clustering demonstrated a strong separation of feature-space at 15Gy across the first-and-second Principle Components of the data when the dosimetric endpoints were retrospectively identified. Maximum point dose prediction to the esophagus demonstrated a cross-validation accuracy of 87%, and the maximum dose to 5cc demonstrated a respective value of 79%. The largest optimized weighting factor was placed on GTV-to-esophagus distance (a factor of 10 greater than the second largest weighting factor), indicating an intuitively strong correlation between this feature and both endpoints. Conclusion: This pilot study shows that it is feasible to predict dose-volume endpoints based on patient-specific anatomic features. The developed methodology can potentially help to identify patients at risk for higher OAR doses, thus improving the efficiency of treatment planning. R01-184173.« less
Pellicano, Clelia; Assogna, Francesca; Cellupica, Nystya; Piras, Federica; Pierantozzi, Mariangela; Stefani, Alessandro; Cerroni, Rocco; Mercuri, Bruno; Caltagirone, Carlo; Pontieri, Francesco E; Spalletta, Gianfranco
2017-12-01
The two main variants of Progressive Supranuclear Palsy (PSP), Richardson's syndrome (PSP-RS) and PSP-parkinsonism (PSP-P), share motor and non-motor features with Parkinson's disease (PD) particularly in the early stages. This makes the precocious diagnosis more challenging. We aimed at defining qualitative and quantitative differences of neuropsychiatric and neuropsychological profiles between PSP-P, PSP-RS and PD patients recruited within 24 months after the onset of symptoms, in order to clarify if the identification of peculiar cognitive and psychiatric symptoms is of help for early PSP diagnosis. PD (n = 155), PSP-P (n = 11) and PSP-RS (n = 14) patients were identified. All patients were submitted to clinical, neurological, neuropsychiatric diagnostic evaluation and to a comprehensive neuropsychiatric and neuropsychological battery. Predictors of PSP-P and PSP-RS diagnosis were identified by multivariate logistic regressions including neuropsychiatric and neuropsychological features that differed significantly among groups. The three groups differed significantly at the Apathy Rating Scale score and at several neuropsychological domains. The multivariate logistic regressions indicated that the diagnosis of PSP-RS was predicted by phonological verbal fluency deficit whereas the presence of apathy significantly predicted the PSP-P diagnosis. Peculiar neuropsychiatric and neuropsychological symptoms are identifiable very precociously in PSP-P, PSP-RS and PD patients. Early phonological verbal fluency deficit identifies patients with PSP-RS whereas apathy supports the diagnosis of PSP-P. Copyright © 2017 Elsevier Ltd. All rights reserved.
A framework for feature extraction from hospital medical data with applications in risk prediction.
Tran, Truyen; Luo, Wei; Phung, Dinh; Gupta, Sunil; Rana, Santu; Kennedy, Richard Lee; Larkins, Ann; Venkatesh, Svetha
2014-12-30
Feature engineering is a time consuming component of predictive modeling. We propose a versatile platform to automatically extract features for risk prediction, based on a pre-defined and extensible entity schema. The extraction is independent of disease type or risk prediction task. We contrast auto-extracted features to baselines generated from the Elixhauser comorbidities. Hospital medical records was transformed to event sequences, to which filters were applied to extract feature sets capturing diversity in temporal scales and data types. The features were evaluated on a readmission prediction task, comparing with baseline feature sets generated from the Elixhauser comorbidities. The prediction model was through logistic regression with elastic net regularization. Predictions horizons of 1, 2, 3, 6, 12 months were considered for four diverse diseases: diabetes, COPD, mental disorders and pneumonia, with derivation and validation cohorts defined on non-overlapping data-collection periods. For unplanned readmissions, auto-extracted feature set using socio-demographic information and medical records, outperformed baselines derived from the socio-demographic information and Elixhauser comorbidities, over 20 settings (5 prediction horizons over 4 diseases). In particular over 30-day prediction, the AUCs are: COPD-baseline: 0.60 (95% CI: 0.57, 0.63), auto-extracted: 0.67 (0.64, 0.70); diabetes-baseline: 0.60 (0.58, 0.63), auto-extracted: 0.67 (0.64, 0.69); mental disorders-baseline: 0.57 (0.54, 0.60), auto-extracted: 0.69 (0.64,0.70); pneumonia-baseline: 0.61 (0.59, 0.63), auto-extracted: 0.70 (0.67, 0.72). The advantages of auto-extracted standard features from complex medical records, in a disease and task agnostic manner were demonstrated. Auto-extracted features have good predictive power over multiple time horizons. Such feature sets have potential to form the foundation of complex automated analytic tasks.
Logistic regression models of factors influencing the location of bioenergy and biofuels plants
T.M. Young; R.L. Zaretzki; J.H. Perdue; F.M. Guess; X. Liu
2011-01-01
Logistic regression models were developed to identify significant factors that influence the location of existing wood-using bioenergy/biofuels plants and traditional wood-using facilities. Logistic models provided quantitative insight for variables influencing the location of woody biomass-using facilities. Availability of "thinnings to a basal area of 31.7m2/ha...
A Primer on Logistic Regression.
ERIC Educational Resources Information Center
Woldbeck, Tanya
This paper introduces logistic regression as a viable alternative when the researcher is faced with variables that are not continuous. If one is to use simple regression, the dependent variable must be measured on a continuous scale. In the behavioral sciences, it may not always be appropriate or possible to have a measured dependent variable on a…
Calibrating random forests for probability estimation.
Dankowski, Theresa; Ziegler, Andreas
2016-09-30
Probabilities can be consistently estimated using random forests. It is, however, unclear how random forests should be updated to make predictions for other centers or at different time points. In this work, we present two approaches for updating random forests for probability estimation. The first method has been proposed by Elkan and may be used for updating any machine learning approach yielding consistent probabilities, so-called probability machines. The second approach is a new strategy specifically developed for random forests. Using the terminal nodes, which represent conditional probabilities, the random forest is first translated to logistic regression models. These are, in turn, used for re-calibration. The two updating strategies were compared in a simulation study and are illustrated with data from the German Stroke Study Collaboration. In most simulation scenarios, both methods led to similar improvements. In the simulation scenario in which the stricter assumptions of Elkan's method were not met, the logistic regression-based re-calibration approach for random forests outperformed Elkan's method. It also performed better on the stroke data than Elkan's method. The strength of Elkan's method is its general applicability to any probability machine. However, if the strict assumptions underlying this approach are not met, the logistic regression-based approach is preferable for updating random forests for probability estimation. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Impact of low vision on employment.
Mojon-Azzi, Stefania M; Sousa-Poza, Alfonso; Mojon, Daniel S
2010-01-01
We investigated the influence of self-reported corrected eyesight on several variables describing the perception by employees and self-employed persons of their employment. Our study was based on data from the Survey of Health, Ageing and Retirement in Europe (SHARE). SHARE is a multidisciplinary, cross-national database of microdata on health, socioeconomic status, social and family networks, collected on 31,115 individuals in 11 European countries and in Israel. With the help of ordered logistic regressions and binary logistic regressions, we analyzed the influence of perceived visual impairment--corrected by 19 covariates capturing socioeconomic and health-related factors--on 10 variables describing the respondents' employment situation. Based on data covering 10,340 working individuals, the results of the logistic and ordered regressions indicate that respondents with lower levels of self-reported general eyesight were significantly less satisfied with their jobs, felt they had less freedom to decide, less opportunity to develop new skills, less support in difficult situations, less recognition for their work, and an inadequate salary. Respondents with a lower eyesight level more frequently reported that they feared their health might limit their ability to work before regular retirement age and more often indicated that they were seeking early retirement. Analysis of this dataset from 12 countries demonstrates the strong impact of self-reported visual impairment on individual employment, and therefore on job satisfaction, productivity, and well-being. Copyright © 2010 S. Karger AG, Basel.
Utility of respiratory ward-based NIV in acidotic hypercapnic respiratory failure.
Dave, Chirag; Turner, Alice; Thomas, Ajit; Beauchamp, Ben; Chakraborty, Biman; Ali, Asad; Mukherjee, Rahul; Banerjee, Dev
2014-11-01
We sought to elicit predictors of in-hospital mortality for first and subsequent admissions with acidotic hypercapnic respiratory failure (AHRF) in a cohort of chronic obstructive pulmonary disease patients who have undergone ward-based non-invasive ventilation (NIV), and identify features associated with long-term survival. Analysis of prospectively collected data at a single centre on patients undergoing NIV for AHRF between 2004 and 2009. Predictors of in-hospital mortality and intubation were sought by logistic regression and predictors of long-term survival by Cox regression. Initial pH exhibited a threshold effect for in-hospital mortality at pH 7.15. This relationship remained in patients undergoing their first episode of AHRF. In both first and subsequent admissions, a pH threshold of 7.25 at 4 h was associated with better prognosis (P = 0.02 and P = 0.04 respectively). In second or subsequent episodes of AHRF, mortality was lower and predicted only by age (P = 0.002) on multivariate analysis. NIV could be used on medical wards for patients with pH 7.16 or greater on their first admission, although more conservative values should continue to be used for those with a second or subsequent episodes of AHRF. © 2014 Asian Pacific Society of Respirology.
NASA Astrophysics Data System (ADS)
WU, Chunhung
2015-04-01
The research built the original logistic regression landslide susceptibility model (abbreviated as or-LRLSM) and landslide ratio-based ogistic regression landslide susceptibility model (abbreviated as lr-LRLSM), compared the performance and explained the error source of two models. The research assumes that the performance of the logistic regression model can be better if the distribution of landslide ratio and weighted value of each variable is similar. Landslide ratio is the ratio of landslide area to total area in the specific area and an useful index to evaluate the seriousness of landslide disaster in Taiwan. The research adopted the landside inventory induced by 2009 Typhoon Morakot in the Chishan watershed, which was the most serious disaster event in the last decade, in Taiwan. The research adopted the 20 m grid as the basic unit in building the LRLSM, and six variables, including elevation, slope, aspect, geological formation, accumulated rainfall, and bank erosion, were included in the two models. The six variables were divided as continuous variables, including elevation, slope, and accumulated rainfall, and categorical variables, including aspect, geological formation and bank erosion in building the or-LRLSM, while all variables, which were classified based on landslide ratio, were categorical variables in building the lr-LRLSM. Because the count of whole basic unit in the Chishan watershed was too much to calculate by using commercial software, the research took random sampling instead of the whole basic units. The research adopted equal proportions of landslide unit and not landslide unit in logistic regression analysis. The research took 10 times random sampling and selected the group with the best Cox & Snell R2 value and Nagelkerker R2 value as the database for the following analysis. Based on the best result from 10 random sampling groups, the or-LRLSM (lr-LRLSM) is significant at the 1% level with Cox & Snell R2 = 0.190 (0.196) and Nagelkerke R2 = 0.253 (0.260). The unit with the landslide susceptibility value > 0.5 (≦ 0.5) will be classified as a predicted landslide unit (not landslide unit). The AUC, i.e. the area under the relative operating characteristic curve, of or-LRLSM in the Chishan watershed is 0.72, while that of lr-LRLSM is 0.77. Furthermore, the average correct ratio of lr-LRLSM (73.3%) is better than that of or-LRLSM (68.3%). The research analyzed in detail the error sources from the two models. In continuous variables, using the landslide ratio-based classification in building the lr-LRLSM can let the distribution of weighted value more similar to distribution of landslide ratio in the range of continuous variable than that in building the or-LRLSM. In categorical variables, the meaning of using the landslide ratio-based classification in building the lr-LRLSM is to gather the parameters with approximate landslide ratio together. The mean correct ratio in continuous variables (categorical variables) by using the lr-LRLSM is better than that in or-LRLSM by 0.6 ~ 2.6% (1.7% ~ 6.0%). Building the landslide susceptibility model by using landslide ratio-based classification is practical and of better performance than that by using the original logistic regression.
NASA Astrophysics Data System (ADS)
Erener, Arzu; Sivas, A. Abdullah; Selcuk-Kestel, A. Sevtap; Düzgün, H. Sebnem
2017-07-01
All of the quantitative landslide susceptibility mapping (QLSM) methods requires two basic data types, namely, landslide inventory and factors that influence landslide occurrence (landslide influencing factors, LIF). Depending on type of landslides, nature of triggers and LIF, accuracy of the QLSM methods differs. Moreover, how to balance the number of 0 (nonoccurrence) and 1 (occurrence) in the training set obtained from the landslide inventory and how to select which one of the 1's and 0's to be included in QLSM models play critical role in the accuracy of the QLSM. Although performance of various QLSM methods is largely investigated in the literature, the challenge of training set construction is not adequately investigated for the QLSM methods. In order to tackle this challenge, in this study three different training set selection strategies along with the original data set is used for testing the performance of three different regression methods namely Logistic Regression (LR), Bayesian Logistic Regression (BLR) and Fuzzy Logistic Regression (FLR). The first sampling strategy is proportional random sampling (PRS), which takes into account a weighted selection of landslide occurrences in the sample set. The second method, namely non-selective nearby sampling (NNS), includes randomly selected sites and their surrounding neighboring points at certain preselected distances to include the impact of clustering. Selective nearby sampling (SNS) is the third method, which concentrates on the group of 1's and their surrounding neighborhood. A randomly selected group of landslide sites and their neighborhood are considered in the analyses similar to NNS parameters. It is found that LR-PRS, FLR-PRS and BLR-Whole Data set-ups, with order, yield the best fits among the other alternatives. The results indicate that in QLSM based on regression models, avoidance of spatial correlation in the data set is critical for the model's performance.
Sargolzaie, Narjes; Miri-Moghaddam, Ebrahim
2014-01-01
The most common differential diagnosis of β-thalassemia (β-thal) trait is iron deficiency anemia. Several red blood cell equations were introduced during different studies for differential diagnosis between β-thal trait and iron deficiency anemia. Due to genetic variations in different regions, these equations cannot be useful in all population. The aim of this study was to determine a native equation with high accuracy for differential diagnosis of β-thal trait and iron deficiency anemia for the Sistan and Baluchestan population by logistic regression analysis. We selected 77 iron deficiency anemia and 100 β-thal trait cases. We used binary logistic regression analysis and determined best equations for probability prediction of β-thal trait against iron deficiency anemia in our population. We compared diagnostic values and receiver operative characteristic (ROC) curve related to this equation and another 10 published equations in discriminating β-thal trait and iron deficiency anemia. The binary logistic regression analysis determined the best equation for best probability prediction of β-thal trait against iron deficiency anemia with area under curve (AUC) 0.998. Based on ROC curves and AUC, Green & King, England & Frazer, and then Sirdah indices, respectively, had the most accuracy after our equation. We suggest that to get the best equation and cut-off in each region, one needs to evaluate specific information of each region, specifically in areas where populations are homogeneous, to provide a specific formula for differentiating between β-thal trait and iron deficiency anemia.
Russo, Giorgio I; Regis, Federica; Spatafora, Pietro; Frizzi, Jacopo; Urzì, Daniele; Cimino, Sebastiano; Serni, Sergio; Carini, Marco; Gacci, Mauro; Morgia, Giuseppe
2018-05-01
To investigate the association between metabolic syndrome (MetS) and morphological features of benign prostatic enlargement (BPE), including total prostate volume (TPV), transitional zone volume (TZV) and intravesical prostatic protrusion (IPP). Between January 2015 and January 2017, 224 consecutive men aged >50 years presenting with lower urinary tract symptoms (LUTS) suggestive of BPE were recruited to this multicentre cross-sectional study. MetS was defined according to International Diabetes Federation criteria. Multivariate linear and logistic regression models were performed to verify factors associated with IPP, TZV and TPV. Patients with MetS were observed to have a significant increase in IPP (P < 0.01), TPV (P < 0.01) and TZV (P = 0.02). On linear regression analysis, adjusted for age and metabolic factors of MetS, we found that high-density lipoprotein (HDL) cholesterol was negatively associated with IPP (r = -0.17), TPV (r = -0.19) and TZV (r = -0.17), while hypertension was positively associated with IPP (r = 0.16), TPV (r = 0.19) and TZV (r = 0.16). On multivariate logistic regression analysis adjusted for age and factors of MetS, hypertension (categorical; odds ratio [OR] 2.95), HDL cholesterol (OR 0.94) and triglycerides (OR 1.01) were independent predictors of TPV ≥ 40 mL. We also found that HDL cholesterol (OR 0.86), hypertension (OR 2.0) and waist circumference (OR 1.09) were significantly associated with TZV ≥ 20 mL. On age-adjusted logistic regression analysis, MetS was significantly associated with IPP ≥ 10 mm (OR 34.0; P < 0.01), TZV ≥ 20 mL (OR 4.40; P < 0.01) and TPV ≥ 40 mL (OR 5.89; P = 0.03). We found an association between MetS and BPE, demonstrating a relationship with IPP. © 2017 The Authors BJU International © 2017 BJU International Published by John Wiley & Sons Ltd.
Ye, Dong-qing; Hu, Yi-song; Li, Xiang-pei; Huang, Fen; Yang, Shi-gui; Hao, Jia-hu; Yin, Jing; Zhang, Guo-qing; Liu, Hui-hui
2004-11-01
To explore the impact of environmental factors, daily lifestyle, psycho-social factors and the interactions between environmental factors and chemokines genes on systemic lupus erythematosus (SLE). Case-control study was carried out and environmental factors for SLE were analyzed by univariate and multivariate unconditional logistic regression. Interactions between environmental factors and chemokines polymorphism contributing to systemic lupus erythematosus were also analyzed by logistic regression model. There were nineteen factors associated with SLE when univariate unconditional logistic regression was used. However, when multivariate unconditional logistic regression was used, only five factors showed having impacts on the disease, in which drinking well water (OR=0.099) was protective factor for SLE, and multiple drug allergy (OR=8.174), over-exposure to sunshine (OR=18.339), taking antibiotics (OR=9.630) and oral contraceptives were risk factors for SLE. When unconditional logistic regression model was used, results showed that there was interaction between eating irritable food and -2518MCP-1G/G genotype (OR=4.387). No interaction between environmental factors was found that contributing to SLE in this study. Many environmental factors were related to SLE, and there was an interaction between -2518MCP-1G/G genotype and eating irritable food.
A nonparametric multiple imputation approach for missing categorical data.
Zhou, Muhan; He, Yulei; Yu, Mandi; Hsu, Chiu-Hsieh
2017-06-06
Incomplete categorical variables with more than two categories are common in public health data. However, most of the existing missing-data methods do not use the information from nonresponse (missingness) probabilities. We propose a nearest-neighbour multiple imputation approach to impute a missing at random categorical outcome and to estimate the proportion of each category. The donor set for imputation is formed by measuring distances between each missing value with other non-missing values. The distance function is calculated based on a predictive score, which is derived from two working models: one fits a multinomial logistic regression for predicting the missing categorical outcome (the outcome model) and the other fits a logistic regression for predicting missingness probabilities (the missingness model). A weighting scheme is used to accommodate contributions from two working models when generating the predictive score. A missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented. The simulation study suggests that the proposed method performs well when missingness probabilities are not extreme under some misspecifications of the working models. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, proper weights need to be chosen to balance the contributions from the two working models and achieve optimal results for the proposed method. We conclude that the proposed multiple imputation method is a reasonable approach to dealing with missing categorical outcome data with more than two levels for assessing the distribution of the outcome. In terms of the choices for the working models, we suggest a multinomial logistic regression for predicting the missing outcome and a binary logistic regression for predicting the missingness probability.
NASA Astrophysics Data System (ADS)
Leighs, J. A.; Halling-Brown, M. D.; Patel, M. N.
2018-03-01
The UK currently has a national breast cancer-screening program and images are routinely collected from a number of screening sites, representing a wealth of invaluable data that is currently under-used. Radiologists evaluate screening images manually and recall suspicious cases for further analysis such as biopsy. Histological testing of biopsy samples confirms the malignancy of the tumour, along with other diagnostic and prognostic characteristics such as disease grade. Machine learning is becoming increasingly popular for clinical image classification problems, as it is capable of discovering patterns in data otherwise invisible. This is particularly true when applied to medical imaging features; however clinical datasets are often relatively small. A texture feature extraction toolkit has been developed to mine a wide range of features from medical images such as mammograms. This study analysed a dataset of 1,366 radiologist-marked, biopsy-proven malignant lesions obtained from the OPTIMAM Medical Image Database (OMI-DB). Exploratory data analysis methods were employed to better understand extracted features. Machine learning techniques including Classification and Regression Trees (CART), ensemble methods (e.g. random forests), and logistic regression were applied to the data to predict the disease grade of the analysed lesions. Prediction scores of up to 83% were achieved; sensitivity and specificity of the models trained have been discussed to put the results into a clinical context. The results show promise in the ability to predict prognostic indicators from the texture features extracted and thus enable prioritisation of care for patients at greatest risk.
ERIC Educational Resources Information Center
Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung
2014-01-01
The simultaneous item bias test (SIBTEST) method regression procedure and the differential item functioning (DIF)-free-then-DIF strategy are applied to the logistic regression (LR) method simultaneously in this study. These procedures are used to adjust the effects of matching true score on observed score and to better control the Type I error…
Optimizing methods for linking cinematic features to fMRI data.
Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia
2015-04-15
One of the challenges of naturalistic neurosciences using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than story-driven films, new methods need to be developed for analysis of less story-driven contents. To optimize the linkage between our fMRI data collected during viewing of a deliberately non-narrative silent film 'At Land' by Maya Deren (1944) and its annotated content, we combined the method of elastic-net regularization with the model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time-series were fitted with time-series of a total of 36 binary-valued and one real-valued tactile annotation of film features. The elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of regressors, the results were compared against both the partial least-squares (PLS) regression and the un-regularized full-model regression. Non-parametric permutation testing scheme was applied to evaluate the statistical significance of regression. We found statistically significant correlation between the annotation model and 9 ICs out of 40 ICs. Regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net based regression more sensitive than PLS and un-regularized regression since it detected a larger number of significant ICs and ROIs. Along with the ISC ranking methods, our regression analysis proved a feasible method for ordering the ICs based on their functional relevance to the annotated cinematic features. The novelty of our method is - in comparison to the hypothesis-driven manual pre-selection and observation of some individual regressors biased by choice - in applying data-driven approach to all content features simultaneously. We found especially the combination of regularized regression and ICA useful when analyzing fMRI data obtained using non-narrative movie stimulus with a large set of complex and correlated features. Copyright © 2015. Published by Elsevier Inc.
Vukić, Tamara; Smith, Sean Robinson; Ljubas Kelečić, Dina; Desnica, Lana; Prenc, Ema; Pulanić, Dražen; Vrhovac, Radovan; Nemet, Damir; Pavletic, Steven Z.
2016-01-01
Aim To determine if there are correlations between joint and fascial chronic graft-vs-host disease (cGVHD) with clinical findings, laboratory parameters, and measures of functional capacity. Methods 29 patients were diagnosed with cGVHD based on National Institutes of Health (NIH) Consensus Criteria at the University Hospital Centre Zagreb from October 2013 to October 2015. Physical examination, including functional measures such as 2-minute walk test and hand grip strength, as well as laboratory tests were performed. The relationship between these evaluations and the severity of joint and fascial cGVHD was tested by logistical regression analysis. Results 12 of 29 patients (41.3%) had joint and fascial cGVHD diagnosed according to NIH Consensus Criteria. There was a significant positive correlation of joint and fascial cGVHD and skin cGVHD (P < 0.001), serum C3 complement level (P = 0.045), and leukocytes (P = 0.032). There was a significant negative correlation between 2-minute walk test (P = 0.016), percentage of cytotoxic T cells CD3+/CD8+ (P = 0.022), serum albumin (P = 0.047), and Karnofsky score (P < 0.001). Binary logistic regression model found that a significant predictor for joint and fascial cGVHD was cGVHD skin involvement (odds ratio, 7.79; 95 confidence interval 1.87-32.56; P = 0.005). Conclusion Joint and fascial cGVHD manifestations correlated with multiple laboratory measurements, clinical features, and cGVHD skin involvement, which was a significant predictor for joint and fascial cGVHD. PMID:27374828
ABCD² score may discriminate minor stroke from TIA on patient admission.
Zhao, Hui; Li, Qingjie; Lu, Mengru; Shao, Yuan; Li, Jingwei; Xu, Yun
2014-02-01
With the advent of time-dependent thrombolytic therapy for ischemic stroke, it has become increasingly important to differentiate transient ischemic attack (TIA) from minor stroke patients after symptom onset quickly. This study investigated the difference between TIA and minor stroke based on age, blood pressure, clinical features, duration of TIA, presence of diabetes, ABCD² score, digital subtraction angiography (DSA) and blood lipids. One hundred seventy-one patients with clinical manifestations as transient neurological deficits in Nanjing Drum Tower Hospital were studied retrospectively. All patients were evaluated by ABCD² score, blood lipid test, fibrinogen, and Holter electrocardiograph and DSA on admission. Patients were categorized into TIA group or minor stroke group according to CT and MRI scan 24 h within symptom onset. The study suggested that minor stroke patients were more likely to have a higher ABCD² score (odds ratio (OR) 2.060; 95% confidence interval (CI) 1.293-3.264). Receiver-operating characteristic curves identified ABCD² score >4 as the optimal cut-off for minor stroke diagnosis. Total serum cholesterol seemed a better diagnostic indicator to discriminate minor stroke from TIA (OR 4.815; 95% CI 0.946-1.654) than other blood lipids in simple logistic regression, but not valuable for the differentiation between TIA and minor stroke in multivariate logistic regression. Higher severity of intracranial internal carotid stenosis, especially >90%, were more likely to have minor stroke, but was not a reliable diagnostic indicator (P > 0.05). ABCD² could help clinicians to differentiate possible TIA from minor stroke at hospital admission while blood lipid parameters and artery stenosis location offer limited help.
Ultrasonographic Diagnosis of Biliary Atresia Based on a Decision-Making Tree Model.
Lee, So Mi; Cheon, Jung-Eun; Choi, Young Hun; Kim, Woo Sun; Cho, Hyun-Hae; Cho, Hyun-Hye; Kim, In-One; You, Sun Kyoung
2015-01-01
To assess the diagnostic value of various ultrasound (US) findings and to make a decision-tree model for US diagnosis of biliary atresia (BA). From March 2008 to January 2014, the following US findings were retrospectively evaluated in 100 infants with cholestatic jaundice (BA, n = 46; non-BA, n = 54): length and morphology of the gallbladder, triangular cord thickness, hepatic artery and portal vein diameters, and visualization of the common bile duct. Logistic regression analyses were performed to determine the features that would be useful in predicting BA. Conditional inference tree analysis was used to generate a decision-making tree for classifying patients into the BA or non-BA groups. Multivariate logistic regression analysis showed that abnormal gallbladder morphology and greater triangular cord thickness were significant predictors of BA (p = 0.003 and 0.001; adjusted odds ratio: 345.6 and 65.6, respectively). In the decision-making tree using conditional inference tree analysis, gallbladder morphology and triangular cord thickness (optimal cutoff value of triangular cord thickness, 3.4 mm) were also selected as significant discriminators for differential diagnosis of BA, and gallbladder morphology was the first discriminator. The diagnostic performance of the decision-making tree was excellent, with sensitivity of 100% (46/46), specificity of 94.4% (51/54), and overall accuracy of 97% (97/100). Abnormal gallbladder morphology and greater triangular cord thickness (> 3.4 mm) were the most useful predictors of BA on US. We suggest that the gallbladder morphology should be evaluated first and that triangular cord thickness should be evaluated subsequently in cases with normal gallbladder morphology.
Sreeramareddy, Chandrashekhar T; Panduru, Kishore V; Verma, Sharat C; Joshi, Hari S; Bates, Michael N
2008-01-01
Background Studies from developed countries have reported on host-related risk factors for extra-pulmonary tuberculosis (EPTB). However, similar studies from high-burden countries like Nepal are lacking. Therefore, we carried out this study to compare demographic, life-style and clinical characteristics between EPTB and PTB patients. Methods A retrospective analysis was carried out on 474 Tuberculosis (TB) patients diagnosed in a tertiary care hospital in western Nepal. Characteristics of demography, life-style and clinical features were obtained from medical case records. Risk factors for being an EPTB patient relative to a PTB patient were identified using logistic regression analysis. Results The age distribution of the TB patients had a bimodal distribution. The male to female ratio for PTB was 2.29. EPTB was more common at younger ages (< 25 years) and in females. Common sites for EPTB were lymph nodes (42.6%) and peritoneum and/or intestines (14.8%). By logistic regression analysis, age less than 25 years (OR 2.11 95% CI 1.12–3.68) and female gender (OR 1.69, 95% CI 1.12–2.56) were associated with EPTB. Smoking, use of immunosuppressive drugs/steroids, diabetes and past history of TB were more likely to be associated with PTB. Conclusion Results suggest that younger age and female gender may be independent risk factors for EPTB in a high-burden country like Nepal. TB control programmes may target young and female populations for EPTB case-finding. Further studies are necessary in other high-burden countries to confirm our findings. PMID:18218115
NASA Astrophysics Data System (ADS)
Domínguez-Cuesta, María José; Jiménez-Sánchez, Montserrat; Berrezueta, Edgar
2007-09-01
A geomorphological study focussing on slope instability and landslide susceptibility modelling was performed on a 278 km 2 area in the Nalón River Basin (Central Coalfield, NW Spain). The methodology of the study includes: 1) geomorphological mapping at both 1:5000 and 1:25,000 scales based on air-photo interpretation and field work; 2) Digital Terrain Model (DTM) creation and overlay of geomorphological and DTM layers in a Geographical Information System (GIS); and 3) statistical treatment of variables using SPSS and development of a logistic regression model. A total of 603 mass movements including earth flow and debris flow were inventoried and were classified into two groups according to their size. This study focuses on the first group with small mass movements (10 0 to 10 1 m in size), which often cause damage to infrastructures and even victims. The detected conditioning factors of these landslides are lithology (soils and colluviums), vegetation (pasture) and topography. DTM analyses show that high instabilities are linked to slopes with NE and SW orientations, curvature values between - 6 and - 0.7, and slope values from 16° to 30°. Bedrock lithology (Carboniferous sandstone and siltstone), presence of Quaternary soils and sediments, vegetation, and the topographical factors were used to develop a landslide susceptibility model using the logistic regression method. Application of "zoom method" allows us to accurately detect small mass movements using a 5-m grid cell data even if geomorphological mapping is done at a 1:25,000 scale.
The value of nodal information in predicting lung cancer relapse using 4DPET/4DCT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Heyse, E-mail: heyse.li@mail.utoronto.ca; Becker, Nathan; Raman, Srinivas
2015-08-15
Purpose: There is evidence that computed tomography (CT) and positron emission tomography (PET) imaging metrics are prognostic and predictive in nonsmall cell lung cancer (NSCLC) treatment outcomes. However, few studies have explored the use of standardized uptake value (SUV)-based image features of nodal regions as predictive features. The authors investigated and compared the use of tumor and node image features extracted from the radiotherapy target volumes to predict relapse in a cohort of NSCLC patients undergoing chemoradiation treatment. Methods: A prospective cohort of 25 patients with locally advanced NSCLC underwent 4DPET/4DCT imaging for radiation planning. Thirty-seven image features were derivedmore » from the CT-defined volumes and SUVs of the PET image from both the tumor and nodal target regions. The machine learning methods of logistic regression and repeated stratified five-fold cross-validation (CV) were used to predict local and overall relapses in 2 yr. The authors used well-known feature selection methods (Spearman’s rank correlation, recursive feature elimination) within each fold of CV. Classifiers were ranked on their Matthew’s correlation coefficient (MCC) after CV. Area under the curve, sensitivity, and specificity values are also presented. Results: For predicting local relapse, the best classifier found had a mean MCC of 0.07 and was composed of eight tumor features. For predicting overall relapse, the best classifier found had a mean MCC of 0.29 and was composed of a single feature: the volume greater than 0.5 times the maximum SUV (N). Conclusions: The best classifier for predicting local relapse had only tumor features. In contrast, the best classifier for predicting overall relapse included a node feature. Overall, the methods showed that nodes add value in predicting overall relapse but not local relapse.« less
Iqbal, Abdullah; Valous, Nektarios A; Mendoza, Fernando; Sun, Da-Wen; Allen, Paul
2010-03-01
Images of three qualities of pre-sliced pork and Turkey hams were evaluated for colour and textural features to characterize and classify them, and to model the ham appearance grading and preference responses of a group of consumers. A total of 26 colour features and 40 textural features were extracted for analysis. Using Mahalanobis distance and feature inter-correlation analyses, two best colour [mean of S (saturation in HSV colour space), std. deviation of b*, which indicates blue to yellow in L*a*b* colour space] and three textural features [entropy of b*, contrast of H (hue of HSV colour space), entropy of R (red of RGB colour space)] for pork, and three colour (mean of R, mean of H, std. deviation of a*, which indicates green to red in L*a*b* colour space) and two textural features [contrast of B, contrast of L* (luminance or lightness in L*a*b* colour space)] for Turkey hams were selected as features with the highest discriminant power. High classification performances were reached for both types of hams (>99.5% for pork and >90.5% for Turkey) using the best selected features or combinations of them. In spite of the poor/fair agreement among ham consumers as determined by Kappa analysis (Kappa-value<0.4) for sensory grading (surface colour, colour uniformity, bitonality, texture appearance and acceptability), a dichotomous logistic regression model using the best image features was able to explain the variability of consumers' responses for all sensorial attributes with accuracies higher than 74.1% for pork hams and 83.3% for Turkey hams. Copyright 2009 Elsevier Ltd. All rights reserved.
An EEG-based functional connectivity measure for automatic detection of alcohol use disorder.
Mumtaz, Wajid; Saad, Mohamad Naufal B Mohamad; Kamel, Nidal; Ali, Syed Saad Azhar; Malik, Aamir Saeed
2018-01-01
The abnormal alcohol consumption could cause toxicity and could alter the human brain's structure and function, termed as alcohol used disorder (AUD). Unfortunately, the conventional screening methods for AUD patients are subjective and manual. Hence, to perform automatic screening of AUD patients, objective methods are needed. The electroencephalographic (EEG) data have been utilized to study the differences of brain signals between alcoholics and healthy controls that could further developed as an automatic screening tool for alcoholics. In this work, resting-state EEG-derived features were utilized as input data to the proposed feature selection and classification method. The aim was to perform automatic classification of AUD patients and healthy controls. The validation of the proposed method involved real-EEG data acquired from 30 AUD patients and 30 age-matched healthy controls. The resting-state EEG-derived features such as synchronization likelihood (SL) were computed involving 19 scalp locations resulted into 513 features. Furthermore, the features were rank-ordered to select the most discriminant features involving a rank-based feature selection method according to a criterion, i.e., receiver operating characteristics (ROC). Consequently, a reduced set of most discriminant features was identified and utilized further during classification of AUD patients and healthy controls. In this study, three different classification models such as Support Vector Machine (SVM), Naïve Bayesian (NB), and Logistic Regression (LR) were used. The study resulted into SVM classification accuracy=98%, sensitivity=99.9%, specificity=95%, and f-measure=0.97; LR classification accuracy=91.7%, sensitivity=86.66%, specificity=96.6%, and f-measure=0.90; NB classification accuracy=93.6%, sensitivity=100%, specificity=87.9%, and f-measure=0.95. The SL features could be utilized as objective markers to screen the AUD patients and healthy controls. Copyright © 2017 Elsevier B.V. All rights reserved.
Access disparities to Magnet hospitals for patients undergoing neurosurgical operations
Missios, Symeon; Bekelis, Kimon
2017-01-01
Background Centers of excellence focusing on quality improvement have demonstrated superior outcomes for a variety of surgical interventions. We investigated the presence of access disparities to hospitals recognized by the Magnet Recognition Program of the American Nurses Credentialing Center (ANCC) for patients undergoing neurosurgical operations. Methods We performed a cohort study of all neurosurgery patients who were registered in the New York Statewide Planning and Research Cooperative System (SPARCS) database from 2009–2013. We examined the association of African-American race and lack of insurance with Magnet status hospitalization for neurosurgical procedures. A mixed effects propensity adjusted multivariable regression analysis was used to control for confounding. Results During the study period, 190,535 neurosurgical patients met the inclusion criteria. Using a multivariable logistic regression, we demonstrate that African-Americans had lower admission rates to Magnet institutions (OR 0.62; 95% CI, 0.58–0.67). This persisted in a mixed effects logistic regression model (OR 0.77; 95% CI, 0.70–0.83) to adjust for clustering at the patient county level, and a propensity score adjusted logistic regression model (OR 0.75; 95% CI, 0.69–0.82). Additionally, lack of insurance was associated with lower admission rates to Magnet institutions (OR 0.71; 95% CI, 0.68–0.73), in a multivariable logistic regression model. This persisted in a mixed effects logistic regression model (OR 0.72; 95% CI, 0.69–0.74), and a propensity score adjusted logistic regression model (OR 0.72; 95% CI, 0.69–0.75). Conclusions Using a comprehensive all-payer cohort of neurosurgery patients in New York State we identified an association of African-American race and lack of insurance with lower rates of admission to Magnet hospitals. PMID:28684152
Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models.
Schmidt, Amand F; Klungel, Olaf H; Groenwold, Rolf H H
2016-01-01
Postlaunch data on medical treatments can be analyzed to explore adverse events or relative effectiveness in real-life settings. These analyses are often complicated by the number of potential confounders and the possibility of model misspecification. We conducted a simulation study to compare the performance of logistic regression, propensity score, disease risk score, and stabilized inverse probability weighting methods to adjust for confounding. Model misspecification was induced in the independent derivation dataset. We evaluated performance using relative bias confidence interval coverage of the true effect, among other metrics. At low events per coefficient (1.0 and 0.5), the logistic regression estimates had a large relative bias (greater than -100%). Bias of the disease risk score estimates was at most 13.48% and 18.83%. For the propensity score model, this was 8.74% and >100%, respectively. At events per coefficient of 1.0 and 0.5, inverse probability weighting frequently failed or reduced to a crude regression, resulting in biases of -8.49% and 24.55%. Coverage of logistic regression estimates became less than the nominal level at events per coefficient ≤5. For the disease risk score, inverse probability weighting, and propensity score, coverage became less than nominal at events per coefficient ≤2.5, ≤1.0, and ≤1.0, respectively. Bias of misspecified disease risk score models was 16.55%. In settings with low events/exposed subjects per coefficient, disease risk score methods can be useful alternatives to logistic regression models, especially when propensity score models cannot be used. Despite better performance of disease risk score methods than logistic regression and propensity score models in small events per coefficient settings, bias, and coverage still deviated from nominal.
A new method for constructing networks from binary data
NASA Astrophysics Data System (ADS)
van Borkulo, Claudia D.; Borsboom, Denny; Epskamp, Sacha; Blanken, Tessa F.; Boschloo, Lynn; Schoevers, Robert A.; Waldorp, Lourens J.
2014-08-01
Network analysis is entering fields where network structures are unknown, such as psychology and the educational sciences. A crucial step in the application of network models lies in the assessment of network structure. Current methods either have serious drawbacks or are only suitable for Gaussian data. In the present paper, we present a method for assessing network structures from binary data. Although models for binary data are infamous for their computational intractability, we present a computationally efficient model for estimating network structures. The approach, which is based on Ising models as used in physics, combines logistic regression with model selection based on a Goodness-of-Fit measure to identify relevant relationships between variables that define connections in a network. A validation study shows that this method succeeds in revealing the most relevant features of a network for realistic sample sizes. We apply our proposed method to estimate the network of depression and anxiety symptoms from symptom scores of 1108 subjects. Possible extensions of the model are discussed.
Parental Vaccine Acceptance: A Logistic Regression Model Using Previsit Decisions.
Lee, Sara; Riley-Behringer, Maureen; Rose, Jeanmarie C; Meropol, Sharon B; Lazebnik, Rina
2017-07-01
This study explores how parents' intentions regarding vaccination prior to their children's visit were associated with actual vaccine acceptance. A convenience sample of parents accompanying 6-week-old to 17-year-old children completed a written survey at 2 pediatric practices. Using hierarchical logistic regression, for hospital-based participants (n = 216), vaccine refusal history ( P < .01) and vaccine decision made before the visit ( P < .05) explained 87% of vaccine refusals. In community-based participants (n = 100), vaccine refusal history ( P < .01) explained 81% of refusals. Over 1 in 5 parents changed their minds about vaccination during the visit. Thirty parents who were previous vaccine refusers accepted current vaccines, and 37 who had intended not to vaccinate choose vaccination. Twenty-nine parents without a refusal history declined vaccines, and 32 who did not intend to refuse before the visit declined vaccination. Future research should identify key factors to nudge parent decision making in favor of vaccination.
Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q
2016-05-01
Conditional logistic regression analysis and unconditional logistic regression analysis are commonly used in case control study, but Cox proportional hazard model is often used in survival data analysis. Most literature only refer to main effect model, however, generalized linear model differs from general linear model, and the interaction was composed of multiplicative interaction and additive interaction. The former is only statistical significant, but the latter has biological significance. In this paper, macros was written by using SAS 9.4 and the contrast ratio, attributable proportion due to interaction and synergy index were calculated while calculating the items of logistic and Cox regression interactions, and the confidence intervals of Wald, delta and profile likelihood were used to evaluate additive interaction for the reference in big data analysis in clinical epidemiology and in analysis of genetic multiplicative and additive interactions.
Ratliff, John K; Balise, Ray; Veeravagu, Anand; Cole, Tyler S; Cheng, Ivan; Olshen, Richard A; Tian, Lu
2016-05-18
Postoperative metrics are increasingly important in determining standards of quality for physicians and hospitals. Although complications following spinal surgery have been described, procedural and patient variables have yet to be incorporated into a predictive model of adverse-event occurrence. We sought to develop a predictive model of complication occurrence after spine surgery. We used longitudinal prospective data from a national claims database and developed a predictive model incorporating complication type and frequency of occurrence following spine surgery procedures. We structured our model to assess the impact of features such as preoperative diagnosis, patient comorbidities, location in the spine, anterior versus posterior approach, whether fusion had been performed, whether instrumentation had been used, number of levels, and use of bone morphogenetic protein (BMP). We assessed a variety of adverse events. Prediction models were built using logistic regression with additive main effects and logistic regression with main effects as well as all 2 and 3-factor interactions. Least absolute shrinkage and selection operator (LASSO) regularization was used to select features. Competing approaches included boosted additive trees and the classification and regression trees (CART) algorithm. The final prediction performance was evaluated by estimating the area under a receiver operating characteristic curve (AUC) as predictions were applied to independent validation data and compared with the Charlson comorbidity score. The model was developed from 279,135 records of patients with a minimum duration of follow-up of 30 days. Preliminary assessment showed an adverse-event rate of 13.95%, well within norms reported in the literature. We used the first 80% of the records for training (to predict adverse events) and the remaining 20% of the records for validation. There was remarkable similarity among methods, with an AUC of 0.70 for predicting the occurrence of adverse events. The AUC using the Charlson comorbidity score was 0.61. The described model was more accurate than Charlson scoring (p < 0.01). We present a modeling effort based on administrative claims data that predicts the occurrence of complications after spine surgery. We believe that the development of a predictive modeling tool illustrating the risk of complication occurrence after spine surgery will aid in patient counseling and improve the accuracy of risk modeling strategies. Copyright © 2016 by The Journal of Bone and Joint Surgery, Incorporated.
EEG-based mild depressive detection using feature selection methods and classifiers.
Li, Xiaowei; Hu, Bin; Sun, Shuting; Cai, Hanshu
2016-11-01
Depression has become a major health burden worldwide, and effectively detection of such disorder is a great challenge which requires latest technological tool, such as Electroencephalography (EEG). This EEG-based research seeks to find prominent frequency band and brain regions that are most related to mild depression, as well as an optimal combination of classification algorithms and feature selection methods which can be used in future mild depression detection. An experiment based on facial expression viewing task (Emo_block and Neu_block) was conducted, and EEG data of 37 university students were collected using a 128 channel HydroCel Geodesic Sensor Net (HCGSN). For discriminating mild depressive patients and normal controls, BayesNet (BN), Support Vector Machine (SVM), Logistic Regression (LR), k-nearest neighbor (KNN) and RandomForest (RF) classifiers were used. And BestFirst (BF), GreedyStepwise (GSW), GeneticSearch (GS), LinearForwordSelection (LFS) and RankSearch (RS) based on Correlation Features Selection (CFS) were applied for linear and non-linear EEG features selection. Independent Samples T-test with Bonferroni correction was used to find the significantly discriminant electrodes and features. Data mining results indicate that optimal performance is achieved using a combination of feature selection method GSW based on CFS and classifier KNN for beta frequency band. Accuracies achieved 92.00% and 98.00%, and AUC achieved 0.957 and 0.997, for Emo_block and Neu_block beta band data respectively. T-test results validate the effectiveness of selected features by search method GSW. Simplified EEG system with only FP1, FP2, F3, O2, T3 electrodes was also explored with linear features, which yielded accuracies of 91.70% and 96.00%, AUC of 0.952 and 0.972, for Emo_block and Neu_block respectively. Classification results obtained by GSW + KNN are encouraging and better than previously published results. In the spatial distribution of features, we find that left parietotemporal lobe in beta EEG frequency band has greater effect on mild depression detection. And fewer EEG channels (FP1, FP2, F3, O2 and T3) combined with linear features may be good candidates for usage in portable systems for mild depression detection. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Laman, Charlie J. M.; Kho, Angel
Bornean Hornbills (Family Bucerotidae) are omnivorous creatures, distinguished for their large size and large bill. In our study, only five out of eight species of Bornean hornbills were available. Our aims were to determine the taxonomic, morphological and sexual variations, among the species. Nine morphological features were measured from 83 specimens. Canonical Discriminant and Cluster analyses showed that the data were successfully clustered into 5 species. Logistic regression analyses showed that the diagnostic character differentiation is total length. Further results showed that males tend to be bigger than females.
Multiview alignment hashing for efficient image search.
Liu, Li; Yu, Mengyang; Shao, Ling
2015-03-01
Hashing is a popular and efficient method for nearest neighbor search in large-scale data spaces by embedding high-dimensional feature descriptors into a similarity preserving Hamming space with a low dimension. For most hashing methods, the performance of retrieval heavily depends on the choice of the high-dimensional feature descriptor. Furthermore, a single type of feature cannot be descriptive enough for different images when it is used for hashing. Thus, how to combine multiple representations for learning effective hashing functions is an imminent task. In this paper, we present a novel unsupervised multiview alignment hashing approach based on regularized kernel nonnegative matrix factorization, which can find a compact representation uncovering the hidden semantics and simultaneously respecting the joint probability distribution of data. In particular, we aim to seek a matrix factorization to effectively fuse the multiple information sources meanwhile discarding the feature redundancy. Since the raised problem is regarded as nonconvex and discrete, our objective function is then optimized via an alternate way with relaxation and converges to a locally optimal solution. After finding the low-dimensional representation, the hashing functions are finally obtained through multivariable logistic regression. The proposed method is systematically evaluated on three data sets: 1) Caltech-256; 2) CIFAR-10; and 3) CIFAR-20, and the results show that our method significantly outperforms the state-of-the-art multiview hashing techniques.
Toward Predicting Social Support Needs in Online Health Social Networks.
Choi, Min-Je; Kim, Sung-Hee; Lee, Sukwon; Kwon, Bum Chul; Yi, Ji Soo; Choo, Jaegul; Huh, Jina
2017-08-02
While online health social networks (OHSNs) serve as an effective platform for patients to fulfill their various social support needs, predicting the needs of users and providing tailored information remains a challenge. The objective of this study was to discriminate important features for identifying users' social support needs based on knowledge gathered from survey data. This study also provides guidelines for a technical framework, which can be used to predict users' social support needs based on raw data collected from OHSNs. We initially conducted a Web-based survey with 184 OHSN users. From this survey data, we extracted 34 features based on 5 categories: (1) demographics, (2) reading behavior, (3) posting behavior, (4) perceived roles in OHSNs, and (5) values sought in OHSNs. Features from the first 4 categories were used as variables for binary classification. For the prediction outcomes, we used features from the last category: the needs for emotional support, experience-based information, unconventional information, and medical facts. We compared 5 binary classifier algorithms: gradient boosting tree, random forest, decision tree, support vector machines, and logistic regression. We then calculated the scores of the area under the receiver operating characteristic (ROC) curve (AUC) to understand the comparative effectiveness of the used features. The best performance was AUC scores of 0.89 for predicting users seeking emotional support, 0.86 for experience-based information, 0.80 for unconventional information, and 0.83 for medical facts. With the gradient boosting tree as our best performing model, we analyzed the strength of individual features in predicting one's social support need. Among other discoveries, we found that users seeking emotional support tend to post more in OHSNs compared with others. We developed an initial framework for automatically predicting social support needs in OHSNs using survey data. Future work should involve nonsurvey data to evaluate the feasibility of the framework. Our study contributes to providing personalized social support in OHSNs. ©Min-Je Choi, Sung-Hee Kim, Sukwon Lee, Bum Chul Kwon, Ji Soo Yi, Jaegul Choo, Jina Huh. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 02.08.2017.
[Willingness of Patients with Obesity to Use New Media in Rehabilitation Aftercare].
Dorow, M; Löbner, M; Stein, J; Kind, P; Markert, J; Keller, J; Weidauer, E; Riedel-Heller, S G
2017-06-01
Digital media offer new possibilities in rehabilitation aftercare. This study investigates the rehabilitants' willingness to use new media (sms, internet, social networks) in rehabilitation aftercare and factors that are associated with the willingness to use media-based aftercare. 92 rehabilitants (patients with obesity) filled in a questionnaire on the willingness to use new media in rehabilitation aftercare. In order to identify influencing factors, binary logistic regression models were calculated. 3 quarters of the rehabilitants (76.1%) reported that they would be willing to use new media in rehabilitation aftercare. The binary logistic regression model yielded two factors that were associated with the willingness to use media-based aftercare: the possession of a smartphone and the willingness to receive telephone counseling for aftercare. The majority of the rehabilitants was willing to use new media in rehabilitation aftercare. The reasons for refusal of media-based aftercare need to be examined more closely. © Georg Thieme Verlag KG Stuttgart · New York.
No rationale for 1 variable per 10 events criterion for binary logistic regression analysis.
van Smeden, Maarten; de Groot, Joris A H; Moons, Karel G M; Collins, Gary S; Altman, Douglas G; Eijkemans, Marinus J C; Reitsma, Johannes B
2016-11-24
Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth's correction, are compared. The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect ('separation'). We reveal that different approaches for identifying and handling separation leads to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
Non-proportional odds multivariate logistic regression of ordinal family data.
Zaloumis, Sophie G; Scurrah, Katrina J; Harrap, Stephen B; Ellis, Justine A; Gurrin, Lyle C
2015-03-01
Methods to examine whether genetic and/or environmental sources can account for the residual variation in ordinal family data usually assume proportional odds. However, standard software to fit the non-proportional odds model to ordinal family data is limited because the correlation structure of family data is more complex than for other types of clustered data. To perform these analyses we propose the non-proportional odds multivariate logistic regression model and take a simulation-based approach to model fitting using Markov chain Monte Carlo methods, such as partially collapsed Gibbs sampling and the Metropolis algorithm. We applied the proposed methodology to male pattern baldness data from the Victorian Family Heart Study. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Sun, Hokeun; Wang, Shuang
2013-05-30
The matched case-control designs are commonly used to control for potential confounding factors in genetic epidemiology studies especially epigenetic studies with DNA methylation. Compared with unmatched case-control studies with high-dimensional genomic or epigenetic data, there have been few variable selection methods for matched sets. In an earlier paper, we proposed the penalized logistic regression model for the analysis of unmatched DNA methylation data using a network-based penalty. However, for popularly applied matched designs in epigenetic studies that compare DNA methylation between tumor and adjacent non-tumor tissues or between pre-treatment and post-treatment conditions, applying ordinary logistic regression ignoring matching is known to bring serious bias in estimation. In this paper, we developed a penalized conditional logistic model using the network-based penalty that encourages a grouping effect of (1) linked Cytosine-phosphate-Guanine (CpG) sites within a gene or (2) linked genes within a genetic pathway for analysis of matched DNA methylation data. In our simulation studies, we demonstrated the superiority of using conditional logistic model over unconditional logistic model in high-dimensional variable selection problems for matched case-control data. We further investigated the benefits of utilizing biological group or graph information for matched case-control data. We applied the proposed method to a genome-wide DNA methylation study on hepatocellular carcinoma (HCC) where we investigated the DNA methylation levels of tumor and adjacent non-tumor tissues from HCC patients by using the Illumina Infinium HumanMethylation27 Beadchip. Several new CpG sites and genes known to be related to HCC were identified but were missed by the standard method in the original paper. Copyright © 2012 John Wiley & Sons, Ltd.
Cunningham, Marc; Bock, Ariella; Brown, Niquelle; Sacher, Suzy; Hatch, Benjamin; Inglis, Andrew; Aronovich, Dana
2015-09-01
Contraceptive prevalence rate (CPR) is a vital indicator used by country governments, international donors, and other stakeholders for measuring progress in family planning programs against country targets and global initiatives as well as for estimating health outcomes. Because of the need for more frequent CPR estimates than population-based surveys currently provide, alternative approaches for estimating CPRs are being explored, including using contraceptive logistics data. Using data from the Demographic and Health Surveys (DHS) in 30 countries, population data from the United States Census Bureau International Database, and logistics data from the Procurement Planning and Monitoring Report (PPMR) and the Pipeline Monitoring and Procurement Planning System (PipeLine), we developed and evaluated 3 models to generate country-level, public-sector contraceptive prevalence estimates for injectable contraceptives, oral contraceptives, and male condoms. Models included: direct estimation through existing couple-years of protection (CYP) conversion factors, bivariate linear regression, and multivariate linear regression. Model evaluation consisted of comparing the referent DHS prevalence rates for each short-acting method with the model-generated prevalence rate using multiple metrics, including mean absolute error and proportion of countries where the modeled prevalence rate for each method was within 1, 2, or 5 percentage points of the DHS referent value. For the methods studied, family planning use estimates from public-sector logistics data were correlated with those from the DHS, validating the quality and accuracy of current public-sector logistics data. Logistics data for oral and injectable contraceptives were significantly associated (P<.05) with the referent DHS values for both bivariate and multivariate models. For condoms, however, that association was only significant for the bivariate model. With the exception of the CYP-based model for condoms, models were able to estimate public-sector prevalence rates for each short-acting method to within 2 percentage points in at least 85% of countries. Public-sector contraceptive logistics data are strongly correlated with public-sector prevalence rates for short-acting methods, demonstrating the quality of current logistics data and their ability to provide relatively accurate prevalence estimates. The models provide a starting point for generating interim estimates of contraceptive use when timely survey data are unavailable. All models except the condoms CYP model performed well; the regression models were most accurate but the CYP model offers the simplest calculation method. Future work extending the research to other modern methods, relating subnational logistics data with prevalence rates, and tracking that relationship over time is needed. © Cunningham et al.
Cunningham, Marc; Brown, Niquelle; Sacher, Suzy; Hatch, Benjamin; Inglis, Andrew; Aronovich, Dana
2015-01-01
Background: Contraceptive prevalence rate (CPR) is a vital indicator used by country governments, international donors, and other stakeholders for measuring progress in family planning programs against country targets and global initiatives as well as for estimating health outcomes. Because of the need for more frequent CPR estimates than population-based surveys currently provide, alternative approaches for estimating CPRs are being explored, including using contraceptive logistics data. Methods: Using data from the Demographic and Health Surveys (DHS) in 30 countries, population data from the United States Census Bureau International Database, and logistics data from the Procurement Planning and Monitoring Report (PPMR) and the Pipeline Monitoring and Procurement Planning System (PipeLine), we developed and evaluated 3 models to generate country-level, public-sector contraceptive prevalence estimates for injectable contraceptives, oral contraceptives, and male condoms. Models included: direct estimation through existing couple-years of protection (CYP) conversion factors, bivariate linear regression, and multivariate linear regression. Model evaluation consisted of comparing the referent DHS prevalence rates for each short-acting method with the model-generated prevalence rate using multiple metrics, including mean absolute error and proportion of countries where the modeled prevalence rate for each method was within 1, 2, or 5 percentage points of the DHS referent value. Results: For the methods studied, family planning use estimates from public-sector logistics data were correlated with those from the DHS, validating the quality and accuracy of current public-sector logistics data. Logistics data for oral and injectable contraceptives were significantly associated (P<.05) with the referent DHS values for both bivariate and multivariate models. For condoms, however, that association was only significant for the bivariate model. With the exception of the CYP-based model for condoms, models were able to estimate public-sector prevalence rates for each short-acting method to within 2 percentage points in at least 85% of countries. Conclusions: Public-sector contraceptive logistics data are strongly correlated with public-sector prevalence rates for short-acting methods, demonstrating the quality of current logistics data and their ability to provide relatively accurate prevalence estimates. The models provide a starting point for generating interim estimates of contraceptive use when timely survey data are unavailable. All models except the condoms CYP model performed well; the regression models were most accurate but the CYP model offers the simplest calculation method. Future work extending the research to other modern methods, relating subnational logistics data with prevalence rates, and tracking that relationship over time is needed. PMID:26374805
MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION
Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...
Applying machine-learning techniques to Twitter data for automatic hazard-event classification.
NASA Astrophysics Data System (ADS)
Filgueira, R.; Bee, E. J.; Diaz-Doce, D.; Poole, J., Sr.; Singh, A.
2017-12-01
The constant flow of information offered by tweets provides valuable information about all sorts of events at a high temporal and spatial resolution. Over the past year we have been analyzing in real-time geological hazards/phenomenon, such as earthquakes, volcanic eruptions, landslides, floods or the aurora, as part of the GeoSocial project, by geo-locating tweets filtered by keywords in a web-map. However, not all the filtered tweets are related with hazard/phenomenon events. This work explores two classification techniques for automatic hazard-event categorization based on tweets about the "Aurora". First, tweets were filtered using aurora-related keywords, removing stop words and selecting the ones written in English. For classifying the remaining between "aurora-event" or "no-aurora-event" categories, we compared two state-of-art techniques: Support Vector Machine (SVM) and Deep Convolutional Neural Networks (CNN) algorithms. Both approaches belong to the family of supervised learning algorithms, which make predictions based on labelled training dataset. Therefore, we created a training dataset by tagging 1200 tweets between both categories. The general form of SVM is used to separate two classes by a function (kernel). We compared the performance of four different kernels (Linear Regression, Logistic Regression, Multinomial Naïve Bayesian and Stochastic Gradient Descent) provided by Scikit-Learn library using our training dataset to build the SVM classifier. The results shown that the Logistic Regression (LR) gets the best accuracy (87%). So, we selected the SVM-LR classifier to categorise a large collection of tweets using the "dispel4py" framework.Later, we developed a CNN classifier, where the first layer embeds words into low-dimensional vectors. The next layer performs convolutions over the embedded word vectors. Results from the convolutional layer are max-pooled into a long feature vector, which is classified using a softmax layer. The CNN's accuracy is lower (83%) than the SVM-LR, since the algorithm needs a bigger training dataset to increase its accuracy. We used TensorFlow framework for applying CNN classifier to the same collection of tweets.In future we will modify both classifiers to work with other geo-hazards, use larger training datasets and apply them in real-time.
Chien, Yu-Tai; Chen, Yu-Jen; Hsiung, Hsiao-Fang; Chen, Hsiao-Jung; Hsieh, Meng-Hua; Wu, Wen-Jie
2017-01-01
Background Physical activity is important for middle-agers to maintain health both in middle age and in old age. Although thousands of exercise-promotion mobile phone apps are available for download, current literature offers little understanding regarding which design features can enhance middle-aged adults’ quality perception toward exercise-promotion apps and which factor may influence such perception. Objectives The aims of this study were to understand (1) which design features of exercise-promotion apps can enhance quality perception of middle-agers, (2) whether their needs are matched by current functions offered in app stores, and (3) whether physical activity (PA) and mobile phone self-efficacy (MPSE) influence quality perception. Methods A total of 105 middle-agers participated and filled out three scales: the International Physical Activity Questionnaire (IPAQ), the MPSE scale, and the need for design features questionnaire. The design features were developed based on the Coventry, Aberdeen, and London—Refined (CALO-RE) taxonomy. Following the Kano quality model, the need for design features questionnaire asked participants to classify design features into five categories: attractive, one-dimensional, must-be, indifferent, and reverse. The quality categorization was conducted based on a voting approach and the categorization results were compared with the findings of a prevalence study to realize whether needs match current availability. In total, 52 multinomial logistic regression models were analyzed to evaluate the effects of PA level and MPSE on quality perception of design features. Results The Kano analysis on the total sample revealed that visual demonstration of exercise instructions is the only attractive design feature, whereas the other 51 design features were perceived with indifference. Although examining quality perception by PA level, 21 features are recommended to low level, 6 features to medium level, but none to high-level PA. In contrast, high-level MPSE is recommended with 14 design features, medium level with 6 features, whereas low-level participants are recommended with 1 feature. The analysis suggests that the implementation of demanded features could be low, as the average prevalence of demanded design features is 20% (4.3/21). Surprisingly, social comparison and social support, most implemented features in current apps, were categorized into the indifferent category. The magnitude of effect is larger for MPSE because it effects quality perception of more design features than PA. Delving into the 52 regression models revealed that high MPSE more likely induces attractive or one- dimensional categorization, suggesting the importance of technological self-efficacy on eHealth care promotion. Conclusions This study is the first to propose middle-agers’ needs in relation to mobile phone exercise-promotion. In addition to the tailor-made recommendations, suggestions are offered to app designers to enhance the performance of persuasive features. An interesting finding on change of quality perception attributed to MPSE is proposed as future research. PMID:28546140
Patterson, Brent R.; Anderson, Morgan L.; Rodgers, Arthur R.; Vander Vennen, Lucas M.; Fryxell, John M.
2017-01-01
Woodland caribou (Rangifer tarandus caribou) in Ontario are a threatened species that have experienced a substantial retraction of their historic range. Part of their decline has been attributed to increasing densities of anthropogenic linear features such as trails, roads, railways, and hydro lines. These features have been shown to increase the search efficiency and kill rate of wolves. However, it is unclear whether selection for anthropogenic linear features is additive or compensatory to selection for natural (water) linear features which may also be used for travel. We studied the selection of water and anthropogenic linear features by 52 resident wolves (Canis lupus x lycaon) over four years across three study areas in northern Ontario that varied in degrees of forestry activity and human disturbance. We used Euclidean distance-based resource selection functions (mixed-effects logistic regression) at the seasonal range scale with random coefficients for distance to water linear features, primary/secondary roads/railways, and hydro lines, and tertiary roads to estimate the strength of selection for each linear feature and for several habitat types, while accounting for availability of each feature. Next, we investigated the trade-off between selection for anthropogenic and water linear features. Wolves selected both anthropogenic and water linear features; selection for anthropogenic features was stronger than for water during the rendezvous season. Selection for anthropogenic linear features increased with increasing density of these features on the landscape, while selection for natural linear features declined, indicating compensatory selection of anthropogenic linear features. These results have implications for woodland caribou conservation. Prey encounter rates between wolves and caribou seem to be strongly influenced by increasing linear feature densities. This behavioral mechanism–a compensatory functional response to anthropogenic linear feature density resulting in decreased use of natural travel corridors–has negative consequences for the viability of woodland caribou. PMID:29117234
Newton, Erica J; Patterson, Brent R; Anderson, Morgan L; Rodgers, Arthur R; Vander Vennen, Lucas M; Fryxell, John M
2017-01-01
Woodland caribou (Rangifer tarandus caribou) in Ontario are a threatened species that have experienced a substantial retraction of their historic range. Part of their decline has been attributed to increasing densities of anthropogenic linear features such as trails, roads, railways, and hydro lines. These features have been shown to increase the search efficiency and kill rate of wolves. However, it is unclear whether selection for anthropogenic linear features is additive or compensatory to selection for natural (water) linear features which may also be used for travel. We studied the selection of water and anthropogenic linear features by 52 resident wolves (Canis lupus x lycaon) over four years across three study areas in northern Ontario that varied in degrees of forestry activity and human disturbance. We used Euclidean distance-based resource selection functions (mixed-effects logistic regression) at the seasonal range scale with random coefficients for distance to water linear features, primary/secondary roads/railways, and hydro lines, and tertiary roads to estimate the strength of selection for each linear feature and for several habitat types, while accounting for availability of each feature. Next, we investigated the trade-off between selection for anthropogenic and water linear features. Wolves selected both anthropogenic and water linear features; selection for anthropogenic features was stronger than for water during the rendezvous season. Selection for anthropogenic linear features increased with increasing density of these features on the landscape, while selection for natural linear features declined, indicating compensatory selection of anthropogenic linear features. These results have implications for woodland caribou conservation. Prey encounter rates between wolves and caribou seem to be strongly influenced by increasing linear feature densities. This behavioral mechanism-a compensatory functional response to anthropogenic linear feature density resulting in decreased use of natural travel corridors-has negative consequences for the viability of woodland caribou.
Neurophysiological correlates of depressive symptoms in young adults: A quantitative EEG study.
Lee, Poh Foong; Kan, Donica Pei Xin; Croarkin, Paul; Phang, Cheng Kar; Doruk, Deniz
2018-01-01
There is an unmet need for practical and reliable biomarkers for mood disorders in young adults. Identifying the brain activity associated with the early signs of depressive disorders could have important diagnostic and therapeutic implications. In this study we sought to investigate the EEG characteristics in young adults with newly identified depressive symptoms. Based on the initial screening, a total of 100 participants (n = 50 euthymic, n = 50 depressive) underwent 32-channel EEG acquisition. Simple logistic regression and C-statistic were used to explore if EEG power could be used to discriminate between the groups. The strongest EEG predictors of mood using multivariate logistic regression models. Simple logistic regression analysis with subsequent C-statistics revealed that only high-alpha and beta power originating from the left central cortex (C3) have a reliable discriminative value (ROC curve >0.7 (70%)) for differentiating the depressive group from the euthymic group. Multivariate regression analysis showed that the single most significant predictor of group (depressive vs. euthymic) is the high-alpha power over C3 (p = 0.03). The present findings suggest that EEG is a useful tool in the identification of neurophysiological correlates of depressive symptoms in young adults with no previous psychiatric history. Our results could guide future studies investigating the early neurophysiological changes and surrogate outcomes in depression. Copyright © 2017 Elsevier Ltd. All rights reserved.
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in details. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
Finding the Perfect Match: Factors That Influence Family Medicine Residency Selection.
Wright, Katherine M; Ryan, Elizabeth R; Gatta, John L; Anderson, Lauren; Clements, Deborah S
2016-04-01
Residency program selection is a significant experience for emerging physicians, yet there is limited information about how applicants narrow their list of potential programs. This study examines factors that influence residency program selection among medical students interested in family medicine at the time of application. Medical students with an expressed interest in family medicine were invited to participate in a 37-item, online survey. Students were asked to rate factors that may impact residency selection on a 6-point Likert scale in addition to three open-ended qualitative questions. Mean values were calculated for each survey item and were used to determine a rank order for selection criteria. Logistic regression analysis was performed to identify factors that predict a strong interest in urban, suburban, and rural residency programs. Logistic regression was also used to identify factors that predict a strong interest in academic health center-based residencies, community-based residencies, and community-based residencies with an academic affiliation. A total of 705 medical students from 32 states across the country completed the survey. Location, work/life balance, and program structure (curriculum, schedule) were rated the most important factors for residency selection. Logistic regression analysis was used to refine our understanding of how each factor relates to specific types of residencies. These findings have implications for how to best advise students in selecting a residency, as well as marketing residencies to the right candidates. Refining the recruitment process will ensure a better fit between applicants and potential programs. Limited recruitment resources may be better utilized by focusing on targeted dissemination strategies.
Stata Modules for Calculating Novel Predictive Performance Indices for Logistic Models
Barkhordari, Mahnaz; Padyab, Mojgan; Hadaegh, Farzad; Azizi, Fereidoun; Bozorgmanesh, Mohammadreza
2016-01-01
Background Prediction is a fundamental part of prevention of cardiovascular diseases (CVD). The development of prediction algorithms based on the multivariate regression models loomed several decades ago. Parallel with predictive models development, biomarker researches emerged in an impressively great scale. The key question is how best to assess and quantify the improvement in risk prediction offered by new biomarkers or more basically how to assess the performance of a risk prediction model. Discrimination, calibration, and added predictive value have been recently suggested to be used while comparing the predictive performances of the predictive models’ with and without novel biomarkers. Objectives Lack of user-friendly statistical software has restricted implementation of novel model assessment methods while examining novel biomarkers. We intended, thus, to develop a user-friendly software that could be used by researchers with few programming skills. Materials and Methods We have written a Stata command that is intended to help researchers obtain cut point-free and cut point-based net reclassification improvement index and (NRI) and relative and absolute Integrated discriminatory improvement index (IDI) for logistic-based regression analyses.We applied the commands to a real data on women participating the Tehran lipid and glucose study (TLGS) to examine if information of a family history of premature CVD, waist circumference, and fasting plasma glucose can improve predictive performance of the Framingham’s “general CVD risk” algorithm. Results The command is addpred for logistic regression models. Conclusions The Stata package provided herein can encourage the use of novel methods in examining predictive capacity of ever-emerging plethora of novel biomarkers. PMID:27279830
Real-time Mainshock Forecast by Statistical Discrimination of Foreshock Clusters
NASA Astrophysics Data System (ADS)
Nomura, S.; Ogata, Y.
2016-12-01
Foreshock discremination is one of the most effective ways for short-time forecast of large main shocks. Though many large earthquakes accompany their foreshocks, discreminating them from enormous small earthquakes is difficult and only probabilistic evaluation from their spatio-temporal features and magnitude evolution may be available. Logistic regression is the statistical learning method best suited to such binary pattern recognition problems where estimates of a-posteriori probability of class membership are required. Statistical learning methods can keep learning discreminating features from updating catalog and give probabilistic recognition of forecast in real time. We estimated a non-linear function of foreshock proportion by smooth spline bases and evaluate the possibility of foreshocks by the logit function. In this study, we classified foreshocks from earthquake catalog by the Japan Meteorological Agency by single-link clustering methods and learned spatial and temporal features of foreshocks by the probability density ratio estimation. We use the epicentral locations, time spans and difference in magnitudes for learning and forecasting. Magnitudes of main shocks are also predicted our method by incorporating b-values into our method. We discuss the spatial pattern of foreshocks from the classifier composed by our model. We also implement a back test to validate predictive performance of the model by this catalog.
Anomaly detection driven active learning for identifying suspicious tracks and events in WAMI video
NASA Astrophysics Data System (ADS)
Miller, David J.; Natraj, Aditya; Hockenbury, Ryler; Dunn, Katherine; Sheffler, Michael; Sullivan, Kevin
2012-06-01
We describe a comprehensive system for learning to identify suspicious vehicle tracks from wide-area motion (WAMI) video. First, since the road network for the scene of interest is assumed unknown, agglomerative hierarchical clustering is applied to all spatial vehicle measurements, resulting in spatial cells that largely capture individual road segments. Next, for each track, both at the cell (speed, acceleration, azimuth) and track (range, total distance, duration) levels, extreme value feature statistics are both computed and aggregated, to form summary (p-value based) anomaly statistics for each track. Here, to fairly evaluate tracks that travel across different numbers of spatial cells, for each cell-level feature type, a single (most extreme) statistic is chosen, over all cells traveled. Finally, a novel active learning paradigm, applied to a (logistic regression) track classifier, is invoked to learn to distinguish suspicious from merely anomalous tracks, starting from anomaly-ranked track prioritization, with ground-truth labeling by a human operator. This system has been applied to WAMI video data (ARGUS), with the tracks automatically extracted by a system developed in-house at Toyon Research Corporation. Our system gives promising preliminary results in highly ranking as suspicious aerial vehicles, dismounts, and traffic violators, and in learning which features are most indicative of suspicious tracks.
Preoperative predictive model of recovery of urinary continence after radical prostatectomy.
Matsushita, Kazuhito; Kent, Matthew T; Vickers, Andrew J; von Bodman, Christian; Bernstein, Melanie; Touijer, Karim A; Coleman, Jonathan A; Laudone, Vincent T; Scardino, Peter T; Eastham, James A; Akin, Oguz; Sandhu, Jaspreet S
2015-10-01
To build a predictive model of urinary continence recovery after radical prostatectomy (RP) that incorporates magnetic resonance imaging (MRI) parameters and clinical data. We conducted a retrospective review of data from 2,849 patients who underwent pelvic staging MRI before RP from November 2001 to June 2010. We used logistic regression to evaluate the association between each MRI variable and continence at 6 or 12 months, adjusting for age, body mass index (BMI) and American Society of Anesthesiologists (ASA) score, and then used multivariable logistic regression to create our model. A nomogram was constructed using the multivariable logistic regression models. In all, 68% (1,742/2,559) and 82% (2,205/2,689) regained function at 6 and 12 months, respectively. In the base model, age, BMI and ASA score were significant predictors of continence at 6 or 12 months on univariate analysis (P < 0.005). Among the preoperative MRI measurements, membranous urethral length, which showed great significance, was incorporated into the base model to create the full model. For continence recovery at 6 months, the addition of membranous urethral length increased the area under the curve (AUC) to 0.664 for the validation set, an increase of 0.064 over the base model. For continence recovery at 12 months, the AUC was 0.674, an increase of 0.085 over the base model. Using our model, the likelihood of continence recovery increases with membranous urethral length and decreases with age, BMI and ASA score. This model could be used for patient counselling and for the identification of patients at high risk for urinary incontinence in whom to study changes in operative technique that improve urinary function after RP. © 2015 The Authors BJU International © 2015 BJU International Published by John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Staley, Dennis; Negri, Jacquelyn; Kean, Jason
2016-04-01
Population expansion into fire-prone steeplands has resulted in an increase in post-fire debris-flow risk in the western United States. Logistic regression methods for determining debris-flow likelihood and the calculation of empirical rainfall intensity-duration thresholds for debris-flow initiation represent two common approaches for characterizing hazard and reducing risk. Logistic regression models are currently being used to rapidly assess debris-flow hazard in response to design storms of known intensities (e.g. a 10-year recurrence interval rainstorm). Empirical rainfall intensity-duration thresholds comprise a major component of the United States Geological Survey (USGS) and the National Weather Service (NWS) debris-flow early warning system at a regional scale in southern California. However, these two modeling approaches remain independent, with each approach having limitations that do not allow for synergistic local-scale (e.g. drainage-basin scale) characterization of debris-flow hazard during intense rainfall. The current logistic regression equations consider rainfall a unique independent variable, which prevents the direct calculation of the relation between rainfall intensity and debris-flow likelihood. Regional (e.g. mountain range or physiographic province scale) rainfall intensity-duration thresholds fail to provide insight into the basin-scale variability of post-fire debris-flow hazard and require an extensive database of historical debris-flow occurrence and rainfall characteristics. Here, we present a new approach that combines traditional logistic regression and intensity-duration threshold methodologies. This method allows for local characterization of both the likelihood that a debris-flow will occur at a given rainfall intensity, the direct calculation of the rainfall rates that will result in a given likelihood, and the ability to calculate spatially explicit rainfall intensity-duration thresholds for debris-flow generation in recently burned areas. Our approach synthesizes the two methods by incorporating measured rainfall intensity into each model variable (based on measures of topographic steepness, burn severity and surface properties) within the logistic regression equation. This approach provides a more realistic representation of the relation between rainfall intensity and debris-flow likelihood, as likelihood values asymptotically approach zero when rainfall intensity approaches 0 mm/h, and increase with more intense rainfall. Model performance was evaluated by comparing predictions to several existing regional thresholds. The model, based upon training data collected in southern California, USA, has proven to accurately predict rainfall intensity-duration thresholds for other areas in the western United States not included in the original training dataset. In addition, the improved logistic regression model shows promise for emergency planning purposes and real-time, site-specific early warning. With further validation, this model may permit the prediction of spatially-explicit intensity-duration thresholds for debris-flow generation in areas where empirically derived regional thresholds do not exist. This improvement would permit the expansion of the early-warning system into other regions susceptible to post-fire debris flow.
Wright, David K.; MacEachern, Scott; Lee, Jaeyong
2014-01-01
The locations of diy-geδ-bay (DGB) sites in the Mandara Mountains, northern Cameroon are hypothesized to occur as a function of their ability to see and be seen from points on the surrounding landscape. A series of geostatistical, two-way and Bayesian logistic regression analyses were performed to test two hypotheses related to the intervisibility of the sites to one another and their visual prominence on the landscape. We determine that the intervisibility of the sites to one another is highly statistically significant when compared to 10 stratified-random permutations of DGB sites. Bayesian logistic regression additionally demonstrates that the visibility of the sites to points on the surrounding landscape is statistically significant. The location of sites appears to have also been selected on the basis of lower slope than random permutations of sites. Using statistical measures, many of which are not commonly employed in archaeological research, to evaluate aspects of visibility on the landscape, we conclude that the placement of DGB sites improved their conspicuousness for enhanced ritual, social cooperation and/or competition purposes. PMID:25383883
Ryu, Vin; Jon, Duk-In; Cho, Hyun Sang; Kim, Se Joo; Lee, Eun; Kim, Eun Joo; Seok, Jeong-Ho
2010-09-01
Suicide is a major concern for increasing mortality in bipolar patients, but risk factors for suicide in bipolar disorder remain complex, including Korean patients. Medical records of bipolar patients were retrospectively reviewed to detect significant clinical characteristics associated with suicide attempts. A total of 579 medical records were retrospectively reviewed. Bipolar patients were divided into two groups with the presence of a history of suicide attempts. We compared demographic characteristics and clinical features between the two groups using an analysis of covariance and chi-square tests. Finally, logistic regression was performed to evaluate significant risk factors associated with suicide attempts in bipolar disorder. The prevalence of suicide attempt was 13.1% in our patient group. The presence of a depressive first episode was significantly different between attempters and nonattempters. Logistic regression analysis revealed that depressive first episodes and bipolar II disorder were significantly associated with suicide attempts in those patients. Clinicians should consider the polarity of the first mood episode when evaluating suicide risk in bipolar patients. This study has some limitations as a retrospective study and further studies with a prospective design are needed to replicate and evaluate risk factors for suicide in patients with bipolar disorder.
Javed, Amna; Tiwana, Mohsin I.; Khan, Umar Shahbaz
2018-01-01
Brain Computer Interface (BCI) determines the intent of the user from a variety of electrophysiological signals. These signals, Slow Cortical Potentials, are recorded from scalp, and cortical neuronal activity is recorded by implanted electrodes. This paper is focused on design of an embedded system that is used to control the finger movements of an upper limb prosthesis using Electroencephalogram (EEG) signals. This is a follow-up of our previous research which explored the best method to classify three movements of fingers (thumb movement, index finger movement, and first movement). Two-stage logistic regression classifier exhibited the highest classification accuracy while Power Spectral Density (PSD) was used as a feature of the filtered signal. The EEG signal data set was recorded using a 14-channel electrode headset (a noninvasive BCI system) from right-handed, neurologically intact volunteers. Mu (commonly known as alpha waves) and Beta Rhythms (8–30 Hz) containing most of the movement data were retained through filtering using “Arduino Uno” microcontroller followed by 2-stage logistic regression to obtain a mean classification accuracy of 70%. PMID:29888252
Orlando, José Ignacio; van Keer, Karel; Barbosa Breda, João; Manterola, Hugo Luis; Blaschko, Matthew B; Clausse, Alejandro
2017-12-01
Diabetic retinopathy (DR) is one of the most widespread causes of preventable blindness in the world. The most dangerous stage of this condition is proliferative DR (PDR), in which the risk of vision loss is high and treatments are less effective. Fractal features of the retinal vasculature have been previously explored as potential biomarkers of DR, yet the current literature is inconclusive with respect to their correlation with PDR. In this study, we experimentally assess their discrimination ability to recognize PDR cases. A statistical analysis of the viability of using three reference fractal characterization schemes - namely box, information, and correlation dimensions - to identify patients with PDR is presented. These descriptors are also evaluated as input features for training ℓ1 and ℓ2 regularized logistic regression classifiers, to estimate their performance. Our results on MESSIDOR, a public dataset of 1200 fundus photographs, indicate that patients with PDR are more likely to exhibit a higher fractal dimension than healthy subjects or patients with mild levels of DR (P≤1.3×10-2). Moreover, a supervised classifier trained with both fractal measurements and red lesion-based features reports an area under the ROC curve of 0.93 for PDR screening and 0.96 for detecting patients with optic disc neovascularizations. The fractal dimension of the vasculature increases with the level of DR. Furthermore, PDR screening using multiscale fractal measurements is more feasible than using their derived fractal dimensions. Code and further resources are provided at https://github.com/ignaciorlando/fundus-fractal-analysis. © 2017 American Association of Physicists in Medicine.
Heart rate time series characteristics for early detection of infections in critically ill patients.
Tambuyzer, T; Guiza, F; Boonen, E; Meersseman, P; Vervenne, H; Hansen, T K; Bjerre, M; Van den Berghe, G; Berckmans, D; Aerts, J M; Meyfroidt, G
2017-04-01
It is difficult to make a distinction between inflammation and infection. Therefore, new strategies are required to allow accurate detection of infection. Here, we hypothesize that we can distinguish infected from non-infected ICU patients based on dynamic features of serum cytokine concentrations and heart rate time series. Serum cytokine profiles and heart rate time series of 39 patients were available for this study. The serum concentration of ten cytokines were measured using blood sampled every 10 min between 2100 and 0600 hours. Heart rate was recorded every minute. Ten metrics were used to extract features from these time series to obtain an accurate classification of infected patients. The predictive power of the metrics derived from the heart rate time series was investigated using decision tree analysis. Finally, logistic regression methods were used to examine whether classification performance improved with inclusion of features derived from the cytokine time series. The AUC of a decision tree based on two heart rate features was 0.88. The model had good calibration with 0.09 Hosmer-Lemeshow p value. There was no significant additional value of adding static cytokine levels or cytokine time series information to the generated decision tree model. The results suggest that heart rate is a better marker for infection than information captured by cytokine time series when the exact stage of infection is not known. The predictive value of (expensive) biomarkers should always be weighed against the routinely monitored data, and such biomarkers have to demonstrate added value.
Brenn, T; Arnesen, E
1985-01-01
For comparative evaluation, discriminant analysis, logistic regression and Cox's model were used to select risk factors for total and coronary deaths among 6595 men aged 20-49 followed for 9 years. Groups with mortality between 5 and 93 per 1000 were considered. Discriminant analysis selected variable sets only marginally different from the logistic and Cox methods which always selected the same sets. A time-saving option, offered for both the logistic and Cox selection, showed no advantage compared with discriminant analysis. Analysing more than 3800 subjects, the logistic and Cox methods consumed, respectively, 80 and 10 times more computer time than discriminant analysis. When including the same set of variables in non-stepwise analyses, all methods estimated coefficients that in most cases were almost identical. In conclusion, discriminant analysis is advocated for preliminary or stepwise analysis, otherwise Cox's method should be used.
School Socioeconomic Composition and Adolescent Sexual Initiation in Malawi.
Kim, Jinho
2015-09-01
Numerous studies have documented the determinants of sexual behavior among adolescents in less-developed countries, yet relatively little is known about the influence of social contexts such as school and neighborhood. Using two waves of data from a school-based longitudinal survey conducted in Malawi from 2011-13, this study advances our understanding of the relationship between school-level socioeconomic contexts and adolescents' sexual activity. The results from two-level multinomial logistic regression models suggest that high socioeconomic composition of the student body in school decreases the odds of initiation of sexual activity, independent of other important features of schools and individual-level characteristics. This study also finds that the association between school socioeconomic composition and sexual activity is statistically significant among male adolescents but not female adolescents, suggesting that schools' socioeconomic contexts may be more relevant to male adolescents' initiation of sexual activity. © 2015 The Population Council, Inc.
School socioeconomic composition and adolescent sexual initiation in Malawi
Kim, Jinho
2015-01-01
While numerous studies have documented the determinants of sexual behavior among adolescents in less developed countries, relatively little is known about the influence of social contexts such as school and neighborhood. Using two waves of data from a school-based longitudinal survey conducted in Malawi from 2011 to 2013, this study advances our understanding of the relationship between school-level socioeconomic contexts and adolescents’ sexual activity. The results from two-level multinomial logistic regression models suggest that high socioeconomic composition of the student body in school decreases the odds of initiating sexual activity, independently of other important features of schools as well as individual-level characteristics. This study also finds that the association between school socioeconomic composition and sexual activity is statistically significant only among males, but not females, suggesting that school’s socioeconomic contexts may be more relevant to male adolescents’ initiation of sexual activity. PMID:26347090
Corseuil Giehl, Maruí W; Hallal, Pedro C; Brownson, Ross C; d'Orsi, Eleonora
2017-02-01
To investigate the associations between perceived environment features and walking in older adults. A cross-sectional population-based study was performed in Florianopolis, Brazil, including 1,705 older adults (60+ years). Walking was measured by the International Physical Activity Questionnaire (IPAQ), and perceived environment was assessed through the Neighborhood Environment Walkability Scale. We conducted a multinomial logistic regression to examine the association between perceived environment and walking. The presence of sidewalks was related to both walking for transportation and for leisure. Existence of crosswalks in the neighborhood, safety during the day, presence of street lighting, recreational facilities, and having dog were significant predictors of walking for transportation. Safety during the day and social support were significantly associated with walking for leisure. The perceived environment may affect walking for specific purposes among older adults. Investments in the environment may increase physical activity levels of older adults in Brazil.
ERIC Educational Resources Information Center
DeMars, Christine E.
2009-01-01
The Mantel-Haenszel (MH) and logistic regression (LR) differential item functioning (DIF) procedures have inflated Type I error rates when there are large mean group differences, short tests, and large sample sizes.When there are large group differences in mean score, groups matched on the observed number-correct score differ on true score,…
Qiu, Yuchen; Yan, Shiju; Gundreddy, Rohith Reddy; Wang, Yunzhi; Cheng, Samuel; Liu, Hong; Zheng, Bin
2017-01-01
PURPOSE To develop and test a deep learning based computer-aided diagnosis (CAD) scheme of mammograms for classifying between malignant and benign masses. METHODS An image dataset involving 560 regions of interest (ROIs) extracted from digital mammograms was used. After down-sampling each ROI from 512×512 to 64×64 pixel size, we applied an 8 layer deep learning network that involves 3 pairs of convolution-max-pooling layers for automatic feature extraction and a multiple layer perceptron (MLP) classifier for feature categorization to process ROIs. The 3 pairs of convolution layers contain 20, 10, and 5 feature maps, respectively. Each convolution layer is connected with a max-pooling layer to improve the feature robustness. The output of the sixth layer is fully connected with a MLP classifier, which is composed of one hidden layer and one logistic regression layer. The network then generates a classification score to predict the likelihood of ROI depicting a malignant mass. A four-fold cross validation method was applied to train and test this deep learning network. RESULTS The results revealed that this CAD scheme yields an area under the receiver operation characteristic curve (AUC) of 0.696±0.044, 0.802±0.037, 0.836±0.036, and 0.822±0.035 for fold 1 to 4 testing datasets, respectively. The overall AUC of the entire dataset is 0.790±0.019. CONCLUSIONS This study demonstrates the feasibility of applying a deep learning based CAD scheme to classify between malignant and benign breast masses without a lesion segmentation, image feature computation and selection process. PMID:28436410
Qiu, Yuchen; Yan, Shiju; Gundreddy, Rohith Reddy; Wang, Yunzhi; Cheng, Samuel; Liu, Hong; Zheng, Bin
2017-01-01
To develop and test a deep learning based computer-aided diagnosis (CAD) scheme of mammograms for classifying between malignant and benign masses. An image dataset involving 560 regions of interest (ROIs) extracted from digital mammograms was used. After down-sampling each ROI from 512×512 to 64×64 pixel size, we applied an 8 layer deep learning network that involves 3 pairs of convolution-max-pooling layers for automatic feature extraction and a multiple layer perceptron (MLP) classifier for feature categorization to process ROIs. The 3 pairs of convolution layers contain 20, 10, and 5 feature maps, respectively. Each convolution layer is connected with a max-pooling layer to improve the feature robustness. The output of the sixth layer is fully connected with a MLP classifier, which is composed of one hidden layer and one logistic regression layer. The network then generates a classification score to predict the likelihood of ROI depicting a malignant mass. A four-fold cross validation method was applied to train and test this deep learning network. The results revealed that this CAD scheme yields an area under the receiver operation characteristic curve (AUC) of 0.696±0.044, 0.802±0.037, 0.836±0.036, and 0.822±0.035 for fold 1 to 4 testing datasets, respectively. The overall AUC of the entire dataset is 0.790±0.019. This study demonstrates the feasibility of applying a deep learning based CAD scheme to classify between malignant and benign breast masses without a lesion segmentation, image feature computation and selection process.
A comparative study: classification vs. user-based collaborative filtering for clinical prediction.
Hao, Fang; Blair, Rachael Hageman
2016-12-08
Recommender systems have shown tremendous value for the prediction of personalized item recommendations for individuals in a variety of settings (e.g., marketing, e-commerce, etc.). User-based collaborative filtering is a popular recommender system, which leverages an individuals' prior satisfaction with items, as well as the satisfaction of individuals that are "similar". Recently, there have been applications of collaborative filtering based recommender systems for clinical risk prediction. In these applications, individuals represent patients, and items represent clinical data, which includes an outcome. Application of recommender systems to a problem of this type requires the recasting a supervised learning problem as unsupervised. The rationale is that patients with similar clinical features carry a similar disease risk. As the "Big Data" era progresses, it is likely that approaches of this type will be reached for as biomedical data continues to grow in both size and complexity (e.g., electronic health records). In the present study, we set out to understand and assess the performance of recommender systems in a controlled yet realistic setting. User-based collaborative filtering recommender systems are compared to logistic regression and random forests with different types of imputation and varying amounts of missingness on four different publicly available medical data sets: National Health and Nutrition Examination Survey (NHANES, 2011-2012 on Obesity), Study to Understand Prognoses Preferences Outcomes and Risks of Treatment (SUPPORT), chronic kidney disease, and dermatology data. We also examined performance using simulated data with observations that are Missing At Random (MAR) or Missing Completely At Random (MCAR) under various degrees of missingness and levels of class imbalance in the response variable. Our results demonstrate that user-based collaborative filtering is consistently inferior to logistic regression and random forests with different imputations on real and simulated data. The results warrant caution for the collaborative filtering for the purpose of clinical risk prediction when traditional classification is feasible and practical. CF may not be desirable in datasets where classification is an acceptable alternative. We describe some natural applications related to "Big Data" where CF would be preferred and conclude with some insights as to why caution may be warranted in this context.
Satellite rainfall retrieval by logistic regression
NASA Technical Reports Server (NTRS)
Chiu, Long S.
1986-01-01
The potential use of logistic regression in rainfall estimation from satellite measurements is investigated. Satellite measurements provide covariate information in terms of radiances from different remote sensors.The logistic regression technique can effectively accommodate many covariates and test their significance in the estimation. The outcome from the logistical model is the probability that the rainrate of a satellite pixel is above a certain threshold. By varying the thresholds, a rainrate histogram can be obtained, from which the mean and the variant can be estimated. A logistical model is developed and applied to rainfall data collected during GATE, using as covariates the fractional rain area and a radiance measurement which is deduced from a microwave temperature-rainrate relation. It is demonstrated that the fractional rain area is an important covariate in the model, consistent with the use of the so-called Area Time Integral in estimating total rain volume in other studies. To calibrate the logistical model, simulated rain fields generated by rainfield models with prescribed parameters are needed. A stringent test of the logistical model is its ability to recover the prescribed parameters of simulated rain fields. A rain field simulation model which preserves the fractional rain area and lognormality of rainrates as found in GATE is developed. A stochastic regression model of branching and immigration whose solutions are lognormally distributed in some asymptotic limits has also been developed.
Zhang, Xingyu; Kim, Joyce; Patzer, Rachel E; Pitts, Stephen R; Patzer, Aaron; Schrager, Justin D
2017-10-26
To describe and compare logistic regression and neural network modeling strategies to predict hospital admission or transfer following initial presentation to Emergency Department (ED) triage with and without the addition of natural language processing elements. Using data from the National Hospital Ambulatory Medical Care Survey (NHAMCS), a cross-sectional probability sample of United States EDs from 2012 and 2013 survey years, we developed several predictive models with the outcome being admission to the hospital or transfer vs. discharge home. We included patient characteristics immediately available after the patient has presented to the ED and undergone a triage process. We used this information to construct logistic regression (LR) and multilayer neural network models (MLNN) which included natural language processing (NLP) and principal component analysis from the patient's reason for visit. Ten-fold cross validation was used to test the predictive capacity of each model and receiver operating curves (AUC) were then calculated for each model. Of the 47,200 ED visits from 642 hospitals, 6,335 (13.42%) resulted in hospital admission (or transfer). A total of 48 principal components were extracted by NLP from the reason for visit fields, which explained 75% of the overall variance for hospitalization. In the model including only structured variables, the AUC was 0.824 (95% CI 0.818-0.830) for logistic regression and 0.823 (95% CI 0.817-0.829) for MLNN. Models including only free-text information generated AUC of 0.742 (95% CI 0.731- 0.753) for logistic regression and 0.753 (95% CI 0.742-0.764) for MLNN. When both structured variables and free text variables were included, the AUC reached 0.846 (95% CI 0.839-0.853) for logistic regression and 0.844 (95% CI 0.836-0.852) for MLNN. The predictive accuracy of hospital admission or transfer for patients who presented to ED triage overall was good, and was improved with the inclusion of free text data from a patient's reason for visit regardless of modeling approach. Natural language processing and neural networks that incorporate patient-reported outcome free text may increase predictive accuracy for hospital admission.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test ofmore » the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.« less
NASA Astrophysics Data System (ADS)
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam
2015-10-01
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.
A PCA aided cross-covariance scheme for discriminative feature extraction from EEG signals.
Zarei, Roozbeh; He, Jing; Siuly, Siuly; Zhang, Yanchun
2017-07-01
Feature extraction of EEG signals plays a significant role in Brain-computer interface (BCI) as it can significantly affect the performance and the computational time of the system. The main aim of the current work is to introduce an innovative algorithm for acquiring reliable discriminating features from EEG signals to improve classification performances and to reduce the time complexity. This study develops a robust feature extraction method combining the principal component analysis (PCA) and the cross-covariance technique (CCOV) for the extraction of discriminatory information from the mental states based on EEG signals in BCI applications. We apply the correlation based variable selection method with the best first search on the extracted features to identify the best feature set for characterizing the distribution of mental state signals. To verify the robustness of the proposed feature extraction method, three machine learning techniques: multilayer perceptron neural networks (MLP), least square support vector machine (LS-SVM), and logistic regression (LR) are employed on the obtained features. The proposed methods are evaluated on two publicly available datasets. Furthermore, we evaluate the performance of the proposed methods by comparing it with some recently reported algorithms. The experimental results show that all three classifiers achieve high performance (above 99% overall classification accuracy) for the proposed feature set. Among these classifiers, the MLP and LS-SVM methods yield the best performance for the obtained feature. The average sensitivity, specificity and classification accuracy for these two classifiers are same, which are 99.32%, 100%, and 99.66%, respectively for the BCI competition dataset IVa and 100%, 100%, and 100%, for the BCI competition dataset IVb. The results also indicate the proposed methods outperform the most recently reported methods by at least 0.25% average accuracy improvement in dataset IVa. The execution time results show that the proposed method has less time complexity after feature selection. The proposed feature extraction method is very effective for getting representatives information from mental states EEG signals in BCI applications and reducing the computational complexity of classifiers by reducing the number of extracted features. Copyright © 2017 Elsevier B.V. All rights reserved.
Statistical classification of drug incidents due to look-alike sound-alike mix-ups.
Wong, Zoie Shui Yee
2016-06-01
It has been recognised that medication names that look or sound similar are a cause of medication errors. This study builds statistical classifiers for identifying medication incidents due to look-alike sound-alike mix-ups. A total of 227 patient safety incident advisories related to medication were obtained from the Canadian Patient Safety Institute's Global Patient Safety Alerts system. Eight feature selection strategies based on frequent terms, frequent drug terms and constituent terms were performed. Statistical text classifiers based on logistic regression, support vector machines with linear, polynomial, radial-basis and sigmoid kernels and decision tree were trained and tested. The models developed achieved an average accuracy of above 0.8 across all the model settings. The receiver operating characteristic curves indicated the classifiers performed reasonably well. The results obtained in this study suggest that statistical text classification can be a feasible method for identifying medication incidents due to look-alike sound-alike mix-ups based on a database of advisories from Global Patient Safety Alerts. © The Author(s) 2014.
Tu, Zhi-bin; Cui, Meng-jing; Yao, Hong-yan; Hu, Guo-qing; Xiang, Hui-yun; Stallones, Lorann; Zhang, Xu-jun
2012-04-01
To explore the risk factors on cases regarding work-related acute pesticide poisoning among farmers of Jiangsu province. A population-based, 1:2 matched case-control study was carried out, with 121 patients as case-group paired by 242 persons with same gender, district and age less then difference of 3 years, as controls. Cases were the ones who had suffered from work-related acute pesticide poisoning. A unified questionnaire was used. Data base was established by EpiData 3.1, and SPSS 16.0 was used for both data single factor and multi-conditional logistics regression analysis. Results from the single factor logistic regression analysis showed that the related risk factors were: lack of safety guidance, lack of readable labels before praying pesticides, no regression during application, using hand to wipe sweat, using leaking knapsack, body contaminated during application and continuing to work when feeling ill after the contact of pesticides. Results from multi-conditional logistic regression analysis indicated that the lack of safety guidance (OR=2.25, 95%CI: 1.35-3.74), no readable labels before praying pesticides (OR=1.95, 95%CI: 1.19-3.18), wiping the sweat by hand during application (OR=1.97, 95%CI: 1.20-3.24) and using leaking knapsack during application (OR=1.82, 95%CI:1.10-3.01) were risk factors for the occurrence of work-related acute pesticide poisoning. The lack of safety guidance, no readable labels before praying pesticides, wiping the sweat by hand or using leaking knapsack during application were correlated to the occurrence of work-related acute pesticide poisoning.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nawrocki, J; Chino, J; Craciunescu, O
Purpose: We propose a method to examine gynecological tumor heterogeneity using texture analysis in the context of an adaptive PET protocol in order to establish if texture metrics from baseline PET-CT predict tumor response better than SUV metrics alone as well as determine texture features correlating with tumor response during radiation therapy. Methods: This IRB approved protocol included 29 women with node positive gynecological cancers visible on FDG-PET treated with EBRT to the PET positive nodes. A baseline and intra-treatment PET-CT was obtained. Tumor outcome was determined based on RECIST on posttreatment PET-CT. Primary GTVs were segmented using 40% thresholdmore » and a semi-automatic gradient-based contouring tool, PET Edge (MIM Software Inc., Cleveland, OH). SUV histogram features, Metabolic Volume (MV), and Total Lesion Glycolysis (TLG) were calculated. Four 3D texture matrices describing local and regional relationships between voxel intensities in the GTV were generated: co-occurrence, run length, size zone, and neighborhood difference. From these, 39 texture features were calculated. Prognostic power of baseline features derived from gradientbased and threshold GTVs were determined using the Wilcoxon rank-sum test. Receiver Operating Characteristics and logistic regression was performed using JMP (SAS Institute Inc., Cary, NC) to find probabilities of predicting response. Changes in features during treatment were determined using the Wilcoxon signed-rank test. Results: Of the 29 patients, there were 16 complete responders, 7 partial responders, and 6 non-responders. Comparing CR/PR vs. NR for gradient-based GTVs, 7 texture values, TLG, and SUV kurtosis had a p < 0.05. Threshold GTVs yielded 4 texture features and TLG with p < 0.05. From baseline to intra-treatment, 14 texture features, SUVmean, SUVmax, MV, and TLG changed with p < 0.05. Conclusion: Texture analysis of PET imaged gynecological tumors is an effective method for early prognosis and should be used complimentary to SUV metrics, especially when using gradient based segmentation.« less
Prognostic features and clinical presentation of acute idiopathic enterocolitis in horses
Staempfli, Henry R.; Townsend, Hugh G.G.; Prescott, John F.
1991-01-01
Clinical and hematological changes observed on presentation of 47 horses referred to the Ontario Veterinary College with acute idiopathic colitis were analyzed for their prognostic features. Cases of acute enterocolitis were characterized by fever, dehydration, abnormalities of serum electrolyte concentrations, azotemia, hypoalbuminemia, and increased serum concentrations of muscle enzymes. Severely dehydrated horses were seven times more likely to die or be euthanized than those that were not dehydrated. Other factors associated with failure to survive included the following: increased hematocrit, increased number of band neutrophils, increased serum creatinine and urea concentrations, and decreased blood pH and increasingly negative base excess. The results of multivariate variable analysis (stepwise logistic regression) suggested that, among the variables tested, base excess was the best predictor of death or survival. Twenty of 47 horses died or were euthanized. Reasons for death or euthanasia included: severe disseminated intravascular coagulation, unresponsiveness of severe metabolic acidosis and hypoproteinemia to treatments, and severity of colonic lesions on exploratory laparotomy. Of the surviving horses, three developed chronic laminitis (two were destroyed) and five developed jugular vein thrombosis. Fourteen of 16 horses for which subsequent histories were available returned to normal function. Early recognition of the disease, combined with early and aggressive correction of dehydration and of acid-base imbalance, may be important determinants of survival in horses with acute idiopathic colitis. PMID:17423769
Gao, Chao; Sun, Hanbo; Wang, Tuo; Tang, Ming; Bohnen, Nicolaas I; Müller, Martijn L T M; Herman, Talia; Giladi, Nir; Kalinin, Alexandr; Spino, Cathie; Dauer, William; Hausdorff, Jeffrey M; Dinov, Ivo D
2018-05-08
In this study, we apply a multidisciplinary approach to investigate falls in PD patients using clinical, demographic and neuroimaging data from two independent initiatives (University of Michigan and Tel Aviv Sourasky Medical Center). Using machine learning techniques, we construct predictive models to discriminate fallers and non-fallers. Through controlled feature selection, we identified the most salient predictors of patient falls including gait speed, Hoehn and Yahr stage, postural instability and gait difficulty-related measurements. The model-based and model-free analytical methods we employed included logistic regression, random forests, support vector machines, and XGboost. The reliability of the forecasts was assessed by internal statistical (5-fold) cross validation as well as by external out-of-bag validation. Four specific challenges were addressed in the study: Challenge 1, develop a protocol for harmonizing and aggregating complex, multisource, and multi-site Parkinson's disease data; Challenge 2, identify salient predictive features associated with specific clinical traits, e.g., patient falls; Challenge 3, forecast patient falls and evaluate the classification performance; and Challenge 4, predict tremor dominance (TD) vs. posture instability and gait difficulty (PIGD). Our findings suggest that, compared to other approaches, model-free machine learning based techniques provide a more reliable clinical outcome forecasting of falls in Parkinson's patients, for example, with a classification accuracy of about 70-80%.
Chung, Doo Yong; Cho, Kang Su; Lee, Dae Hun; Han, Jang Hee; Kang, Dong Hyuk; Jung, Hae Do; Kown, Jong Kyou; Ham, Won Sik; Choi, Young Deuk; Lee, Joo Yong
2015-01-01
Purpose This study was conducted to evaluate colic pain as a prognostic pretreatment factor that can influence ureter stone clearance and to estimate the probability of stone-free status in shock wave lithotripsy (SWL) patients with a ureter stone. Materials and Methods We retrospectively reviewed the medical records of 1,418 patients who underwent their first SWL between 2005 and 2013. Among these patients, 551 had a ureter stone measuring 4–20 mm and were thus eligible for our analyses. The colic pain as the chief complaint was defined as either subjective flank pain during history taking and physical examination. Propensity-scores for established for colic pain was calculated for each patient using multivariate logistic regression based upon the following covariates: age, maximal stone length (MSL), and mean stone density (MSD). Each factor was evaluated as predictor for stone-free status by Bayesian and non-Bayesian logistic regression model. Results After propensity-score matching, 217 patients were extracted in each group from the total patient cohort. There were no statistical differences in variables used in propensity- score matching. One-session success and stone-free rate were also higher in the painful group (73.7% and 71.0%, respectively) than in the painless group (63.6% and 60.4%, respectively). In multivariate non-Bayesian and Bayesian logistic regression models, a painful stone, shorter MSL, and lower MSD were significant factors for one-session stone-free status in patients who underwent SWL. Conclusions Colic pain in patients with ureter calculi was one of the significant predicting factors including MSL and MSD for one-session stone-free status of SWL. PMID:25902059
Fitzpatrick, Cole D; Rakasi, Saritha; Knodler, Michael A
2017-01-01
Speed is one of the most important factors in traffic safety as higher speeds are linked to increased crash risk and higher injury severities. Nearly a third of fatal crashes in the United States are designated as "speeding-related", which is defined as either "the driver behavior of exceeding the posted speed limit or driving too fast for conditions." While many studies have utilized the speeding-related designation in safety analyses, no studies have examined the underlying accuracy of this designation. Herein, we investigate the speeding-related crash designation through the development of a series of logistic regression models that were derived from the established speeding-related crash typologies and validated using a blind review, by multiple researchers, of 604 crash narratives. The developed logistic regression model accurately identified crashes which were not originally designated as speeding-related but had crash narratives that suggested speeding as a causative factor. Only 53.4% of crashes designated as speeding-related contained narratives which described speeding as a causative factor. Further investigation of these crashes revealed that the driver contributing code (DCC) of "driving too fast for conditions" was being used in three separate situations. Additionally, this DCC was also incorrectly used when "exceeding the posted speed limit" would likely have been a more appropriate designation. Finally, it was determined that the responding officer only utilized one DCC in 82% of crashes not designated as speeding-related but contained a narrative indicating speed as a contributing causal factor. The use of logistic regression models based upon speeding-related crash typologies offers a promising method by which all possible speeding-related crashes could be identified. Published by Elsevier Ltd.
Espinosa, Alejandro Martínez
2018-01-01
International evidence regarding the relationship between maternal employment and school-age children overweight and obesity shows divergent results. In Mexico, this relationship has not been confirmed by national data sets analysis. Consequently, the objective of this article was to evaluate the role of the mothers' participation in labor force related to excess body weight in Mexican school-age children (aged 5-11 years). A cross-sectional study was conducted on a sample of 17,418 individuals from the National Health and Nutrition Survey 2012, applying binomial logistic regression models. After controlling for individual, maternal and contextual features, the mothers' participation in labor force was associated with children body composition. However, when the household features (living arrangements, household ethnicity, size, food security and socioeconomic status) were incorporated, maternal employment was no longer statically significant. Household features are crucial factors for understanding the overweight and obesity prevalence levels in Mexican school-age children, despite the mother having a paid job. Copyright: © 2018 Permanyer.
ERIC Educational Resources Information Center
Lee, Young-Jin
2017-01-01
Purpose: The purpose of this paper is to develop a quantitative model of problem solving performance of students in the computer-based mathematics learning environment. Design/methodology/approach: Regularized logistic regression was used to create a quantitative model of problem solving performance of students that predicts whether students can…
Measuring the Impact of Inquiry-Based Learning on Outcomes and Student Satisfaction
ERIC Educational Resources Information Center
Zafra-Gómez, José Luis; Román-Martínez, Isabel; Gómez-Miranda, María Elena
2015-01-01
The aim of this study is to determine the impact of inquiry-based learning (IBL) on students' academic performance and to assess their satisfaction with the process. Linear and logistic regression analyses show that examination grades are positively related to attendance at classes and tutorials; moreover, there is a positive significant…
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-05-01
Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-01-01
Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616
NASA Astrophysics Data System (ADS)
Zeraatpisheh, Mojtaba; Ayoubi, Shamsollah; Jafari, Azam; Finke, Peter
2017-05-01
The efficiency of different digital and conventional soil mapping approaches to produce categorical maps of soil types is determined by cost, sample size, accuracy and the selected taxonomic level. The efficiency of digital and conventional soil mapping approaches was examined in the semi-arid region of Borujen, central Iran. This research aimed to (i) compare two digital soil mapping approaches including Multinomial logistic regression and random forest, with the conventional soil mapping approach at four soil taxonomic levels (order, suborder, great group and subgroup levels), (ii) validate the predicted soil maps by the same validation data set to determine the best method for producing the soil maps, and (iii) select the best soil taxonomic level by different approaches at three sample sizes (100, 80, and 60 point observations), in two scenarios with and without a geomorphology map as a spatial covariate. In most predicted maps, using both digital soil mapping approaches, the best results were obtained using the combination of terrain attributes and the geomorphology map, although differences between the scenarios with and without the geomorphology map were not significant. Employing the geomorphology map increased map purity and the Kappa index, and led to a decrease in the 'noisiness' of soil maps. Multinomial logistic regression had better performance at higher taxonomic levels (order and suborder levels); however, random forest showed better performance at lower taxonomic levels (great group and subgroup levels). Multinomial logistic regression was less sensitive than random forest to a decrease in the number of training observations. The conventional soil mapping method produced a map with larger minimum polygon size because of traditional cartographic criteria used to make the geological map 1:100,000 (on which the conventional soil mapping map was largely based). Likewise, conventional soil mapping map had also a larger average polygon size that resulted in a lower level of detail. Multinomial logistic regression at the order level (map purity of 0.80), random forest at the suborder (map purity of 0.72) and great group level (map purity of 0.60), and conventional soil mapping at the subgroup level (map purity of 0.48) produced the most accurate maps in the study area. The multinomial logistic regression method was identified as the most effective approach based on a combined index of map purity, map information content, and map production cost. The combined index also showed that smaller sample size led to a preference for the order level, while a larger sample size led to a preference for the great group level.
Raines, G.L.; Mihalasky, M.J.
2002-01-01
The U.S. Geological Survey (USGS) is proposing to conduct a global mineral-resource assessment using geologic maps, significant deposits, and exploration history as minimal data requirements. Using a geologic map and locations of significant pluton-related deposits, the pluton-related-deposit tract maps from the USGS national mineral-resource assessment have been reproduced with GIS-based analysis and modeling techniques. Agreement, kappa, and Jaccard's C correlation statistics between the expert USGS and calculated tract maps of 87%, 40%, and 28%, respectively, have been achieved using a combination of weights-of-evidence and weighted logistic regression methods. Between the experts' and calculated maps, the ranking of states measured by total permissive area correlates at 84%. The disagreement between the experts and calculated results can be explained primarily by tracts defined by geophysical evidence not considered in the calculations, generalization of tracts by the experts, differences in map scales, and the experts' inclusion of large tracts that are arguably not permissive. This analysis shows that tracts for regional mineral-resource assessment approximating those delineated by USGS experts can be calculated using weights of evidence and weighted logistic regression, a geologic map, and the location of significant deposits. Weights of evidence and weighted logistic regression applied to a global geologic map could provide quickly a useful reconnaissance definition of tracts for mineral assessment that is tied to the data and is reproducible. ?? 2002 International Association for Mathematical Geology.
Screening for ketosis using multiple logistic regression based on milk yield and composition.
Kayano, Mitsunori; Kataoka, Tomoko
2015-11-01
Multiple logistic regression was applied to milk yield and composition data for 632 records of healthy cows and 61 records of ketotic cows in Hokkaido, Japan. The purpose was to diagnose ketosis based on milk yield and composition, simultaneously. The cows were divided into two groups: (1) multiparous, including 314 healthy cows and 45 ketotic cows and (2) primiparous, including 318 healthy cows and 16 ketotic cows, since nutritional status, milk yield and composition are affected by parity. Multiple logistic regression was applied to these groups separately. For multiparous cows, milk yield (kg/day/cow) and protein-to-fat (P/F) ratio in milk were significant factors (P<0.05) for the diagnosis of ketosis. For primiparous cows, lactose content (%), solid not fat (SNF) content (%) and milk urea nitrogen (MUN) content (mg/dl) were significantly associated with ketosis (P<0.01). A diagnostic rule was constructed for each group of cows: (1) 9.978 × P/F ratio + 0.085 × milk yield <10 and (2) 2.327 × SNF - 2.703 × lactose + 0.225 × MUN <10. The sensitivity, specificity and the area under the curve (AUC) of the diagnostic rules were (1) 0.800, 0.729 and 0.811; (2) 0.813, 0.730 and 0.787, respectively. The P/F ratio, which is a widely used measure of ketosis, provided the sensitivity, specificity and AUC values of (1) 0.711, 0.726 and 0.781; and (2) 0.678, 0.767 and 0.738, respectively.
Ai Er Ken, Ai Bi Bai; Ma, Zhi-Hua; Xiong, Dai-Qin; Xu, Pei-Ru
2017-04-01
To investigate the clinical features of invasive candidiasis in children and the risk factors for Candida bloodstream infection. A retrospective study was performed on 134 children with invasive candidiasis and hospitalized in 5 tertiary hospitals in Urumqi, China, between January 2010 and December 2015. The Candida species distribution was investigated. The clinical data were compared between the patients with and without Candida bloodstream infection. The risk factors for Candida bloodstream infection were investigated using multivariate logistic regression analysis. A total of 134 Candida strains were isolated from 134 children with invasive candidiasis, and non-albicans Candida (NAC) accounted for 53.0%. The incidence of invasive candidiasis in the PICU and other pediatric wards were 41.8% and 48.5% respectively. Sixty-eight patients (50.7%) had Candida bloodstream infection, and 45 patients (33.6%) had Candida urinary tract infection. There were significant differences in age, rate of use of broad-spectrum antibiotics, and incidence rates of chronic renal insufficiency, heart failure, urinary catheterization, and NAC infection between the patients with and without Candida bloodstream infection (P<0.05). The multivariate logistic regression analysis showed that younger age (1-24 months) (OR=6.027) and NAC infection (OR=1.020) were the independent risk factors for Candida bloodstream infection. The incidence of invasive candidiasis is similar between the PICU and other pediatric wards. NAC is the most common species of invasive candidiasis. Candida bloodstream infection is the most common invasive infection. Younger age (1-24 months) and NAC infection are the risk factors for Candida bloodstream infection.
Ruochuan, Zang; Shugeng, Guo; Jie, He; Yousheng, Mao; Qi, Xue; Dali, Wang; Juwei, Mu; Jun, Zhao; Yonggang, Wang; Xiangyang, Liu; Fengwei, Tan; Gefei, Zhao; Qian, Zhang; Moyan, Zhang; Peng, Song
2015-04-01
To explore the relationship between the lymph node metastasis and clinicopathological features in patients with clinical stage T1a non-small cell lung cancer (NSCLC). Clinicopathological data of a total of 418 patients who underwent lobectomy and systematic lymph node dissection were retrospectively analyzed. Logistic regression was used to analyze the relationship between lymph node metastasis and clinicopathological features. Lymph node metastasis was observed in 25 patients. There were 122 patients who were diagnosed as ground glass opacity with no lymph node metastasis. 399 patients had subcarinal dissection, among them 7 patients were found to have lymph node metastasis. Univariate analysis showed that gender, smoking history, diameter of lymph node, ground glass opacity (GGO), differentiation of the tumor and tumor site were the factors affecting lymph node metastasis (all P < 0.05). Logistic regression analysis showed that diameter of lymph node, differentiation of the tumor and the site of lesion were independent risk factors for lymph node metastasis of NSCLC. Tumor in the left lung, poor differentiation, and diameter of lymph nodes ≥ 1 cm on the preoperative CT image are independent risk factors for lymph node metastasis of NSCLC, hence we should pay attention before surgery and systematic lymph node dissection should be done. For patients with poor differentiation and lymph nodes ≥ 1 cm, subcarinal lymph nodes dissection is recommended for the sake of higher possibility of lymph node metastasis. For patients with ground glass opacity ≤ 2 cm, the lymph node metastasis is extremely rare, therefore, selective lymph node dissection is reconmmended.
Asik, Mehmet; Altinbas, Kursat; Eroglu, Mustafa; Karaahmet, Elif; Erbag, Gokhan; Ertekin, Hulya; Sen, Hacer
2015-10-01
Women with polycystic ovary syndrome (PCOS) are reported to experience depressive episodes at a higher rate than healthy controls (HC). Affective temperament features are psychiatric markers that may help to predict and identify vulnerability to depression in women with PCOS. Our aim was to evaluate the affective temperaments of women with PCOS and to investigate the association with depression and anxiety levels and laboratory variables in comparison with HC. The study included 71 women with PCOS and 50 HC. Hormonal evaluations were performed for women with PCOS. Physical examination, clinical history, Hospital Anxiety and Depression Scale (HADS) and TEMPS-A were performed for all subjects. Differences between groups were evaluated using Student's t-tests and Mann-Whitney U tests. Correlations and logistic regression tests were performed. All temperament subtype scores, except hyperthymic, and HADS anxiety, depression, and total scores were significantly higher in patients with PCOS compared to HC. A statistically significant positive correlation was found between BMI and irritable temperament, and insulin and HADS depression scores in patients with PCOS. Additionally, hirsutism score and menstrual irregularity were correlated with HADS depression, anxiety and total scores in PCOS patients. In logistic regression analysis, depression was not affected by PCOS, hirsutism score or menstrual irregularity. However, HADS anxiety score was associated with hirsutism score. Our study is the first to evaluate the affective temperament features of women with PCOS. Consequently, establishing affective temperament properties for women with PCOS may help clinicians predict those patients with PCOS who are at risk for depressive and anxiety disorders. Copyright © 2015 Elsevier B.V. All rights reserved.
Fate of abstracts presented at the 2008 European Congress of Physical and Rehabilitation Medicine.
Allart, E; Beaucamp, F; Tiffreau, V; Thevenon, A
2015-08-01
The subsequent full-text publication of abstracts presented at a scientific congress reflects the latter's scientific quality. The aim of this paper was to evaluate the publication rate for abstracts presented at the 2008 European Congress of Physical and Rehabilitation Medicine (ECPRM), characterize the publications and identify factors that were predictive of publication. It is a bibliography search. We used the PubMed database to search for subsequent publication of abstracts. We screened the abstracts' characteristics for features that were predictive of publication among abstracts features, such the status of the authors, the topic and the type of work. We performed univariate analyses and a logistic regression analysis. Of 779 abstracts presented at ECPRM 2008, 169 (21.2%) were subsequently published. The mean time to publication was 12±15.7 months and the mean impact factor of the publishing journals was 2.05±2.1. In a univariate analysis, university status (P<10-6), geographic origin (P=10-3), oral presentation (P<10-6), and original research (P<10-6) (and particularly multicentre trials [P<0.01] and randomized controlled trials [P=10-3]) were predictive of publication. In a logistic regression analysis, oral presentation (odds ratio [OR]=0.37) and university status (OR=0.36) were significant, independent predictors of publication. ECPRM 2008 publication rate and impact factor were relatively low, when compared with most other national and international conferences in this field. University status, the type of abstract and oral presentation were predictive of subsequent publication.
Characteristics of older adult problem gamblers calling a gambling helpline.
Potenza, Marc N; Steinberg, Marvin A; Wu, Ran; Rounsaville, Bruce J; O'malley, Stephanie S
2006-06-01
Few investigations have characterized groups of older adults with gambling problems, and published reports are currently limited by small samples of older adult problem gamblers. Gambling helplines represent a widespread mechanism for assisting problem gamblers to move into treatment settings. Given data from older adult problem gamblers in treatment, we hypothesized that older as compared with younger adult problem gamblers calling a gambling helpline would be less likely to report gambling-related problems. Logistic regression analyses were performed on data obtained from January 1, 2000 to December 31, 2001, inclusive, from callers with gambling problems (N = 1,084) contacting the Connecticut Council on Problem Gambling Helpline. Of the 1,018 phone calls used in the logistic regression analyses, 168 (16.5%) were from older adults and 850 (83.5%) from younger adults. Age-related differences were observed in demographic features, types and patterns of gambling reported as problematic, gambling-related problems and psychiatric symptoms, substance use problems, patterns of indebtedness, and family histories of addictive disorders. Older as compared with younger adult problem gamblers were more likely to report having lower incomes, longer durations of gambling, fewer types of problematic gambling, and problems with casino slot machine gambling and less likely to report gambling-related anxiety, family problems, illegal behaviors and arrests, drug problems, indebtedness to bookies or acquaintances, family histories of drug abuse, and problems with casino table gambling. Older as compared with younger adult problem gamblers calling a gambling helpline differ on many clinically relevant features. The findings suggest the need for improved and unique prevention and treatment strategies for older adults with gambling problems.
NASA Astrophysics Data System (ADS)
Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.
2014-12-01
This study aims at comparing the performances of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams both stretching for few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian sea. This area was struck on the 1st October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly of debris flows and debris avalanches types involving the weathered layer of a low to high grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; besides, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-folds tests so to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performances, with a general congruence between BLR and BRT in the predictor importance. In particular, the research highlighted that BRT-models reached a higher prediction performance with respect to BLR-models, for RP based modelling, whilst for the SP-based models the difference in predictive skills between the two methods dropped drastically, converging to an analogous excellent performance. However, when looking at the precision of the probability estimates, BLR demonstrated to produce more robust models in terms of selected predictors and coefficients, as well as of dispersion of the estimated probabilities around the mean value for each mapped pixel. The difference in the behaviour could be interpreted as the result of overfitting effects, which heavily affect decision tree classification more than logistic regression techniques.
Mapping of the DLQI scores to EQ-5D utility values using ordinal logistic regression.
Ali, Faraz Mahmood; Kay, Richard; Finlay, Andrew Y; Piguet, Vincent; Kupfer, Joerg; Dalgard, Florence; Salek, M Sam
2017-11-01
The Dermatology Life Quality Index (DLQI) and the European Quality of Life-5 Dimension (EQ-5D) are separate measures that may be used to gather health-related quality of life (HRQoL) information from patients. The EQ-5D is a generic measure from which health utility estimates can be derived, whereas the DLQI is a specialty-specific measure to assess HRQoL. To reduce the burden of multiple measures being administered and to enable a more disease-specific calculation of health utility estimates, we explored an established mathematical technique known as ordinal logistic regression (OLR) to develop an appropriate model to map DLQI data to EQ-5D-based health utility estimates. Retrospective data from 4010 patients were randomly divided five times into two groups for the derivation and testing of the mapping model. Split-half cross-validation was utilized resulting in a total of ten ordinal logistic regression models for each of the five EQ-5D dimensions against age, sex, and all ten items of the DLQI. Using Monte Carlo simulation, predicted health utility estimates were derived and compared against those observed. This method was repeated for both OLR and a previously tested mapping methodology based on linear regression. The model was shown to be highly predictive and its repeated fitting demonstrated a stable model using OLR as well as linear regression. The mean differences between OLR-predicted health utility estimates and observed health utility estimates ranged from 0.0024 to 0.0239 across the ten modeling exercises, with an average overall difference of 0.0120 (a 1.6% underestimate, not of clinical importance). This modeling framework developed in this study will enable researchers to calculate EQ-5D health utility estimates from a specialty-specific study population, reducing patient and economic burden.
Wang, X; Xu, Y H; Du, Z Y; Qian, Y J; Xu, Z H; Chen, R; Shi, M H
2018-02-23
Objective: This study aims to analyze the relationship among the clinical features, radiologic characteristics and pathological diagnosis in patients with solitary pulmonary nodules, and establish a prediction model for the probability of malignancy. Methods: Clinical data of 372 patients with solitary pulmonary nodules who underwent surgical resection with definite postoperative pathological diagnosis were retrospectively analyzed. In these cases, we collected clinical and radiologic features including gender, age, smoking history, history of tumor, family history of cancer, the location of lesion, ground-glass opacity, maximum diameter, calcification, vessel convergence sign, vacuole sign, pleural indentation, speculation and lobulation. The cases were divided to modeling group (268 cases) and validation group (104 cases). A new prediction model was established by logistic regression analying the data from modeling group. Then the data of validation group was planned to validate the efficiency of the new model, and was compared with three classical models(Mayo model, VA model and LiYun model). With the calculated probability values for each model from validation group, SPSS 22.0 was used to draw the receiver operating characteristic curve, to assess the predictive value of this new model. Results: 112 benign SPNs and 156 malignant SPNs were included in modeling group. Multivariable logistic regression analysis showed that gender, age, history of tumor, ground -glass opacity, maximum diameter, and speculation were independent predictors of malignancy in patients with SPN( P <0.05). We calculated a prediction model for the probability of malignancy as follow: p =e(x)/(1+ e(x)), x=-4.8029-0.743×gender+ 0.057×age+ 1.306×history of tumor+ 1.305×ground-glass opacity+ 0.051×maximum diameter+ 1.043×speculation. When the data of validation group was added to the four-mathematical prediction model, The area under the curve of our mathematical prediction model was 0.742, which is greater than other models (Mayo 0.696, VA 0.634, LiYun 0.681), while the differences between any two of the four models were not significant ( P >0.05). Conclusions: Age of patient, gender, history of tumor, ground-glass opacity, maximum diameter and speculation are independent predictors of malignancy in patients with solitary pulmonary nodule. This logistic regression prediction mathematic model is not inferior to those classical models in estimating the prognosis of SPNs.
A comparative study on entrepreneurial attitudes modeled with logistic regression and Bayes nets.
López Puga, Jorge; García García, Juan
2012-11-01
Entrepreneurship research is receiving increasing attention in our context, as entrepreneurs are key social agents involved in economic development. We compare the success of the dichotomic logistic regression model and the Bayes simple classifier to predict entrepreneurship, after manipulating the percentage of missing data and the level of categorization in predictors. A sample of undergraduate university students (N = 1230) completed five scales (motivation, attitude towards business creation, obstacles, deficiencies, and training needs) and we found that each of them predicted different aspects of the tendency to business creation. Additionally, our results show that the receiver operating characteristic (ROC) curve is affected by the rate of missing data in both techniques, but logistic regression seems to be more vulnerable when faced with missing data, whereas Bayes nets underperform slightly when categorization has been manipulated. Our study sheds light on the potential entrepreneur profile and we propose to use Bayesian networks as an additional alternative to overcome the weaknesses of logistic regression when missing data are present in applied research.
Campos-Filho, N; Franco, E L
1989-02-01
A frequent procedure in matched case-control studies is to report results from the multivariate unmatched analyses if they do not differ substantially from the ones obtained after conditioning on the matching variables. Although conceptually simple, this rule requires that an extensive series of logistic regression models be evaluated by both the conditional and unconditional maximum likelihood methods. Most computer programs for logistic regression employ only one maximum likelihood method, which requires that the analyses be performed in separate steps. This paper describes a Pascal microcomputer (IBM PC) program that performs multiple logistic regression by both maximum likelihood estimation methods, which obviates the need for switching between programs to obtain relative risk estimates from both matched and unmatched analyses. The program calculates most standard statistics and allows factoring of categorical or continuous variables by two distinct methods of contrast. A built-in, descriptive statistics option allows the user to inspect the distribution of cases and controls across categories of any given variable.
Comparison of cranial sex determination by discriminant analysis and logistic regression.
Amores-Ampuero, Anabel; Alemán, Inmaculada
2016-04-05
Various methods have been proposed for estimating dimorphism. The objective of this study was to compare sex determination results from cranial measurements using discriminant analysis or logistic regression. The study sample comprised 130 individuals (70 males) of known sex, age, and cause of death from San José cemetery in Granada (Spain). Measurements of 19 neurocranial dimensions and 11 splanchnocranial dimensions were subjected to discriminant analysis and logistic regression, and the percentages of correct classification were compared between the sex functions obtained with each method. The discriminant capacity of the selected variables was evaluated with a cross-validation procedure. The percentage accuracy with discriminant analysis was 78.2% for the neurocranium (82.4% in females and 74.6% in males) and 73.7% for the splanchnocranium (79.6% in females and 68.8% in males). These percentages were higher with logistic regression analysis: 85.7% for the neurocranium (in both sexes) and 94.1% for the splanchnocranium (100% in females and 91.7% in males).
Kim, Hae Young; Kim, Young Hoon; Yun, Gabin; Chang, Won; Lee, Yoon Jin; Kim, Bohyoung
2018-01-01
To retrospectively investigate whether texture features obtained from preoperative CT images of advanced gastric cancer (AGC) patients could be used for the prediction of occult peritoneal carcinomatosis (PC) detected during operation. 51 AGC patients with occult PC detected during operation from January 2009 to December 2012 were included as occult PC group. For the control group, other 51 AGC patients without evidence of distant metastasis including PC, and whose clinical T and N stage could be matched to those of the patients of the occult PC group, were selected from the period of January 2011 to July 2012. Each group was divided into test (n = 41) and validation cohort (n = 10). Demographic and clinical data of these patients were acquired from the hospital database. Texture features including average, standard deviation, kurtosis, skewness, entropy, correlation, and contrast were obtained from manually drawn region of interest (ROI) over the omentum on the axial CT image showing the omentum at its largest cross sectional area. After using Fisher's exact and Wilcoxon signed-rank test for comparison of the clinical and texture features between the two groups of the test cohort, conditional logistic regression analysis was performed to determine significant independent predictor for occult PC. Using the optimal cut-off value from receiver operating characteristic (ROC) analysis for the significant variables, diagnostic sensitivity and specificity were determined in the test cohort. The cut-off value of the significant variables obtained from the test cohort was then applied to the validation cohort. Bonferroni correction was used to adjust P value for multiple comparisons. Between the two groups, there was no significant difference in the clinical features. Regarding the texture features, the occult PC group showed significantly higher average, entropy, standard deviation, and significantly lower correlation (P value < 0.004 for all). Conditional logistic regression analysis demonstrated that entropy was significant independent predictor for occult PC. When the cut-off value of entropy (> 7.141) was applied to the validation cohort, sensitivity and specificity for the prediction of occult PC were 80% and 90%, respectively. For AGC patients whose PC cannot be detected with routine imaging such as CT, texture analysis may be a useful adjunct for the prediction of occult PC.
ERIC Educational Resources Information Center
Lee, Young-Jin
2015-01-01
This study investigates whether information saved in the log files of a computer-based tutor can be used to predict the problem solving performance of students. The log files of a computer-based physics tutoring environment called Andes Physics Tutor was analyzed to build a logistic regression model that predicted success and failure of students'…
Lin, Chao-Cheng; Bai, Ya-Mei; Chen, Jen-Yeu; Hwang, Tzung-Jeng; Chen, Tzu-Ting; Chiu, Hung-Wen; Li, Yu-Chuan
2010-03-01
Metabolic syndrome (MetS) is an important side effect of second-generation antipsychotics (SGAs). However, many SGA-treated patients with MetS remain undetected. In this study, we trained and validated artificial neural network (ANN) and multiple logistic regression models without biochemical parameters to rapidly identify MetS in patients with SGA treatment. A total of 383 patients with a diagnosis of schizophrenia or schizoaffective disorder (DSM-IV criteria) with SGA treatment for more than 6 months were investigated to determine whether they met the MetS criteria according to the International Diabetes Federation. The data for these patients were collected between March 2005 and September 2005. The input variables of ANN and logistic regression were limited to demographic and anthropometric data only. All models were trained by randomly selecting two-thirds of the patient data and were internally validated with the remaining one-third of the data. The models were then externally validated with data from 69 patients from another hospital, collected between March 2008 and June 2008. The area under the receiver operating characteristic curve (AUC) was used to measure the performance of all models. Both the final ANN and logistic regression models had high accuracy (88.3% vs 83.6%), sensitivity (93.1% vs 86.2%), and specificity (86.9% vs 83.8%) to identify MetS in the internal validation set. The mean +/- SD AUC was high for both the ANN and logistic regression models (0.934 +/- 0.033 vs 0.922 +/- 0.035, P = .63). During external validation, high AUC was still obtained for both models. Waist circumference and diastolic blood pressure were the common variables that were left in the final ANN and logistic regression models. Our study developed accurate ANN and logistic regression models to detect MetS in patients with SGA treatment. The models are likely to provide a noninvasive tool for large-scale screening of MetS in this group of patients. (c) 2010 Physicians Postgraduate Press, Inc.
Neuropsychological tests for predicting cognitive decline in older adults
Baerresen, Kimberly M; Miller, Karen J; Hanson, Eric R; Miller, Justin S; Dye, Richelin V; Hartman, Richard E; Vermeersch, David; Small, Gary W
2015-01-01
Summary Aim To determine neuropsychological tests likely to predict cognitive decline. Methods A sample of nonconverters (n = 106) was compared with those who declined in cognitive status (n = 24). Significant univariate logistic regression prediction models were used to create multivariate logistic regression models to predict decline based on initial neuropsychological testing. Results Rey–Osterrieth Complex Figure Test (RCFT) Retention predicted conversion to mild cognitive impairment (MCI) while baseline Buschke Delay predicted conversion to Alzheimer’s disease (AD). Due to group sample size differences, additional analyses were conducted using a subsample of demographically matched nonconverters. Analyses indicated RCFT Retention predicted conversion to MCI and AD, and Buschke Delay predicted conversion to AD. Conclusion Results suggest RCFT Retention and Buschke Delay may be useful in predicting cognitive decline. PMID:26107318
Use of logistic regression for modelling risk factors: with application to non-melanoma skin cancer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vitaliano, P.P.
Logistic regression was used to estimate the relative risk of basal and squamous skin cancer for such factors as cumulative lifetime solar exposure, age, complexion, and tannability. In previous reports, a subject's exposure was estimated indirectly, by latitude, or by the number of sun days in a subject's habitat. In contrast, these results are based on interview data gathered for each subject. A relatively new technique was used to estimate relative risk by controlling for confounding and testing for effect modification. A linear effect for the relative risk of cancer versus exposure was found. Tannability was shown to be amore » more important risk factor than complexion. This result is consistent with the work of Silverstone and Searle.« less
NASA Astrophysics Data System (ADS)
Li, Hui; Fan, Ming; Zhang, Peng; Li, Yuanzhe; Cheng, Hu; Zhang, Juan; Shao, Guoliang; Li, Lihua
2018-03-01
Breast cancer, with its high heterogeneity, is the most common malignancies in women. In addition to the entire tumor itself, tumor microenvironment could also play a fundamental role on the occurrence and development of tumors. The aim of this study is to investigate the role of heterogeneity within a tumor and the surrounding stromal tissue in predicting the Ki-67 proliferation status of oestrogen receptor (ER)-positive breast cancer patients. To this end, we collected 62 patients imaged with preoperative dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) for analysis. The tumor and the peritumoral stromal tissue were segmented into 8 shells with 5 mm width outside of tumor. The mean enhancement rate in the stromal shells showed a decreasing order if their distances to the tumor increase. Statistical and texture features were extracted from the tumor and the surrounding stromal bands, and multivariate logistic regression classifiers were trained and tested based on these features. An area under the receiver operating characteristic curve (AUC) were calculated to evaluate performance of the classifiers. Furthermore, the statistical model using features extracted from boundary shell next to the tumor produced AUC of 0.796+/-0.076, which is better than that using features from the other subregions. Furthermore, the prediction model using 7 features from the entire tumor produced an AUC value of 0.855+/-0.065. The classifier based on 9 selected features extracted from peritumoral stromal region showed an AUC value of 0.870+/-0.050. Finally, after fusion of the predictive model obtained from entire tumor and the peritumoral stromal regions, the classifier performance was significantly improved with AUC of 0.920. The results indicated that heterogeneity in tumor boundary and peritumoral stromal region could be valuable in predicting the indicator associated with prognosis.
NASA Technical Reports Server (NTRS)
Duda, David P.; Minnis, Patrick
2009-01-01
Previous studies have shown that probabilistic forecasting may be a useful method for predicting persistent contrail formation. A probabilistic forecast to accurately predict contrail formation over the contiguous United States (CONUS) is created by using meteorological data based on hourly meteorological analyses from the Advanced Regional Prediction System (ARPS) and from the Rapid Update Cycle (RUC) as well as GOES water vapor channel measurements, combined with surface and satellite observations of contrails. Two groups of logistic models were created. The first group of models (SURFACE models) is based on surface-based contrail observations supplemented with satellite observations of contrail occurrence. The second group of models (OUTBREAK models) is derived from a selected subgroup of satellite-based observations of widespread persistent contrails. The mean accuracies for both the SURFACE and OUTBREAK models typically exceeded 75 percent when based on the RUC or ARPS analysis data, but decreased when the logistic models were derived from ARPS forecast data.
Knol, Mirjam J; van der Tweel, Ingeborg; Grobbee, Diederick E; Numans, Mattijs E; Geerlings, Mirjam I
2007-10-01
To determine the presence of interaction in epidemiologic research, typically a product term is added to the regression model. In linear regression, the regression coefficient of the product term reflects interaction as departure from additivity. However, in logistic regression it refers to interaction as departure from multiplicativity. Rothman has argued that interaction estimated as departure from additivity better reflects biologic interaction. So far, literature on estimating interaction on an additive scale using logistic regression only focused on dichotomous determinants. The objective of the present study was to provide the methods to estimate interaction between continuous determinants and to illustrate these methods with a clinical example. and results From the existing literature we derived the formulas to quantify interaction as departure from additivity between one continuous and one dichotomous determinant and between two continuous determinants using logistic regression. Bootstrapping was used to calculate the corresponding confidence intervals. To illustrate the theory with an empirical example, data from the Utrecht Health Project were used, with age and body mass index as risk factors for elevated diastolic blood pressure. The methods and formulas presented in this article are intended to assist epidemiologists to calculate interaction on an additive scale between two variables on a certain outcome. The proposed methods are included in a spreadsheet which is freely available at: http://www.juliuscenter.nl/additive-interaction.xls.
ERIC Educational Resources Information Center
Osborne, Jason W.
2012-01-01
Logistic regression is slowly gaining acceptance in the social sciences, and fills an important niche in the researcher's toolkit: being able to predict important outcomes that are not continuous in nature. While OLS regression is a valuable tool, it cannot routinely be used to predict outcomes that are binary or categorical in nature. These…
Assessing Lake Trophic Status: A Proportional Odds Logistic Regression Model
Lake trophic state classifications are good predictors of ecosystem condition and are indicative of both ecosystem services (e.g., recreation and aesthetics), and disservices (e.g., harmful algal blooms). Methods for classifying trophic state are based off the foundational work o...
Transaction Costs and Cost Breaches in Major Defense Acquisition Programs
2014-02-04
bases, schools, missile storage facilities, maintenance facilities, medical/ dental clinics, libraries, and military family housing (DAU, 2011b...AIM-9_Sidewinder Allison, P. D. (2001). Logistic regression using the SAS system. Cary , NC: SAS Institute. Angelis, D., Dillard, J., Franck, C
Bielak, Lawrence F; Whaley, Dana H; Sheedy, Patrick F; Peyser, Patricia A
2010-09-01
The etiology of breast arterial calcification (BAC) is not well understood. We examined reproductive history and cardiovascular disease (CVD) risk factor associations with the presence of detectable BAC in asymptomatic postmenopausal women. Reproductive history and CVD risk factors were obtained in 240 asymptomatic postmenopausal women from a community-based research study who had a screening mammogram within 2 years of their participation in the study. The mammograms were reviewed for the presence of detectable BAC. Age-adjusted logistic regression models were fit to assess the association between each risk factor and the presence of BAC. Multiple variable logistic regression models were used to identify the most parsimonious model for the presence of BAC. The prevalence of BAC increased with increased age (p < 0.0001). The most parsimonious logistic regression model for BAC presence included age at time of examination, increased parity (p = 0.01), earlier age at first birth (p = 0.002), weight, and an age-by-weight interaction term (p = 0.004). Older women with a smaller body size had a higher probability of having BAC than women of the same age with a larger body size. The presence or absence of BAC at mammography may provide an assessment of a postmenopausal woman's lifetime estrogen exposure and indicate women who could be at risk for hormonally related conditions.
Casagrande, Gina; LeJeune, Jeffery; Belury, Martha A; Medeiros, Lydia C
2011-04-01
The Theory of Planned Behavior was used to determine if dietitians personal characteristics and beliefs about fresh vegetable food safety predict whether they currently teach, intend to teach, or neither currently teach nor intend to teach food safety information to their clients. Dietitians who participated in direct client education responded to this web-based survey (n=327). The survey evaluated three independent belief variables: Subjective Norm, Attitudes, and Perceived Behavioral Control. Spearman rho correlations were completed to determine variables that correlated best with current teaching behavior. Multinomial logistical regression was conducted to determine if the belief variables significantly predicted dietitians teaching behavior. Binary logistic regression was used to determine which independent variable was the better predictor of whether dietitians currently taught. Controlling for age, income, education, and gender, the multinomial logistical regression was significant. Perceived behavioral control was the best predictor of whether a dietitian currently taught fresh vegetable food safety. Factors affecting whether dietitians currently taught were confidence in fresh vegetable food safety knowledge, being socially influenced, and a positive attitude toward the teaching behavior. These results validate the importance of teaching food safety effectively and may be used to create more informed food safety curriculum for dietitians. Copyright © 2011 Elsevier Ltd. All rights reserved.
Immortal time bias in observational studies of time-to-event outcomes.
Jones, Mark; Fowler, Robert
2016-12-01
The purpose of the study is to show, through simulation and example, the magnitude and direction of immortal time bias when an inappropriate analysis is used. We compare 4 methods of analysis for observational studies of time-to-event outcomes: logistic regression, standard Cox model, landmark analysis, and time-dependent Cox model using an example data set of patients critically ill with influenza and a simulation study. For the example data set, logistic regression, standard Cox model, and landmark analysis all showed some evidence that treatment with oseltamivir provides protection from mortality in patients critically ill with influenza. However, when the time-dependent nature of treatment exposure is taken account of using a time-dependent Cox model, there is no longer evidence of a protective effect of treatment. The simulation study showed that, under various scenarios, the time-dependent Cox model consistently provides unbiased treatment effect estimates, whereas standard Cox model leads to bias in favor of treatment. Logistic regression and landmark analysis may also lead to bias. To minimize the risk of immortal time bias in observational studies of survival outcomes, we strongly suggest time-dependent exposures be included as time-dependent variables in hazard-based analyses. Copyright © 2016 Elsevier Inc. All rights reserved.
Development and validation of a mortality risk model for pediatric sepsis.
Chen, Mengshi; Lu, Xiulan; Hu, Li; Liu, Pingping; Zhao, Wenjiao; Yan, Haipeng; Tang, Liang; Zhu, Yimin; Xiao, Zhenghui; Chen, Lizhang; Tan, Hongzhuan
2017-05-01
Pediatric sepsis is a burdensome public health problem. Assessing the mortality risk of pediatric sepsis patients, offering effective treatment guidance, and improving prognosis to reduce mortality rates, are crucial.We extracted data derived from electronic medical records of pediatric sepsis patients that were collected during the first 24 hours after admission to the pediatric intensive care unit (PICU) of the Hunan Children's hospital from January 2012 to June 2014. A total of 788 children were randomly divided into a training (592, 75%) and validation group (196, 25%). The risk factors for mortality among these patients were identified by conducting multivariate logistic regression in the training group. Based on the established logistic regression equation, the logit probabilities for all patients (in both groups) were calculated to verify the model's internal and external validities.According to the training group, 6 variables (brain natriuretic peptide, albumin, total bilirubin, D-dimer, lactate levels, and mechanical ventilation in 24 hours) were included in the final logistic regression model. The areas under the curves of the model were 0.854 (0.826, 0.881) and 0.844 (0.816, 0.873) in the training and validation groups, respectively.The Mortality Risk Model for Pediatric Sepsis we established in this study showed acceptable accuracy to predict the mortality risk in pediatric sepsis patients.
Development and validation of a mortality risk model for pediatric sepsis
Chen, Mengshi; Lu, Xiulan; Hu, Li; Liu, Pingping; Zhao, Wenjiao; Yan, Haipeng; Tang, Liang; Zhu, Yimin; Xiao, Zhenghui; Chen, Lizhang; Tan, Hongzhuan
2017-01-01
Abstract Pediatric sepsis is a burdensome public health problem. Assessing the mortality risk of pediatric sepsis patients, offering effective treatment guidance, and improving prognosis to reduce mortality rates, are crucial. We extracted data derived from electronic medical records of pediatric sepsis patients that were collected during the first 24 hours after admission to the pediatric intensive care unit (PICU) of the Hunan Children's hospital from January 2012 to June 2014. A total of 788 children were randomly divided into a training (592, 75%) and validation group (196, 25%). The risk factors for mortality among these patients were identified by conducting multivariate logistic regression in the training group. Based on the established logistic regression equation, the logit probabilities for all patients (in both groups) were calculated to verify the model's internal and external validities. According to the training group, 6 variables (brain natriuretic peptide, albumin, total bilirubin, D-dimer, lactate levels, and mechanical ventilation in 24 hours) were included in the final logistic regression model. The areas under the curves of the model were 0.854 (0.826, 0.881) and 0.844 (0.816, 0.873) in the training and validation groups, respectively. The Mortality Risk Model for Pediatric Sepsis we established in this study showed acceptable accuracy to predict the mortality risk in pediatric sepsis patients. PMID:28514310
Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification.
Fan, Jianqing; Feng, Yang; Jiang, Jiancheng; Tong, Xin
We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.
Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification
Feng, Yang; Jiang, Jiancheng; Tong, Xin
2015-01-01
We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing. PMID:27185970
Kabeshova, A; Annweiler, C; Fantino, B; Philip, T; Gromov, V A; Launay, C P; Beauchet, O
2014-06-01
Regression tree (RT) analyses are particularly adapted to explore the risk of recurrent falling according to various combinations of fall risk factors compared to logistic regression models. The aims of this study were (1) to determine which combinations of fall risk factors were associated with the occurrence of recurrent falls in older community-dwellers, and (2) to compare the efficacy of RT and multiple logistic regression model for the identification of recurrent falls. A total of 1,760 community-dwelling volunteers (mean age ± standard deviation, 71.0 ± 5.1 years; 49.4 % female) were recruited prospectively in this cross-sectional study. Age, gender, polypharmacy, use of psychoactive drugs, fear of falling (FOF), cognitive disorders and sad mood were recorded. In addition, the history of falls within the past year was recorded using a standardized questionnaire. Among 1,760 participants, 19.7 % (n = 346) were recurrent fallers. The RT identified 14 nodes groups and 8 end nodes with FOF as the first major split. Among participants with FOF, those who had sad mood and polypharmacy formed the end node with the greatest OR for recurrent falls (OR = 6.06 with p < 0.001). Among participants without FOF, those who were male and not sad had the lowest OR for recurrent falls (OR = 0.25 with p < 0.001). The RT correctly classified 1,356 from 1,414 non-recurrent fallers (specificity = 95.6 %), and 65 from 346 recurrent fallers (sensitivity = 18.8 %). The overall classification accuracy was 81.0 %. The multiple logistic regression correctly classified 1,372 from 1,414 non-recurrent fallers (specificity = 97.0 %), and 61 from 346 recurrent fallers (sensitivity = 17.6 %). The overall classification accuracy was 81.4 %. Our results show that RT may identify specific combinations of risk factors for recurrent falls, the combination most associated with recurrent falls involving FOF, sad mood and polypharmacy. The FOF emerged as the risk factor strongly associated with recurrent falls. In addition, RT and multiple logistic regression were not sensitive enough to identify the majority of recurrent fallers but appeared efficient in detecting individuals not at risk of recurrent falls.
A Powerful Test for Comparing Multiple Regression Functions.
Maity, Arnab
2012-09-01
In this article, we address the important problem of comparison of two or more population regression functions. Recently, Pardo-Fernández, Van Keilegom and González-Manteiga (2007) developed test statistics for simple nonparametric regression models: Y(ij) = θ(j)(Z(ij)) + σ(j)(Z(ij))∊(ij), based on empirical distributions of the errors in each population j = 1, … , J. In this paper, we propose a test for equality of the θ(j)(·) based on the concept of generalized likelihood ratio type statistics. We also generalize our test for other nonparametric regression setups, e.g, nonparametric logistic regression, where the loglikelihood for population j is any general smooth function [Formula: see text]. We describe a resampling procedure to obtain the critical values of the test. In addition, we present a simulation study to evaluate the performance of the proposed test and compare our results to those in Pardo-Fernández et al. (2007).
Fong, Youyi; Yu, Xuesong
2016-01-01
Many modern serial dilution assays are based on fluorescence intensity (FI) readouts. We study optimal transformation model choice for fitting five parameter logistic curves (5PL) to FI-based serial dilution assay data. We first develop a generalized least squares-pseudolikelihood type algorithm for fitting heteroscedastic logistic models. Next we show that the 5PL and log 5PL functions can approximate each other well. We then compare four 5PL models with different choices of log transformation and variance modeling through a Monte Carlo study and real data. Our findings are that the optimal choice depends on the intended use of the fitted curves. PMID:27642502
Smit, Michael A; Michelow, Ian C; Glavis-Bloom, Justin; Wolfman, Vanessa; Levine, Adam C
2017-02-01
The clinical and virologic characteristics of Ebola virus disease (EVD) in children have not been thoroughly documented. Consecutive children aged <18 years with real-time polymerase chain reaction (RT-PCR)-confirmed EVD were enrolled retrospectively in 5 Ebola treatment units in Liberia and Sierra Leone in 2014/2015. Data collection and medical management were based on standardized International Medical Corps protocols. We performed descriptive statistics, multivariate logistic regression, and Kaplan-Meier survival analyses. Of 122 children enrolled, the median age was 7 years and one-third were aged <5 years. The female-to-male ratio was 1.3. The most common clinical features at triage and during hospitalization were fever, weakness, anorexia, and diarrhea, although 21% of patients were initially afebrile and 6 patients remained afebrile. Bleeding was rare at presentation (5%) and manifested subsequently in fewer than 50%. The overall case fatality rate was 57%. Factors associated with death in bivariate analyses were age <5 years, bleeding at any time during hospitalization, and high viral load. After adjustment with logistic regression modeling, the odds of death were 14.8-fold higher if patients were aged <5 years, 5-fold higher if the patient had any evidence of bleeding, and 5.2-fold higher if EVD RT-PCR cycle threshold value was ≤20. Plasmodium parasitemia had no impact on EVD outcomes. Age <5 years, bleeding, and high viral loads were poor prognostic indicators of children with EVD. Research to understand mechanisms of these risk factors and the impact of dehydration and electrolyte imbalance will improve health outcomes. © The Author 2016. Published by Oxford University Press for the Infectious Diseases Society of America.
Malhotra, Prashant; Luka, Arthur; McWilliams, Carla S; Poeth, Kaitlin G; Schwartz, Rebecca; Elfekey, Mohammed; Balwan, Sandy
2016-08-01
Respiratory viral illnesses (RVI) are reliably diagnosed by respiratory viral panel using polymerase chain reaction (RVP-PCR); however, owing to the scant data, clinical presentation alone is unreliable in establishing viral etiology. The primary objective of this study was to characterize signs and symptoms of RVI among inpatients in a major tertiary care hospital. Between 2013 and 2015, adult inpatients with RVI undergoing RVP-PCR were prospectively enrolled in our study. Clinical data were collected by interviews and electronic medical record reviews. Data analysis was performed using χ(2) testing, analysis of variance for continuous variables, and logistic regression modeling. Of 421 patients analyzed, 175 (41.7%) had a positive RVP-PCR. Patients were evenly matched at baseline except for renal disease. Multivariate logistic regression modeling demonstrated the following positive correlations: positive RVP-PCR with renal disease (odds ratio [OR] 2.08), cough (OR 2.28), and wheezing (OR 1.8); influenza with cough (OR 5.04), and renal disease (OR 2.17); metapneumovirus with age older than 65 (OR 3.24); respiratory syncytial viruses with wheezing (OR 3.42) and immunosuppression (OR 3.11); and parainfluenza with smoking (OR 3.16). Negative correlations included influenza with anosmia (OR 0.41); rhinovirus/enterovirus with feeling confined to bed (OR 0.3); metapneumovirus with smoking (OR 0.29); and parainfluenza with male sex (OR 0.22). In this descriptive study, we noted specific viral associations with clinical signs and symptoms among 421 inpatients with RVIs. With increasing RVP-PCR use, studies similar to ours may be able to better define the clinical presentation of RVIs and lead to evidence-based, clinical presentation-guided diagnostic and management algorithms.
Chen, Wei; Li, Hui; Hou, Enke; Wang, Shengquan; Wang, Guirong; Panahi, Mahdi; Li, Tao; Peng, Tao; Guo, Chen; Niu, Chao; Xiao, Lele; Wang, Jiale; Xie, Xiaoshen; Ahmad, Baharin Bin
2018-09-01
The aim of the current study was to produce groundwater spring potential maps using novel ensemble weights-of-evidence (WoE) with logistic regression (LR) and functional tree (FT) models. First, a total of 66 springs were identified by field surveys, out of which 70% of the spring locations were used for training the models and 30% of the spring locations were employed for the validation process. Second, a total of 14 affecting factors including aspect, altitude, slope, plan curvature, profile curvature, stream power index (SPI), topographic wetness index (TWI), sediment transport index (STI), lithology, normalized difference vegetation index (NDVI), land use, soil, distance to roads, and distance to streams was used to analyze the spatial relationship between these affecting factors and spring occurrences. Multicollinearity analysis and feature selection of the correlation attribute evaluation (CAE) method were employed to optimize the affecting factors. Subsequently, the novel ensembles of the WoE, LR, and FT models were constructed using the training dataset. Finally, the receiver operating characteristic (ROC) curves, standard error, confidence interval (CI) at 95%, and significance level P were employed to validate and compare the performance of three models. Overall, all three models performed well for groundwater spring potential evaluation. The prediction capability of the FT model, with the highest AUC values, the smallest standard errors, the narrowest CIs, and the smallest P values for the training and validation datasets, is better compared to those of other models. The groundwater spring potential maps can be adopted for the management of water resources and land use by planners and engineers. Copyright © 2018 Elsevier B.V. All rights reserved.
Yu, Ya-Hui; Xia, Wei-Xiong; Shi, Jun-Li; Ma, Wen-Juan; Li, Yong; Ye, Yan-Fang; Liang, Hu; Ke, Liang-Ru; Lv, Xing; Yang, Jing; Xiang, Yan-Qun; Guo, Xiang
2016-06-29
For patients with nasopharyngeal carcinoma (NPC) who undergo re-irradiation with intensity-modulated radiotherapy (IMRT), lethal nasopharyngeal necrosis (LNN) is a severe late adverse event. The purpose of this study was to identify risk factors for LNN and develop a model to predict LNN after radical re-irradiation with IMRT in patients with recurrent NPC. Patients who underwent radical re-irradiation with IMRT for locally recurrent NPC between March 2001 and December 2011 and who had no evidence of distant metastasis were included in this study. Clinical characteristics, including recurrent carcinoma conditions and dosimetric features, were evaluated as candidate risk factors for LNN. Logistic regression analysis was used to identify independent risk factors and construct the predictive scoring model. Among 228 patients enrolled in this study, 204 were at risk of developing LNN based on risk analysis. Of the 204 patients treated, 31 (15.2%) developed LNN. Logistic regression analysis showed that female sex (P = 0.008), necrosis before re-irradiation (P = 0.008), accumulated total prescription dose to the gross tumor volume (GTV) ≥145.5 Gy (P = 0.043), and recurrent tumor volume ≥25.38 cm(3) (P = 0.009) were independent risk factors for LNN. A model to predict LNN was then constructed that included these four independent risk factors. A model that includes sex, necrosis before re-irradiation, accumulated total prescription dose to GTV, and recurrent tumor volume can effectively predict the risk of developing LNN in NPC patients who undergo radical re-irradiation with IMRT.
Response Assessment and Prediction in Esophageal Cancer Patients via F-18 FDG PET/CT Scans
NASA Astrophysics Data System (ADS)
Higgins, Kyle J.
Purpose: The purpose of this study is to utilize F-18 FDG PET/CT scans to determine an indicator for the response of esophageal cancer patients during radiation therapy. There is a need for such an indicator since local failures are quite common in esophageal cancer patients despite modern treatment techniques. If an indicator is found, a patient's treatment strategy may be altered to possibly improve the outcome. This is investigated with various standard uptake volume (SUV) metrics along with image texture features. The metrics and features showing the most promise and indicating response are used in logistic regression analysis to find an equation for the prediction of response. Materials and Methods: 28 patients underwent F-18 FDG PET/CT scans prior to the start of radiation therapy (RT). A second PET/CT scan was administered following the delivery of ~32 Gray (Gy) of dose. A physician contoured gross tumor volume (GTV) was used to delineate a PET based GTV (GTV-pre-PET) based on a threshold of >40% and >20% of the maximum SUV value in the GTV. Deformable registration was used in VelocityAI software to register the pre-treatment and intra-treatment CT scans so that the GTV-pre-PET contours could be transferred from the pre to intra scans (GTV-intra-PET). The fractional decrease in the maximum, mean, volume to the highest intensity 10%-90%, and combination SUV metrics of the significant previous SUV metrics were compared to post-treatment pathologic response for an indication of response. Next for the >40% threshold, texture features based on a neighborhood gray-tone dimension matrix (NGTDM) were analyzed. The fractional decrease in coarseness, contrast, busyness, complexity, and texture strength were compared to the pathologic response of the patients. From these previous two types of analysis, SUV and texture features, the two most significant results were used in logistic regression analysis to find an equation to predict the probability of a non-responder. These probability values were then used to compare against the pathological response to test for indication of response. Results: 20 of the 28 patients underwent post treatment surgery and their pathologic response was determined. 9 of the patients were classified as being responders (treatment effect grade ≤ 1) while 11 of the patients were classified as being non-responders (treatment effect grade > 1). The fractional difference in the different SUV metrics has shown that the most commonly used maximum SUV and mean SUV were not significant in determining response to the treatment. Other SUV metrics however did show promise as being indicators. For the >40% threshold SUV to the highest 10%, 20%, and 30% (SUV10%, SUV20%, SUV30%) were found to significantly distinguish between responders and non-responders (p=0.004) and had an area under the Receiver Operating Characteristic curve (AUC) of 0.7778. Combining these significant metrics (SUV10% with SUV20% and SUV 20% with SUV30%) also was able to distinguish response (p=0.033, AUC=0.7879). Cross validation of these results shown that these metrics could be used to find the response on previously unseen data. The three individual SUV terms distinguished responders from non-responders with a sensitivity of 0.7143 and a specificity of 0.6400 from the cross validation. Cross validation yielded a sensitivity of 0.8333 and a specificity of 0.7727 for the combination of SUV10% and SUV20% and a sensitivity of 0.8333 and specificity of 0.7273 for the combination of SUV20% and SUV30%. For the >20% threshold two SUV metrics were found to be significant. These were the SUV to the highest 10% and 20% (p=0.0048). The AUC for the 10% metrics was 0.7677 and for the 20% metric it was 0.7374. Cross validation of these two metrics shown that the 10% metric was the better indicator with being able to distinguish response in unseen data with a sensitivity of 0.7778 and a specificity of 0.7727. The only texture feature that was able to determine response was complexity (p-0.04, AUC=0.7778). This metric was no more significant than the three individual SUV metrics but less significant than both of the combination metrics. As with the SUV metrics, cross validation was able to show the robustness of these results. Cross validation yielded a result that could accurately distinguish a response with a sensitivity of 0.8333 and a specificity of 0.7273. Logistic regression fit with features of the two most significant results (complexity and combination of SUV10% with SUV20%) yielded the most significant result (p=0.004. AUC=0.8889). Cross validation of this model resulted in a sensitivity of 0.7982 and a specificity 0.7940. This shows that the model would accurately predict the response to unseen data. Conclusions: This study revealed that previously used SUV metrics, maximum and mean SUV, may have to be rethought about being used to determine a response in esophageal cancer patients. The most promising SUV metric was a combination of the SUV10% and SUV20% metric for a GTV created from a threshold of >40% of the maximum SUV value, while the most significant texture feature was complexity. The overall best indicator was the logistic regression fit of the significant metrics of complexity and combination of SUV10% with SUV20%. This was able to distinguish responders from non-responders with a threshold of 0.3186 (sensitivity=0.9091, specificity=0.7778).
Cui, Zaixu; Gong, Gaolang
2018-06-02
Individualized behavioral/cognitive prediction using machine learning (ML) regression approaches is becoming increasingly applied. The specific ML regression algorithm and sample size are two key factors that non-trivially influence prediction accuracies. However, the effects of the ML regression algorithm and sample size on individualized behavioral/cognitive prediction performance have not been comprehensively assessed. To address this issue, the present study included six commonly used ML regression algorithms: ordinary least squares (OLS) regression, least absolute shrinkage and selection operator (LASSO) regression, ridge regression, elastic-net regression, linear support vector regression (LSVR), and relevance vector regression (RVR), to perform specific behavioral/cognitive predictions based on different sample sizes. Specifically, the publicly available resting-state functional MRI (rs-fMRI) dataset from the Human Connectome Project (HCP) was used, and whole-brain resting-state functional connectivity (rsFC) or rsFC strength (rsFCS) were extracted as prediction features. Twenty-five sample sizes (ranged from 20 to 700) were studied by sub-sampling from the entire HCP cohort. The analyses showed that rsFC-based LASSO regression performed remarkably worse than the other algorithms, and rsFCS-based OLS regression performed markedly worse than the other algorithms. Regardless of the algorithm and feature type, both the prediction accuracy and its stability exponentially increased with increasing sample size. The specific patterns of the observed algorithm and sample size effects were well replicated in the prediction using re-testing fMRI data, data processed by different imaging preprocessing schemes, and different behavioral/cognitive scores, thus indicating excellent robustness/generalization of the effects. The current findings provide critical insight into how the selected ML regression algorithm and sample size influence individualized predictions of behavior/cognition and offer important guidance for choosing the ML regression algorithm or sample size in relevant investigations. Copyright © 2018 Elsevier Inc. All rights reserved.
Atlas, Steven J; Tosteson, Tor D; Hanscom, Brett; Blood, Emily A; Pransky, Glenn S; Abdu, William A; Andersson, Gunnar B; Weinstein, James N
2007-08-15
Combined analysis of 2 prospective clinical studies. To identify socioeconomic characteristics associated with workers' compensation in patients with an intervertebral disc herniation (IDH) or spinal stenosis (SpS). Few studies have compared socioeconomic differences between those receiving or not receiving workers' compensation with the same underlying clinical conditions. Patients were identified from the Spine Patient Outcomes Research Trial (SPORT) and the National Spine Network (NSN) practice-based outcomes study. Patients with IDH and SpS within NSN were identified satisfying SPORT eligibility criteria. Information on disability and work status at baseline evaluation was used to categorize patients into 3 groups: workers' compensation, other disability compensation, or work-eligible controls. Enrollment rates of patients with disability in a clinical efficacy trial (SPORT) and practice-based network (NSN) were compared. Independent socioeconomic predictors of baseline workers' compensation status were identified in multivariate logistic regression models controlling for clinical condition, study cohort, and initial treatment designation. Among 3759 eligible patients (1480 in SPORT and 2279 in NSN), 564 (15%) were receiving workers' compensation, 317 (8%) were receiving other disability compensation, and 2878 (77%) were controls. Patients receiving workers' compensation were less common in SPORT than NSN (9.2% vs. 18.8%, P < 0.001), but patients receiving other disability compensation were similarly represented (8.9% vs. 7.7%, P = 0.19). In univariate analyses, many socioeconomic characteristics significantly differed according to baseline workers' compensation status. In multiple logistic regression analyses, gender, educational level, work characteristics, legal action, and expectations about ability to work without surgery were independently associated with receiving workers' compensation. Clinical trials involving conditions commonly seen in patients with workers' compensation may need special efforts to ensure adequate representation. Socioeconomic characteristics markedly differed between patients receiving and not receiving workers' compensation. Identifying the independent effects of workers' compensation on outcomes will require controlling for these baseline characteristics and other clinical features associated with disability status.
What Is Different About Worker’s Compensation Patients?
Atlas, Steven J.; Tosteson, Tor D.; Hanscom, Brett; Blood, Emily A.; Pransky, Glenn S.; Abdu, William A.; Andersson, Gunnar B.; Weinstein, James N.
2010-01-01
Study Design Combined analysis of 2 prospective clinical studies. Objective To identify socioeconomic characteristics associated with workers’ compensation in patients with an intervertebral disc herniation (IDH) or spinal stenosis (SpS). Summary of Background Data Few studies have compared socioeconomic differences between those receiving or not receiving workers’ compensation with the same underlying clinical conditions. Methods Patients were identified from the Spine Patient Outcomes Research Trial (SPORT) and the National Spine Network (NSN) practice-based outcomes study. Patients with IDH and SpS within NSN were identified satisfying SPORT eligibility criteria. Information on disability and work status at baseline evaluation was used to categorize patients into 3 groups: workers’ compensation, other disability compensation, or work-eligible controls. Enrollment rates of patients with disability in a clinical efficacy trial (SPORT) and practice-based network (NSN) were compared. Independent socioeconomic predictors of baseline workers’ compensation status were identified in multivariate logistic regression models controlling for clinical condition, study cohort, and initial treatment designation. Results Among 3759 eligible patients (1480 in SPORT and 2279 in NSN), 564 (15%) were receiving workers’ compensation, 317 (8%) were receiving other disability compensation, and 2878 (77%) were controls. Patients receiving workers’ compensation were less common in SPORT than NSN (9.2% vs. 18.8%, P < 0.001), but patients receiving other disability compensation were similarly represented (8.9% vs. 7.7%, P = 0.19). In univariate analyses, many socioeconomic characteristics significantly differed according to baseline workers’ compensation status. In multiple logistic regression analyses, gender, educational level, work characteristics, legal action, and expectations about ability to work without surgery were independently associated with receiving workers’ compensation. Conclusion Clinical trials involving conditions commonly seen in patients with workers’ compensation may need special efforts to ensure adequate representation. Socioeconomic characteristics markedly differed between patients receiving and not receiving workers’ compensation. Identifying the independent effects of workers’ compensation on outcomes will require controlling for these baseline characteristics and other clinical features associated with disability status. PMID:17700451