Dynamic Dimensionality Selection for Bayesian Classifier Ensembles
2015-03-19
learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very competitive with Logistic Regression but much more...classifier, Generative learning, Discriminative learning, Naïve Bayes, Feature selection, Logistic regression, higher order attribute independence 16...discriminative learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very competitive with Logistic Regression but
Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble
Wang, Hong; Xu, Qingsong; Zhou, Lifeng
2015-01-01
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988
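The balance-then-ensemble idea in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: plain random undersampling of the majority class stands in for their clustering and bagging step, the L1 penalty is handled with a simple proximal-gradient loop, and every function name and parameter value here is hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_lasso_logistic(X, y, lam=0.01, lr=0.1, epochs=200):
    """L1-penalized logistic regression via proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalized
        for j in range(d):
            v = w[j] - lr * grad_w[j]
            # Soft-thresholding is the proximal step for the L1 penalty.
            w[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)
    return w, b

def fit_balanced_ensemble(X, y, n_models=5, seed=0):
    """Train each base model on all minority cases plus an equal-sized
    random draw from the majority class (assumed balancing scheme)."""
    rng = random.Random(seed)
    pos = [i for i, yi in enumerate(y) if yi == 1]
    neg = [i for i, yi in enumerate(y) if yi == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    models = []
    for _ in range(n_models):
        idx = minority + rng.sample(majority, len(minority))
        models.append(fit_lasso_logistic([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_proba(models, x):
    # Average the predicted probabilities of the base models.
    probs = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) for w, b in models]
    return sum(probs) / len(probs)
```

Averaging probabilities rather than votes keeps the ensemble output usable for AUC-style evaluation, which is one of the metrics the abstract reports.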
Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson
2010-08-01
Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. Copyright (c) 2010 Elsevier Inc. All rights reserved.
An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression
Weiss, Brandi A.; Dardick, William
2015-01-01
This article introduces an entropy-based measure of data–model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify the quality of classification and separation of group membership. Entropy complements preexisting measures of data–model fit and provides unique information not contained in other measures. Hypothetical data scenarios, an applied example, and Monte Carlo simulation results are used to demonstrate the application of entropy in logistic regression. Entropy should be used in conjunction with other measures of data–model fit to assess how well logistic regression models classify cases into observed categories. PMID:29795897
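To make the idea concrete, the sketch below applies the normalized entropy statistic familiar from mixture modeling to the fitted probabilities of a binary logistic regression. It assumes the form E = 1 - [sum over cases and classes of -p ln p] / (N ln 2); the article's exact definition may differ, and the function name is hypothetical.

```python
import math

def normalized_entropy(probs):
    """Normalized entropy for a binary model.

    probs: predicted P(y = 1) for each case.
    Returns a value in [0, 1]: 1 means every case is classified with
    certainty, 0 means every prediction is 0.5 (maximal fuzziness).
    """
    total = 0.0
    for p in probs:
        for q in (p, 1.0 - p):
            if q > 0.0:  # treat 0 * ln(0) as 0
                total -= q * math.log(q)
    return 1.0 - total / (len(probs) * math.log(2))
```

Because the statistic depends only on the predicted probabilities and not on the observed labels, it complements rather than replaces accuracy-type fit measures, which matches the abstract's recommendation to use it alongside them.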
An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression.
Weiss, Brandi A; Dardick, William
2016-12-01
This article introduces an entropy-based measure of data-model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify the quality of classification and separation of group membership. Entropy complements preexisting measures of data-model fit and provides unique information not contained in other measures. Hypothetical data scenarios, an applied example, and Monte Carlo simulation results are used to demonstrate the application of entropy in logistic regression. Entropy should be used in conjunction with other measures of data-model fit to assess how well logistic regression models classify cases into observed categories.
Susan L. King
2003-01-01
The performance of two classifiers, logistic regression and neural networks, is compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...
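The thresholding step described here can be illustrated with a few lines of code. Since the abstract is truncated before it says which class falls below the threshold, the sketch takes both labels as parameters; the function name, labels, and scores are all hypothetical.

```python
def classify_by_threshold(scores, threshold, below_label, other_label):
    """Assign below_label to scores under the threshold, other_label otherwise."""
    return [below_label if s < threshold else other_label for s in scores]

# Moving the threshold changes how many cases land in each class.
scores = [0.05, 0.40, 0.62, 0.91]
for t in (0.3, 0.5, 0.7):
    labels = classify_by_threshold(scores, t, "below", "at_or_above")
    print(t, labels.count("below"), labels.count("at_or_above"))
```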
Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson
2010-01-01
Objective: Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression which may accomplish the same goals but with fewer assumptions or greater accuracy. Study Design and Setting: We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. Results: We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (CART), and meta-classifiers (in particular, boosting). Conclusion: While the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and to a lesser extent decision trees (particularly CART) appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. PMID:20630332
ERIC Educational Resources Information Center
Rudner, Lawrence
2016-01-01
In the machine learning literature, it is commonly accepted as fact that as calibration sample sizes increase, Naïve Bayes classifiers initially outperform Logistic Regression classifiers in terms of classification accuracy. Applied to subtests from an on-line final examination and from a highly regarded certification examination, this study shows…
An ultra low power feature extraction and classification system for wearable seizure detection.
Page, Adam; Pramod Tim Oates, Siddharth; Mohsenin, Tinoosh
2015-01-01
In this paper we explore the use of a variety of machine learning algorithms for designing a reliable and low-power, multi-channel EEG feature extractor and classifier for predicting seizures from electroencephalographic data (scalp EEG). Different machine learning classifiers including k-nearest neighbor, support vector machines, naïve Bayes, logistic regression, and neural networks are explored with the goal of maximizing detection accuracy while minimizing power, area, and latency. The input to each machine learning classifier is a 198-feature vector containing 9 features for each of the 22 EEG channels obtained over 1-second windows. All classifiers were able to obtain F1 scores over 80% and onset sensitivity of 100% when tested on 10 patients. Among the five classifiers that were explored, logistic regression (LR) proved to have minimum hardware complexity while providing an average F1 score of 91%. Both ASIC and FPGA implementations of logistic regression are presented and show the smallest area, power consumption, and the lowest latency when compared to the previous work.
An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression
ERIC Educational Resources Information Center
Weiss, Brandi A.; Dardick, William
2016-01-01
This article introduces an entropy-based measure of data-model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify…
Computing group cardinality constraint solutions for logistic regression problems.
Zhang, Yong; Kwon, Dongjin; Pohl, Kilian M
2017-01-01
We derive an algorithm to directly solve logistic regression under a group cardinality (group sparsity) constraint and use it to classify intra-subject MRI sequences (e.g. cine MRIs) of healthy from diseased subjects. Group cardinality constraint models are often applied to medical images in order to avoid overfitting of the classifier to the training data. Solutions within these models are generally determined by relaxing the cardinality constraint to a weighted feature selection scheme. However, these solutions relate to the original sparse problem only under specific assumptions, which generally do not hold for medical image applications. In addition, inferring clinical meaning from features weighted by a classifier is an ongoing topic of discussion. To avoid weighting features, we propose to directly solve the group cardinality constrained logistic regression problem by generalizing the Penalty Decomposition method. To do so, we assume that an intra-subject series of images represents repeated samples of the same disease patterns. We model this assumption by combining series of measurements created by a feature across time into a single group. Our algorithm then derives a solution within that model by decoupling the minimization of the logistic regression function from enforcing the group sparsity constraint. The minimum of the smooth and convex logistic regression problem is determined via gradient descent while we derive a closed form solution for finding a sparse approximation of that minimum. We apply our method to cine MRI of 38 healthy controls and 44 adult patients who received reconstructive surgery of Tetralogy of Fallot (TOF) during infancy. Our method correctly identifies regions impacted by TOF and generally obtains statistically significantly higher classification accuracy than alternative solutions to this model, i.e., ones relaxing group cardinality constraints. Copyright © 2016 Elsevier B.V. All rights reserved.
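The "closed form solution for finding a sparse approximation" can be read as a projection onto the set of weight vectors with at most k non-zero groups. The sketch below shows one plausible version of that step, assuming Euclidean distance: keep the k groups with the largest norm and zero the rest. The function name and the group layout are hypothetical, and the paper's actual update may differ in detail.

```python
import math

def project_group_cardinality(w, groups, k):
    """Closest vector (in Euclidean distance) to w with at most k non-zero groups.

    w:      list of weights
    groups: list of index lists, one per group (assumed non-overlapping)
    k:      maximum number of groups allowed to stay non-zero
    """
    norms = [
        (math.sqrt(sum(w[i] ** 2 for i in g)), gi)
        for gi, g in enumerate(groups)
    ]
    # Keep the k groups carrying the most weight.
    keep = {gi for _, gi in sorted(norms, reverse=True)[:k]}
    out = [0.0] * len(w)
    for gi in keep:
        for i in groups[gi]:
            out[i] = w[i]
    return out
```

In a penalty-decomposition style loop, a step like this would alternate with gradient descent on the smooth logistic loss, which is the decoupling the abstract describes.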
Austin, Peter C.; Tu, Jack V.; Ho, Jennifer E.; Levy, Daniel; Lee, Douglas S.
2014-01-01
Objective: Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study Design and Setting: We compared the performance of these classification methods with those of conventional classification trees to classify patients with heart failure according to the following sub-types: heart failure with preserved ejection fraction (HFPEF) vs. heart failure with reduced ejection fraction (HFREF). We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results: We found that modern, flexible tree-based methods from the data mining literature offer substantial improvement in prediction and classification of heart failure sub-type compared to conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared to the methods proposed in the data mining literature. Conclusion: The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying heart failure subtypes in a population-based sample of patients from Ontario. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF. PMID:23384592
NASA Astrophysics Data System (ADS)
Schaeben, Helmut; Semmler, Georg
2016-09-01
The objective of prospectivity modeling is prediction of the conditional probability of the presence T = 1 or absence T = 0 of a target T given favorable or prohibitive predictors B, or construction of a two-class (0, 1) classification of T. A special case of logistic regression called weights-of-evidence (WofE) is geologists' favorite method of prospectivity modeling due to its apparent simplicity. However, the numerical simplicity is deceiving as it is implied by the severe mathematical modeling assumption of joint conditional independence of all predictors given the target. General weights of evidence are explicitly introduced which are as simple to estimate as conventional weights, i.e., by counting, but do not require conditional independence. Complementary to the regression view is the classification view on prospectivity modeling. Boosting is the construction of a strong classifier from a set of weak classifiers. From the regression point of view it is closely related to logistic regression. Boost weights-of-evidence (BoostWofE) was introduced into prospectivity modeling to counterbalance violations of the assumption of conditional independence even though relaxation of modeling assumptions with respect to weak classifiers was not the (initial) purpose of boosting. In the original publication of BoostWofE a fabricated dataset was used to "validate" this approach. Using the same fabricated dataset it is shown that BoostWofE cannot generally compensate for a lack of conditional independence, whatever the order in which the predictors are processed consecutively. Thus the alleged features of BoostWofE are disproved by way of counterexamples, while theoretical findings are confirmed that logistic regression including interaction terms can exactly compensate for violations of joint conditional independence if the predictors are indicators.
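Estimating conventional weights "by counting" can be sketched as below for a single binary predictor B and binary target T. The definitions used are the ones commonly given for weights of evidence, W+ = ln[P(B=1|T=1) / P(B=1|T=0)] and W- = ln[P(B=0|T=1) / P(B=0|T=0)]; they are an assumption here, and the sketch does not attempt the "general weights" the abstract introduces. The function name is hypothetical.

```python
import math

def weights_of_evidence(b, t):
    """Conventional weights of evidence for one binary predictor.

    b, t: equal-length lists of 0/1 values (predictor and target).
    Returns (w_plus, w_minus). Assumes every cell count is non-zero.
    """
    n11 = sum(1 for bi, ti in zip(b, t) if bi == 1 and ti == 1)
    n01 = sum(1 for bi, ti in zip(b, t) if bi == 0 and ti == 1)
    n10 = sum(1 for bi, ti in zip(b, t) if bi == 1 and ti == 0)
    n00 = sum(1 for bi, ti in zip(b, t) if bi == 0 and ti == 0)
    n_t1, n_t0 = n11 + n01, n10 + n00
    w_plus = math.log((n11 / n_t1) / (n10 / n_t0))
    w_minus = math.log((n01 / n_t1) / (n00 / n_t0))
    return w_plus, w_minus
```

Adding such weights across several predictors to the prior log-odds is only valid under the joint conditional independence assumption that the abstract criticizes.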
NASA Astrophysics Data System (ADS)
Xu, Chao; Zhou, Dongxiang; Zhai, Yongping; Liu, Yunhui
2015-12-01
This paper realizes the automatic segmentation and classification of Mycobacterium tuberculosis with conventional light microscopy. First, the candidate bacillus objects are segmented by the marker-based watershed transform. The markers are obtained by an adaptive threshold segmentation based on the adaptive scale Gaussian filter. The scale of the Gaussian filter is determined according to the color model of the bacillus objects. Then the candidate objects are extracted integrally after region merging and contamination elimination. Second, the shape features of the bacillus objects are characterized by the Hu moments, compactness, eccentricity, and roughness, which are used to classify the single, touching and non-bacillus objects. We evaluated the logistic regression, random forest, and intersection kernel support vector machine classifiers in classifying the bacillus objects. Experimental results demonstrate that the proposed method yields high robustness and accuracy. The logistic regression classifier performs best with an accuracy of 91.68%.
Steganalysis using logistic regression
NASA Astrophysics Data System (ADS)
Lubenko, Ivans; Ker, Andrew D.
2011-02-01
We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-the-art 686-dimensional SPAM feature set, in three image sets.
Classification of mislabelled microarrays using robust sparse logistic regression.
Bootkrajang, Jakramate; Kabán, Ata
2013-04-01
Previous studies reported that labelling errors are not uncommon in microarray datasets. In such cases, the training set may become misleading, and the ability of classifiers to make reliable inferences from the data is compromised. Yet, few methods are currently available in the bioinformatics literature to deal with this problem. The few existing methods focus on data cleansing alone, without reference to classification, and their performance crucially depends on some tuning parameters. In this article, we develop a new method to detect mislabelled arrays simultaneously with learning a sparse logistic regression classifier. Our method may be seen as a label-noise robust extension of the well-known and successful Bayesian logistic regression classifier. To account for possible mislabelling, we formulate a label-flipping process as part of the classifier. The regularization parameter is automatically set using Bayesian regularization, which not only saves the computation time that cross-validation would take, but also eliminates any unwanted effects of label noise when setting the regularization parameter. Extensive experiments with both synthetic data and real microarray datasets demonstrate that our approach is able to counter the bad effects of labelling errors in terms of predictive performance, it is effective at identifying marker genes and simultaneously it detects mislabelled arrays to high accuracy. The code is available from http://cs.bham.ac.uk/∼jxb008. Supplementary data are available at Bioinformatics online.
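A label-flipping process of the kind mentioned here is often written as a mixture over the unobserved true label. The sketch below is one such formulation under assumed notation (true label $y$, observed label $\tilde{y}$, flip probabilities $\gamma_{jk}$, weights $w$); none of these symbols come from the abstract, and the article's own parameterization may differ.

```latex
% Assumed notation: y = true label, \tilde{y} = observed (possibly flipped) label
\begin{align}
  p(y = 1 \mid x, w) &= \sigma(w^{\top} x) = \frac{1}{1 + e^{-w^{\top} x}}, \\
  \gamma_{jk} &= P(\tilde{y} = k \mid y = j), \qquad \sum_{k} \gamma_{jk} = 1, \\
  P(\tilde{y} = k \mid x, w) &= \sum_{j \in \{0, 1\}} \gamma_{jk}\, p(y = j \mid x, w).
\end{align}
```

With $\gamma_{01} = \gamma_{10} = 0$ the last line reduces to ordinary logistic regression, so the noise-free model is a special case.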
Nagelkerke, Nico; Fidler, Vaclav
2015-01-01
The problem of discrimination and classification is central to much of epidemiology. Here we consider the estimation of a logistic regression/discrimination function from training samples, when one of the training samples is subject to misclassification or mislabeling, e.g. diseased individuals are incorrectly classified/labeled as healthy controls. We show that this leads to zero-inflated binomial model with a defective logistic regression or discrimination function, whose parameters can be estimated using standard statistical methods such as maximum likelihood. These parameters can be used to estimate the probability of true group membership among those, possibly erroneously, classified as controls. Two examples are analyzed and discussed. A simulation study explores properties of the maximum likelihood parameter estimates and the estimates of the number of mislabeled observations.
Automated Classification of Consumer Health Information Needs in Patient Portal Messages
Cronin, Robert M.; Fabbri, Daniel; Denny, Joshua C.; Jackson, Gretchen Purcell
2015-01-01
Patients have diverse health information needs, and secure messaging through patient portals is an emerging means by which such needs are expressed and met. As patient portal adoption increases, growing volumes of secure messages may burden healthcare providers. Automated classification could expedite portal message triage and answering. We created four automated classifiers based on word content and natural language processing techniques to identify health information needs in 1000 patient-generated portal messages. Logistic regression and random forest classifiers detected single information needs well, with area under the curves of 0.804–0.914. A logistic regression classifier accurately found the set of needs within a message, with a Jaccard index of 0.859 (95% Confidence Interval: (0.847, 0.871)). Automated classification of consumer health information needs expressed in patient portal messages is feasible and may allow direct linking to relevant resources or creation of institutional resources for commonly expressed needs. PMID:26958285
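The Jaccard index used to score the full set of needs found in a message can be computed as in the sketch below, which averages the per-message ratio of shared labels to all labels. The label names in the usage line and the convention of scoring two empty sets as 1.0 are assumptions for illustration, not details from the abstract.

```python
def jaccard_index(true_sets, pred_sets):
    """Mean per-message Jaccard index between true and predicted label sets."""
    scores = []
    for t, p in zip(true_sets, pred_sets):
        t, p = set(t), set(p)
        union = t | p
        # Assumed convention: two empty label sets count as a perfect match.
        scores.append(1.0 if not union else len(t & p) / len(union))
    return sum(scores) / len(scores)

# Hypothetical labels: the first message is half right, the second is exact.
print(jaccard_index([{"medical", "logistical"}, {"social"}],
                    [{"medical"}, {"social"}]))
```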
A general equation to obtain multiple cut-off scores on a test from multinomial logistic regression.
Bersabé, Rosa; Rivas, Teresa
2010-05-01
The authors derive a general equation to compute multiple cut-offs on a total test score in order to classify individuals into more than two ordinal categories. The equation is derived from the multinomial logistic regression (MLR) model, which is an extension of the binary logistic regression (BLR) model to accommodate polytomous outcome variables. From this analytical procedure, cut-off scores are established at the test score (the predictor variable) at which an individual is as likely to be in category j as in category j+1 of an ordinal outcome variable. The application of the complete procedure is illustrated by an example with data from an actual study on eating disorders. In this example, two cut-off scores on the Eating Attitudes Test (EAT-26) scores are obtained in order to classify individuals into three ordinal categories: asymptomatic, symptomatic and eating disorder. Diagnoses were made from the responses to a self-report (Q-EDD) that operationalises DSM-IV criteria for eating disorders. Alternatives to the MLR model to set multiple cut-off scores are discussed.
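The equal-likelihood condition can be worked through for a single predictor. The derivation below is a sketch assuming a baseline-category parameterization with intercepts $\alpha_j$, slopes $\beta_j$, and reference category $J$; the symbols are assumptions, and the article's general equation may be expressed differently.

```latex
% Assumed baseline-category MLR with test score x and reference category J
\begin{align}
  \ln \frac{P(Y = j \mid x)}{P(Y = J \mid x)} &= \alpha_j + \beta_j x, \\
  P(Y = j \mid c_j) = P(Y = j + 1 \mid c_j)
    &\iff \alpha_j + \beta_j c_j = \alpha_{j+1} + \beta_{j+1} c_j, \\
  c_j &= \frac{\alpha_j - \alpha_{j+1}}{\beta_{j+1} - \beta_j},
    \qquad \beta_{j+1} \neq \beta_j .
\end{align}
```

For three ordinal categories this yields two cut-offs, $c_1$ and $c_2$, matching the two EAT-26 cut-off scores described in the example.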
Support vector machines classifiers of physical activities in preschoolers
USDA-ARS's Scientific Manuscript database
The goal of this study is to develop, test, and compare multinomial logistic regression (MLR) and support vector machines (SVM) in classifying preschool-aged children's physical activity data acquired from an accelerometer. In this study, 69 children aged 3-5 years old were asked to participate in a s...
A comparison of rule-based and machine learning approaches for classifying patient portal messages.
Cronin, Robert M; Fabbri, Daniel; Denny, Joshua C; Rosenbloom, S Trent; Jackson, Gretchen Purcell
2017-09-01
Secure messaging through patient portals is an increasingly popular way that consumers interact with healthcare providers. The increasing burden of secure messaging can affect clinic staffing and workflows. Manual management of portal messages is costly and time consuming. Automated classification of portal messages could potentially expedite message triage and delivery of care. We developed automated patient portal message classifiers with rule-based and machine learning techniques using bag of words and natural language processing (NLP) approaches. To evaluate classifier performance, we used a gold standard of 3253 portal messages manually categorized using a taxonomy of communication types (i.e., main categories of informational, medical, logistical, social, and other communications, and subcategories including prescriptions, appointments, problems, tests, follow-up, contact information, and acknowledgement). We evaluated our classifiers' accuracies in identifying individual communication types within portal messages with area under the receiver-operator curve (AUC). Portal messages often contain more than one type of communication. To predict all communication types within single messages, we used the Jaccard Index. We extracted the variables of importance for the random forest classifiers. The best performing approaches to classification for the major communication types were: logistic regression for medical communications (AUC: 0.899); basic (rule-based) for informational communications (AUC: 0.842); and random forests for social communications and logistical communications (AUCs: 0.875 and 0.925, respectively). The best performing classification approach of classifiers for individual communication subtypes was random forests for Logistical-Contact Information (AUC: 0.963). 
The Jaccard Indices by approach were: basic classifier, Jaccard Index: 0.674; Naïve Bayes, Jaccard Index: 0.799; random forests, Jaccard Index: 0.859; and logistic regression, Jaccard Index: 0.861. For medical communications, the most predictive variables were NLP concepts (e.g., Temporal_Concept, which maps to 'morning', 'evening' and Idea_or_Concept which maps to 'appointment' and 'refill'). For logistical communications, the most predictive variables contained similar numbers of NLP variables and words (e.g., Telephone mapping to 'phone', 'insurance'). For social and informational communications, the most predictive variables were words (e.g., social: 'thanks', 'much', informational: 'question', 'mean'). This study applies automated classification methods to the content of patient portal messages and evaluates the application of NLP techniques on consumer communications in patient portal messages. We demonstrated that random forest and logistic regression approaches accurately classified the content of portal messages, although the best approach to classification varied by communication type. Words were the most predictive variables for classification of most communication types, although NLP variables were most predictive for medical communication types. As adoption of patient portals increases, automated techniques could assist in understanding and managing growing volumes of messages. Further work is needed to improve classification performance to potentially support message triage and answering. Copyright © 2017 Elsevier B.V. All rights reserved.
Yang, Lixue; Chen, Kean
2015-11-01
To improve the design of underwater target recognition systems based on auditory perception, this study compared human listeners with automatic classifiers. Performance measures and strategies in three discrimination experiments, including discriminations between man-made and natural targets, between ships and submarines, and among three types of ships, were used. In the experiments, the subjects were asked to assign a score to each sound based on how confident they were about the category to which it belonged, and logistic regression, which represents linear discriminative models, also completed three similar tasks by utilizing many auditory features. The results indicated that the performance of logistic regression improved as the ratio between inter- and intra-class differences became larger, whereas the performance of the human subjects was limited by their unfamiliarity with the targets. Logistic regression performed better than the human subjects in all tasks but the discrimination between man-made and natural targets, and the strategies employed by excellent human subjects were similar to those of logistic regression. Logistic regression and several human subjects demonstrated similar performance when discriminating man-made and natural targets, but in this case, their strategies were not similar. An appropriate fusion of their strategies led to further improvement in recognition accuracy.
Classification of sodium MRI data of cartilage using machine learning.
Madelin, Guillaume; Poidevin, Frederick; Makrymallis, Antonios; Regatte, Ravinder R
2015-11-01
To assess the possible utility of machine learning for classifying subjects with and subjects without osteoarthritis using sodium magnetic resonance imaging data. Support vector machine, k-nearest neighbors, naïve Bayes, discriminant analysis, linear regression, logistic regression, neural networks, decision tree, and tree bagging were tested. Sodium magnetic resonance imaging with and without fluid suppression by inversion recovery was acquired on the knee cartilage of 19 controls and 28 osteoarthritis patients. Sodium concentrations were measured in regions of interest in the knee for both acquisitions. Mean (MEAN) and standard deviation (STD) of these concentrations were measured in each region of interest, and the minimum, maximum, and mean of these two measurements were calculated over all regions of interest for each subject. The resulting 12 variables per subject were used as predictors for classification. Either Min [STD] alone, or in combination with Mean [MEAN] or Min [MEAN], all from fluid suppressed data, were the best predictors with an accuracy >74%, mainly with linear logistic regression and linear support vector machine. Other good classifiers include discriminant analysis, linear regression, and naïve Bayes. Machine learning is a promising technique for classifying osteoarthritis patients and controls from sodium magnetic resonance imaging data. © 2014 Wiley Periodicals, Inc.
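The count of 12 predictors follows from 2 acquisitions x 2 per-region statistics (MEAN, STD) x 3 summaries over regions (minimum, maximum, mean). The sketch below builds such a feature set for one subject; the input layout, key names, and the use of the sample standard deviation are assumptions for illustration.

```python
import statistics

def subject_features(roi_concentrations):
    """Build per-subject predictors from per-region sodium concentrations.

    roi_concentrations: dict mapping an acquisition name to a list of
    regions, each region being a list of at least two concentration values.
    """
    features = {}
    for acq, rois in roi_concentrations.items():
        per_roi = {
            "MEAN": [statistics.mean(r) for r in rois],
            "STD": [statistics.stdev(r) for r in rois],  # sample SD (assumed)
        }
        for stat, values in per_roi.items():
            features[f"{acq}:Min[{stat}]"] = min(values)
            features[f"{acq}:Max[{stat}]"] = max(values)
            features[f"{acq}:Mean[{stat}]"] = statistics.mean(values)
    return features
```

With two acquisitions in the input, for example one with and one without fluid suppression, the returned dictionary has 12 entries.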
Classification of vegetation types in military region
NASA Astrophysics Data System (ADS)
Gonçalves, Miguel; Silva, Jose Silvestre; Bioucas-Dias, Jose
2015-10-01
In the decision-making process regarding planning and execution of military operations, the terrain is a determining factor. Aerial photographs are a source of vital information for the success of an operation in a hostile region, namely when the cartographic information behind enemy lines is scarce or non-existent. The objective of the present work is the development of a tool capable of processing aerial photos. The methodology implemented starts with feature extraction, followed by the application of an automatic selector of features. The next step, using the k-fold cross validation technique, estimates the input parameters for the following classifiers: Sparse Multinomial Logistic Regression (SMLR), K Nearest Neighbor (KNN), Linear Classifier using Principal Component Expansion on the Joint Data (PCLDC) and Multi-Class Support Vector Machine (MSVM). These classifiers were used in two different studies with distinct objectives: discrimination of vegetation density and identification of the main components of the vegetation. It was found that the best classifier in the first approach is Sparse Multinomial Logistic Regression (SMLR). In the second approach, the implemented methodology applied to high resolution images showed that the best performance was achieved by the KNN and PCLDC classifiers. Comparing the two approaches reveals a multiscale issue: at different resolutions, the best solution to the problem requires different classifiers and the extraction of different features.
A comparative study on entrepreneurial attitudes modeled with logistic regression and Bayes nets.
López Puga, Jorge; García García, Juan
2012-11-01
Entrepreneurship research is receiving increasing attention in our context, as entrepreneurs are key social agents involved in economic development. We compare the success of the dichotomic logistic regression model and the Bayes simple classifier to predict entrepreneurship, after manipulating the percentage of missing data and the level of categorization in predictors. A sample of undergraduate university students (N = 1230) completed five scales (motivation, attitude towards business creation, obstacles, deficiencies, and training needs) and we found that each of them predicted different aspects of the tendency to business creation. Additionally, our results show that the receiver operating characteristic (ROC) curve is affected by the rate of missing data in both techniques, but logistic regression seems to be more vulnerable when faced with missing data, whereas Bayes nets underperform slightly when categorization has been manipulated. Our study sheds light on the potential entrepreneur profile and we propose to use Bayesian networks as an additional alternative to overcome the weaknesses of logistic regression when missing data are present in applied research.
Exploring students' patterns of reasoning
NASA Astrophysics Data System (ADS)
Matloob Haghanikar, Mojgan
As part of a collaborative study of the science preparation of elementary school teachers, we investigated the quality of students' reasoning and explored the relationship between sophistication of reasoning and the degree to which the courses were considered inquiry oriented. To probe students' reasoning, we developed open-ended written content questions with the distinguishing feature of applying recently learned concepts in a new context. We devised a protocol for developing written content questions that provided a common structure for probing and classifying students' sophistication level of reasoning. In designing our protocol, we considered several distinct criteria, and classified students' responses based on their performance for each criterion. First, we classified concepts into three types: Descriptive, Hypothetical, and Theoretical and categorized the abstraction levels of the responses in terms of the types of concepts and the inter-relationship between the concepts. Second, we devised a rubric based on Bloom's revised taxonomy with seven traits (both knowledge types and cognitive processes) and a defined set of criteria to evaluate each trait. Along with analyzing students' reasoning, we visited universities and observed the courses in which the students were enrolled. We used the Reformed Teaching Observation Protocol (RTOP) to rank the courses with respect to characteristics that are valued for the inquiry courses. We conducted logistic regression for a sample of 18 courses with about 900 students and reported the results of the logistic regression used to estimate the relationship between traits of reasoning and RTOP score. In addition, we analyzed the conceptual structure of students' responses, based on conceptual classification schemes, and clustered students' responses into six categories. We derived a regression model to estimate the relationship between the sophistication of the categories of conceptual structure and RTOP scores. However, the outcome variable with six categories required a more complicated regression model, known as multinomial logistic regression, generalized from binary logistic regression. With the large amount of collected data, we found that the higher cognitive processes were more likely in classes with higher measures on inquiry. However, the usage of more abstract concepts with higher order conceptual structures was less prevalent in higher RTOP courses.
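The step from binary to multinomial logistic regression mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the study's model: the single predictor, the six coefficient vectors and the intercepts are made-up values chosen only to show the mechanics. Each category gets its own linear score, and a softmax turns the scores into probabilities; with two categories the result coincides with the ordinary logistic (sigmoid) function.

```python
import math


def softmax_probabilities(features, weights, biases):
    """Return one probability per class for a single observation.

    weights[k] is the coefficient vector of class k and biases[k] its intercept.
    """
    scores = [
        b + sum(w_i * x_i for w_i, x_i in zip(w, features))
        for w, b in zip(weights, biases)
    ]
    peak = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Binary logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs: one scaled predictor and six outcome categories.
x = [0.4]
weights = [[0.0], [0.5], [-0.3], [1.2], [0.1], [-0.8]]
biases = [0.0, 0.1, -0.2, 0.3, 0.0, -0.1]
probs = softmax_probabilities(x, weights, biases)

# With two classes (the first used as reference) softmax reduces to the sigmoid.
two_class = softmax_probabilities(x, [[0.0], [1.5]], [0.0, -0.2])
binary = sigmoid(1.5 * 0.4 - 0.2)
```

Fixing the first class at zero coefficients, as in the two-class check, is the usual reference-category convention; it is an assumption here, since the abstract does not say which parameterisation was used.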
Classification of Effective Soil Depth by Using Multinomial Logistic Regression Analysis
NASA Astrophysics Data System (ADS)
Chang, C. H.; Chan, H. C.; Chen, B. A.
2016-12-01
Classification of effective soil depth is a task of determining the slopeland utilizable limitation in Taiwan. The "Slopeland Conservation and Utilization Act" categorizes the slopeland into agriculture and husbandry land, land suitable for forestry and land for enhanced conservation according to factors including average slope, effective soil depth, soil erosion and parental rock. However, site investigation of the effective soil depth requires cost-effective field work. This research aimed to classify the effective soil depth by using multinomial logistic regression with the environmental factors. The Wen-Shui Watershed, located in central Taiwan, was selected as the study area. The multinomial logistic regression analysis was performed with the assistance of a Geographic Information System (GIS). The effective soil depth was categorized into four levels: deeper, deep, shallow and shallower. The environmental factors of slope, aspect, digital elevation model (DEM), curvature and normalized difference vegetation index (NDVI) were selected for classifying the soil depth. An Error Matrix was then used to assess the model accuracy. The results showed an overall accuracy of 75%. Finally, a map of effective soil depth was produced to help planners and decision makers in determining the slopeland utilizable limitation in the study area.
Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan
2015-03-15
Proteins located in appropriate cellular compartments are of paramount importance to exert their biological functions. Prediction of protein subcellular localization by computational methods is required in the post-genomic era. Recent studies have been focusing on predicting not only single-location proteins but also multi-location proteins. However, most of the existing predictors are far from effective for tackling the challenges of multi-label proteins. This article proposes an efficient multi-label predictor, namely mPLR-Loc, based on penalized logistic regression and adaptive decisions for predicting both single- and multi-location proteins. Specifically, for each query protein, mPLR-Loc exploits the information from the Gene Ontology (GO) database by using its accession number (AC) or the ACs of its homologs obtained via BLAST. The frequencies of GO occurrences are used to construct feature vectors, which are then classified by an adaptive decision-based multi-label penalized logistic regression classifier. Experimental results based on two recent stringent benchmark datasets (virus and plant) show that mPLR-Loc remarkably outperforms existing state-of-the-art multi-label predictors. In addition to being able to rapidly and accurately predict subcellular localization of single- and multi-label proteins, mPLR-Loc can also provide probabilistic confidence scores for the prediction decisions. For readers' convenience, the mPLR-Loc server is available online (http://bioinfo.eie.polyu.edu.hk/mPLRLocServer). Copyright © 2014 Elsevier Inc. All rights reserved.
Logistic regression model for diagnosis of transition zone prostate cancer on multi-parametric MRI.
Dikaios, Nikolaos; Alkalbani, Jokha; Sidhu, Harbir Singh; Fujiwara, Taiki; Abd-Alazeez, Mohamed; Kirkham, Alex; Allen, Clare; Ahmed, Hashim; Emberton, Mark; Freeman, Alex; Halligan, Steve; Taylor, Stuart; Atkinson, David; Punwani, Shonit
2015-02-01
We aimed to develop logistic regression (LR) models for classifying prostate cancer within the transition zone on multi-parametric magnetic resonance imaging (mp-MRI). One hundred and fifty-five patients (training cohort, 70 patients; temporal validation cohort, 85 patients) underwent mp-MRI and transperineal-template-prostate-mapping (TPM) biopsy. Positive cores were classified by cancer definitions: (1) any-cancer; (2) definition-1 [≥Gleason 4 + 3 or ≥ 6 mm cancer core length (CCL)] [high risk significant]; and (3) definition-2 (≥Gleason 3 + 4 or ≥ 4 mm CCL) cancer [intermediate-high risk significant]. For each, logistic-regression mp-MRI models were derived from the training cohort and validated internally and with the temporal cohort. Sensitivity/specificity and the area under the receiver operating characteristic (ROC-AUC) curve were calculated. LR model performance was compared to radiologists' performance. Twenty-eight of 70 patients from the training cohort, and 25/85 patients from the temporal validation cohort had significant cancer on TPM. The ROC-AUC of the LR model for classification of cancer was 0.73/0.67 at internal/temporal validation. The radiologist A/B ROC-AUC was 0.65/0.74 (temporal cohort). For patients scored by radiologists as Prostate Imaging Reporting and Data System (Pi-RADS) score 3, sensitivity/specificity of radiologist A 'best guess' and LR model was 0.14/0.54 and 0.71/0.61, respectively; and radiologist B 'best guess' and LR model was 0.40/0.34 and 0.50/0.76, respectively. LR models can improve classification of Pi-RADS score 3 lesions similar to experienced radiologists. • MRI helps find prostate cancer in the anterior of the gland • Logistic regression models based on mp-MRI can classify prostate cancer • Computers can help confirm cancer in areas doctors are uncertain about.
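The two kinds of figures reported in this abstract, ROC-AUC and sensitivity/specificity at a decision threshold, can be computed as in the sketch below. The labels and scores are invented for illustration and are not the study's data. The AUC is computed as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical model outputs: 1 = significant cancer on biopsy, 0 = none.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, 0.5)
```

The pairwise form is quadratic in the number of cases, which is acceptable for cohorts of this size; a rank-based formula gives the same value more cheaply on large datasets.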
Classifying machinery condition using oil samples and binary logistic regression
NASA Astrophysics Data System (ADS)
Phillips, J.; Cripps, E.; Lau, John W.; Hodkiewicz, M. R.
2015-08-01
The era of big data has resulted in an explosion of condition monitoring information. The result is an increasing motivation to automate the costly and time consuming human elements involved in the classification of machine health. When working with industry it is important to build an understanding of, and hence some trust in, the classification scheme for those who use the analysis to initiate maintenance tasks. Typically, "black box" approaches such as artificial neural networks (ANN) and support vector machines (SVM) can be difficult to interpret. In contrast, this paper argues that logistic regression offers easy interpretability to industry experts, providing insight into the drivers of the human classification process and into the ramifications of potential misclassification. Of course, accuracy is of foremost importance in any automated classification scheme, so we also provide a comparative study based on predictive performance of logistic regression, ANN and SVM. A real world oil analysis data set from engines on mining trucks is presented and, using cross-validation, we demonstrate that logistic regression out-performs the ANN and SVM approaches in terms of prediction for healthy/not healthy engines.
Lee, Seokho; Shin, Hyejin; Lee, Sang Han
2016-12-01
Alzheimer's disease (AD) is usually diagnosed by clinicians through cognitive and functional performance tests, with a potential risk of misdiagnosis. Since the progression of AD is known to cause structural changes in the corpus callosum (CC), the CC thickness can be used as a functional covariate in the AD classification problem for a diagnosis. However, misclassified class labels negatively impact the classification performance. Motivated by AD-CC association studies, we propose a logistic regression for functional data classification that is robust to misdiagnosis or label noise. Specifically, our logistic regression model is constructed by adding individual intercepts to the functional logistic regression model. This approach makes it possible to indicate which observations are possibly mislabeled and also leads to a robust and efficient classifier. An effective algorithm based on the MM algorithm provides simple closed-form update formulas. We test our method using synthetic datasets to demonstrate its superiority over an existing method, and apply it to differentiating patients with AD from healthy normals based on CC from MRI. © 2016, The International Biometric Society.
Wolters, Mark A; Dean, C B
2017-01-01
Remote sensing images from Earth-orbiting satellites are a potentially rich data source for monitoring and cataloguing atmospheric health hazards that cover large geographic regions. A method is proposed for classifying such images into hazard and nonhazard regions using the autologistic regression model, which may be viewed as a spatial extension of logistic regression. The method includes a novel and simple approach to parameter estimation that makes it well suited to handling the large and high-dimensional datasets arising from satellite-borne instruments. The methodology is demonstrated on both simulated images and a real application to the identification of forest fire smoke.
NASA Astrophysics Data System (ADS)
Cary, Theodore W.; Cwanger, Alyssa; Venkatesh, Santosh S.; Conant, Emily F.; Sehgal, Chandra M.
2012-03-01
This study compares the performance of two proven but very different machine learners, Naïve Bayes and logistic regression, for differentiating malignant and benign breast masses using ultrasound imaging. Ultrasound images of 266 masses were analyzed quantitatively for shape, echogenicity, margin characteristics, and texture features. These features along with patient age, race, and mammographic BI-RADS category were used to train Naïve Bayes and logistic regression classifiers to diagnose lesions as malignant or benign. ROC analysis was performed using all of the features and using only a subset that maximized information gain. Performance was determined by the area under the ROC curve, Az, obtained from leave-one-out cross validation. Naïve Bayes showed significant variation (Az 0.733 +/- 0.035 to 0.840 +/- 0.029, P < 0.002) with the choice of features, but the performance of logistic regression was relatively unchanged under feature selection (Az 0.839 +/- 0.029 to 0.859 +/- 0.028, P = 0.605). Out of 34 features, a subset of 6 gave the highest information gain: brightness difference, margin sharpness, depth-to-width, mammographic BI-RADs, age, and race. The probabilities of malignancy determined by Naïve Bayes and logistic regression after feature selection showed significant correlation (R2= 0.87, P < 0.0001). The diagnostic performance of Naïve Bayes and logistic regression can be comparable, but logistic regression is more robust. Since probability of malignancy cannot be measured directly, high correlation between the probabilities derived from two basic but dissimilar models increases confidence in the predictive power of machine learning models for characterizing solid breast masses on ultrasound.
NASA Technical Reports Server (NTRS)
Smith, Kelly M.; Gay, Robert S.; Stachowiak, Susan J.
2013-01-01
In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.
NASA Technical Reports Server (NTRS)
Smith, Kelly M.; Gay, Robert S.; Stachowiak, Susan J.
2013-01-01
In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter. In order to increase overall robustness, the vehicle also has an alternate method of triggering the drogue parachute deployment based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this velocity-based trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers excellent performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.
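The workflow described here, fitting a logistic regression classifier on the ground and then evaluating it in real time, might look like the sketch below. Everything in it is an assumption made for illustration: the single normalised velocity feature, the noise level, the synthetic labels and the 0.5 decision threshold are hypothetical and do not come from the Orion analysis. The point is only that the in-flight part reduces to a dot product, a sigmoid and a comparison.

```python
import math
import random


def sigmoid(z):
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(samples, labels, learning_rate=0.5, epochs=500):
    """Ground processing: fit weights by batch gradient descent on the log-loss."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def should_trigger(measurement, weights, bias, threshold=0.5):
    """In-flight evaluation: one dot product, one sigmoid, one comparison."""
    p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, measurement)))
    return p >= threshold


# Synthetic pre-flight data (hypothetical): deployment is desired once the true
# normalised velocity drops below zero, but only a noisy measurement is seen.
rng = random.Random(0)
samples, labels = [], []
for _ in range(400):
    true_velocity = rng.uniform(-1.0, 1.0)
    measured = true_velocity + rng.gauss(0.0, 0.2)
    samples.append([measured])
    labels.append(1 if true_velocity < 0.0 else 0)

weights, bias = fit_logistic(samples, labels)
```

Calling `should_trigger([-0.8], weights, bias)` on a clearly low measurement fires the trigger, while a clearly high one such as `[0.8]` does not; only the fitted `weights` and `bias` would need to be carried on board.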
Data mining: Potential applications in research on nutrition and health.
Batterham, Marijka; Neale, Elizabeth; Martin, Allison; Tapsell, Linda
2017-02-01
Data mining enables further insights from nutrition-related research, but caution is required. The aim of this analysis was to demonstrate and compare the utility of data mining methods in classifying a categorical outcome derived from a nutrition-related intervention. Baseline data (23 variables, 8 categorical) on participants (n = 295) in an intervention trial were used to classify participants in terms of meeting the criteria of achieving 10 000 steps per day. Results from classification and regression trees (CARTs), random forests, adaptive boosting, logistic regression, support vector machines and neural networks were compared using area under the curve (AUC) and error assessments. The CART produced the best model when considering the AUC (0.703), overall error (18%) and within class error (28%). Logistic regression also performed reasonably well compared to the other models (AUC 0.675, overall error 23%, within class error 36%). All the methods gave different rankings of variables' importance. CART found that body fat, quality of life using the SF-12 Physical Component Summary (PCS) and the cholesterol: HDL ratio were the most important predictors of meeting the 10 000 steps criteria, while logistic regression showed the SF-12PCS, glucose levels and level of education to be the most significant predictors (P ≤ 0.01). Differing outcomes suggest caution is required with a single data mining method, particularly in a dataset with nonlinear relationships and outliers and when exploring relationships that were not the primary outcomes of the research. © 2017 Dietitians Association of Australia.
Characterization and machine learning prediction of allele-specific DNA methylation.
He, Jianlin; Sun, Ming-an; Wang, Zhong; Wang, Qianfei; Li, Qing; Xie, Hehuang
2015-12-01
A large collection of Single Nucleotide Polymorphisms (SNPs) has been identified in the human genome. Currently, the epigenetic influences of SNPs on their neighboring CpG sites remain elusive. A growing body of evidence suggests that locus-specific information, including genomic features and local epigenetic state, may play important roles in the epigenetic readout of SNPs. In this study, we made use of mouse methylomes with known SNPs to develop statistical models for the prediction of SNP associated allele-specific DNA methylation (ASM). ASM has been classified into parent-of-origin dependent ASM (P-ASM) and sequence-dependent ASM (S-ASM), which comprises scattered-S-ASM (sS-ASM) and clustered-S-ASM (cS-ASM). We found that P-ASM and cS-ASM CpG sites are both enriched in CpG rich regions, promoters and exons, while sS-ASM CpG sites are enriched in simple repeat and regions with high frequent SNP occurrence. Using Lasso-grouped Logistic Regression (LGLR), we selected 21 out of 282 genomic and methylation related features that are powerful in distinguishing cS-ASM CpG sites and trained the classifiers with machine learning techniques. Based on 5-fold cross-validation, the logistic regression classifier was found to be the best for cS-ASM prediction with an ACC of 0.77, an AUC of 0.84 and an MCC of 0.54. Lastly, we applied the logistic regression classifier on human brain methylome and predicted 608 genes associated with cS-ASM. Gene ontology term enrichment analysis indicated that these cS-ASM associated genes are significantly enriched in the category coding for transcripts with alternative splicing forms. In summary, this study provided an analytical procedure for cS-ASM prediction and shed new light on the understanding of different types of ASM events. Published by Elsevier Inc.
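Two of the metrics quoted in this abstract, accuracy (ACC) and the Matthews correlation coefficient (MCC), are both functions of the confusion matrix. The sketch below shows the standard formulas; the counts are invented for illustration and are not the study's results.

```python
import math


def accuracy(tp, fp, tn, fn):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical confusion-matrix counts for a binary classifier.
tp, fp, tn, fn = 40, 10, 35, 15
acc = accuracy(tp, fp, tn, fn)
mcc = matthews_corrcoef(tp, fp, tn, fn)
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which is why it is often reported alongside ACC and AUC when the two classes are of unequal size.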
Assessing Lake Trophic Status: A Proportional Odds Logistic Regression Model
Lake trophic state classifications are good predictors of ecosystem condition and are indicative of both ecosystem services (e.g., recreation and aesthetics), and disservices (e.g., harmful algal blooms). Methods for classifying trophic state are based off the foundational work o...
Blood Based Biomarkers of Early Onset Breast Cancer
2016-12-01
discretizes the data, and also using logistic elastic net – a form of linear regression - we were unable to build a classifier that could accurately...classifier for differentiating cases from controls off discretized data. The first pass analysis demonstrated a 35 gene signature that differentiated...to the discretized data for mRNA gene signature, the samples used to “train” were also included in the final samples used to “test” the algorithm
A Comparison of Methods for Detecting Differential Distractor Functioning
ERIC Educational Resources Information Center
Koon, Sharon
2010-01-01
This study examined the effectiveness of the odds-ratio method (Penfield, 2008) and the multinomial logistic regression method (Kato, Moen, & Thurlow, 2009) for measuring differential distractor functioning (DDF) effects in comparison to the standardized distractor analysis approach (Schmitt & Bleistein, 1987). Students classified as participating…
Williams, Jennifer A.; Schmitter-Edgecombe, Maureen; Cook, Diane J.
2016-01-01
Introduction Reducing the amount of testing required to accurately detect cognitive impairment is clinically relevant. The aim of this research was to determine the fewest number of clinical measures required to accurately classify participants as healthy older adult, mild cognitive impairment (MCI) or dementia using a suite of classification techniques. Methods Two variable selection machine learning models (i.e., naive Bayes, decision tree), a logistic regression, and two participant datasets (i.e., clinical diagnosis, clinical dementia rating; CDR) were explored. Participants classified using clinical diagnosis criteria included 52 individuals with dementia, 97 with MCI, and 161 cognitively healthy older adults. Participants classified using CDR included 154 individuals with CDR = 0, 93 individuals with CDR = 0.5, and 25 individuals with CDR = 1.0+. Twenty-seven demographic, psychological, and neuropsychological variables were available for variable selection. Results No significant difference was observed between naive Bayes, decision tree, and logistic regression models for classification of both clinical diagnosis and CDR datasets. Participant classification (70.0 – 99.1%), geometric mean (60.9 – 98.1%), sensitivity (44.2 – 100%), and specificity (52.7 – 100%) were generally satisfactory. Unsurprisingly, the MCI/CDR = 0.5 participant group was the most challenging to classify. Through variable selection only 2 – 9 variables were required for classification and varied between datasets in a clinically meaningful way. Conclusions The current study results reveal that machine learning techniques can accurately classify cognitive impairment and reduce the number of measures required for diagnosis. PMID:26332171
Miguel-Hurtado, Oscar; Guest, Richard; Stevenage, Sarah V; Neil, Greg J; Black, Sue
2016-01-01
Understanding the relationship between physiological measurements from human subjects and their demographic data is important within both the biometric and forensic domains. In this paper we explore the relationship between measurements of the human hand and a range of demographic features. We assess the ability of linear regression and machine learning classifiers to predict demographics from hand features, thereby providing evidence on both the strength of relationship and the key features underpinning this relationship. Our results show that we are able to predict sex, height, weight and foot size accurately within various data-range bin sizes, with machine learning classification algorithms out-performing linear regression in most situations. In addition, we identify the features used to provide these relationships applicable across multiple applications. PMID:27806075
Online breakage detection of multitooth tools using classifier ensembles for imbalanced data
NASA Astrophysics Data System (ADS)
Bustillo, Andrés; Rodríguez, Juan J.
2014-12-01
Cutting tool breakage detection is an important task, due to its economic impact on mass production lines in the automobile industry. This task presents a central limitation: real data-sets are extremely imbalanced because breakage occurs in very few cases compared with normal operation of the cutting process. In this paper, we present an analysis of different data-mining techniques applied to the detection of insert breakage in multitooth tools. The analysis applies only one experimental variable: the electrical power consumption of the tool drive. This restriction profiles real industrial conditions more accurately than other physical variables, such as acoustic or vibration signals, which are not so easily measured. Many efforts have been made to design a method that is able to identify breakages with a high degree of reliability within a short period of time. The solution is based on classifier ensembles for imbalanced data-sets. Classifier ensembles are combinations of classifiers, which in many situations are more accurate than individual classifiers. Six different base classifiers are tested: Decision Trees, Rules, Naïve Bayes, Nearest Neighbour, Multilayer Perceptrons and Logistic Regression. Three different balancing strategies are tested with each of the classifier ensembles and compared to their performance with the original data-set: Synthetic Minority Over-Sampling Technique (SMOTE), undersampling and a combination of SMOTE and undersampling. To identify the most suitable data-mining solution, Receiver Operating Characteristics (ROC) graph and Recall-precision graph are generated and discussed. The performance of logistic regression ensembles on the balanced data-set using the combination of SMOTE and undersampling turned out to be the most suitable technique. 
Finally a comparison using industrial performance measures is presented, which concludes that this technique is also more suited to this industrial problem than the other techniques presented in the bibliography.
Kabeshova, A; Annweiler, C; Fantino, B; Philip, T; Gromov, V A; Launay, C P; Beauchet, O
2014-06-01
Regression tree (RT) analyses are particularly adapted to explore the risk of recurrent falling according to various combinations of fall risk factors compared to logistic regression models. The aims of this study were (1) to determine which combinations of fall risk factors were associated with the occurrence of recurrent falls in older community-dwellers, and (2) to compare the efficacy of RT and multiple logistic regression model for the identification of recurrent falls. A total of 1,760 community-dwelling volunteers (mean age ± standard deviation, 71.0 ± 5.1 years; 49.4 % female) were recruited prospectively in this cross-sectional study. Age, gender, polypharmacy, use of psychoactive drugs, fear of falling (FOF), cognitive disorders and sad mood were recorded. In addition, the history of falls within the past year was recorded using a standardized questionnaire. Among 1,760 participants, 19.7 % (n = 346) were recurrent fallers. The RT identified 14 nodes groups and 8 end nodes with FOF as the first major split. Among participants with FOF, those who had sad mood and polypharmacy formed the end node with the greatest OR for recurrent falls (OR = 6.06 with p < 0.001). Among participants without FOF, those who were male and not sad had the lowest OR for recurrent falls (OR = 0.25 with p < 0.001). The RT correctly classified 1,356 from 1,414 non-recurrent fallers (specificity = 95.6 %), and 65 from 346 recurrent fallers (sensitivity = 18.8 %). The overall classification accuracy was 81.0 %. The multiple logistic regression correctly classified 1,372 from 1,414 non-recurrent fallers (specificity = 97.0 %), and 61 from 346 recurrent fallers (sensitivity = 17.6 %). The overall classification accuracy was 81.4 %. Our results show that RT may identify specific combinations of risk factors for recurrent falls, the combination most associated with recurrent falls involving FOF, sad mood and polypharmacy. 
The FOF emerged as the risk factor strongly associated with recurrent falls. In addition, RT and multiple logistic regression were not sensitive enough to identify the majority of recurrent fallers but appeared efficient in detecting individuals not at risk of recurrent falls.
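The rates reported for the multiple logistic regression follow directly from the counts given in the abstract. A minimal Python sketch of that arithmetic is shown here; the function and variable names are chosen for illustration and do not come from the study.

```python
def classification_rates(true_neg, total_neg, true_pos, total_pos):
    """Return (specificity, sensitivity, accuracy) as fractions.

    true_neg / total_neg: correctly classified negatives and all negatives.
    true_pos / total_pos: correctly classified positives and all positives.
    """
    specificity = true_neg / total_neg
    sensitivity = true_pos / total_pos
    accuracy = (true_neg + true_pos) / (total_neg + total_pos)
    return specificity, sensitivity, accuracy


# Counts reported for the multiple logistic regression model:
# 1,372 of 1,414 non-recurrent fallers and 61 of 346 recurrent fallers.
spec, sens, acc = classification_rates(1372, 1414, 61, 346)
print(f"specificity={spec:.1%} sensitivity={sens:.1%} accuracy={acc:.1%}")
# prints: specificity=97.0% sensitivity=17.6% accuracy=81.4%
```

The high specificity combined with low sensitivity is what the abstract summarises as being efficient at detecting individuals not at risk while missing most recurrent fallers.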
glmnetLRC f/k/a lrc package: Logistic Regression Classification
DOE Office of Scientific and Technical Information (OSTI.GOV)
2016-06-09
Methods for fitting and predicting logistic regression classifiers (LRC) with an arbitrary loss function using elastic net or best subsets. This package adds additional model fitting features to the existing glmnet and bestglm R packages. This package was created to perform the analyses described in Amidan BG, Orton DJ, LaMarche BL, et al. 2014. Signatures for Mass Spectrometry Data Quality. Journal of Proteome Research. 13(4), 2215-2222. It makes the model fitting available in the glmnet and bestglm packages more general by identifying optimal model parameters via cross validation with a customizable loss function. It also identifies the optimal threshold for binary classification.
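The idea of choosing a classification threshold under a customizable loss can be sketched in a few lines. The Python below is only an illustration of the general technique, not the glmnetLRC API (which is an R package). The function names, the cost parameters and the toy data are all assumptions, and the cross-validation loop that the package wraps around this step is omitted.

```python
def expected_loss(probs, labels, threshold, fp_cost=1.0, fn_cost=1.0):
    """Average loss when predicting 1 for every probability >= threshold.

    fp_cost and fn_cost are hypothetical per-error costs supplied by the user.
    """
    loss = 0.0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 0:
            loss += fp_cost      # false positive
        elif pred == 0 and y == 1:
            loss += fn_cost      # false negative
    return loss / len(labels)


def best_threshold(probs, labels, candidates, **costs):
    """Return the candidate threshold with the smallest average loss."""
    return min(candidates, key=lambda t: expected_loss(probs, labels, t, **costs))


# Toy predicted probabilities and true labels (made up for illustration).
probs = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
labels = [0, 0, 1, 1, 0, 1]
candidates = [i / 10 for i in range(1, 10)]

# Penalising false negatives five times as heavily pushes the threshold down.
t = best_threshold(probs, labels, candidates, fp_cost=1.0, fn_cost=5.0)
print(t)
```

In practice the probabilities would come from held-out folds, so that the threshold is tuned on data the model has not seen.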
Microhabitat analysis using radiotelemetry locations and polytomous logistic regression
Malcolm P. North; Joel H. Reynolds
1996-01-01
Microhabitat analyses often use discriminant function analysis (DFA) to compare vegetation structures or environmental conditions between sites classified by a study animal's presence or absence. These presence/absence studies make questionable assumptions about the habitat value of the comparison sites and the microhabitat data often violate the DFA's...
Improving ensemble decision tree performance using Adaboost and Bagging
NASA Astrophysics Data System (ADS)
Hasan, Md. Rajib; Siraj, Fadzilah; Sainin, Mohd Shamrie
2015-12-01
Ensemble classifier systems are considered among the most promising in medical data classification, and the performance of a decision tree classifier can be increased by the ensemble method, as it is proven to be better than single classifiers. However, in an ensemble setting the performance depends on the selection of a suitable base classifier. This research employed two prominent ensembles, namely Adaboost and Bagging, with base classifiers such as Random Forest, Random Tree, j48, j48grafts and Logistic Model Regression (LMT) that have been selected independently. The empirical study shows that the performance varies when different base classifiers are selected, and in some places overfitting issues were also noted. The evidence shows that ensemble decision tree classifiers using Adaboost and Bagging improve the performance on selected medical data sets.
Use of generalized ordered logistic regression for the analysis of multidrug resistance data.
Agga, Getahun E; Scott, H Morgan
2015-10-01
Statistical analysis of antimicrobial resistance data largely focuses on individual antimicrobial's binary outcome (susceptible or resistant). However, bacteria are becoming increasingly multidrug resistant (MDR). Statistical analysis of MDR data is mostly descriptive often with tabular or graphical presentations. Here we report the applicability of generalized ordinal logistic regression model for the analysis of MDR data. A total of 1,152 Escherichia coli, isolated from the feces of weaned pigs experimentally supplemented with chlortetracycline (CTC) and copper, were tested for susceptibilities against 15 antimicrobials and were binary classified into resistant or susceptible. The 15 antimicrobial agents tested were grouped into eight different antimicrobial classes. We defined MDR as the number of antimicrobial classes to which E. coli isolates were resistant ranging from 0 to 8. Proportionality of the odds assumption of the ordinal logistic regression model was violated only for the effect of treatment period (pre-treatment, during-treatment and post-treatment); but not for the effect of CTC or copper supplementation. Subsequently, a partially constrained generalized ordinal logistic model was built that allows for the effect of treatment period to vary while constraining the effects of treatment (CTC and copper supplementation) to be constant across the levels of MDR classes. Copper (Proportional Odds Ratio [Prop OR]=1.03; 95% CI=0.73-1.47) and CTC (Prop OR=1.1; 95% CI=0.78-1.56) supplementation were not significantly associated with the level of MDR adjusted for the effect of treatment period. MDR generally declined over the trial period. In conclusion, generalized ordered logistic regression can be used for the analysis of ordinal data such as MDR data when the proportionality assumptions for ordered logistic regression are violated. Published by Elsevier B.V.
Applying machine-learning techniques to Twitter data for automatic hazard-event classification.
NASA Astrophysics Data System (ADS)
Filgueira, R.; Bee, E. J.; Diaz-Doce, D.; Poole, J., Sr.; Singh, A.
2017-12-01
The constant flow of information offered by tweets provides valuable information about all sorts of events at a high temporal and spatial resolution. Over the past year we have been analyzing in real time geological hazards/phenomena, such as earthquakes, volcanic eruptions, landslides, floods or the aurora, as part of the GeoSocial project, by geo-locating tweets filtered by keywords in a web-map. However, not all the filtered tweets are related to hazard/phenomenon events. This work explores two classification techniques for automatic hazard-event categorization based on tweets about the "Aurora". First, tweets were filtered using aurora-related keywords, removing stop words and selecting the ones written in English. For classifying the remaining tweets into "aurora-event" or "no-aurora-event" categories, we compared two state-of-the-art techniques: Support Vector Machine (SVM) and Deep Convolutional Neural Networks (CNN) algorithms. Both approaches belong to the family of supervised learning algorithms, which make predictions based on a labelled training dataset. Therefore, we created a training dataset by tagging 1200 tweets between both categories. The general form of SVM is used to separate two classes by a function (kernel). We compared the performance of four different kernels (Linear Regression, Logistic Regression, Multinomial Naïve Bayesian and Stochastic Gradient Descent) provided by the Scikit-Learn library using our training dataset to build the SVM classifier. The results showed that the Logistic Regression (LR) gets the best accuracy (87%). So, we selected the SVM-LR classifier to categorise a large collection of tweets using the "dispel4py" framework. Later, we developed a CNN classifier, where the first layer embeds words into low-dimensional vectors. The next layer performs convolutions over the embedded word vectors. Results from the convolutional layer are max-pooled into a long feature vector, which is classified using a softmax layer.
The CNN's accuracy is lower (83%) than the SVM-LR, since the algorithm needs a bigger training dataset to increase its accuracy. We used the TensorFlow framework for applying the CNN classifier to the same collection of tweets. In future we will modify both classifiers to work with other geo-hazards, use larger training datasets and apply them in real time.
Molecular Signature for Lymphatic Invasion Associated with Survival of Epithelial Ovarian Cancer.
Paik, E Sun; Choi, Hyun Jin; Kim, Tae-Joong; Lee, Jeong-Won; Kim, Byoung-Gie; Bae, Duk-Soo; Choi, Chel Hun
2018-04-01
We aimed to develop a molecular classifier that can predict lymphatic invasion and to assess its clinical significance in epithelial ovarian cancer (EOC) patients. We analyzed gene expression (mRNA, methylated DNA) in data from The Cancer Genome Atlas. To identify molecular signatures for lymphatic invasion, we found differentially expressed genes. The performance of the classifier was validated by receiver operating characteristics analysis, logistic regression, linear discriminant analysis (LDA), and support vector machine (SVM). We assessed the prognostic role of the classifier using a random survival forest (RSF) model and pathway deregulation score (PDS). For external validation, we analyzed microarray data from 26 EOC samples of Samsung Medical Center and the curatedOvarianData database. We identified 21 mRNAs and seven methylated DNAs from primary EOC tissues that predicted lymphatic invasion and created prognostic models. The classifier predicted lymphatic invasion well, which was validated by logistic regression, LDA, and SVM algorithm (C-index of 0.90, 0.71, and 0.74 for mRNA and C-index of 0.64, 0.68, and 0.69 for DNA methylation). Using the RSF model, incorporating molecular data with clinical variables improved prediction of progression-free survival compared with using only clinical variables (p < 0.001 and p=0.008). Similarly, PDS enabled us to classify patients into high-risk and low-risk groups, which resulted in a survival difference in mRNA profiles (log-rank p-value=0.011). In external validation, the gene signature was well correlated with prediction of lymphatic invasion and patients' survival. The molecular signature model predicting lymphatic invasion performed well and was also associated with survival of EOC patients.
Rashid, Nasir; Iqbal, Javaid; Javed, Amna; Tiwana, Mohsin I; Khan, Umar Shahbaz
2018-01-01
Brain Computer Interface (BCI) determines the intent of the user from a variety of electrophysiological signals. These signals, Slow Cortical Potentials, are recorded from scalp, and cortical neuronal activity is recorded by implanted electrodes. This paper is focused on design of an embedded system that is used to control the finger movements of an upper limb prosthesis using Electroencephalogram (EEG) signals. This is a follow-up of our previous research which explored the best method to classify three movements of fingers (thumb movement, index finger movement, and fist movement). Two-stage logistic regression classifier exhibited the highest classification accuracy while Power Spectral Density (PSD) was used as a feature of the filtered signal. The EEG signal data set was recorded using a 14-channel electrode headset (a noninvasive BCI system) from right-handed, neurologically intact volunteers. Mu (commonly known as alpha waves) and Beta Rhythms (8-30 Hz) containing most of the movement data were retained through filtering using "Arduino Uno" microcontroller followed by 2-stage logistic regression to obtain a mean classification accuracy of 70%.
Variational dynamic background model for keyword spotting in handwritten documents
NASA Astrophysics Data System (ADS)
Kumar, Gaurav; Wshah, Safwan; Govindaraju, Venu
2013-12-01
We propose a bayesian framework for keyword spotting in handwritten documents. This work is an extension to our previous work where we proposed dynamic background model, DBM for keyword spotting that takes into account the local character level scores and global word level scores to learn a logistic regression classifier to separate keywords from non-keywords. In this work, we add a bayesian layer on top of the DBM called the variational dynamic background model, VDBM. The logistic regression classifier uses the sigmoid function to separate keywords from non-keywords. The sigmoid function being neither convex nor concave, exact inference of VDBM becomes intractable. An expectation maximization step is proposed to do approximate inference. The advantage of VDBM over the DBM is multi-fold. Firstly, being bayesian, it prevents over-fitting of data. Secondly, it provides better modeling of data and an improved prediction of unseen data. VDBM is evaluated on the IAM dataset and the results prove that it outperforms our prior work and other state of the art line based word spotting system.
New machine-learning algorithms for prediction of Parkinson's disease
NASA Astrophysics Data System (ADS)
Mandal, Indrajit; Sairam, N.
2014-03-01
This article presents an enhanced prediction accuracy of diagnosis of Parkinson's disease (PD) to prevent the delay and misdiagnosis of patients using the proposed robust inference system. New machine-learning methods are proposed and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The robust methods of treating Parkinson's disease (PD) includes sparse multinomial logistic regression, rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, boosting methods. A new ensemble method comprising of the Bayesian network optimised by Tabu search algorithm as classifier and Haar wavelets as projection filter is used for relevant feature selection and ranking. The highest accuracy obtained by linear logistic regression and sparse multinomial logistic regression is 100% and sensitivity, specificity of 0.983 and 0.996, respectively. All the experiments are conducted over 95% and 99% confidence levels and establish the results with corrected t-tests. This work shows a high degree of advancement in software reliability and quality of the computer-aided diagnosis system and experimentally shows best results with supportive statistical inference.
NASA Astrophysics Data System (ADS)
Oh, Hyun-Joo; Lee, Saro; Chotikasathien, Wisut; Kim, Chang Hwan; Kwon, Ju Hyoung
2009-04-01
For predictive landslide susceptibility mapping, this study applied and verified probability model, the frequency ratio and statistical model, logistic regression at Pechabun, Thailand, using a geographic information system (GIS) and remote sensing. Landslide locations were identified in the study area from interpretation of aerial photographs and field surveys, and maps of the topography, geology and land cover were constructed to spatial database. The factors that influence landslide occurrence, such as slope gradient, slope aspect and curvature of topography and distance from drainage were calculated from the topographic database. Lithology and distance from fault were extracted and calculated from the geology database. Land cover was classified from Landsat TM satellite image. The frequency ratio and logistic regression coefficient were overlaid for landslide susceptibility mapping as each factor’s ratings. Then the landslide susceptibility map was verified and compared using the existing landslide location. As the verification results, the frequency ratio model showed 76.39% and logistic regression model showed 70.42% in prediction accuracy. The method can be used to reduce hazards associated with landslides and to plan land cover.
Kupek, Emil
2006-03-15
Structural equation modelling (SEM) has been increasingly used in medical statistics for solving a system of related regression equations. However, a great obstacle for its wider use has been its difficulty in handling categorical variables within the framework of generalised linear models. A large data set with a known structure among two related outcomes and three independent variables was generated to investigate the use of Yule's transformation of odds ratio (OR) into Q-metric by (OR-1)/(OR+1) to approximate Pearson's correlation coefficients between binary variables whose covariance structure can be further analysed by SEM. Percent of correctly classified events and non-events was compared with the classification obtained by logistic regression. The performance of SEM based on Q-metric was also checked on a small (N = 100) random sample of the data generated and on a real data set. SEM successfully recovered the generated model structure. SEM of real data suggested a significant influence of a latent confounding variable which would have not been detectable by standard logistic regression. SEM classification performance was broadly similar to that of the logistic regression. The analysis of binary data can be greatly enhanced by Yule's transformation of odds ratios into estimated correlation matrix that can be further analysed by SEM. The interpretation of results is aided by expressing them as odds ratios which are the most frequently used measure of effect in medical statistics.
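Yule's transformation as stated in the abstract, Q = (OR-1)/(OR+1), maps an odds ratio onto the interval from -1 to 1 so that it can stand in for a correlation coefficient. A minimal Python sketch follows. The helper for computing the odds ratio from a 2x2 table and the example counts are illustrative assumptions, not data from the study.

```python
def odds_ratio_from_table(a, b, c, d):
    """Odds ratio of a 2x2 table laid out as [[a, b], [c, d]].

    Assumes all four cell counts are positive.
    """
    return (a * d) / (b * c)


def yule_q(odds_ratio):
    """Yule's Q-metric: (OR - 1) / (OR + 1)."""
    if odds_ratio < 0:
        raise ValueError("an odds ratio cannot be negative")
    return (odds_ratio - 1) / (odds_ratio + 1)


# Hypothetical 2x2 table of two binary variables.
or_value = odds_ratio_from_table(30, 10, 10, 30)   # (30*30)/(10*10) = 9.0
print(yule_q(or_value))   # 0.8
print(yule_q(1.0))        # 0.0, i.e. no association
```

An odds ratio of 1 gives Q = 0, and reciprocal odds ratios give Q values of equal size and opposite sign, which is what makes the metric behave like a correlation.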
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huang, W; Tu, S
Purpose: We conducted a retrospective study of Radiomics research for classifying malignancy of small pulmonary nodules. A machine learning algorithm of logistic regression and open research platform of Radiomics, IBEX (Imaging Biomarker Explorer), were used to evaluate the classification accuracy. Methods: The training set included 100 CT image series from cancer patients with small pulmonary nodules where the average diameter is 1.10 cm. These patients registered at Chang Gung Memorial Hospital and received a CT-guided operation of lung cancer lobectomy. The specimens were classified by experienced pathologists with a B (benign) or M (malignant). CT images with slice thickness of 0.625 mm were acquired from a GE BrightSpeed 16 scanner. The study was formally approved by our institutional internal review board. Nodules were delineated and 374 feature parameters were extracted from IBEX. We first used the t-test and p-value criteria to study which feature can differentiate between group B and M. Then we implemented a logistic regression algorithm to perform nodule malignancy classification. 10-fold cross-validation and the receiver operating characteristic curve (ROC) were used to evaluate the classification accuracy. Finally hierarchical clustering analysis, Spearman rank correlation coefficient, and clustering heat map were used to further study correlation characteristics among different features. Results: 238 features were found differentiable between group B and M based on whether their statistical p-values were less than 0.05. A forward search algorithm was used to select an optimal combination of features for the best classification and 9 features were identified. Our study found the best accuracy of classifying malignancy was 0.79±0.01 with the 10-fold cross-validation. The area under the ROC curve was 0.81±0.02. Conclusion: Benign nodules may be treated as a malignant tumor in low-dose CT and patients may undergo unnecessary surgeries or treatments.
Our study may help radiologists to differentiate nodule malignancy for low-dose CT.
Product unit neural network models for predicting the growth limits of Listeria monocytogenes.
Valero, A; Hervás, C; García-Gimeno, R M; Zurera, G
2007-08-01
A new approach to predict the growth/no growth interface of Listeria monocytogenes as a function of storage temperature, pH, citric acid (CA) and ascorbic acid (AA) is presented. A linear logistic regression procedure was performed and a non-linear model was obtained by adding new variables by means of a Neural Network model based on Product Units (PUNN). The classification efficiency of the training data set and the generalization data of the new Logistic Regression PUNN model (LRPU) were compared with Linear Logistic Regression (LLR) and Polynomial Logistic Regression (PLR) models. 92% of the total cases from the LRPU model were correctly classified, an improvement on the percentage obtained using the PLR model (90%) and significantly higher than the results obtained with the LLR model, 80%. In addition, predictions of LRPU were closer to the observed data, which permits the design of proper formulations in minimally processed foods. This novel methodology can be applied to predictive microbiology for describing the growth/no growth interface of food-borne microorganisms such as L. monocytogenes. The optimal balance lies in finding models with an acceptable interpretation capacity and with good ability to fit the data on the boundaries of the variable range. The results obtained suggest that these kinds of models might well be a very valuable tool for mathematical modeling.
Lacagnina, Valerio; Leto-Barone, Maria S; La Piana, Simona; Seidita, Aurelio; Pingitore, Giuseppe; Di Lorenzo, Gabriele
2014-01-01
This article uses the logistic regression model for diagnostic decision making in patients with chronic nasal symptoms. We studied the ability of the logistic regression model, obtained by the evaluation of a database, to detect patients with positive allergy skin-prick test (SPT) and patients with negative SPT. The model developed was validated using the data set obtained from another medical institution. The analysis was performed using a database obtained from a questionnaire administered to the patients with nasal symptoms containing personal data, clinical data, and results of allergy testing (SPT). All variables found to be significantly different between patients with positive and negative SPT (p < 0.05) were selected for the logistic regression models and were analyzed with backward stepwise logistic regression, evaluated with area under the curve of the receiver operating characteristic curve. A second set of patients from another institution was used to prove the model. The accuracy of the model in identifying, over the second set, both patients whose SPT will be positive and negative was high. The model detected 96% of patients with nasal symptoms and positive SPT and classified 94% of those with negative SPT. This study is preliminary to the creation of a software that could help the primary care doctors in a diagnostic decision making process (need of allergy testing) in patients complaining of chronic nasal symptoms.
NASA Astrophysics Data System (ADS)
Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.
2013-02-01
Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R2 = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19. 
The results from GWR indicated a significant spatial variation in the local parameter estimates for all the variables and an important reduction of the autocorrelation in the residuals of the GW linear model. Despite the fitting improvement of local models, GW regression, more than an alternative to "global" or traditional regression modelling, seems to be a valuable complement to explore the non-stationary relationships between the response variable and the explanatory variables. The synergy of global and local modelling provides insights into fire management and policy and helps further our understanding of the fire problem over large areas while at the same time recognizing its local character.
Schell, Greggory J; Lavieri, Mariel S; Stein, Joshua D; Musch, David C
2013-12-21
Open-angle glaucoma (OAG) is a prevalent, degenerate ocular disease which can lead to blindness without proper clinical management. The tests used to assess disease progression are susceptible to process and measurement noise. The aim of this study was to develop a methodology which accounts for the inherent noise in the data and improve significant disease progression identification. Longitudinal observations from the Collaborative Initial Glaucoma Treatment Study (CIGTS) were used to parameterize and validate a Kalman filter model and logistic regression function. The Kalman filter estimates the true value of biomarkers associated with OAG and forecasts future values of these variables. We develop two logistic regression models via generalized estimating equations (GEE) for calculating the probability of experiencing significant OAG progression: one model based on the raw measurements from CIGTS and another model based on the Kalman filter estimates of the CIGTS data. Receiver operating characteristic (ROC) curves and associated area under the ROC curve (AUC) estimates are calculated using cross-fold validation. The logistic regression model developed using Kalman filter estimates as data input achieves higher sensitivity and specificity than the model developed using raw measurements. The mean AUC for the Kalman filter-based model is 0.961 while the mean AUC for the raw measurements model is 0.889. Hence, using the probability function generated via Kalman filter estimates and GEE for logistic regression, we are able to more accurately classify patients and instances as experiencing significant OAG progression. A Kalman filter approach for estimating the true value of OAG biomarkers resulted in data input which improved the accuracy of a logistic regression classification model compared to a model using raw measurements as input. 
This methodology accounts for process and measurement noise to enable improved discrimination between progression and nonprogression in chronic diseases.
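The way a Kalman filter separates a slowly changing true value from noisy measurements can be illustrated with a one-dimensional sketch. The Python below assumes a simple random-walk state model with scalar process and measurement variances. It is not the model fitted to the CIGTS data, and every name and parameter value is hypothetical.

```python
def kalman_1d(measurements, process_var, meas_var, x0, p0):
    """Scalar Kalman filter for a random-walk state.

    x0, p0: initial state estimate and its variance (assumed known).
    process_var, meas_var: assumed process and measurement noise variances.
    Returns the filtered estimate after each measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: a random walk keeps the mean and inflates the variance.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + meas_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


# Made-up noisy readings of a biomarker whose true value is near 10.
readings = [10.8, 9.5, 10.3, 9.9, 10.6]
smoothed = kalman_1d(readings, process_var=0.01, meas_var=1.0, x0=10.0, p0=1.0)
print(smoothed)
```

The filtered estimates, rather than the raw readings, would then serve as inputs to a downstream classifier such as the logistic regression described above.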
NASA Astrophysics Data System (ADS)
Duman, T. Y.; Can, T.; Gokceoglu, C.; Nefeslioglu, H. A.; Sonmez, H.
2006-11-01
As a result of industrialization, throughout the world, cities have been growing rapidly for the last century. One typical example of these growing cities is Istanbul, the population of which is over 10 million. Due to rapid urbanization, new areas suitable for settlement and engineering structures are necessary. The Cekmece area located west of the Istanbul metropolitan area is studied, because the landslide activity is extensive in this area. The purpose of this study is to develop a model that can be used to characterize landslide susceptibility in map form using logistic regression analysis of an extensive landslide database. A database of landslide activity was constructed using both aerial-photography and field studies. About 19.2% of the selected study area is covered by deep-seated landslides. The landslides that occur in the area are primarily located in sandstones with interbedded permeable and impermeable layers such as claystone, siltstone and mudstone. About 31.95% of the total landslide area is located at this unit. To apply logistic regression analyses, a data matrix including 37 variables was constructed. The variables used in the forwards stepwise analyses are different measures of slope, aspect, elevation, stream power index (SPI), plan curvature, profile curvature, geology, geomorphology and relative permeability of lithological units. A total of 25 variables were identified as exerting strong influence on landslide occurrence, and included by the logistic regression equation. Wald statistics values indicate that lithology, SPI and slope are more important than the other parameters in the equation. Beta coefficients of the 25 variables included the logistic regression equation provide a model for landslide susceptibility in the Cekmece area. This model is used to generate a landslide susceptibility map that correctly classified 83.8% of the landslide-prone areas.
A Nationwide Epidemiologic Modeling Study of LD: Risk, Protection, and Unintended Impact
ERIC Educational Resources Information Center
McDermott, Paul A.; Goldberg, Michelle M.; Watkins, Marley W.; Stanley, Jeanne L.; Glutting, Joseph J.
2006-01-01
Through multiple logistic regression modeling, this article explores the relative importance of risk and protective factors associated with learning disabilities (LD). A representative national sample of 6- to 17-year-old students (N = 1,268) was drawn by random stratification and classified by the presence versus absence of LD in reading,…
Seasonal Variation in Physical Activity among Preschool Children in a Northern Canadian City
ERIC Educational Resources Information Center
Carson, Valerie; Spence, John C.; Cutumisu, Nicoleta; Boule, Normand; Edwards, Joy
2010-01-01
Little research has examined seasonal differences in physical activity (PA) levels among children. Proxy reports of PA were completed by 1,715 parents on their children in Edmonton, Alberta, Canada. Total PA (TPA) minutes were calculated, and each participant was classified as active, somewhat active, or inactive. Logistic regression models were…
Classification of pregnancies of unknown location according to four different hCG-based protocols.
Fistouris, J; Bergh, C; Strandell, A
2016-10-01
How do four protocols based on serial human chorionic gonadotropin (hCG) measurements perform when classifying pregnancies of unknown location (PULs) as low or high risk of being an ectopic pregnancy (EP)? The use of cut-offs in hCG level changes published by NICE, and a logistic regression model, M4, correctly classify more PULs as high risk, compared with two other protocols. A logistic regression model, M4, based on the mean of two consecutive hCG values and the hCG ratio (hCG 48 h/hCG 0 h) that classify PULs into low- and high-risk groups for triage purposes, identifies more EPs than a protocol using the cut-offs between a 13% decline and a 66% rise in hCG levels over 48 h. A retrospective comparative study of four different hCG-based protocols classifying PULs as low or high risk of being an EP was performed at a gynaecological emergency unit over 3 years. We identified 915 women with a PUL. Initial transvaginal ultrasonography (TVS) findings categorised 187 of the PULs as probable intrauterine pregnancies (IUPs) and 16 as probable EPs. The rate of change in hCG levels over 48 h was calculated for each patient and subjected to three different hCG threshold intervals and a logistic regression model for outcome prediction. Each PUL was subsequently dichotomised to either low-risk (i.e. failed PUL/IUP) or high-risk (i.e. EP) classification, which allowed us to compare the diagnostic performance. In 'Protocol A', a PUL was classified as low risk if >13% hCG level decline or >66% hCG level rise was achieved; otherwise, the PUL was classified as high risk of being an EP. 'Protocol B' classified a PUL as low or high risk using cut-offs of 35-50% declining hCG levels and of 53% rising hCG levels. Similarly, 'Protocol C' used hCG level cut-offs published by NICE, 50% for declining hCG levels and 63% for rising hCG levels. 
Finally, if a logistic regression model 'Protocol M4' calculated a ≥5% risk of the PUL being an EP, it was classified as high risk, and otherwise the PUL was classified as low risk. When the time interval between two hCG measurements failed to meet an exact 48 h, extrapolation and interpolation of hCG values was made, using log linear transformation. Protocols A, B, C and M4 classified 73, 66, 55 and 56% of PULs as low risk. The sensitivity for protocols A, B, C and M4 was 68% (95% confidence interval (CI) 61-75%), 81% (74-86%), 87% (82-92%) and 88% (83-93%), respectively. The specificity was 82% (80-85%), 77% (74-80%), 66% (62-69%) and 67% (63-70%) for protocols A, B, C and M4, respectively. All comparisons of sensitivity and specificity between the protocols were statistically significant except for protocol C versus protocol M4. In protocol C, 87% (66-97%) of misclassified EPs had rising hCG levels, compared with 19% (6-41%) for protocol M4 (P < 0.01). In a secondary analysis excluding probable IUPs and probable EPs, the results for 712 PULs were analysed. The sensitivity subsequently remained stable for all protocols. Protocol M4 reached a 78% (74-81%) specificity, which was significantly higher than 70% (66-74%) for protocol C (P = 0.01) and protocol M4 classified 63% of PULs as low risk compared with 58% for protocol C. The retrospective design of the study is a limitation. The results are derived from a population where laparoscopy played an important role in PUL management and diagnosis of EPs, although it did reflect real clinical practice. Although we tried to adhere to definitions of PUL and final outcomes as in previous studies and a recent consensus statement, potential differences in this regard must be acknowledged. Where the time interval between two serial hCG measurements deviated from 48 h we estimated 48 h hCG values. 
A logistic regression model, M4, classifies more PULs correctly as low risk in a selected PUL population without probable IUPs and EPs and identifies as many EPs, in comparison with the cut-offs available in the NICE guideline. This advantage for model M4 may result in a reduction of unnecessary follow-up visits, when fewer low-risk PULs are misclassified as high risk. These findings, however, ought to be clarified in a randomised controlled trial. The study was supported by LUA/ALF grant No. 70940. There are no competing interests. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
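The cut-off rules described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the function and constant names are hypothetical, the strict inequalities for Protocol C are assumed (the abstract states them only for Protocol A), and the M4 coefficients are not given in the abstract, so the M4 rule takes an already computed risk as input.

```python
import math

def hcg_at_48h(hcg0, hcg_t, hours):
    """Estimate the 48 h hCG value, assuming log-linear change over time."""
    slope = (math.log(hcg_t) - math.log(hcg0)) / hours
    return math.exp(math.log(hcg0) + slope * 48.0)

def percent_change(hcg0, hcg48):
    """Signed percentage change from the 0 h value to the 48 h value."""
    return 100.0 * (hcg48 - hcg0) / hcg0

def classify_cutoffs(hcg0, hcg48, decline_pct, rise_pct):
    """Low risk if hCG fell by more than decline_pct or rose by more than rise_pct."""
    change = percent_change(hcg0, hcg48)
    if change < -decline_pct or change > rise_pct:
        return "low"
    return "high"

def classify_m4(ep_risk):
    """High risk if the model-predicted EP risk is at least 5% (risk supplied by caller)."""
    return "high" if ep_risk >= 0.05 else "low"

# Cut-offs quoted in the abstract; boundary handling for Protocol C is an assumption.
PROTOCOL_A = {"decline_pct": 13.0, "rise_pct": 66.0}
PROTOCOL_C = {"decline_pct": 50.0, "rise_pct": 63.0}
```

For example, a 30% decline over 48 h would be low risk under Protocol A but high risk under Protocol C, which illustrates why the protocols trade sensitivity against specificity differently.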
Akkus, Zeki; Camdeviren, Handan; Celik, Fatma; Gur, Ali; Nas, Kemal
2005-09-01
To determine the risk factors of osteoporosis using a multiple binary logistic regression method and to assess the risk variables for osteoporosis, which is a major and growing health problem in many countries. We presented a case-control study, consisting of 126 postmenopausal healthy women as the control group and 225 postmenopausal osteoporotic women as the case group. The study was carried out in the Department of Physical Medicine and Rehabilitation, Dicle University, Diyarbakir, Turkey between 1999 and 2002. The data from the 351 participants were collected using a standard questionnaire that contains 43 variables. A multiple logistic regression model was then used to evaluate the data and to find the best regression model. The regression model correctly classified 80.1% (281/351) of the participants. Furthermore, the specificity value of the model was 67% (84/126) of the control group while the sensitivity value was 88% (197/225) of the case group. We found the distribution of the standardized residuals of the final model to be exponential using the Kolmogorov-Smirnov test (p=0.193). The receiver operating characteristic curve was found successful to predict patients with risk for osteoporosis. This study suggests that low levels of dietary calcium intake, physical activity, education, and longer duration of menopause are independent predictors of the risk of low bone density in our population. Adequate dietary calcium intake in combination with maintaining a daily physical activity, increasing educational level, decreasing birth rate, and duration of breast-feeding may contribute to healthy bones and play a role in practical prevention of osteoporosis in Southeast Anatolia. In addition, the findings of the present study indicate that the use of a multivariate statistical method such as multiple logistic regression in osteoporosis, which may be influenced by many variables, is better than univariate statistical evaluation.
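The percentages in this abstract follow directly from the reported counts; the 281/351 figure is the sum of the 197 cases and 84 controls that the model assigned to their own group. A minimal check, with a hypothetical helper name:

```python
def rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Counts taken from the abstract: 197 of 225 cases and 84 of 126 controls
# were assigned to their own group.
sens, spec, acc = rates(tp=197, fn=225 - 197, tn=84, fp=126 - 84)
print(round(100 * sens), round(100 * spec), round(100 * acc, 1))
```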
Analysis of the Effects of the Commander’s Battle Positioning on Unit Combat Performance
1991-03-01
Analysis ... Logistic Regression Analysis ... Canonical Correlation Analysis ... Discriminant Analysis...entails classifying objects into two or more distinct groups, or responses. Dillon defines discriminant analysis as "deriving linear combinations of the...object given its predictor variables. The second objective is, through analysis of the parameters of the discriminant functions, determine those
Javed, Amna; Tiwana, Mohsin I.; Khan, Umar Shahbaz
2018-01-01
Brain Computer Interface (BCI) determines the intent of the user from a variety of electrophysiological signals. These signals, Slow Cortical Potentials, are recorded from scalp, and cortical neuronal activity is recorded by implanted electrodes. This paper is focused on design of an embedded system that is used to control the finger movements of an upper limb prosthesis using Electroencephalogram (EEG) signals. This is a follow-up of our previous research which explored the best method to classify three movements of fingers (thumb movement, index finger movement, and fist movement). Two-stage logistic regression classifier exhibited the highest classification accuracy while Power Spectral Density (PSD) was used as a feature of the filtered signal. The EEG signal data set was recorded using a 14-channel electrode headset (a noninvasive BCI system) from right-handed, neurologically intact volunteers. Mu (commonly known as alpha waves) and Beta Rhythms (8–30 Hz) containing most of the movement data were retained through filtering using “Arduino Uno” microcontroller followed by 2-stage logistic regression to obtain a mean classification accuracy of 70%. PMID:29888252
Modeling Verdict Outcomes Using Social Network Measures: The Watergate and Caviar Network Cases.
Masías, Víctor Hugo; Valle, Mauricio; Morselli, Carlo; Crespo, Fernando; Vargas, Augusto; Laengle, Sigifredo
2016-01-01
Modelling criminal trial verdict outcomes using social network measures is an emerging research area in quantitative criminology. Few studies have yet analyzed which of these measures are the most important for verdict modelling or which data classification techniques perform best for this application. To compare the performance of different techniques in classifying members of a criminal network, this article applies three different machine learning classifiers (Logistic Regression, Naïve Bayes and Random Forest) with a range of social network measures and the necessary databases to model the verdicts in two real-world cases: the U.S. Watergate Conspiracy of the 1970s and the now-defunct Canada-based international drug trafficking ring known as the Caviar Network. In both cases it was found that the Random Forest classifier did better than either Logistic Regression or Naïve Bayes, and its superior performance was statistically significant. This being so, Random Forest was used not only for classification but also to assess the importance of the measures. For the Watergate case, the most important one proved to be betweenness centrality while for the Caviar Network, it was the effective size of the network. These results are significant because they show that an approach combining machine learning with social network analysis not only can generate accurate classification models but also helps quantify the importance of social network variables in modelling verdict outcomes. We conclude our analysis with a discussion and some suggestions for future work in verdict modelling using social network measures.
Pandey, Gaurav; Pandey, Om P; Rogers, Angela J; Ahsen, Mehmet E; Hoffman, Gabriel E; Raby, Benjamin A; Weiss, Scott T; Schadt, Eric E; Bunyavanich, Supinda
2018-06-11
Asthma is a common, under-diagnosed disease affecting all ages. We sought to identify a nasal brush-based classifier of mild/moderate asthma. 190 subjects with mild/moderate asthma and controls underwent nasal brushing and RNA sequencing of nasal samples. A machine learning-based pipeline identified an asthma classifier consisting of 90 genes interpreted via an L2-regularized logistic regression classification model. This classifier performed with strong predictive value and sensitivity across eight test sets, including (1) a test set of independent asthmatic and control subjects profiled by RNA sequencing (positive and negative predictive values of 1.00 and 0.96, respectively; AUC of 0.994), (2) two independent case-control cohorts of asthma profiled by microarray, and (3) five cohorts with other respiratory conditions (allergic rhinitis, upper respiratory infection, cystic fibrosis, smoking), where the classifier had a low to zero misclassification rate. Following validation in large, prospective cohorts, this classifier could be developed into a nasal biomarker of asthma.
Habitat features and predictive habitat modeling for the Colorado chipmunk in southern New Mexico
Rivieccio, M.; Thompson, B.C.; Gould, W.R.; Boykin, K.G.
2003-01-01
Two subspecies of Colorado chipmunk (state threatened and federal species of concern) occur in southern New Mexico: Tamias quadrivittatus australis in the Organ Mountains and T. q. oscuraensis in the Oscura Mountains. We developed a GIS model of potentially suitable habitat based on vegetation and elevation features, evaluated site classifications of the GIS model, and determined vegetation and terrain features associated with chipmunk occurrence. We compared GIS model classifications with actual vegetation and elevation features measured at 37 sites. At 60 sites we measured 18 habitat variables regarding slope, aspect, tree species, shrub species, and ground cover. We used logistic regression to analyze habitat variables associated with chipmunk presence/absence. All (100%) 37 sample sites (28 predicted suitable, 9 predicted unsuitable) were classified correctly by the GIS model regarding elevation and vegetation. For 28 sites predicted suitable by the GIS model, 18 sites (64%) appeared visually suitable based on habitat variables selected from logistic regression analyses, of which 10 sites (36%) were specifically predicted as suitable habitat via logistic regression. We detected chipmunks at 70% of sites deemed suitable via the logistic regression models. Shrub cover, tree density, plant proximity, presence of logs, and presence of rock outcrop were retained in the logistic model for the Oscura Mountains; litter, shrub cover, and grass cover were retained in the logistic model for the Organ Mountains. Evaluation of predictive models illustrates the need for multi-stage analyses to best judge performance. Microhabitat analyses indicate prospective needs for different management strategies between the subspecies. Sensitivities of each population of the Colorado chipmunk to natural and prescribed fire suggest that partial burnings of areas inhabited by Colorado chipmunks in southern New Mexico may be beneficial. 
These partial burnings may later help avoid a fire that could substantially reduce habitat of chipmunks over a mountain range.
Logistic Regression in the Identification of Hazards in Construction
NASA Astrophysics Data System (ADS)
Drozd, Wojciech
2017-10-01
The construction site and its elements create circumstances that are conducive to the formation of risks to safety during the execution of works. Analysis indicates the critical importance of these factors in the set of characteristics that describe the causes of accidents in the construction industry. This article attempts to analyse the characteristics related to the construction site, in order to indicate their importance in defining the circumstances of accidents at work. The study includes sites inspected in 2014 - 2016 by the employees of the District Labour Inspectorate in Krakow (Poland). The analysed set of detailed (disaggregated) data includes both quantitative and qualitative characteristics. The substantive task focused on classification modelling in the identification of hazards in construction and identifying those of the analysed characteristics that are important in an accident. In terms of methodology, resource data analysis using statistical classifiers, in the form of logistic regression, was the method used.
Eke, Gemma; Holttum, Sue; Hayward, Mark
2012-03-01
Previous research highlights barriers to clinical psychologists conducting research, but has rarely examined U.K. clinical psychologists. The study investigated U.K. clinical psychologists' self-reported research output and tested part of a theoretical model of factors influencing their intention to conduct research. Questionnaires were mailed to 1,300 U.K. clinical psychologists. Three hundred and seventy-four questionnaires were returned (29% response-rate). This study replicated in a U.K. sample the finding that the modal number of publications was zero, highlighted in a number of U.K. and U.S. studies. Research intention was bimodally distributed, and logistic regression classified 78% of cases successfully. Outcome expectations, perceived behavioral control and normative beliefs mediated between research training environment and intention. Further research should explore how research is negotiated in clinical roles, and this issue should be incorporated into prequalification training. © 2012 Wiley Periodicals, Inc.
Mena, Jorge Humberto; Sanchez, Alvaro Ignacio; Rubiano, Andres M.; Peitzman, Andrew B.; Sperry, Jason L.; Gutierrez, Maria Isabel; Puyana, Juan Carlos
2011-01-01
Objective The Glasgow Coma Scale (GCS) classifies Traumatic Brain Injuries (TBI) as Mild (14–15); Moderate (9–13) or Severe (3–8). The ATLS modified this classification so that a GCS score of 13 is categorized as mild TBI. We investigated the effect of this modification on mortality prediction, comparing patients with a GCS of 13 classified as moderate TBI (Classic Model) to patients with GCS of 13 classified as mild TBI (Modified Model). Methods We selected adult TBI patients from the Pennsylvania Trauma Outcome Study (PTOS) database. Logistic regressions adjusting for age, sex, cause, severity, trauma center level, comorbidities, and isolated TBI were performed. A second evaluation included the time trend of mortality. A third evaluation also included hypothermia, hypotension, mechanical ventilation, screening for drugs, and severity of TBI. Discrimination of the models was evaluated using the area under receiver operating characteristic curve (AUC). Calibration was evaluated using the Hosmer-Lemeshow goodness of fit (GOF) test. Results In the first evaluation, the AUCs were 0.922 (95% CI, 0.917–0.926) and 0.908 (95% CI, 0.903–0.912) for classic and modified models, respectively. Both models showed poor calibration (p<0.001). In the third evaluation, the AUCs were 0.946 (95% CI, 0.943–0.949) and 0.938 (95% CI, 0.934–0.940) for the classic and modified models, respectively, with improvements in calibration (p=0.30 and p=0.02 for the classic and modified models, respectively). Conclusion The lack of overlap between ROC curves of both models reveals a statistically significant difference in their ability to predict mortality. The classic model demonstrated better GOF than the modified model. A GCS of 13 classified as moderate TBI in a multivariate logistic regression model performed better than a GCS of 13 classified as mild. PMID:22071923
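The only difference between the two models compared here is where a GCS of 13 falls. A minimal sketch of the two severity mappings as stated in the abstract (the function name and the `modified` flag are hypothetical):

```python
def tbi_class(gcs, modified=False):
    """Map a Glasgow Coma Scale score to a TBI severity class.

    Classic: mild 14-15, moderate 9-13, severe 3-8.
    Modified: a score of 13 is counted as mild instead of moderate.
    """
    if not 3 <= gcs <= 15:
        raise ValueError("GCS must be between 3 and 15")
    if gcs <= 8:
        return "severe"
    mild_floor = 13 if modified else 14
    return "mild" if gcs >= mild_floor else "moderate"
```

In a regression setting, the resulting class would enter the model as a categorical covariate, so switching `modified` changes the covariate only for patients with a score of 13.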
Hepatotoxicity during Treatment for Tuberculosis in People Living with HIV/AIDS.
Araújo-Mariz, Carolline; Lopes, Edmundo Pessoa; Acioli-Santos, Bartolomeu; Maruza, Magda; Montarroyos, Ulisses Ramos; Ximenes, Ricardo Arraes de Alencar; Lacerda, Heloísa Ramos; Miranda-Filho, Demócrito de Barros; Albuquerque, Maria de Fátima P Militão de
2016-01-01
Hepatotoxicity is frequently reported as an adverse reaction during the treatment of tuberculosis. The aim of this study was to determine the incidence of hepatotoxicity and to identify predictive factors for developing hepatotoxicity after people living with HIV/AIDS (PLWHA) start treatment for tuberculosis. This was a prospective cohort study with PLWHA who were monitored during the first 60 days of tuberculosis treatment in Pernambuco, Brazil. Hepatotoxicity was considered increased levels of aminotransferase, namely those that rose to three times higher than the level before initiating tuberculosis treatment, these levels being associated with symptoms of hepatitis. We conducted a multivariate logistic regression analysis and the magnitude of the associations was expressed by the odds ratio with a confidence interval of 95%. Hepatotoxicity was observed in 53 (30.6%) of the 173 patients who started tuberculosis treatment. The final multivariate logistic regression model demonstrated that the use of fluconazole, malnutrition and the subject being classified as a phenotypically slow acetylator increased the risk of hepatotoxicity significantly. The incidence of hepatotoxicity during treatment for tuberculosis in PLWHA was high. Those classified as phenotypically slow acetylators and as malnourished should be targeted for specific care to reduce the risk of hepatotoxicity during treatment for tuberculosis. The use of fluconazole should be avoided during tuberculosis treatment in PLWHA.
Hill, Benjamin David; Womble, Melissa N; Rohling, Martin L
2015-01-01
This study utilized logistic regression to determine whether performance patterns on Concussion Vital Signs (CVS) could differentiate known groups with either genuine or feigned performance. For the embedded measure development group (n = 174), clinical patients and undergraduate students categorized as feigning obtained significantly lower scores on the overall test battery mean for the CVS, Shipley-2 composite score, and California Verbal Learning Test-Second Edition subtests than did genuinely performing individuals. The final full model of 3 predictor variables (Verbal Memory immediate hits, Verbal Memory immediate correct passes, and Stroop Test complex reaction time correct) was significant and correctly classified individuals in their known group 83% of the time (sensitivity = .65; specificity = .97) in a mixed sample of young-adult clinical cases and simulators. The CVS logistic regression function was applied to a separate undergraduate college group (n = 378) that was asked to perform genuinely and identified 5% as having possibly feigned performance indicating a low false-positive rate. The failure rate was 11% and 16% at baseline cognitive testing in samples of high school and college athletes, respectively. These findings have particular relevance given the increasing use of computerized test batteries for baseline cognitive testing and return-to-play decisions after concussion.
Statistical text classifier to detect specific type of medical incidents.
Wong, Zoie Shui-Yee; Akiyama, Masanori
2013-01-01
WHO Patient Safety has focused on increasing the coherence and expressiveness of patient safety classification with the foundation of the International Classification for Patient Safety (ICPS). Text classification and statistical approaches have been shown to be successful in identifying safety problems in the aviation industry using incident text information. It has been challenging to comprehend the taxonomy of medical incidents in a structured manner. Independent reporting mechanisms for patient safety incidents have been established in the UK, Canada, Australia, Japan, Hong Kong etc. This research demonstrates the potential to construct statistical text classifiers to detect specific types of medical incidents using incident text data. An illustrative example for classifying look-alike sound-alike (LASA) medication incidents using structured text from 227 advisories related to medication errors from Global Patient Safety Alerts (GPSA) is shown in this poster presentation. The classifier was built using a logistic regression model. The ROC curve and the AUC value indicated that this is a satisfactorily good model.
Secure Logistic Regression Based on Homomorphic Encryption: Design and Evaluation
Song, Yongsoo; Wang, Shuang; Xia, Yuhou; Jiang, Xiaoqian
2018-01-01
Background Learning a model without accessing raw data has been an intriguing idea to security and machine learning researchers for years. In an ideal setting, we want to encrypt sensitive data to store them on a commercial cloud and run certain analyses without ever decrypting the data to preserve privacy. Homomorphic encryption technique is a promising candidate for secure data outsourcing, but it is a very challenging task to support real-world machine learning tasks. Existing frameworks can only handle simplified cases with low-degree polynomials such as linear means classifier and linear discriminative analysis. Objective The goal of this study is to provide a practical support to the mainstream learning models (eg, logistic regression). Methods We adapted a novel homomorphic encryption scheme optimized for real numbers computation. We devised (1) the least squares approximation of the logistic function for accuracy and efficiency (ie, reduce computation cost) and (2) new packing and parallelization techniques. Results Using real-world datasets, we evaluated the performance of our model and demonstrated its feasibility in speed and memory consumption. For example, it took approximately 116 minutes to obtain the training model from the homomorphically encrypted Edinburgh dataset. In addition, it gives fairly accurate predictions on the testing dataset. Conclusions We present the first homomorphically encrypted logistic regression outsourcing model based on the critical observation that the precision loss of classification models is sufficiently small so that the decision plan stays still. PMID:29666041
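The idea of replacing the logistic function with a least squares polynomial, so that only additions and multiplications are needed, can be illustrated in plain arithmetic. This is a sketch under assumptions that are not in the abstract: the degree (3), the fitting interval [-8, 8], the uniform sample grid, and all function names are chosen here for illustration only, and no encryption is involved.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_poly(degree, lo, hi, samples=401):
    """Least squares polynomial fit to the sigmoid via the normal equations."""
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    size = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    b = [sum(sigmoid(x) * x ** i for x in xs) for i in range(size)]
    return solve(a, b)  # coefficients, constant term first

def poly_eval(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))

coeffs = fit_poly(3, -8.0, 8.0)
max_err = max(abs(poly_eval(coeffs, k / 10) - sigmoid(k / 10)) for k in range(-80, 81))
```

A low degree keeps the multiplicative depth small, which matters for homomorphic evaluation, at the cost of a visible approximation error near the ends of the interval.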
Alghamdi, Manal; Al-Mallah, Mouaz; Keteyian, Steven; Brawner, Clinton; Ehrman, Jonathan; Sakr, Sherif
2017-01-01
Machine learning is becoming a popular and important approach in the field of medical research. In this study, we investigate the relative performance of various machine learning methods such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree and Random Forests for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data of 32,555 patients who are free of any known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 5-year follow-up. At the completion of the fifth year, 5,099 of those patients have developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an Ensembling-based predictive model using 13 attributes that were selected based on their clinical importance, Multiple Linear Regression, and Information Gain Ranking methods. The negative effect of the imbalance class of the constructed model was handled by Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive model classifier was improved by the Ensemble machine learning approach using the Vote method with three Decision Trees (Naïve Bayes Tree, Random Forest, and Logistic Model Tree) and achieved high accuracy of prediction (AUC = 0.92). The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data.
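SMOTE, mentioned above as the remedy for class imbalance, creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The following is a simplified sketch of that interpolation step, not the study's pipeline; the function name, parameters, and sampling scheme are assumptions for illustration.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the point itself)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic
```

Because every synthetic point lies on a segment between two real minority points, the method fills in the minority region rather than duplicating existing records.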
Reducing the number of reconstructions needed for estimating channelized observer performance
NASA Astrophysics Data System (ADS)
Pineda, Angel R.; Miedema, Hope; Brenner, Melissa; Altaf, Sana
2018-03-01
A challenge for task-based optimization is the time required for each reconstructed image in applications where reconstructions are time consuming. Our goal is to reduce the number of reconstructions needed to estimate the area under the receiver operating characteristic curve (AUC) of the infinitely-trained optimal channelized linear observer. We explore the use of classifiers which either do not invert the channel covariance matrix or do feature selection. We also study the assumption that multiple low contrast signals in the same image of a non-linear reconstruction do not significantly change the estimate of the AUC. We compared the AUC of several classifiers (Hotelling, logistic regression, logistic regression using Firth bias reduction and the least absolute shrinkage and selection operator (LASSO)) with a small number of observations both for normal simulated data and images from a total variation reconstruction in magnetic resonance imaging (MRI). We used 10 Laguerre-Gauss channels and the Mann-Whitney estimator for AUC. For this data, our results show that at small sample sizes feature selection using the LASSO technique can decrease bias of the AUC estimation with increased variance and that for large sample sizes the difference between these classifiers is small. We also compared the use of multiple signals in a single reconstructed image to reduce the number of reconstructions in a total variation reconstruction for accelerated imaging in MRI. We found that AUC estimation using multiple low contrast signals in the same image resulted in similar AUC estimates as doing a single reconstruction per signal leading to a 13x reduction in the number of reconstructions needed.
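The Mann-Whitney estimator of AUC used in this study is the fraction of (signal-present, signal-absent) score pairs that are ordered correctly, with ties counted as one half. A minimal sketch, with hypothetical names and made-up scores:

```python
def mann_whitney_auc(signal_scores, noise_scores):
    """AUC estimate: mean over all pairs of 1 (correct order), 0.5 (tie) or 0."""
    total = 0.0
    for s in signal_scores:
        for n in noise_scores:
            if s > n:
                total += 1.0
            elif s == n:
                total += 0.5
    return total / (len(signal_scores) * len(noise_scores))

# Illustrative observer outputs only, not data from the study.
auc = mann_whitney_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

Here 8 of the 9 pairs are ordered correctly, so the estimate is 8/9. The estimator needs only the observer's scalar outputs, which is why it can be applied to any of the classifiers compared above.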
Perez, Ivan; Chavez, Allison K; Ponce, Dario
2016-01-01
The Ricketts' posteroanterior (PA) cephalometry seems to be the most widely used and it has not been tested by multivariate statistics for sex determination. The objective was to determine the applicability of Ricketts' PA cephalometry for sex determination using logistic regression analysis. The logistic models were estimated at distinct age cutoffs (all ages, 11 years, 13 years, and 15 years) in a database from 1,296 Hispano American Peruvians between 5 years and 44 years of age. The logistic models were composed of six cephalometric measurements; the accuracy achieved by resubstitution varied between 60% and 70% and all the variables, with one exception, exhibited a direct relationship with the probability of being classified as male; the nasal width exhibited an indirect relationship. The maxillary and facial widths were present in all models and may represent a sexual dimorphism indicator. The accuracy found was lower than that reported in the literature and the Ricketts' PA cephalometry may not be adequate for sex determination. The indirect relationship of the nasal width in models with data from patients of 12 years of age or less may be a trait related to age or a characteristic in the studied population, which could be better studied and confirmed.
Minimalist ensemble algorithms for genome-wide protein localization prediction.
Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun
2012-07-03
Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. 
We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
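The general shape of such an ensemble, a filter that keeps a subset of base predictors followed by a logistic regression over their outputs, can be sketched as follows. This is a toy illustration only: the scores are made up, the selected columns are chosen by hand rather than by the paper's filter or contribution scores (neither is specified in the abstract), and all names are hypothetical.

```python
import math

def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w

def predict(w, x):
    """Probability of the positive class for one row of base-predictor scores."""
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

# Made-up scores from three hypothetical base predictors; the third is uninformative.
scores = [(0.9, 0.8, 0.5), (0.8, 0.6, 0.4), (0.7, 0.9, 0.6),
          (0.2, 0.3, 0.5), (0.3, 0.1, 0.6), (0.1, 0.4, 0.4)]
labels = [1, 1, 1, 0, 0, 0]
selected = [0, 1]  # assumed output of a feature-selection filter
rows = [tuple(s[i] for i in selected) for s in scores]
weights = train_logistic(rows, labels)
```

Dropping the uninformative predictor before fitting mirrors the "minimalist" motivation: fewer base predictors need to be run at prediction time.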
Minimalist ensemble algorithms for genome-wide protein localization prediction
2012-01-01
Background Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. Results This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. 
Conclusions We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi. PMID:22759391
Linking brain-wide multivoxel activation patterns to behaviour: Examples from language and math.
Raizada, Rajeev D S; Tsao, Feng-Ming; Liu, Huei-Mei; Holloway, Ian D; Ansari, Daniel; Kuhl, Patricia K
2010-05-15
A key goal of cognitive neuroscience is to find simple and direct connections between brain and behaviour. However, fMRI analysis typically involves choices between many possible options, with each choice potentially biasing any brain-behaviour correlations that emerge. Standard methods of fMRI analysis assess each voxel individually, but then face the problem of selection bias when combining those voxels into a region-of-interest, or ROI. Multivariate pattern-based fMRI analysis methods use classifiers to analyse multiple voxels together, but can also introduce selection bias via data-reduction steps such as feature selection of voxels, pre-selecting activated regions, or principal components analysis. We show here that strong brain-behaviour links can be revealed without any voxel selection or data reduction, using just plain linear regression as a classifier applied to the whole brain at once, i.e. treating each entire brain volume as a single multi-voxel pattern. The brain-behaviour correlations emerged despite the fact that the classifier was not provided with any information at all about subjects' behaviour, but instead was given only the neural data and its condition-labels. Surprisingly, more powerful classifiers such as a linear SVM and regularised logistic regression produce very similar results. We discuss some possible reasons why the very simple brain-wide linear regression model is able to find correlations with behaviour that are as strong as those obtained on the one hand from a specific ROI and on the other hand from more complex classifiers. In a manner which is unencumbered by arbitrary choices, our approach offers a method for investigating connections between brain and behaviour which is simple, rigorous and direct. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Linking brain-wide multivoxel activation patterns to behaviour: Examples from language and math
Raizada, Rajeev D.S.; Tsao, Feng-Ming; Liu, Huei-Mei; Holloway, Ian D.; Ansari, Daniel; Kuhl, Patricia K.
2010-01-01
A key goal of cognitive neuroscience is to find simple and direct connections between brain and behaviour. However, fMRI analysis typically involves choices between many possible options, with each choice potentially biasing any brain–behaviour correlations that emerge. Standard methods of fMRI analysis assess each voxel individually, but then face the problem of selection bias when combining those voxels into a region-of-interest, or ROI. Multivariate pattern-based fMRI analysis methods use classifiers to analyse multiple voxels together, but can also introduce selection bias via data-reduction steps such as feature selection of voxels, pre-selecting activated regions, or principal components analysis. We show here that strong brain–behaviour links can be revealed without any voxel selection or data reduction, using just plain linear regression as a classifier applied to the whole brain at once, i.e. treating each entire brain volume as a single multi-voxel pattern. The brain–behaviour correlations emerged despite the fact that the classifier was not provided with any information at all about subjects' behaviour, but instead was given only the neural data and its condition-labels. Surprisingly, more powerful classifiers such as a linear SVM and regularised logistic regression produce very similar results. We discuss some possible reasons why the very simple brain-wide linear regression model is able to find correlations with behaviour that are as strong as those obtained on the one hand from a specific ROI and on the other hand from more complex classifiers. In a manner which is unencumbered by arbitrary choices, our approach offers a method for investigating connections between brain and behaviour which is simple, rigorous and direct. PMID:20132896
Secure Logistic Regression Based on Homomorphic Encryption: Design and Evaluation.
Kim, Miran; Song, Yongsoo; Wang, Shuang; Xia, Yuhou; Jiang, Xiaoqian
2018-04-17
Learning a model without accessing raw data has been an intriguing idea to security and machine learning researchers for years. In an ideal setting, we want to encrypt sensitive data to store them on a commercial cloud and run certain analyses without ever decrypting the data to preserve privacy. Homomorphic encryption technique is a promising candidate for secure data outsourcing, but it is a very challenging task to support real-world machine learning tasks. Existing frameworks can only handle simplified cases with low-degree polynomials such as linear means classifier and linear discriminative analysis. The goal of this study is to provide a practical support to the mainstream learning models (eg, logistic regression). We adapted a novel homomorphic encryption scheme optimized for real numbers computation. We devised (1) the least squares approximation of the logistic function for accuracy and efficiency (ie, reduce computation cost) and (2) new packing and parallelization techniques. Using real-world datasets, we evaluated the performance of our model and demonstrated its feasibility in speed and memory consumption. For example, it took approximately 116 minutes to obtain the training model from the homomorphically encrypted Edinburgh dataset. In addition, it gives fairly accurate predictions on the testing dataset. We present the first homomorphically encrypted logistic regression outsourcing model based on the critical observation that the precision loss of classification models is sufficiently small so that the decision plan stays still. ©Miran Kim, Yongsoo Song, Shuang Wang, Yuhou Xia, Xiaoqian Jiang. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 17.04.2018.
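The least squares approximation of the logistic function mentioned in this abstract can be illustrated with a small sketch. Homomorphic schemes evaluate additions and multiplications, so the sigmoid is replaced by a low-degree polynomial. The degree (3), the interval [-8, 8], the grid size and all function names below are assumptions chosen for illustration; the abstract does not state them, and this is not the authors' implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_sigmoid_poly(degree=3, half_width=8.0, n_points=201):
    """Least-squares coefficients c_k of sum_k c_k * (x / half_width)**k.

    The input is rescaled to [-1, 1] so the normal equations stay well conditioned.
    """
    ts = [-1.0 + 2.0 * i / (n_points - 1) for i in range(n_points)]
    ys = [sigmoid(half_width * t) for t in ts]
    size = degree + 1
    gram = [[sum(t ** (i + j) for t in ts) for j in range(size)] for i in range(size)]
    rhs = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(size)]
    return solve_linear(gram, rhs)

def poly_sigmoid(x, coeffs, half_width=8.0):
    """Evaluate the polynomial surrogate using only additions and multiplications."""
    t = x / half_width
    return sum(c * t ** k for k, c in enumerate(coeffs))

coeffs = fit_sigmoid_poly()
# Largest absolute deviation from the true sigmoid on a 0.1-spaced grid over [-8, 8].
max_err = max(abs(poly_sigmoid(x / 10.0, coeffs) - sigmoid(x / 10.0)) for x in range(-80, 81))
```

Because the sigmoid minus 0.5 is an odd function, the fitted constant term is 0.5 and the quadratic term vanishes on a symmetric grid, which leaves only two coefficients to encode.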
Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann
2003-01-01
Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.
NASA Astrophysics Data System (ADS)
Inoue, N.; Kitada, N.; Irikura, K.
2013-12-01
The probability of surface rupture is important for configuring the seismic source, such as area sources or fault models, in a seismic hazard evaluation. In Japan, Takemura (1998) estimated the probability based on historical earthquake data. Kagawa et al. (2004) evaluated the probability based on a numerical simulation of surface displacements. The estimated probability follows a sigmoid curve and increases between Mj (the local magnitude defined and calculated by the Japan Meteorological Agency) = 6.5 and Mj = 7.0. The probability of surface rupture is also used in probabilistic fault displacement hazard analysis (PFDHA). The probability is determined from a collected earthquake catalog, whose events are classified into two categories: with surface rupture or without surface rupture. Logistic regression is then performed on the classified earthquake data. Youngs et al. (2003), Ross and Moss (2011) and Petersen et al. (2011) present logistic curves of the probability of surface rupture for normal, reverse and strike-slip faults, respectively. Takao et al. (2013) show the logistic curve derived from only Japanese earthquake data. The Japanese probability curve increases sharply over a narrow magnitude range in comparison with the other curves. In this study, we estimated the probability of surface rupture by applying logistic analysis to surface displacements derived from a surface displacement calculation. A source fault was defined according to the procedure of Kagawa et al. (2004), which determines a seismic moment from a magnitude and estimates the area of the asperity and the amount of slip. Strike-slip and reverse faults were considered as source faults. We applied the method of Wang et al. (2003) for the calculations. The surface displacements for the defined source faults were calculated by varying the depth of the fault. A threshold of 5 cm of surface displacement was used to evaluate whether or not a rupture reaches the surface.
We carried out the logistic regression analysis on the calculated displacements, which were classified by the above threshold. The estimated probability curve showed a trend similar to the result of Takao et al. (2013). The probability for reverse faults is larger than that for strike-slip faults. On the other hand, PFDHA results show different trends: the probability for reverse faults at higher magnitude is lower than that for strike-slip and normal faults. Ross and Moss (2011) suggested that the sediment and/or rock over the fault compresses, so that the displacement does not fully reach the surface. The numerical theory applied in this study cannot deal with a complex initial situation such as topography.
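The two-step procedure in this abstract (classify each calculated displacement with the 5 cm threshold, then fit a logistic curve against magnitude) can be sketched as follows. The magnitude and displacement pairs are invented for illustration and are not results from the study; the centring value, learning rate and function names are likewise assumptions.

```python
import math

# Hypothetical (magnitude, calculated surface displacement in cm) pairs.
events = [(5.0, 0.1), (5.5, 0.4), (6.0, 1.2), (6.2, 2.0), (6.5, 3.5), (6.6, 6.0),
          (6.8, 4.0), (7.0, 9.0), (7.2, 15.0), (7.5, 30.0), (8.0, 80.0)]

THRESHOLD_CM = 5.0   # threshold quoted in the abstract
CENTER = 6.5         # centring the magnitude keeps gradient ascent well behaved (assumption)

xs = [m - CENTER for m, _ in events]
ys = [1 if d >= THRESHOLD_CM else 0 for _, d in events]  # 1 = rupture reaches the surface

def fit_logistic(xs, ys, lr=0.5, n_iter=5000):
    """Fit P(y=1|x) = sigmoid(w*x + b) by gradient ascent on the mean log-likelihood."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (y - p) * x
            gb += (y - p)
        w += lr * gw / n
        b += lr * gb / n
    return w, b

w, b = fit_logistic(xs, ys)

def rupture_probability(magnitude):
    """Estimated probability that a rupture of this magnitude reaches the surface."""
    return 1.0 / (1.0 + math.exp(-(w * (magnitude - CENTER) + b)))
```

Calling `rupture_probability` over a range of magnitudes traces the sigmoid-shaped curve that the abstract compares across fault types.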
Predicting Rotator Cuff Tears Using Data Mining and Bayesian Likelihood Ratios
Lu, Hsueh-Yi; Huang, Chen-Yuan; Su, Chwen-Tzeng; Lin, Chen-Chiang
2014-01-01
Objectives Rotator cuff tear is a common cause of shoulder diseases. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone. Methods In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation followed by confirmatory MRI between 2007 and 2011 were identified. MRI was used as a reference standard to classify rotator cuff tears. The predictor variable was the clinical assessment results, which consisted of 16 attributes. This study employed 2 data mining methods (ANN and the decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into “tear” and “no tear” groups. Likelihood ratio and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models. Results Our proposed data mining procedures outperformed the classic statistical method. The correction rate, sensitivity, specificity and area under the ROC curve of predicting a rotator cuff tear were statistically better in the ANN and decision tree models compared to logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability of a patient who has a rotator cuff tear using a pretest probability and a prediction result (tear or no tear). Conclusions Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools to classify rotator cuff tears as well as determine the probability of the presence of the disease to enhance diagnostic decision making for rotator cuff tears. PMID:24733553
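The arithmetic behind a Fagan's nomogram, as described in this abstract, is the odds form of Bayes' theorem: post-test odds equal pre-test odds multiplied by the likelihood ratio of the observed result. A minimal sketch follows; the sensitivity, specificity and pretest probability are hypothetical values, not figures reported by the study.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test or prediction model."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg

def post_test_probability(pretest_prob, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    pre_odds = pretest_prob / (1.0 - pretest_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Hypothetical model performance and pretest probability.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.9, specificity=0.8)
p_if_tear_predicted = post_test_probability(0.5, lr_pos)
p_if_no_tear_predicted = post_test_probability(0.5, lr_neg)
```

With these assumed inputs, a "tear" prediction raises the probability above the pretest value and a "no tear" prediction lowers it, which is what the nomogram reads off graphically.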
Mahrooghy, Majid; Ashraf, Ahmed B; Daye, Dania; Mies, Carolyn; Feldman, Michael; Rosen, Mark; Kontos, Despina
2013-01-01
Breast tumors are heterogeneous lesions. Intra-tumor heterogeneity presents a major challenge for cancer diagnosis and treatment. Few studies have worked on capturing tumor heterogeneity from imaging. Most studies to date consider aggregate measures for tumor characterization. In this work we capture tumor heterogeneity by partitioning tumor pixels into subregions and extracting heterogeneity wavelet kinetic (HetWave) features from breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) to obtain the spatiotemporal patterns of the wavelet coefficients and contrast agent uptake from each partition. Using a genetic algorithm for feature selection, and a logistic regression classifier with leave-one-out cross-validation, we tested our proposed HetWave features for the task of classifying breast cancer recurrence risk. The classifier based on our features gave an ROC AUC of 0.78, outperforming previously proposed kinetic, texture, and spatial enhancement variance features, which give AUCs of 0.69, 0.64, and 0.65, respectively.
Classifying Volcanic Activity Using an Empirical Decision Making Algorithm
NASA Astrophysics Data System (ADS)
Junek, W. N.; Jones, W. L.; Woods, M. T.
2012-12-01
Detection and classification of developing volcanic activity is vital to eruption forecasting. Timely information regarding an impending eruption would aid civil authorities in determining the proper response to a developing crisis. In this presentation, volcanic activity is characterized using an event tree classifier and a suite of empirical statistical models derived through logistic regression. Forecasts are reported in terms of the United States Geological Survey (USGS) volcano alert level system. The algorithm employs multidisciplinary data (e.g., seismic, GPS, InSAR) acquired by various volcano monitoring systems and source modeling information to forecast the likelihood that an eruption, with a volcanic explosivity index (VEI) > 1, will occur within a quantitatively constrained area. Logistic models are constructed from a sparse and geographically diverse dataset assembled from a collection of historic volcanic unrest episodes. Bootstrapping techniques are applied to the training data to allow for the estimation of robust logistic model coefficients. Cross validation produced a series of receiver operating characteristic (ROC) curves with areas ranging between 0.78-0.81, which indicates the algorithm has good predictive capabilities. The ROC curves also allowed for the determination of a false positive rate and optimum detection for each stage of the algorithm. Forecasts for historic volcanic unrest episodes in North America and Iceland were computed and are consistent with the actual outcome of the events.
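The area under a ROC curve, the summary statistic reported in this abstract, equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison; the scores and labels are made-up values, and the function name is an assumption.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical forecast probabilities and outcomes (1 = eruption occurred).
example_auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The pairwise form is quadratic in the number of cases, which is acceptable for the sparse datasets described here; a rank-based formula would be preferable for large samples.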
Kusano, Kristofer; Gabler, Hampton C
2014-01-01
The odds of death for a seriously injured crash victim are drastically reduced if he or she receives care at a trauma center. Advanced automated crash notification (AACN) algorithms are postcrash safety systems that use data measured by the vehicles during the crash to predict the likelihood of occupants being seriously injured. The accuracy of these models is crucial to the success of an AACN. The objective of this study was to compare the predictive performance of competing injury risk models and algorithms: logistic regression, random forest, AdaBoost, naïve Bayes, support vector machine, and classification k-nearest neighbors. This study compared machine learning algorithms to the widely adopted logistic regression modeling approach. Machine learning algorithms have not been commonly studied in the motor vehicle injury literature. Machine learning algorithms may have higher predictive power than logistic regression, despite the drawback of lacking the ability to perform statistical inference. To evaluate the performance of these algorithms, data on 16,398 vehicles involved in non-rollover collisions were extracted from the NASS-CDS. Vehicles with any occupants having an Injury Severity Score (ISS) of 15 or greater were defined as those requiring victims to be treated at a trauma center. The performance of each model was evaluated using cross-validation. Cross-validation assesses how a model will perform in the future given new data not used for model training. The crash ΔV (change in velocity during the crash), damage side (struck side of the vehicle), seat belt use, vehicle body type, number of events, occupant age, and occupant sex were used as predictors in each model. Logistic regression slightly outperformed the machine learning algorithms based on sensitivity and specificity of the models.
Previous studies on AACN risk curves used the same data to train and test the power of the models and as a result had higher sensitivity compared to the cross-validated results from this study. Future studies should account for future data, for example by using cross-validation, or risk presenting optimistic predictions of field performance. Past algorithms have been criticized for relying on age and sex, which are difficult to measure with vehicle sensors, and for inaccuracies in classifying damage side. The models with accurate damage side and including age/sex did outperform models with less accurate damage side and without age/sex, but the differences were small, suggesting that the success of AACN is not reliant on these predictors.
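The point about testing on the training data can be made concrete with a deliberately extreme sketch. A 1-nearest-neighbour classifier is used here only because it memorises its training set; it is not one of the study's tuned models, and the data are pure noise generated for illustration. Scoring on the training data looks perfect, while k-fold cross-validation reveals chance-level performance.

```python
import random

def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    nearest = min(train, key=lambda item: abs(item[0] - x))
    return nearest[1]

def accuracy(train, test):
    hits = sum(1 for x, y in test if predict_1nn(train, x) == y)
    return hits / len(test)

def k_fold_accuracy(data, k=5):
    """Average accuracy over k folds, each held out once for testing."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        scores.append(accuracy(train, test))
    return sum(scores) / k

rng = random.Random(0)
# Hypothetical data: one random feature and a label that is unrelated to it.
data = [(rng.random(), rng.randint(0, 1)) for _ in range(200)]

resubstitution = accuracy(data, data)   # same data used to train and test
cross_validated = k_fold_accuracy(data)  # held-out estimate
```

The gap between the two numbers is the optimism that the abstract warns about; for less flexible models such as logistic regression the gap is smaller but still present.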
2016-09-01
noise density and temperature sensitivity of these devices are all on the same order of magnitude. Even the worst-case noise density of the GCDC...accelerations from a handgun firing were distinct from other impulsive events on the wrist, such as using a hammer. Loeffler first identified potential shots by...spikes, taking various statistical parameters. He used a logistic regression model on these parameters and was able to classify 98.9% of shots
Fusion of multiscale wavelet-based fractal analysis on retina image for stroke prediction.
Che Azemin, M Z; Kumar, Dinesh K; Wong, T Y; Wang, J J; Kawasaki, R; Mitchell, P; Arjunan, Sridhar P
2010-01-01
In this paper, we present a novel method of analyzing retinal vasculature using Fourier Fractal Dimension to extract the complexity of the retinal vasculature enhanced at different wavelet scales. Logistic regression was used as a fusion method to model the classifier for 5-year stroke prediction. The efficacy of this technique has been tested using standard pattern recognition performance evaluation, Receiver Operating Characteristic (ROC) analysis, and a medical prediction statistic, the odds ratio. A stroke prediction model was developed using the proposed system.
Filius, Anika; Scheltens, Marjan; Bosch, Hans G.; van Doorn, Pieter A.; Stam, Henk J.; Hovius, Steven E.R.; Amadio, Peter C.; Selles, Ruud W.
2015-01-01
Dynamics of structures within the carpal tunnel may alter in carpal tunnel syndrome (CTS) due to fibrotic changes and increased carpal tunnel pressure. Ultrasound can visualize these potential changes, making ultrasound potentially an accurate diagnostic tool. To study this, we imaged the carpal tunnel of 113 patients and 42 controls. CTS severity was classified according to validated clinical and nerve conduction study (NCS) classifications. Transversal and longitudinal displacement and shape (changes) were calculated for the median nerve, tendons and surrounding tissue. To predict diagnostic value binary logistic regression modeling was applied. Reduced longitudinal nerve displacement (p≤0.019), increased nerve cross-sectional area (p≤0.006) and perimeter (p≤0.007), and a trend of relatively changed tendon displacements were seen in patients. Changes were more convincing when CTS was classified as more severe. Binary logistic modeling to diagnose CTS using ultrasound showed a sensitivity of 70-71% and specificity of 80-84%. In conclusion, CTS patients have altered dynamics of structures within the carpal tunnel. PMID:25865180
Mulier, Jan P; De Boeck, Liesje; Meulders, Michel; Beliën, Jeroen; Colpaert, Jan; Sels, Annabel
2015-01-01
Rationale, aims and objectives: What factors determine the use of an anaesthesia preparation room and shorten non-operative time? Methods: A logistic regression is applied to 18 751 surgery records from AZ Sint-Jan Brugge AV, Belgium, where each operating room has its own anaesthesia preparation room. Surgeries in which the patient's induction has already started when the preceding patient's surgery has ended belong to a first group, where the preparation room is used as an induction room. Surgeries not fulfilling this property belong to a second group. A logistic regression model tries to predict the probability that a surgery will be classified into a specific group. Non-operative time is calculated as the time between the end of the previous surgery and the incision of the next surgery. A log-linear regression of this non-operative time is performed. Results: It was found that switches in surgeons, being a non-elective surgery, and the previous surgery being non-elective increase the probability of being classified into the second group. Only a few surgery types, anaesthesiologists and operating rooms can be found exclusively in one of the two groups. Analysis of variance demonstrates that the first group has significantly lower non-operative times. Switches in surgeons or anaesthesiologists and longer scheduled durations of the previous surgery increase the non-operative time. A switch in both surgeon and anaesthesiologist strengthens this negative effect. Only a few operating rooms and surgery types influence the non-operative time. Conclusion: The use of the anaesthesia preparation room shortens the non-operative time and is determined by several human and structural factors. PMID:25496600
Esserman, Denise A.; Moore, Charity G.; Roth, Mary T.
2009-01-01
Older community dwelling adults often take multiple medications for numerous chronic diseases. Non-adherence to these medications can have a large public health impact. Therefore, the measurement and modeling of medication adherence in the setting of polypharmacy is an important area of research. We apply a variety of different modeling techniques (standard linear regression; weighted linear regression; adjusted linear regression; naïve logistic regression; beta-binomial (BB) regression; generalized estimating equations (GEE)) to binary medication adherence data from a study in a North Carolina based population of older adults, where each medication an individual was taking was classified as adherent or non-adherent. In addition, through simulation we compare these different methods based on Type I error rates, bias, power, empirical 95% coverage, and goodness of fit. We find that estimation and inference using GEE is robust to a wide variety of scenarios and we recommend using this in the setting of polypharmacy when adherence is dichotomously measured for multiple medications per person. PMID:20414358
Risk Factors for Suicidal Ideation in People at Risk for Huntington's Disease.
Anderson, Karen E; Eberly, Shirley; Groves, Mark; Kayson, Elise; Marder, Karen; Young, Anne B; Shoulson, Ira
2016-12-15
Suicidal ideation (SI) and attempts are increased in Huntington's disease (HD), making risk factor assessment a priority. The objective was to determine whether hopelessness, irritability, aggression, anxiety, CAG expansion status, depression, and motor signs/symptoms were associated with SI in those at risk for HD. Behavioral and neurological data were collected from subjects in an observational study. Subject characteristics were calculated by CAG status and SI. Logistic regression models were adjusted for demographics. Separate logistic regressions were used to compare SI and non-SI subjects. A combined logistic regression model, including 4 pre-specified predictors (hopelessness, irritability, aggression, anxiety), was used to assess the relationship of SI to these predictors. Of 801 subjects assessed, 40 were classified as having SI; 6.3% of CAG mutation expansion carriers had SI, compared with 4.3% of non-CAG mutation expansion carriers (p = 0.2275). SI subjects had significantly increased depression (p < 0.0001), hopelessness (p < 0.0001), irritability (p < 0.0001), aggression (p = 0.0089), and anxiety (p < 0.0001), and an elevated motor score (p = 0.0098). Impulsivity, assessed in a subgroup of subjects, was also associated with SI (p = 0.0267). Hopelessness and anxiety remained significant in the combined model (p < 0.001; p < 0.0198, respectively) even when motor score was included. Behavioral symptoms were significantly higher in those reporting SI. Hopelessness and anxiety showed a particularly strong association with SI. Risk identification could assist in assessment of suicidality in this group.
Singer, Martin; Li, Wei; Morré, Servaas A; Ouburg, Sander; Spinola, Stanley M
2016-08-01
In humans inoculated with Haemophilus ducreyi, there are host effects on the possible clinical outcomes (pustule formation versus spontaneous resolution of infection). However, the immunogenetic factors that influence these outcomes are unknown. Here we examined the role of 14 single-nucleotide polymorphisms (SNPs) in 7 selected pathogen-recognition pathways and cytokine genes on the gradated outcomes of experimental infection. DNAs from 105 volunteers infected with H. ducreyi at 3 sites were genotyped for SNPs, using real-time polymerase chain reaction. The participants were classified into 2 cohorts, by race, and into 4 groups, based on whether they formed 0, 1, 2, or 3 pustules. χ² tests for trend and logistic regression analyses were performed on the data. In European Americans, the most significant findings were a protective association of the TLR9 +2848 GG genotype and a risk-enhancing association of the TLR9 TA haplotype with pustule formation; logistic regression showed a trend toward protection for the TLR9 +2848 GG genotype. In African Americans, logistic regression showed a protective effect for the IL10 -2849 AA genotype and a risk-enhancing effect for the IL10 AAC haplotype. Variations in TLR9 and IL10 are associated with the outcome of H. ducreyi infection. © The Author 2016. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail journals.permissions@oup.com.
Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry
2018-01-01
The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on live fish skin using image-based features. In total, one hundred and sixty rainbow trout (Oncorhynchus mykiss) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using a consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets: Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k-Nearest neighbours (k-NN). The SVM with a radial basis kernel provided the best classifier, with a correct classification rate (CCR) of 82% and a Kappa coefficient of 0.65. Although both the LR and RF methods were less accurate than SVM, they achieved good classification with CCRs of 75% and 70%, respectively. The k-NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as a fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, there was a close association between image-based features and the diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during cultivation by evaluating the diet’s effects on fish skin. PMID:29596375
Saberioon, Mohammadmehdi; Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry
2018-03-29
The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on live fish skin using image-based features. In total, one hundred and sixty rainbow trout (Oncorhynchus mykiss) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using a consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets: Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k-Nearest neighbours (k-NN). The SVM with a radial basis kernel provided the best classifier, with a correct classification rate (CCR) of 82% and a Kappa coefficient of 0.65. Although both the LR and RF methods were less accurate than SVM, they achieved good classification with CCRs of 75% and 70%, respectively. The k-NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as a fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, there was a close association between image-based features and the diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during cultivation by evaluating the diet's effects on fish skin.
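The correct classification rate and the Kappa coefficient quoted in this abstract are both derived from a confusion matrix: CCR is the observed agreement, and Cohen's kappa rescales it by the agreement expected from chance. The sketch below uses a hypothetical 2x2 matrix for 80 fish per diet; the individual cell counts are invented for illustration, since the abstract reports only the summary figures.

```python
def correct_classification_rate(confusion):
    """Fraction of cases on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / total

def cohen_kappa(confusion):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    total = sum(sum(row) for row in confusion)
    observed = correct_classification_rate(confusion)
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(len(confusion))
    ) / (total * total)
    return (observed - expected) / (1.0 - expected)

# Rows: true diet (fish-meal, plant-based); columns: predicted diet. Hypothetical counts.
confusion = [[66, 14],
             [15, 65]]
ccr = correct_classification_rate(confusion)
kappa = cohen_kappa(confusion)
```

With two balanced classes the chance agreement is about 0.5, so a CCR near 82% corresponds to a kappa near 0.64, which is in line with the pair of values the abstract reports.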
Vanantwerpen, Gerty; Berkvens, Dirk; De Zutter, Lieven; Houf, Kurt
2015-07-09
Pigs are the main reservoir of human pathogenic Y. enterocolitica, and the microbiological and serological prevalence of this pathogen differs between pig farms. The infection status of pig batches at the moment of slaughter is unknown, although such information would make it possible to classify batches. A relation between the presence of human pathogenic Yersinia spp. and the presence of antibodies could help to predict the infection of the pigs prior to slaughter. Pigs from 100 different batches were sampled. Tonsils and pieces of diaphragm were collected from 7047 pigs (on average 70 pigs per batch). The tonsils were analyzed using a direct plating method and the meat juice collected from the pieces of diaphragm was analyzed by Enzyme Linked ImmunoSorbent Assay. The microbiological and serological results were compared using a mixed-effects logistic regression at pig and batch level. Yersinia spp. were found in 2031 (28.8%) pigs, and antibodies were present in 4692 (66.6%) pigs. According to the logistic regression, there was no relation at pig level between the presence of Yersinia spp. in tonsils and the presence of antibodies. In contrast, at batch level, a mean activity value of 37 Optical Density (OD)% indicated a Yersinia spp. positive farm, and the microbiological prevalence in pig batches could be estimated before shipment to the slaughterhouse. This offers the opportunity to classify batches based on their potential risk to contaminate carcasses with human pathogenic Yersinia spp. Copyright © 2015 Elsevier B.V. All rights reserved.
Mao, Nini; Liu, Yunting; Chen, Kewei; Yao, Li; Wu, Xia
2018-06-05
Multiple neuroimaging modalities have been developed providing various aspects of information on the human brain. Used together and properly, these complementary multimodal neuroimaging data integrate multisource information which can facilitate a diagnosis and improve the diagnostic accuracy. In this study, 3 types of brain imaging data (sMRI, FDG-PET, and florbetapir-PET) were fused in the hope to improve diagnostic accuracy, and multivariate methods (logistic regression) were applied to these trimodal neuroimaging indices. Then, the receiver-operating characteristic (ROC) method was used to analyze the outcomes of the logistic classifier, with either each index, multiples from each modality, or all indices from all 3 modalities, to investigate their differential abilities to identify the disease. With increasing numbers of indices within each modality and across modalities, the accuracy of identifying Alzheimer disease (AD) increases to varying degrees. For example, the area under the ROC curve is above 0.98 when all the indices from the 3 imaging data types are combined. Using a combination of different indices, the results confirmed the initial hypothesis that different biomarkers were potentially complementary, and thus the conjoint analysis of multiple information from multiple sources would improve the capability to identify diseases such as AD and mild cognitive impairment. © 2018 S. Karger AG, Basel.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anthony, G; Cunliffe, A; Armato, S
2015-06-15
Purpose: To determine whether the addition of standardized uptake value (SUV) statistical variables to CT lung texture features can improve a predictive model of radiation pneumonitis (RP) development in patients undergoing radiation therapy. Methods: Anonymized data from 96 esophageal cancer patients (18 RP-positive cases of Grade ≥ 2) were retrospectively collected including pre-therapy PET/CT scans, pre-/post-therapy diagnostic CT scans and RP status. Twenty texture features (first-order, fractal, Laws’ filter and gray-level co-occurrence matrix) were calculated from diagnostic CT scans and compared in anatomically matched regions of the lung. The mean, maximum, standard deviation, and 50th–95th percentiles of the SUV values for all lung voxels in the corresponding PET scans were acquired. For each texture feature, a logistic regression-based classifier consisting of (1) the average change in that texture feature value between the pre- and post-therapy CT scans and (2) the pre-therapy SUV standard deviation (SUV_SD) was created. The RP-classification performance of each logistic regression model was compared to the performance of its texture feature alone by computing areas under the receiver operating characteristic curves (AUCs). T-tests were performed to determine whether the mean AUC across texture features changed significantly when SUV_SD was added to the classifier. Results: The AUC for single-texture-feature classifiers ranged from 0.58–0.81 in high-dose (≥ 30 Gy) regions of the lungs and from 0.53–0.71 in low-dose (< 10 Gy) regions. Adding SUV_SD in a logistic regression model using a 50/50 data partition for training and testing significantly increased the mean AUC by 0.08, 0.06 and 0.04 in the low-, medium- and high-dose regions, respectively. Conclusion: Addition of SUV_SD from a pre-therapy PET scan to a single CT-based texture feature improves RP-classification performance on average.
These findings demonstrate the potential for more accurate prediction of RP using information from multiple imaging modalities. Supported, in part, by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under grant number T32 EB002103; SGA receives royalties and licensing fees through the University of Chicago for computer-aided diagnosis technology. HA receives royalties through the University of Chicago for computer-aided diagnosis technology.
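The AUC used to compare classifiers here can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal Python sketch of that pairwise definition follows; the scores are hypothetical illustration values, not data from the study.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier outputs for RP-positive and RP-negative patients.
rp_positive = [0.9, 0.7, 0.6]
rp_negative = [0.8, 0.4, 0.3, 0.2]
example_auc = auc(rp_positive, rp_negative)
print(example_auc)
```

The pairwise loop is quadratic in the number of cases, which is fine for a cohort of this size; a rank-based formulation would be preferable for large datasets.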
NASA Astrophysics Data System (ADS)
Ariffin, Syaiba Balqish; Midi, Habshah
2014-06-01
This article is concerned with the performance of the logistic ridge regression estimation technique in the presence of multicollinearity and high leverage points. In logistic regression, multicollinearity exists among predictors and in the information matrix. The maximum likelihood estimator suffers a huge setback in the presence of multicollinearity, which causes regression estimates to have unduly large standard errors. To remedy this problem, a logistic ridge regression estimator is put forward. It is evident that the logistic ridge regression estimator outperforms the maximum likelihood approach for handling multicollinearity. The effect of high leverage points on the performance of the logistic ridge regression estimator is then investigated through a real data set and a simulation study. The findings signify that the logistic ridge regression estimator fails to provide better parameter estimates in the presence of both high leverage points and multicollinearity.
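As a rough illustration of the ridge idea, the sketch below fits a logistic model by gradient ascent with and without an L2 penalty on two nearly collinear predictors. The data, the penalty value, the learning rate and the omission of an intercept are all assumptions made for brevity; this is not the estimator or the data used in the article.

```python
import math


def fit_logistic(X, y, ridge=0.0, lr=0.1, steps=2000):
    """Gradient ascent on the mean log-likelihood minus (ridge / 2) * ||beta||^2."""
    p = len(X[0])
    n = len(X)
    beta = [0.0] * p
    for _ in range(steps):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            z = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-z))
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
        for j in range(p):
            beta[j] += lr * (grad[j] / n - ridge * beta[j])
    return beta


# Hypothetical data: the second predictor is almost a copy of the first.
X = [(-2.0, -1.99), (-1.0, -1.01), (-0.5, -0.49),
     (0.5, 0.51), (1.0, 0.99), (2.0, 2.01)]
y = [0, 0, 1, 0, 1, 1]

beta_mle = fit_logistic(X, y, ridge=0.0)
beta_ridge = fit_logistic(X, y, ridge=0.1)
print(beta_mle, beta_ridge)
```

The penalty pulls the coefficient vector toward zero, which is what stabilises the estimates when predictors are collinear. It does nothing specific about high leverage points, which is consistent with the article's finding that ridge alone is not enough in that case.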
A comparative study of machine learning models for ethnicity classification
NASA Astrophysics Data System (ADS)
Trivedi, Advait; Bessie Amali, D. Geraldine
2017-11-01
This paper endeavours to adopt a machine learning approach to solve the problem of ethnicity recognition. Ethnicity identification is an important vision problem with use cases extending to various domains. Despite the complexity involved, ethnicity identification comes naturally to humans. This meta information can be leveraged to make several decisions, be it in target marketing or security. With the recent development of intelligent systems, a sub-module to efficiently capture ethnicity would be useful in several use cases. Several attempts to identify an ideal learning model to represent a multi-ethnic dataset have been recorded. A comparative study of classifiers such as support vector machines and logistic regression has been documented. Experimental results indicate that the logistic classifier provides a much more accurate classification than the support vector machine.
NASA Astrophysics Data System (ADS)
WU, Chunhung
2015-04-01
The research built the original logistic regression landslide susceptibility model (abbreviated as or-LRLSM) and the landslide ratio-based logistic regression landslide susceptibility model (abbreviated as lr-LRLSM), compared their performance and explained the error sources of the two models. The research assumes that the performance of the logistic regression model can be better if the distributions of landslide ratio and weighted value of each variable are similar. Landslide ratio is the ratio of landslide area to total area in a specific area and a useful index to evaluate the seriousness of landslide disaster in Taiwan. The research adopted the landslide inventory induced by 2009 Typhoon Morakot in the Chishan watershed, which was the most serious disaster event in the last decade in Taiwan. The research adopted the 20 m grid as the basic unit in building the LRLSM, and six variables, including elevation, slope, aspect, geological formation, accumulated rainfall, and bank erosion, were included in the two models. In building the or-LRLSM, the six variables were divided into continuous variables (elevation, slope, and accumulated rainfall) and categorical variables (aspect, geological formation and bank erosion), while in building the lr-LRLSM all variables were classified based on landslide ratio and treated as categorical variables. Because the number of basic units in the Chishan watershed was too large to process with commercial software, the research used random sampling instead of the whole set of basic units. The research adopted equal proportions of landslide units and not landslide units in the logistic regression analysis. The research drew 10 random samples and selected the group with the best Cox & Snell R2 and Nagelkerke R2 values as the database for the following analysis.
Based on the best result from 10 random sampling groups, the or-LRLSM (lr-LRLSM) is significant at the 1% level with Cox & Snell R2 = 0.190 (0.196) and Nagelkerke R2 = 0.253 (0.260). The unit with the landslide susceptibility value > 0.5 (≦ 0.5) will be classified as a predicted landslide unit (not landslide unit). The AUC, i.e. the area under the relative operating characteristic curve, of or-LRLSM in the Chishan watershed is 0.72, while that of lr-LRLSM is 0.77. Furthermore, the average correct ratio of lr-LRLSM (73.3%) is better than that of or-LRLSM (68.3%). The research analyzed in detail the error sources from the two models. In continuous variables, using the landslide ratio-based classification in building the lr-LRLSM can let the distribution of weighted value more similar to distribution of landslide ratio in the range of continuous variable than that in building the or-LRLSM. In categorical variables, the meaning of using the landslide ratio-based classification in building the lr-LRLSM is to gather the parameters with approximate landslide ratio together. The mean correct ratio in continuous variables (categorical variables) by using the lr-LRLSM is better than that in or-LRLSM by 0.6 ~ 2.6% (1.7% ~ 6.0%). Building the landslide susceptibility model by using landslide ratio-based classification is practical and of better performance than that by using the original logistic regression.
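The two pseudo-R2 values reported above are tied together by the sampling design. With equal proportions of landslide and not landslide units, the null model assigns probability 0.5 to every unit, so the largest attainable Cox & Snell R2 is 1 - 0.5^2 = 0.75 and Nagelkerke R2 is Cox & Snell R2 divided by 0.75. A minimal Python sketch of the standard formulas follows; the sample size n = 1000 is a hypothetical value, since the abstract does not report it, and the ratio does not depend on it.

```python
import math


def cox_snell_r2(ll_null, ll_model, n):
    """Cox & Snell R2 from the null and fitted log-likelihoods."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)


def nagelkerke_r2(ll_null, ll_model, n):
    """Cox & Snell R2 rescaled by its maximum attainable value."""
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_r2


n = 1000                                    # hypothetical sample size
ll_null = n * math.log(0.5)                 # balanced sample: p = 0.5 for every unit
# Choose a fitted log-likelihood that reproduces the reported Cox & Snell R2 of 0.190.
ll_model = ll_null - (n / 2.0) * math.log(1.0 - 0.190)

r2_cs = cox_snell_r2(ll_null, ll_model, n)
r2_n = nagelkerke_r2(ll_null, ll_model, n)
print(round(r2_cs, 3), round(r2_n, 3))
```

Under these assumptions the Nagelkerke value comes out close to 0.190 / 0.75, which agrees with the reported 0.253 for the or-LRLSM.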
The association between second-hand smoke exposure and depressive symptoms among pregnant women.
Huang, Jingya; Wen, Guoming; Yang, Weikang; Yao, Zhenjiang; Wu, Chuan'an; Ye, Xiaohua
2017-10-01
Tobacco smoking and depression are strongly associated, but the possible association between second-hand smoke (SHS) exposure and depression is unclear. This study aimed to examine the possible relation between SHS exposure and depressive symptoms among pregnant women. A cross-sectional survey was conducted in Shenzhen, China, using a multistage sampling method. The univariable and multivariable logistic regression models were used to explore the associations between SHS exposure and depressive symptoms. Among 2176 pregnant women, 10.5% and 2.0% were classified as having probable and severe depressive symptoms. Both binary and multinomial logistic regression revealed that there were significantly increased risks of severe depressive symptoms corresponding to SHS exposure in homes or regular SHS exposure in workplaces using no exposure as reference. In addition, greater frequency of SHS exposure was significantly associated with the increased risk of severe depressive symptoms. Our findings suggest that SHS exposure is positively associated with depressive symptoms in a dose-response manner among the pregnant women. Copyright © 2017 Elsevier B.V. All rights reserved.
Sample size determination for logistic regression on a logit-normal distribution.
Kim, Seongho; Heath, Elisabeth; Heilbrun, Lance
2017-06-01
Although the sample size for simple logistic regression can be readily determined using currently available methods, the sample size calculation for multiple logistic regression requires some additional information, such as the coefficient of determination ([Formula: see text]) of a covariate of interest with other covariates, which is often unavailable in practice. The response variable of logistic regression follows a logit-normal distribution which can be generated from a logistic transformation of a normal distribution. Using this property of logistic regression, we propose new methods of determining the sample size for simple and multiple logistic regressions using a normal transformation of outcome measures. Simulation studies and a motivating example show several advantages of the proposed methods over the existing methods: (i) no need for [Formula: see text] for multiple logistic regression, (ii) available interim or group-sequential designs, and (iii) much smaller required sample size.
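The logit-normal property mentioned above can be illustrated by pushing normal draws through the logistic function. The sketch below is a simulation aid only, with hypothetical parameter values; it does not implement the proposed sample size method.

```python
import math
import random


def sample_logit_normal(mu, sigma, size, seed=0):
    """Draw z ~ Normal(mu, sigma) and return 1 / (1 + exp(-z)) for each draw."""
    rng = random.Random(seed)
    return [1.0 / (1.0 + math.exp(-rng.gauss(mu, sigma))) for _ in range(size)]


# Hypothetical parameters: a standard normal linear predictor.
draws = sample_logit_normal(mu=0.0, sigma=1.0, size=10000)
mean_probability = sum(draws) / len(draws)
print(mean_probability)
```

Every draw lies strictly between 0 and 1, and with mu = 0 the distribution is symmetric about 0.5.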
Staley, James R; Jones, Edmund; Kaptoge, Stephen; Butterworth, Adam S; Sweeting, Michael J; Wood, Angela M; Howson, Joanna M M
2017-06-01
Logistic regression is often used instead of Cox regression to analyse genome-wide association studies (GWAS) of single-nucleotide polymorphisms (SNPs) and disease outcomes with cohort and case-cohort designs, as it is less computationally expensive. Although Cox and logistic regression models have been compared previously in cohort studies, this work does not completely cover the GWAS setting nor extend to the case-cohort study design. Here, we evaluated Cox and logistic regression applied to cohort and case-cohort genetic association studies using simulated data and genetic data from the EPIC-CVD study. In the cohort setting, there was a modest improvement in power to detect SNP-disease associations using Cox regression compared with logistic regression, which increased as the disease incidence increased. In contrast, logistic regression had more power than (Prentice weighted) Cox regression in the case-cohort setting. Logistic regression yielded inflated effect estimates (assuming the hazard ratio is the underlying measure of association) for both study designs, especially for SNPs with greater effect on disease. Given that logistic regression is substantially more computationally efficient than Cox regression in both settings, we propose a two-step approach to GWAS in cohort and case-cohort studies: first, analyse all SNPs with logistic regression to identify associated variants below a pre-defined P-value threshold; second, fit Cox regression (appropriately weighted in case-cohort studies) to those identified SNPs to ensure accurate estimation of association with disease.
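The two-step approach amounts to a cheap screen followed by an expensive refit on the survivors. The sketch below shows only that control flow; `logistic_pvalue` and `cox_fit` are hypothetical callables supplied by the caller, and the default threshold of 5e-8 is an assumption, since the abstract only says "pre-defined".

```python
def two_step_gwas(snps, logistic_pvalue, cox_fit, p_threshold=5e-8):
    """Screen every SNP with the fast model, then refit only the hits with the slow one."""
    hits = [snp for snp in snps if logistic_pvalue(snp) < p_threshold]
    return {snp: cox_fit(snp) for snp in hits}


# Hypothetical stand-ins for the two model fits, for illustration only.
fake_pvalues = {"rs1": 1e-9, "rs2": 0.2, "rs3": 3e-8}
cox_calls = []


def fake_cox_fit(snp):
    cox_calls.append(snp)
    return {"snp": snp, "model": "cox"}


results = two_step_gwas(list(fake_pvalues), fake_pvalues.get, fake_cox_fit)
print(sorted(results))
```

The expensive fit runs once per screened hit rather than once per SNP, which is where the computational saving comes from.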
Comparisons and Selections of Features and Classifiers for Short Text Classification
NASA Astrophysics Data System (ADS)
Wang, Ye; Zhou, Zhi; Jin, Shan; Liu, Debin; Lu, Mi
2017-10-01
Short text is considerably different from traditional long text documents due to its shortness and conciseness, which somehow hinders the applications of conventional machine learning and data mining algorithms in short text classification. According to traditional artificial intelligence methods, we divide short text classification into three steps, namely preprocessing, feature selection and classifier comparison. In this paper, we have illustrated step-by-step how we approach our goals. Specifically, in feature selection, we compared the performance and robustness of the four methods of one-hot encoding, tf-idf weighting, word2vec and paragraph2vec, and in the classification part, we deliberately chose and compared Naive Bayes, Logistic Regression, Support Vector Machine, K-nearest Neighbor and Decision Tree as our classifiers. Then, we compared and analysed the classifiers horizontally with each other and vertically with feature selections. Regarding the datasets, we crawled more than 400,000 short text files from Shanghai and Shenzhen Stock Exchanges and manually labeled them into two classes, the big and the small. There are eight labels in the big class, and 59 labels in the small class.
Mayagoitia, Ruth E; Harding, John; Kitchen, Sheila
2017-01-01
The aim was to develop a quantitative approach to identify three stair-climbing ability levels of older adults: no, somewhat and considerable difficulty. Timed-up-and-go test, six-minute-walk test, and Berg balance scale were used for statistical comparison to a new stair climbing ability classifier based on the geometric mean of stair speeds (GeMSS) in ascent and descent on a flight of eight stairs with a 28° pitch in the housing unit where the participants, 28 (16 women) urban older adults (62-94 years), lived. Ordinal logistic regression revealed the thresholds between the three ability levels for each functional test were more stringent than thresholds found in the literature to classify walking ability levels. Though a small study, the intermediate classifier shows promise of early identification of difficulties with stairs, in order to make timely preventative interventions. Further studies are necessary to obtain scaling factors for stairs with other pitches. Copyright © 2016 Elsevier Ltd. All rights reserved.
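The GeMSS index is the geometric mean of the ascent and descent speeds. A minimal Python sketch follows; the speed values and their unit (stairs per second) are hypothetical, since the abstract does not state them.

```python
import math


def gemss(ascent_speed, descent_speed):
    """Geometric mean of stair speeds in ascent and descent."""
    return math.sqrt(ascent_speed * descent_speed)


# Hypothetical speeds in stairs per second.
example_gemss = gemss(2.0, 0.5)
print(example_gemss)
```

Unlike the arithmetic mean, the geometric mean is pulled toward the slower of the two directions, so a marked difficulty in either ascent or descent lowers the score.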
The crux of the method: assumptions in ordinary least squares and logistic regression.
Long, Rebecca G
2008-10-01
Logistic regression has increasingly become the tool of choice when analyzing data with a binary dependent variable. While resources relating to the technique are widely available, clear discussions of why logistic regression should be used in place of ordinary least squares regression are difficult to find. The current paper compares and contrasts the assumptions of ordinary least squares with those of logistic regression and explains why logistic regression's looser assumptions make it adept at handling violations of the more important assumptions in ordinary least squares.
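One concrete consequence of applying ordinary least squares to a binary outcome is that fitted values can fall outside the interval from 0 to 1, whereas the logistic function cannot. The sketch below shows this on a small hypothetical dataset; it is an illustration of that single point, not a summary of the paper's full argument.

```python
import math


def ols_fit(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical binary outcome that switches from 0 to 1 as x grows.
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]
intercept, slope = ols_fit(x, y)
ols_low = intercept + slope * 1
ols_high = intercept + slope * 6
print(ols_low, ols_high)
```

The linear fit predicts a negative "probability" at x = 1 and a value above 1 at x = 6, while `sigmoid` of any linear predictor stays strictly between 0 and 1.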
Iturriaga, H; Hirsch, S; Bunout, D; Díaz, M; Kelly, M; Silva, G; de la Maza, M P; Petermann, M; Ugarte, G
1993-04-01
Looking for a noninvasive method to predict liver histologic alterations in alcoholic patients without clinical signs of liver failure, we studied 187 chronic alcoholics recently abstinent, divided in 2 series. In the model series (n = 94) several clinical variables and results of common laboratory tests were confronted to the findings of liver biopsies. These were classified in 3 groups: 1. Normal liver; 2. Moderate alterations; 3. Marked alterations, including alcoholic hepatitis and cirrhosis. Multivariate methods used were logistic regression analysis and a classification and regression tree (CART). Both methods entered gamma-glutamyltransferase (GGT), aspartate-aminotransferase (AST), weight and age as significant and independent variables. Univariate analysis with GGT and AST at different cutoffs were also performed. To predict the presence of any kind of damage (Groups 2 and 3), CART and AST > 30 IU showed the higher sensitivity, specificity and correct prediction, both in the model and validation series. For prediction of marked liver damage, a score based on logistic regression and GGT > 110 IU had the higher efficiencies. It is concluded that GGT and AST are good markers of alcoholic liver damage and that, using sample cutoffs, histologic diagnosis can be correctly predicted in 80% of recently abstinent asymptomatic alcoholics.
Using Dominance Analysis to Determine Predictor Importance in Logistic Regression
ERIC Educational Resources Information Center
Azen, Razia; Traxel, Nicole
2009-01-01
This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R[superscript 2] analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…
Marucci-Wellman, Helen R; Corns, Helen L; Lehto, Mark R
2017-01-01
Injury narratives are now available real time and include useful information for injury surveillance and prevention. However, manual classification of the cause or events leading to injury found in large batches of narratives, such as workers compensation claims databases, can be prohibitive. In this study we compare the utility of four machine learning algorithms (Naïve Bayes, Single word and Bi-gram models, Support Vector Machine and Logistic Regression) for classifying narratives into Bureau of Labor Statistics Occupational Injury and Illness event leading to injury classifications for a large workers compensation database. These algorithms are known to do well classifying narrative text and are fairly easy to implement with off-the-shelf software packages such as Python. We propose human-machine learning ensemble approaches which maximize the power and accuracy of the algorithms for machine-assigned codes and allow for strategic filtering of rare, emerging or ambiguous narratives for manual review. We compare human-machine approaches based on filtering on the prediction strength of the classifier vs. agreement between algorithms. Regularized Logistic Regression (LR) was the best performing algorithm alone. Using this algorithm and filtering out the bottom 30% of predictions for manual review resulted in high accuracy (overall sensitivity/positive predictive value of 0.89) of the final machine-human coded dataset. The best pairings of algorithms included Naïve Bayes with Support Vector Machine whereby the triple ensemble NB_SW = NB_BI-GRAM = SVM had very high performance (0.93 overall sensitivity/positive predictive value and high accuracy (i.e. high sensitivity and positive predictive values)) across both large and small categories leaving 41% of the narratives for manual review. Integrating LR into this ensemble mix improved performance only slightly.
For large administrative datasets we propose incorporation of methods based on human-machine pairings such as we have done here, utilizing readily-available off-the-shelf machine learning techniques and resulting in only a fraction of narratives that require manual review. Human-machine ensemble methods are likely to improve performance over total manual coding. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
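The two filtering strategies described above, by prediction strength and by agreement between algorithms, can be sketched as follows. The record layout, the function names and the example codes are hypothetical; only the 30% cut-off and the agreement rule come from the abstract.

```python
def split_by_strength(predictions, review_percent=30):
    """predictions: list of (narrative_id, code, probability).
    Returns (machine_coded, manual_review); the weakest review_percent go to humans."""
    ranked = sorted(predictions, key=lambda rec: rec[2])
    cut = len(ranked) * review_percent // 100
    return ranked[cut:], ranked[:cut]


def split_by_agreement(codes_by_model):
    """codes_by_model: dict of narrative_id -> list of codes, one per algorithm.
    A narrative is machine-coded only when every algorithm assigns the same code."""
    agreed = {nid: codes[0] for nid, codes in codes_by_model.items()
              if len(set(codes)) == 1}
    review = [nid for nid, codes in codes_by_model.items() if len(set(codes)) > 1]
    return agreed, review


# Hypothetical predictions from a single classifier.
preds = [(i, "fall", 0.05 + 0.1 * i) for i in range(10)]
machine_coded, manual_review = split_by_strength(preds)

# Hypothetical codes from three algorithms per narrative.
votes = {"a": ["fall", "fall", "fall"], "b": ["fall", "struck", "fall"]}
agreed, disputed = split_by_agreement(votes)
print(len(machine_coded), len(manual_review), agreed, disputed)
```

Both strategies trade manual workload against accuracy: raising the review percentage or requiring agreement among more algorithms sends more narratives to human coders.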
Choi, Yoonha; Liu, Tiffany Ting; Pankratz, Daniel G; Colby, Thomas V; Barth, Neil M; Lynch, David A; Walsh, P Sean; Raghu, Ganesh; Kennedy, Giulia C; Huang, Jing
2018-05-09
We developed a classifier using RNA sequencing data that identifies the usual interstitial pneumonia (UIP) pattern for the diagnosis of idiopathic pulmonary fibrosis. We addressed significant challenges, including limited sample size, biological and technical sample heterogeneity, and reagent and assay batch effects. We identified inter- and intra-patient heterogeneity, particularly within the non-UIP group. The models classified UIP on transbronchial biopsy samples with a receiver-operating characteristic area under the curve of ~ 0.9 in cross-validation. Using in silico mixed samples in training, we prospectively defined a decision boundary to optimize specificity at ≥85%. The penalized logistic regression model showed greater reproducibility across technical replicates and was chosen as the final model. The final model showed sensitivity of 70% and specificity of 88% in the test set. We demonstrated that the suggested methodologies appropriately addressed challenges of the sample size, disease heterogeneity and technical batch effects and developed a highly accurate and robust classifier leveraging RNA sequencing for the classification of UIP.
2011-01-01
Background: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, like Neural Networks, Support Vector Machines and Random Forests, can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non-parametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results: Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (Median (Me) = 0.76) and area under the ROC (Me = 0.90). However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73) with high area under the ROC (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64).
The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most, sensitivity was around or even lower than a median value of 0.5. Conclusions: When taking into account sensitivity, specificity and overall classification accuracy, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested in prediction of dementia using several neuropsychological tests. These methods may be used to improve accuracy, sensitivity and specificity of dementia predictions from neuropsychological testing. PMID:21849043
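The comparison above turns on how sensitivity, specificity and overall accuracy can diverge. The sketch below computes them from a confusion matrix, together with Press' Q in its commonly cited form, Q = (N - nK)^2 / (N(K - 1)), where N is the number of cases, n the number correctly classified and K the number of groups. That formula is an assumption here, since the abstract does not state it, and the counts are hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and overall accuracy from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
    }


def press_q(n_total, n_correct, n_groups):
    """Press' Q statistic, usually compared against a chi-square with 1 degree of freedom."""
    return (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))


# Hypothetical counts: every negative case is caught, most positive cases are missed.
metrics = binary_metrics(tp=3, fp=0, tn=10, fn=7)
q_value = press_q(n_total=20, n_correct=13, n_groups=2)
print(metrics, q_value)
```

In this made-up example specificity is 1.0 while sensitivity is only 0.3, the same pattern the abstract reports for Support Vector Machines, which is why overall accuracy alone can be misleading.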
Differentiating major depressive disorder in youths with attention deficit hyperactivity disorder.
Diler, Rasim Somer; Daviss, W Burleson; Lopez, Adriana; Axelson, David; Iyengar, Satish; Birmaher, Boris
2007-09-01
Youths with attention deficit hyperactivity disorders (ADHD) frequently have comorbid major depressive disorders (MDD) sharing overlapping symptoms. Our objective was to examine which depressive symptoms best discriminate MDD among youths with ADHD. One-hundred-eleven youths with ADHD (5.2-17.8 years old) and their parents completed interviews with the K-SADS-PL and respective versions of the child or the parent Mood and Feelings Questionnaire (MFQ-C, MFQ-P). Controlling for group differences, logistic regression was used to calculate odds ratios reflecting the accuracy with which various depressive symptoms on the MFQ-C or MFQ-P discriminated MDD. Stepwise logistic regression then identified depressive symptoms that best discriminated the groups with and without MDD, using cross-validated misclassification rate as the criterion. Symptoms that discriminated youths with MDD (n=18) from those without MDD (n=93) were 4 of 6 mood/anhedonia symptoms, all 14 depressed cognition symptoms, and only 3 of 11 physical/vegetative symptoms. Mild irritability, miserable/unhappy moods, and symptoms related to sleep, appetite, energy levels and concentration did not discriminate MDD. A stepwise logistic regression correctly classified 89% of the comorbid MDD subjects, with only age, anhedonia at school, thoughts about killing self, thoughts that bad things would happen, and talking more slowly remaining in the final model. Results of this study may not generalize to community samples because subjects were drawn largely from a university-based outpatient psychiatric clinic. These findings stress the importance of social withdrawal, anhedonia, depressive cognitions, suicidal thoughts, and psychomotor retardation when trying to identify MDD among ADHD youths.
Applying Kaplan-Meier to Item Response Data
ERIC Educational Resources Information Center
McNeish, Daniel
2018-01-01
Some IRT models can be equivalently modeled in alternative frameworks such as logistic regression. Logistic regression can also model time-to-event data, which concerns the probability of an event occurring over time. Using the relation between time-to-event models and logistic regression and the relation between logistic regression and IRT, this…
Farhate, Camila Viana Vieira; Souza, Zigomar Menezes de; Oliveira, Stanley Robson de Medeiros; Tavares, Rose Luiza Moraes; Carvalho, João Luís Nunes
2018-01-01
Soil CO2 emissions are regarded as one of the largest flows of the global carbon cycle, and small changes in their magnitude can have a large effect on the CO2 concentration in the atmosphere. Thus, a better understanding of this attribute would enable the identification of promoters and the development of strategies to mitigate the risks of climate change. Therefore, our study aimed at using data mining techniques to predict the soil CO2 emission induced by crop management in sugarcane areas in Brazil. To do so, we used different variable selection methods (correlation, chi-square, wrapper) and classification methods (decision tree, Bayesian models, neural networks, support vector machine, bagging with logistic regression), and finally we tested the efficiency of the different approaches through the Receiver Operating Characteristic (ROC) curve. The original dataset consisted of 19 variables (18 independent variables and one dependent (or response) variable). The association of cover crop and minimum tillage is an effective strategy to promote the mitigation of soil CO2 emissions, with average CO2 emissions of 63 kg ha-1 day-1. The variables soil moisture, soil temperature (Ts), rainfall, pH, and organic carbon were most frequently selected for soil CO2 emission classification across the attribute selection methods. According to the results of the ROC curve, the best approaches for soil CO2 emission classification were the following: (I) the Multilayer Perceptron classifier with attribute selection through the wrapper method, which presented a false positive rate of 13.50%, a true positive rate of 94.20% and an area under the curve (AUC) of 89.90%; and (II) the Bagging classifier with logistic regression with attribute selection through the chi-square method, which presented a false positive rate of 13.50%, a true positive rate of 94.20% and an AUC of 89.90%.
However, the (I) approach stands out in relation to (II) for its higher positive class accuracy (high CO2 emission) and lower computational cost.
de Souza, Zigomar Menezes; Oliveira, Stanley Robson de Medeiros; Tavares, Rose Luiza Moraes; Carvalho, João Luís Nunes
2018-01-01
Soil CO2 emissions are regarded as one of the largest flows of the global carbon cycle, and small changes in their magnitude can have a large effect on the CO2 concentration in the atmosphere. Thus, a better understanding of this attribute would enable the identification of promoters and the development of strategies to mitigate the risks of climate change. Therefore, our study aimed at using data mining techniques to predict the soil CO2 emission induced by crop management in sugarcane areas in Brazil. To do so, we used different variable selection methods (correlation, chi-square, wrapper) and classification methods (decision tree, Bayesian models, neural networks, support vector machine, bagging with logistic regression), and finally we tested the efficiency of the different approaches through the Receiver Operating Characteristic (ROC) curve. The original dataset consisted of 19 variables (18 independent variables and one dependent (or response) variable). The association of cover crop and minimum tillage is an effective strategy to promote the mitigation of soil CO2 emissions, with average CO2 emissions of 63 kg ha-1 day-1. The variables soil moisture, soil temperature (Ts), rainfall, pH, and organic carbon were most frequently selected for soil CO2 emission classification across the attribute selection methods. According to the results of the ROC curve, the best approaches for soil CO2 emission classification were the following: (I) the Multilayer Perceptron classifier with attribute selection through the wrapper method, which presented a false positive rate of 13.50%, a true positive rate of 94.20% and an area under the curve (AUC) of 89.90%; and (II) the Bagging classifier with logistic regression with attribute selection through the chi-square method, which presented a false positive rate of 13.50%, a true positive rate of 94.20% and an AUC of 89.90%.
However, the (I) approach stands out in relation to (II) for its higher positive class accuracy (high CO2 emission) and lower computational cost. PMID:29513765
Application of random forests methods to diabetic retinopathy classification analyses.
Casanova, Ramon; Saldana, Santiago; Chew, Emily Y; Danis, Ronald P; Greven, Craig M; Ambrosius, Walter T
2014-01-01
Diabetic retinopathy (DR) is one of the leading causes of blindness in the United States and world-wide. DR is a silent disease that may go unnoticed until it is too late for effective treatment. Therefore, early detection could improve the chances of therapeutic interventions that would alleviate its effects. Graded fundus photography and systemic data from 3443 ACCORD-Eye Study participants were used to estimate Random Forest (RF) and logistic regression classifiers. We studied the impact of sample size on classifier performance and the possibility of using RF-generated class conditional probabilities as metrics describing DR risk. RF measures of variable importance are used to detect factors that affect classification performance. Both types of data were informative when discriminating participants with or without DR. RF-based models produced much higher classification accuracy than those based on logistic regression. Combining both types of data did not increase accuracy but did increase statistical discrimination of healthy participants who subsequently did or did not have DR events during four years of follow-up. RF variable importance criteria revealed that microaneurysm counts in both eyes seemed to play the most important role in discrimination among the graded fundus variables, while the number of medicines and diabetes duration were the most relevant among the systemic variables. We have introduced RF methods to DR classification analyses based on fundus photography data. In addition, we propose an approach to DR risk assessment based on metrics derived from graded fundus photography and systemic data. Our results suggest that RF methods could be a valuable tool to diagnose DR and evaluate its progression.
Modeling Verdict Outcomes Using Social Network Measures: The Watergate and Caviar Network Cases
2016-01-01
Modelling criminal trial verdict outcomes using social network measures is an emerging research area in quantitative criminology. Few studies have yet analyzed which of these measures are the most important for verdict modelling or which data classification techniques perform best for this application. To compare the performance of different techniques in classifying members of a criminal network, this article applies three different machine learning classifiers (Logistic Regression, Naïve Bayes and Random Forest) with a range of social network measures and the necessary databases to model the verdicts in two real-world cases: the U.S. Watergate Conspiracy of the 1970s and the now-defunct Canada-based international drug trafficking ring known as the Caviar Network. In both cases it was found that the Random Forest classifier did better than either Logistic Regression or Naïve Bayes, and its superior performance was statistically significant. This being so, Random Forest was used not only for classification but also to assess the importance of the measures. For the Watergate case, the most important one proved to be betweenness centrality, while for the Caviar Network it was the effective size of the network. These results are significant because they show that an approach combining machine learning with social network analysis not only can generate accurate classification models but also helps quantify the importance of social network variables in modelling verdict outcomes. We conclude our analysis with a discussion and some suggestions for future work in verdict modelling using social network measures. PMID:26824351
NASA Astrophysics Data System (ADS)
Lin, Yingzhi; Deng, Xiangzheng; Li, Xing; Ma, Enjun
2014-12-01
Spatially explicit simulation of land use change is the basis for estimating the effects of land use and cover change on energy fluxes, ecology and the environment. At the pixel level, logistic regression is one of the most common approaches used in spatially explicit land use allocation models to determine the relationship between land use and its causal factors in driving land use change, and thereby to evaluate land use suitability. However, these models have a drawback in that they do not determine/allocate land use based on the direct relationship between land use change and its driving factors. Consequently, a multinomial logistic regression method was introduced to address this flaw, and thereby, judge the suitability of a type of land use in any given pixel in a case study area of the Jiangxi Province, China. A comparison of the two regression methods indicated that the proportion of correctly allocated pixels using multinomial logistic regression was 92.98%, which was 8.47% higher than that obtained using logistic regression. Paired t-test results also showed that pixels were more clearly distinguished by multinomial logistic regression than by logistic regression. In conclusion, multinomial logistic regression is a more efficient and accurate method for the spatial allocation of land use changes. The application of this method in future land use change studies may improve the accuracy of predicting the effects of land use and cover change on energy fluxes, ecology, and environment.
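As an illustration of the pixel-level allocation described above, the following minimal sketch shows how a multinomial logistic (softmax) model turns driving-factor values into one suitability probability per land use type. The class names, coefficients, intercepts and pixel values are hypothetical placeholders, not values from the study.

```python
import math

def multinomial_probabilities(features, coefficients, intercepts):
    """Return one probability per class using the softmax of linear scores."""
    scores = [
        b0 + sum(w * x for w, x in zip(ws, features))
        for ws, b0 in zip(coefficients, intercepts)
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical land use classes and fitted parameters (illustration only).
classes = ["cropland", "forest", "built-up"]
coefficients = [[-1.2, 0.4], [1.5, -0.3], [-0.8, 1.1]]
intercepts = [0.2, -0.1, 0.0]

# Hypothetical driving factors for one pixel, e.g. scaled slope and road proximity.
pixel = [0.6, 0.2]
probs = multinomial_probabilities(pixel, coefficients, intercepts)
best = classes[probs.index(max(probs))]  # land use type with the highest suitability
```

Allocating each pixel to its highest-probability class is only one possible rule; an allocation model may also impose area demands or neighbourhood constraints.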
An Event-Triggered Machine Learning Approach for Accelerometer-Based Fall Detection.
Putra, I Putu Edy Suardiyana; Brusey, James; Gaura, Elena; Vesilo, Rein
2017-12-22
The fixed-size non-overlapping sliding window (FNSW) and fixed-size overlapping sliding window (FOSW) approaches are the most commonly used data-segmentation techniques in machine learning-based fall detection using accelerometer sensors. However, these techniques do not segment by fall stages (pre-impact, impact, and post-impact) and thus useful information is lost, which may reduce the detection rate of the classifier. Aligning the segment with the fall stage is difficult, as the segment size varies. We propose an event-triggered machine learning (EvenT-ML) approach that aligns each fall stage so that the characteristic features of the fall stages are more easily recognized. To evaluate our approach, two publicly accessible datasets were used. Classification and regression tree (CART), k-nearest neighbor (k-NN), logistic regression (LR), and the support vector machine (SVM) were used to train the classifiers. EvenT-ML gives classifier F-scores of 98% for a chest-worn sensor and 92% for a waist-worn sensor, and significantly reduces the computational cost compared with the FNSW- and FOSW-based approaches, with reductions of up to 8-fold and 78-fold, respectively. EvenT-ML achieves a significantly better F-score than existing fall detection approaches. These results indicate that aligning feature segments with fall stages significantly increases the detection rate and reduces the computational cost.
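The two baseline segmentation schemes named above can be sketched as follows. This is a minimal illustration of fixed-size windowing only, not the EvenT-ML method; the function name, the `overlap` parameter and the toy signal are assumptions introduced here.

```python
def sliding_windows(samples, size, overlap=0.0):
    """Split a sequence into fixed-size windows.

    overlap=0.0 yields non-overlapping windows (FNSW-style);
    0 < overlap < 1 yields overlapping windows (FOSW-style).
    Trailing samples that do not fill a whole window are dropped.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, int(round(size * (1.0 - overlap))))
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]

signal = list(range(10))                              # stand-in for accelerometer samples
fnsw = sliding_windows(signal, size=4)                # windows start at 0 and 4
fosw = sliding_windows(signal, size=4, overlap=0.5)   # windows start at 0, 2, 4 and 6
```

Neither scheme knows where the impact occurs, so a fall stage can straddle a window boundary, which is the limitation the abstract points to.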
Personality patterns predict the risk of antisocial behavior in Spanish-speaking adolescents.
Alcázar-Córcoles, Miguel A; Verdejo-García, Antonio; Bouso-Sáiz, José C; Revuelta-Menéndez, Javier; Ramírez-Lira, Ezequiel
2017-05-01
There is a renewed interest in incorporating personality variables in criminology theories in order to build models able to integrate personality variables and biological factors with psychosocial and sociocultural factors. The aim of this article is the assessment of personality dimensions that contribute to the prediction of antisocial behavior in adolescents. For this purpose, a sample of adolescents from El Salvador, Mexico, and Spain was obtained. The sample consisted of 1035 participants with a mean age of 16.2. There were 450 adolescents from a forensic population (those who committed a crime) and 585 adolescents from the normal population (no crime committed). All participants answered personality tests about neuroticism, extraversion, psychoticism, sensation seeking, impulsivity, and violence risk. Principal component analysis of the data identified two independent factors: (i) the disinhibited behavior pattern (PDC), formed by the dimensions of neuroticism, psychoticism, impulsivity and risk of violence; and (ii) the extrovert behavior pattern (PEC), formed by the dimensions of sensation risk and extraversion. Both patterns significantly contributed to the prediction of adolescent antisocial behavior in a logistic regression model which properly classifies a global percentage of 81.9%, 86.8% for non-offense and 72.5% for offense behavior. The classification power of regression equations allows making very satisfactory predictions about adolescent offense commission. Educational level has been classified as a protective factor, while age and gender (male) have been classified as risk factors.
NASA Astrophysics Data System (ADS)
Tsangaratos, Paraskevas; Ilia, Ioanna; Loupasakis, Constantinos; Papadakis, Michalis; Karimalis, Antonios
2017-04-01
The main objective of the present study was to apply two machine learning methods for the production of a landslide susceptibility map in the Finikas catchment basin, located in North Peloponnese, Greece and to compare their results. Specifically, Logistic Regression and Random Forest were utilized, based on a database of 40 sites classified into two categories, non-landslide and landslide areas that were separated into a training dataset (70% of the total data) and a validation dataset (remaining 30%). The identification of the areas was established by analyzing airborne imagery, extensive field investigation and the examination of previous research studies. Six landslide related variables were analyzed, namely: lithology, elevation, slope, aspect, distance to rivers and distance to faults. Within the Finikas catchment basin most of the reported landslides were located along the road network and within the residential complexes, classified as rotational and translational slides, and rockfalls, mainly caused by the physical conditions and the general geotechnical behavior of the geological formations that cover the area. Each landslide susceptibility map was reclassified by applying the Geometric Interval classification technique into five classes, namely: very low susceptibility, low susceptibility, moderate susceptibility, high susceptibility, and very high susceptibility. The comparison and validation of the outcomes of each model were achieved using statistical evaluation measures, the receiver operating characteristic and the area under the success and predictive rate curves. The computation process was carried out using RStudio, an integrated development environment for the R language, and ArcGIS 10.1 for compiling the data and producing the landslide susceptibility maps. From the outcomes of the Logistic Regression analysis it was deduced that the highest b coefficients are allocated to lithology and slope, which were 2.8423 and 1.5841, respectively. 
From the estimation of the mean decrease in Gini coefficient performed during the application of Random Forest and the mean decrease in accuracy the most important variable is slope followed by lithology, aspect, elevation, distance from river network, and distance from faults, while the most used variables during the training phase were the variable aspect (21.45%), slope (20.53%) and lithology (19.84%). The outcomes of the analysis are consistent with previous studies concerning the area of research, which have indicated the high influence of lithology and slope in the manifestation of landslides. A high percentage of landslide occurrence has been observed in Plio-Pleistocene sediments, flysch formations, and Cretaceous limestone. The presence of landslides has also been associated with the degree of weathering and fragmentation, the orientation of the discontinuity surfaces and the intense morphological relief. The most accurate model was Random Forest which identified correctly 92.00% of the instances during the training phase, followed by the Logistic Regression 89.00%. The same pattern of accuracy was calculated during the validation phase, in which the Random Forest achieved a classification accuracy of 93.00%, while the Logistic Regression model achieved an accuracy of 91.00%. In conclusion, the outcomes of the study could be a useful cartographic product to local authorities and government agencies during the implementation of successful decision-making and land use planning strategies. Keywords: Landslide Susceptibility, Logistic Regression, Random Forest, GIS, Greece.
Prostate malignancy grading using gland-related shape descriptors
NASA Astrophysics Data System (ADS)
Braumann, Ulf-Dietrich; Scheibe, Patrick; Loeffler, Markus; Kristiansen, Glen; Wernert, Nicolas
2014-03-01
A proof-of-principle study was accomplished assessing the descriptive potential of two simple geometric measures (shape descriptors) applied to sets of segmented glands within images of 125 prostate cancer tissue sections. Respective measures addressing glandular shapes were (i) inverse solidity and (ii) inverse compactness. Using a classifier based on logistic regression, Gleason grades 3 and 4/5 could be differentiated with an accuracy of approx. 95%. Results suggest not only good discriminatory properties, but also robustness against gland segmentation variations. False classifications were caused in part by inadvertent Gleason grade assignments, as a posteriori re-inspections revealed.
Taslimitehrani, Vahid; Dong, Guozhu; Pereira, Naveen L; Panahiazar, Maryam; Pathak, Jyotishman
2016-04-01
Computerized survival prediction in healthcare, which identifies the risk of disease mortality, helps healthcare providers to effectively manage their patients by providing appropriate treatment options. In this study, we propose to apply a classification algorithm, Contrast Pattern Aided Logistic Regression (CPXR(Log)) with the probabilistic loss function, to develop and validate prognostic risk models to predict 1-, 2-, and 5-year survival in heart failure (HF) using data from electronic health records (EHRs) at Mayo Clinic. The CPXR(Log) constructs a pattern aided logistic regression model defined by several patterns and corresponding local logistic regression models. One of the models generated by CPXR(Log) achieved an AUC and accuracy of 0.94 and 0.91, respectively, and significantly outperformed prognostic models reported in prior studies. Data extracted from EHRs allowed incorporation of patient co-morbidities into our models which helped improve the performance of the CPXR(Log) models (15.9% AUC improvement), although it did not improve the accuracy of the models built by other classifiers. We also propose a probabilistic loss function to determine the large error and small error instances. The new loss function used in the algorithm outperforms other functions used in the previous studies by 1% improvement in the AUC. This study revealed that using EHR data to build prediction models can be very challenging using existing classification methods due to the high dimensionality and complexity of EHR data. The risk models developed by CPXR(Log) also reveal that HF is a highly heterogeneous disease, i.e., different subgroups of HF patients require different types of considerations with their diagnosis and treatment. 
Our risk models provided two valuable insights for application of predictive modeling techniques in biomedicine: Logistic risk models often make systematic prediction errors, and it is prudent to use subgroup based prediction models such as those given by CPXR(Log) when investigating heterogeneous diseases. Copyright © 2016 Elsevier Inc. All rights reserved.
Standards for Standardized Logistic Regression Coefficients
ERIC Educational Resources Information Center
Menard, Scott
2011-01-01
Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…
Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E
2013-06-01
Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework. 
© 2013, The International Biometric Society.
Lim, Liang; Nichols, Brandon; Migden, Michael R.; Rajaram, Narasimhan; Reichenberg, Jason S.; Markey, Mia K.; Ross, Merrick I.; Tunnell, James W.
2014-01-01
Abstract. The goal of this study was to determine the diagnostic capability of a multimodal spectral diagnosis (SD) for in vivo noninvasive disease diagnosis of melanoma and nonmelanoma skin cancers. We acquired reflectance, fluorescence, and Raman spectra from 137 lesions in 76 patients using custom-built optical fiber-based clinical systems. Biopsies of lesions were classified using standard histopathology as malignant melanoma (MM), nonmelanoma pigmented lesion (PL), basal cell carcinoma (BCC), actinic keratosis (AK), and squamous cell carcinoma (SCC). Spectral data were analyzed using principal component analysis. Using multiple diagnostically relevant principal components, we built leave-one-out logistic regression classifiers. Classification results were compared with histopathology of the lesion. Sensitivity/specificity for classifying MM versus PL (12 versus 17 lesions) was 100%/100%, for SCC and BCC versus AK (57 versus 14 lesions) was 95%/71%, and for AK and SCC and BCC versus normal skin (71 versus 71 lesions) was 90%/85%. The best classification for nonmelanoma skin cancers required multiple modalities; however, the best melanoma classification occurred with Raman spectroscopy alone. The high diagnostic accuracy for classifying both melanoma and nonmelanoma skin cancer lesions demonstrates the potential for SD as a clinical diagnostic device. PMID:25375350
NASA Astrophysics Data System (ADS)
Lim, Liang; Nichols, Brandon; Migden, Michael R.; Rajaram, Narasimhan; Reichenberg, Jason S.; Markey, Mia K.; Ross, Merrick I.; Tunnell, James W.
2014-11-01
The goal of this study was to determine the diagnostic capability of a multimodal spectral diagnosis (SD) for in vivo noninvasive disease diagnosis of melanoma and nonmelanoma skin cancers. We acquired reflectance, fluorescence, and Raman spectra from 137 lesions in 76 patients using custom-built optical fiber-based clinical systems. Biopsies of lesions were classified using standard histopathology as malignant melanoma (MM), nonmelanoma pigmented lesion (PL), basal cell carcinoma (BCC), actinic keratosis (AK), and squamous cell carcinoma (SCC). Spectral data were analyzed using principal component analysis. Using multiple diagnostically relevant principal components, we built leave-one-out logistic regression classifiers. Classification results were compared with histopathology of the lesion. Sensitivity/specificity for classifying MM versus PL (12 versus 17 lesions) was 100%/100%, for SCC and BCC versus AK (57 versus 14 lesions) was 95%/71%, and for AK and SCC and BCC versus normal skin (71 versus 71 lesions) was 90%/85%. The best classification for nonmelanoma skin cancers required multiple modalities; however, the best melanoma classification occurred with Raman spectroscopy alone. The high diagnostic accuracy for classifying both melanoma and nonmelanoma skin cancer lesions demonstrates the potential for SD as a clinical diagnostic device.
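The sensitivity/specificity pairs quoted above follow from confusion-matrix counts. The short sketch below shows the arithmetic for the SCC and BCC versus AK comparison (57 versus 14 lesions). The abstract does not report the individual true and false counts, so the counts used here are assumptions chosen only because they reproduce the reported 95%/71% after rounding.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Assumed counts (not from the source): 57 SCC/BCC lesions treated as positives,
# 14 AK lesions treated as negatives.
sens, spec = sensitivity_specificity(tp=54, fn=3, tn=10, fp=4)
print(f"{sens:.0%} / {spec:.0%}")  # prints "95% / 71%"
```

With only 14 lesions in the negative group, a single reclassified lesion shifts specificity by about 7 percentage points, which is worth keeping in mind when reading the figures.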
Detection of chewing from piezoelectric film sensor signals using ensemble classifiers.
Farooq, Muhammad; Sazonov, Edward
2016-08-01
Selection and use of pattern recognition algorithms is application dependent. In this work, we explored the use of several ensembles of weak classifiers to classify signals captured from a wearable sensor system to detect food intake based on chewing. Three sensor signals (piezoelectric sensor, accelerometer, and hand-to-mouth gesture) were collected from 12 subjects in free-living conditions for 24 hours. Sensor signals were divided into 10-second epochs, and for each epoch a combination of time and frequency domain features was computed. In this work, we present a comparison of three different ensemble techniques: boosting (AdaBoost), bootstrap aggregation (bagging) and stacking, each trained with 3 different weak classifiers (Decision Trees, Linear Discriminant Analysis (LDA) and Logistic Regression). The type of feature normalization used can also impact the classification results. For each ensemble method, three feature normalization techniques (no normalization, z-score normalization, and min-max normalization) were tested. A 12-fold cross-validation scheme was used to evaluate the performance of each model, where the performance was evaluated in terms of precision, recall, and accuracy. Best results achieved here show an improvement of about 4% over our previous algorithms.
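The two non-trivial normalization schemes mentioned above can be sketched per feature as follows. This is a generic illustration, assuming population standard deviation for the z-score and a constant-feature fallback of zeros; the abstract does not state which conventions the authors used, and the sample values are made up.

```python
from statistics import mean, pstdev

def z_score(values):
    """Centre on the mean and scale by the (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

feature = [2.0, 4.0, 6.0, 8.0]  # hypothetical values of one feature across epochs
z = z_score(feature)
m = min_max(feature)
```

In a cross-validation setting the mean, standard deviation, minimum and maximum would normally be estimated on the training folds only and then applied to the held-out fold.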
Polasek, Ozren; Kolcic, Ivana; Dzakula, Aleksandar; Bagat, Mario
2006-04-01
Human resources management in health often encounters problems related to workforce geographical distribution. The aim of this study was to investigate the internship workplace preferences of final-year medical students and the reasons associated with their choices. A total of 204 out of 240 final-year medical students at Zagreb University Medical School, Croatia, were surveyed a few months before graduation. We collected data on each student's background, workplace preference, academic performance and emigration preferences. Logistic regression was used to analyse the factors underlying internship workplace preference, classified into two categories: Zagreb versus other areas. Only 39 respondents (19.1%) wanted to obtain internships outside Zagreb, the Croatian capital. Gender and age were not significantly associated with internship workplace preference. A single predictor variable significantly contributed to the logistic regression model: students who believed they would not get the desired specialty more often chose Zagreb as a preferred internship workplace (odds ratio 0.32, 95% CI 0.12-0.86). A strong preference for Zagreb as an internship workplace was recorded. Uncertainty about getting the desired specialty was associated with choosing Zagreb as a workplace, possibly due to more extensive and diverse job opportunities.
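The reported odds ratio and confidence interval can be mapped back to the log-odds scale on which logistic regression estimates its coefficients. The sketch below assumes a Wald-type 95% interval that is symmetric on the log scale (z = 1.96); the abstract does not state how the interval was computed, so the derived standard error is an approximation.

```python
import math

def log_odds_summary(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover the coefficient, its approximate standard error, and the
    geometric midpoint of the interval from a reported odds ratio and CI."""
    beta = math.log(odds_ratio)                              # coefficient = ln(OR)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)    # interval half-width / z
    midpoint = math.sqrt(ci_low * ci_high)                   # should be close to the OR
    return beta, se, midpoint

beta, se, midpoint = log_odds_summary(0.32, 0.12, 0.86)
print(f"beta={beta:.3f}, se={se:.3f}, midpoint={midpoint:.3f}")
```

A negative coefficient corresponds to an odds ratio below 1, and a geometric midpoint close to the reported point estimate is a quick consistency check on a published interval.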
Association between developmental enamel defects in the primary and permanent dentitions.
Casanova-Rosado, A J; Medina-Solís, C E; Casanova-Rosado, J F; Vallejos-Sánchez, A A; Martinez-Mier, E A; Loyola-Rodríguez, J P; Islas-Márquez, A J; Maupomé, G
2011-09-01
To determine if the presence of developmental enamel defects (DED) in the primary dentition is a risk indicator for the presence of DED in the permanent dentition in children with mixed dentition, as well as other factors. A cross-sectional study was undertaken in 1296 school children ages six to 12 years. The DED [FDI; 1982] in both dentitions were identified by means of an oral exam scoring enamel opacities [classified as demarcated or diffused], and enamel hypoplasia. Sociodemographic and socioeconomic variables were collected through a questionnaire. Socioeconomic status (SES) was determined based on the occupation and maximum level of education of parents. Statistical analysis included logistic regression. Mean age of participants was 8.40 +/- 1.68; 51.6% were boys. DED prevalence was 7.5% in the permanent dentition and 10.0% in the primary dentition. The logistic regression model, adjusting for sociodemographic and socioeconomic variables, showed that for each primary tooth with DED, the odds of observing DED in the permanent dentition increased 1.38 times [95% CI = 1.17-1.64; p < 0.001]. An association between DED presence in both permanent and primary dentitions was observed. Further studies are necessary to fully characterise such relationship.
Kim, So Young; Sim, Songyong; Choi, Hyo Geun
2017-01-01
Although an association between energy drinks and suicide has been suggested, few prior studies have considered the role of emotional factors including stress, sleep, and school performance in adolescents. This study aimed to evaluate the association of energy drinks with suicide, independent of possible confounders including stress, sleep, and school performance. In total, 121,106 adolescents aged 13-18 years from the 2014 and 2015 Korea Youth Risk Behavior Web-based Survey were surveyed for age, sex, region of residence, economic level, paternal and maternal education level, sleep time, stress level, school performance, frequency of energy drink intake, and suicide attempts. Subjective stress levels were classified into severe, moderate, mild, a little, and no stress. Sleep time was divided into 6 groups: < 6 h; 6 ≤ h < 7; 7 ≤ h < 8; 8 ≤ h < 9; and ≥ 9 h. School performance was classified into 5 levels: A (highest), B (middle, high), C (middle), D (middle, low), and E (lowest). Frequency of energy drink consumption was divided into 3 groups: ≥ 3, 1-2, and 0 times a week. The associations of sleep time, stress level, and school performance with suicide attempts and the frequency of energy drink intake were analyzed using multiple and ordinal logistic regression analysis, respectively, with complex sampling. The relationship between frequency of energy drink intake and suicide attempts was analyzed using multiple logistic regression analysis with complex sampling. Higher stress levels, lack of sleep, and low school performance were significantly associated with suicide attempts (each P < 0.001). These variables of high stress level, abnormal sleep time, and low school performance were also proportionally related with higher energy drink intake (P < 0.001). 
Frequent energy drink intake was significantly associated with suicide attempts in multiple logistic regression analyses (AOR for frequency of energy intake ≥ 3 times a week = 3.03, 95% CI = 2.64-3.49, P < 0.001). Severe stress, inadequate sleep, and low school performance were related with more energy drink intake and suicide attempts in Korean adolescents. Frequent energy drink intake was positively related with suicide attempts, even after adjusting for stress, sleep time, and school performance.
Kim, So Young; Sim, Songyong
2017-01-01
Objective Although an association between energy drinks and suicide has been suggested, few prior studies have considered the role of emotional factors including stress, sleep, and school performance in adolescents. This study aimed to evaluate the association of energy drinks with suicide, independent of possible confounders including stress, sleep, and school performance. Methods In total, 121,106 adolescents aged 13–18 years from the 2014 and 2015 Korea Youth Risk Behavior Web-based Survey were surveyed for age, sex, region of residence, economic level, paternal and maternal education level, sleep time, stress level, school performance, frequency of energy drink intake, and suicide attempts. Subjective stress levels were classified into severe, moderate, mild, a little, and no stress. Sleep time was divided into 6 groups: < 6 h; 6 ≤ h < 7; 7 ≤ h < 8; 8 ≤ h < 9; and ≥ 9 h. School performance was classified into 5 levels: A (highest), B (middle, high), C (middle), D (middle, low), and E (lowest). Frequency of energy drink consumption was divided into 3 groups: ≥ 3, 1–2, and 0 times a week. The associations of sleep time, stress level, and school performance with suicide attempts and the frequency of energy drink intake were analyzed using multiple and ordinal logistic regression analysis, respectively, with complex sampling. The relationship between frequency of energy drink intake and suicide attempts was analyzed using multiple logistic regression analysis with complex sampling. Results Higher stress levels, lack of sleep, and low school performance were significantly associated with suicide attempts (each P < 0.001). These variables of high stress level, abnormal sleep time, and low school performance were also proportionally related with higher energy drink intake (P < 0.001). 
Frequent energy drink intake was significantly associated with suicide attempts in multiple logistic regression analyses (AOR for frequency of energy intake ≥ 3 times a week = 3.03, 95% CI = 2.64–3.49, P < 0.001). Conclusion Severe stress, inadequate sleep, and low school performance were related with more energy drink intake and suicide attempts in Korean adolescents. Frequent energy drink intake was positively related with suicide attempts, even after adjusting for stress, sleep time, and school performance. PMID:29135989
Comparison of two landslide susceptibility assessments in the Champagne-Ardenne region (France)
NASA Astrophysics Data System (ADS)
Den Eeckhaut, M. Van; Marre, A.; Poesen, J.
2010-02-01
The vineyards of the Montagne de Reims are mostly planted on steep south-oriented cuesta fronts receiving a maximum of sun radiation. Due to the location of the vineyards on steep hillslopes, the viticultural activity is threatened by slope failures. This study attempts to better understand the spatial patterns of landslide susceptibility in the Champagne-Ardenne region by comparing a heuristic (qualitative) and a statistical (quantitative) model in a 1120 km² study area. The heuristic landslide susceptibility model was adopted from the Bureau de Recherches Géologiques et Minières, the GEGEAA - Reims University and the Comité Interprofessionnel du Vin de Champagne. In this model, expert knowledge of the region was used to assign weights to all slope classes and lithologies present in the area, but the final susceptibility map was never evaluated with the location of mapped landslides. For the statistical landslide susceptibility assessment, logistic regression was applied to a dataset of 291 'old' (Holocene) landslides. The robustness of the logistic regression model was evaluated and ROC curves were used for model calibration and validation. With regard to the variables assumed to be important environmental factors controlling landslides, the two models are in agreement. They both indicate that present and future landslides are mainly controlled by slope gradient and lithology. However, the comparison of the two landslide susceptibility maps through (1) an evaluation with the location of mapped 'old' landslides and through (2) a temporal validation with spatial data of 'recent' (1960-1999; n = 48) and 'very recent' (2000-2008; n = 46) landslides showed a better prediction capacity for the statistical model produced in this study compared to the heuristic model. In total, the statistically-derived landslide susceptibility map succeeded in correctly classifying 81.0% of the 'old' and 91.6% of the 'recent' and 'very recent' landslides. 
On the susceptibility map derived from the heuristic model, on the other hand, only 54.6% of the 'old' and 64.0% of the 'recent' and 'very recent' landslides were correctly classified as unstable. Hence, the landslide susceptibility map obtained from logistic regression is a better tool for regional landslide susceptibility analysis in the study area of the Montagne de Reims. The accurate classification of zones with very high and high susceptibility allows delineating zones where viticulturists should be informed and where implementation of precaution measures is needed to secure slope stability.
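ROC-based validation of a susceptibility model, as mentioned above, is often summarised by the area under the curve (AUC). The sketch below computes the AUC as the probability that a randomly chosen landslide location receives a higher predicted susceptibility than a randomly chosen stable location. The scores are invented for illustration and are not taken from the study.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. The pairwise loop is quadratic,
    which is fine for a sketch but slow for large samples.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical predicted susceptibilities (illustration only).
landslide = [0.9, 0.8, 0.55, 0.4]   # locations with mapped landslides
stable = [0.7, 0.3, 0.2, 0.1]       # locations without landslides
auc = roc_auc(landslide, stable)
print(auc)  # prints 0.875
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation; a temporal validation like the one described would evaluate the same quantity on landslides that occurred after the model was fitted.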
Robust mislabel logistic regression without modeling mislabel probabilities.
Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun
2018-03-01
Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into consideration. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.
Fungible weights in logistic regression.
Jones, Jeff A; Waller, Niels G
2016-06-01
In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Dong, Wei-Feng; Canil, Sarah; Lai, Raymond; Morel, Didier; Swanson, Paul E.; Izevbaye, Iyare
2018-01-01
A new automated MYC IHC classifier based on bivariate logistic regression is presented. The predictor relies on image analysis developed with the open-source ImageJ platform. From a histologic section immunostained for MYC protein, 2 dimensionless quantitative variables are extracted: (a) relative distance between nuclei positive for MYC IHC based on a Euclidean minimum spanning tree graph and (b) coefficient of variation of the MYC IHC stain intensity among MYC IHC-positive nuclei. Distance between positive nuclei is suggested to inversely correlate with MYC gene rearrangement status, whereas coefficient of variation is suggested to inversely correlate with physiological regulation of MYC protein expression. The bivariate classifier was compared with 2 other MYC IHC classifiers (based on percentage of MYC IHC-positive nuclei), all tested on 113 lymphomas including mostly diffuse large B-cell lymphomas with known MYC fluorescent in situ hybridization (FISH) status. The bivariate classifier strongly outperformed the “percentage of MYC IHC-positive nuclei” methods to predict MYC+ FISH status with 100% sensitivity (95% confidence interval, 94-100) associated with 80% specificity. The test is rapidly performed and might at a minimum provide primary IHC screening for MYC gene rearrangement status in diffuse large B-cell lymphomas. Furthermore, as this bivariate classifier actually predicts “permanent overexpressed MYC protein status,” it might identify nontranslocation-related chromosomal anomalies missed by FISH. PMID:27093450
Jin, Meihua; Yang, Zhongrong; Dong, Zhengquan; Han, Jiankang
2013-12-01
There is growing evidence that men who have sex with men (MSM) are currently a group at high risk of HIV infection in China. Our study aimed to identify the factors affecting consistent condom use among MSM recruited through the Internet in Huzhou city. An anonymous cross-sectional study was conducted by recruiting 410 MSM living in Huzhou city via the Internet. The socio-demographic profiles (age, education level, employment status, etc.) and sexual risk behaviors of the respondents were investigated. Bivariate logistic regression analyses were performed to compare the differences between consistent condom users and inconsistent condom users. Variables with significant bivariate between-group differences were used as candidate variables in a stepwise multivariate logistic regression model. All statistical analyses were performed using SPSS for Windows 17.0, and a p value < 0.05 was considered to be statistically significant. According to their condom use, sixty-eight respondents were classified into two groups: consistent condom users and inconsistent condom users. Multivariate logistic regression showed that respondents who had a comprehensive knowledge of HIV (OR = 4.08, 95% CI: 1.85-8.99), who had sex with male sex workers (OR = 15.30, 95% CI: 5.89-39.75) and who had not drunk alcohol before sex (OR = 3.10, 95% CI: 1.38-6.95) were more likely to be consistent condom users. Consistent condom use among MSM was associated with comprehensive knowledge of HIV and a lack of alcohol use before sexual contact. As a result, reducing alcohol consumption and enhancing education regarding the risks of HIV among sexually active MSM would be effective in preventing HIV transmission.
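The odds ratios and 95% confidence intervals quoted above come from a multivariate model, which the abstract does not specify in enough detail to reproduce. As a rough illustration of how an odds ratio and its interval are formed, the following sketch computes an unadjusted odds ratio with a Wald interval from a 2x2 table. The counts are hypothetical and are not taken from the study.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed, outcome present      b: exposed, outcome absent
    c: unexposed, outcome present    d: unexposed, outcome absent
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log_or)
    upper = math.exp(math.log(estimate) + z * se_log_or)
    return estimate, lower, upper

# Hypothetical counts, for illustration only.
estimate, lower, upper = odds_ratio_with_ci(a=40, b=20, c=30, d=60)
```

An interval that excludes 1 corresponds to a two-sided test at roughly the 5% level, which is how the intervals in the abstract can be read. Adjusted odds ratios from a fitted model will generally differ from this unadjusted calculation.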
Smith, Vanessa; Riccieri, Valeria; Pizzorni, Carmen; Decuman, Saskia; Deschepper, Ellen; Bonroy, Carolien; Sulli, Alberto; Piette, Yves; De Keyser, Filip; Cutolo, Maurizio
2013-12-01
Assessment of associations of nailfold videocapillaroscopy (NVC) scleroderma (systemic sclerosis; SSc) ("early," "active," and "late") with novel future severe clinical involvement in 2 independent cohorts. Sixty-six consecutive Belgian and 82 Italian patients with SSc underwent NVC at baseline. Images were blindly assessed and classified into normal, early, active, or late NVC pattern. Clinical evaluation was performed for 9 organ systems (general, peripheral vascular, skin, joint, muscle, gastrointestinal tract, lung, heart, and kidney) according to the Medsger disease severity scale (DSS) at baseline and in the future (18-24 months of followup). Severe clinical involvement was defined as category 2 to 4 per organ of the DSS. Logistic regression analysis (continuous NVC predictor variable) was performed. The OR to develop novel future severe organ involvement was stronger according to more severe NVC patterns and similar in both cohorts. In simple logistic regression analysis the OR in the Belgian/Italian cohort was 2.16 (95% CI 1.19-4.47, p = 0.010)/2.33 (95% CI 1.36-4.22, p = 0.002) for the early NVC SSc pattern, 4.68/5.42 for the active pattern, and 10.14/12.63 for the late pattern versus the normal pattern. In multiple logistic regression analysis, adjusting for disease duration, subset, and vasoactive medication, the OR was 2.99 (95% CI 1.31-8.82, p = 0.007)/1.88 (95% CI 1.00-3.71, p = 0.050) for the early NVC SSc pattern, 8.93/3.54 for the active pattern, and 26.69/6.66 for the late pattern versus the normal pattern. Capillaroscopy may be predictive of novel future severe organ involvement in SSc, as attested by 2 independent cohorts.
Study on a pattern classification method of soil quality based on simplified learning sample dataset
Zhang, Jiahua; Liu, S.; Hu, Y.; Tian, Y.
2011-01-01
Given the massive amount of soil information used in current soil quality grade evaluation, this paper constructs an intelligent classification approach for soil quality grade based on classical sampling techniques and a disordered multiclassification logistic regression model. In a case study, the learning sample capacity was determined under a given confidence level and estimation accuracy, and the c-means algorithm was used to automatically extract the simplified learning sample dataset from the cultivated soil quality grade evaluation database for the study area, Longchuan County in Guangdong Province. A disordered logistic classifier model was then built, and the calculation and analysis steps of intelligent soil quality grade classification were given. The results indicated that the soil quality grade can be effectively learned and predicted from the extracted simplified dataset through this method, which changes the traditional method for soil quality grade evaluation. © 2011 IEEE.
Should metacognition be measured by logistic regression?
Rausch, Manuel; Zehetleitner, Michael
2017-03-01
Are logistic regression slopes suitable to quantify metacognitive sensitivity, i.e. the efficiency with which subjective reports differentiate between correct and incorrect task responses? We analytically show that logistic regression slopes are independent from rating criteria in one specific model of metacognition, which assumes (i) that rating decisions are based on sensory evidence generated independently of the sensory evidence used for primary task responses and (ii) that the distributions of evidence are logistic. Given a hierarchical model of metacognition, logistic regression slopes depend on rating criteria. According to all considered models, regression slopes depend on the primary task criterion. A reanalysis of previous data revealed that massive numbers of trials are required to distinguish between hierarchical and independent models with tolerable accuracy. It is argued that researchers who wish to use logistic regression as measure of metacognitive sensitivity need to control the primary task criterion and rating criteria. Copyright © 2017 Elsevier Inc. All rights reserved.
Kendrick, Sarah K; Zheng, Qi; Garbett, Nichola C; Brock, Guy N
2017-01-01
DSC is used to determine thermally-induced conformational changes of biomolecules within a blood plasma sample. Recent research has indicated that DSC curves (or thermograms) may have different characteristics based on disease status and, thus, may be useful as a monitoring and diagnostic tool for some diseases. Since thermograms are curves measured over a range of temperature values, they are considered functional data. In this paper we apply functional data analysis techniques to analyze differential scanning calorimetry (DSC) data from individuals from the Lupus Family Registry and Repository (LFRR). The aim was to assess the effect of lupus disease status as well as additional covariates on the thermogram profiles, and use FD analysis methods to create models for classifying lupus vs. control patients on the basis of the thermogram curves. Thermograms were collected for 300 lupus patients and 300 controls without lupus who were matched with diseased individuals based on sex, race, and age. First, functional regression with a functional response (DSC) and categorical predictor (disease status) was used to determine how thermogram curve structure varied according to disease status and other covariates including sex, race, and year of birth. Next, functional logistic regression with disease status as the response and functional principal component analysis (FPCA) scores as the predictors was used to model the effect of thermogram structure on disease status prediction. The prediction accuracy for patients with Osteoarthritis and Rheumatoid Arthritis but without Lupus was also calculated to determine the ability of the classifier to differentiate between Lupus and other diseases. Data were divided 1000 times into separate 2/3 training and 1/3 test data for evaluation of predictions. Finally, derivatives of thermogram curves were included in the models to determine whether they aided in prediction of disease status. 
Functional regression with thermogram as a functional response and disease status as predictor showed a clear separation in thermogram curve structure between cases and controls. The logistic regression model with FPCA scores as the predictors gave the most accurate results with a mean 79.22% correct classification rate with a mean sensitivity = 79.70%, and specificity = 81.48%. The model correctly classified OA and RA patients without Lupus as controls at a rate of 75.92% on average with a mean sensitivity = 79.70% and specificity = 77.6%. Regression models including FPCA scores for derivative curves did not perform as well, nor did regression models including covariates. Changes in thermograms observed in the disease state likely reflect covalent modifications of plasma proteins or changes in large protein-protein interacting networks resulting in the stabilization of plasma proteins towards thermal denaturation. By relating functional principal components from thermograms to disease status, our Functional Principal Component Analysis model provides results that are more easily interpretable compared to prior studies. Further, the model could also potentially be coupled with other biomarkers to improve diagnostic classification for lupus.
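The evaluation scheme described above (1000 random divisions into 2/3 training and 1/3 test data) can be sketched as a repeated holdout procedure. This is a minimal sketch using plain index shuffling; whether the authors stratified the splits by disease status is not stated, so no stratification is assumed here, and the function name and seed are illustrative.

```python
import random

def repeated_holdout_splits(n_samples, n_repeats=1000, train_fraction=2 / 3, seed=0):
    """Yield (train_indices, test_indices) pairs for repeated random holdout."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n_train = round(n_samples * train_fraction)
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]

# 300 lupus patients + 300 controls = 600 thermograms in total.
splits = list(repeated_holdout_splits(n_samples=600))
```

A model would be fitted on each training part and scored on the matching test part, and the reported rates would then be averages over the repeats.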
Seghatoleslam, T; Habi, H; Rashid, R Abdul; Mosavi, N; Asmaee, S; Naseri, A
2012-01-01
The current study aimed to test the hypothesis "Is suicide predictable?" and to classify the predictive factors in multiple suicide attempts. A cross-sectional study was administered to 223 multiple attempters, women who came to a medical poison centre after a suicide attempt. The participants were young, poor, and single. A logistic regression analysis was used to classify the predictive factors of suicide. Women who had multiple suicide attempts exhibited a significant tendency to attempt suicide again. They had a history for more than two years of multiple suicide attempts, from three to as many as 18 times, plus mental illnesses such as depression and substance abuse. They also had a positive history of mental illnesses. Results indicate that contributing factors for another suicide attempt include previous suicide attempts, mental illness (depression), or a positive history of mental illnesses in the family affecting them at a young age, and substance abuse.
London Measure of Unplanned Pregnancy: guidance for its use as an outcome measure
Hall, Jennifer A; Barrett, Geraldine; Copas, Andrew; Stephenson, Judith
2017-01-01
Background The London Measure of Unplanned Pregnancy (LMUP) is a psychometrically validated measure of the degree of intention of a current or recent pregnancy. The LMUP is increasingly being used worldwide, and can be used to evaluate family planning or preconception care programs. However, beyond recommending the use of the full LMUP scale, there is no published guidance on how to use the LMUP as an outcome measure. Ordinal logistic regression has been recommended informally, but studies published to date have all used binary logistic regression and dichotomized the scale at different cut points. There is thus a need for evidence-based guidance to provide a standardized methodology for multivariate analysis and to enable comparison of results. This paper makes recommendations for the regression method for analysis of the LMUP as an outcome measure. Materials and methods Data collected from 4,244 pregnant women in Malawi were used to compare five regression methods: linear, logistic with two cut points, and ordinal logistic with either the full or grouped LMUP score. The recommendations were then tested on the original UK LMUP data. Results There were small but no important differences in the findings across the regression models. Logistic regression resulted in the largest loss of information, and assumptions were violated for the linear and ordinal logistic regression. Consequently, robust standard errors were used for linear regression and a partial proportional odds ordinal logistic regression model attempted. The latter could only be fitted for grouped LMUP score. Conclusion We recommend the linear regression model with robust standard errors to make full use of the LMUP score when analyzed as an outcome measure. Ordinal logistic regression could be considered, but a partial proportional odds model with grouped LMUP score may be required. Logistic regression is the least-favored option, due to the loss of information. 
For logistic regression, the cut point for un/planned pregnancy should be between nine and ten. These recommendations will standardize the analysis of LMUP data and enhance comparability of results across studies. PMID:28435343
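For readers who do use binary logistic regression, the recommended cut point between nine and ten can be applied as in the sketch below. It assumes the usual LMUP scoring range of 0 to 12, which the abstract does not state, and the function name and labels are illustrative.

```python
def dichotomize_lmup(score):
    """Map an LMUP score to a binary outcome using a cut point between 9 and 10."""
    if not 0 <= score <= 12:  # assumed scoring range of the full scale
        raise ValueError("LMUP score is assumed to lie between 0 and 12")
    return "planned" if score >= 10 else "unplanned"

labels = [dichotomize_lmup(score) for score in (0, 9, 10, 12)]
```

Collapsing thirteen score values into two categories is the loss of information the authors refer to when ranking logistic regression as the least-favored option.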
Logistic models--an odd(s) kind of regression.
Jupiter, Daniel C
2013-01-01
The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression. Copyright © 2013 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
Engvall, Karin; Hult, M; Corner, R; Lampa, E; Norbäck, D; Emenius, G
2010-01-01
The aim was to develop a new model to identify residential buildings with higher frequencies of "SBS" than expected, "risk buildings". In 2005, 481 multi-family buildings with 10,506 dwellings in Stockholm were studied by a new stratified random sampling. A standardised self-administered questionnaire was used to assess "SBS", atopy and personal factors. The response rate was 73%. Statistical analysis was performed by multiple logistic regressions. Dwellers owning their building reported less "SBS" than those renting. There was a strong relationship between socio-economic factors and ownership. The regression model ended up with high explanatory values for age, gender, atopy and ownership. Applying our model, 9% of all residential buildings in Stockholm were classified as "risk buildings" with the highest proportion in houses built 1961-1975 (26%) and lowest in houses built 1985-1990 (4%). To identify "risk buildings", it is necessary to adjust for ownership and population characteristics.
Hoggarth, Petra A; Innes, Carrie R H; Dalrymple-Alford, John C; Jones, Richard D
2013-12-01
To generate a robust model of computerized sensory-motor and cognitive test performance to predict on-road driving assessment outcomes in older persons with diagnosed or suspected cognitive impairment. A logistic regression model classified pass–fail outcomes of a blinded on-road driving assessment. Generalizability of the model was tested using leave-one-out cross-validation. Three specialist clinics in New Zealand. Drivers (n=279; mean age 78.4, 65% male) with diagnosed or suspected dementia, mild cognitive impairment, unspecified cognitive impairment, or memory problems referred for a medical driving assessment. A computerized battery of sensory-motor and cognitive tests and an on-road medical driving assessment. One hundred fifty-five participants (55.5%) received an on-road fail score. Binary logistic regression correctly classified 75.6% of the sample into on-road pass and fail groups. The cross-validation indicated accuracy of the model of 72.0% with sensitivity for detecting on-road fails of 73.5%, specificity of 70.2%, positive predictive value of 75.5%, and negative predictive value of 68%. The off-road assessment prediction model resulted in a substantial number of people who were assessed as likely to fail despite passing an on-road assessment and vice versa. Thus, despite a large multicenter sample, the use of off-road tests previously found to be useful in other older populations, and a carefully constructed and tested prediction model, off-road measures have yet to be found that are sufficiently accurate to allow acceptable determination of on-road driving safety of cognitively impaired older drivers. © 2013, Copyright the Authors Journal compilation © 2013, The American Geriatrics Society.
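The cross-validated figures above are all derived from one confusion matrix. The sketch below shows the calculation, treating an on-road fail as the positive class. The four counts are not given in the abstract; they are one set of integers reconstructed here to be consistent with the reported totals (279 drivers, 155 fails) and percentages, and should be read as an assumption.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard summary measures from the four cells of a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # fails correctly predicted as fails
        "specificity": tn / (tn + fp),  # passes correctly predicted as passes
        "ppv": tp / (tp + fp),          # predicted fails that truly failed
        "npv": tn / (tn + fn),          # predicted passes that truly passed
    }

# Reconstructed counts (assumption): 155 on-road fails, 124 on-road passes.
metrics = classification_metrics(tp=114, fn=41, fp=37, tn=87)
percentages = {name: round(100 * value, 1) for name, value in metrics.items()}
```

Under these counts, 41 drivers who failed on-road would have been predicted to pass and 37 who passed would have been predicted to fail, which illustrates the authors' concern about misclassification in both directions.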
Basha, Sakeenabi; Mohammad, Roshan Noor; Swamy, Hiremath Shivalinga; Sexena, Vrinda
2015-01-01
Obesity and poverty are independent risk factors in trauma-related morbidity in children as well as adolescents. The main objective of this study was to investigate the association between traumatic dental injury, obesity, and socioeconomic status in 6- and 13-year-old schoolchildren in Davangere city, Karnataka, India. Data were obtained from 1,550 schoolchildren. Dental trauma was classified according to Andreasen's criteria. The medical evaluation assessed the Body Mass Index. Overjet was considered a risk factor when it presented values higher than 3 mm, whereas lip coverage was classified as adequate or inadequate. With appropriate sample weighting, relationships between traumatic dental injury and other variables were assessed using the chi-squared test and multivariable logistic regression. Overall prevalence of dental injuries was 10.52% (3.6% in 6-year-olds and 17.2% in 13-year-olds). Boys experienced more injuries than girls, 11.03% and 9.97%, respectively (p>.05). There was a statistically significant difference between traumatic dental injury and overjet (95% confidence interval [CI] [2.06, 4.78], p < .001) and between traumatic dental injury and inadequate lip coverage (95% CI [1.23, 4.65], p < .001). When adjusted for covariates, the logistic regression model showed that there was a significant association between obesity (p < .05) and dental trauma prevalence. Children from low socioeconomic status were more likely to have dental trauma than children from medium and upper socioeconomic status (odds ratio 2.33, 95% CI [1.05, 3.97]). To conclude, the results of this study support an association between traumatic dental injuries, obesity, and poverty.
Feasibility Testing of a Wearable Behavioral Aid for Social Learning in Children with Autism.
Daniels, Jena; Haber, Nick; Voss, Catalin; Schwartz, Jessey; Tamura, Serena; Fazel, Azar; Kline, Aaron; Washington, Peter; Phillips, Jennifer; Winograd, Terry; Feinstein, Carl; Wall, Dennis P
2018-01-01
Recent advances in computer vision and wearable technology have created an opportunity to introduce mobile therapy systems for autism spectrum disorders (ASD) that can respond to the increasing demand for therapeutic interventions; however, feasibility questions must be answered first. We studied the feasibility of a prototype therapeutic tool for children with ASD using Google Glass, examining whether children with ASD would wear such a device, if providing the emotion classification will improve emotion recognition, and how emotion recognition differs between ASD participants and neurotypical controls (NC). We ran a controlled laboratory experiment with 43 children: 23 with ASD and 20 NC. Children identified static facial images on a computer screen with one of 7 emotions in 3 successive batches: the first with no information about emotion provided to the child, the second with the correct classification from the Glass labeling the emotion, and the third again without emotion information. We then trained a logistic regression classifier on the emotion confusion matrices generated by the two information-free batches to predict ASD versus NC. All 43 children were comfortable wearing the Glass. ASD and NC participants who completed the computer task with Glass providing audible emotion labeling ( n = 33) showed increased accuracies in emotion labeling, and the logistic regression classifier achieved an accuracy of 72.7%. Further analysis suggests that the ability to recognize surprise, fear, and neutrality may distinguish ASD cases from NC. This feasibility study supports the utility of a wearable device for social affective learning in ASD children and demonstrates subtle differences in how ASD and NC children perform on an emotion recognition task. Schattauer GmbH Stuttgart.
Application of Random Forests Methods to Diabetic Retinopathy Classification Analyses
Casanova, Ramon; Saldana, Santiago; Chew, Emily Y.; Danis, Ronald P.; Greven, Craig M.; Ambrosius, Walter T.
2014-01-01
Background Diabetic retinopathy (DR) is one of the leading causes of blindness in the United States and worldwide. DR is a silent disease that may go unnoticed until it is too late for effective treatment. Therefore, early detection could improve the chances of therapeutic interventions that would alleviate its effects. Methodology Graded fundus photography and systemic data from 3443 ACCORD-Eye Study participants were used to estimate Random Forest (RF) and logistic regression classifiers. We studied the impact of sample size on classifier performance and the possibility of using RF generated class conditional probabilities as metrics describing DR risk. RF measures of variable importance are used to detect factors that affect classification performance. Principal Findings Both types of data were informative when discriminating participants with or without DR. RF based models produced much higher classification accuracy than those based on logistic regression. Combining both types of data did not increase accuracy but did increase statistical discrimination of healthy participants who subsequently did or did not have DR events during four years of follow-up. RF variable importance criteria revealed that microaneurysm counts in both eyes seemed to play the most important role in discrimination among the graded fundus variables, while the number of medicines and diabetes duration were the most relevant among the systemic variables. Conclusions and Significance We have introduced RF methods to DR classification analyses based on fundus photography data. In addition, we propose an approach to DR risk assessment based on metrics derived from graded fundus photography and systemic data. Our results suggest that RF methods could be a valuable tool to diagnose DR and evaluate its progression. PMID:24940623
Hip Strength as a Predictor of Ankle Sprains in Male Soccer Players: A Prospective Study.
Powers, Christopher M; Ghoddosi, Navid; Straub, Rachel K; Khayambashi, Khalil
2017-11-01
Diminished hip-abductor strength has been suggested to increase the risk of noncontact lateral ankle sprains. To determine prospectively whether baseline hip-abductor strength predicts future noncontact lateral ankle sprains in competitive male soccer players. Prospective cohort study. Athletic training facilities and various athletic fields. Two hundred ten competitive male soccer players. Before the start of the sport season, isometric hip-abductor strength was measured bilaterally using a handheld dynamometer. Any previous history of ankle sprain, body mass index, age, height, and weight were documented. During the sport season (30 weeks), ankle injury status was recorded by team medical providers. Injured athletes were further classified based on the mechanism of injury. Only data from injured athletes who sustained noncontact lateral ankle sprains were used for analysis. Postseason, logistic regression was used to determine whether baseline hip strength predicted future noncontact lateral ankle sprains. A receiver operating characteristic curve was constructed for hip strength to determine the cutoff value for distinguishing between high-risk and low-risk outcomes. A total of 25 noncontact lateral ankle sprains were confirmed, for an overall annual incidence of 11.9%. Baseline hip-abductor strength was lower in injured players than in uninjured players ( P = .008). Logistic regression indicated that impaired hip-abductor strength increased the future injury risk (odds ratio = 1.10 [95% confidence interval = 1.02, 1.18], P = .010). The strength cutoff to define high risk was ≤33.8% body weight, as determined by receiver operating characteristic curve analysis. For athletes classified as high risk, the probability of injury increased from 11.9% to 26.7%. Reduced isometric hip-abductor strength predisposed competitive male soccer players to noncontact lateral ankle sprains.
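The abstract reports a strength cutoff derived from a receiver operating characteristic curve but does not say which criterion selected it. One common choice is to maximize Youden's J (sensitivity + specificity - 1), and the sketch below uses that criterion purely as an assumption. The strength values and injury labels are hypothetical and are not study data.

```python
def youden_cutoff(strengths, injured):
    """Return (cutoff, J) maximizing Youden's J when 'strength <= cutoff' flags high risk.

    Both injured and uninjured athletes must be present, otherwise the
    sensitivity or specificity denominator would be zero.
    """
    n_injured = sum(injured)
    n_uninjured = len(injured) - n_injured
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(strengths)):
        true_pos = sum(1 for s, y in zip(strengths, injured) if y and s <= cutoff)
        true_neg = sum(1 for s, y in zip(strengths, injured) if not y and s > cutoff)
        j = true_pos / n_injured + true_neg / n_uninjured - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Hypothetical hip-abductor strength (% body weight) and injury status (1 = sprain).
strengths = [28, 30, 32, 33, 36, 38, 40, 42]
injured = [1, 1, 0, 1, 0, 0, 0, 0]
cutoff, j_statistic = youden_cutoff(strengths, injured)
```

Other criteria, such as the point closest to the top-left corner of the ROC plot or a cost-weighted rule, can give a different cutoff on the same data.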
Quality of Life Among HIV-Infected Patients in Brazil after Initiation of Treatment
Campos, Lorenza Nogueira; César, Cibele Comini; Guimarães, Mark Drew Crosland
2009-01-01
INTRODUCTION Despite improvement in clinical treatment for HIV-infected patients, the impact of antiretroviral therapy on the overall quality of life has become a major concern. OBJECTIVE To identify factors associated with increased levels of self-reported quality of life among HIV-infected patients after four months of antiretroviral therapy. METHODS Patients were recruited at two public health referral centers for AIDS, Belo Horizonte, Brazil, for a prospective adherence study. Patients were interviewed before initiating treatment (baseline) and after one and four months. Quality of life was assessed using a psychometric instrument, and factors associated with good/very good quality of life four months after the initiation of antiretroviral therapy were assessed using a cross-sectional approach. Logistic regression was used for analysis. RESULTS Overall quality of life was classified as ‘very good/good’ by 66.4% of the participants four months after initiating treatment, while 33.6% classified it as ‘neither poor nor good/poor/very poor’. Logistic regression indicated that >8 years of education, none/mild symptoms of anxiety and depression, no antiretroviral switch, lower number of adverse reactions and better quality of life at baseline were independently associated with good/very good quality of life over four months of treatment. CONCLUSIONS Our results highlight the importance of modifiable factors such as psychiatric symptoms and treatment-related variables that may contribute to a better quality of life among patients initiating treatment. Considering that poor quality of life is related to non-adherence to antiretroviral therapy, careful clinical monitoring of these factors may contribute to ensuring the long-term effectiveness of antiretroviral regimens. PMID:19759880
NASA Astrophysics Data System (ADS)
Yao, W.; Poleswki, P.; Krzystek, P.
2016-06-01
The recent success of deep convolutional neural networks (CNN) on a large number of applications can be attributed to large amounts of available training data and increasing computing power. In this paper, a semantic pixel labelling scheme for urban areas using multi-resolution CNN and hand-crafted spatial-spectral features of airborne remotely sensed data is presented. Both CNN and hand-crafted features are applied to image/DSM patches to produce per-pixel class probabilities with an L1-norm regularized logistic regression classifier. The evidence theory infers a degree of belief for pixel labelling from different sources to smooth regions by handling the conflicts present in both classifiers while reducing the uncertainty. The aerial data used in this study were provided by ISPRS as benchmark datasets for 2D semantic labelling tasks in urban areas, which consist of two data sources from LiDAR and color infrared camera. The test sites are parts of a city in Germany which is assumed to consist of typical object classes including impervious surfaces, trees, buildings, low vegetation, vehicles and clutter. The evaluation is based on the computation of pixel-based confusion matrices by random sampling. The performance of the strategy with respect to scene characteristics and method combination strategies is analyzed and discussed. The competitive classification accuracy can be explained not only by the nature of the input data sources (e.g. the above-ground height of the nDSM highlights the vertical dimension of houses, trees, and even cars, and the near-infrared spectrum indicates vegetation), but also by the decision-level fusion of CNN's texture-based approach with multichannel spatial-spectral hand-crafted features based on the evidence combination theory.
Automatic seed selection for segmentation of liver cirrhosis in laparoscopic sequences
NASA Astrophysics Data System (ADS)
Sinha, Rahul; Marcinczak, Jan Marek; Grigat, Rolf-Rainer
2014-03-01
For computer aided diagnosis based on laparoscopic sequences, image segmentation is one of the basic steps which define the success of all further processing. However, many image segmentation algorithms require prior knowledge which is given by interaction with the clinician. We propose an automatic seed selection algorithm for segmentation of liver cirrhosis in laparoscopic sequences which assigns each pixel a probability of being cirrhotic liver tissue or background tissue. Our approach is based on a trained classifier using SIFT and RGB features with PCA. Due to the unique illumination conditions in laparoscopic sequences of the liver, a very low dimensional feature space can be used for classification via logistic regression. The methodology is evaluated on 718 cirrhotic liver and background patches that are taken from laparoscopic sequences of 7 patients. Using a linear classifier we achieve a precision of 91% in a leave-one-patient-out cross-validation. Furthermore, we demonstrate that with logistic probability estimates, seeds with high certainty of being cirrhotic liver tissue can be obtained. For example, our precision of liver seeds increases to 98.5% if only seeds with more than 95% probability of being liver are used. Finally, these automatically selected seeds can be used as priors in Graph Cuts which is demonstrated in this paper.
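The idea of keeping only high-certainty seeds can be sketched as a simple threshold on the per-patch probability produced by the classifier. The probabilities and ground-truth labels below are hypothetical, and the function names are illustrative rather than taken from the paper.

```python
def select_confident_seeds(probabilities, threshold=0.95):
    """Return indices of patches whose liver probability exceeds the threshold."""
    return [i for i, p in enumerate(probabilities) if p > threshold]

def precision(selected, true_labels):
    """Fraction of selected patches that are truly liver (label 1)."""
    if not selected:
        return float("nan")  # precision is undefined when nothing is selected
    return sum(true_labels[i] for i in selected) / len(selected)

# Hypothetical classifier outputs and ground truth (1 = cirrhotic liver).
liver_probabilities = [0.99, 0.97, 0.60, 0.96, 0.30, 0.80]
is_liver = [1, 1, 0, 1, 0, 1]
seeds = select_confident_seeds(liver_probabilities)
seed_precision = precision(seeds, is_liver)
```

Raising the threshold trades the number of seeds for their reliability, which suits a use as priors for Graph Cuts, where a few trustworthy seeds can matter more than many uncertain ones.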
Statistical classification of drug incidents due to look-alike sound-alike mix-ups.
Wong, Zoie Shui Yee
2016-06-01
It has been recognised that medication names that look or sound similar are a cause of medication errors. This study builds statistical classifiers for identifying medication incidents due to look-alike sound-alike mix-ups. A total of 227 patient safety incident advisories related to medication were obtained from the Canadian Patient Safety Institute's Global Patient Safety Alerts system. Eight feature selection strategies based on frequent terms, frequent drug terms and constituent terms were performed. Statistical text classifiers based on logistic regression, support vector machines with linear, polynomial, radial-basis and sigmoid kernels and decision tree were trained and tested. The models developed achieved an average accuracy of above 0.8 across all the model settings. The receiver operating characteristic curves indicated the classifiers performed reasonably well. The results obtained in this study suggest that statistical text classification can be a feasible method for identifying medication incidents due to look-alike sound-alike mix-ups based on a database of advisories from Global Patient Safety Alerts. © The Author(s) 2014.
Time series modeling by a regression approach based on a latent process.
Chamroukhi, Faicel; Samé, Allou; Govaert, Gérard; Aknin, Patrice
2009-01-01
Time series are used in many domains including finance, engineering, economics and bioinformatics generally to represent the change of a measurement over time. Modeling techniques may then be used to give a synthetic representation of such data. A new approach for time series modeling is proposed in this paper. It consists of a regression model incorporating a discrete hidden logistic process allowing for activating smoothly or abruptly different polynomial regression models. The model parameters are estimated by the maximum likelihood method performed by a dedicated Expectation Maximization (EM) algorithm. The M step of the EM algorithm uses a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm to estimate the hidden process parameters. To evaluate the proposed approach, an experimental study on simulated data and real world data was performed using two alternative approaches: a heteroskedastic piecewise regression model using a global optimization algorithm based on dynamic programming, and a Hidden Markov Regression Model whose parameters are estimated by the Baum-Welch algorithm. Finally, in the context of the remote monitoring of components of the French railway infrastructure, and more particularly the switch mechanism, the proposed approach has been applied to modeling and classifying time series representing the condition measurements acquired during switch operations.
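To make the idea of a hidden logistic process more concrete, the sketch below assumes that the probability of each regime at time t is a softmax of a linear function of t, and that the expected signal is the probability-weighted sum of the regime polynomials. The abstract does not spell out this parameterization, so it is an assumption, and all parameter values are hypothetical. Estimation by EM and IRLS is not shown.

```python
import math

def logistic_proportions(t, weights):
    """Regime probabilities at time t: softmax of w0 + w1 * t for each regime."""
    scores = [w0 + w1 * t for w0, w1 in weights]
    largest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def polynomial(t, coeffs):
    """Evaluate coeffs[0] + coeffs[1] * t + coeffs[2] * t**2 + ..."""
    return sum(c * t ** k for k, c in enumerate(coeffs))

def expected_signal(t, weights, polys):
    """Mixture mean: regime probabilities times regime polynomial values."""
    props = logistic_proportions(t, weights)
    return sum(p * polynomial(t, c) for p, c in zip(props, polys))

# Hypothetical two-regime example with a transition centred at t = 5.
weights = [(0.0, 0.0), (-50.0, 10.0)]  # a larger slope gives a more abrupt switch
polys = [[1.0], [3.0, 0.5]]            # constant regime, then a linear regime
early = expected_signal(0.0, weights, polys)
late = expected_signal(10.0, weights, polys)
```

With a small slope in `weights` the two regimes blend gradually around the transition, and with a large slope the change is nearly a step, which corresponds to the "smoothly or abruptly" behaviour described above.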
Frndak, Seth E; Smerbeck, Audrey M; Irwin, Lauren N; Drake, Allison S; Kordovski, Victoria M; Kunker, Katrina A; Khan, Anjum L; Benedict, Ralph H B
2016-10-01
We endeavored to clarify how distinct co-occurring symptoms relate to the presence of negative work events in employed multiple sclerosis (MS) patients. Latent profile analysis (LPA) was utilized to elucidate common disability patterns by isolating patient subpopulations. Samples of 272 employed MS patients and 209 healthy controls (HC) were administered neuroperformance tests of ambulation, hand dexterity, processing speed, and memory. Regression-based norms were created from the HC sample. LPA identified latent profiles using the regression-based z-scores. Finally, multinomial logistic regression tested for negative work event differences among the latent profiles. Four profiles were identified via LPA: a common profile (55%) characterized by slightly below average performance in all domains, a broadly low-performing profile (18%), a poor motor abilities profile with average cognition (17%), and a generally high-functioning profile (9%). Multinomial regression analysis revealed that the uniformly low-performing profile demonstrated a higher likelihood of reported negative work events. Employed MS patients with co-occurring motor, memory and processing speed impairments were most likely to report a negative work event, classifying them as uniquely at risk for job loss.
Impaired executive function can predict recurrent falls in Parkinson's disease.
Mak, Margaret K; Wong, Adrian; Pang, Marco Y
2014-12-01
To examine whether impairment in executive function independently predicts recurrent falls in people with Parkinson's disease (PD). Prospective cohort study. University motor control research laboratory. A convenience sample of community-dwelling people with PD (N=144) was recruited from a patient self-help group and movement disorders clinics. Not applicable. Executive function was assessed with the Mattis Dementia Rating Scale Initiation/Perseveration (MDRS-IP) subtest, and fear of falling (FoF) with the Activities-specific Balance Confidence (ABC) Scale. All participants were followed up for 12 months to record the number of monthly fall events. Forty-two people with PD had at least 2 falls during the follow-up period and were classified as recurrent fallers. After accounting for demographic variables and fall history (P=.001), multiple logistic regression analysis showed that the ABC scores (P=.014) and MDRS-IP scores (P=.006) were significantly associated with future recurrent falls among people with PD. The overall accuracy of the prediction was 85.9%. With the use of the significant predictors identified in multiple logistic regression analysis, a prediction model determined by the logistic function was generated: Z = 1.544 + .378 (fall history) - .045 (ABC) - .145 (MDRS-IP). Impaired executive function is a significant predictor of future recurrent falls in people with PD. Participants with executive dysfunction and greater FoF at baseline had a significantly greater risk of sustaining a recurrent fall within the subsequent 12 months. Copyright © 2014 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
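The reported prediction model can be turned into a probability with the standard logistic function. The sketch below uses the coefficients quoted in the abstract; it assumes that the probability of recurrent falls is 1 / (1 + exp(-Z)), and the coding of fall history is not given in the abstract, so the input values used here are hypothetical.

```python
import math

# Coefficients as quoted in the abstract.
INTERCEPT = 1.544
B_FALL_HISTORY = 0.378
B_ABC = -0.045
B_MDRS_IP = -0.145

def recurrent_fall_probability(fall_history, abc, mdrs_ip):
    """Logistic transform of Z (assumed link; variable coding is hypothetical)."""
    z = (INTERCEPT
         + B_FALL_HISTORY * fall_history
         + B_ABC * abc
         + B_MDRS_IP * mdrs_ip)
    return 1.0 / (1.0 + math.exp(-z))

if __name__ == "__main__":
    # Hypothetical participants: same fall history, different ABC and MDRS-IP scores.
    print(recurrent_fall_probability(fall_history=1, abc=80, mdrs_ip=35))
    print(recurrent_fall_probability(fall_history=1, abc=40, mdrs_ip=25))
```

Because the ABC and MDRS-IP coefficients are negative, lower balance confidence and lower executive function scores both raise the predicted probability, consistent with the abstract's conclusion.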
Okuyama, Mayumi; Nishida, Masumi
2016-01-01
The aim of the present study was to examine the association between impending dehydration among elderly people in nursing homes and physical signs, including the axillary skin temperature, humidity, intraoral moisture content, and salivary components. The study included 78 elderly individuals who required long-term care in a nursing home (11 men and 67 women; average age, 86.6±7.3 years). The elderly subjects were classified in two groups according to their serum osmolality levels: those with levels between the upper limit reference value (292 mOsm/kg H2O) and the diagnostic reference value of dehydration (300 mOsm/kg H2O) were classified into the boundary zone group and those with levels of <292 mOsm/kg H2O were classified into the normal range group. The following parameters were measured: basic attributes (age, gender and level of care required), body mass index, diet, daily fluid intake per kilogram of body weight, physiological indicators (blood pressure, pulse rate, body temperature, axillary skin temperature, humidity, total body water, body water rate, internal liquid rate, external solution rate, blood components, intraoral water amount, and salivary components), and the indoor environment (room temperature and humidity). We then performed a statistical analysis to compare the boundary zone group with the normal range group. After adjusting for age and the daily fluid intake per kilogram of body weight (<25 ml/≥25 ml), we performed a logistic regression analysis (the boundary zone group was used as an independent variable) for variables that had significance levels of <0.05 (except for blood components). The univariate analysis revealed significant differences in the following parameters: the serum sodium, chloride, and creatinine levels; the blood sugar level; the urea nitrogen/creatinine ratio; the axillary skin temperature; and room humidity. 
Only the axillary skin temperature showed a significant association in the final model of the logistic regression analysis (odds ratio, 3.664; 95% confidence interval, 1.101-12.197; p = 0.034). As the axillary skin temperature increased by 1°C, there was a 3.67-fold risk of being classified into the boundary zone group instead of the normal range group. Thus, the axillary skin temperature was associated with impending dehydration.
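As a worked check on the reported odds ratio, the sketch below recovers the logistic regression coefficient, an approximate standard error, and a two-sided p-value from the published figures. It assumes a Wald-type 95% confidence interval that is symmetric on the log-odds scale; the function names are illustrative. Note that an odds ratio multiplies the odds, which is not the same as multiplying the risk.

```python
import math

# Figures quoted in the abstract.
ODDS_RATIO = 3.664
CI_LOW, CI_HIGH = 1.101, 12.197
Z_95 = 1.959964  # standard normal quantile for a 95% interval

def log_odds_summary(odds_ratio, ci_low, ci_high):
    """Return (beta, standard error, z, two-sided p) assuming a Wald interval."""
    beta = math.log(odds_ratio)
    std_error = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = beta / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return beta, std_error, z, p_value

def odds_multiplier(odds_ratio, delta):
    """Factor by which the odds change for a temperature change of delta degrees C."""
    return odds_ratio ** delta

if __name__ == "__main__":
    print(log_odds_summary(ODDS_RATIO, CI_LOW, CI_HIGH))
    print(odds_multiplier(ODDS_RATIO, 0.5))  # half-degree increase
```

Under these assumptions the recovered p-value is close to the reported p = 0.034, which suggests the interval and the p-value in the abstract are mutually consistent.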
Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model
NASA Astrophysics Data System (ADS)
Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami
2017-06-01
A regression model represents the relationship between independent variables and a dependent variable. When the dependent variable is categorical, a logistic regression model is used to calculate odds. When the categories of the dependent variable are ordered levels, the logistic regression model is ordinal. The GWOLR model is an ordinal logistic regression model that is influenced by the geographical location of the observation site. Parameter estimation in the model is needed to determine population values based on a sample. The purpose of this research is to estimate the parameters of the GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City. The observation units are 144 villages in Semarang City. The research yields a local GWOLR model for each village and the probability of each category of the number of dengue fever patients.
The purpose of this report is to provide a reference manual that could be used by investigators for making informed use of logistic regression using two methods (standard logistic regression and MARS). The details for analyses of relationships between a dependent binary response ...
Predicting U.S. Army Reserve Unit Manning Using Market Demographics
2015-06-01
develops linear regression, classification tree, and logistic regression models to determine the ability of the location to support manning requirements... logistic regression model delivers predictive results that allow decision-makers to identify locations with a high probability of meeting unit...manning requirements. The recommendation of this thesis is that the USAR implement the logistic regression model.
ERIC Educational Resources Information Center
Chen, Chau-Kuang
2005-01-01
Logistic and Cox regression methods are practical tools used to model the relationships between certain student learning outcomes and their relevant explanatory variables. The logistic regression model fits an S-shaped curve into a binary outcome with data points of zero and one. The Cox regression model allows investigators to study the duration…
Yusuf, O B; Bamgboye, E A; Afolabi, R F; Shodimu, M A
2014-09-01
The logistic regression model is widely used in health research for descriptive and predictive purposes. Unfortunately, researchers are often not aware that the underlying principles of the technique have failed when the maximum likelihood algorithm does not converge. Young researchers, particularly postgraduate students, may not know why the separation problem, whether quasi or complete, occurs, how to identify it, and how to fix it. This study was designed to critically evaluate convergence issues in articles that employed logistic regression analysis published in the African Journal of Medicine and Medical Sciences between 2004 and 2013. Problems of quasi or complete separation were described and illustrated with the National Demographic and Health Survey dataset. A critical evaluation of articles that employed logistic regression was conducted. A total of 581 articles were reviewed, of which 40 (6.9%) used binary logistic regression. Twenty-four (60.0%) stated the use of the logistic regression model in the methodology, while none of the articles assessed model fit. Only 3 (12.5%) properly described the procedures. Of the 40 that used the logistic regression model, the problem of convergence occurred in 6 (15.0%) of the articles. Logistic regression tended to be poorly reported in studies published between 2004 and 2013. Our findings showed that the procedure may not be well understood by researchers, since very few described the process in their reports, and that researchers may be totally unaware of the problem of convergence or how to deal with it.
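To make the separation problem concrete, here is a minimal sketch that checks a single numeric predictor for complete or quasi-complete separation against a binary outcome. It is an illustrative check, not the procedure used in the study: it covers only one predictor at a time, whereas separation can also arise from combinations of predictors.

```python
def separation_status(x, y):
    """Classify one numeric predictor as 'complete', 'quasi', or 'none'.

    x: predictor values; y: 0/1 outcomes. Both classes must be present.
    """
    zeros = [xi for xi, yi in zip(x, y) if yi == 0]
    ones = [xi for xi, yi in zip(x, y) if yi == 1]
    if not zeros or not ones:
        raise ValueError("both outcome classes are required")
    # Try both orientations: class 0 below class 1, and the reverse.
    for low, high in ((zeros, ones), (ones, zeros)):
        if max(low) < min(high):
            return "complete"  # a threshold splits the classes perfectly
        if max(low) == min(high) and min(low) < max(high):
            return "quasi"     # perfect split except for ties at the boundary
    return "none"

if __name__ == "__main__":
    print(separation_status([1, 2, 3, 4], [0, 0, 1, 1]))
    print(separation_status([1, 2, 3, 3, 4], [0, 0, 0, 1, 1]))
    print(separation_status([1, 2, 3, 4], [0, 1, 0, 1]))
```

When such a check reports separation, the maximum likelihood estimate for that predictor does not exist as a finite value, which is why the fitting algorithm fails to converge.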
Logistic Regression: Concept and Application
ERIC Educational Resources Information Center
Cokluk, Omay
2010-01-01
The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
Beretta, Lorenzo; Santaniello, Alessandro; Cappiello, Francesca; Chawla, Nitesh V; Vonk, Madelon C; Carreira, Patricia E; Allanore, Yannick; Popa-Diaconu, D A; Cossu, Marta; Bertolotti, Francesca; Ferraccioli, Gianfranco; Mazzone, Antonino; Scorza, Raffaella
2010-01-01
Systemic sclerosis (SSc) is a multiorgan disease with high mortality rates. Several clinical features have been associated with poor survival in different populations of SSc patients, but no clear and reproducible prognostic model to assess individual survival prediction in scleroderma patients has ever been developed. We used Cox regression and three data mining-based classifiers (Naïve Bayes Classifier [NBC], Random Forests [RND-F] and logistic regression [Log-Reg]) to develop a robust and reproducible 5-year prognostic model. All the models were built and internally validated by means of 5-fold cross-validation on a population of 558 Italian SSc patients. Their predictive ability and capability of generalisation were then tested on an independent population of 356 patients recruited from 5 external centres and finally compared to the predictions made by two SSc domain experts on the same population. The NBC outperformed the Cox-based classifier and the other data mining algorithms after internal cross-validation (area under the receiver operating characteristic curve, AUROC: NBC=0.759; RND-F=0.736; Log-Reg=0.754 and Cox=0.724). The NBC also had a remarkable and better trade-off between sensitivity and specificity (i.e., balanced accuracy, BA) than the Cox-based classifier when tested on an independent population of SSc patients (BA: NBC=0.769, Cox=0.622). The NBC was also superior to domain experts in predicting 5-year survival in this population (AUROC=0.829 vs. AUROC=0.788 and BA=0.769 vs. BA=0.67). We provide a model to make consistent 5-year prognostic predictions in SSc patients. Its internal validity, as well as capability of generalisation and reduced uncertainty compared to human experts, support its use at bedside. Available at: http://www.nd.edu/~nchawla/survival.xls.
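Balanced accuracy, used above to summarise the trade-off between sensitivity and specificity, is the mean of the two. A minimal sketch follows; the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return (sensitivity + specificity) / 2

def plain_accuracy(tp, fn, tn, fp):
    """Share of all cases classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)

if __name__ == "__main__":
    # Hypothetical imbalanced data: 50 positive cases, 300 negative cases.
    print(balanced_accuracy(tp=40, fn=10, tn=270, fp=30), plain_accuracy(40, 10, 270, 30))
    # A classifier that always predicts "negative" still looks good on plain accuracy.
    print(balanced_accuracy(tp=0, fn=50, tn=300, fp=0), plain_accuracy(0, 50, 300, 0))
```

The second case shows why balanced accuracy is a useful companion to AUROC when classes are unequal in size: a trivial classifier scores 0.5 on it regardless of class proportions.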
NASA Astrophysics Data System (ADS)
Pradhan, Biswajeet
2010-05-01
This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis on the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (NDVI), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases of the application of logistic regression coefficients in the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), whereas Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases from the cross application of logistic regression coefficients in the other two areas, the case of Selangor based on the logistic coefficients of Cameron showed the highest (90%) prediction accuracy, whereas the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%). 
Qualitatively, the cross application model yields reasonable results which can be used for preliminary landslide hazard mapping.
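The cross-application design described above (every study area scored with coefficients from every area) can be enumerated as in the sketch below. The dictionary layout and function name are illustrative assumptions; only the four accuracies quoted in the abstract are recorded, and the remaining five runs are left unfilled because the abstract does not report them.

```python
from itertools import product

AREAS = ("Penang", "Cameron", "Selangor")

def hazard_map_runs(areas):
    """Pair every study area with every source of logistic regression coefficients."""
    return [
        {"study_area": area, "coefficients_from": source, "cross_applied": area != source}
        for area, source in product(areas, repeat=2)
    ]

# Accuracies quoted in the abstract, keyed by (study area, coefficient source).
REPORTED_ACCURACY = {
    ("Selangor", "Selangor"): 0.94,
    ("Penang", "Penang"): 0.86,
    ("Selangor", "Cameron"): 0.90,
    ("Penang", "Selangor"): 0.79,
}

if __name__ == "__main__":
    for run in hazard_map_runs(AREAS):
        key = (run["study_area"], run["coefficients_from"])
        print(run, REPORTED_ACCURACY.get(key, "not reported in abstract"))
```

Enumerating the runs this way reproduces the counts in the abstract: three same-area cases plus six cross-applied cases, nine hazard maps in all.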
Covariations of adolescent weight-control, health-risk and health-promoting behaviors.
Rafiroiu, Codruta; Sargent, Roger G; Parra-Medina, Deborah; Drane, Wanzer J; Valois, Robert F
2003-01-01
To assess the prevalence of dieting and investigate clusters of risk behaviors among adolescents. Data were secured from a random sample of adolescents (4,636) and analyzed using bivariate methods and logistic regression. From the survey sample, 19.2% of adolescents were classified as extreme dieters, 43.2% as moderate dieters, and 37.2% as nondieters. Extreme dieters were more likely to use alcohol, cigarettes, and/or marijuana and to attempt suicide, and less likely to practice vigorous exercise. Moderate dieters were less likely to use cigarettes and marijuana and more likely to engage in vigorous exercise, with differences across gender-race categories. Results have relevance for developing multicomponent programs for adolescents.
Talbert, Steven
2009-01-01
This study evaluated the association between changing physiological status (delta data) and severe injury (SI) or need for trauma center resources (TCR). Prehospital and emergency department arrival weighted RTS (RTSw) were computed for patients with complete records entered into the registry from 2002 to 2004 (n = 23,753). Physiological change was classified as unchanged, deteriorated, or improved (PreRTSw vs EDRTSw). Performance of delta data was evaluated using standard epidemiological approaches and multiple logistic regression. Deterioration status predicted SI (odds ratio [OR] = 1.38) and TCR (OR = 2.09). Improved status predicted TCR (OR = 1.27). Delta data independently predicted both SI and TCR.
Logistic regression applied to natural hazards: rare event logistic regression with replications
NASA Astrophysics Data System (ADS)
Guns, M.; Vanacker, V.
2012-06-01
Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.
Hosseinifard, Behshad; Moradi, Mohammad Hassan; Rostami, Reza
2013-03-01
Diagnosing depression in the early curable stages is very important and may even save the life of a patient. In this paper, we study nonlinear analysis of the EEG signal for discriminating depression patients and normal controls. Forty-five unmedicated depressed patients and 45 normal subjects participated in this study. Power of four EEG bands and four nonlinear features, including detrended fluctuation analysis (DFA), Higuchi fractal, correlation dimension and Lyapunov exponent, were extracted from the EEG signal. For discriminating the two groups, k-nearest neighbor, linear discriminant analysis and logistic regression (LR) are then used as the classifiers. The highest classification accuracy among the nonlinear features, 83.3%, is obtained by correlation dimension with the LR classifier. For further improvement, all nonlinear features are combined and applied to the classifiers. A classification accuracy of 90% is achieved by all nonlinear features and the LR classifier. In all experiments, a genetic algorithm is employed to select the most important features. The proposed technique is compared and contrasted with the other reported methods, and it is demonstrated that by combining nonlinear features, the performance is enhanced. This study shows that nonlinear analysis of EEG can be a useful method for discriminating depressed patients and normal subjects. It is suggested that this analysis may be a complementary tool to help psychiatrists diagnose depressed patients. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Mahrooghy, Majid; Ashraf, Ahmed B; Daye, Dania; McDonald, Elizabeth S; Rosen, Mark; Mies, Carolyn; Feldman, Michael; Kontos, Despina
2015-06-01
Heterogeneity in cancer can affect response to therapy and patient prognosis. Histologic measures have classically been used to measure heterogeneity, although a reliable noninvasive measurement is needed both to establish baseline risk of recurrence and monitor response to treatment. Here, we propose using spatiotemporal wavelet kinetic features from dynamic contrast-enhanced magnetic resonance imaging to quantify intratumor heterogeneity in breast cancer. Tumor pixels are first partitioned into homogeneous subregions using pharmacokinetic measures. Heterogeneity wavelet kinetic (HetWave) features are then extracted from these partitions to obtain spatiotemporal patterns of the wavelet coefficients and the contrast agent uptake. The HetWave features are evaluated in terms of their prognostic value using a logistic regression classifier with genetic algorithm wrapper-based feature selection to classify breast cancer recurrence risk as determined by a validated gene expression assay. Receiver operating characteristic analysis and area under the curve (AUC) are computed to assess classifier performance using leave-one-out cross-validation. The HetWave features outperform other commonly used features (AUC = 0.88 HetWave versus 0.70 standard features). The combination of HetWave and standard features further increases classifier performance (AUC = 0.94). The rate of the spatial frequency pattern over the pharmacokinetic partitions can provide valuable prognostic information. HetWave could be a powerful feature extraction approach for characterizing tumor heterogeneity, providing valuable prognostic information.
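The AUC used to compare feature sets can be computed directly from classifier scores as the probability that a randomly chosen positive case outscores a randomly chosen negative one. The sketch below is a generic implementation, not the authors' code; under leave-one-out cross-validation the scores would be the held-out predictions pooled over all folds, and the example values here are made up.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

if __name__ == "__main__":
    # Hypothetical held-out scores for four cases (1 = high recurrence risk).
    print(roc_auc([0, 0, 1, 1], [0.1, 0.8, 0.2, 0.9]))
```

The pairwise form is quadratic in the number of cases, which is acceptable for small imaging cohorts; a rank-based formula gives the same value for larger data.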
Prediction of performance on the RCMP physical ability requirement evaluation.
Stanish, H I; Wood, T M; Campagna, P
1999-08-01
The Royal Canadian Mounted Police use the Physical Ability Requirement Evaluation (PARE) for screening applicants. The purposes of this investigation were to identify those field tests of physical fitness that were associated with PARE performance and determine which most accurately classified successful and unsuccessful PARE performers. The participants were 27 female and 21 male volunteers. Testing included measures of aerobic power, anaerobic power, agility, muscular strength, muscular endurance, and body composition. Multiple regression analysis revealed a three-variable model for males (70-lb bench press, standing long jump, and agility) explaining 79% of the variability in PARE time, whereas a one-variable model (agility) explained 43% of the variability for females. Analysis of the classification accuracy of the males' data was prohibited because 91% of the males passed the PARE. Classification accuracy of the females' data, using logistic regression, produced a two-variable model (agility, 1.5-mile endurance run) with 93% overall classification accuracy.
Work Outcomes in Patients Who Stay at Work Despite Musculoskeletal Pain.
Cochrane, Andy; Higgins, Niamh M; Rothwell, Conor; Ashton, Jennifer; Breen, Roisin; Corcoran, Oriel; FitzGerald, Oliver; Gallagher, Pamela; Desmond, Deirdre
2017-12-13
Purpose To assess self-reported work impacts and associations between psychosocial risk factors and work impairment amongst workers seeking care for musculoskeletal pain while continuing to work. Methods Patients were recruited from Musculoskeletal Assessment Clinics at 5 hospitals across Ireland. Participants completed questionnaires including assessments of work impairment (Work Productivity and Activity Impairment Questionnaire), work ability (single item from the Work Ability Index) and work performance (Work Role Functioning Questionnaire; WRFQ). Logistic and hierarchical regressions were conducted to analyse the relation between psychosocial variables and work outcomes. Results 155 participants (53.5% female; mean age = 46.50 years) who were working at the time of assessment completed the questionnaires. Absenteeism was low, yet 62.6% were classified as functioning poorly according to the WRFQ; 52.3% reported having poor work ability. Logistic regression analyses indicated that higher work role functioning was associated with higher pain self-efficacy (OR 1.51); better work ability was associated with older age (OR 1.063) and lower functional restriction (OR 0.93); greater absenteeism was associated with lower pain self-efficacy (OR 0.65) and poorer work expectancy (OR 1.18). Multiple regression analysis indicated that greater presenteeism was associated with higher pain intensity (β = 0.259) and lower pain self-efficacy (β = - 0.385). Conclusions While individuals continue to work with musculoskeletal pain, their work performance can be adversely affected. Interventions that target mutable factors, such as pain self-efficacy, may help reduce the likelihood of work impairment.
1996-04-01
Logistics Transfer 3 Data KFA Match Through Association 1 KFC File Data Minus Security Classi- 1 Note 1: Output DICs other than Search and Inter- fled...vols 8/9 KEC Output Exceeds AUTODIN Limitations 4,5 vols 8/9 KFA Match through Association 4 vols 8/9 KFC File Data Minus Security Classified...Activities 2 Nuclear Ordnance 4 Reference Numbers 2 SECURITY CLASSIFIED DATA, FILE DATA MINUS 4 vols 8/9, DIC KFC SECURITY CLASSIFIED CHARACTERISTICS 4 vols
Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning
ERIC Educational Resources Information Center
Li, Zhushan
2014-01-01
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
A Methodology for Generating Placement Rules that Utilizes Logistic Regression
ERIC Educational Resources Information Center
Wurtz, Keith
2008-01-01
The purpose of this article is to provide the necessary tools for institutional researchers to conduct a logistic regression analysis and interpret the results. Aspects of the logistic regression procedure that are necessary to evaluate models are presented and discussed with an emphasis on cutoff values and choosing the appropriate number of…
John Hogland; Nedret Billor; Nathaniel Anderson
2013-01-01
Discriminant analysis, referred to as maximum likelihood classification within popular remote sensing software packages, is a common supervised technique used by analysts. Polytomous logistic regression (PLR), also referred to as multinomial logistic regression, is an alternative classification approach that is less restrictive, more flexible, and easy to interpret. To...
Kim, Sang Hyun
2013-12-01
The purpose of this study was to examine the concordance between a checklist's categories of professor recommendation letters and characteristics of the self-introduction letter. Checklists of professor recommendation letters were analyzed and classified into cognitive, social, and affective domains. Simple correlation was performed to determine whether the characteristics of the checklists were concordant with those of the self-introduction letter. The difference in ratings of the checklists by pass or fail grades was analyzed by independent sample t-test. Logistic regression analysis was performed to determine whether a pass or fail grade was influenced by ratings on the checklists. The Cronbach alpha value of the checklists was 0.854. Initiative, as an affective domain, in the professor's recommendation letter was highly ranked among the six checklist categories. Self-directed learning in the self-introduction letter was influenced by a pass or fail grade by logistic regression analysis (p<0.05). Successful applicants received higher ratings than those who failed in every checklist category, particularly in problem-solving ability, communication skills, initiative, and morality (p<0.05). There was a strong correlation between cognitive and affective characteristics in the professor recommendation letters and the sum of all characteristics in the self-introduction letter.
Polasek, Ozren; Kolcic, Ivana; Dzakula, Aleksandar; Bagat, Mario
2006-01-01
Background Human resources management in health often encounters problems related to workforce geographical distribution. The aim of this study was to investigate the internship workplace preferences of final-year medical students and the reasons associated with their choices. Method A total of 204 out of 240 final-year medical students at Zagreb University Medical School, Croatia, were surveyed a few months before graduation. We collected data on each student's background, workplace preference, academic performance and emigration preferences. Logistic regression was used to analyse the factors underlying internship workplace preference, classified into two categories: Zagreb versus other areas. Results Only 39 respondents (19.1%) wanted to obtain internships outside Zagreb, the Croatian capital. Gender and age were not significantly associated with internship workplace preference. A single predictor variable significantly contributed to the logistic regression model: students who believed they would not get the desired specialty more often chose Zagreb as a preferred internship workplace (odds ratio 0.32, 95% CI 0.12–0.86). Conclusion A strong preference for Zagreb as an internship workplace was recorded. Uncertainty about getting the desired specialty was associated with choosing Zagreb as a workplace, possibly due to more extensive and diverse job opportunities. PMID:16579857
Veauthier, Christian
2013-01-01
Background The Fatigue Severity Scale (FSS) is widely used to assess fatigue, not only in the context of multiple sclerosis-related fatigue, but also in many other medical conditions. Some polysomnographic studies have shown high FSS values in sleep-disordered patients without multiple sclerosis. The Modified Fatigue Impact Scale (MFIS) has increasingly been used in order to assess fatigue, but polysomnographic data investigating sleep-disordered patients are thus far unavailable. Moreover, the pathophysiological link between sleep architecture and fatigue measured with the MFIS and the FSS has not been previously investigated. Methods This was a retrospective observational study (n = 410) with subgroups classified according to sleep diagnosis. The statistical analysis included nonparametric correlation between questionnaire results and polysomnographic data, age and sex, and univariate and multiple logistic regression. Results The multiple logistic regression showed a significant relationship between FSS/MFIS values and younger age and female sex. Moreover, there was a significant relationship between FSS values and number of arousals and between MFIS values and number of awakenings. Conclusion Younger age, female sex, and high number of awakenings and arousals are predictive of fatigue in sleep-disordered patients. Further investigations are needed to find the pathophysiological explanation for these relationships. PMID:24109185
Detecting Dementia Through Interactive Computer Avatars
Adachi, Hiroyoshi; Ukita, Norimichi; Ikeda, Manabu; Kazui, Hiroaki; Kudo, Takashi; Nakamura, Satoshi
2017-01-01
This paper proposes a new approach to automatically detect dementia. Even though some works have detected dementia from speech and language attributes, most have applied detection using picture descriptions, narratives, and cognitive tasks. In this paper, we propose a new computer avatar with spoken dialog functionalities that produces spoken queries based on the mini-mental state examination, the Wechsler memory scale-revised, and other related neuropsychological questions. We recorded the interactive data of spoken dialogues from 29 participants (14 dementia and 15 healthy controls) and extracted various audiovisual features. We tried to predict dementia using audiovisual features and two machine learning algorithms (support vector machines and logistic regression). Here, we show that the support vector machines outperformed logistic regression, and by using the extracted features they classified the participants into two groups with 0.93 detection performance, as measured by the area under the receiver operating characteristic curve. We also newly identified some contributing features, e.g., gap before speaking, the variations of fundamental frequency, voice quality, and the ratio of smiling. We concluded that our system has the potential to detect dementia through spoken dialog systems and that the system can assist health care workers. In addition, these findings could help medical personnel detect signs of dementia. PMID:29018636
Suicide in the media: a quantitative review of studies based on non-fictional stories.
Stack, Steven
2005-04-01
Research on the effect of suicide stories in the media on suicide in the real world has been marked by much debate and inconsistent findings. Recent narrative reviews have suggested that research based on nonfictional models is more apt to uncover imitative effects than research based on fictional models. There is, however, substantial variation in media effects within the research restricted to nonfictional accounts of suicide. The present analysis provides some explanations of the variation in findings in the work on nonfictional media. Logistic regression techniques applied to 419 findings from 55 studies determined that: (1) studies measuring the presence of either an entertainment or political celebrity were 5.27 times more likely to find a copycat effect, (2) studies focusing on stories that stressed negative definitions of suicide were 99% less likely to report a copycat effect, (3) research based on television stories (which receive less coverage than print stories) was 79% less likely to find a copycat effect, and (4) studies focusing on female suicide were 4.89 times more likely to report a copycat effect than other studies. The full logistic regression model correctly classified 77.3% of the findings from the 55 studies. Methodological differences among studies are associated with discrepancies in their results.
Viana, Andres G; Rabian, Brian; Beidel, Deborah C
2008-06-01
We examined differences in self-reported anxiety and depression according to the number and pattern of DSM-IV comorbid diagnoses in 172 children and adolescents (mean age=11.87, S.D.=2.67; range=7-17) with a primary diagnosis of social phobia. Three hypotheses were tested: (1) children with comorbid anxiety disorders would show significantly higher scores than children with social phobia-only on self-report measures, (2) self-report measures would significantly differentiate between children with social phobia and comorbid internalizing versus externalizing disorders, and (3) self-report measures would significantly differentiate children according to the type of anxiety comorbidities present. Multinomial logistic regressions showed that children with three anxiety disorders scored significantly higher than children with one and two diagnoses on two of three self-report measures used. Logistic regressions revealed that children's scores on measures did not differ according to the nature of the comorbid diagnoses (internalizing vs. externalizing). Finally, ROC curves showed that the MASC and the SPAI-C accurately classified children with additional diagnoses of SAD and GAD, respectively. The potential of self-report measures to further our understanding of childhood anxiety comorbidity and the clinical implications of their use to screen for comorbidity are discussed along with suggestions for further study.
Grigoletti, Laura; Amaddeo, Francesco; Grassi, Aldrigo; Boldrini, Massimo; Chiappelli, Marco; Percudani, Mauro; Catapano, Francesco; Fiorillo, Andrea; Perris, Francesco; Bacigalupi, Maurizio; Albanese, Paolo; Simonetti, Simona; De Agostini, Paola; Tansella, Michele
2010-01-01
To develop predictive models to allocate patients into frequent and low service users groups within the Italian Community-based Mental Health Services (CMHSs). To allocate frequent users to different packages of care, identifying the costs of these packages. Socio-demographic and clinical data and GAF scores at baseline were collected for 1250 users attending five CMHSs. All psychiatric contacts made by these patients during six months were recorded. A logistic regression identified frequent service users predictive variables. Multinomial logistic regression identified variables able to predict the most appropriate package of care. A cost function was utilised to estimate costs. Frequent service users were 49%, using nearly 90% of all contacts. The model classified correctly 80% of users in the frequent and low users groups. Three packages of care were identified: Basic Community Treatment (4,133 Euro per six months); Intensive Community Treatment (6,180 Euro) and Rehabilitative Community Treatment (11,984 Euro) for 83%, 6% and 11% of frequent service users respectively. The model was found to be accurate for 85% of users. It is possible to develop predictive models to identify frequent service users and to assign them to pre-defined packages of care, and to use these models to inform the funding of psychiatric care.
What Are the Odds of that? A Primer on Understanding Logistic Regression
ERIC Educational Resources Information Center
Huang, Francis L.; Moon, Tonya R.
2013-01-01
The purpose of this Methodological Brief is to present a brief primer on logistic regression, a commonly used technique when modeling dichotomous outcomes. Using data from the National Education Longitudinal Study of 1988 (NELS:88), logistic regression techniques were used to investigate student-level variables in eighth grade (i.e., enrolled in a…
On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis
ERIC Educational Resources Information Center
Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas
2011-01-01
The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…
Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W
2015-08-01
Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.
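The threshold-based measures named in this abstract (sensitivity, specificity, positive predictive value and Youden's index) all follow from a 2x2 confusion matrix. The sketch below uses the standard definitions with made-up counts; it is not the registry data.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix summaries for a binary classifier."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    ppv = tp / (tp + fp)          # positive predictive value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "youden_index": sensitivity + specificity - 1.0,
    }


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=80, fp=30, tn=870, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```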
Valle, Denis; Lima, Joanna M Tucker; Millar, Justin; Amratia, Punam; Haque, Ubydul
2015-11-04
Logistic regression is a statistical model widely used in cross-sectional and cohort studies to identify and quantify the effects of potential disease risk factors. However, the impact of imperfect tests on adjusted odds ratios (and thus on the identification of risk factors) is under-appreciated. The purpose of this article is to draw attention to the problem associated with modelling imperfect diagnostic tests, and propose simple Bayesian models to adequately address this issue. A systematic literature review was conducted to determine the proportion of malaria studies that appropriately accounted for false-negatives/false-positives in a logistic regression setting. Inference from the standard logistic regression was also compared with that from three proposed Bayesian models using simulations and malaria data from the western Brazilian Amazon. A systematic literature review suggests that malaria epidemiologists are largely unaware of the problem of using logistic regression to model imperfect diagnostic test results. Simulation results reveal that statistical inference can be substantially improved when using the proposed Bayesian models versus the standard logistic regression. Finally, analysis of original malaria data with one of the proposed Bayesian models reveals that microscopy sensitivity is strongly influenced by how long people have lived in the study region, and an important risk factor (i.e., participation in forest extractivism) is identified that would have been missed by standard logistic regression. Given the numerous diagnostic methods employed by malaria researchers and the ubiquitous use of logistic regression to model the results of these diagnostic tests, this paper provides critical guidelines to improve data analysis practice in the presence of misclassification error. Easy-to-use code that can be readily adapted to WinBUGS is provided, enabling straightforward implementation of the proposed Bayesian models.
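To see why an imperfect test distorts the outcome that a logistic regression models, it helps to relate true and apparent positivity through sensitivity and specificity. The sketch below is not the paper's Bayesian model. It uses the classical Rogan-Gladen correction as a stand-in, with hypothetical values.

```python
def apparent_prevalence(true_prev: float, sensitivity: float, specificity: float) -> float:
    """Probability of a positive test result when the test is imperfect."""
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


def rogan_gladen(apparent: float, sensitivity: float, specificity: float) -> float:
    """Invert the relation above to recover the true prevalence."""
    return (apparent + specificity - 1.0) / (sensitivity + specificity - 1.0)


# Hypothetical test characteristics, for illustration only.
observed = apparent_prevalence(true_prev=0.30, sensitivity=0.70, specificity=0.95)
recovered = rogan_gladen(observed, sensitivity=0.70, specificity=0.95)
print(f"observed positivity = {observed:.3f}, corrected prevalence = {recovered:.3f}")
```

The correction needs sensitivity and specificity to be known and to sum to more than 1. The Bayesian models described in the abstract instead treat misclassification as part of the model.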
Identification of patients with gout: elaboration of a questionnaire for epidemiological studies.
Richette, P; Clerson, P; Bouée, S; Chalès, G; Doherty, M; Flipo, R M; Lambert, C; Lioté, F; Poiraud, T; Schaeverbeke, T; Bardin, T
2015-09-01
In France, the prevalence of gout is currently unknown. We aimed to design a questionnaire to detect gout that would be suitable for use in a telephone survey by non-physicians and assessed its performance. We designed a 62-item questionnaire covering comorbidities, clinical features and treatment of gout. In a case-control study, we enrolled patients with a history of arthritis who had undergone arthrocentesis for synovial fluid analysis and crystal detection. Cases were patients with crystal-proven gout and controls were patients who had arthritis and effusion with no monosodium urate crystals in synovial fluid. The questionnaire was administered by phone to cases and controls by non-physicians who were unaware of the patient diagnosis. Logistic regression analysis and classification and regression trees were used to select items discriminating cases and controls. We interviewed 246 patients (102 cases and 142 controls). Two logistic regression models (sensitivity 88.0% and 87.5%; specificity 93.0% and 89.8%, respectively) and one classification and regression tree model (sensitivity 81.4%, specificity 93.7%) revealed 11 informative items that allowed for classifying 90.0%, 88.8% and 88.5% of patients, respectively. We developed a questionnaire to detect gout containing 11 items that is fast and suitable for use in a telephone survey by non-physicians. The questionnaire demonstrated good properties for discriminating patients with and without gout. It will be administered in a large sample of the general population to estimate the prevalence of gout in France. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Speech prosody impairment predicts cognitive decline in Parkinson's disease.
Rektorova, Irena; Mekyska, Jiri; Janousova, Eva; Kostalova, Milena; Eliasova, Ilona; Mrackova, Martina; Berankova, Dagmar; Necasova, Tereza; Smekal, Zdenek; Marecek, Radek
2016-08-01
Impairment of speech prosody is characteristic for Parkinson's disease (PD) and does not respond well to dopaminergic treatment. We assessed whether baseline acoustic parameters, alone or in combination with other predominantly non-dopaminergic symptoms may predict global cognitive decline as measured by the Addenbrooke's cognitive examination (ACE-R) and/or worsening of cognitive status as assessed by a detailed neuropsychological examination. Forty-four consecutive non-depressed PD patients underwent clinical and cognitive testing, and acoustic voice analysis at baseline and at the two-year follow-up. Influence of speech and other clinical parameters on worsening of the ACE-R and of the cognitive status was analyzed using linear and logistic regression. The cognitive status (classified as normal cognition, mild cognitive impairment and dementia) deteriorated in 25% of patients during the follow-up. The multivariate linear regression model consisted of the variation in range of the fundamental voice frequency (F0VR) and the REM Sleep Behavioral Disorder Screening Questionnaire (RBDSQ). These parameters explained 37.2% of the variability of the change in ACE-R. The most significant predictors in the univariate logistic regression were the speech index of rhythmicity (SPIR; p = 0.012), disease duration (p = 0.019), and the RBDSQ (p = 0.032). The multivariate regression analysis revealed that SPIR alone led to 73.2% accuracy in predicting a change in cognitive status. Combining SPIR with RBDSQ improved the prediction accuracy of SPIR alone by 7.3%. Impairment of speech prosody together with symptoms of RBD predicted rapid cognitive decline and worsening of PD cognitive status during a two-year period. Copyright © 2016 Elsevier Ltd. All rights reserved.
Bégarie, Jérôme; Maïano, Christophe; Leconte, Pascale; Ninot, Grégory
2013-05-01
This study examines the prevalence of overweight and obesity and a panel of potential determinants among French youths and adults with an intellectual disability (ID). The sample used consisted of 1120 youths and adults with an ID, from 5 to 28 years old, attending a French special education school. The results indicated that 19.8% of the participants with an ID are classified as overweight and 8.6% as obese. Multivariate logistic regression analyses revealed that there are nearly three times more girls/women classified as overweight than boys/men. Additionally, they showed that there are nearly two times more participants from southern France classified as overweight than from northern France, and that the risk of being classified as overweight significantly increases with seniority in the school. Next, the interaction effects observed indicated first that there are nearly two times more boys/men on psychotropic medication classified as overweight than boys/men not on psychotropic medication. Second, they revealed that the odds of being classified as overweight for boys/men not on psychotropic medication are 47% lower than for girls/women not on psychotropic medication. Third, they indicated that there are nearly two times more boys/men from southern France classified as obese than boys/men from northern France. Fourth, they showed that the odds of being classified as obese for boys/men from northern France are 52% lower than for girls/women from northern France. In conclusion, these results should be viewed as preliminary and need to be replicated since, to our knowledge, this study is the first one to examine this topic while simultaneously controlling for all of the potential determinants and relying on a sample of youths and adults. Copyright © 2013 Elsevier Ltd. All rights reserved.
Nowakowska, Marzena
2017-04-01
The development of the Bayesian logistic regression model classifying the road accident severity is discussed. The already exploited informative priors (method of moments, maximum likelihood estimation, and two-stage Bayesian updating), along with the original idea of a Boot prior proposal, are investigated when no expert opinion has been available. In addition, two possible approaches to updating the priors, in the form of unbalanced and balanced training data sets, are presented. The obtained logistic Bayesian models are assessed on the basis of a deviance information criterion (DIC), highest probability density (HPD) intervals, and coefficients of variation estimated for the model parameters. The verification of the model accuracy has been based on sensitivity, specificity and the harmonic mean of sensitivity and specificity, all calculated from a test data set. The models obtained from the balanced training data set have a better classification quality than the ones obtained from the unbalanced training data set. The two-stage Bayesian updating prior model and the Boot prior model, both identified with the use of the balanced training data set, outperform the non-informative, method of moments, and maximum likelihood estimation prior models. It is important to note that one should be careful when interpreting the parameters since different priors can lead to different models. Copyright © 2017 Elsevier Ltd. All rights reserved.
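The accuracy criterion mentioned here, the harmonic mean of sensitivity and specificity, penalizes models that do well on only one class. A small sketch with hypothetical values:

```python
def harmonic_mean(sensitivity: float, specificity: float) -> float:
    """Harmonic mean of sensitivity and specificity; 0.0 if both are zero."""
    total = sensitivity + specificity
    if total == 0.0:
        return 0.0
    return 2.0 * sensitivity * specificity / total


# Hypothetical values: a model that detects few severe accidents scores poorly
# even though its specificity is perfect.
print(harmonic_mean(0.5, 1.0))
print(harmonic_mean(0.8, 0.8))
```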
Logistic regression for risk factor modelling in stuttering research.
Reed, Phil; Wu, Yaqionq
2013-06-01
To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed is demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.
Methods for estimating selected low-flow frequency statistics for unregulated streams in Kentucky
Martin, Gary R.; Arihood, Leslie D.
2010-01-01
This report provides estimates of, and presents methods for estimating, selected low-flow frequency statistics for unregulated streams in Kentucky including the 30-day mean low flows for recurrence intervals of 2 and 5 years (30Q2 and 30Q5) and the 7-day mean low flows for recurrence intervals of 2, 10, and 20 years (7Q2, 7Q10, and 7Q20). Estimates of these statistics are provided for 121 U.S. Geological Survey streamflow-gaging stations with data through the 2006 climate year, which is the 12-month period ending March 31 of each year. Data were screened to identify the periods of homogeneous, unregulated flows for use in the analyses. Logistic-regression equations are presented for estimating the annual probability of the selected low-flow frequency statistics being equal to zero. Weighted-least-squares regression equations were developed for estimating the magnitude of the nonzero 30Q2, 30Q5, 7Q2, 7Q10, and 7Q20 low flows. Three low-flow regions were defined for estimating the 7-day low-flow frequency statistics. The explicit explanatory variables in the regression equations include total drainage area and the mapped streamflow-variability index measured from a revised statewide coverage of this characteristic. The percentage of the station low-flow statistics correctly classified as zero or nonzero by use of the logistic-regression equations ranged from 87.5 to 93.8 percent. The average standard errors of prediction of the weighted-least-squares regression equations ranged from 108 to 226 percent. The 30Q2 regression equations have the smallest standard errors of prediction, and the 7Q20 regression equations have the largest standard errors of prediction. The regression equations are applicable only to stream sites with low flows unaffected by regulation from reservoirs and local diversions of flow and to drainage basins in specified ranges of basin characteristics. Caution is advised when applying the equations for basins with characteristics near the applicable limits and for basins with karst drainage features.
Travis Woolley; David C. Shaw; Lisa M. Ganio; Stephen Fitzgerald
2012-01-01
Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed burns and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate...
Preserving Institutional Privacy in Distributed binary Logistic Regression.
Wu, Yuan; Jiang, Xiaoqian; Ohno-Machado, Lucila
2012-01-01
Privacy is becoming a major concern when sharing biomedical data across institutions. Although methods for protecting privacy of individual patients have been proposed, it is not clear how to protect the institutional privacy, which is many times a critical concern of data custodians. Built upon our previous work, Grid Binary LOgistic REgression (GLORE), we developed an Institutional Privacy-preserving Distributed binary Logistic Regression model (IPDLR) that considers both individual and institutional privacy for building a logistic regression model in a distributed manner. We tested our method using both simulated and clinical data, showing how it is possible to protect the privacy of individuals and of institutions using a distributed strategy.
Covariate Imbalance and Adjustment for Logistic Regression Analysis of Clinical Trial Data
Ciolino, Jody D.; Martin, Reneé H.; Zhao, Wenle; Jauch, Edward C.; Hill, Michael D.; Palesch, Yuko Y.
2014-01-01
In logistic regression analysis for binary clinical trial data, adjusted treatment effect estimates are often not equivalent to unadjusted estimates in the presence of influential covariates. This paper uses simulation to quantify the benefit of covariate adjustment in logistic regression. However, International Conference on Harmonization guidelines suggest that covariate adjustment be pre-specified. Unplanned adjusted analyses should be considered secondary. Results suggest that if adjustment is not possible or unplanned in a logistic setting, balance in continuous covariates can alleviate some (but never all) of the shortcomings of unadjusted analyses. The case of log binomial regression is also explored. PMID:24138438
Differentially private distributed logistic regression using private and public data.
Ji, Zhanglong; Jiang, Xiaoqian; Wang, Shuang; Xiong, Li; Ohno-Machado, Lucila
2014-01-01
Privacy protection is an important issue in medical informatics and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise and make outputs less useful. Given available public data in medical research (e.g. from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. In this paper, we modify the update step in the Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. We try our algorithm on three different data sets, and show its advantage over: (1) a logistic regression model based solely on public data, and (2) a differentially private distributed logistic regression model based on private data under various scenarios. Logistic regression models built with our new algorithm based on both private and public datasets demonstrate better utility than models trained on private or public datasets alone without sacrificing the rigorous privacy guarantee.
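For orientation, the plain Newton-Raphson update for logistic regression, the step this abstract says is modified, can be sketched as below. This is only the standard non-private, single-site update for one feature plus an intercept. The differentially private and distributed modifications are not reproduced, and the toy data are invented.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_newton(xs, ys, iterations: int = 25):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # entries of X^T W X (the negative Hessian)
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system (X^T W X) * step = gradient in closed form.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Invented, non-separable toy data so that the maximum likelihood estimate exists.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_newton(xs, ys)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

Each iteration needs only the aggregated gradient and X^T W X, and these sums are the quantities that a distributed or noise-perturbed variant would operate on.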
Deng, Yingyuan; Wang, Tianfu; Chen, Siping; Liu, Weixiang
2017-01-01
The aim of the study is to screen the significant sonographic features by logistic regression analysis and fit a model to diagnose thyroid nodules. A total of 525 pathological thyroid nodules were retrospectively analyzed. All the nodules underwent conventional ultrasonography (US), strain elastosonography (SE), and contrast-enhanced ultrasound (CEUS). Those nodules’ 12 suspicious sonographic features were used to assess thyroid nodules. The significant features of diagnosing thyroid nodules were picked out by logistic regression analysis. All variables that were statistically related to diagnosis of thyroid nodules, at a level of p < 0.05 were embodied in a logistic regression analysis model. The significant features in the logistic regression model of diagnosing thyroid nodules were calcification, suspected cervical lymph node metastasis, hypoenhancement pattern, margin, shape, vascularity, posterior acoustic, echogenicity, and elastography score. According to the results of logistic regression analysis, the formula that could predict whether or not thyroid nodules are malignant was established. The area under the receiver operating characteristic (ROC) curve was 0.930 and the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 83.77%, 89.56%, 87.05%, 86.04%, and 87.79% respectively. PMID:29228030
Pang, Tiantian; Huang, Leidan; Deng, Yingyuan; Wang, Tianfu; Chen, Siping; Gong, Xuehao; Liu, Weixiang
2017-01-01
The aim of the study is to screen the significant sonographic features by logistic regression analysis and fit a model to diagnose thyroid nodules. A total of 525 pathological thyroid nodules were retrospectively analyzed. All the nodules underwent conventional ultrasonography (US), strain elastosonography (SE), and contrast-enhanced ultrasound (CEUS). Those nodules' 12 suspicious sonographic features were used to assess thyroid nodules. The significant features of diagnosing thyroid nodules were picked out by logistic regression analysis. All variables that were statistically related to diagnosis of thyroid nodules, at a level of p < 0.05 were embodied in a logistic regression analysis model. The significant features in the logistic regression model of diagnosing thyroid nodules were calcification, suspected cervical lymph node metastasis, hypoenhancement pattern, margin, shape, vascularity, posterior acoustic, echogenicity, and elastography score. According to the results of logistic regression analysis, the formula that could predict whether or not thyroid nodules are malignant was established. The area under the receiver operating characteristic (ROC) curve was 0.930 and the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 83.77%, 89.56%, 87.05%, 86.04%, and 87.79% respectively.
Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi
2017-06-01
Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB (p < 0.05). Identifying and training mothers at risk as well as improving prenatal care may reduce the PTB rate. We also recommend that statisticians utilize the logistic regression model for the classification of risk groups for PTB.
Comparing Postural Stability Entropy Analyses to Differentiate Fallers and Non-Fallers
Fino, Peter C.; Mojdehi, Ahmad R.; Adjerid, Khaled; Habibi, Mohammad; Lockhart, Thurmon E.; Ross, Shane D.
2015-01-01
The health and financial cost of falls has spurred research to differentiate the characteristics of fallers and non-fallers. Postural stability has received much of the attention with recent studies exploring various measures of entropy. This study compared the discriminatory ability of several entropy methods at differentiating two paradigms in the center-of-pressure (COP) of elderly individuals: 1.) eyes open (EO) versus eyes closed (EC) and 2.) fallers (F) versus non-fallers (NF). Methods were compared using the area under the curve (AUC) of the receiver-operating characteristic (ROC) curves developed from logistic regression models. Overall, multiscale entropy (MSE) and composite multiscale entropy (CompMSE) performed the best with AUCs of 0.71 for EO/EC and 0.77 for F/NF. When methods were combined together to maximize the AUC, the entropy classifier had an AUC of 0.91 for the F/NF comparison. These results suggest researchers and clinicians attempting to create clinical tests to identify fallers should consider a combination of every entropy method when creating a classifying test. Additionally, MSE and CompMSE classifiers using polar coordinate data outperformed rectangular coordinate data, encouraging more research into the most appropriate time series for postural stability entropy analysis. PMID:26464267
Comparing Postural Stability Entropy Analyses to Differentiate Fallers and Non-fallers.
Fino, Peter C; Mojdehi, Ahmad R; Adjerid, Khaled; Habibi, Mohammad; Lockhart, Thurmon E; Ross, Shane D
2016-05-01
The health and financial cost of falls has spurred research to differentiate the characteristics of fallers and non-fallers. Postural stability has received much of the attention with recent studies exploring various measures of entropy. This study compared the discriminatory ability of several entropy methods at differentiating two paradigms in the center-of-pressure of elderly individuals: (1) eyes open (EO) vs. eyes closed (EC) and (2) fallers (F) vs. non-fallers (NF). Methods were compared using the area under the curve (AUC) of the receiver-operating characteristic curves developed from logistic regression models. Overall, multiscale entropy (MSE) and composite multiscale entropy (CompMSE) performed the best with AUCs of 0.71 for EO/EC and 0.77 for F/NF. When methods were combined together to maximize the AUC, the entropy classifier had an AUC of 0.91 for the F/NF comparison. These results suggest researchers and clinicians attempting to create clinical tests to identify fallers should consider a combination of every entropy method when creating a classifying test. Additionally, MSE and CompMSE classifiers using polar coordinate data outperformed rectangular coordinate data, encouraging more research into the most appropriate time series for postural stability entropy analysis.
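The AUC used to compare the entropy measures has a simple rank interpretation: it is the probability that a randomly chosen positive case (here, a faller) receives a higher model score than a randomly chosen negative case. A minimal sketch of that pairwise definition, with invented scores:

```python
def auc(scores_pos, scores_neg) -> float:
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Invented predicted probabilities, for illustration only.
faller_scores = [0.8, 0.4]
non_faller_scores = [0.6, 0.2]
print(auc(faller_scores, non_faller_scores))
```

The double loop is quadratic in sample size, which is fine for small cohorts. A sort-based rank formula gives the same value more efficiently.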
Fall classification by machine learning using mobile phones.
Albert, Mark V; Kording, Konrad; Herrmann, Megan; Jayaraman, Arun
2012-01-01
Fall prevention is a critical component of health care; falls are a common source of injury in the elderly and are associated with significant levels of mortality and morbidity. Automatically detecting falls can allow rapid response to potential emergencies; in addition, knowing the cause or manner of a fall can be beneficial for prevention studies or a more tailored emergency response. The purpose of this study is to demonstrate techniques to not only reliably detect a fall but also to automatically classify the type. We asked 15 subjects to simulate four different types of falls (left and right lateral, forward trips, and backward slips) while wearing mobile phones and previously validated, dedicated accelerometers. Nine subjects also wore the devices for ten days, to provide data for comparison with the simulated falls. We applied five machine learning classifiers to a large time-series feature set to detect falls. Support vector machines and regularized logistic regression were able to identify a fall with 98% accuracy and classify the type of fall with 99% accuracy. This work demonstrates how current machine learning approaches can simplify data collection for prevention in fall-related research as well as improve rapid response to potential injuries due to falls.
Classification of older adults with/without a fall history using machine learning methods.
Lin Zhang; Ou Ma; Fabre, Jennifer M; Wood, Robert H; Garcia, Stephanie U; Ivey, Kayla M; McCann, Evan D
2015-01-01
Falling is a serious problem in an aged society such that assessment of the risk of falls for individuals is imperative for the research and practice of falls prevention. This paper introduces an application of several machine learning methods for training a classifier which is capable of classifying individual older adults into a high risk group and a low risk group (distinguished by whether or not the members of the group have a recent history of falls). Using a 3D motion capture system, significant gait features related to falls risk are extracted. By training these features, classification hypotheses are obtained based on machine learning techniques (K Nearest-neighbour, Naive Bayes, Logistic Regression, Neural Network, and Support Vector Machine). Training and test accuracies with sensitivity and specificity of each of these techniques are assessed. The feature adjustment and tuning of the machine learning algorithms are discussed. The outcome of the study will benefit the prediction and prevention of falls.
A review of machine learning in obesity.
DeGregory, K W; Kuiper, P; DeSilvio, T; Pleuss, J D; Miller, R; Roginski, J W; Fisher, C B; Harness, D; Viswanath, S; Heymsfield, S B; Dungan, I; Thomas, D M
2018-05-01
Rich sources of obesity-related data arising from sensors, smartphone apps, electronic medical health records and insurance data can bring new insights for understanding, preventing and treating obesity. For such large datasets, machine learning provides sophisticated and elegant tools to describe, classify and predict obesity-related risks and outcomes. Here, we review machine learning methods that predict and/or classify such as linear and logistic regression, artificial neural networks, deep learning and decision tree analysis. We also review methods that describe and characterize data such as cluster analysis, principal component analysis, network science and topological data analysis. We introduce each method with a high-level overview followed by examples of successful applications. The algorithms were then applied to National Health and Nutrition Examination Survey to demonstrate methodology, utility and outcomes. The strengths and limitations of each method were also evaluated. This summary of machine learning algorithms provides a unique overview of the state of data analysis applied specifically to obesity. © 2018 World Obesity Federation.
Automated robot-assisted surgical skill evaluation: Predictive analytics approach.
Fard, Mahtab J; Ameri, Sattar; Darin Ellis, R; Chinnam, Ratna B; Pandya, Abhilash K; Klein, Michael D
2018-02-01
Surgical skill assessment has predominantly been a subjective task. Recently, technological advances such as robot-assisted surgery have created great opportunities for objective surgical evaluation. In this paper, we introduce a predictive framework for objective skill assessment based on movement trajectory data. Our aim is to build a classification framework to automatically evaluate the performance of surgeons with different levels of expertise. Eight global movement features are extracted from movement trajectory data captured by a da Vinci robot for surgeons with two levels of expertise - novice and expert. Three classification methods - k-nearest neighbours, logistic regression and support vector machines - are applied. The result shows that the proposed framework can classify surgeons' expertise as novice or expert with an accuracy of 82.3% for knot tying and 89.9% for a suturing task. This study demonstrates and evaluates the ability of machine learning methods to automatically classify expert and novice surgeons using global movement features. Copyright © 2017 John Wiley & Sons, Ltd.
Urine cell-based DNA methylation classifier for monitoring bladder cancer.
van der Heijden, Antoine G; Mengual, Lourdes; Ingelmo-Torres, Mercedes; Lozano, Juan J; van Rijt-van de Westerlo, Cindy C M; Baixauli, Montserrat; Geavlete, Bogdan; Moldoveanud, Cristian; Ene, Cosmin; Dinney, Colin P; Czerniak, Bogdan; Schalken, Jack A; Kiemeney, Lambertus A L M; Ribal, Maria J; Witjes, J Alfred; Alcaraz, Antonio
2018-01-01
Current standard methods used to detect and monitor bladder cancer (BC) are invasive or have low sensitivity. This study aimed to develop a urine methylation biomarker classifier for BC monitoring and validate this classifier in patients in follow-up for bladder cancer (PFBC). Voided urine samples (N = 725) from BC patients, controls, and PFBC were prospectively collected in four centers. Finally, 626 urine samples were available for analysis. DNA was extracted from the urinary cells and bisulfite modified, and methylation status was analyzed using pyrosequencing. Cytology was available from a subset of patients (N = 399). In the discovery phase, seven selected genes from the literature (CDH13, CFTR, NID2, SALL3, TMEFF2, TWIST1, and VIM2) were studied in 111 BC and 57 control samples. This training set was used to develop a gene classifier by logistic regression and was validated in 458 PFBC samples (173 with recurrence). A three-gene methylation classifier containing CFTR, SALL3, and TWIST1 was developed in the training set (AUC 0.874). The classifier achieved an AUC of 0.741 in the validation series. Cytology results were available for 308 samples from the validation set. Cytology achieved an AUC of 0.696, whereas the classifier in this subset of patients reached an AUC of 0.768. Combining the methylation classifier with cytology results achieved an AUC of 0.86 in the validation set, with a sensitivity of 96%, a specificity of 40%, and a positive and negative predictive value of 56 and 92%, respectively. The combination of the three-gene methylation classifier and cytology results has high sensitivity and high negative predictive value in a real clinical scenario (PFBC). The proposed classifier is a useful test for predicting BC recurrence and for decreasing the number of cystoscopies in the follow-up of BC patients. If only patients with a positive combined classifier result were cystoscopied, 36% of all cystoscopies could be prevented.
Knowledge, Attitudes, and Substance Use Practices Among Street Children in Western Kenya
Embleton, Lonnie; Ayuku, David; Atwoli, Lukoye; Vreeman, Rachel; Braitstein, Paula
2013-01-01
The study describes the knowledge of and attitudes toward substance use among street-involved youth in Kenya, and how they relate to their substance use practices. In 2011, 146 children and youth ages 10–19 years, classified as either children on the street or children of the street were recruited to participate in a cross-sectional survey in Eldoret, Kenya. Bivariate analysis using χ2 or Fisher’s Exact Test was used to test the associations between variables, and multiple logistic regression analysis was used to identify independent covariates associated with lifetime and current drug use. The study’s limitations and source of funding are noted. PMID:22780841
Zhan, Liang; Liu, Yashu; Wang, Yalin; Zhou, Jiayu; Jahanshad, Neda; Ye, Jieping; Thompson, Paul M.
2015-01-01
Alzheimer's disease (AD) is a progressive brain disease. Accurate detection of AD and its prodromal stage, mild cognitive impairment (MCI), are crucial. There is also a growing interest in identifying brain imaging biomarkers that help to automatically differentiate stages of Alzheimer's disease. Here, we focused on brain structural networks computed from diffusion MRI and proposed a new feature extraction and classification framework based on higher order singular value decomposition and sparse logistic regression. In tests on publicly available data from the Alzheimer's Disease Neuroimaging Initiative, our proposed framework showed promise in detecting brain network differences that help in classifying different stages of Alzheimer's disease. PMID:26257601
Logistic regression for dichotomized counts.
Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W
2016-12-01
Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.
Segmentation and analysis of mouse pituitary cells with graphic user interface (GUI)
NASA Astrophysics Data System (ADS)
González, Erika; Medina, Lucía.; Hautefeuille, Mathieu; Fiordelisio, Tatiana
2018-02-01
In this work we present a method to perform pituitary cell segmentation in image stacks acquired by fluorescence microscopy from pituitary slice preparations. Although many procedures have been developed for cell segmentation, they are generally based on edge detection and require high-resolution images. However, in the biological preparations that we worked on, the cells are not well defined, and experts identify their intracellular calcium activity from fluorescence intensity changes in different regions over time. These intensity changes were associated with time series over regions, and because they present a particular behavior they were used in a classification procedure in order to perform cell segmentation. Two logistic regression classifiers were implemented for the time series classification task, using as features the area under the curve and skewness in the first classifier and skewness and kurtosis in the second classifier. Once both decision boundaries had been found in the two feature spaces by training on 120 time series, they were tested on 12 image stacks through a Python graphical user interface (GUI), generating binary images where white pixels correspond to cells and black pixels to background. Results show that the area-skewness classifier reduces the time an expert spends locating cells by up to 75% in some stacks, versus 92% for the kurtosis-skewness classifier, as evaluated on the number of regions the method found. Given these promising results, we expect that this method will be improved by adding more relevant features to the classifier.
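The three features named in the abstract above (area under the curve, skewness, kurtosis) can be sketched for a single time series as follows. This is a minimal illustration only: the trapezoidal rule, the population-moment definitions, and the sample `trace` values are assumptions, since the abstract does not say how the authors computed these quantities.

```python
def area_under_curve(values, dt=1.0):
    """Trapezoidal area under a uniformly sampled series (dt = sample spacing)."""
    return sum((a + b) / 2.0 * dt for a, b in zip(values, values[1:]))


def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment; undefined for a constant series (zero variance)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Fourth standardized moment (not excess kurtosis); undefined for a constant series."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical fluorescence trace for one region, invented for illustration.
trace = [0.0, 0.2, 1.0, 0.6, 0.3, 0.1]

# Feature pairs matching the two classifiers described in the abstract.
features_first = (area_under_curve(trace), skewness(trace))
features_second = (skewness(trace), kurtosis(trace))
```

Each pair would then be the two-dimensional input to one logistic regression classifier.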
NASA Astrophysics Data System (ADS)
Han, Xiaopeng; Huang, Xin; Li, Jiayi; Li, Yansheng; Yang, Michael Ying; Gong, Jianya
2018-04-01
In recent years, the availability of high-resolution imagery has enabled more detailed observation of the Earth. However, it is imperative to simultaneously achieve accurate interpretation and preserve the spatial details for the classification of such high-resolution data. To this aim, we propose the edge-preservation multi-classifier relearning framework (EMRF). This multi-classifier framework is made up of support vector machine (SVM), random forest (RF), and sparse multinomial logistic regression via variable splitting and augmented Lagrangian (LORSAL) classifiers, considering their complementary characteristics. To better characterize complex scenes of remote sensing images, relearning based on landscape metrics is proposed, which iteratively quantizes both the landscape composition and spatial configuration by the use of the initial classification results. In addition, a novel tri-training strategy is proposed to solve the over-smoothing effect of relearning by means of automatic selection of training samples with low classification certainties, which always distribute in or near the edge areas. Finally, EMRF flexibly combines the strengths of relearning and tri-training via the classification certainties calculated by the probabilistic output of the respective classifiers. It should be noted that, in order to achieve an unbiased evaluation, we assessed the classification accuracy of the proposed framework using both edge and non-edge test samples. The experimental results obtained with four multispectral high-resolution images confirm the efficacy of the proposed framework, in terms of both edge and non-edge accuracy.
Herrera-Anaya, Elizabeth; Angarita-Fonseca, Adriana; Herrera-Galindo, Víctor M; Martínez-Marín, Rocío D P; Rodríguez-Bayona, Cindy N
2016-09-01
To determine the association between gross motor function and nutritional status in children with cerebral palsy (CP) residing in an urban area in a developing country. We conducted a cross-sectional study in 177 children (ages 2-12y, 59.3% male) with a diagnosis of CP who were attending rehabilitation centres in Bucaramanga, Colombia (2012-2013). A physiotherapist evaluated patients using the Gross Motor Function Classification System (GMFCS, levels I to V). Nutritional status was evaluated by nutritionists and classified according to the World Health Organization growth charts. We used linear and multinomial logistic regression methods to determine the associations. There were 39.5%, 6.8%, 5.6%, 16.4%, and 31.6% patients classified in levels I to V respectively. The mean adjusted differences for weight-for-age, height-for-age, BMI-for-age, and height-for-weight z-scores were significantly larger for children classified in levels II to V compared with those in level I. The children classified in levels IV and V were more likely to have malnutrition (adjusted odds ratio [OR] 5.64; 95% confidence interval [CI] 2.27-14.0) and stunting (OR 8.42; 95% CI 2.90-24.4) than those classified in GMFCS levels I to III. Stunting and malnutrition are prevalent conditions among paediatric patients with CP, and both are directly associated with higher levels of gross motor dysfunction. © 2016 Mac Keith Press.
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. We explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. 
Furthermore, the developed CLR models tend to improve sensitivity by more than 10% over the standard classification models, which translates to correctly labeling an additional 400-500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictors identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise the awareness of collecting data on additional markers and developing necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
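The under-sampling step mentioned in the abstract above can be illustrated with a short sketch. It assumes plain random under-sampling down to the size of the smallest class; the abstract does not state which variant the authors used, and the function name, record layout, and seed are hypothetical.

```python
import random


def undersample(records, label_of, seed=0):
    """Randomly drop records so every class has as many records as the smallest class.

    records  -- list of arbitrary record objects
    label_of -- function returning the class label of a record
    seed     -- fixed seed so the sketch is reproducible (an assumption)
    """
    rng = random.Random(seed)
    by_class = {}
    for record in records:
        by_class.setdefault(label_of(record), []).append(record)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Hypothetical data: 5 readmitted (label 1) and 50 non-readmitted (label 0) records.
records = [(i, 1) for i in range(5)] + [(i, 0) for i in range(50)]
balanced = undersample(records, label_of=lambda r: r[1])
```

Under-sampling discards majority-class records, so it trades information for balance; that trade-off is a general property of the technique rather than a finding of the study.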
NASA Astrophysics Data System (ADS)
Salehi, Hassan S.; Li, Hai; Merkulov, Alex; Kumavor, Patrick D.; Vavadi, Hamed; Sanders, Melinda; Kueck, Angela; Brewer, Molly A.; Zhu, Quing
2016-04-01
Most ovarian cancers are diagnosed at advanced stages due to the lack of efficacious screening techniques. Photoacoustic tomography (PAT) has a potential to image tumor angiogenesis and detect early neovascular changes of the ovary. We have developed a coregistered PAT and ultrasound (US) prototype system for real-time assessment of ovarian masses. Features extracted from PAT and US angular beams, envelopes, and images were input to a logistic classifier and a support vector machine (SVM) classifier to diagnose ovaries as benign or malignant. A total of 25 excised ovaries of 15 patients were studied and the logistic and SVM classifiers achieved sensitivities of 70.4 and 87.7%, and specificities of 95.6 and 97.9%, respectively. Furthermore, the ovaries of two patients were noninvasively imaged using the PAT/US system before surgical excision. By using five significant features and the logistic classifier, 12 out of 14 images (86% sensitivity) from a malignant ovarian mass and all 17 images (100% specificity) from a benign mass were accurately classified; the SVM correctly classified 10 out of 14 malignant images (71% sensitivity) and all 17 benign images (100% specificity). These initial results demonstrate the clinical potential of the PAT/US technique for ovarian cancer diagnosis.
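The per-image sensitivities and specificities quoted for the two noninvasively imaged patients follow directly from the counts in the abstract. The sketch below recomputes them; only the function names are introduced here.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of malignant cases that were correctly flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of benign cases that were correctly cleared."""
    return true_neg / (true_neg + false_pos)


# Counts taken from the abstract: 14 malignant images, 17 benign images.
logistic_sens = sensitivity(true_pos=12, false_neg=2)   # 12 of 14
svm_sens = sensitivity(true_pos=10, false_neg=4)        # 10 of 14
benign_spec = specificity(true_neg=17, false_pos=0)     # 17 of 17, both classifiers

print(round(100 * logistic_sens), round(100 * svm_sens), round(100 * benign_spec))
```

The rounded values agree with the 86%, 71%, and 100% reported in the abstract.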
Interpretation of commonly used statistical regression models.
Kasza, Jessica; Wolfe, Rory
2014-01-01
A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
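For the logistic regression case covered by this review, a coefficient is usually interpreted by exponentiating it into an odds ratio. The sketch below shows that standard conversion with a Wald-type 95% interval; the coefficient and standard error are hypothetical values, not figures from the article.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a logistic regression coefficient and its standard error
    into an odds ratio with a Wald confidence interval (z = 1.96 for 95%)."""
    return (math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se))


# Hypothetical coefficient for a binary exposure.
or_, lower, upper = odds_ratio_ci(beta=0.69, se=0.25)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Because the interval is symmetric on the log-odds scale, it is asymmetric around the odds ratio itself, which is why published intervals for odds ratios are typically wider above the estimate than below it.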
Choi, Seung Hoan; Labadorf, Adam T; Myers, Richard H; Lunetta, Kathryn L; Dupuis, Josée; DeStefano, Anita L
2017-02-06
Next generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate the data adaptive method in order to compare false positive rates. When the sample size is small or the expression levels of a gene are highly dispersed, the NB regression shows inflated Type-I error rates but the Classical logistic and Bayes logistic (BL) regressions are conservative. Firth's logistic (FL) regression performs well or is slightly conservative. Large sample size and low dispersion generally make Type-I error rates of all methods close to nominal alpha levels of 0.05 and 0.01. However, Type-I error rates are controlled after applying the data adaptive method. The NB, BL, and FL regressions gain increased power with large sample size, large log2 fold-change, and low dispersion. The FL regression has comparable power to NB regression. We conclude that implementing the data adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth's logistic regression provides a concise statistical inference process and reduces spurious associations from inaccurately estimated dispersion parameters in the negative binomial framework.
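For orientation, Firth's logistic regression is commonly written as maximizing a penalized log-likelihood in which the ordinary logistic log-likelihood is augmented by a Jeffreys-prior term. The notation below (design matrix $X$, fitted probabilities $p_i$) is the standard textbook form and is assumed here rather than taken from the abstract.

```latex
\ell^{*}(\beta) \;=\; \ell(\beta) \;+\; \tfrac{1}{2}\,\log\det I(\beta),
\qquad
I(\beta) \;=\; X^{\top} W X,
\qquad
W \;=\; \operatorname{diag}\!\bigl(p_i\,(1 - p_i)\bigr),
\qquad
p_i \;=\; \frac{1}{1 + e^{-x_i^{\top}\beta}} .
```

The penalty keeps estimates finite under separation and reduces small-sample bias, which is consistent with the well-behaved Type-I error rates the abstract reports for small samples.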
A new computational strategy for predicting essential genes.
Cheng, Jian; Wu, Wenwu; Zhang, Yinwen; Li, Xiangchen; Jiang, Xiaoqian; Wei, Gehong; Tao, Shiheng
2013-12-21
Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models have recently been developed to integrate gene features and used as alternatives to transfer gene essentiality annotations between organisms. We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) the effect of multicollinearity among gene features and (ii) the diverse and even contrasting correlations between gene features and gene essentiality existing within and among different species. To address these issues, we developed a novel model called feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. The performance of FWM was compared with other popular models, such as support vector machine, Naïve Bayes model, and logistic regression model, by applying FWM to reciprocally predict essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction. FWM can remarkably improve the accuracy of essential gene prediction and may be used as an alternative method for other classification work. This method can contribute substantially to the knowledge of the minimum gene sets required for living organisms and the discovery of new drug targets.
Young, Sean D; Yu, Wenchao; Wang, Wei
2017-02-01
"Social big data" from technologies such as social media, wearable devices, and online searches continue to grow and can be used as tools for HIV research. Although researchers can uncover patterns and insights associated with HIV trends and transmission, the review process is time consuming and resource intensive. Machine learning methods derived from computer science might be used to assist HIV domain experts by learning how to rapidly and accurately identify patterns associated with HIV from a large set of social data. Using an existing social media data set that was associated with HIV and coded by an HIV domain expert, we tested whether 4 commonly used machine learning methods could learn the patterns associated with HIV risk behavior. We used the 10-fold cross-validation method to examine the speed and accuracy of these models in applying that knowledge to detect HIV content in social media data. Logistic regression and random forest resulted in the highest accuracy in detecting HIV-related social data (85.3%), whereas the Ridge Regression Classifier resulted in the lowest accuracy. Logistic regression yielded the fastest processing time (16.98 seconds). Machine learning can enable social big data to become a new and important tool in HIV research, helping to create a new field of "digital HIV epidemiology." If a domain expert can identify patterns in social data associated with HIV risk or HIV transmission, machine learning models could quickly and accurately learn those associations and identify potential HIV patterns in large social data sets.
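The 10-fold cross-validation procedure mentioned above can be sketched as an index splitter. This is a generic illustration: the interleaved fold assignment without shuffling and the function name are assumptions, and the abstract does not describe how the authors formed their folds.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Samples are assigned to folds in an interleaved fashion (0, k, 2k, ... go to
    fold 0, and so on), so fold sizes differ by at most one. No shuffling is done.
    """
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test


# Every sample is used for testing exactly once across the 10 splits.
splits = list(k_fold_indices(n_samples=25, k=10))
```

A model would be fitted on each `train` list and scored on the matching `test` list, with the reported accuracy being the average over the ten splits.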
Differentially private distributed logistic regression using private and public data
2014-01-01
Background Privacy protecting is an important issue in medical informatics and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise and make outputs less useful. Given available public data in medical research (e.g. from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. Methodology In this paper, we modify the update step in Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. Experiments and results We try our algorithm on three different data sets, and show its advantage over: (1) a logistic regression model based solely on public data, and (2) a differentially private distributed logistic regression model based on private data under various scenarios. Conclusion Logistic regression models built with our new algorithm based on both private and public datasets demonstrate better utility than models that trained on private or public datasets alone without sacrificing the rigorous privacy guarantee. PMID:25079786
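As background for the methodology sentence, the ordinary (non-private) Newton-Raphson update for logistic regression is shown below. This is the standard textbook form only; the differentially private, distributed modification proposed in the paper is not given in the abstract and is not reproduced here. The symbols $X$, $y$, $p$, and $W$ are assumed notation.

```latex
\beta^{(t+1)} \;=\; \beta^{(t)} \;+\; \bigl(X^{\top} W^{(t)} X\bigr)^{-1} X^{\top}\bigl(y - p^{(t)}\bigr),
\qquad
p_i^{(t)} \;=\; \frac{1}{1 + e^{-x_i^{\top}\beta^{(t)}}},
\qquad
W^{(t)} \;=\; \operatorname{diag}\!\bigl(p_i^{(t)}\,(1 - p_i^{(t)})\bigr).
```

Here $X^{\top}(y - p)$ is the gradient of the log-likelihood and $X^{\top} W X$ is the negative Hessian, both of which are sums over records and can therefore be accumulated across data holders.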
Park, Ji Hyun; Kim, Hyeon-Young; Lee, Hanna; Yun, Eun Kyoung
2015-12-01
This study compares the performance of the logistic regression and decision tree analysis methods for assessing the risk factors for infection in cancer patients undergoing chemotherapy. The subjects were 732 cancer patients who were receiving chemotherapy at K university hospital in Seoul, Korea. The data were collected between March 2011 and February 2013 and were processed for descriptive analysis, logistic regression and decision tree analysis using the IBM SPSS Statistics 19 and Modeler 15.1 programs. The most common risk factors for infection in cancer patients receiving chemotherapy were identified as alkylating agents, vinca alkaloid and underlying diabetes mellitus. The logistic regression explained 66.7% of the variation in the data in terms of sensitivity and 88.9% in terms of specificity. The decision tree analysis accounted for 55.0% of the variation in the data in terms of sensitivity and 89.0% in terms of specificity. As for the overall classification accuracy, the logistic regression explained 88.0% and the decision tree analysis explained 87.2%. The logistic regression analysis showed a higher degree of sensitivity and classification accuracy. Therefore, logistic regression analysis is concluded to be the more effective and useful method for establishing an infection prediction model for patients undergoing chemotherapy. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Mei, Zhixiong; Wu, Hao; Li, Shiyun
2018-06-01
The Conversion of Land Use and its Effects at Small regional extent (CLUE-S), which is a widely used model for land-use simulation, utilizes logistic regression to estimate the relationships between land use and its drivers, and thus, predict land-use change probabilities. However, logistic regression disregards possible spatial autocorrelation and self-organization in land-use data. Autologistic regression can depict spatial autocorrelation but cannot address self-organization, while logistic regression that considers only self-organization (NE-logistic regression) fails to capture spatial autocorrelation. Therefore, this study developed a regression (NE-autologistic regression) method, which incorporated both spatial autocorrelation and self-organization, to improve CLUE-S. The Zengcheng District of Guangzhou, China was selected as the study area. The land-use data of 2001, 2005, and 2009, as well as 10 typical driving factors, were used to validate the proposed regression method and the improved CLUE-S model. Then, three future land-use scenarios in 2020: the natural growth scenario, ecological protection scenario, and economic development scenario, were simulated using the improved model. Validation results showed that NE-autologistic regression performed better than logistic regression, autologistic regression, and NE-logistic regression in predicting land-use change probabilities. The spatial allocation accuracy and kappa values of NE-autologistic-CLUE-S were higher than those of logistic-CLUE-S, autologistic-CLUE-S, and NE-logistic-CLUE-S for the simulations of two periods, 2001-2009 and 2005-2009, which proved that the improved CLUE-S model achieved the best simulation and was thereby effective to a certain extent. The scenario simulation results indicated that under all three scenarios, traffic land and residential/industrial land would increase, whereas arable land and unused land would decrease during 2009-2020. 
Apparent differences also existed in the simulated change sizes and locations of each land-use type under different scenarios. The results not only demonstrate the validity of the improved model but also provide a valuable reference for relevant policy-makers.
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Binary logistic regression-Instrument for assessing museum indoor air impact on exhibits.
Bucur, Elena; Danet, Andrei Florin; Lehr, Carol Blaziu; Lehr, Elena; Nita-Lazar, Mihai
2017-04-01
This paper presents a new way to assess the environmental impact on historical artifacts using binary logistic regression. The prediction of the impact on the exhibits during certain pollution scenarios (environmental impact) was calculated by a mathematical model based on binary logistic regression; it allows the identification, from a multitude of possible parameters, of those environmental parameters with a significant impact on exhibits, and ranks them according to the severity of their effect. Air quality (NO2, SO2, O3 and PM2.5) and microclimate parameters (temperature, humidity) monitoring data from a case study conducted within exhibition and storage spaces of the Romanian National Aviation Museum Bucharest have been used for developing and validating the binary logistic regression method and the mathematical model. The logistic regression analysis was used on 794 data combinations (715 to develop the model and 79 to validate it) with the Statistical Package for Social Sciences (SPSS 20.0). The results from the binary logistic regression analysis demonstrated that of the six parameters taken into consideration, four have a significant effect upon exhibits, in the following order: O3 > PM2.5 > NO2 > humidity, followed at a significant distance by the effects of SO2 and temperature. The mathematical model developed in this study correctly predicted 95.1% of the cumulated effect of the environmental parameters upon the exhibits. Moreover, this model could also be used in the decisional process regarding the preventive preservation measures that should be implemented within the exhibition space. The paper presents a new way to assess the environmental impact on historical artifacts using binary logistic regression. 
The mathematical model developed on the environmental parameters analyzed by the binary logistic regression method could be useful in a decision-making process establishing the best measures for pollution reduction and preventive preservation of exhibits.
Determining factors influencing survival of breast cancer by fuzzy logistic regression model.
Nikbakht, Roya; Bahrampour, Abbas
2017-01-01
A fuzzy logistic regression model can be used to determine influential factors of disease. This study explores the important factors predicting survival of patients with breast cancer. We used breast cancer data collected by the cancer registry of Kerman University of Medical Sciences during the period 2000-2007. Variables such as morphology, grade, age, and treatments (surgery, radiotherapy, and chemotherapy) were entered in the fuzzy logistic regression model. Model performance was assessed by the mean degree of membership (MDM). The results showed that almost 41% of patients were in the neoplasm and malignant group and more than two-thirds of them were still alive after 5-year follow-up. Based on the fuzzy logistic model, the most important factors influencing survival were chemotherapy, morphology, and radiotherapy, respectively. Furthermore, the MDM criterion shows that the fuzzy logistic regression has a good fit to the data (MDM = 0.86). The fuzzy logistic regression model showed that chemotherapy is more important than radiotherapy in survival of patients with breast cancer. In addition, the model can calculate possibilistic odds of survival in cancer patients. The results of this study can be applied in clinical research. Few studies have applied fuzzy logistic models, and we recommend using this model in various research areas.
Blood lead level association with lower body weight in NHANES 1999–2006
DOE Office of Scientific and Technical Information (OSTI.GOV)
Scinicariello, Franco, E-mail: fes6@cdc.gov; Buser, Melanie C.; Mevissen, Meike
Background: Lead exposure is associated with low birth-weight. The objective of this study is to determine whether lead exposure is associated with lower body weight in children, adolescents and adults. Methods: We analyzed data from NHANES 1999–2006 for participants aged ≥ 3 using multiple logistic and multivariate linear regression. Using age- and sex-standardized BMI Z-scores, overweight and obese children (ages 3–19) were classified by BMI ≥ 85th and ≥ 95th percentiles, respectively. The adult population (age ≥ 20) was classified as overweight and obese with BMI measures of 25–29.9 and ≥ 30, respectively. Blood lead level (BLL) was categorized by weighted quartiles. Results: Multivariate linear regressions revealed a lower BMI Z-score in children and adolescents when the highest lead quartile was compared to the lowest lead quartile (β (SE) = − 0.33 (0.07), p < 0.001), and a decreased BMI in adults (β (SE) = − 2.58 (0.25), p < 0.001). Multiple logistic analyses in children and adolescents found a negative association between BLL and the percentage of obese and overweight with BLL in the highest quartile compared to the lowest quartile (OR = 0.42, 95% CI: 0.30–0.59; and OR = 0.67, 95% CI: 0.52–0.88, respectively). Adults in the highest lead quartile were less likely to be obese (OR = 0.42, 95% CI: 0.35–0.50) compared to those in the lowest lead quartile. Further analyses with blood lead as restricted cubic splines, confirmed the dose-relationship between blood lead and body weight outcomes. Conclusions: BLLs are associated with lower body mass index and obesity in children, adolescents and adults. - Highlights: • NHANES analysis of BLL and body weight outcomes • Increased BLL associated with decreased body weight in children and adolescent • Increased BLL associated with decreased body weight in adults.
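As an illustration of how odds ratios such as those reported above relate to logistic regression coefficients, the following minimal sketch converts a log-odds coefficient and its standard error into an odds ratio with a Wald-type 95% confidence interval. The coefficient and standard error are back-calculated from the reported obesity result in children and adolescents (OR = 0.42, 95% CI: 0.30–0.59) purely for illustration; they are assumptions, not values given in the abstract, and the survey-weighted NHANES analysis may have derived its intervals differently.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a logistic
    regression coefficient, using a Wald interval on the log-odds scale."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Assumed inputs, back-calculated from the reported OR and CI (illustrative only).
beta = math.log(0.42)
se = (math.log(0.59) - math.log(0.30)) / (2 * 1.96)

or_, lower, upper = odds_ratio_ci(beta, se)
print(f"OR = {or_:.2f}, 95% CI: {lower:.2f}-{upper:.2f}")
```

Because the interval is symmetric on the log scale, the reported odds ratio sits near the geometric mean of its confidence limits rather than their arithmetic midpoint.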
Accurate Diabetes Risk Stratification Using Machine Learning: Role of Missing Value and Outliers.
Maniruzzaman, Md; Rahman, Md Jahanur; Al-MehediHasan, Md; Suri, Harman S; Abedin, Md Menhazul; El-Baz, Ayman; Suri, Jasjit S
2018-04-10
Diabetes mellitus is a group of metabolic diseases in which blood sugar levels are too high. About 8.8% of the world was diabetic in 2017. It is projected that this will reach nearly 10% by 2045. The major challenge is that machine learning-based classifiers applied to such data sets for risk stratification show lower performance. Thus, our objective is to develop an optimized and robust machine learning (ML) system under the assumption that missing values or outliers, if replaced by a median configuration, will yield higher risk stratification accuracy. This ML-based risk stratification is designed, optimized and evaluated, where: (i) the features are extracted and optimized from the six feature selection techniques (random forest, logistic regression, mutual information, principal component analysis, analysis of variance, and Fisher discriminant ratio) and combined with ten different types of classifiers (linear discriminant analysis, quadratic discriminant analysis, naïve Bayes, Gaussian process classification, support vector machine, artificial neural network, Adaboost, logistic regression, decision tree, and random forest) under the hypothesis that both missing values and outliers when replaced by computed medians will improve the risk stratification accuracy. The Pima Indian diabetic dataset (768 patients: 268 diabetic and 500 controls) was used. Our results demonstrate that replacing the missing values and outliers by group median and median values, respectively, and further using the combination of random forest feature selection and random forest classification yields an accuracy, sensitivity, specificity, positive predictive value, negative predictive value and area under the curve of 92.26%, 95.96%, 79.72%, 91.14%, 91.20%, and 0.93, respectively. This is an improvement of 10% over previously developed techniques published in the literature. The system was validated for its stability and reliability.
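The median-replacement idea described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: missing values are represented as `None`, outliers are flagged with the common 1.5 × IQR rule (the abstract does not say how outliers were defined), and a single overall median is used for both cases, whereas the abstract reports a group median for missing values.

```python
from statistics import median, quantiles


def impute_with_median(values):
    """Replace missing entries (None) and IQR-rule outliers with the median
    of the observed values. The outlier rule is an assumption of this sketch."""
    observed = [v for v in values if v is not None]
    med = median(observed)
    q1, _, q3 = quantiles(observed, n=4)  # quartiles of the observed values
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [med if v is None or v < low or v > high else v for v in values]


# Hypothetical feature column with one missing value and one extreme value.
cleaned = impute_with_median([1, 2, 3, 4, 5, None, 100])
print(cleaned)
```

In practice the imputation statistics would be computed on the training split only and then applied to the test split, so that no information leaks across the partition.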
RF-based model showed the best performance when outliers are replaced by median values.
Loring, David W; Goldstein, Felicia C; Chen, Chuqing; Drane, Daniel L; Lah, James J; Zhao, Liping; Larrabee, Glenn J
2016-06-01
The objective is to examine failure on three embedded performance validity tests [Reliable Digit Span (RDS), Auditory Verbal Learning Test (AVLT) logistic regression, and AVLT recognition memory] in early Alzheimer disease (AD; n = 178), amnestic mild cognitive impairment (MCI; n = 365), and cognitively intact age-matched controls (n = 206). Neuropsychological tests scores were obtained from subjects participating in the Alzheimer's Disease Neuroimaging Initiative (ADNI). RDS failure using a ≤7 RDS threshold was 60/178 (34%) for early AD, 52/365 (14%) for MCI, and 17/206 (8%) for controls. A ≤6 RDS criterion reduced this rate to 24/178 (13%) for early AD, 15/365 (4%) for MCI, and 7/206 (3%) for controls. AVLT logistic regression probability of ≥.76 yielded unacceptably high false-positive rates in both clinical groups [early AD = 149/178 (79%); MCI = 159/365 (44%)] but not cognitively intact controls (13/206, 6%). AVLT recognition criterion of ≤9/15 classified 125/178 (70%) of early AD, 155/365 (42%) of MCI, and 18/206 (9%) of control scores as invalid, which decreased to 66/178 (37%) for early AD, 46/365 (13%) for MCI, and 10/206 (5%) for controls when applying a ≤5/15 criterion. Despite high false-positive rates across individual measures and thresholds, combining RDS ≤ 6 and AVLT recognition ≤9/15 classified only 9/178 (5%) of early AD and 4/365 (1%) of MCI patients as invalid performers. Embedded validity cutoffs derived from mixed clinical groups produce unacceptably high false-positive rates in MCI and early AD. Combining embedded PVT indicators lowers the false-positive rate. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
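The failure rates quoted above follow directly from the reported counts. The short sketch below recomputes the Reliable Digit Span percentages for both thresholds from the (failed, group size) pairs given in the abstract; the dictionary and function names are illustrative.

```python
# (number failing, group size) as reported in the abstract
rds_le7 = {"early AD": (60, 178), "MCI": (52, 365), "controls": (17, 206)}
rds_le6 = {"early AD": (24, 178), "MCI": (15, 365), "controls": (7, 206)}


def failure_rates(counts):
    """Return the failure rate per group as a whole-number percentage."""
    return {group: round(100 * failed / n) for group, (failed, n) in counts.items()}


print(failure_rates(rds_le7))
print(failure_rates(rds_le6))
```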
Igase, Michiya; Kohara, Katsuhiko; Igase, Keiji; Yamashita, Shiro; Fujisawa, Mutsuo; Katagi, Ryosuke; Miki, Tetsuro
2013-02-15
Cerebral microbleeds (CMBs) detected on T2*-weighted MRI gradient-echo have been associated with increased risk of cerebral infarction. We evaluated risk factors for these lesions in a cohort of first-time ischemic stroke patients. Presence of CMBs in consecutive first-time ischemic stroke patients was evaluated. The location of CMBs was classified by cerebral region as strictly lobar (lobar CMBs) and deep or infratentorial (deep CMBs). Logistic regression analysis was performed to determine the contribution of lipid profile to the presence of CMBs. One hundred and sixteen patients with a mean age of 70±10 years were recruited. CMBs were present in 74 patients. The deep CMBs group had significantly lower HDL-C levels than those without CMBs. In univariable analysis, advanced periventricular hyperintensity grade (PVH>2) and decreased HDL-C were significantly associated with the deep but not the lobar CMB group. On logistic regression analysis, HDL-C (beta=-0.06, p=0.002) and PVH grade >2 (beta=3.40, p=0.005) were independent determinants of deep CMBs. Low HDL-C may be a risk factor of deep CMBs, including advanced PVH status, in elderly patients with acute ischemic stroke. Management of HDL-C levels might be a therapeutic target for the prevention of recurrence of stroke. Copyright © 2012 Elsevier B.V. All rights reserved.
Association between maternal smoking, gender, and cleft lip and palate.
Martelli, Daniella Reis Barbosa; Coletta, Ricardo D; Oliveira, Eduardo A; Swerts, Mário Sérgio Oliveira; Rodrigues, Laíse A Mendes; Oliveira, Maria Christina; Martelli Júnior, Hercílio
2015-01-01
Cleft lip and/or palate (CL/P) represent the most common congenital anomalies of the face. To assess the relationship between maternal smoking, gender and CL/P. This is an epidemiological cross-sectional study. We interviewed 1519 mothers divided into two groups: mothers of children with CL/P (n=843) and mothers of children without CL/P (n=676). All mothers were classified as smoker or non-smoker subjects during the first trimester of pregnancy. To determine an association among maternal smoking, gender, and CL/P, odds ratios were calculated and the adjustment was made by a logistic regression model. An association between maternal smoking and the presence of cleft was observed. There was also a strong association between male gender and the presence of cleft (OR=3.51; 95% CI 2.83-4.37). By binary logistic regression analysis, it was demonstrated that both variables were independently associated with clefts. In a multivariate analysis, male gender and maternal smoking had a 2.5- and a 1.5-time greater chance of having a cleft, respectively. Our findings are consistent with a positive association between maternal smoking during pregnancy and CL/P in male gender. The results support the importance of smoking prevention and introduction of cessation programs among women with childbearing potential. Copyright © 2015 Associação Brasileira de Otorrinolaringologia e Cirurgia Cérvico-Facial. Published by Elsevier Editora Ltda. All rights reserved.
Zhang, Ya-Jie; Jin, Hua; Qin, Zhen-Li; Ma, Jin-Long; Zhao, Han; Zhang, Ling; Chen, Zi-Jiang
2016-01-01
This study aims to explore the independent predictors of gestational diabetes mellitus (GDM) in Chinese women with polycystic ovary syndrome (PCOS). This cross-sectional study analyzed primigravid women with PCOS and classified them as those with and without GDM. Independent risk factors and model performance were analyzed using multivariate logistic regression and the area under the curve (AUC) of receiver operating characteristic (ROC), respectively. Maternal body mass index, waist circumference, waist-to-hip ratio (WHR), fasting glucose, insulin, sex hormone-binding globulin (SHBG), homeostasis model assessment-insulin resistance (HOMA-IR) before pregnancy, gestation weight gain before 24 weeks and the incidence of family history of diabetes were different in the 2 groups. Logistic regression analysis showed that pre-pregnancy WHR, SHBG, HOMA-IR and gestation weight gain before 24 weeks were the independent predictors of GDM. ROC curve analysis confirmed that gestation weight gain before 24 weeks (AUC 0.767, 95% CI 0.688-0.841), pre-pregnant WHR (AUC 0.725, 95% CI 0.649-0.802), HOMA-IR (AUC 0.711, 95% CI 0.632-0.790) and SHBG levels (AUC 0.709, 95% CI 0.625-0.793) were the strong risk factors. In Chinese women with PCOS, factors of gestation weight gain before 24 weeks, pre-pregnant WHR, HOMA-IR and SHBG levels are strongly associated with subsequent development of GDM. © 2015 S. Karger AG, Basel.
Identifying the optimal segmentors for mass classification in mammograms
NASA Astrophysics Data System (ADS)
Zhang, Yu; Tomuro, Noriko; Furst, Jacob; Raicu, Daniela S.
2015-03-01
In this paper, we present the results of our investigation on identifying the optimal segmentor(s) from an ensemble of weak segmentors, used in a Computer-Aided Diagnosis (CADx) system which classifies suspicious masses in mammograms as benign or malignant. This is an extension of our previous work, where we used various parameter settings of image enhancement techniques to each suspicious mass (region of interest (ROI)) to obtain several enhanced images, then applied segmentation to each image to obtain several contours of a given mass. Each segmentation in this ensemble is essentially a "weak segmentor" because no single segmentation can produce the optimal result for all images. Then after shape features are computed from the segmented contours, the final classification model was built using logistic regression. The work in this paper focuses on identifying the optimal segmentor(s) from an ensemble mix of weak segmentors. For our purpose, optimal segmentors are those in the ensemble mix which contribute the most to the overall classification rather than the ones that produced high precision segmentation. To measure the segmentors' contribution, we examined weights on the features in the derived logistic regression model and computed the average feature weight for each segmentor. The result showed that, while in general the segmentors with higher segmentation success rates had higher feature weights, some segmentors with lower segmentation rates had high classification feature weights as well.
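The contribution measure described above, an average feature weight per segmentor, can be sketched as below. The segmentor names, feature names and coefficient values are hypothetical, and averaging the absolute value of each coefficient is an assumption of this sketch; the abstract does not state whether signed or absolute weights were averaged.

```python
from collections import defaultdict


def average_weight_per_segmentor(weights):
    """weights maps (segmentor, feature) -> logistic regression coefficient.
    Returns the mean absolute coefficient for each segmentor."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for (segmentor, _feature), coefficient in weights.items():
        totals[segmentor] += abs(coefficient)  # assumption: magnitude, not sign
        counts[segmentor] += 1
    return {seg: totals[seg] / counts[seg] for seg in totals}


# Hypothetical coefficients for shape features computed from two segmentors.
example_weights = {
    ("seg_A", "area"): 0.8,
    ("seg_A", "compactness"): -0.4,
    ("seg_B", "area"): 0.1,
    ("seg_B", "compactness"): 0.3,
}
averages = average_weight_per_segmentor(example_weights)
print(averages)
```

Comparing coefficients in this way is only meaningful if the features are on comparable scales, for example after standardization.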
Internal Structure of Kidney Calculi as a Predictor for Shockwave Lithotripsy Success.
Christiansen, Frederikke Eichner; Andreassen, Kim Hovgaard; Osther, Susanne Sloth; Osther, Palle Joern Sloth
2016-03-01
The internal structure of renal calculi can be determined on CT using bone windows and may be classified as homogeneous or inhomogeneous with void regions. In vitro studies have shown homogeneous stones to be less responsive to extracorporeal shockwave lithotripsy (SWL). The objective was to evaluate whether the internal morphology of calculi defined by CT bone window influences SWL outcome in vivo. One hundred eleven patients with solitary renal calculi treated with SWL were included. Treatment data were registered prospectively and follow-up data were collected retrospectively. All patients had noncontrast computed tomography (NCCT) performed before SWL and at 3-month follow-up. The stones were categorized as homogeneous or inhomogeneous. At follow-up, the patient's stone status was registered. Stone-free status was defined as no evidence of calculi on NCCT. Treatment was considered successful if the patient was either stone free or had clinically insignificant residual fragments. Using simple logistic regression, the odds for being stone free 3 months post-SWL were significantly reduced in the patients with inhomogeneous stones compared with patients with homogeneous stones (odds ratio 0.43 [95% confidence interval 0.20, 0.92; p < 0.05]). However, when adjusting for stone size by multiple logistic regression, including stone size (area) as a covariate, this difference became insignificant. The internal structure of kidney stones did not predict the outcome of SWL in vivo.
Serum Leptin Is a Biomarker of Malnutrition in Decompensated Cirrhosis
Rachakonda, Vikrant; Borhani, Amir A.; Dunn, Michael A.; Andrzejewski, Margaret; Martin, Kelly; Behari, Jaideep
2016-01-01
Background and Aims Malnutrition is a leading cause of morbidity and mortality in cirrhosis. There is no consensus as to the optimal approach for identifying malnutrition in end-stage liver disease. The aim of this study was to measure biochemical, serologic, hormonal, radiographic, and anthropometric features in a cohort of hospitalized cirrhotic patients to characterize biomarkers for identification of malnutrition. Design In this prospective observational cohort study, 52 hospitalized cirrhotic patients were classified as malnourished (42.3%) or nourished (57.7%) based on mid-arm muscle circumference < 23 cm and dominant handgrip strength < 30 kg. Anthropometric measurements were obtained. Appetite was assessed using the Simplified Nutrition Appetite Questionnaire (SNAQ) score. Fasting levels of serum adipokines, cytokines, and hormones were determined using Luminex assays. Logistic regression analysis was used to determine features independently associated with malnutrition. Results Subjects with and without malnutrition differed in several key features of metabolic phenotype including wet and dry BMI, skeletal muscle index, visceral fat index and HOMA-IR. Serum leptin levels were lower and INR was higher in malnourished subjects. Serum leptin was significantly correlated with HOMA-IR, wet and dry BMI, mid-arm muscle circumference, skeletal muscle index, and visceral fat index. Logistic regression analysis revealed that INR and log-transformed leptin were independently associated with malnutrition. Conclusions Low serum leptin and elevated INR are associated with malnutrition in hospitalized patients with end-stage liver disease. PMID:27583675
Bayesian data fusion for spatial prediction of categorical variables in environmental sciences
NASA Astrophysics Data System (ADS)
Gengler, Sarah; Bogaert, Patrick
2014-12-01
First developed to predict continuous variables, Bayesian Maximum Entropy (BME) has become a complete framework for space-time prediction since it was extended to predict categorical variables and mixed random fields. This method proposes solutions to combine several sources of data whatever the nature of the information. However, the various attempts to adapt the BME methodology to categorical variables and mixed random fields faced some limitations, such as a high computational burden. The main objective of this paper is to overcome this limitation by generalizing the Bayesian Data Fusion (BDF) theoretical framework to categorical variables, which is to some extent a simplification of the BME method through the convenient conditional independence hypothesis. The BDF methodology for categorical variables is first described and then applied to a practical case study: the estimation of soil drainage classes using a soil map and point observations in the sandy area of Flanders around the city of Mechelen (Belgium). The BDF approach is compared to BME along with more classical approaches, such as Indicator CoKriging (ICK) and logistic regression. Estimators are compared using various indicators, namely the Percentage of Correctly Classified locations (PCC) and the Average Highest Probability (AHP). Although the BDF methodology for categorical variables is to some extent a simplification of the BME approach, both methods lead to similar results and have strong advantages compared to ICK and logistic regression.
Predicting Visual Distraction Using Driving Performance Data
Kircher, Katja; Ahlstrom, Christer
2010-01-01
Behavioral variables are often used as performance indicators (PIs) of visual or internal distraction induced by secondary tasks. The objective of this study is to investigate whether visual distraction can be predicted by driving performance PIs in a naturalistic setting. Visual distraction is here defined by a gaze based real-time distraction detection algorithm called AttenD. Seven drivers used an instrumented vehicle for one month each in a small scale field operational test. For each of the visual distraction events detected by AttenD, seven PIs such as steering wheel reversal rate and throttle hold were calculated. Corresponding data were also calculated for time periods during which the drivers were classified as attentive. For each PI, means between distracted and attentive states were calculated using t-tests for different time-window sizes (2 – 40 s), and the window width with the smallest resulting p-value was selected as optimal. Based on the optimized PIs, logistic regression was used to predict whether the drivers were attentive or distracted. The logistic regression resulted in predictions which were 76 % correct (sensitivity = 77 % and specificity = 76 %). The conclusion is that there is a relationship between behavioral variables and visual distraction, but the relationship is not strong enough to accurately predict visual driver distraction. Instead, behavioral PIs are probably best suited as complementary to eye tracking based algorithms in order to make them more accurate and robust. PMID:21050615
Abnormal anal cytology risk in women with known genital squamous intraepithelial lesion.
do Socorro Nobre, Maria; Jacyntho, Claudia Marcia; Eleutério, José; Giraldo, Paulo César; Gonçalves, Ana Katherine
2016-01-01
The purpose of this study was to assess the risk of abnormal anal cytology in women with known genital squamous intraepithelial lesion. This study evaluated 200 women with and without genital squamous intraepithelial lesion who were recruited for anal Pap smears. Women whose results were at or above atypical squamous cells of undetermined significance were classified as having abnormal anal cytology. A multiple logistic regression analysis (stepwise) was performed to identify the risk for developing abnormal anal cytology. Data were analyzed using the SPSS 20.0 program. The average age was 41.09 (±12.64). Of the total participants, 75.5% did not practice anal sex, 91% did not have HPV-infected partners, 92% did not have any anal pathology, and 68.5% did not have anal bleeding. More than half (57.5%) had genital SIL and a significant number developed abnormal anal cytology: 13% in the total sample and 17.4% in women with genital SIL. A significant association was observed between genital squamous intraepithelial lesion and anal squamous intraepithelial lesion (PR=2.46; p=0.03). In the logistic regression model, women having genital intraepithelial lesion were more likely to have abnormal anal Pap smear (aPR=2.81; p=0.02). This report shows that women with genital squamous intraepithelial lesion must be more closely screened for anal cancer. Copyright © 2016 Elsevier Editora Ltda. All rights reserved.
Mixed conditional logistic regression for habitat selection studies.
Duchesne, Thierry; Fortin, Daniel; Courbin, Nicolas
2010-05-01
1. Resource selection functions (RSFs) are becoming a dominant tool in habitat selection studies. RSF coefficients can be estimated with unconditional (standard) and conditional logistic regressions. While the advantage of mixed-effects models is recognized for standard logistic regression, mixed conditional logistic regression remains largely overlooked in ecological studies. 2. We demonstrate the significance of mixed conditional logistic regression for habitat selection studies. First, we use spatially explicit models to illustrate how mixed-effects RSFs can be useful in the presence of inter-individual heterogeneity in selection and when the assumption of independence from irrelevant alternatives (IIA) is violated. The IIA hypothesis states that the strength of preference for habitat type A over habitat type B does not depend on the other habitat types also available. Secondly, we demonstrate the significance of mixed-effects models to evaluate habitat selection of free-ranging bison Bison bison. 3. When movement rules were homogeneous among individuals and the IIA assumption was respected, fixed-effects RSFs adequately described habitat selection by simulated animals. In situations violating the inter-individual homogeneity and IIA assumptions, however, RSFs were best estimated with mixed-effects regressions, and fixed-effects models could even provide faulty conclusions. 4. Mixed-effects models indicate that bison did not select farmlands, but exhibited strong inter-individual variations in their response to farmlands. Less than half of the bison preferred farmlands over forests. Conversely, the fixed-effect model simply suggested an overall selection for farmlands. 5. Conditional logistic regression is recognized as a powerful approach to evaluate habitat selection when resource availability changes. This regression is increasingly used in ecological studies, but almost exclusively in the context of fixed-effects models. 
Fitness maximization can imply differences in trade-offs among individuals, which can yield inter-individual differences in selection and lead to departure from IIA. These situations are best modelled with mixed-effects models. Mixed-effects conditional logistic regression should become a valuable tool for ecological research.
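For reference, the conditional logistic model and the IIA property discussed above can be written out as follows. This is the standard textbook form, given as a sketch rather than the authors' exact specification; the symbols (individual i, stratum t, alternatives j, covariate vector x, random effect b_i) are notation introduced here, and the normal distribution for the random effects is a common assumption that the abstract does not state.

```latex
% Probability that individual i selects alternative j* among the J_{it}
% alternatives available in stratum t (conditional logistic regression):
P(y_{it} = j^{*}) =
  \frac{\exp\!\left(\mathbf{x}_{itj^{*}}^{\top}\boldsymbol{\beta}_{i}\right)}
       {\sum_{j=1}^{J_{it}} \exp\!\left(\mathbf{x}_{itj}^{\top}\boldsymbol{\beta}_{i}\right)}

% Fixed-effects model: \boldsymbol{\beta}_{i} = \boldsymbol{\beta} for all i.
% Mixed-effects model (assumed normal random effects):
\boldsymbol{\beta}_{i} = \boldsymbol{\beta} + \mathbf{b}_{i},
\qquad \mathbf{b}_{i} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})

% IIA under the fixed-effects model: the odds of choosing A over B
% do not depend on any other available alternative.
\frac{P(y_{it} = A)}{P(y_{it} = B)} =
  \exp\!\left((\mathbf{x}_{itA} - \mathbf{x}_{itB})^{\top}\boldsymbol{\beta}\right)
```

Once the coefficients vary among individuals, the population-level odds of A over B are an average over the distribution of the random effects and no longer reduce to this simple ratio, which is how the mixed model accommodates departures from IIA.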
Advanced colorectal neoplasia risk stratification by penalized logistic regression.
Lin, Yunzhi; Yu, Menggang; Wang, Sijian; Chappell, Richard; Imperiale, Thomas F
2016-08-01
Colorectal cancer is the second leading cause of death from cancer in the United States. To facilitate the efficiency of colorectal cancer screening, there is a need to stratify risk for colorectal cancer among the 90% of US residents who are considered "average risk." In this article, we investigate such risk stratification rules for advanced colorectal neoplasia (colorectal cancer and advanced, precancerous polyps). We use a recently completed large cohort study of subjects who underwent a first screening colonoscopy. Logistic regression models have been used in the literature to estimate the risk of advanced colorectal neoplasia based on quantifiable risk factors. However, logistic regression may be prone to overfitting and instability in variable selection. Since most of the risk factors in our study have several categories, it was tempting to collapse these categories into fewer risk groups. We propose a penalized logistic regression method that automatically and simultaneously selects variables, groups categories, and estimates their coefficients by penalizing the [Formula: see text]-norm of both the coefficients and their differences. Hence, it encourages sparsity in the categories, i.e. grouping of the categories, and sparsity in the variables, i.e. variable selection. We apply the penalized logistic regression method to our data. The important variables are selected, with close categories simultaneously grouped, by penalized regression models with and without the interactions terms. The models are validated with 10-fold cross-validation. The receiver operating characteristic curves of the penalized regression models dominate the receiver operating characteristic curve of naive logistic regressions, indicating a superior discriminative performance. © The Author(s) 2013.
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.
2003-01-01
Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. 
The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity in each basin, particle size sorting, average storm intensity (millimeters per hour), soil organic matter content, soil permeability, and soil drainage. The results of this study demonstrate that logistic regression is a valuable tool for predicting the probability of debris flows occurring in recently-burned landscapes.
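The step from a fitted multivariate logistic regression model to a per-basin probability can be sketched as below. The predictor names, intercept and coefficient values are hypothetical placeholders chosen for illustration; they are not the fitted values from the study.

```python
import math


def debris_flow_probability(intercept, coefficients, predictors):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_k b_k * x_k)))."""
    z = intercept + sum(coefficients[name] * predictors[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients and basin attributes (illustrative only).
coefficients = {"pct_slope_gt_30": 0.03, "pct_burned_med_high": 0.02, "storm_intensity_mm_h": 0.1}
basin = {"pct_slope_gt_30": 40.0, "pct_burned_med_high": 60.0, "storm_intensity_mm_h": 5.0}

p = debris_flow_probability(-3.0, coefficients, basin)
print(f"Estimated probability of a debris flow: {p:.2f}")
```

Applying the same function to every basin in a GIS layer yields the kind of probability map described in step (4).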
Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon
2015-01-01
Background: Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. Methods: In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. Results: The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Conclusion: Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended. PMID:26793655
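The area under the ROC curve used above to compare the three models can be computed directly from predicted scores and observed outcomes. The sketch below uses the pairwise (Mann-Whitney) definition: the proportion of positive-negative pairs in which the positive case receives the higher score, with ties counted as one half. The scores and labels are made-up illustrative values.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and
    negative cases; labels are 1 (event) or 0 (no event)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities and observed outcomes.
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc)
```

This quadratic-time version is meant for clarity; for large samples a rank-based implementation is preferable.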
Kempe, P T; van Oppen, P; de Haan, E; Twisk, J W R; Sluis, A; Smit, J H; van Dyck, R; van Balkom, A J L M
2007-09-01
Two methods for predicting remissions in obsessive-compulsive disorder (OCD) treatment are evaluated. Y-BOCS measurements of 88 patients with a primary OCD (DSM-III-R) diagnosis were performed over a 16-week treatment period, and during three follow-ups. Remission at any measurement was defined as a Y-BOCS score lower than thirteen combined with a reduction of seven points when compared with baseline. Logistic regression models were compared with a Cox regression for recurrent events model. Logistic regression yielded different models at different evaluation times. The recurrent events model remained stable when fewer measurements were used. Higher baseline levels of neuroticism and more severe OCD symptoms were associated with a lower chance of remission, early age of onset and more depressive symptoms with a higher chance. Choice of outcome time affects logistic regression prediction models. Recurrent events analysis uses all information on remissions and relapses. Short- and long-term predictors for OCD remission show overlap.
Estimating the exceedance probability of rain rate by logistic regression
NASA Technical Reports Server (NTRS)
Chiu, Long S.; Kedem, Benjamin
1990-01-01
Recent studies have shown that the fraction of an area with rain intensity above a fixed threshold is highly correlated with the area-averaged rain rate. To estimate the fractional rainy area, a logistic regression model, which estimates the conditional probability that rain rate over an area exceeds a fixed threshold given the values of related covariates, is developed. The problem of dependency in the data in the estimation procedure is bypassed by the method of partial likelihood. Analyses of simulated scanning multichannel microwave radiometer and observed electrically scanning microwave radiometer data during the Global Atlantic Tropical Experiment period show that the use of logistic regression in pixel classification is superior to multiple regression in predicting whether rain rate at each pixel exceeds a given threshold, even in the presence of noisy data. The potential of the logistic regression technique in satellite rain rate estimation is discussed.
Wang, Qingliang; Li, Xiaojie; Hu, Kunpeng; Zhao, Kun; Yang, Peisheng; Liu, Bo
2015-05-12
The aim was to explore the risk factors of portal hypertensive gastropathy (PHG) in patients with hepatitis B associated cirrhosis and to establish a logistic regression model for noninvasive prediction. The clinical data of 234 hospitalized patients with hepatitis B associated cirrhosis from March 2012 to March 2014 were analyzed retrospectively. The dependent variable was the occurrence of PHG, while the independent variables were screened by binary logistic analysis. Multivariate logistic regression was used for further analysis of significant noninvasive independent variables. A logistic regression model was established and the odds ratio was calculated for each factor. The accuracy, sensitivity and specificity of the model were evaluated using the receiver operating characteristic (ROC) curve. According to univariate logistic regression, the risk factors included hepatic dysfunction, albumin (ALB), bilirubin (TB), prothrombin time (PT), platelet (PLT), white blood cell (WBC), portal vein diameter, spleen index, splenic vein diameter, diameter ratio, PLT to spleen volume ratio, esophageal varices (EV) and gastric varices (GV). Multivariate analysis showed that hepatic dysfunction (X1), TB (X2), PLT (X3) and splenic vein diameter (X4) were the major factors in the occurrence of PHG. The established regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4. The accuracy of the model for PHG was 79.1%, with a sensitivity of 77.2% and a specificity of 80.8%. Hepatic dysfunction, TB, PLT and splenic vein diameter are risk factors for PHG, and the noninvasive predictive logistic regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4.
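To show how a fitted equation of this form turns into a predicted probability, the Python sketch below plugs values into the reported coefficients and applies the inverse logit. The abstract does not say how X1 to X4 are coded or scaled, so the input values here are purely hypothetical, and the function name `predicted_probability` is an assumption made for illustration.

```python
import math

# Coefficients as reported in the abstract (Logit P = -2.667 + 2.186 X1 - ...).
INTERCEPT = -2.667
COEFS = {"X1": 2.186, "X2": -2.167, "X3": 0.725, "X4": 0.976}


def logit_value(x):
    """Linear predictor: intercept plus the weighted sum of covariates."""
    return INTERCEPT + sum(COEFS[name] * x[name] for name in COEFS)


def predicted_probability(x):
    """Inverse logit: maps the linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit_value(x)))


# Hypothetical patient; the true coding of X1..X4 is not given in the abstract.
example = {"X1": 1, "X2": 0, "X3": 1, "X4": 1}
z = logit_value(example)            # -2.667 + 2.186 + 0.725 + 0.976 = 1.22
p = predicted_probability(example)  # roughly 0.77
```

A probability above a chosen cut-off (0.5 is a common default, though the study's cut-off is not stated) would classify the case as predicted PHG.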
Variable Selection in Logistic Regression.
1987-06-01
Variable Selection in Logistic Regression. Z. D. Bai, P. R. Krishnaiah and L. C. Zhao, Center for Multivariate Analysis, University of Pittsburgh. Contract or grant number: F49620-85-C-0008.
NASA Astrophysics Data System (ADS)
Madhu, B.; Ashok, N. C.; Balasubramanian, S.
2014-11-01
Multinomial logistic regression analysis was used to develop a statistical model that can predict the probability of breast cancer in Southern Karnataka using breast cancer occurrence data from 2007-2011. Independent socio-economic variables describing breast cancer occurrence, such as age, education, occupation, parity, type of family, health insurance coverage, residential locality and socioeconomic status, were obtained for each case. The models were developed as follows: i) The urban-rural distribution of breast cancer cases obtained from the Bharat Hospital and Institute of Oncology was visualized spatially. ii) Socio-economic risk factors describing the breast cancer occurrences were compiled for each case. These data were then analysed using multinomial logistic regression in SPSS statistical software; relations between the occurrence of breast cancer across socio-economic status and the influence of other socio-economic variables were evaluated, and multinomial logistic regression models were constructed. iii) The model that best predicted the occurrence of breast cancer was identified. This multivariate logistic regression model was entered into a geographic information system, and maps showing the predicted probability of breast cancer occurrence in Southern Karnataka were created. This study demonstrates that multinomial logistic regression is a valuable tool for developing models that predict the probability of breast cancer occurrence in Southern Karnataka.
Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H
2012-01-01
Background: The purpose of this investigation was to empirically compare the predictive ability of an artificial neural network with that of logistic regression in the prediction of low back pain. Methods: Data from the second national health survey were considered in this investigation. These data include information on low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a set of 17294 records and validated in a test set of 17295 records. The Hosmer and Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 input, 3 hidden and 1 output neurons was employed. The efficiency of the two models was compared by receiver operating characteristic analysis, root mean square and -2 Loglikelihood criteria. Results: The area under the ROC curve (SE), root mean square and -2Loglikelihood of the logistic regression were 0.752 (0.004), 0.3832 and 14769.2, respectively. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the artificial neural network were 0.754 (0.004), 0.3770 and 14757.6, respectively. Conclusions: Based on these three criteria, the artificial neural network would give better performance than logistic regression. Although the difference is statistically significant, it does not seem to be clinically significant. PMID:23113198
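Two of the comparison criteria named above can be computed directly from predicted probabilities. The Python sketch below assumes that "root mean square" refers to the root mean square error between the 0/1 outcome and the predicted probability, which the abstract does not spell out; the function names and the toy data are hypothetical.

```python
import math


def rmse(y, p):
    """Root mean square error between 0/1 outcomes and predicted probabilities."""
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y))


def minus_two_log_likelihood(y, p):
    """-2 times the Bernoulli log-likelihood; lower values indicate better fit."""
    total = 0.0
    for yi, pi in zip(y, p):
        total += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return -2.0 * total


# Hypothetical toy data, not taken from the survey.
y = [1, 0, 1, 0]
p = [0.8, 0.2, 0.6, 0.4]
err = rmse(y, p)                        # sqrt(0.1)
dev = minus_two_log_likelihood(y, p)    # -4 * ln(0.48)
```

Both criteria reward probabilities close to the observed outcome, so two models can be ranked on the same test set by computing each quantity for each model.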
NASA Astrophysics Data System (ADS)
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
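The logit model described in this abstract can be written out explicitly. The notation below (p for the event probability, x_1, ..., x_k for the predictors, and beta_0, ..., beta_k for the coefficients) is introduced here for illustration and is not taken from the study.

```latex
\[
  \log\frac{p}{1-p} \;=\; \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k ,
  \qquad
  p \;=\; \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \sum_{j=1}^{k} \beta_j x_j)\bigr)} .
\]
```

Under this form, a one-unit increase in x_j changes the log odds by beta_j, so exp(beta_j) is the corresponding odds ratio when the other predictors are held fixed.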
Understanding logistic regression analysis.
Sperandei, Sandro
2014-01-01
Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.
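As a small illustration of how an odds ratio is obtained from a fitted model, the Python sketch below exponentiates a coefficient and its Wald confidence limits. The coefficient and standard error are hypothetical values chosen so that the arithmetic is easy to follow, and the function name `odds_ratio_with_ci` is an assumption.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald interval from a logistic coefficient and its SE.

    z = 1.96 gives an approximate 95% interval.
    """
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Hypothetical coefficient: ln(2) corresponds to a doubling of the odds.
beta = math.log(2.0)
se = 0.25
or_, lower, upper = odds_ratio_with_ci(beta, se)
```

Because the interval is built on the log-odds scale and then exponentiated, it is asymmetric around the odds ratio; an interval that excludes 1 corresponds to a coefficient whose interval excludes 0.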
Risk factors of hypertension among adults aged 35-64 years living in an urban slum Nairobi, Kenya.
Olack, Beatrice; Wabwire-Mangen, Fred; Smeeth, Liam; Montgomery, Joel M; Kiwanuka, Noah; Breiman, Robert F
2015-12-17
Hypertension is an emerging public health problem in Sub-Saharan Africa (SSA) and urbanization is considered to favor its emergence. Given a paucity of information on hypertension and associated risk factors among urban slum dwellers in SSA, we aimed to characterize the distribution of risk factors for hypertension and investigate their association with hypertension in an urban slum in Kenya. We conducted a community-based cross-sectional survey among adults 35 years and older living in Kibera slum, Nairobi, Kenya. Trained interviewers collected data on socio-demographic characteristics and self-reported health behaviours using a modified World Health Organization stepwise surveillance questionnaire for chronic disease risk factors. Anthropometric and blood pressure measurements were performed following standard procedures. Multiple logistic regression was used for analysis, and odds ratios with 95 % confidence intervals were calculated to identify risk factors associated with hypertension. A total of 1528 adults were surveyed, with a mean age of 46.7 years. The age-standardized prevalence of hypertension was 29.4 % (95 % CI 27.0-31.7). Among the 418 participants classified as hypertensive, over one third (39.0 %) were unaware they had hypertension. Prevalence of current smoking and alcohol consumption was 8.5 and 13.1 % respectively. Over one quarter (26.2 %) of participants were classified as overweight (Body Mass Index [BMI] ≥25 to ≤29.9 kg/m²), and 17 % were classified as obese (BMI ≥30 kg/m²). Overweight, obesity, current smoking, some level of education, highest wealth index, moderate physical activity, older age and being widowed were each independently associated with hypertension. 
When fit in a multivariable logistic regression model, being a widow [AOR = 1.7; (95 % CI, 1.1-2.6)], belonging to the highest wealth index [AOR = 1.6; (95 % CI, 1.1-2.5)], obesity [AOR = 1.8; (95 % CI, 1.1-3.1)] and moderate physical activity [AOR = 1.9; (95 % CI, 1.2-3.0)] all remained significantly associated with hypertension. Hypertension in the slum is a public health problem affecting at least one in three adults aged 35-64 years. Age, marital status, wealth index, physical inactivity and body mass index are important risk factors associated with hypertension. Prevention measures targeting the modifiable risk factors associated with hypertension are warranted to curb hypertension and its progressive effects.
Gallagher, Patience J; Castro, Victor; Fava, Maurizio; Weilburg, Jeffrey B; Murphy, Shawn N; Gainer, Vivian S; Churchill, Susanne E; Kohane, Isaac S; Iosifescu, Dan V; Smoller, Jordan W; Perlis, Roy H
2012-10-01
Objective: It has been suggested that there is a mechanism by which nonsteroidal anti-inflammatory drugs (NSAIDs) may interfere with antidepressant response, and poorer outcomes among NSAID-treated patients were reported in the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study. To attempt to confirm this association in an independent population-based treatment cohort and explore potential confounding variables, the authors examined use of NSAIDs and related medications among 1,528 outpatients in a New England health care system. Method: Treatment outcomes were classified using a validated machine learning tool applied to electronic medical records. Logistic regression was used to examine the association between medication exposure and treatment outcomes, adjusted for potential confounding variables. To further elucidate confounding and treatment specificity of the observed effects, data from the STAR*D study were reanalyzed. Results: NSAID exposure was associated with a greater likelihood of depression classified as treatment resistant compared with depression classified as responsive to selective serotonin reuptake inhibitors (odds ratio=1.55, 95% CI=1.21-2.00). This association was apparent in the NSAIDs-only group but not in those using other agents with NSAID-like mechanisms (cyclooxygenase-2 inhibitors and salicylates). Inclusion of age, sex, ethnicity, and measures of comorbidity and health care utilization in regression models indicated confounding; association with outcome was no longer significant in fully adjusted models. Reanalysis of STAR*D results likewise identified an association in NSAIDs but not NSAID-like drugs, with more modest effects persisting after adjustment for potential confounding variables. Conclusions: These results support an association between NSAID use and poorer antidepressant outcomes in major depressive disorder but indicate that some of the observed effect may be a result of confounding.
ERIC Educational Resources Information Center
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
Toward a model for improved targeting of aged at risk of institutionalization.
Weissert, W G; Cready, C M
1989-01-01
A national sample of institutionalized and noninstitutionalized aged was created by merging the 1977 National Nursing Home Survey and its counterpart, the National Health Interview Survey for the same year. A weighted logistic regression analysis was conducted to identify factors that might be useful in calculating home- and community-based long-term care clients' risk of institutionalization. A model containing patient characteristics, nursing home bed supply, and a climate variable correctly classified 98.2 percent of cases residing in nursing homes or the community. Physical dependency, mental disorder and degenerative disease, lack of spouse, being white, poverty, old age, unoccupied nursing home beds, and climate all appear to be determinants of institutional residency among the aged. PMID:2807934
2017-03-23
Using Multiple and Logistic Regression to Estimate the Median Will-Cost and Probability of Cost and Schedule Overrun for Program Managers. Ryan C. Trudelle. Public release; distribution unlimited.
2013-11-01
Ptrend 0.78 0.62 0.75 Unconditional logistic regression was used to estimate odds ratios (OR) and 95 % confidence intervals (CI) for risk of node...Ptrend 0.71 0.67 Unconditional logistic regression was used to estimate odds ratios (OR) and 95 % confidence intervals (CI) for risk of high-grade tumors... logistic regression was used to estimate odds ratios (OR) and 95 % confidence intervals (CI) for the associations between each of the seven SNPs and
Kim, Sun Mi; Kim, Yongdai; Jeong, Kuhwan; Jeong, Heeyeong; Kim, Jiyoung
2018-01-01
The aim of this study was to compare the performance of image analysis for predicting breast cancer using two distinct regression models and to evaluate the usefulness of incorporating clinical and demographic data (CDD) into the image analysis in order to improve the diagnosis of breast cancer. This study included 139 solid masses from 139 patients who underwent an ultrasonography-guided core biopsy and had available CDD between June 2009 and April 2010. Three breast radiologists retrospectively reviewed the 139 breast masses and described each lesion using the Breast Imaging Reporting and Data System (BI-RADS) lexicon. We applied and compared two regression methods, stepwise logistic (SL) regression and logistic least absolute shrinkage and selection operator (LASSO) regression, in which the BI-RADS descriptors and CDD were used as covariates. We investigated the performances of these regression methods and the agreement of radiologists in terms of test misclassification error and the area under the curve (AUC) of the tests. Logistic LASSO regression was superior (P<0.05) to SL regression, regardless of whether CDD was included in the covariates, in terms of test misclassification errors (0.234 vs. 0.253, without CDD; 0.196 vs. 0.258, with CDD) and AUC (0.785 vs. 0.759, without CDD; 0.873 vs. 0.735, with CDD). However, it was inferior (P<0.05) to the agreement of three radiologists in terms of test misclassification errors (0.234 vs. 0.168, without CDD; 0.196 vs. 0.088, with CDD) and the AUC without CDD (0.785 vs. 0.844, P<0.001), but was comparable in terms of the AUC with CDD (0.873 vs. 0.880, P=0.141). Logistic LASSO regression based on BI-RADS descriptors and CDD showed better performance than SL in predicting the presence of breast cancer. The use of CDD as a supplement to the BI-RADS descriptors significantly improved the prediction of breast cancer using logistic LASSO regression.
Mjahad, A; Rosado-Muñoz, A; Bataller-Mompeán, M; Francés-Víllora, J V; Guerrero-Martínez, J F
2017-04-01
To safely select the proper therapy for ventricular fibrillation (VF), it is essential to distinguish it correctly from ventricular tachycardia (VT) and other rhythms. Because the required therapy is not the same, an erroneous detection might lead to serious injury to the patient or even cause VF. The main novelty of this paper is the use of time-frequency (t-f) representation images as the direct input to the classifier. We hypothesize that this method improves classification results because it eliminates the typical feature selection and extraction stage and its corresponding loss of information. The standard AHA and MIT-BIH databases were used for evaluation and comparison with other authors. Prior to the t-f Pseudo Wigner-Ville (PWV) calculation, only basic preprocessing for denoising and signal alignment is necessary. To check the validity of the method independently of the classifier, four different classifiers are used: Logistic Regression with L2 Regularization (L2 RLR), Adaptive Neural Network Classifier (ANNC), Support Vector Machine (SSVM), and Bagging classifier (BAGG). The main classification results are 95.56% sensitivity and 98.8% specificity for VF detection (including flutter episodes), 88.80% sensitivity and 99.5% specificity for VT, 98.98% sensitivity and 97.7% specificity for normal sinus rhythm, and 96.87% sensitivity and 99.55% specificity for other rhythms. The results show that using t-f data representations to feed classifiers provides better performance than the feature selection strategies used in previous works. This opens the door to use in other detection applications. Copyright © 2017 Elsevier B.V. All rights reserved.
Moore, Richard G.; McMeekin, D. Scott; Brown, Amy K.; DiSilvestro, Paul; Miller, M. Craig; Allard, W. Jeffrey; Gajewski, Walter; Kurman, Robert; Bast, Robert C.; Skates, Steven J.
2012-01-01
Introduction: Patients diagnosed with epithelial ovarian cancer (EOC) have improved outcomes when cared for at centers experienced in the management of EOC. The objective of this trial was to validate a predictive model to assess the risk for EOC in women with a pelvic mass. Methods: Women diagnosed with a pelvic mass and scheduled to have surgery were enrolled in a multicenter prospective study. Preoperative serum levels of HE4 and CA125 were measured. Separate logistic regression algorithms for premenopausal and postmenopausal women were utilized to categorize patients into low and high risk groups for EOC. Results: Twelve sites enrolled 531 evaluable patients with 352 benign tumors, 129 EOC, 22 LMP tumors, 6 non EOC and 22 non ovarian cancers. The postmenopausal group contained 150 benign cases of which 112 were classified as low risk, giving a specificity of 75.0% (95% CI=66.9-81.4), and 111 EOC and 6 LMP tumors of which 108 were classified as high risk, giving a sensitivity of 92.3% (95% CI=85.9-96.4). The premenopausal group had 202 benign cases of which 151 were classified as low risk, providing a specificity of 74.8% (95% CI=68.2-80.6), and 18 EOC and 16 LMP tumors of which 26 were classified as high risk, providing a sensitivity of 76.5% (95% CI=58.8-89.3). Conclusion: An algorithm utilizing HE4 and CA125 successfully classified patients into high and low risk groups, with 93.8% of EOC correctly classified as high risk. This model can be used to effectively triage patients to centers of excellence. PMID:18851871
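The sensitivity and specificity figures in this abstract are simple proportions of the reported counts. The Python sketch below recomputes three of them from the numbers given in the text; the helper name `percent` is an illustrative assumption.

```python
def percent(numerator, denominator):
    """Proportion expressed as a percentage, rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


# Premenopausal group: 151 of 202 benign cases were classified as low risk.
premenopausal_specificity = percent(151, 202)

# Premenopausal group: 26 of 34 malignant cases (18 EOC + 16 LMP) were high risk.
premenopausal_sensitivity = percent(26, 18 + 16)

# Postmenopausal group: 108 of 117 malignant cases (111 EOC + 6 LMP) were high risk.
postmenopausal_sensitivity = percent(108, 111 + 6)
```

Specificity is the share of benign cases correctly labelled low risk, and sensitivity is the share of malignant cases correctly labelled high risk, so each uses a different denominator.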
Health of children classified as underweight by CDC reference but normal by WHO standard.
Meyers, Alan; Joyce, Katherine; Coleman, Sharon M; Cook, John T; Cutts, Diana; Ettinger de Cuba, Stephanie; Heeren, Timothy C; Rose-Jacobs, Ruth; Black, Maureen M; Casey, Patrick H; Chilton, Mariana; Sandel, Megan; Frank, Deborah A
2013-06-01
To ascertain measures of health status among 6- to 24-month-old children classified as below normal weight-for-age (underweight) by the Centers for Disease Control and Prevention (CDC) 2000 growth reference but as normal weight-for-age by the World Health Organization (WHO) 2006 standard. Data were gathered from children and primary caregivers at emergency departments and primary care clinics in 7 US cities. Outcome measures included caregiver rating of child health, parental evaluation of developmental status, history of hospitalizations, and admission to hospital at the time of visit. Children were classified as (1) not underweight by either CDC 2000 or WHO 2006 criteria, (2) underweight by CDC 2000 but not by WHO 2006 criteria, or (3) underweight by both criteria. Associations between these categories and health outcome measures were assessed by using multiple logistic regression analysis. Data were available for 18 420 children. For each health outcome measure, children classified as underweight by CDC 2000 but normal by WHO 2006 had higher adjusted odds ratios (aORs) of adverse health outcomes than children not classified as underweight by either; children classified as underweight by both had the highest aORs of adverse outcomes. For example, compared with children not underweight by either criteria, the aORs for fair/poor health rating were 2.54 (95% confidence interval: 2.20-2.93) among children underweight by CDC but not WHO and 3.76 (3.13-4.51) among children underweight by both. Children who are reclassified from underweight to normal weight in changing from CDC 2000 to WHO 2006 growth charts may still be affected by morbidities associated with underweight.
Hancock, A S; Younis, P J; Beggs, D S; Mansell, P D; Stevenson, M A; Pyman, M F
2016-12-01
In pasture-based, seasonally calving dairy herds of southern Australia, the mating period usually consists of an initial artificial insemination period followed by a period of natural service using herd bulls. The primary objective of this study was to identify associations between individual bull- and herd-level management factors and bull fertility as measured by a pre- and postmating bull breeding soundness evaluation (BBSE). Multivariable mixed effects logistic regression models were used to identify factors associated with bulls being classified as high risk of reduced fertility at the premating and postmating BBSE. Bulls older than 4 yr of age at the premating BBSE were more likely to be classified high risk compared with bulls less than 4 yr of age. Bulls that were in herds in which concentrates were fed before mating were more likely to be classified as high risk at the postmating BBSE compared with bulls that were in herds where concentrates were not fed. Univariable analyses also identified areas in need of further research, including breed differences between dairy bulls, leg conformation and joint abnormalities, preventative hoof blocking for bulls, and mating ratios. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Umut, İlhan; Çentik, Güven
2016-01-01
The number of channels used for polysomnographic recording frequently causes difficulties for patients because of the many cables connected. Also, it increases the risk of having troubles during recording process and increases the storage volume. In this study, it is intended to detect periodic leg movement (PLM) in sleep with the use of the channels except leg electromyography (EMG) by analysing polysomnography (PSG) data with digital signal processing (DSP) and machine learning methods. PSG records of 153 patients of different ages and genders with PLM disorder diagnosis were examined retrospectively. A novel software was developed for the analysis of PSG records. The software utilizes the machine learning algorithms, statistical methods, and DSP methods. In order to classify PLM, popular machine learning methods (multilayer perceptron, K-nearest neighbour, and random forests) and logistic regression were used. Comparison of classified results showed that while K-nearest neighbour classification algorithm had higher average classification rate (91.87%) and lower average classification error value (RMSE = 0.2850), multilayer perceptron algorithm had the lowest average classification rate (83.29%) and the highest average classification error value (RMSE = 0.3705). Results showed that PLM can be classified with high accuracy (91.87%) without leg EMG record being present. PMID:27213008
A three-parameter model for classifying anurans into four genera based on advertisement calls.
Gingras, Bruno; Fitch, William Tecumseh
2013-01-01
The vocalizations of anurans are innate in structure and may therefore contain indicators of phylogenetic history. Thus, advertisement calls of species which are more closely related phylogenetically are predicted to be more similar than those of distant species. This hypothesis was evaluated by comparing several widely used machine-learning algorithms. Recordings of advertisement calls from 142 species belonging to four genera were analyzed. A logistic regression model, using mean values for dominant frequency, coefficient of variation of root-mean square energy, and spectral flux, correctly classified advertisement calls with regard to genus with an accuracy above 70%. Similar accuracy rates were obtained using these parameters with a support vector machine model, a K-nearest neighbor algorithm, and a multivariate Gaussian distribution classifier, whereas a Gaussian mixture model performed slightly worse. In contrast, models based on mel-frequency cepstral coefficients did not fare as well. Comparable accuracy levels were obtained on out-of-sample recordings from 52 of the 142 original species. The results suggest that a combination of low-level acoustic attributes is sufficient to discriminate efficiently between the vocalizations of these four genera, thus supporting the initial premise and validating the use of high-throughput algorithms on animal vocalizations to evaluate phylogenetic hypotheses.
Yu, Yuanyuan; Li, Hongkai; Sun, Xiaoru; Su, Ping; Wang, Tingting; Liu, Yi; Yuan, Zhongshang; Liu, Yanxun; Xue, Fuzhong
2017-12-28
Confounders can produce spurious associations between exposure and outcome in observational studies. For the majority of epidemiologists, adjusting for confounders with a logistic regression model is the habitual method, though it has some problems in accuracy and precision. It is, therefore, important to highlight the problems of logistic regression and to search for an alternative method. Four causal diagram models were defined to summarize confounding equivalence. Both theoretical proofs and simulation studies were performed to verify whether conditioning on different confounding equivalence sets had the same bias-reducing potential and then to select the optimum adjusting strategy, in which the logistic regression model and the inverse probability weighting based marginal structural model (IPW-based-MSM) were compared. The "do-calculus" was used to calculate the true causal effect of exposure on outcome, and then the bias and standard error were used to evaluate the performance of the different strategies. Adjusting for different sets of confounding equivalence, as judged by identical Markov boundaries, produced different bias-reducing potential in the logistic regression model. For the sets that satisfied G-admissibility, adjusting for the set including all the confounders reduced bias to the same extent as adjusting for the set containing the parent nodes of the outcome, while the bias after adjusting for the parent nodes of exposure was not equivalent to them. In addition, all causal effect estimates from logistic regression were biased, although the estimate after adjusting for the parent nodes of exposure was nearest to the true causal effect. However, conditioning on different confounding equivalence sets had the same bias-reducing potential under IPW-based-MSM.
Compared with logistic regression, the IPW-based-MSM could obtain unbiased causal effect estimates when the adjusted confounders satisfied G-admissibility, and the optimal strategy was to adjust for the parent nodes of the outcome, which gave the highest precision. All adjustment strategies through logistic regression were biased for causal effect estimation, while IPW-based-MSM could always obtain unbiased estimates when the adjusted set satisfied G-admissibility. Thus, IPW-based-MSM is recommended for adjusting for confounder sets.
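As a rough illustration of the inverse probability weighting idea compared in this abstract, the sketch below estimates a marginal risk difference on a tiny hand-made dataset with one binary confounder. The data, the function name `ipw_risk_difference`, and the use of stratum frequencies as the propensity model are all assumptions made for illustration; they are not the authors' simulation design.

```python
from collections import defaultdict

def ipw_risk_difference(records):
    """records: list of (z, x, y) with binary confounder z, exposure x, outcome y."""
    # Estimate the propensity P(X=1 | Z=z) from stratum frequencies.
    n_z = defaultdict(int)
    n_exposed_z = defaultdict(int)
    for z, x, _ in records:
        n_z[z] += 1
        n_exposed_z[z] += x
    propensity = {z: n_exposed_z[z] / n_z[z] for z in n_z}

    # Weight each subject by the inverse probability of the exposure received.
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # exposure -> [weighted outcomes, weights]
    for z, x, y in records:
        p = propensity[z] if x == 1 else 1.0 - propensity[z]
        w = 1.0 / p
        sums[x][0] += w * y
        sums[x][1] += w
    risk_exposed = sums[1][0] / sums[1][1]
    risk_unexposed = sums[0][0] / sums[0][1]
    return risk_exposed - risk_unexposed

def crude_risk_difference(records):
    """Unadjusted difference in outcome risk between exposed and unexposed."""
    exposed = [y for _, x, y in records if x == 1]
    unexposed = [y for _, x, y in records if x == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)

# Hypothetical data: z raises both the chance of exposure and the outcome risk.
data = (
    [(0, 1, 1)] * 1 + [(0, 1, 0)] * 1 + [(0, 0, 1)] * 2 + [(0, 0, 0)] * 6
    + [(1, 1, 1)] * 6 + [(1, 1, 0)] * 2 + [(1, 0, 1)] * 1 + [(1, 0, 0)] * 1
)
ipw_rd = ipw_risk_difference(data)
crude_rd = crude_risk_difference(data)
```

In this toy dataset the within-stratum risk difference is 0.25 in both strata. The crude comparison gives about 0.4, while the weighted estimate is about 0.25, which is the kind of confounding correction the abstract attributes to IPW-based-MSM.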
Use and interpretation of logistic regression in habitat-selection studies
Keating, Kim A.; Cherry, Steve
2004-01-01
Logistic regression is an important tool for wildlife habitat-selection studies, but the method frequently has been misapplied due to an inadequate understanding of the logistic model, its interpretation, and the influence of sampling design. To promote better use of this method, we review its application and interpretation under 3 sampling designs: random, case-control, and use-availability. Logistic regression is appropriate for habitat use-nonuse studies employing random sampling and can be used to directly model the conditional probability of use in such cases. Logistic regression also is appropriate for studies employing case-control sampling designs, but careful attention is required to interpret results correctly. Unless bias can be estimated or probability of use is small for all habitats, results of case-control studies should be interpreted as odds ratios, rather than probability of use or relative probability of use. When data are gathered under a use-availability design, logistic regression can be used to estimate approximate odds ratios if probability of use is small, at least on average. More generally, however, logistic regression is inappropriate for modeling habitat selection in use-availability studies. In particular, using logistic regression to fit the exponential model of Manly et al. (2002:100) does not guarantee maximum-likelihood estimates, valid probabilities, or valid likelihoods. We show that the resource selection function (RSF) commonly used for the exponential model is proportional to a logistic discriminant function. Thus, it may be used to rank habitats with respect to probability of use and to identify important habitat characteristics or their surrogates, but it is not guaranteed to be proportional to probability of use. Other problems associated with the exponential model also are discussed. 
We describe an alternative model based on Lancaster and Imbens (1996) that offers a method for estimating conditional probability of use in use-availability studies. Although promising, this model fails to converge to a unique solution in some important situations. Further work is needed to obtain a robust method that is broadly applicable to use-availability studies.
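The point that case-control results should be read as odds ratios rather than probabilities of use can be checked with simple arithmetic. The sketch below uses hypothetical site counts and a hypothetical 1-in-10 sampling fraction for unused sites; none of these numbers come from the review.

```python
from fractions import Fraction

def odds_ratio(used_a, unused_a, used_b, unused_b):
    """Odds of use in habitat A divided by odds of use in habitat B."""
    return Fraction(used_a, unused_a) / Fraction(used_b, unused_b)

# Hypothetical population: (used, unused) site counts in two habitat types.
population = {"A": (30, 70), "B": (10, 90)}

# Case-control style sample: keep every used site but only 1 in 10 unused sites.
sample = {h: (used, unused // 10) for h, (used, unused) in population.items()}

or_population = odds_ratio(*population["A"], *population["B"])
or_sample = odds_ratio(*sample["A"], *sample["B"])

# Naive "probability of use" in habitat A computed from each table.
p_population = Fraction(population["A"][0], sum(population["A"]))
p_sample = Fraction(sample["A"][0], sum(sample["A"]))
```

The odds ratio is 27/7 in both the population and the sample, whereas the apparent probability of use in habitat A moves from 3/10 to 30/37 once unused sites are undersampled.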
Riniker, Sereina; Fechner, Nikolas; Landrum, Gregory A
2013-11-25
The concept of data fusion - the combination of information from different sources describing the same object with the expectation to generate a more accurate representation - has found application in a very broad range of disciplines. In the context of ligand-based virtual screening (VS), data fusion has been applied to combine knowledge from either different active molecules or different fingerprints to improve similarity search performance. Machine-learning (ML) methods based on fusion of multiple homogeneous classifiers, in particular random forests, have also been widely applied in the ML literature. The heterogeneous version of classifier fusion - fusing the predictions from different model types - has been less explored. Here, we investigate heterogeneous classifier fusion for ligand-based VS using three different ML methods, RF, naïve Bayes (NB), and logistic regression (LR), with four 2D fingerprints, atom pairs, topological torsions, RDKit fingerprint, and circular fingerprint. The methods are compared using a previously developed benchmarking platform for 2D fingerprints which is extended to ML methods in this article. The original data sets are filtered for difficulty, and a new set of challenging data sets from ChEMBL is added. Data sets were also generated for a second use case: starting from a small set of related actives instead of diverse actives. The final fused model consistently outperforms the other approaches across the broad variety of targets studied, indicating that heterogeneous classifier fusion is a very promising approach for ligand-based VS. The new data sets together with the adapted source code for ML methods are provided in the Supporting Information.
Automated identification of diagnosis and co-morbidity in clinical records.
Cano, C; Blanco, A; Peshkin, L
2009-01-01
Automated understanding of clinical records is a challenging task involving various legal and technical difficulties. Clinical free text is inherently redundant, unstructured, and full of acronyms, abbreviations and domain-specific language which make it challenging to mine automatically. There is much effort in the field focused on creating specialized ontologies, lexicons and heuristics based on expert knowledge of the domain. However, ad-hoc solutions generalize poorly across diseases or diagnoses. This paper presents a successful approach for rapid prototyping of a diagnosis classifier based on a popular computational linguistics platform. The corpus consists of several hundred full-length discharge summaries provided by Partners Healthcare. The goal is to identify a diagnosis and assign co-morbidity. Our approach is based on the rapid implementation of a logistic regression classifier using an existing toolkit: LingPipe (http://alias-i.com/lingpipe). We implement and compare three different classifiers. The baseline approach uses character 5-grams as features. The second approach uses a bag-of-words representation enriched with a small additional set of features. The third approach reduces the feature set to the most informative features according to the information content. The proposed systems achieve high performance (average F-micro 0.92) for the task. We discuss the relative merit of the three classifiers. Supplementary material with detailed results is available at: http://decsai.ugr.es/~ccano/LR/supplementary_material/ We show that our methodology for rapid prototyping of a domain-unaware system is effective for building an accurate classifier for clinical records.
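For readers unfamiliar with the baseline feature type, the following minimal sketch extracts overlapping character 5-grams from a string. It only illustrates the idea; the function name and the example phrase are assumptions, and the abstract does not describe how LingPipe tokenizes or normalizes text.

```python
from collections import Counter

def char_ngrams(text, n=5):
    """Return a Counter of overlapping character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

# Hypothetical snippet of clinical free text.
features = char_ngrams("chest pain")
```

A ten-character string yields six 5-grams, including ones that span the space between words, so the representation needs no word segmentation or domain lexicon.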
Real-data comparison of data mining methods in prediction of diabetes in Iran.
Tapak, Lily; Mahjub, Hossein; Hamidi, Omid; Poorolajal, Jalal
2013-09-01
Diabetes is one of the most common non-communicable diseases in developing countries. Early screening and diagnosis play an important role in effective prevention strategies. This study compared two traditional classification methods (logistic regression and Fisher linear discriminant analysis) and four machine-learning classifiers (neural networks, support vector machines, fuzzy c-mean, and random forests) to classify persons with and without diabetes. The data set used in this study included 6,500 subjects from the Iranian national non-communicable diseases risk factors surveillance obtained through a cross-sectional survey. The obtained sample was based on cluster sampling of the Iran population which was conducted in 2005-2009 to assess the prevalence of major non-communicable disease risk factors. Ten risk factors that are commonly associated with diabetes were selected to compare the performance of six classifiers in terms of sensitivity, specificity, total accuracy, and area under the receiver operating characteristic (ROC) curve criteria. Support vector machines showed the highest total accuracy (0.986) as well as area under the ROC (0.979). Also, this method showed high specificity (1.000) and sensitivity (0.820). All other methods produced total accuracy of more than 85%, but for all methods, the sensitivity values were very low (less than 0.350). The results of this study indicate that, in terms of sensitivity, specificity, and overall classification accuracy, the support vector machine model ranks first among all the classifiers tested in the prediction of diabetes. Therefore, this approach is a promising classifier for predicting diabetes, and it should be further investigated for the prediction of other diseases.
The need for pediatric-specific triage criteria: results from the Florida Trauma Triage Study.
Phillips, S; Rond, P C; Kelly, S M; Swartz, P D
1996-12-01
The objective of the Florida Trauma Triage Study was to assess the performance of state-adopted field triage criteria. The study addressed three specific age groups: pediatric (age < 15 years), adult (age 15-54 years), and geriatric (age 55+ years). Since 1990, Florida has used a uniform set of eight triage criteria, known as the trauma scorecard, for triaging adult trauma patients to state-approved trauma centers. However, only five of the criteria are recommended for use with pediatric patients. This article presents the findings regarding the performance of the scorecard when applied to a pediatric population. We used state trauma registry data linked to state hospital discharge data in a retrospective analysis of trauma patients transported by prehospital providers to any acute care hospital within nine selected Florida counties between July 1, 1991, and December 31, 1991. We used cross-table and logistic regression analysis to determine the ability of triage criteria to correctly identify patients who were retrospectively defined as major trauma. We applied the field criteria to physiologic and anatomy/mechanism of injury data contained in the trauma registry to "score" the patient as major or minor trauma. To make our retrospective determination of major or minor trauma we used the protocols developed by an expert medical panel as described by E. J. MacKenzie et al. (1990). We calculated sensitivity, specificity, and the corresponding over- and undertriage rates by comparing patient classifications (major or minor trauma) produced by the triage criteria and the retrospective algorithm. We used logistic regression to identify which triage criteria were statistically significant in predicting major trauma. Pediatric cases accounted for 9.2% of the total study population, 6.0% of all hospitalized cases, and 6.8% of all trauma deaths. 
Of the 1505 pediatric cases available for analysis, the triage criteria classified 269 cases as expected major trauma and 1236 cases as expected minor trauma. The retrospective algorithm classified 78 cases as expected major trauma and 1427 cases as expected minor trauma. The resulting specificity is 84.8% (15.2% overtriage), and the sensitivity is 66.7% (33.3% undertriage). Logistic regression indicated that, of the eight state-adopted field triage criteria, only the Glasgow coma score, ejection from vehicle, and penetrating injuries have a statistically significant impact on predicting major trauma in pediatric patients. Although the state-adopted trauma scorecard, applied to a pediatric population, produced acceptable overtriage, it did not produce acceptable undertriage. However, our undertriage rate is comparable to the results of other published studies on pediatric trauma. As a result of the Florida Trauma Triage Study, a new pediatric triage instrument was developed. It is currently being field-tested.
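The reported rates can be reproduced from the counts in the abstract. In the sketch below the marginal totals (269, 78, 1427) are taken from the text, while the 52 true positives are an inference from the reported 66.7% sensitivity over 78 major-trauma cases; the abstract does not state that cell directly, and the function name is illustrative.

```python
def triage_rates(tp, fp, fn, tn):
    """Sensitivity, specificity, and the corresponding under- and overtriage rates."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "undertriage": 1.0 - sensitivity,
        "overtriage": 1.0 - specificity,
    }

scorecard_major = 269                # classified major by the triage criteria
retro_major, retro_minor = 78, 1427  # classified by the retrospective algorithm

tp = 52                              # assumption: inferred from 66.7% of 78
fn = retro_major - tp                # major cases the scorecard missed
fp = scorecard_major - tp            # minor cases the scorecard flagged as major
tn = retro_minor - fp                # minor cases correctly left as minor

rates = triage_rates(tp, fp, fn, tn)
```

Under that assumption the remaining cells are 26, 217 and 1210, which sum with the scorecard's 1236 expected-minor cases and give 66.7% sensitivity and 84.8% specificity, matching the abstract.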
Mental illness in metropolitan, urban and rural Georgia populations
2013-01-01
Background Mental illness represents an important public health problem. Local-level data concerning mental illness in different populations (e.g., socio-demographics and residence – metropolitan/urban/rural) provides the evidence-base for public health authorities to plan, implement and evaluate control programs. This paper describes prevalence and covariates of psychiatric conditions in Georgia populations in three defined geographic areas. Methods Data came from the Georgia population-based random-digit-dialing study investigating unwellness and chronic fatigue syndrome (CFS) in Georgia populations of three defined geographic areas (metropolitan, urban, and rural). Respondents were screened for symptoms of fatigue, sleep, cognition, and pain at household screening interviews, and a randomly selected sample completed detailed individual phone interviews. Based on the detailed phone interviews, we conducted one-day clinical evaluations of 292 detailed interview participants classified as unwell with a probable CFS (i.e. CFS-like; a functional somatic syndrome), 268 classified as other unwell, and 223 well (matched to CFS-like). Clinical evaluation included psychiatric classification by means of the Structured Clinical Interview for DSM (SCID). To derive prevalence estimates we used sample weighting to account for the complexity of the multistage sampling design. We used 2- and 3-way table analyses to examine socio-demographic and urbanicity specific associations and multiple logistic regression to calculate adjusted odds ratios. Results Anxiety and mood disorders were the most common psychiatric conditions. Nineteen percent of participants suffered a current anxiety disorder, 18% a mood disorder and 10% had two or more conditions. There was a significant linear trend in occurrence of anxiety or mood disorders from well to CFS-like. The most common anxiety disorders were post-traumatic stress disorder (PTSD) (6.6%) and generalized anxiety disorder (GAD) (5.8%). 
Logistic regression showed that lower education and female sex contributed significantly to risk for both PTSD and GAD. In addition, rural/urban residence and Hispanic ethnicity were associated with PTSD. We defined moderate to severe depression as Major Depressive Disorder or a Zung score >60 and logistic regression found lower education to be significantly associated but sex, age and urbanicity were not. Conclusions Overall occurrence of anxiety and mood disorders in Georgia mirrored national findings. However, PTSD and GAD occurred at twice the published national rates (3.6 and 2.7%, respectively). State and local prevalence and associations with education, sex and urbanicity comprise important considerations for developing control programs. The increased prevalence of anxiety and mood disorders in people with a functional somatic syndrome (or CFS-like illness) is important for primary care providers, who should consider additional psychiatric screening or referral of individuals presenting with somatoform symptoms. PMID:23631737
Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression
NASA Astrophysics Data System (ADS)
Khikmah, L.; Wijayanto, H.; Syafitri, U. D.
2017-04-01
Multicollinearity is a problem often encountered in logistic regression modeling. Multicollinearity between explanatory variables biases the parameter estimates and also leads to classification errors. In general, stepwise regression is used to overcome multicollinearity in regression. Another method, which retains all variables for prediction, is Principal Component Analysis (PCA). However, classical PCA applies only to numeric data. When the data are categorical, one way to solve the problem is Categorical Principal Component Analysis (CATPCA). The data used in this research were part of the Demographic and Population Survey Indonesia (IDHS) 2012. This research focuses on the characteristics of women who use contraceptive methods. Classification results were evaluated using Area Under Curve (AUC) values; the higher the AUC value, the better. Based on AUC values, classification of contraceptive method use with the stepwise method (58.66%) is better than with the logistic regression model (57.39%) and CATPCA (57.39%). Evaluation using sensitivity shows the opposite: the CATPCA method (99.79%) is better than the logistic regression method (92.43%) and the stepwise method (92.05%). Because this study focuses on classifying the majority class (women using a contraceptive method), the selected model is CATPCA, since it raises the accuracy for the majority class.
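The two evaluation criteria used here can disagree, as the reported figures show. The sketch below computes a rank-based AUC and a thresholded sensitivity on a four-case toy example; the scores, labels, and 0.5 threshold are hypothetical and unrelated to the survey data.

```python
def auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity(predicted, labels):
    """Fraction of true positives among all actual positives."""
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    return tp / sum(labels)

# Hypothetical model outputs for four cases.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
predicted = [1 if s >= 0.5 else 0 for s in scores]

toy_auc = auc(scores, labels)
toy_sensitivity = sensitivity(predicted, labels)
```

Here every case is predicted positive, so sensitivity is 1.0 even though the AUC is only 0.75. A model that favours the majority class can therefore score very high on sensitivity while ranking cases only modestly well.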
Brady, Christopher John; Mudie, Lucy Iluka; Wang, Xueyang; Guallar, Eliseo; Friedman, David Steven
2017-06-20
Diabetic retinopathy (DR) is a leading cause of vision loss in working age individuals worldwide. While screening is effective and cost effective, it remains underutilized, and novel methods are needed to increase detection of DR. This clinical validation study compared diagnostic gradings of retinal fundus photographs provided by volunteers on the Amazon Mechanical Turk (AMT) crowdsourcing marketplace with expert-provided gold-standard grading and explored whether determination of the consensus of crowdsourced classifications could be improved beyond a simple majority vote (MV) using regression methods. The aim of our study was to determine whether regression methods could be used to improve the consensus grading of data collected by crowdsourcing. A total of 1200 retinal images of individuals with diabetes mellitus from the Messidor public dataset were posted to AMT. Eligible crowdsourcing workers had at least 500 previously approved tasks with an approval rating of 99% across their prior submitted work. A total of 10 workers were recruited to classify each image as normal or abnormal. If half or more workers judged the image to be abnormal, the MV consensus grade was recorded as abnormal. Rasch analysis was then used to calculate worker ability scores in a random 50% training set, which were then used as weights in a regression model in the remaining 50% test set to determine if a more accurate consensus could be devised. Outcomes of interest were the percent correctly classified images, sensitivity, specificity, and area under the receiver operating characteristic (AUROC) for the consensus grade as compared with the expert grading provided with the dataset. Using MV grading, the consensus was correct in 75.5% of images (906/1200), with 75.5% sensitivity, 75.5% specificity, and an AUROC of 0.75 (95% CI 0.73-0.78). 
A logistic regression model using Rasch-weighted individual scores generated an AUROC of 0.91 (95% CI 0.88-0.93) compared with 0.89 (95% CI 0.86-0.92) for a model using unweighted scores (chi-square P value<.001). Setting a diagnostic cut-point to optimize sensitivity at 90%, 77.5% (465/600) were graded correctly, with 90.3% sensitivity, 68.5% specificity, and an AUROC of 0.79 (95% CI 0.76-0.83). Crowdsourced interpretations of retinal images provide rapid and accurate results as compared with a gold-standard grading. Creating a logistic regression model using Rasch analysis to weight crowdsourced classifications by worker ability improves accuracy of aggregated grades as compared with simple majority vote. ©Christopher John Brady, Lucy Iluka Mudie, Xueyang Wang, Eliseo Guallar, David Steven Friedman. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 20.06.2017.
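The majority-vote rule described in this abstract (abnormal if half or more of the workers say so) is simple to state in code. The second function below is only a hypothetical illustration of how worker ability scores might enter as weights; the abstract does not give the exact form of the Rasch-weighted regression input.

```python
def majority_vote(votes):
    """votes: list of 0/1 worker grades (1 = abnormal). Half or more means abnormal."""
    return 1 if 2 * sum(votes) >= len(votes) else 0

def weighted_score(votes, weights):
    """Hypothetical ability-weighted fraction of abnormal votes."""
    return sum(w * v for v, w in zip(votes, weights)) / sum(weights)

# Ten workers per image, as in the study; the votes themselves are made up.
tie_image = [1] * 5 + [0] * 5
mostly_normal = [1] * 4 + [0] * 6
```

With these inputs `majority_vote(tie_image)` returns 1 and `majority_vote(mostly_normal)` returns 0. A weighted score lets a few high-ability workers outweigh a larger number of low-ability ones, which is the intuition behind weighting by ability.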
Logistic regression models of factors influencing the location of bioenergy and biofuels plants
T.M. Young; R.L. Zaretzki; J.H. Perdue; F.M. Guess; X. Liu
2011-01-01
Logistic regression models were developed to identify significant factors that influence the location of existing wood-using bioenergy/biofuels plants and traditional wood-using facilities. Logistic models provided quantitative insight for variables influencing the location of woody biomass-using facilities. Availability of "thinnings to a basal area of 31.7m2/ha...
Discrete post-processing of total cloud cover ensemble forecasts
NASA Astrophysics Data System (ADS)
Hemri, Stephan; Haiden, Thomas; Pappenberger, Florian
2017-04-01
This contribution presents an approach to post-process ensemble forecasts for the discrete and bounded weather variable of total cloud cover. Two methods for discrete statistical post-processing of ensemble predictions are tested. The first approach is based on multinomial logistic regression, the second involves a proportional odds logistic regression model. Applying them to total cloud cover raw ensemble forecasts from the European Centre for Medium-Range Weather Forecasts improves forecast skill significantly. Based on station-wise post-processing of raw ensemble total cloud cover forecasts for a global set of 3330 stations over the period from 2007 to early 2014, the more parsimonious proportional odds logistic regression model proved to slightly outperform the multinomial logistic regression model. Reference Hemri, S., Haiden, T., & Pappenberger, F. (2016). Discrete post-processing of total cloud cover ensemble forecasts. Monthly Weather Review 144, 2565-2577.
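To make the contrast between the two model families concrete, the sketch below shows how a proportional odds model turns a single linear predictor and a set of ordered thresholds into category probabilities. The function name, the threshold values, and the four-category example are illustrative assumptions and are not taken from the study.

```python
import math

def sigmoid(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))

def proportional_odds_probs(eta, thresholds):
    """P(Y = k) for ordered categories, given linear predictor eta
    and strictly increasing thresholds (one fewer than the categories)."""
    # Cumulative probabilities P(Y <= k); the last category closes at 1.
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    return [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]

# Hypothetical predictor value and thresholds for four ordered categories.
probs = proportional_odds_probs(eta=0.4, thresholds=[-1.0, 0.0, 1.5])
```

With K categories this model needs one coefficient vector plus K-1 thresholds, whereas a multinomial logistic model needs K-1 coefficient vectors, which is the sense in which the proportional odds model is more parsimonious.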
Fuzzy multinomial logistic regression analysis: A multi-objective programming approach
NASA Astrophysics Data System (ADS)
Abdalla, Hesham A.; El-Sayed, Amany A.; Hamed, Ramadan
2017-05-01
Parameter estimation for multinomial logistic regression is usually based on maximizing the likelihood function. For large well-balanced datasets, Maximum Likelihood (ML) estimation is a satisfactory approach. Unfortunately, ML can fail completely or at least produce poor results in terms of estimated probabilities and confidence intervals of parameters, especially for small datasets. In this study, a new approach based on fuzzy concepts is proposed to estimate the parameters of multinomial logistic regression. The study assumes that the parameters of multinomial logistic regression are fuzzy. Based on the extension principle stated by Zadeh and Bárdossy's proposition, a multi-objective programming approach is suggested to estimate these fuzzy parameters. A simulation study is used to evaluate the performance of the new approach versus the ML approach. Results show that the proposed model outperforms ML for small datasets.
NASA Astrophysics Data System (ADS)
Tahernezhad-Javazm, Farajollah; Azimirad, Vahid; Shoaran, Maryam
2018-04-01
Objective. Considering the importance and the near-future development of noninvasive brain-machine interface (BMI) systems, this paper presents a comprehensive theoretical-experimental survey on the classification and evolutionary methods for BMI-based systems in which EEG signals are used. Approach. The paper is divided into two main parts. In the first part, a wide range of different types of the base and combinatorial classifiers including boosting and bagging classifiers and evolutionary algorithms are reviewed and investigated. In the second part, these classifiers and evolutionary algorithms are assessed and compared based on two types of relatively widely used BMI systems, sensory motor rhythm-BMI and event-related potentials-BMI. Moreover, in the second part, some of the improved evolutionary algorithms as well as bi-objective algorithms are experimentally assessed and compared. Main results. In this study two databases are used, and cross-validation accuracy (CVA) and stability to data volume (SDV) are considered as the evaluation criteria for the classifiers. According to the experimental results on both databases, regarding the base classifiers, linear discriminant analysis and support vector machines with respect to CVA evaluation metric, and naive Bayes with respect to SDV demonstrated the best performances. Among the combinatorial classifiers, four classifiers, Bagg-DT (bagging decision tree), LogitBoost, and GentleBoost with respect to CVA, and Bagging-LR (bagging logistic regression) and AdaBoost (adaptive boosting) with respect to SDV had the best performances. Finally, regarding the evolutionary algorithms, single-objective invasive weed optimization (IWO) and bi-objective nondominated sorting IWO algorithms demonstrated the best performances. Significance. 
We present a general survey on the base and the combinatorial classification methods for EEG signals (sensory motor rhythm and event-related potentials) as well as their optimization methods through the evolutionary algorithms. In addition, experimental and statistical significance tests are carried out to study the applicability and effectiveness of the reviewed methods.
Guinness, Robert E
2015-04-28
This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWL and IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five times the amount of CPU time as the fastest classifier (SVM).
The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity. PMID:25928060
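The ten-fold cross-validation used in this comparison can be sketched as an index-splitting routine. The version below makes contiguous, unshuffled folds for simplicity, which is an assumption; the abstract does not say how folds were formed, and a real evaluation would typically shuffle or stratify first.

```python
def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k contiguous folds over n samples."""
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# Hypothetical dataset of 25 labelled samples.
folds = list(k_fold_indices(25, k=10))
```

Each sample appears in exactly one test fold, so the correct classification rate averaged over folds uses every sample once for testing and never tests on data seen in training for that fold.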
Co-occurring risk factors for current cigarette smoking in a U.S. nationally representative sample
Higgins, Stephen T.; Kurti, Allison N.; Redner, Ryan; White, Thomas J.; Keith, Diana R.; Gaalema, Diann E.; Sprague, Brian L.; Stanton, Cassandra A.; Roberts, Megan E.; Doogan, Nathan J.; Priest, Jeff S.
2016-01-01
Introduction Relatively little has been reported characterizing cumulative risk associated with co-occurring risk factors for cigarette smoking. The purpose of the present study was to address that knowledge gap in a U.S. nationally representative sample. Methods Data were obtained from 114,426 adults (≥ 18 years) in the U.S. National Survey on Drug Use and Health (years 2011–13). Multiple logistic regression and classification and regression tree (CART) modeling were used to examine risk of current smoking associated with eight co-occurring risk factors (age, gender, race/ethnicity, educational attainment, poverty, drug abuse/dependence, alcohol abuse/dependence, mental illness). Results Each of these eight risk factors was independently associated with significant increases in the odds of smoking when concurrently present in a multiple logistic regression model. Effects of risk-factor combinations were typically summative. Exceptions to that pattern were in the direction of less-than-summative effects when one of the combined risk factors was associated with generally high or low rates of smoking (e.g., drug abuse/dependence, age ≥65). CART modeling identified subpopulation risk profiles wherein smoking prevalence varied from a low of 11% to a high of 74% depending on particular risk factor combinations. Being a college graduate was the strongest independent predictor of smoking status, classifying 30% of the adult population. Conclusions These results offer strong evidence that the effects associated with common risk factors for cigarette smoking are independent, cumulative, and generally summative. The results also offer potentially useful insights into national population risk profiles around which U.S. tobacco policies can be developed or refined. PMID:26902875
Patient Stratification Using Electronic Health Records from a Chronic Disease Management Program.
Chen, Robert; Sun, Jimeng; Dittus, Robert S; Fabbri, Daniel; Kirby, Jacqueline; Laffer, Cheryl L; McNaughton, Candace D; Malin, Bradley
2016-01-04
The goal of this study is to devise a machine learning framework to assist care coordination programs in prognostic stratification to design and deliver personalized care plans and to allocate financial and medical resources effectively. This study is based on a de-identified cohort of 2,521 hypertension patients from a chronic care coordination program at the Vanderbilt University Medical Center. Patients were modeled as vectors of features derived from electronic health records (EHRs) over a six-year period. We applied a stepwise regression to identify risk factors associated with a decrease in mean arterial pressure of at least 2 mmHg after program enrollment. The resulting features were subsequently validated via a logistic regression classifier. Finally, risk factors were applied to group the patients through model-based clustering. We identified a set of predictive features that consisted of a mix of demographic, medication, and diagnostic concepts. Logistic regression over these features yielded an area under the ROC curve (AUC) of 0.71 (95% CI: [0.67, 0.76]). Based on these features, four clinically meaningful groups are identified through clustering - two of which represented patients with more severe disease profiles, while the remaining represented patients with mild disease profiles. Patients with hypertension can exhibit significant variation in their blood pressure control status and responsiveness to therapy. Yet this work shows that a clustering analysis can generate more homogeneous patient groups, which may aid clinicians in designing and implementing customized care programs. The study shows that predictive modeling and clustering using EHR data can be beneficial for providing a systematic, generalized approach for care providers to tailor their management approach based upon patient-level factors.
A Primer on Logistic Regression.
ERIC Educational Resources Information Center
Woldbeck, Tanya
This paper introduces logistic regression as a viable alternative when the researcher is faced with variables that are not continuous. If one is to use simple regression, the dependent variable must be measured on a continuous scale. In the behavioral sciences, it may not always be appropriate or possible to have a measured dependent variable on a…
A molecular topology approach to predicting pesticide pollution of groundwater
Worrall, Fred
2001-01-01
Various models have proposed methods for the discrimination of polluting and nonpolluting compounds on the basis of simple parameters, typically adsorption and degradation constants. However, such attempts are prone to site variability and measurement error to the extent that compounds cannot be reliably classified nor the chemistry of pollution extrapolated from them. Using observations of pesticide occurrence in U.S. groundwater it is possible to show that polluting and nonpolluting compounds can be distinguished purely on the basis of molecular topology. Topological parameters can be derived without measurement error or site-specific variability. A logistic regression model has been developed which explains 97% of the variation in the data, with 86% of the variation being explained by the rule that a compound will be found in groundwater if 6χp < 0.55, where 6χp is the sixth-order molecular path connectivity. One group of compounds cannot be classified by this rule and prediction requires reference to higher order connectivity parameters. The use of molecular approaches for understanding pollution at the molecular level and their application to agrochemical development and risk assessment is discussed.
[Study on application of SVM in prediction of coronary heart disease].
Zhu, Yue; Wu, Jianghua; Fang, Ying
2013-12-01
Based on blood pressure, plasma lipid, Glu and UA data from physical tests, a Support Vector Machine (SVM) was applied to distinguish coronary heart disease (CHD) patients from non-CHD individuals in a south China population, as a guide for further prevention and treatment of the disease. Firstly, the SVM classifier was built using a radial basis kernel function, a linear kernel function and a polynomial kernel function, respectively. Secondly, the SVM penalty factor C and kernel parameter sigma were optimized by particle swarm optimization (PSO) and then employed to diagnose and predict CHD. In comparison with an artificial neural network with the back propagation (BP) model, linear discriminant analysis, logistic regression and non-optimized SVM, the overall results demonstrated that the classification performance of the optimized RBF-SVM model could be superior to that of the other classifiers, with higher accuracy, sensitivity and specificity, which were 94.51%, 92.31% and 96.67%, respectively. We conclude that SVM could be used as a valid method for assisting the diagnosis of CHD.
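The accuracy, sensitivity and specificity quoted in the abstract above are all derived from a confusion matrix. A minimal sketch of that arithmetic follows; the function name and the counts in the usage lines are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp/fn: diseased subjects classified correctly/incorrectly
    tn/fp: non-diseased subjects classified correctly/incorrectly
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts, chosen only to illustrate the calculation.
acc, sens, spec = classification_metrics(tp=45, fn=5, tn=40, fp=10)
print(f"accuracy={acc:.2%} sensitivity={sens:.2%} specificity={spec:.2%}")
```

Reporting all three together matters when classes are unbalanced, since accuracy alone can hide poor sensitivity.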
A Solution to Separation and Multicollinearity in Multiple Logistic Regression
Shen, Jianzhao; Gao, Sujuan
2010-01-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27–38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. Ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither method solves the problem addressed by the other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth’s penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using current screening data from a community-based dementia study. PMID:20376286
A Solution to Separation and Multicollinearity in Multiple Logistic Regression.
Shen, Jianzhao; Gao, Sujuan
2008-10-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. Ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither method solves the problem addressed by the other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using current screening data from a community-based dementia study.
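For orientation, the two penalties named in the abstract above can be written side by side. This is a sketch in standard notation, not the authors' exact formulation: \ell(\beta) is the ordinary log-likelihood, I(\beta) the Fisher information matrix, and \lambda \ge 0 a ridge tuning parameter; the scaling of the ridge term (for example \lambda versus \lambda/2) is an assumption here.

```latex
\begin{align}
  % Firth-type penalty (Jeffreys-prior term), which addresses separation
  \ell^{*}(\beta) &= \ell(\beta) + \tfrac{1}{2}\log\left|I(\beta)\right| \\
  % Ridge penalty, which addresses multicollinearity
  \ell_{\lambda}(\beta) &= \ell(\beta) - \lambda \sum_{j} \beta_{j}^{2} \\
  % Double penalized log-likelihood: both terms applied together
  \ell^{**}(\beta) &= \ell(\beta) + \tfrac{1}{2}\log\left|I(\beta)\right|
                      - \lambda \sum_{j} \beta_{j}^{2}
\end{align}
```

Setting \lambda = 0 recovers the Firth-type objective, and dropping the log-determinant term recovers ordinary ridge logistic regression.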
Ye, Dong-qing; Hu, Yi-song; Li, Xiang-pei; Huang, Fen; Yang, Shi-gui; Hao, Jia-hu; Yin, Jing; Zhang, Guo-qing; Liu, Hui-hui
2004-11-01
This study explored the impact of environmental factors, daily lifestyle, psycho-social factors and the interactions between environmental factors and chemokine genes on systemic lupus erythematosus (SLE). A case-control study was carried out and environmental factors for SLE were analyzed by univariate and multivariate unconditional logistic regression. Interactions between environmental factors and chemokine polymorphisms contributing to SLE were also analyzed with a logistic regression model. Nineteen factors were associated with SLE in univariate unconditional logistic regression. However, in multivariate unconditional logistic regression only five factors were associated with the disease: drinking well water (OR=0.099) was a protective factor for SLE, and multiple drug allergy (OR=8.174), over-exposure to sunshine (OR=18.339), taking antibiotics (OR=9.630) and oral contraceptives were risk factors for SLE. The unconditional logistic regression model showed an interaction between eating irritable food and the -2518MCP-1G/G genotype (OR=4.387). No interaction among environmental factors contributing to SLE was found in this study. Many environmental factors were related to SLE, and there was an interaction between the -2518MCP-1G/G genotype and eating irritable food.
Mielniczuk, Jan; Teisseyre, Paweł
2018-03-01
Detection of gene-gene interactions is one of the most important challenges in genome-wide case-control studies. Besides traditional logistic regression analysis, entropy-based methods have recently attracted significant attention. Among entropy-based methods, interaction information is one of the most promising measures, with many desirable properties. Although both logistic regression and interaction information have been used in several genome-wide association studies, the relationship between them has not been thoroughly investigated theoretically. The present paper attempts to fill this gap. We show that although certain connections between the two methods exist, in general they refer to two different concepts of dependence, and looking for interactions in those two senses leads to different approaches to interaction detection. We introduce an ordering between interaction measures and specify conditions for independent and dependent genes under which interaction information is a more discriminative measure than logistic regression. Moreover, we show that for so-called perfect distributions those measures are equivalent. The numerical experiments illustrate the theoretical findings, indicating that interaction information and its modified version are more universal tools for detecting various types of interaction than logistic regression and linkage disequilibrium measures. © 2017 WILEY PERIODICALS, INC.
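As an illustration of the entropy-based measure discussed above, the sketch below computes interaction information for two discrete predictors and an outcome from a joint probability table. It assumes the sign convention II(X1;X2;Y) = I((X1,X2);Y) - I(X1;Y) - I(X2;Y); the function names and the XOR toy distribution are illustrative and not taken from the paper.

```python
from math import log2


def mutual_information(joint):
    """I(A;B) in bits, where joint maps (a, b) -> probability."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


def interaction_information(joint_xyz):
    """II(X1;X2;Y) in bits, where joint_xyz maps (x1, x2, y) -> probability."""
    pair_y, x1_y, x2_y = {}, {}, {}
    for (x1, x2, y), p in joint_xyz.items():
        pair_y[((x1, x2), y)] = pair_y.get(((x1, x2), y), 0.0) + p
        x1_y[(x1, y)] = x1_y.get((x1, y), 0.0) + p
        x2_y[(x2, y)] = x2_y.get((x2, y), 0.0) + p
    return (mutual_information(pair_y)
            - mutual_information(x1_y)
            - mutual_information(x2_y))


# Toy case: Y = X1 XOR X2 with uniform X1, X2. Neither predictor is
# informative alone, but together they determine Y completely.
xor = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
print(interaction_information(xor))
```

In the XOR case both marginal mutual informations are zero while the joint one equals the entropy of Y, so the measure is positive even though no main effects are present.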
ERIC Educational Resources Information Center
Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung
2014-01-01
The simultaneous item bias test (SIBTEST) method regression procedure and the differential item functioning (DIF)-free-then-DIF strategy are applied to the logistic regression (LR) method simultaneously in this study. These procedures are used to adjust the effects of matching true score on observed score and to better control the Type I error…
Federal Logistics Information System (FLIS) Procedures Manual, Volume 1, Change 1
1996-07-01
Se- 2 KAT Add FLIS Data Base Data 1 curity Classified Characteristics KDZ Delete Logistics Transfer 3 Data KFA Match Through Association I KFC File...a Cancelled menus normally furnished with this DIC NSNIPSCN, Related Generic or (2) the segment Z data pertains to an NSN. or Reference Number FSC...8/9 KEC Output Exceeds AUTODIN Limitations 4,5 vols 8/9 KFA Match through Association 4 vols 8/9 KFC File Data Minus Security Classified Character- 4
Actively learning to distinguish suspicious from innocuous anomalies in a batch of vehicle tracks
NASA Astrophysics Data System (ADS)
Qiu, Zhicong; Miller, David J.; Stieber, Brian; Fair, Tim
2014-06-01
We investigate the problem of actively learning to distinguish between two sets of anomalous vehicle tracks, "innocuous" and "suspicious", starting from scratch, without any initial examples of "suspicious" and with no prior knowledge of what an operator would deem suspicious. This two-class problem is challenging because it is a priori unknown which track features may characterize the suspicious class. Furthermore, there is inherent imbalance in the sizes of the labeled "innocuous" and "suspicious" sets, even after some suspicious examples are identified. We present a comprehensive solution wherein a classifier learns to discriminate suspicious from innocuous based on derived p-value track features. Through active learning, our classifier thus learns the types of anomalies on which to base its discrimination. Our solution encompasses: i) judicious choice of kinematic p-value based features conditioned on the road of origin, along with more explicit features that capture unique vehicle behavior (e.g. U-turns); ii) novel semi-supervised learning that exploits information in the unlabeled (test batch) tracks, and iii) evaluation of several classifier models (logistic regression, SVMs). We find that two active labeling streams are necessary in practice in order to have efficient classifier learning while also forwarding (for labeling) the most actionable tracks. Experiments on wide-area motion imagery (WAMI) tracks, extracted via a system developed by Toyon Research Corporation, demonstrate the strong ROC AUC performance of our system, with sparing use of operator-based active labeling.
Access disparities to Magnet hospitals for patients undergoing neurosurgical operations
Missios, Symeon; Bekelis, Kimon
2017-01-01
Background Centers of excellence focusing on quality improvement have demonstrated superior outcomes for a variety of surgical interventions. We investigated the presence of access disparities to hospitals recognized by the Magnet Recognition Program of the American Nurses Credentialing Center (ANCC) for patients undergoing neurosurgical operations. Methods We performed a cohort study of all neurosurgery patients who were registered in the New York Statewide Planning and Research Cooperative System (SPARCS) database from 2009–2013. We examined the association of African-American race and lack of insurance with Magnet status hospitalization for neurosurgical procedures. A mixed effects propensity adjusted multivariable regression analysis was used to control for confounding. Results During the study period, 190,535 neurosurgical patients met the inclusion criteria. Using a multivariable logistic regression, we demonstrate that African-Americans had lower admission rates to Magnet institutions (OR 0.62; 95% CI, 0.58–0.67). This persisted in a mixed effects logistic regression model (OR 0.77; 95% CI, 0.70–0.83) to adjust for clustering at the patient county level, and a propensity score adjusted logistic regression model (OR 0.75; 95% CI, 0.69–0.82). Additionally, lack of insurance was associated with lower admission rates to Magnet institutions (OR 0.71; 95% CI, 0.68–0.73), in a multivariable logistic regression model. This persisted in a mixed effects logistic regression model (OR 0.72; 95% CI, 0.69–0.74), and a propensity score adjusted logistic regression model (OR 0.72; 95% CI, 0.69–0.75). Conclusions Using a comprehensive all-payer cohort of neurosurgery patients in New York State we identified an association of African-American race and lack of insurance with lower rates of admission to Magnet hospitals. PMID:28684152
Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models.
Schmidt, Amand F; Klungel, Olaf H; Groenwold, Rolf H H
2016-01-01
Postlaunch data on medical treatments can be analyzed to explore adverse events or relative effectiveness in real-life settings. These analyses are often complicated by the number of potential confounders and the possibility of model misspecification. We conducted a simulation study to compare the performance of logistic regression, propensity score, disease risk score, and stabilized inverse probability weighting methods to adjust for confounding. Model misspecification was induced in the independent derivation dataset. We evaluated performance using relative bias and confidence interval coverage of the true effect, among other metrics. At low events per coefficient (1.0 and 0.5), the logistic regression estimates had a large relative bias (greater than -100%). Bias of the disease risk score estimates was at most 13.48% and 18.83%. For the propensity score model, this was 8.74% and >100%, respectively. At events per coefficient of 1.0 and 0.5, inverse probability weighting frequently failed or reduced to a crude regression, resulting in biases of -8.49% and 24.55%. Coverage of logistic regression estimates became less than the nominal level at events per coefficient ≤5. For the disease risk score, inverse probability weighting, and propensity score, coverage became less than nominal at events per coefficient ≤2.5, ≤1.0, and ≤1.0, respectively. Bias of misspecified disease risk score models was 16.55%. In settings with low events/exposed subjects per coefficient, disease risk score methods can be useful alternatives to logistic regression models, especially when propensity score models cannot be used. Despite better performance of disease risk score methods than logistic regression and propensity score models in small events per coefficient settings, bias and coverage still deviated from nominal.
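The two quantities that organize the results above, events per coefficient and relative bias, are simple ratios. A minimal sketch follows, assuming relative bias is computed as (estimate - truth) / truth on whatever scale the effect is reported; the function names and example numbers are hypothetical and not taken from the simulation.

```python
def events_per_coefficient(n_events, n_coefficients):
    """Number of outcome events available per estimated model coefficient."""
    return n_events / n_coefficients


def relative_bias_percent(estimate, truth):
    """Relative bias of an estimate, as a percentage of the true effect."""
    return 100.0 * (estimate - truth) / truth


# Hypothetical setting: 10 events and 20 coefficients, true effect 1.0.
print(events_per_coefficient(10, 20))
print(relative_bias_percent(estimate=1.1, truth=1.0))
```

A negative relative bias means the method underestimates the true effect; a value beyond -100% means the estimate has the opposite sign.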
Pfeiffer, R M; Riedl, R
2015-08-15
We assess the asymptotic bias of estimates of exposure effects conditional on covariates when summary scores of confounders, instead of the confounders themselves, are used to analyze observational data. First, we study regression models for cohort data that are adjusted for summary scores. Second, we derive the asymptotic bias for case-control studies when cases and controls are matched on a summary score, and then analyzed either using conditional logistic regression or by unconditional logistic regression adjusted for the summary score. Two scores, the propensity score (PS) and the disease risk score (DRS) are studied in detail. For cohort analysis, when regression models are adjusted for the PS, the estimated conditional treatment effect is unbiased only for linear models, or at the null for non-linear models. Adjustment of cohort data for DRS yields unbiased estimates only for linear regression; all other estimates of exposure effects are biased. Matching cases and controls on DRS and analyzing them using conditional logistic regression yields unbiased estimates of exposure effect, whereas adjusting for the DRS in unconditional logistic regression yields biased estimates, even under the null hypothesis of no association. Matching cases and controls on the PS yield unbiased estimates only under the null for both conditional and unconditional logistic regression, adjusted for the PS. We study the bias for various confounding scenarios and compare our asymptotic results with those from simulations with limited sample sizes. To create realistic correlations among multiple confounders, we also based simulations on a real dataset. Copyright © 2015 John Wiley & Sons, Ltd.
Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q
2016-05-01
Conditional logistic regression analysis and unconditional logistic regression analysis are commonly used in case-control studies, whereas the Cox proportional hazards model is often used in survival data analysis. Most of the literature refers only to the main-effect model; however, the generalized linear model differs from the general linear model, and interaction comprises multiplicative interaction and additive interaction. The former has only statistical significance, whereas the latter has biological significance. In this paper, macros were written in SAS 9.4 to calculate the contrast ratio, the attributable proportion due to interaction and the synergy index alongside the interaction terms of logistic and Cox regression, and Wald, delta and profile likelihood confidence intervals were used to evaluate additive interaction, as a reference for big data analysis in clinical epidemiology and for the analysis of genetic multiplicative and additive interactions.
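The three additive-interaction measures named above have standard closed forms in terms of odds (or hazard) ratios for exposure to one factor only, the other factor only, and both. The sketch below uses those textbook definitions in Python rather than the SAS macros the abstract describes; it assumes "contrast ratio" refers to the interaction contrast ratio, also known as the relative excess risk due to interaction (RERI), and the input values are hypothetical. Confidence intervals are not computed here.

```python
def additive_interaction(or10, or01, or11):
    """Return (RERI, AP, S) for two binary exposures.

    or10: ratio for exposure A only, or01: exposure B only,
    or11: both exposures, each relative to the doubly unexposed group.
    """
    reri = or11 - or10 - or01 + 1.0                    # interaction contrast ratio
    ap = reri / or11                                   # attributable proportion
    s = (or11 - 1.0) / ((or10 - 1.0) + (or01 - 1.0))   # synergy index
    return reri, ap, s


# Hypothetical ratios, chosen only to illustrate the calculation.
reri, ap, s = additive_interaction(or10=2.0, or01=3.0, or11=8.0)
print(f"RERI={reri:.2f} AP={ap:.2f} S={s:.2f}")
```

No additive interaction corresponds to RERI = 0, AP = 0 and S = 1; note that S is undefined when the two single-exposure excess ratios sum to zero.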
Artificial Intelligence Systems as Prognostic and Predictive Tools in Ovarian Cancer.
Enshaei, A; Robson, C N; Edmondson, R J
2015-11-01
The ability to provide accurate prognostic and predictive information to patients is becoming increasingly important as clinicians enter an era of personalized medicine. For a disease as heterogeneous as epithelial ovarian cancer, conventional algorithms become too complex for routine clinical use. This study therefore investigated the potential for an artificial intelligence model to provide this information and compared it with conventional statistical approaches. The authors created a database comprising 668 cases of epithelial ovarian cancer during a 10-year period and collected data routinely available in a clinical environment. They also collected survival data for all the patients, then constructed an artificial intelligence model capable of comparing a variety of algorithms and classifiers alongside conventional statistical approaches such as logistic regression. The model was used to predict overall survival and demonstrated that an artificial neural network (ANN) algorithm was capable of predicting survival with high accuracy (93 %) and an area under the curve (AUC) of 0.74 and that this outperformed logistic regression. The model also was used to predict the outcome of surgery and again showed that ANN could predict outcome (complete/optimal cytoreduction vs. suboptimal cytoreduction) with 77 % accuracy and an AUC of 0.73. These data are encouraging and demonstrate that artificial intelligence systems may have a role in providing prognostic and predictive data for patients. The performance of these systems likely will improve with increasing data set size, and this needs further investigation.
Perez-Rodriguez, M. Mercedes; Weinstein, Shauna; New, Antonia S.; Bevilacqua, Laura; Yuan, Qiaoping; Zhou, Zhifeng; Hodgkinson, Colin; Goodman, Marianne; Koenigsberg, Harold W.; Goldman, David; Siever, Larry J.
2010-01-01
Background There is decreased serotonergic function in impulsive aggression and borderline personality disorder (BPD), and genetic association studies suggest a role of serotonergic genes in impulsive aggression and BPD. Only one study has analyzed the association between the tryptophan-hydroxylase 2 (TPH2) gene and BPD. A TPH2 “risk” haplotype has been described that is associated with anxiety, depression and suicidal behavior. Methods We assessed the relationship between the previously identified “risk” haplotype at the TPH2 locus and BPD diagnosis, impulsive aggression, affective lability, and suicidal/parasuicidal behaviors, in a well-characterized clinical sample of 103 healthy controls (HCs) and 251 patients with personality disorders (109 with BPD). A logistic regression including measures of depression, affective lability and aggression scores in predicting “risk” haplotype was conducted. Results The prevalence of the “risk” haplotype was significantly higher in patients with BPD compared to HCs. Those with the “risk” haplotype have higher aggression and affect lability scores and more suicidal/parasuicidal behaviors than those without it. In the logistic regression model, affect lability was the only significant predictor and it correctly classified 83.1% of the subjects as “risk” or “non-risk” haplotype carriers. Conclusions We found an association between the previously described TPH2 “risk” haplotype and BPD diagnosis, affective lability, suicidal/parasuicidal behavior, and aggression scores. PMID:20451217
Boonvisudhi, Thummaporn; Kuladee, Sanchai
2017-01-01
This study examined the extent of Internet addiction (IA) and its association with depression in Thai medical students. A cross-sectional study was conducted at the Faculty of Medicine, Ramathibodi Hospital. Participants were first- to fifth-year medical students who agreed to participate in this study. Demographic characteristics and stress-related factors were derived from self-rated questionnaires. Depression was assessed using the Thai version of the Patient Health Questionnaire (PHQ-9). A total score of five or greater on the Thai version of the Young Diagnostic Questionnaire for Internet Addiction was classified as "possible IA". Chi-square tests and logistic regression were then used to evaluate the associations between possible IA, depression and associated factors. Of 705 participants, 24.4% had possible IA and 28.8% had depression. There was a statistically significant association between possible IA and depression (odds ratio (OR) 1.92, 95% confidence interval (CI): 1.34-2.77, P-value <0.001). Logistic regression analysis showed that the odds of depression in the possible IA group were 1.58 times those in the group with normal Internet use (95% CI: 1.04-2.38, P-value = 0.031). Academic problems were found to be a significant predictor of both possible IA and depression. IA was likely to be a common psychiatric problem among Thai medical students. The research has also shown that possible IA was associated with depression and academic problems. We suggest that surveillance of IA should be considered in medical schools.
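An unadjusted odds ratio with a 95% confidence interval, like the first one reported above, can be obtained from a 2x2 table. The sketch below uses the common Wald (log-scale) interval; whether the study used this particular interval is an assumption, and the counts are hypothetical rather than the study's data.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome,    b: exposed without outcome
    c: unexposed with outcome,  d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only to illustrate the calculation.
or_, lo, hi = odds_ratio_ci(a=20, b=10, c=10, d=20)
print(f"OR={or_:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```

The adjusted odds ratio in the abstract comes from a multivariable logistic regression instead, where the interval is built from the coefficient's standard error in the same exp(estimate ± z·SE) form.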
Workplace bullying a risk for permanent employees.
Keuskamp, Dominic; Ziersch, Anna M; Baum, Fran E; Lamontagne, Anthony D
2012-04-01
We tested the hypothesis that the risk of experiencing workplace bullying was greater for those employed on casual contracts compared to permanent or ongoing employees. A cross-sectional population-based telephone survey was conducted in South Australia in 2009. Employment arrangements were classified by self-report into four categories: permanent, casual, fixed-term and self-employed. Self-report of workplace bullying was modelled using multiple logistic regression in relation to employment arrangement, controlling for sex, age, working hours, years in job, occupational skill level, marital status and a proxy for socioeconomic status. Workplace bullying was reported by 174 respondents (15.2%). Risk of workplace bullying was higher for being in a professional occupation, having a university education and being separated, divorced or widowed, but did not vary significantly by sex, age or job tenure. In adjusted multivariate logistic regression models, casual workers were significantly less likely than workers on permanent or fixed-term contracts to report bullying. Those separated, divorced or widowed had higher odds of reporting bullying than married, de facto or never-married workers. Contrary to expectation, workplace bullying was more often reported by permanent than casual employees. It may represent an exposure pathway not previously linked with the more idealised permanent employment arrangement. A finer understanding of psycho-social hazards across all employment arrangements is needed, with equal attention to the hazards associated with permanent as well as casual employment. © 2012 The Authors. ANZJPH © 2012 Public Health Association of Australia.
Reider, Nadia; Salter, Amber R; Cutter, Gary R; Tyry, Tuula; Marrie, Ruth Ann
2017-04-01
Physical activity levels among persons with multiple sclerosis (MS) are worryingly low. We aimed to identify the factors associated with physical activity for people with MS, with an emphasis on factors that have not been studied previously (bladder and hand dysfunction) and are potentially modifiable. This study was a secondary analysis of data collected in the spring of 2012 during the North American Research Committee on Multiple Sclerosis (NARCOMS) Registry. NARCOMS participants were surveyed regarding smoking using questions from the Behavioral Risk Factor Surveillance Survey; disability using the Patient Determined Disease Steps; fatigue, cognition, spasticity, sensory, bladder, vision and hand function using self-reported Performance Scales; health literacy using the Medical Term Recognition Test; and physical activity using questions from the Health Information National Trends Survey. We used a forward binary logistic regression to develop a predictive model in which physical activity was the outcome variable. Of 8,755 respondents, 1,707 (19.5%) were classified as active and 7,068 (80.5%) as inactive. In logistic regression, being a current smoker, moderate or severe level of disability, depression, fatigue, hand, or bladder dysfunction and minimal to mild spasticity were associated with lower odds of meeting physical activity guidelines. MS type was not linked to activity level. Several modifiable clinical and lifestyle factors influenced physical activity in MS. Prospective studies are needed to evaluate whether modification of these factors can increase physical activity participation in persons with MS. © 2016 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Panic anxiety, under the weather?
NASA Astrophysics Data System (ADS)
Bulbena, A.; Pailhez, G.; Aceña, R.; Cunillera, J.; Rius, A.; Garcia-Ribera, C.; Gutiérrez, J.; Rojo, C.
2005-03-01
The relationship between weather conditions and psychiatric disorders has been a continuous subject of speculation due to contradictory findings. This study attempts to further clarify this relationship by focussing on specific conditions such as panic attacks and non-panic anxiety in relation to specific meteorological variables. All psychiatric emergencies attended at a general hospital in Barcelona (Spain) during 2002 with anxiety as the main complaint were classified as panic or non-panic anxiety according to strict independent and retrospective criteria. Both groups were assessed and compared with meteorological data (wind speed and direction, daily rainfall, temperature, humidity and solar radiation). Seasons and weekend days were also included as independent variables. Non-parametric statistics were used throughout since most variables do not follow a normal distribution. Logistic regression models were applied to predict days with and without the clinical condition. Episodes of panic were three times more common with the poniente wind (hot wind), half as common with rainfall, and one and a half times more common in autumn than in other seasons. These three trends (hot wind, rainfall and autumn) were cumulative for panic episodes in a logistic regression formula. Significant reduction of episodes on weekends was found only for non-panic episodes. Panic attacks, unlike other anxiety episodes, in a psychiatric emergency department in Barcelona seem to show significant meteorotropism. Assessing specific disorders instead of overall emergencies or other variables of a more general quality could shed new light on the relationship between weather conditions and behaviour.
Bernardy, K; Krampen, G; Köllner, V
2008-12-01
The aim of the present study was to identify factors at the beginning and at the end of an inpatient psychosomatic rehabilitation predicting the successful transfer of Progressive Relaxation (PR) according to Jacobson three months after the stay. Eighty patients in a psychosomatic rehabilitation centre were studied at the beginning (T1), at discharge (T2) and three months after discharge (T3). Every patient participated in courses on PR. To evaluate the course, parts of the "Diagnostisches und evaluatives Instrumentarium für Entspannungstraining und Entspannungstherapie" were used. Transfer was defined as successful if patients practised PR at least once a week three months after their stay. Potential predictors were: diagnosis, age, symptoms, previous experiences, and motives at T1 and frequency of practising, adequateness of group size and change of symptoms at T2. Stepwise logistic regression analysis was used to identify predictors. Three months after the course 52.5% of the patients were able to transfer PR successfully into their daily lives. Logistic regression correctly classified 68.8% of cases using the participation motive "positive thoughts" (T1) and "frequency of practising PR outside the course" (T2). Intrinsic participation motives and practising independently are significant predictors of long-term transfer of PR. This indicates the necessity of discussing motives at the beginning as well as frequency of practising during the PR course. It would be particularly interesting to know whether specifically encouraging motivation would improve the transfer to everyday life.
Association of sarcopenia with functional decline in community-dwelling elderly subjects in Japan.
Tanimoto, Yoshimi; Watanabe, Misuzu; Sun, Wei; Tanimoto, Keiji; Shishikura, Kanako; Sugiura, Yumiko; Kusabiraki, Toshiyuki; Kono, Koichi
2013-10-01
The present study aimed to determine the association of sarcopenia, defined by muscle mass, muscle strength and physical performance, with functional disability from a 2-year cohort study of community-dwelling elderly Japanese people. Participants were 743 community-dwelling elderly Japanese people aged 65 years or older. We used bioelectrical impedance analysis (BIA) to measure muscle mass, grip strength to measure muscle strength, and usual walking speed to measure physical performance in a baseline study. Functional disability was defined using an activities of daily living (ADL) scale and instrumental activities of daily living (IADL) scale at baseline and during follow-up examinations 2 years later. Logistic regression analysis, adjusted for age and body mass index, was used to examine the association between sarcopenia and the occurrence of functional disability. In the present study, 7.8% of men and 10.2% of women were classified as having sarcopenia. Among sarcopenia patients in the baseline study, 36.8% of men and 18.8% of women became dependent in ADL at 2-year follow up. From the logistic regression analysis adjusted by age and body mass index, sarcopenia was significantly associated with the occurrences of physical disability compared with normal subjects in both men and women. Sarcopenia, defined by muscle mass, muscle strength and physical performance, was associated with functional decline over a 2-year period in elderly Japanese. Interventions to prevent sarcopenia are very important to prevent functional decline among elderly individuals. © 2013 Japan Geriatrics Society.
No rationale for 1 variable per 10 events criterion for binary logistic regression analysis.
van Smeden, Maarten; de Groot, Joris A H; Moons, Karel G M; Collins, Gary S; Altman, Douglas G; Eijkemans, Marinus J C; Reitsma, Johannes B
2016-11-24
Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion, only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth's correction, are compared. The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect ('separation'). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
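The two quantities this abstract turns on, EPV and separation, can be sketched in a few lines. The sketch below assumes the common convention that EPV is the count of the less frequent outcome divided by the number of candidate regression parameters, and it checks separation for a single numeric covariate only; the function names are illustrative and are not taken from the paper.

```python
def events_per_variable(outcomes, n_parameters):
    """EPV under an assumed convention: size of the less frequent outcome
    class divided by the number of candidate parameters (intercept excluded)."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    events = sum(1 for y in outcomes if y == 1)
    non_events = len(outcomes) - events
    return min(events, non_events) / n_parameters


def is_completely_separated(x, y):
    """True if one numeric covariate x perfectly splits the binary outcome y,
    i.e. all x values of one class lie strictly below those of the other."""
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both outcome classes must be present")
    return max(x0) < min(x1) or max(x1) < min(x0)


# Hypothetical data set: 30 events among 200 subjects, 5 candidate predictors.
outcomes = [1] * 30 + [0] * 170
epv = events_per_variable(outcomes, n_parameters=5)
```

With these made-up numbers the EPV is 6, below the conventional threshold of 10 that the abstract calls into question.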
Li, Yi; Tseng, Yufeng J.; Pan, Dahua; Liu, Jianzhong; Kern, Petra S.; Gerberick, G. Frank; Hopfinger, Anton J.
2008-01-01
Currently, the only validated methods to identify skin sensitization effects are in vivo models, such as the Local Lymph Node Assay (LLNA) and guinea pig studies. There is a tremendous need, in particular due to novel legislation, to develop animal alternatives, e.g., Quantitative Structure-Activity Relationship (QSAR) models. Here, QSAR models for skin sensitization using LLNA data have been constructed. The descriptors used to generate these models are derived from the 4D-molecular similarity paradigm and are referred to as universal 4D-fingerprints. A training set of 132 structurally diverse compounds and a test set of 15 structurally diverse compounds were used in this study. The statistical methodologies used to build the models are logistic regression (LR) and partial least square coupled logistic regression (PLS-LR), which prove to be effective tools for studying skin sensitization measures expressed in the two categorical terms of sensitizer and non-sensitizer. QSAR models with low values of the Hosmer-Lemeshow goodness-of-fit statistic, χ²_HL, are significant and predictive. For the training set, the cross-validated prediction accuracy of the logistic regression models ranges from 77.3% to 78.0%, while that of PLS-logistic regression models ranges from 87.1% to 89.4%. For the test set, the prediction accuracy of logistic regression models ranges from 80.0% to 86.7%, while that of PLS-logistic regression models ranges from 73.3% to 80.0%. The QSAR models are made up of 4D-fingerprints related to aromatic atoms, hydrogen bond acceptors and negatively partially charged atoms. PMID:17226934
Local curvature analysis for classifying breast tumors: Preliminary analysis in dedicated breast CT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Juhun, E-mail: leej15@upmc.edu; Nishikawa, Robert M.; Reiser, Ingrid
2015-09-15
Purpose: The purpose of this study is to measure the effectiveness of local curvature measures as novel image features for classifying breast tumors. Methods: A total of 119 breast lesions from 104 noncontrast dedicated breast computed tomography images of women were used in this study. Volumetric segmentation was done using a seed-based segmentation algorithm and then a triangulated surface was extracted from the resulting segmentation. Total, mean, and Gaussian curvatures were then computed. Normalized curvatures were used as classification features. In addition, traditional image features were also extracted and a forward feature selection scheme was used to select the optimal feature set. Logistic regression was used as a classifier and leave-one-out cross-validation was utilized to evaluate the classification performances of the features. The area under the receiver operating characteristic curve (AUC, area under curve) was used as a figure of merit. Results: Among curvature measures, the normalized total curvature (C_T) showed the best classification performance (AUC of 0.74), while the others showed no classification power individually. Five traditional image features (two shape, two margin, and one texture descriptors) were selected via the feature selection scheme and its resulting classifier achieved an AUC of 0.83. Among those five features, the radial gradient index (RGI), which is a margin descriptor, showed the best classification performance (AUC of 0.73). A classifier combining RGI and C_T yielded an AUC of 0.81, which showed similar performance (i.e., no statistically significant difference) to the classifier with the above five traditional image features. Additional comparisons in AUC values between classifiers using different combinations of traditional image features and C_T were conducted. The results showed that C_T was able to replace the other four image features for the classification task.
Conclusions: The normalized curvature measure contains useful information in classifying breast tumors. Using this, one can reduce the number of features in a classifier, which may result in more robust classifiers for different datasets.
Younes, Mohamed; Robert, Céline; Cottin, François; Barrey, Eric
2015-01-01
Nearly 50% of the horses participating in endurance events are eliminated at a veterinary examination (a vet gate). Detecting unfit horses before a health problem occurs and treatment is required is a challenge for veterinarians but is essential for improving equine welfare. We hypothesized that it would be possible to detect unfit horses earlier in the event by measuring heart rate recovery variables. Hence, the objective of the present study was to compute logistic regressions of heart rate, cardiac recovery time and average speed data recorded at the previous vet gate (n-1) and thus predict the probability of elimination during successive phases (n and following) in endurance events. Speed and heart rate data were extracted from an electronic database of endurance events (80–160 km in length) organized in four countries. Overall, 39% of the horses that started an event were eliminated, mostly due to lameness (64%) or metabolic disorders (15%). For each vet gate, logistic regressions of explanatory variables (average speed, cardiac recovery time and heart rate measured at the previous vet gate) and categorical variables (age and/or event distance) were computed to estimate the probability of elimination. The predictive logistic regressions for vet gates 2 to 5 correctly classified between 62% and 86% of the eliminated horses. The robustness of these results was confirmed by high areas under the receiver operating characteristic curves (0.68–0.84). Overall, a horse has a 70% chance of being eliminated at the next gate if its cardiac recovery time is longer than 11 min at vet gate 1 or 2, or longer than 13 min at vet gates 3 or 4. Heart rate recovery and average speed variables measured at the previous vet gate(s) enabled us to predict elimination at the following vet gate. These variables should be checked at each veterinary examination, in order to detect unfit horses as early as possible.
Our predictive method may help to improve equine welfare and ethical considerations in endurance events. PMID:26322506
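The recovery-time cut-offs quoted in this abstract (11 min at vet gates 1 and 2, 13 min at vet gates 3 and 4) can be written as a simple screening rule. This is only a sketch of that reported rule of thumb, not the fitted logistic regressions; the function name and the handling of other gates are assumptions.

```python
# Cut-offs in minutes as quoted in the abstract; gates beyond 4 are not covered.
RECOVERY_CUTOFF_MIN = {1: 11.0, 2: 11.0, 3: 13.0, 4: 13.0}


def high_elimination_risk(vet_gate, recovery_time_min):
    """Return True when the cardiac recovery time at this vet gate exceeds
    the cut-off associated with a high chance of elimination at the next gate."""
    if vet_gate not in RECOVERY_CUTOFF_MIN:
        raise ValueError("no cut-off reported for vet gate %r" % (vet_gate,))
    return recovery_time_min > RECOVERY_CUTOFF_MIN[vet_gate]
```

For example, a 12-minute recovery is flagged at vet gate 1 but not at vet gate 3.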
Factors influencing hospital high length of stay outliers
2012-01-01
Background The study of length of stay (LOS) outliers is important for the management and financing of hospitals. Our aim was to study variables associated with high LOS outliers and their evolution over time. Methods We used hospital administrative data from inpatient episodes in public acute care hospitals in the Portuguese National Health Service (NHS), with discharges between years 2000 and 2009, together with some hospital characteristics. The dependent variable, LOS outliers, was calculated for each diagnosis related group (DRG) using a trim point defined for each year by the geometric mean plus two standard deviations. Hospitals were classified on the basis of administrative, economic and teaching characteristics. We also studied the influence of comorbidities and readmissions. Logistic regression models, including a multivariable logistic regression, were used in the analysis. All the logistic regressions were fitted using generalized estimating equations (GEE). Results In nearly nine million inpatient episodes analysed we found a proportion of 3.9% high LOS outliers, accounting for 19.2% of total inpatient days. The number of hospital patient discharges increased between years 2000 and 2005 and slightly decreased after that. The proportion of outliers ranged between the lowest value of 3.6% (in years 2001 and 2002) and the highest value of 4.3% in 2009. Teaching hospitals with over 1,000 beds have significantly more outliers than other hospitals, even after adjustment for readmissions and several patient characteristics. Conclusions In recent years both average LOS and high LOS outliers have been increasing in Portuguese NHS hospitals. As high LOS outliers represent an important proportion of the total inpatient days, this should be seen as an important alert for the management of hospitals and for national health policies. As expected, age, type of admission, and hospital type were significantly associated with high LOS outliers.
The proportion of high outliers does not seem to be related to their financial coverage; they should be studied in order to highlight areas for further investigation. The increasing complexity of both hospitals and patients may be the single most important determinant of high LOS outliers and must therefore be taken into account by health managers when considering hospital costs. PMID:22906386
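The trim-point definition in the Methods (geometric mean plus two standard deviations, per DRG and year) can be illustrated as follows. The sketch reads the definition literally, adding twice the ordinary sample standard deviation to the geometric mean; the abstract does not say which standard deviation was used, so that choice and all names here are assumptions.

```python
import math
import statistics


def trim_point(los_days):
    """Geometric mean of the stays plus two sample standard deviations.
    Assumes every length of stay is strictly positive."""
    geometric_mean = math.exp(sum(math.log(d) for d in los_days) / len(los_days))
    return geometric_mean + 2.0 * statistics.stdev(los_days)


def flag_high_outliers(los_days):
    """Mark each episode whose length of stay exceeds the trim point."""
    cutoff = trim_point(los_days)
    return [d > cutoff for d in los_days]


# Hypothetical stays (in days) for one DRG in one year.
stays = [2, 2, 2, 2, 2, 2, 2, 2, 2, 50]
cutoff = trim_point(stays)
flags = flag_high_outliers(stays)
```

In practice the function would be applied separately to each DRG and year, since the trim point is defined within those groups.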
NASA Astrophysics Data System (ADS)
Wu, W.; Chen, G. Y.; Kang, R.; Xia, J. C.; Huang, Y. P.; Chen, K. J.
2017-07-01
During slaughtering and further processing, chicken carcasses are inevitably contaminated by microbial pathogen contaminants. Due to food safety concerns, many countries implement a zero-tolerance policy that forbids the placement of visibly contaminated carcasses in ice-water chiller tanks during processing. Manual detection of contaminants is labor-intensive and imprecise. Here, a successive projections algorithm (SPA)-multivariable linear regression (MLR) classifier based on an optimal performance threshold was developed for automatic detection of contaminants on chicken carcasses. Hyperspectral images were obtained using a hyperspectral imaging system. A regression model of the classifier was established by MLR based on twelve characteristic wavelengths (505, 537, 561, 562, 564, 575, 604, 627, 656, 665, 670, and 689 nm) selected by SPA, and the optimal threshold T = 1 was obtained from the receiver operating characteristic (ROC) analysis. The SPA-MLR classifier provided the best detection results when compared with the SPA-partial least squares (PLS) regression classifier and the SPA-least squares support vector machine (LS-SVM) classifier. The true positive rate (TPR) of 100% and the false positive rate (FPR) of 0.392% indicate that the SPA-MLR classifier can utilize spatial and spectral information to effectively detect contaminants on chicken carcasses.
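Turning a regression output into a detector by thresholding, and scoring it by TPR and FPR, can be sketched as below. The sketch assumes that a pixel is called contaminated when its regression output is at or above the threshold T; the abstract does not state the direction or strictness of the comparison, and the scores used here are invented.

```python
def classify_by_threshold(scores, threshold):
    """Label a sample positive when its regression output reaches the threshold."""
    return [s >= threshold for s in scores]


def tpr_fpr(labels, predicted):
    """True positive rate and false positive rate for binary labels (1 = positive)."""
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and not p)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p)
    tn = sum(1 for y, p in zip(labels, predicted) if y == 0 and not p)
    return tp / (tp + fn), fp / (fp + tn)


# Invented regression outputs and ground-truth labels for five pixels.
scores = [0.2, 1.05, 1.3, 1.1, 0.4]
labels = [0, 0, 1, 1, 0]
tpr, fpr = tpr_fpr(labels, classify_by_threshold(scores, threshold=1.0))
```

Sweeping the threshold and recording each (FPR, TPR) pair traces the ROC curve from which an operating point such as T = 1 would be chosen.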
MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION
Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...
Weissinger, E M; Metzger, J; Dobbelstein, C; Wolff, D; Schleuning, M; Kuzmina, Z; Greinix, H; Dickinson, A M; Mullen, W; Kreipe, H; Hamwi, I; Morgan, M; Krons, A; Tchebotarenko, I; Ihlenburg-Schwarz, D; Dammann, E; Collin, M; Ehrlich, S; Diedrich, H; Stadler, M; Eder, M; Holler, E; Mischak, H; Krauter, J; Ganser, A
2014-04-01
Allogeneic hematopoietic stem cell transplantation is one curative treatment for hematological malignancies, but is compromised by life-threatening complications, such as severe acute graft-versus-host disease (aGvHD). Prediction of severe aGvHD as early as possible is crucial to allow timely initiation of treatment. Here we report on a multicentre validation of an aGvHD-specific urinary proteomic classifier (aGvHD_MS17) in 423 patients. Samples (n=1106) were collected prospectively between day +7 and day +130 and analyzed using capillary electrophoresis coupled on-line to mass spectrometry. Integration of aGvHD_MS17 analysis with demographic and clinical variables using a logistic regression model led to correct classification of patients developing severe aGvHD 14 days before any clinical signs with 82.4% sensitivity and 77.3% specificity. Multivariate regression analysis showed that aGvHD_MS17 positivity was the only strong predictor for aGvHD grade III or IV (P<0.0001). The classifier consists of 17 peptides derived from albumin, β2-microglobulin, CD99, fibronectin and various collagen α-chains, indicating inflammation, activation of T cells and changes in the extracellular matrix as early signs of GvHD-induced organ damage. This study is currently the largest demonstration of accurate and investigator-independent prediction of patients at risk for severe aGvHD, thus allowing preemptive therapy based on proteomic profiling.
Predicting introductory programming performance: A multi-institutional multivariate study
NASA Astrophysics Data System (ADS)
Bergin, Susan; Reilly, Ronan
2006-12-01
A model for predicting student performance on introductory programming modules is presented. The model uses attributes identified in a study carried out at four third-level institutions in the Republic of Ireland. Four instruments were used to collect the data and over 25 attributes were examined. A data reduction technique was applied and a logistic regression model using 10-fold stratified cross validation was developed. The model used three attributes: Leaving Certificate Mathematics result (final mathematics examination at second level), number of hours playing computer games while taking the module and programming self-esteem. Prediction success was significant with 80% of students correctly classified. The model also works well on a per-institution level. A discussion on the implications of the model is provided and future work is outlined.
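Stratified k-fold cross validation, as used for this model, keeps the class proportions roughly equal across folds. The following is a minimal sketch of one way to assign samples to stratified folds; it deals indices out round-robin within each class without shuffling, which is a simplification, and none of the names come from the study.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds=10):
    """Assign sample indices to folds so each fold gets a near-equal share
    of every class. Labels must be sortable; no shuffling is performed."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    position = 0
    for label in sorted(by_class):
        for index in by_class[label]:
            folds[position % n_folds].append(index)
            position += 1
    return folds


# Hypothetical cohort: 20 students in one class and 30 in the other.
labels = [1] * 20 + [0] * 30
folds = stratified_folds(labels, n_folds=10)
```

Each fold then serves once as the held-out set while the model is fitted on the other nine.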
Automated flare forecasting using a statistical learning technique
NASA Astrophysics Data System (ADS)
Yuan, Yuan; Shih, Frank Y.; Jing, Ju; Wang, Hai-Min
2010-08-01
We present a new method for automatically forecasting the occurrence of solar flares based on photospheric magnetic measurements. The method is a cascading combination of an ordinal logistic regression model and a support vector machine classifier. The predictive variables are three photospheric magnetic parameters, i.e., the total unsigned magnetic flux, length of the strong-gradient magnetic polarity inversion line, and total magnetic energy dissipation. The output is true or false for the occurrence of a certain level of flares within 24 hours. Experimental results, from a sample of 230 active regions between 1996 and 2005, show the accuracies of a 24-hour flare forecast to be 0.86, 0.72, 0.65 and 0.84 respectively for the four different levels. Comparison shows an improvement in the accuracy of X-class flare forecasting.
Are High-Lethality Suicide Attempters With Bipolar Disorder a Distinct Phenotype?
Oquendo, Maria A.; Carballo, Juan Jose; Rajouria, Namita; Currier, Dianne; Tin, Adrienne; Merville, Jessica; Galfalvy, Hanga C.; Sher, Leo; Grunebaum, Michael F.; Burke, Ainsley K.; Mann, J. John
2013-01-01
Because Bipolar Disorder (BD) individuals making highly lethal suicide attempts have greater injury burden and risk for suicide, early identification is critical. BD patients were classified as high- or low-lethality attempters. High-lethality attempts required inpatient medical treatment. Mixed effects logistic regression models and permutation analyses examined correlations between lethality, number, and order of attempts. High-lethality attempters reported greater suicidal intent and more previous attempts. Multiple attempters showed no pattern of incremental lethality increase with subsequent attempts, but individuals with early high-lethality attempts more often made high-lethality attempts later. A subset of high-lethality attempters make only high-lethality attempts. However, presence of previous low-lethality attempts does not indicate that risk for more lethal, possibly successful, attempts is reduced. PMID:19590998
Zhan, L.; Liu, Y.; Zhou, J.; Ye, J.; Thompson, P.M.
2015-01-01
Mild cognitive impairment (MCI) is an intermediate stage between normal aging and Alzheimer's disease (AD), and around 10-15% of people with MCI develop AD each year. More recently, MCI has been further subdivided into early and late stages, and there is interest in identifying sensitive brain imaging biomarkers that help to differentiate stages of MCI. Here, we focused on anatomical brain networks computed from diffusion MRI and proposed a new feature extraction and classification framework based on higher order singular value decomposition and sparse logistic regression. In tests on publicly available data from the Alzheimer's Disease Neuroimaging Initiative, our proposed framework showed promise in detecting brain network differences that help in classifying early versus late MCI. PMID:26413202
Brenn, T; Arnesen, E
1985-01-01
For comparative evaluation, discriminant analysis, logistic regression and Cox's model were used to select risk factors for total and coronary deaths among 6595 men aged 20-49 followed for 9 years. Groups with mortality between 5 and 93 per 1000 were considered. Discriminant analysis selected variable sets only marginally different from the logistic and Cox methods which always selected the same sets. A time-saving option, offered for both the logistic and Cox selection, showed no advantage compared with discriminant analysis. Analysing more than 3800 subjects, the logistic and Cox methods consumed, respectively, 80 and 10 times more computer time than discriminant analysis. When including the same set of variables in non-stepwise analyses, all methods estimated coefficients that in most cases were almost identical. In conclusion, discriminant analysis is advocated for preliminary or stepwise analysis, otherwise Cox's method should be used.
ERIC Educational Resources Information Center
DeMars, Christine E.
2009-01-01
The Mantel-Haenszel (MH) and logistic regression (LR) differential item functioning (DIF) procedures have inflated Type I error rates when there are large mean group differences, short tests, and large sample sizes. When there are large group differences in mean score, groups matched on the observed number-correct score differ on true score,…
Satellite rainfall retrieval by logistic regression
NASA Technical Reports Server (NTRS)
Chiu, Long S.
1986-01-01
The potential use of logistic regression in rainfall estimation from satellite measurements is investigated. Satellite measurements provide covariate information in terms of radiances from different remote sensors. The logistic regression technique can effectively accommodate many covariates and test their significance in the estimation. The outcome from the logistic model is the probability that the rainrate of a satellite pixel is above a certain threshold. By varying the thresholds, a rainrate histogram can be obtained, from which the mean and the variance can be estimated. A logistic model is developed and applied to rainfall data collected during GATE, using as covariates the fractional rain area and a radiance measurement which is deduced from a microwave temperature-rainrate relation. It is demonstrated that the fractional rain area is an important covariate in the model, consistent with the use of the so-called Area Time Integral in estimating total rain volume in other studies. To calibrate the logistic model, simulated rain fields generated by rainfield models with prescribed parameters are needed. A stringent test of the logistic model is its ability to recover the prescribed parameters of simulated rain fields. A rain field simulation model which preserves the fractional rain area and lognormality of rainrates as found in GATE is developed. A stochastic regression model of branching and immigration whose solutions are lognormally distributed in some asymptotic limits has also been developed.
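The step from threshold exceedance probabilities to a rainrate histogram can be sketched as follows. The sketch assumes the model supplies P(rainrate > t) at a set of increasing thresholds, takes differences to get the probability mass in each bin, and places that mass at the bin midpoint when estimating the mean and variance; the midpoint rule, the names, and the numbers are all assumptions.

```python
def histogram_from_exceedance(thresholds, exceed_probs):
    """Turn P(rate > t_k) at increasing thresholds into (low, high, mass) bins."""
    bins = []
    for k in range(len(thresholds) - 1):
        mass = exceed_probs[k] - exceed_probs[k + 1]
        if mass < 0:
            raise ValueError("exceedance probabilities must be non-increasing")
        bins.append((thresholds[k], thresholds[k + 1], mass))
    return bins


def mean_and_variance(bins):
    """Moments of the binned distribution, with each bin's mass at its midpoint."""
    total = sum(mass for _, _, mass in bins)
    mean = sum(0.5 * (lo + hi) * mass for lo, hi, mass in bins) / total
    variance = sum(mass * (0.5 * (lo + hi) - mean) ** 2 for lo, hi, mass in bins) / total
    return mean, variance


# Invented exceedance probabilities at thresholds of 0, 2, 4 and 8 mm/h.
bins = histogram_from_exceedance([0.0, 2.0, 4.0, 8.0], [1.0, 0.5, 0.2, 0.0])
mean_rate, var_rate = mean_and_variance(bins)
```

The moments are conditional on the rainrate falling between the first and last thresholds, so the threshold grid needs to span the range of interest.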
Ameye, Lieveke; Fischerova, Daniela; Epstein, Elisabeth; Melis, Gian Benedetto; Guerriero, Stefano; Van Holsbeke, Caroline; Savelli, Luca; Fruscio, Robert; Lissoni, Andrea Alberto; Testa, Antonia Carla; Veldman, Joan; Vergote, Ignace; Van Huffel, Sabine; Bourne, Tom; Valentin, Lil
2010-01-01
Objectives To prospectively assess the diagnostic performance of simple ultrasound rules to predict benignity/malignancy in an adnexal mass and to test the performance of the risk of malignancy index, two logistic regression models, and subjective assessment of ultrasonic findings by an experienced ultrasound examiner in adnexal masses for which the simple rules yield an inconclusive result. Design Prospective temporal and external validation of simple ultrasound rules to distinguish benign from malignant adnexal masses. The rules comprised five ultrasonic features (including shape, size, solidity, and results of colour Doppler examination) to predict a malignant tumour (M features) and five to predict a benign tumour (B features). If one or more M features were present in the absence of a B feature, the mass was classified as malignant. If one or more B features were present in the absence of an M feature, it was classified as benign. If both M features and B features were present, or if none of the features was present, the simple rules were inconclusive. Setting 19 ultrasound centres in eight countries. Participants 1938 women with an adnexal mass examined with ultrasound by the principal investigator at each centre with a standardised research protocol. Reference standard Histological classification of the excised adnexal mass as benign or malignant. Main outcome measures Diagnostic sensitivity and specificity. Results Of the 1938 patients with an adnexal mass, 1396 (72%) had benign tumours, 373 (19.2%) had primary invasive tumours, 111 (5.7%) had borderline malignant tumours, and 58 (3%) had metastatic tumours in the ovary. The simple rules yielded a conclusive result in 1501 (77%) masses, for which they resulted in a sensitivity of 92% (95% confidence interval 89% to 94%) and a specificity of 96% (94% to 97%). The corresponding sensitivity and specificity of subjective assessment were 91% (88% to 94%) and 96% (94% to 97%). 
In the 357 masses for which the simple rules yielded an inconclusive result and with available results of CA-125 measurements, the sensitivities were 89% (83% to 93%) for subjective assessment, 50% (42% to 58%) for the risk of malignancy index, 89% (83% to 93%) for logistic regression model 1, and 82% (75% to 87%) for logistic regression model 2; the corresponding specificities were 78% (72% to 83%), 84% (78% to 88%), 44% (38% to 51%), and 48% (42% to 55%). Use of the simple rules as a triage test and subjective assessment for those masses for which the simple rules yielded an inconclusive result gave a sensitivity of 91% (88% to 93%) and a specificity of 93% (91% to 94%), compared with a sensitivity of 90% (88% to 93%) and a specificity of 93% (91% to 94%) when subjective assessment was used in all masses. Conclusions The use of the simple rules has the potential to improve the management of women with adnexal masses. In adnexal masses for which the rules yielded an inconclusive result, subjective assessment of ultrasonic findings by an experienced ultrasound examiner was the most accurate diagnostic test; the risk of malignancy index and the two regression models were not useful. PMID:21156740
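The decision logic of the simple rules, and their use as a triage test, follows directly from the description in the Design section and can be sketched as below. Only the counts of M and B features present are modelled; the individual ultrasound features are not, and the function names are illustrative.

```python
def simple_rules(n_m_features, n_b_features):
    """Apply the simple rules to the number of malignant (M) and benign (B)
    ultrasound features found in a mass."""
    if n_m_features > 0 and n_b_features == 0:
        return "malignant"
    if n_b_features > 0 and n_m_features == 0:
        return "benign"
    # Both kinds present, or none present at all.
    return "inconclusive"


def triage(n_m_features, n_b_features, subjective_assessment):
    """Use the simple rules first and fall back on the examiner's subjective
    assessment ('benign' or 'malignant') when the rules are inconclusive."""
    result = simple_rules(n_m_features, n_b_features)
    return subjective_assessment if result == "inconclusive" else result
```

This mirrors the two-step strategy whose combined sensitivity and specificity the abstract reports.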
NASA Astrophysics Data System (ADS)
He, Ting; Fan, Ming; Zhang, Peng; Li, Hui; Zhang, Juan; Shao, Guoliang; Li, Lihua
2018-03-01
Breast cancer can be classified into four molecular subtypes, Luminal A, Luminal B, HER2 and Basal-like, which differ significantly in treatment and survival outcomes. In this study, we aim to predict immunohistochemistry (IHC)-determined molecular subtypes of breast cancer using image features derived from the tumor and peritumoral stroma regions based on diffusion weighted imaging (DWI). A dataset was collected of 126 breast cancer patients who underwent preoperative breast MRI with a 3T scanner. The apparent diffusion coefficients (ADCs) were recorded from DWI, and each breast image was segmented into regions comprising the tumor and the surrounding stroma. Statistical characteristics in various breast tumor and peritumoral regions were computed, including mean, minimum, maximum, variance, interquartile range, range, skewness, and kurtosis of ADC values. Additionally, the differences in features between each pair of regions were calculated. A univariate logistic classifier was used to evaluate the performance of the individual features for discriminating subtypes. For multi-class classification, a multivariate logistic regression model was trained and validated. The results showed that features derived from the tumor boundary and proximal peritumoral stroma regions have a higher classification performance than those of the other regions. Furthermore, the prediction models using statistical features, difference features and all the features combined from these regions generated AUC values of 0.774, 0.796 and 0.811, respectively. The results of this study indicate that ADC features in the tumor and peritumoral stroma regions would be valuable for estimating the molecular subtype in breast cancer.
Practical Session: Logistic Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
An exercise is proposed to illustrate logistic regression. One investigates the different risk factors in the occurrence of coronary heart disease. It has been proposed in Chapter 5 of the book of D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science Business Media, LLC (2010) and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr341.pdf). This example is based on data given in the file evans.txt coming from http://www.sph.emory.edu/dkleinb/logreg3.htm#data.
NASA Astrophysics Data System (ADS)
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam
2015-10-01
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.
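The q-1 logit equations described above can be illustrated with a minimal sketch. It assumes a baseline-category parameterization in which the first category is the reference; the function name, the coefficient layout, and the numeric coefficients are hypothetical and are not taken from the study.

```python
import math

def multinomial_probs(x, coefs):
    """Return [P(ref), P(cat 1), ..., P(cat q-1)] for predictor vector x.

    coefs holds q-1 rows, one per non-reference category, each laid out
    as [intercept, slope_1, ..., slope_p] (this layout is an assumption).
    """
    # One linear predictor (logit versus the reference) per non-reference category.
    etas = [row[0] + sum(b * xi for b, xi in zip(row[1:], x)) for row in coefs]
    denom = 1.0 + sum(math.exp(eta) for eta in etas)
    return [1.0 / denom] + [math.exp(eta) / denom for eta in etas]

# Hypothetical coefficients for q = 3 categories and two predictors.
coefs = [[-0.5, 0.8, 0.1], [-1.2, 0.3, 0.9]]
probs = multinomial_probs([1.0, 2.0], coefs)
```

Each ratio `probs[k] / probs[0]` equals the exponential of the k-th linear predictor, which is why q categories need only q-1 equations.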
The cross-validated AUC for MCP-logistic regression with high-dimensional data.
Jiang, Dingfeng; Huang, Jian; Zhang, Ying
2013-10-01
We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and its comparison with the existing methods including the Akaike information criterion (AIC), Bayesian information criterion (BIC) or Extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and smaller classification error than those with tuning parameters selected using the AIC, BIC or EBIC. We illustrate the application of the MCP-logistic regression with the CV-AUC criterion on three microarray datasets from the studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
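The idea of choosing a tuning parameter by cross-validated AUC can be sketched as follows. This is not the authors' MCP coordinate descent algorithm: the fitting routine is passed in as a hypothetical callable `fit(X_train, y_train, lam)` that returns a scoring function, and the held-out scores are pooled across folds before computing one AUC, which is an assumption about how the criterion is aggregated.

```python
def auc(scores, labels):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly, ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cv_auc_select(X, y, lambdas, fit, k=5):
    """Return (best_lambda, best_cv_auc) over a grid of tuning parameters."""
    n = len(y)
    folds = [list(range(i, n, k)) for i in range(k)]  # simple interleaved folds
    best = None
    for lam in lambdas:
        held_out_scores = [None] * n
        for fold in folds:
            held = set(fold)
            X_train = [X[i] for i in range(n) if i not in held]
            y_train = [y[i] for i in range(n) if i not in held]
            score = fit(X_train, y_train, lam)  # hypothetical fitting routine
            for i in fold:
                held_out_scores[i] = score(X[i])
        value = auc(held_out_scores, y)
        if best is None or value > best[1]:
            best = (lam, value)
    return best
```

Any penalized logistic regression solver could be plugged in as `fit`; the selection loop itself does not depend on the penalty.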
Heart rate variability (HRV): an indicator of stress
NASA Astrophysics Data System (ADS)
Kaur, Balvinder; Durek, Joseph J.; O'Kane, Barbara L.; Tran, Nhien; Moses, Sophia; Luthra, Megha; Ikonomidou, Vasiliki N.
2014-05-01
Heart rate variability (HRV) can be an important indicator of several conditions that affect the autonomic nervous system, including traumatic brain injury, post-traumatic stress disorder and peripheral neuropathy [3], [4], [10] & [11]. Recent work has shown that some of the HRV features can potentially be used for distinguishing a subject's normal mental state from a stressed one [4], [13] & [14]. In all of these past works, although processing is done in both frequency and time domains, few classification algorithms have been explored for classifying normal from stressed RR-intervals. In this paper we used 30 s intervals from the Electrocardiogram (ECG) time series collected during normal and stressed conditions, produced by means of a modified version of the Trier social stress test, to compute HRV-driven features and subsequently applied a set of classification algorithms to distinguish stressed from normal conditions. To classify RR-intervals, we explored classification algorithms that are commonly used for medical applications, namely 1) logistic regression (LR) [16] and 2) linear discriminant analysis (LDA) [6]. Classification performance for various levels of stress over the entire test was quantified using precision, accuracy, sensitivity and specificity measures. Results from both classifiers were then compared to find an optimal classifier and HRV features for stress detection. This work, performed under an IRB-approved protocol, not only provides a method for developing models and classifiers based on human data, but also provides a foundation for a stress indicator tool based on HRV. Further, these classification tools will not only benefit many civilian applications for detecting stress, but also security and military applications for screening such as: border patrol, stress detection for deception [3],[17], and wounded-warrior triage [12].
The value of nodal information in predicting lung cancer relapse using 4DPET/4DCT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Heyse, E-mail: heyse.li@mail.utoronto.ca; Becker, Nathan; Raman, Srinivas
2015-08-15
Purpose: There is evidence that computed tomography (CT) and positron emission tomography (PET) imaging metrics are prognostic and predictive in nonsmall cell lung cancer (NSCLC) treatment outcomes. However, few studies have explored the use of standardized uptake value (SUV)-based image features of nodal regions as predictive features. The authors investigated and compared the use of tumor and node image features extracted from the radiotherapy target volumes to predict relapse in a cohort of NSCLC patients undergoing chemoradiation treatment. Methods: A prospective cohort of 25 patients with locally advanced NSCLC underwent 4DPET/4DCT imaging for radiation planning. Thirty-seven image features were derived from the CT-defined volumes and SUVs of the PET image from both the tumor and nodal target regions. The machine learning methods of logistic regression and repeated stratified five-fold cross-validation (CV) were used to predict local and overall relapses in 2 yr. The authors used well-known feature selection methods (Spearman’s rank correlation, recursive feature elimination) within each fold of CV. Classifiers were ranked on their Matthews correlation coefficient (MCC) after CV. Area under the curve, sensitivity, and specificity values are also presented. Results: For predicting local relapse, the best classifier found had a mean MCC of 0.07 and was composed of eight tumor features. For predicting overall relapse, the best classifier found had a mean MCC of 0.29 and was composed of a single feature: the volume greater than 0.5 times the maximum SUV (N). Conclusions: The best classifier for predicting local relapse had only tumor features. In contrast, the best classifier for predicting overall relapse included a node feature. Overall, the methods showed that nodes add value in predicting overall relapse but not local relapse.
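For readers unfamiliar with the ranking metric, the Matthews correlation coefficient can be computed from a 2x2 confusion matrix as in the sketch below. The function name is hypothetical, and returning 0.0 when a margin of the table is empty is a common convention assumed here, not something stated in the abstract.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; ranges from -1 to +1, with 0 meaning chance level."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denom
```

On this scale the reported mean values of 0.07 and 0.29 sit close to chance level and at a modest positive association, respectively.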
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
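The equivalent two-sample construction for logistic regression can be sketched as follows: find two event probabilities whose log odds differ by the slope times twice the standard deviation of the covariate while their average stays at the overall response probability, then apply a usual two-proportion power formula. The function names, the bisection search, and the particular normal-approximation power formula are assumptions made for illustration; the paper may use a different variant.

```python
import math
from statistics import NormalDist

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def equivalent_two_sample(beta, sd_x, p_bar):
    """Return (p1, p2) with logit(p2) - logit(p1) = 2*beta*sd_x and (p1 + p2)/2 = p_bar."""
    d = 2.0 * beta * sd_x
    lo, hi = -50.0, 50.0
    for _ in range(200):  # bisection on logit(p1); the group mean is increasing in it
        mid = (lo + hi) / 2.0
        if (expit(mid) + expit(mid + d)) / 2.0 < p_bar:
            lo = mid
        else:
            hi = mid
    a = (lo + hi) / 2.0
    return expit(a), expit(a + d)

def approx_power(beta, sd_x, p_bar, n_total, alpha=0.05):
    """Normal-approximation power for two equal groups of size n_total / 2."""
    p1, p2 = equivalent_two_sample(beta, sd_x, p_bar)
    n_group = n_total / 2.0
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    num = abs(p2 - p1) * math.sqrt(n_group) - z * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
    den = math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    return NormalDist().cdf(num / den)
```

A call such as `approx_power(0.5, 1.0, 0.3, 400)` would then approximate the power for a slope of 0.5 per unit of a covariate with unit standard deviation, an overall response probability of 0.3, and 400 subjects.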
Kesselmeier, Miriam; Lorenzo Bermejo, Justo
2017-11-01
Logistic regression is the most common technique used for genetic case-control association studies. A disadvantage of standard maximum likelihood estimators of the genotype relative risk (GRR) is their strong dependence on outlier subjects, for example, patients diagnosed at unusually young age. Robust methods are available to constrain outlier influence, but they are scarcely used in genetic studies. This article provides a non-intimidating introduction to robust logistic regression, and investigates its benefits and limitations in genetic association studies. We applied the bounded Huber and extended the R package 'robustbase' with the re-descending Hampel functions to down-weight outlier influence. Computer simulations were carried out to assess the type I error rate, mean squared error (MSE) and statistical power according to major characteristics of the genetic study and investigated markers. Simulations were complemented with the analysis of real data. Both standard and robust estimation controlled type I error rates. Standard logistic regression showed the highest power but standard GRR estimates also showed the largest bias and MSE, in particular for associated rare and recessive variants. For illustration, a recessive variant with a true GRR=6.32 and a minor allele frequency=0.05 investigated in a 1000 case/1000 control study by standard logistic regression resulted in power=0.60 and MSE=16.5. The corresponding figures for Huber-based estimation were power=0.51 and MSE=0.53. Overall, Hampel- and Huber-based GRR estimates did not differ much. Robust logistic regression may represent a valuable alternative to standard maximum likelihood estimation when the focus lies on risk prediction rather than identification of susceptibility variants. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Sampson, Maureen L; Gounden, Verena; van Deventer, Hendrik E; Remaley, Alan T
2016-02-01
The main drawback of the periodic analysis of quality control (QC) material is that test performance is not monitored in time periods between QC analyses, potentially leading to the reporting of faulty test results. The objective of this study was to develop a patient based QC procedure for the more timely detection of test errors. Results from a Chem-14 panel measured on the Beckman LX20 analyzer were used to develop the model. Each test result was predicted from the other 13 members of the panel by multiple regression, which resulted in correlation coefficients between the predicted and measured result of >0.7 for 8 of the 14 tests. A logistic regression model, which utilized the measured test result, the predicted test result, the day of the week and time of day, was then developed for predicting test errors. The output of the logistic regression was tallied by a daily CUSUM approach and used to predict test errors, with a fixed specificity of 90%. The mean average run length (ARL) before error detection by CUSUM-Logistic Regression (CSLR) was 20 with a mean sensitivity of 97%, which was considerably shorter than the mean ARL of 53 (sensitivity 87.5%) for a simple prediction model that only used the measured result for error detection. A CUSUM-Logistic Regression analysis of patient laboratory data can be an effective approach for the rapid and sensitive detection of clinical laboratory errors. Published by Elsevier Inc.
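The tallying step can be illustrated with a one-sided CUSUM over the error probabilities produced by a logistic model. This is a generic sketch, not the published CSLR procedure: the reference value `k`, the decision limit `h`, and the function name are hypothetical, and in practice they would be tuned to reach a target specificity such as the 90% mentioned above.

```python
def cusum_first_alarm(error_probs, k=0.5, h=1.0):
    """Return the index of the first result at which the CUSUM exceeds h, or None.

    error_probs: per-result error probabilities from a logistic model.
    k: reference value subtracted from each probability (assumed).
    h: decision limit that triggers an alarm (assumed).
    """
    s = 0.0
    for i, p in enumerate(error_probs):
        s = max(0.0, s + (p - k))  # accumulate only evidence above the reference value
        if s > h:
            return i
    return None
```

The number of results processed before the first alarm corresponds to the run length whose average (ARL) is reported in the abstract.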
Brown, H.E.; Yates, K.F.; Dietrich, G.; MacMillan, K.; Graham, C.B.; Reese, S.M.; Helterbrand, Wm. S.; Nicholson, W.L.; Blount, K.; Mead, P.S.; Patrick, S.L.; Eisen, R.J.
2011-01-01
In the United States, tickborne diseases occur focally. Missouri represents a major focus of several tickborne diseases that includes spotted fever rickettsiosis, tularemia, and ehrlichiosis. Our study sought to determine the potential risk of human exposure to human-biting vector ticks in this area. We collected ticks in 79 sites in southern Missouri during June 7-10, 2009, which yielded 1,047 adult and 3,585 nymphal Amblyomma americanum, 5 adult Amblyomma maculatum, 19 adult Dermacentor variabilis, and 5 nymphal Ixodes brunneus. Logistic regression analysis showed that areas posing an elevated risk of exposure to A. americanum nymphs or adults were more likely to be classified as forested than grassland, and the probability of being classified as elevated risk increased with increasing relative humidity during the month of June (30-year average). Overall accuracy of each of the two models was greater than 70% and showed that 20% and 30% of the state were classified as elevated risk for human exposure to nymphs and adults, respectively. We also found a significant positive association between heightened acarologic risk and counties reporting tularemia cases. Our study provides an updated distribution map for A. americanum in Missouri and suggests a widespread risk of human exposure to A. americanum and their associated pathogens in this region. Copyright © 2011 by The American Society of Tropical Medicine and Hygiene.
Ghimire, Mamata; Ayer, Rakesh; Kondo, Masahide
2018-02-14
Nepal has committed to the global community to achieve universal health coverage by 2030. Nevertheless, Nepal still has a high proportion of out-of-pocket health payment and a limited risk-pooling mechanism. Out-of-pocket payment for healthcare services could result in catastrophic health expenditure (CHE). Evidence is required to effectively channel the efforts to lower those expenses in order to achieve universal health coverage. However, little is known about CHE and its determinants in a broad national context in Nepal. Therefore, this study was conducted to explore the cumulative incidence, distribution, and determinants of CHE in Nepal. Data were obtained from the nationally representative survey, the Nepal Living Standards Survey-third undertaken in 2010/11. Information from 5988 households was used for the analyses. Households were classified as having CHE when their out-of-pocket health payment was greater than or equal to 40% of their capacity to pay. Remaining households were classified as not having CHE. Logistic regression analyses were used to identify determinants of CHE. Based on the household-weighted sample, the cumulative incidence of CHE was 10.3% per month in Nepal. This incidence was concentrated in the far-western region and in households in the poorer expenditure quartiles. Multivariable logistic regression revealed that households were more likely to face CHE if they had chronically ill member(s), had a higher burden of acute illness and injuries, had elderly (≥60 years) member(s), belonged to the poor expenditure quartile, or were located in the far-western region. In contrast, households were less likely to incur CHE when their household head was educated. Having children (≤5 years) in households did not significantly affect catastrophic health expenditure. This study identified a high cumulative incidence of CHE. CHE was disproportionately concentrated in the poor households and households located in the far-western region.
Policy-makers should focus on prioritizing households vulnerable to CHE. Interventions to reduce economic burden of out-of-pocket healthcare payment are imperative to lower incidences of CHE among those households. Improving literacy rate might also be useful in order to lower CHE and facilitate universal health coverage.
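The classification rule used in this study (out-of-pocket payment at or above 40% of capacity to pay) and a household-weighted incidence can be sketched as below. The function names, the tuple layout, and the treatment of households with non-positive capacity to pay are assumptions; the survey's actual weighting scheme is not described in the abstract.

```python
def has_che(out_of_pocket, capacity_to_pay, threshold=0.40):
    """True when out-of-pocket payment is at least `threshold` of capacity to pay."""
    if capacity_to_pay <= 0:
        return False  # assumed handling; the abstract does not cover this case
    return out_of_pocket >= threshold * capacity_to_pay

def weighted_incidence(households):
    """households: iterable of (out_of_pocket, capacity_to_pay, weight) tuples."""
    households = list(households)
    total = sum(w for _, _, w in households)
    cases = sum(w for oop, ctp, w in households if has_che(oop, ctp))
    return cases / total

# Hypothetical households: one at the 40% threshold, one well below it.
example = [(40, 100, 1.0), (10, 100, 3.0)]
incidence = weighted_incidence(example)
```

The resulting binary indicator is the kind of outcome that the logistic regression analyses described above would take as the dependent variable.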
Nonconvex Sparse Logistic Regression With Weakly Convex Regularization
NASA Astrophysics Data System (ADS)
Shen, Xinyue; Gu, Yuantao
2018-06-01
In this work we propose to fit a sparse logistic regression model by a weakly convex regularized nonconvex optimization problem. The idea is based on the finding that a weakly convex function as an approximation of the $\ell_0$ pseudo norm is able to better induce sparsity than the commonly used $\ell_1$ norm. For a class of weakly convex sparsity inducing functions, we prove the nonconvexity of the corresponding sparse logistic regression problem, and study its local optimality conditions and the choice of the regularization parameter to exclude trivial solutions. Despite the nonconvexity, a method based on proximal gradient descent is used to solve the general weakly convex sparse logistic regression, and its convergence behavior is studied theoretically. Then the general framework is applied to a specific weakly convex function, and a necessary and sufficient local optimality condition is provided. The solution method is instantiated in this case as an iterative firm-shrinkage algorithm, and its effectiveness is demonstrated in numerical experiments by both randomly generated and real datasets.
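As a point of reference, the firm-shrinkage (firm thresholding) operator in its commonly cited two-threshold form is sketched below. The parameter names `lam` and `mu` are assumptions, and how the paper ties these thresholds to the regularization parameter and step size is not given in the abstract, so this shows only the elementwise operator, not the full iterative algorithm.

```python
def firm_shrink(x, lam, mu):
    """Firm thresholding with lower threshold lam and upper threshold mu (mu > lam > 0).

    Values with |x| <= lam are set to zero, values with |x| > mu pass through
    unchanged, and values in between are scaled linearly so the map is continuous.
    """
    ax = abs(x)
    if ax <= lam:
        return 0.0
    if ax <= mu:
        sign = 1.0 if x > 0 else -1.0
        return sign * mu * (ax - lam) / (mu - lam)
    return float(x)
```

Unlike soft thresholding, which shrinks every surviving coefficient by the threshold, this operator leaves large coefficients unbiased, which is one motivation for weakly convex penalties.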
Campos-Filho, N; Franco, E L
1989-02-01
A frequent procedure in matched case-control studies is to report results from the multivariate unmatched analyses if they do not differ substantially from the ones obtained after conditioning on the matching variables. Although conceptually simple, this rule requires that an extensive series of logistic regression models be evaluated by both the conditional and unconditional maximum likelihood methods. Most computer programs for logistic regression employ only one maximum likelihood method, which requires that the analyses be performed in separate steps. This paper describes a Pascal microcomputer (IBM PC) program that performs multiple logistic regression by both maximum likelihood estimation methods, which obviates the need for switching between programs to obtain relative risk estimates from both matched and unmatched analyses. The program calculates most standard statistics and allows factoring of categorical or continuous variables by two distinct methods of contrast. A built-in, descriptive statistics option allows the user to inspect the distribution of cases and controls across categories of any given variable.
Comparison of cranial sex determination by discriminant analysis and logistic regression.
Amores-Ampuero, Anabel; Alemán, Inmaculada
2016-04-05
Various methods have been proposed for estimating dimorphism. The objective of this study was to compare sex determination results from cranial measurements using discriminant analysis or logistic regression. The study sample comprised 130 individuals (70 males) of known sex, age, and cause of death from San José cemetery in Granada (Spain). Measurements of 19 neurocranial dimensions and 11 splanchnocranial dimensions were subjected to discriminant analysis and logistic regression, and the percentages of correct classification were compared between the sex functions obtained with each method. The discriminant capacity of the selected variables was evaluated with a cross-validation procedure. The percentage accuracy with discriminant analysis was 78.2% for the neurocranium (82.4% in females and 74.6% in males) and 73.7% for the splanchnocranium (79.6% in females and 68.8% in males). These percentages were higher with logistic regression analysis: 85.7% for the neurocranium (in both sexes) and 94.1% for the splanchnocranium (100% in females and 91.7% in males).
Hill, Andrew; Loh, Po-Ru; Bharadwaj, Ragu B.; Pons, Pascal; Shang, Jingbo; Guinan, Eva; Lakhani, Karim; Kilty, Iain
2017-01-01
Background: The association of differing genotypes with disease-related phenotypic traits offers great potential to both help identify new therapeutic targets and support stratification of patients who would gain the greatest benefit from specific drug classes. Development of low-cost genotyping and sequencing has made collecting large-scale genotyping data routine in population and therapeutic intervention studies. In addition, a range of new technologies is being used to capture numerous new and complex phenotypic descriptors. As a result, genotype and phenotype datasets have grown exponentially. Genome-wide association studies associate genotypes and phenotypes using methods such as logistic regression. As existing tools for association analysis limit the efficiency by which value can be extracted from increasing volumes of data, there is a pressing need for new software tools that can accelerate association analyses on large genotype-phenotype datasets. Results: Using open innovation (OI) and contest-based crowdsourcing, the logistic regression analysis in a leading, community-standard genetics software package (PLINK 1.07) was substantially accelerated. OI allowed us to do this in <6 months by providing rapid access to highly skilled programmers with specialized, difficult-to-find skill sets. Through a crowd-based contest a combination of computational, numeric, and algorithmic approaches was identified that accelerated the logistic regression in PLINK 1.07 by 18- to 45-fold. Combining contest-derived logistic regression code with coarse-grained parallelization, multithreading, and associated changes to data initialization code further developed through distributed innovation, we achieved an end-to-end speedup of 591-fold for a data set size of 6678 subjects by 645 863 variants, compared to PLINK 1.07's logistic regression. This represents a reduction in run time from 4.8 hours to 29 seconds.
Accelerated logistic regression code developed in this project has been incorporated into the PLINK2 project. Conclusions: Using iterative competition-based OI, we have developed a new, faster implementation of logistic regression for genome-wide association studies analysis. We present lessons learned and recommendations on running a successful OI process for bioinformatics. PMID:28327993
Hill, Andrew; Loh, Po-Ru; Bharadwaj, Ragu B; Pons, Pascal; Shang, Jingbo; Guinan, Eva; Lakhani, Karim; Kilty, Iain; Jelinsky, Scott A
2017-05-01
The association of differing genotypes with disease-related phenotypic traits offers great potential to both help identify new therapeutic targets and support stratification of patients who would gain the greatest benefit from specific drug classes. Development of low-cost genotyping and sequencing has made collecting large-scale genotyping data routine in population and therapeutic intervention studies. In addition, a range of new technologies is being used to capture numerous new and complex phenotypic descriptors. As a result, genotype and phenotype datasets have grown exponentially. Genome-wide association studies associate genotypes and phenotypes using methods such as logistic regression. As existing tools for association analysis limit the efficiency by which value can be extracted from increasing volumes of data, there is a pressing need for new software tools that can accelerate association analyses on large genotype-phenotype datasets. Using open innovation (OI) and contest-based crowdsourcing, the logistic regression analysis in a leading, community-standard genetics software package (PLINK 1.07) was substantially accelerated. OI allowed us to do this in <6 months by providing rapid access to highly skilled programmers with specialized, difficult-to-find skill sets. Through a crowd-based contest a combination of computational, numeric, and algorithmic approaches was identified that accelerated the logistic regression in PLINK 1.07 by 18- to 45-fold. Combining contest-derived logistic regression code with coarse-grained parallelization, multithreading, and associated changes to data initialization code further developed through distributed innovation, we achieved an end-to-end speedup of 591-fold for a data set size of 6678 subjects by 645 863 variants, compared to PLINK 1.07's logistic regression. This represents a reduction in run time from 4.8 hours to 29 seconds. 
Accelerated logistic regression code developed in this project has been incorporated into the PLINK2 project. Using iterative competition-based OI, we have developed a new, faster implementation of logistic regression for genome-wide association studies analysis. We present lessons learned and recommendations on running a successful OI process for bioinformatics. © The Author 2017. Published by Oxford University Press.
Lin, Chao-Cheng; Bai, Ya-Mei; Chen, Jen-Yeu; Hwang, Tzung-Jeng; Chen, Tzu-Ting; Chiu, Hung-Wen; Li, Yu-Chuan
2010-03-01
Metabolic syndrome (MetS) is an important side effect of second-generation antipsychotics (SGAs). However, many SGA-treated patients with MetS remain undetected. In this study, we trained and validated artificial neural network (ANN) and multiple logistic regression models without biochemical parameters to rapidly identify MetS in patients with SGA treatment. A total of 383 patients with a diagnosis of schizophrenia or schizoaffective disorder (DSM-IV criteria) with SGA treatment for more than 6 months were investigated to determine whether they met the MetS criteria according to the International Diabetes Federation. The data for these patients were collected between March 2005 and September 2005. The input variables of ANN and logistic regression were limited to demographic and anthropometric data only. All models were trained by randomly selecting two-thirds of the patient data and were internally validated with the remaining one-third of the data. The models were then externally validated with data from 69 patients from another hospital, collected between March 2008 and June 2008. The area under the receiver operating characteristic curve (AUC) was used to measure the performance of all models. Both the final ANN and logistic regression models had high accuracy (88.3% vs 83.6%), sensitivity (93.1% vs 86.2%), and specificity (86.9% vs 83.8%) to identify MetS in the internal validation set. The mean +/- SD AUC was high for both the ANN and logistic regression models (0.934 +/- 0.033 vs 0.922 +/- 0.035, P = .63). During external validation, high AUC was still obtained for both models. Waist circumference and diastolic blood pressure were the common variables that were left in the final ANN and logistic regression models. Our study developed accurate ANN and logistic regression models to detect MetS in patients with SGA treatment. The models are likely to provide a noninvasive tool for large-scale screening of MetS in this group of patients. 
(c) 2010 Physicians Postgraduate Press, Inc.
Bayesian logistic regression in detection of gene-steroid interaction for cancer at PDLIM5 locus.
Wang, Ke-Sheng; Owusu, Daniel; Pan, Yue; Xie, Changchun
2016-06-01
The PDZ and LIM domain 5 (PDLIM5) gene may play a role in cancer, bipolar disorder, major depression, alcohol dependence and schizophrenia; however, little is known about the interaction effect of steroid and PDLIM5 gene on cancer. This study examined 47 single-nucleotide polymorphisms (SNPs) within the PDLIM5 gene in the Marshfield sample with 716 cancer patients (any diagnosed cancer, excluding minor skin cancer) and 2848 noncancer controls. Multiple logistic regression model in PLINK software was used to examine the association of each SNP with cancer. Bayesian logistic regression in PROC GENMOD in SAS statistical software, ver. 9.4 was used to detect gene-steroid interactions influencing cancer. Single marker analysis using PLINK identified 12 SNPs associated with cancer (P < 0.05); especially, SNP rs6532496 revealed the strongest association with cancer (P = 6.84 × 10⁻³); while the next best signal was rs951613 (P = 7.46 × 10⁻³). Classic logistic regression in PROC GENMOD showed that both rs6532496 and rs951613 revealed strong gene-steroid interaction effects (OR=2.18, 95% CI=1.31-3.63 with P = 2.9 × 10⁻³ for rs6532496 and OR=2.07, 95% CI=1.24-3.45 with P = 5.43 × 10⁻³ for rs951613, respectively). Results from Bayesian logistic regression showed stronger interaction effects (OR=2.26, 95% CI=1.2-3.38 for rs6532496 and OR=2.14, 95% CI=1.14-3.2 for rs951613, respectively). All the 12 SNPs associated with cancer revealed significant gene-steroid interaction effects (P < 0.05); whereas 13 SNPs showed gene-steroid interaction effects without main effect on cancer. SNP rs4634230 revealed the strongest gene-steroid interaction effect (OR=2.49, 95% CI=1.5-4.13 with P = 4.0 × 10⁻⁴ based on the classic logistic regression and OR=2.59, 95% CI=1.4-3.97 from Bayesian logistic regression; respectively). This study provides evidence of common genetic variants within the PDLIM5 gene and interactions between PDLIM5 gene polymorphisms and steroid use influencing cancer.
Deletion Diagnostics for Alternating Logistic Regressions
Preisser, John S.; By, Kunthel; Perin, Jamie; Qaqish, Bahjat F.
2013-01-01
Deletion diagnostics are introduced for the regression analysis of clustered binary outcomes estimated with alternating logistic regressions, an implementation of generalized estimating equations (GEE) that estimates regression coefficients in a marginal mean model and in a model for the intracluster association given by the log odds ratio. The diagnostics are developed within an estimating equations framework that recasts the estimating functions for association parameters based upon conditional residuals into equivalent functions based upon marginal residuals. Extensions of earlier work on GEE diagnostics follow directly, including computational formulae for one-step deletion diagnostics that measure the influence of a cluster of observations on the estimated regression parameters and on the overall marginal mean or association model fit. The diagnostic formulae are evaluated with simulation studies and with an application concerning an assessment of factors associated with health maintenance visits in primary care medical practices. The application and the simulations demonstrate that the proposed cluster-deletion diagnostics for alternating logistic regressions are good approximations of their exact fully iterated counterparts. PMID:22777960
NASA Astrophysics Data System (ADS)
Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu
2018-02-01
A hybrid machine learning approach of Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. This model combines the RSS method, which is known as an efficient ensemble technique, with CART, which is a state-of-the-art classifier. The Luc Yen district of Yen Bai province, a prominent landslide-prone area of Viet Nam, was selected for the model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models, namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In developing the model, ten important landslide-affecting factors related to geomorphology, geology and geo-environment were considered, namely slope angles, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. The RSSCART model performed best (AUC = 0.841) compared with the other popular landslide models, namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that RSSCART is a promising method for spatial landslide prediction.
Knol, Mirjam J; van der Tweel, Ingeborg; Grobbee, Diederick E; Numans, Mattijs E; Geerlings, Mirjam I
2007-10-01
To determine the presence of interaction in epidemiologic research, typically a product term is added to the regression model. In linear regression, the regression coefficient of the product term reflects interaction as departure from additivity. However, in logistic regression it refers to interaction as departure from multiplicativity. Rothman has argued that interaction estimated as departure from additivity better reflects biologic interaction. So far, literature on estimating interaction on an additive scale using logistic regression only focused on dichotomous determinants. The objective of the present study was to provide the methods to estimate interaction between continuous determinants and to illustrate these methods with a clinical example. From the existing literature we derived the formulas to quantify interaction as departure from additivity between one continuous and one dichotomous determinant and between two continuous determinants using logistic regression. Bootstrapping was used to calculate the corresponding confidence intervals. To illustrate the theory with an empirical example, data from the Utrecht Health Project were used, with age and body mass index as risk factors for elevated diastolic blood pressure. The methods and formulas presented in this article are intended to assist epidemiologists to calculate interaction on an additive scale between two variables on a certain outcome. The proposed methods are included in a spreadsheet which is freely available at: http://www.juliuscenter.nl/additive-interaction.xls.
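A common way to express departure from additivity is the relative excess risk due to interaction, RERI = OR11 - OR10 - OR01 + 1, computed from the logistic regression coefficients. The sketch below uses that standard definition; it is not taken from the article's spreadsheet. The function name, the increments d1 and d2, and the choice of zero as the reference level for both determinants (for example after centring) are assumptions made here.

```python
import math


def reri(b1, b2, b3, d1=1.0, d2=1.0):
    """Relative excess risk due to interaction from logistic coefficients.

    b1, b2: main-effect coefficients; b3: product-term coefficient.
    d1, d2: increments in each determinant, measured from a reference
    level of zero (an assumption of this sketch). With d1 = d2 = 1 this
    reduces to the usual formula for two dichotomous determinants.
    """
    or10 = math.exp(b1 * d1)
    or01 = math.exp(b2 * d2)
    or11 = math.exp(b1 * d1 + b2 * d2 + b3 * d1 * d2)
    return or11 - or10 - or01 + 1.0


# Exact multiplicativity (b3 = 0) with odds ratios 2 and 3 still gives
# a positive additive interaction: 6 - 2 - 3 + 1.
example_reri = reri(math.log(2.0), math.log(3.0), 0.0)
```

A confidence interval could be obtained by recomputing this quantity on bootstrap resamples, which is the approach the abstract names.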
ERIC Educational Resources Information Center
Osborne, Jason W.
2012-01-01
Logistic regression is slowly gaining acceptance in the social sciences, and fills an important niche in the researcher's toolkit: being able to predict important outcomes that are not continuous in nature. While OLS regression is a valuable tool, it cannot routinely be used to predict outcomes that are binary or categorical in nature. These…
Multi-feature classifiers for burst detection in single EEG channels from preterm infants
NASA Astrophysics Data System (ADS)
Navarro, X.; Porée, F.; Kuchenbuch, M.; Chavez, M.; Beuchée, Alain; Carrault, G.
2017-08-01
Objective. The study of electroencephalographic (EEG) bursts in preterm infants provides valuable information about maturation or prognostication after perinatal asphyxia. Over the last two decades, a number of works proposed algorithms to automatically detect EEG bursts in preterm infants, but they were designed for populations under 35 weeks of post menstrual age (PMA). However, as the brain activity evolves rapidly during postnatal life, these solutions might be under-performing with increasing PMA. In this work we focused on preterm infants reaching term ages (PMA ⩾36 weeks) using multi-feature classification on a single EEG channel. Approach. Five EEG burst detectors relying on different machine learning approaches were compared: logistic regression (LR), linear discriminant analysis (LDA), k-nearest neighbors (kNN), support vector machines (SVM) and thresholding (Th). Classifiers were trained by visually labeled EEG recordings from 14 very preterm infants (born after 28 weeks of gestation) with 36-41 weeks PMA. Main results. The most performing classifiers reached about 95% accuracy (kNN, SVM and LR) whereas Th obtained 84%. Compared to human-automatic agreements, LR provided the highest scores (Cohen’s kappa = 0.71) using only three EEG features. Applying this classifier in an unlabeled database of 21 infants ⩾36 weeks PMA, we found that long EEG bursts and short inter-burst periods are characteristic of infants with the highest PMA and weights. Significance. In view of these results, LR-based burst detection could be a suitable tool to study maturation in monitoring or portable devices using a single EEG channel.
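Agreement between a human scorer and an automatic detector, as reported here with Cohen's kappa, corrects the raw agreement rate for the agreement expected by chance. A minimal sketch of the calculation follows; the function name and the toy label sequences are assumptions for illustration, not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the product of the marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical burst (1) / non-burst (0) labels for four EEG segments.
human_labels = [1, 1, 0, 0]
detector_labels = [1, 0, 0, 0]
example_kappa = cohens_kappa(human_labels, detector_labels)
```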
A sampling bias in identifying children in foster care using Medicaid data.
Rubin, David M; Pati, Susmita; Luan, Xianqun; Alessandrini, Evaline A
2005-01-01
Prior research identified foster care children using Medicaid eligibility codes specific to foster care, but it is unknown whether these codes capture all foster care children. To describe the sampling bias in relying on Medicaid eligibility codes to identify foster care children. Using foster care administrative files linked to Medicaid data, we describe the proportion of children whose Medicaid eligibility was correctly encoded as foster child during a 1-year follow-up period following a new episode of foster care. Sampling bias is described by comparing claims in mental health, emergency department (ED), and other ambulatory settings among correctly and incorrectly classified foster care children. Twenty-eight percent of the 5683 sampled children were incorrectly classified in Medicaid eligibility files. In a multivariate logistic regression model, correct classification was associated with duration of foster care (>9 vs <2 months, odds ratio [OR] 7.67, 95% confidence interval [CI] 7.17-7.97), number of placements (>3 vs 1 placement, OR 4.20, 95% CI 3.14-5.64), and placement in a group home among adjudicated dependent children (OR 1.87, 95% CI 1.33-2.63). Compared with incorrectly classified children, correctly classified foster care children were 3 times more likely to use any services, 2 times more likely to visit the ED, 3 times more likely to make ambulatory visits, and 4 times more likely to use mental health care services (P < .001 for all comparisons). Identifying children in foster care using Medicaid eligibility files is prone to sampling bias that over-represents children in foster care who use more services.
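The odds ratios and confidence intervals reported here come from a multivariate model. As a simpler illustration of how such a quantity is formed, the sketch below computes an unadjusted odds ratio with a Wald (Woolf) interval from a single 2x2 table. The function name and the counts are hypothetical, and the result is not comparable to the adjusted estimates in the abstract.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald (Woolf) confidence interval.

    a: exposed with outcome,   b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Hypothetical counts, for illustration only.
example_or, example_low, example_high = odds_ratio_ci(20, 10, 10, 20)
```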
Larrañaga, Ana; Bielza, Concha; Pongrácz, Péter; Faragó, Tamás; Bálint, Anna; Larrañaga, Pedro
2015-03-01
Barking is perhaps the most characteristic form of vocalization in dogs; however, very little is known about its role in the intraspecific communication of this species. Besides the obvious need for ethological research, both in the field and in the laboratory, the possible information content of barks can also be explored by computerized acoustic analyses. This study compares four different supervised learning methods (naive Bayes, classification trees, k-nearest neighbors and logistic regression) combined with three strategies for selecting variables (all variables, filter and wrapper feature subset selections) to classify Mudi dogs by sex, age, context and individual from their barks. The classification accuracy of the models obtained was estimated by means of [Formula: see text]-fold cross-validation. Percentages of correct classifications were 85.13 % for determining sex, 80.25 % for predicting age (recodified as young, adult and old), 55.50 % for classifying contexts (seven situations) and 67.63 % for recognizing individuals (8 dogs), so the results are encouraging. The best-performing method was k-nearest neighbors following a wrapper feature selection approach. The results for classifying contexts and recognizing individual dogs were better with this method than they were for other approaches reported in the specialized literature. This is the first time that the sex and age of domestic dogs have been predicted with the help of sound analysis. This study shows that dog barks carry ample information regarding the caller's indexical features. Our computerized analysis provides indirect proof that barks may serve as an important source of information for dogs as well.
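Fold-based cross-validation, as used in this study to estimate accuracy, partitions the samples into disjoint folds and holds each fold out once for testing. The sketch below shows one way to generate the index splits. The function name, the parameter k and the seed are assumptions made here; the abstract as extracted does not preserve the number of folds.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k disjoint folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


example_splits = list(k_fold_indices(10, 5))
```

Accuracy would then be averaged over the held-out folds; when feature selection is part of the pipeline, it is usually repeated inside each training fold so that the test fold stays unseen.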
ANALYSIS OF SAMPLING TECHNIQUES FOR IMBALANCED DATA: AN N=648 ADNI STUDY
Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin; Thompson, Paul M.; Ye, Jieping
2013-01-01
Many neuroimaging applications deal with imbalanced imaging data. For example, in Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer’s disease (AD) patients for structural magnetic resonance imaging (MRI) modality and six times the control cases for proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over- and undersampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers including Random Forest and Support Vector Machines based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1) a balanced training set obtained with K-Medoids technique based undersampling gives the best overall performance among different data sampling techniques and no sampling approach; and (2) sparse logistic regression with stability selection achieves competitive performance among various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results. PMID:24176869
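The idea of an ensemble built from multiple undersampled datasets can be sketched by drawing several balanced subsets, each of which would then train one base classifier. The sketch below uses plain random undersampling, which is a simpler stand-in and not the K-Medoids based undersampling that the abstract reports as best. The function name and the toy class counts are assumptions for illustration.

```python
import random


def undersampled_subsets(labels, n_subsets, seed=0):
    """Return index lists in which every class is cut to the minority size."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    minority_size = min(len(members) for members in by_class.values())
    subsets = []
    for _ in range(n_subsets):
        subset = []
        for members in by_class.values():
            # Sample without replacement so each subset is balanced.
            subset.extend(rng.sample(members, minority_size))
        subsets.append(sorted(subset))
    return subsets


# Hypothetical 2:1 imbalance, loosely mirroring the MCI versus AD ratio.
toy_labels = ["MCI"] * 6 + ["AD"] * 3
example_subsets = undersampled_subsets(toy_labels, n_subsets=4)
```

Each subset keeps all minority cases and a different random draw of majority cases, so the base classifiers see varied data while the ensemble as a whole still covers most of the majority class.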
Analysis of sampling techniques for imbalanced data: An n = 648 ADNI study.
Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin; Thompson, Paul M; Ye, Jieping
2014-02-15
Many neuroimaging applications deal with imbalanced imaging data. For example, in Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer's disease (AD) patients for structural magnetic resonance imaging (MRI) modality and six times the control cases for proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over and undersampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers including Random Forest and Support Vector Machines based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1) a balanced training set obtained with K-Medoids technique based undersampling gives the best overall performance among different data sampling techniques and no sampling approach; and (2) sparse logistic regression with stability selection achieves competitive performance among various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results. © 2013 Elsevier Inc. All rights reserved.
Intermediate and advanced topics in multilevel logistic regression analysis
Merlo, Juan
2017-01-01
Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher‐level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within‐cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population‐average effect of covariates measured at the subject and cluster level, in contrast to the within‐cluster or cluster‐specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster‐level covariates. We describe the variance partition coefficient and the median odds ratio which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28543517
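Two of the heterogeneity measures named here have simple closed forms once the between-cluster variance of a random-intercept logistic model is known. The sketch below assumes the latent-variable formulation, in which the subject-level residual variance is fixed at π²/3, and the usual definition of the median odds ratio. The function names and the example variance are assumptions made here, not values from the article.

```python
import math
from statistics import NormalDist


def variance_partition_coefficient(cluster_variance):
    """Share of latent-scale variance attributable to clusters.

    Assumes a random-intercept logistic model with subject-level
    residual variance pi^2 / 3 (latent-variable formulation).
    """
    return cluster_variance / (cluster_variance + math.pi ** 2 / 3.0)


def median_odds_ratio(cluster_variance):
    """Median odds ratio between two randomly chosen clusters."""
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of N(0, 1)
    return math.exp(math.sqrt(2.0 * cluster_variance) * z75)


# Hypothetical between-cluster variance of 1.0 on the logit scale.
example_vpc = variance_partition_coefficient(1.0)
example_mor = median_odds_ratio(1.0)
```

A median odds ratio of 1 means no between-cluster variation; larger values mean that the cluster a subject belongs to matters more for the outcome.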
Intermediate and advanced topics in multilevel logistic regression analysis.
Austin, Peter C; Merlo, Juan
2017-09-10
Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher-level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within-cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population-average effect of covariates measured at the subject and cluster level, in contrast to the within-cluster or cluster-specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster-level covariates. We describe the variance partition coefficient and the median odds ratio which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
Siuly; Yin, Xiaoxia; Hadjiloucas, Sillas; Zhang, Yanchun
2016-04-01
This work provides a performance comparison of four different machine learning classifiers: multinomial logistic regression with ridge estimators (MLR) classifier, k-nearest neighbours (KNN), support vector machine (SVM) and naïve Bayes (NB) as applied to terahertz (THz) transient time domain sequences associated with pixelated images of different powder samples. Although the six substances considered have similar optical properties, their complex insertion loss at the THz part of the spectrum differs significantly because of differences in their frequency-dependent THz extinction coefficient as well as in their refractive index and scattering properties. As scattering can be unquantifiable in many spectroscopic experiments, classification solely on differences in complex insertion loss can be inconclusive. The problem is addressed using two-dimensional (2-D) cross-correlations between background and sample interferograms; these ensure good noise suppression of the datasets and provide a range of statistical features that are subsequently used as inputs to the above classifiers. A cross-validation procedure is adopted to assess the performance of the classifiers. First, the measurements related to samples with thicknesses of 2 mm were classified, then samples at thicknesses of 4 mm, and after that 3 mm, and the success rate and consistency of each classifier were recorded. In addition, mixtures having thicknesses of 2 and 4 mm as well as mixtures of 2, 3 and 4 mm were presented simultaneously to all classifiers. This approach provided further cross-validation of the classification consistency of each algorithm. The results confirm the superiority in classification accuracy and robustness of the MLR (least accuracy 88.24%) and KNN (least accuracy 90.19%) algorithms which consistently outperformed the SVM (least accuracy 74.51%) and NB (least accuracy 56.86%) classifiers for the same number of feature vectors across all studies.
The work establishes a general methodology for assessing the performance of other hyperspectral dataset classifiers on the basis of 2-D cross-correlations in far-infrared spectroscopy or other parts of the electromagnetic spectrum. It also advances the wider proliferation of automated THz imaging systems across new application areas e.g., biomedical imaging, industrial processing and quality control where interpretation of hyperspectral images is still under development. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Predicting Social Trust with Binary Logistic Regression
ERIC Educational Resources Information Center
Adwere-Boamah, Joseph; Hufstedler, Shirley
2015-01-01
This study used binary logistic regression to predict social trust with five demographic variables from a national sample of adult individuals who participated in The General Social Survey (GSS) in 2012. The five predictor variables were respondents' highest degree earned, race, sex, general happiness and the importance of personally assisting…
Effect of folic acid on appetite in children: ordinal logistic and fuzzy logistic regressions.
Namdari, Mahshid; Abadi, Alireza; Taheri, S Mahmoud; Rezaei, Mansour; Kalantari, Naser; Omidvar, Nasrin
2014-03-01
Reduced appetite and low food intake are often a concern in preschool children, since it can lead to malnutrition, a leading cause of impaired growth and mortality in childhood. It is occasionally considered that folic acid has a positive effect on appetite enhancement and consequently growth in children. The aim of this study was to assess the effect of folic acid on the appetite of preschool children 3 to 6 y old. The study sample included 127 children ages 3 to 6 who were randomly selected from 20 preschools in the city of Tehran in 2011. Since appetite was measured by linguistic terms, a fuzzy logistic regression was applied for modeling. The obtained results were compared with a statistical ordinal logistic model. After controlling for the potential confounders, in a statistical ordinal logistic model, serum folate showed a significantly positive effect on appetite. A small but positive effect of folate was detected by fuzzy logistic regression. Based on fuzzy regression, the risk for poor appetite in preschool children was related to the employment status of their mothers. In this study, a positive association was detected between the levels of serum folate and improved appetite. For further investigation, a randomized controlled, double-blind clinical trial could be helpful to address causality. Copyright © 2014 Elsevier Inc. All rights reserved.
Variable Selection for Road Segmentation in Aerial Images
NASA Astrophysics Data System (ADS)
Warnke, S.; Bulatov, D.
2017-05-01
For extraction of road pixels from combined image and elevation data, Wegner et al. (2015) proposed classification of superpixels into road and non-road, after which a refinement of the classification results using minimum cost paths and non-local optimization methods took place. We believed that the variable set used for classification was to a certain extent suboptimal, because many variables were redundant while several features known to be useful in Photogrammetry and Remote Sensing were missing. This motivated us to implement a variable selection approach which builds a model for classification using portions of training data and subsets of features, evaluates this model, updates the feature set, and terminates when a stopping criterion is satisfied. The choice of classifier is flexible; however, we tested the approach with Logistic Regression and Random Forests, and tailored the evaluation module to the chosen classifier. To guarantee a fair comparison, we kept the segment-based approach and most of the variables from the related work, but we extended them by additional, mostly higher-level features. Applying these superior features, removing the redundant ones, and using more accurately acquired 3D data allowed us to keep the misclassification error stable, or even to reduce it, in a challenging dataset.
NASA Astrophysics Data System (ADS)
Pham, Tuan D.; Watanabe, Yuzuru; Higuchi, Mitsunori; Suzuki, Hiroyuki
2017-02-01
Texture analysis of computed tomography (CT) imaging has been found useful to distinguish subtle differences, which are invisible to human eyes, between malignant and benign tissues in cancer patients. This study implemented two complementary methods of texture analysis, known as the gray-level co-occurrence matrix (GLCM) and the experimental semivariogram (SV) with an aim to improve the predictive value of evaluating mediastinal lymph nodes in lung cancer. The GLCM was explored with the use of a rich set of its derived features, whereas the SV feature was extracted on real and synthesized CT samples of benign and malignant lymph nodes. A distinct advantage of the computer methodology presented herein is the alleviation of the need for an automated precise segmentation of the lymph nodes. Using the logistic regression model, a sensitivity of 75%, specificity of 90%, and area under curve of 0.89 were obtained in the test population. A tenfold cross-validation of 70% accuracy of classifying between benign and malignant lymph nodes was obtained using the support vector machines as a pattern classifier. These results are higher than those recently reported in literature with similar studies.
Law, Tameeka L; Katikaneni, Lakshmi D; Taylor, Sarah N; Korte, Jeffrey E; Ebeling, Myla D; Wagner, Carol L; Newman, Roger B
2012-07-01
Compare customized versus population-based growth curves for identification of small-for-gestational-age (SGA) and body fat percent (BF%) among preterm infants. Prospective cohort study of 204 preterm infants classified as SGA or appropriate-for-gestational-age (AGA) by population-based and customized growth curves. BF% was determined by air-displacement plethysmography. Differences between groups were compared using bivariable and multivariable linear and logistic regression analyses. Customized curves reclassified 30% of the preterm infants as SGA. SGA infants identified by customized method only had significantly lower BF% (13.8 ± 6.0) than the AGA (16.2 ± 6.3, p = 0.02) infants and similar to the SGA infants classified by both methods (14.6 ± 6.7, p = 0.51). Customized growth curves were a significant predictor of BF% (p = 0.02), whereas population-based growth curves were not a significant independent predictor of BF% (p = 0.50) at term corrected gestational age. Customized growth potential improves the differentiation of SGA infants and low BF% compared with a standard population-based growth curve among a cohort of preterm infants.
The application of cat swarm optimisation algorithm in classifying small loan performance
NASA Astrophysics Data System (ADS)
Kencana, Eka N.; Kiswanti, Nyoman; Sari, Kartika
2017-10-01
It is common for banks to analyse the feasibility of a credit application before approving it. Although this process is carried out carefully, there is no guarantee that all credits will be repaid smoothly. This study aimed to determine the accuracy of the Cat Swarm Optimisation (CSO) algorithm in classifying the performance of small loans approved by Bank Rakyat Indonesia (BRI), one of several public banks in Indonesia. Data collected from 200 lenders were used in this work. The data matrix consists of 9 independent variables that represent the profile of the credit and one categorical dependent variable that reflects the credit’s performance. Prior to the analyses, the data were divided into two subsets of equal size. An ordinal logistic regression (OLR) procedure was applied to the first subset and showed that 3 of the 9 independent variables, i.e. the amount of credit, the credit’s period, and the lender’s monthly income, significantly affect credit performance. The CSO procedure was then started, using the significant parameter estimates from the OLR procedure as initial values for the observations in the second subset. This procedure gave a classification accuracy of 76 percent for credit performance, slightly better than the 64 percent obtained from the OLR procedure.
Cough sound analysis - a new tool for diagnosing pneumonia.
Abeyratne, U R; Swarnkar, V; Triasih, Rina; Setyati, Amalia
2013-01-01
Pneumonia kills over 1,800,000 children annually throughout the world. Prompt diagnosis and proper treatment are essential to prevent these unnecessary deaths. Reliable diagnosis of childhood pneumonia in remote regions is fraught with difficulties arising from the lack of field-deployable imaging and laboratory facilities as well as the scarcity of trained community healthcare workers. In this paper, we present a pioneering class of enabling technology addressing both of these problems. Our approach is centered on automated analysis of cough and respiratory sounds, collected via microphones that do not require physical contact with subjects. We collected cough sounds from 91 patients suspected of acute respiratory illness such as pneumonia, bronchiolitis and asthma. We extracted mathematical features from cough sounds and used them to train a Logistic Regression classifier. We used the clinical diagnosis provided by the paediatric respiratory clinician as the gold standard to train and validate our classifier against. The methods proposed in this paper could separate pneumonia from other diseases at a sensitivity and specificity of 94% and 75% respectively, based on parameters extracted from cough sounds alone. Our method has the potential to revolutionize the management of childhood pneumonia in remote regions of the world.
Automatic Gleason grading of prostate cancer using quantitative phase imaging and machine learning
NASA Astrophysics Data System (ADS)
Nguyen, Tan H.; Sridharan, Shamira; Macias, Virgilia; Kajdacsy-Balla, Andre; Melamed, Jonathan; Do, Minh N.; Popescu, Gabriel
2017-03-01
We present an approach for automatic diagnosis of tissue biopsies. Our methodology consists of a quantitative phase imaging tissue scanner and machine learning algorithms to process these data. We illustrate the performance by automatic Gleason grading of prostate specimens. The imaging system operates on the principle of interferometry and, as a result, reports on the nanoscale architecture of the unlabeled specimen. We use these data to train a random forest classifier to learn textural behaviors of prostate samples and classify each pixel in the image into different classes. Automatic diagnosis results were computed from the segmented regions. By combining morphological features with quantitative information from the glands and stroma, logistic regression was used to discriminate regions with Gleason grade 3 versus grade 4 cancer in prostatectomy tissue. The overall accuracy of this classification derived from a receiver operating curve was 82%, which is in the range of human error when interobserver variability is considered. We anticipate that our approach will provide a clinically objective and quantitative metric for Gleason grading, allowing us to corroborate results across instruments and laboratories and feed the computer algorithms for improved accuracy.
Lacherez, Philippe; Wood, Joanne M; Anstey, Kaarin J; Lord, Stephen R
2014-02-01
To establish whether sensorimotor function and balance are associated with on-road driving performance in older adults. The performance of 270 community-living adults aged 70-88 years recruited via the electoral roll was measured on a battery of peripheral sensation, strength, flexibility, reaction time, and balance tests and on a standardized measure of on-road driving performance. Forty-seven participants (17.4%) were classified as unsafe based on their driving assessment. Unsafe driving was associated with reduced peripheral sensation, lower limb weakness, reduced neck range of motion, slow reaction time, and poor balance in univariate analyses. Multivariate logistic regression analysis identified poor vibration sensitivity, reduced quadriceps strength, and increased sway on a foam surface with eyes closed as significant and independent risk factors for unsafe driving. These variables classified participants into safe and unsafe drivers with a sensitivity of 74% and specificity of 70%. A number of sensorimotor and balance measures were associated with driver safety and the multivariate model comprising measures of sensation, strength, and balance was highly predictive of unsafe driving in this sample. These findings highlight important determinants of driver safety and may assist in developing efficacious driver safety strategies for older drivers.
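Sensitivity and specificity, as reported for the multivariate model here, are the proportions of truly unsafe and truly safe drivers that the model classifies correctly. A minimal sketch of the calculation from paired true and predicted labels follows; the function name and the toy labels are assumptions for illustration, not data from the study.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive case correctly flagged
            else:
                fn += 1  # positive case missed
        else:
            if pred == positive:
                fp += 1  # negative case wrongly flagged
            else:
                tn += 1  # negative case correctly cleared
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: 1 = unsafe driver, 0 = safe driver.
toy_truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
example_sens, example_spec = sensitivity_specificity(toy_truth, toy_pred)
```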
Guerra, Jorge; Uddin, Jasim; Nilsen, Dawn; Mclnerney, James; Fadoo, Ammarah; Omofuma, Isirame B.; Hughes, Shatif; Agrawal, Sunil; Allen, Peter; Schambra, Heidi M.
2017-01-01
There currently exist no practical tools to identify functional movements in the upper extremities (UEs). This absence has limited the precise therapeutic dosing of patients recovering from stroke. In this proof-of-principle study, we aimed to develop an accurate approach for classifying UE functional movement primitives, which comprise functional movements. Data were generated from inertial measurement units (IMUs) placed on upper body segments of older healthy individuals and chronic stroke patients. Subjects performed activities commonly trained during rehabilitation after stroke. Data processing involved the use of a sliding window to obtain statistical descriptors, and resulting features were processed by a Hidden Markov Model (HMM). The likelihoods of the states, resulting from the HMM, were segmented by a second sliding window and their averages were calculated. The final predictions were mapped to human functional movement primitives using a Logistic Regression algorithm. Algorithm performance was assessed with a leave-one-out analysis, which determined its sensitivity, specificity, and positive and negative predictive values for all classified primitives. In healthy control and stroke participants, our approach identified functional movement primitives embedded in training activities with, on average, 80% precision. This approach may support functional movement dosing in stroke rehabilitation. PMID:28813877
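The first processing step described here, a sliding window that turns a raw sensor stream into statistical descriptors, can be sketched as follows. The choice of descriptors (mean, standard deviation, minimum, maximum), the window length and the step are assumptions made for illustration; the abstract does not list the descriptors actually used, and the later HMM and logistic regression stages are not shown.

```python
from statistics import mean, pstdev


def sliding_window_features(signal, window, step):
    """Return one (mean, std, min, max) tuple per window of the signal."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    features = []
    # Only full windows are used; a trailing partial window is dropped.
    for start in range(0, len(signal) - window + 1, step):
        chunk = signal[start:start + window]
        features.append((mean(chunk), pstdev(chunk), min(chunk), max(chunk)))
    return features


# Hypothetical one-axis accelerometer samples, for illustration only.
toy_signal = [1.0, 2.0, 3.0, 4.0, 5.0]
example_features = sliding_window_features(toy_signal, window=3, step=1)
```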
Syringe Sharing in Drug Injecting Dyads: A Cross-Classified Multilevel Analysis of Social Networks.
Shahesmaeili, Armita; Mirzazadeh, Ali; McFarland, Willi; Sharifi, Hamid; Haghdoost, Ali Akbar; Soori, Hamid
2018-05-15
We examined the association of dyadic-level factors with syringe sharing among people who inject drugs (PWID) in Kerman, Iran. In a cross-sectional study, we collected data on 329 drug-injecting dyads by individual face-to-face interviews. An injecting dyad was defined as 2 PWID who knew each other and injected drugs together during the last 6 months. If they reported at least 1 occasion of syringe sharing, the dyad was considered high-risk. Dyadic-level factors associated with syringe sharing were assessed using cross-classified multilevel logistic regression. The rate of syringe sharing was significantly higher for dyads who were more intimate (adjusted odds ratio [AOR] 4.5, 95% CI 2.3-8.6), who had instrumental support (AOR 2.1, 95% CI 1.1-4.5), and who pooled money for drugs (AOR 4.1, 95% CI 2.0-8.3). The rate was lower in same-sex dyads (AOR 0.4, 95% CI 0.2-0.9) and in dyads who shared health information (AOR 0.5, 95% CI 0.2-0.9). Findings highlight close-peer influences on syringe-sharing behavior.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Band, P.; Feldstein, M.; Saccomanno, G.
To assess the effect of cigarette smoking and of exposure to radon daughters, a prospective survey consisting of periodic sputum cytology evaluation was initiated among 249 underground uranium miners and 123 male controls. Sputum cytology specimens showing moderate atypia, marked atypia, or cancer cells were classified as abnormal. As compared to control smokers, miners who smoke had a significantly higher incidence of abnormal cytology (P = 0.025). For miner smokers, the observed frequencies of abnormal cytology were linearly related to cumulative exposure to radon daughters and to the number of years of uranium mining. A statistical model relating the probability of abnormal cytology to the risk factors was investigated using a binary logistic regression. The estimated frequency of abnormal cytology was significantly dependent, for controls, on the duration of cigarette smoking, and for miners, on the duration of cigarette smoking and of uranium mining.
Antin, Jonathan F.; Stanley, Laura M.; Guo, Feng
2011-01-01
The purpose of this research effort was to compare older driver and non-driver functional impairment profiles across some 60 assessment metrics in an initial effort to contribute to the development of fitness-to-drive assessment models. Of the metrics evaluated, 21 showed statistically significant differences, almost all favoring the drivers. Also, it was shown that a logistic regression model comprised of five of the assessment scores could completely and accurately separate the two groups. The results of this study imply that older drivers are far less functionally impaired than non-drivers of similar ages, and that a parsimonious model can accurately assign individuals to either group. With such models, any driver classified or diagnosed as a non-driver would be a strong candidate for further investigation and intervention. PMID:22058607
A random forest model based classification scheme for neonatal amplitude-integrated EEG.
Chen, Weiting; Wang, Yu; Cao, Guitao; Chen, Guoqiang; Gu, Qiufang
2014-01-01
Modern medical advances have greatly increased the survival rate of infants, while they remain in the higher risk group for neurological problems later in life. For the infants with encephalopathy or seizures, identification of the extent of brain injury is clinically challenging. Continuous amplitude-integrated electroencephalography (aEEG) monitoring offers a possibility to directly monitor the brain functional state of the newborns over hours, and has seen an increasing application in neonatal intensive care units (NICUs). This paper presents a novel combined feature set of aEEG and applies the random forest (RF) method to classify aEEG tracings. To that end, a series of experiments were conducted on 282 aEEG tracing cases (209 normal and 73 abnormal ones). Basic features, statistic features and segmentation features were extracted from both the tracing as a whole and the segmented recordings, and then formed into a combined feature set. All the features were sent to a classifier afterwards. The significance of features, the data segmentation, the optimization of RF parameters, and the problem of imbalanced datasets were examined through experiments. Experiments were also done to evaluate the performance of RF on aEEG signal classifying, compared with several other widely used classifiers including SVM-Linear, SVM-RBF, ANN, Decision Tree (DT), Logistic Regression (LR), ML, and LDA. The combined feature set can better characterize aEEG signals, compared with basic features, statistic features and segmentation features respectively. With the combined feature set, the proposed RF-based aEEG classification system achieved a correct rate of 92.52% and a high F1-score of 95.26%. Among all of the seven classifiers examined in our work, the RF method got the highest correct rate, sensitivity, specificity, and F1-score, which means that RF outperforms all of the other classifiers considered here. The results show that the proposed RF-based aEEG classification system with the combined feature set is efficient and helpful to better detect the brain disorders in newborns.
Figueiredo, C P; Domiciano, D S; Lopes, J B; Caparbo, V F; Scazufca, M; Bonfá, E; Pereira, R M R
2014-02-01
Sarcopenia is an aging syndrome that can be characterized by many criteria adjusted or not by fat mass. This study suggested that the optimal criteria should be selected according to body mass index (BMI) in older men and identified age, BMI, race, smoking, physical activity, hip bone mineral density (BMD) as risk factors for this syndrome. This study aims to analyze the prevalence of sarcopenia and associated risk factors using appendicular skeletal mass (ASM)/height² and ASM adjusted for total fat mass criteria in older men from community. Three hundred ninety-nine men were included and answered a questionnaire about lifestyle and medical history. Individuals were classified by their BMI using the classification adjusted by age. Body composition and bone mineral density were measured by dual X-ray absorptiometry. Sarcopenia was classified according to both criteria. Logistic regression models were used to analyze risk factors associated with sarcopenia. The mean BMI was 26.46 kg/m²: 12.5 % underweight, 43.6 % normal, and 43.9 % overweight/obese. Fifty-four (13.5 %) were considered sarcopenic by ASM/height² and 79 (19.8 %) by ASM adjusted for fat (p = 0.001). Fifty-one (12.8 %) individuals had discordant sarcopenia classification: 13 were classified only by ASM/height² and 38 only by ASM adjusted for fat. Of the 13 subjects classified as sarcopenic only by ASM/height², 84.6 % (11/13) were underweight and solely one (7.7 %) was considered overweight/obese. In contrast, of those 38 older men classified as sarcopenic only by ASM adjusted for fat, none were underweight and 53 % (20/38) were overweight/obese. Subjects classified as sarcopenic according to both criteria had the same risk factors in the final model analyses (age, BMI, race, smoking, physical activity, hip BMD; p < 0.05). This study suggested that the optimal criteria for sarcopenia should be selected according to BMI in community-dwelling older men.
On the design of classifiers for crop inventories
NASA Technical Reports Server (NTRS)
Heydorn, R. P.; Takacs, H. C.
1986-01-01
Crop proportion estimators that use classifications of satellite data to correct, in an additive way, a given estimate acquired from ground observations are discussed. A linear version of these estimators is optimal, in terms of minimum variance, when the regression of the ground observations onto the satellite observations is linear. When this regression is not linear, but the reverse regression (satellite observations onto ground observations) is linear, the estimator is suboptimal but still has certain appealing variance properties. In this paper expressions are derived for those regressions which relate the intercepts and slopes to conditional classification probabilities. These expressions are then used to discuss the question of classifier designs that can lead to low-variance crop proportion estimates. Variance expressions for these estimates in terms of classifier omission and commission errors are also derived.
Suzuki, Takashi; Motojima, Sayaka; Saito, Shu; Ishii, Takao; Ryu, Keinosuke; Ryu, Junnosuke; Tokuhashi, Yasuaki
2013-11-01
The type of osteoarthritis and the degree of severity which causes restriction of knee range of motion (ROM) is still largely unknown. The objective of this study was to analyse the location and the degree of cartilage degeneration that affect knee range of motion and the connection, if any, between femorotibial angle (FTA) and knee ROM restriction. Four hundred and fifty-six knees in 230 subjects with knee osteoarthritis undergoing knee arthroplasty were included. Articular surface was divided into eight sections, and cartilage degeneration was evaluated macroscopically during the operation. Cartilage degeneration was classified into four grades based on the degree of exposure of subchondral bone. A Pearson correlation was conducted between FTA and knee flexion angle to determine whether a high degree of FTA caused knee flexion restriction. A logistic regression analysis was also conducted to detect the locations and levels of cartilage degeneration causing knee flexion restriction. No correlation was found between FTA and flexion angle (r = -0.08). Flexion angle was not restricted with increasing FTA. Logistic regression analysis showed significant correlation between restricted knee ROM and levels of knee cartilage degeneration in the patella (odds ratio (OR) = 1.77; P = 0.01), the lateral femoral condyle (OR = 1.62; P = 0.03) and the posterior medial femoral condyle (OR = 1.80; P = 0.03). For clinical relevance, soft tissue release and osteophyte resection around the patella, lateral femoral condyle and posterior medial femoral condyle might be indicated to obtain a higher degree of knee flexion angle.
Nutrient intake and use of dietary supplements among US adults with disabilities.
An, Ruopeng; Chiu, Chung-Yi; Andrade, Flavia
2015-04-01
Physical, mental, social, and financial hurdles in adults with disabilities may limit their access to adequate nutrition. To examine the impact of dietary supplement use on daily total nutrient intake levels among US adults 20 years and older with disabilities. Study sample came from 2007-2008 and 2009-2010 waves of the National Health and Nutrition Examination Survey, a nationally representative repeated cross-sectional survey. Disability was classified into 5 categories using standardized indices. Nutrient intakes from foods and dietary supplements were calculated from 2 nonconsecutive 24-hour dietary recalls. Two-sample proportion tests and multiple logistic regressions were used to examine the adherence rates to the recommended daily nutrient intake levels between dietary supplement users and nonusers in each disability category. The association between sociodemographic characteristics and dietary supplement use was assessed using multiple logistic regressions, accounting for complex survey design. A substantial proportion of the US adult population with disabilities failed to meet dietary guidelines, with insufficient intakes of multiple nutrients. Over half of the US adults with disabilities used dietary supplements. Dietary supplement use was associated with higher adherence rates for vitamin A, vitamin B1, vitamin B2, vitamin B6, vitamin B12, vitamin C, vitamin D, vitamin E, calcium, copper, iron, magnesium, and zinc intake among adults with disabilities. Women, non-Hispanic Whites, older age, higher education, and higher household income were found to predict dietary supplement use. Proper use of dietary supplements under the guidance of health care providers may improve the nutritional status among adults with disabilities. Copyright © 2015 Elsevier Inc. All rights reserved.
Cosmic Radiation and Cataracts in Airline Pilots
NASA Astrophysics Data System (ADS)
Rafnsson, V.; Olafsdottir, E.; Hrafnkelsson, J.; de Angelis, G.; Sasaki, H.; Arnarson, A.; Jonasson, F.
Nuclear cataracts have been associated with ionising radiation exposure in previous studies. A population based case-control study on airline pilots has been performed to investigate whether employment as a commercial pilot and consequent exposure to cosmic radiation were associated with lens opacification, when adjusted for known risk factors for cataracts. Cases of opacification of the ocular lens were found in surveys among pilots and a random sample of the Icelandic population. Altogether 445 male subjects underwent a detailed eye examination and answered a questionnaire. Information from the airline company on the 79 pilots' employment time, annual hours flown per aircraft type, the timetables and the flight profiles made calculation of individual cumulated radiation dose (mSv) possible. Lens opacification was classified and graded according to the WHO simplified cataracts grading system using a slit lamp. The odds ratio from logistic regression of nuclear cataracts risk among cases and controls was 3.02 (95% CI 1.44 to 6.35) for pilots compared with non-pilots, adjusted for age, smoking and sunbathing habits, whereas that of cortical cataracts risk among cases and controls was lower than unity (non significant) for pilots compared with non-pilots in a logistic regression analysis adjusted for the same factors. Length of employment as a pilot and cumulated radiation dose (mSv) were significantly related to the risk of nuclear cataracts. So the association between radiation exposure of pilots and the risk of nuclear cataracts, adjusted for age, smoking and sunbathing habits, indicates that cosmic radiation may be a cause of nuclear cataract among commercial pilots.
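As a reading aid for the reported odds ratio of 3.02 (95% CI 1.44 to 6.35), the sketch below shows how such an interval relates to a standard error on the log-odds scale. It assumes the interval is a standard Wald interval (estimate plus or minus 1.96 standard errors on the log scale), which the abstract does not state; the function names are hypothetical.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR) implied by a Wald confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ci_from_or(odds_ratio, se, z=1.96):
    """Wald confidence interval for an odds ratio, built on the log scale."""
    centre = math.log(odds_ratio)
    return math.exp(centre - z * se), math.exp(centre + z * se)


# Values reported in the abstract above.
se = se_from_ci(1.44, 6.35)
low, high = ci_from_or(3.02, se)
```

Under that assumption the implied standard error of the log odds ratio is roughly 0.38, and rebuilding the interval from the point estimate reproduces the reported limits to within rounding.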
Sakurai, Ryota; Kawai, Hisashi; Yoshida, Hideyo; Fukaya, Taro; Suzuki, Hiroyuki; Kim, Hunkyung; Hirano, Hirohiko; Ihara, Kazushige; Obuchi, Shuichi; Fujiwara, Yoshinori
2016-01-01
Background The health benefits of bicycling in older adults with mobility limitation (ML) are unclear. We investigated ML and functional capacity of older cyclists by evaluating their instrumental activities of daily living (IADL), intellectual activity, and social function. Methods On the basis of interviews, 614 community-dwelling older adults (after excluding 63 participants who never cycled) were classified as cyclists with ML, cyclists without ML, non-cyclists with ML (who ceased bicycling due to physical difficulties), or non-cyclists without ML (who ceased bicycling for other reasons). A cyclist was defined as a person who cycled at least a few times per month, and ML was defined as difficulty walking 1 km or climbing stairs without using a handrail. Functional capacity and physical ability were evaluated by standardized tests. Results Regular cycling was documented in 399 participants, and 74 of them (18.5%) had ML; among non-cyclists, 49 had ML, and 166 did not. Logistic regression analysis for evaluating the relationship between bicycling and functional capacity revealed that non-cyclists with ML were more likely to have reduced IADL and social function compared to cyclists with ML. However, logistic regression analysis also revealed that the risk of bicycle-related falls was significantly associated with ML among older cyclists. Conclusions The ability and opportunity to bicycle may prevent reduced IADL and social function in older adults with ML, although older adults with ML have a higher risk of falls during bicycling. It is important to develop a safe environment for bicycling for older adults. PMID:26902165
Knowledge of HIV testing and attitudes towards blood donation at three blood centres in Brazil
Miranda, C.; Moreno, E.; Bruhn, R.; Larsen, N. M.; Wright, D. J.; Oliveira, C. D. L.; Carneiro-Proietti, A. B. F.; Loureiro, P.; de Almeida-Neto, C.; Custer, B.; Sabino, E. C.; Gonçalez, T. T.
2015-01-01
Background Reducing risk of HIV window period transmission requires understanding of donor knowledge and attitudes related to HIV and risk factors. Study Design and Methods We conducted a survey of 7635 presenting blood donors at three Brazilian blood centres from 15 October through 20 November 2009. Participants completed a questionnaire on HIV knowledge and attitudes about blood donation. Six questions about blood testing and HIV were evaluated using maximum likelihood chi-square and logistic regression. Test seeking was classified in non-overlapping categories according to answers to one direct and two indirect questions. Results Overall, respondents were male (64%) repeat donors (67%) between 18 and 49 years old (91%). Nearly 60% believed blood centres use better HIV tests than other places; however, 42% were unaware of the HIV window period. Approximately 50% believed it was appropriate to donate to be tested for HIV, but 67% said it was not acceptable to donate with risk factors even if blood is tested. Logistic regression found that less education, Hemope-Recife blood centre, replacement, potential and self-disclosed test-seeking were associated with less HIV knowledge. Conclusion HIV knowledge related to blood safety remains low among Brazilian blood donors. A subset finds it appropriate to be tested at blood centres and may be unaware of the HIV window period. These donations may impose a significant risk to the safety of the blood supply. Decreasing test-seeking and changing beliefs about the appropriateness of individuals with behavioural risk factors donating blood could reduce the risk of transfusing an infectious unit. PMID:24313562
Sasang Constitution as a Risk Factor for Diabetes Mellitus: A Cross-Sectional Study
Lee, Tae-Gyu; Koh, Byunghee
2009-01-01
Sasang Constitutional Medicine, which is a branch of traditional Korean medicine, states that medications for diabetes should be individualized according to the patient's individual constitution. However, the effect of constitution on diabetes has not been evaluated to date. Therefore, this study was conducted to determine if constitution is an independent risk factor for diabetes by comparing the prevalence and odds ratios (ORs) of the disease according to constitution. The medical records of 1443 adults who had been examined and classified based on their constitution at Kyung Hee University Hospital in Seoul, Korea were reviewed. A chi-squared test and Fisher's exact test were used to compare the prevalence of diabetes according to constitution, and multiple logistic regression was used to calculate the ORs for diabetes. The prevalence of diabetes differed significantly according to constitution (χ2 = 36.20, df = 2, P < 0.001). Specifically, the prevalence of the disease was higher in Tae-eumin (11.4%) individuals than in Soyangin (5.0%) or Soeumin (1.7%) individuals. In addition, multiple logistic regression revealed that Tae-eumin individuals had a greater risk for diabetes than Soeumin individuals. When compared to Soeumin individuals, the adjusted ORs were 2.01 (95% CI 0.77–5.26) for Soyangin individuals and 3.96 (95% CI 1.48–10.60) for Tae-eumin individuals. These results show that constitution has a significant and independent association with diabetes, which suggests that constitution is an independent risk factor for diabetes that should be considered when attempting to detect and prevent the disease. PMID:19745018
Impact of cognitive function on oral perception in independently living older people.
Fukutake, Motoyoshi; Ogawa, Taiji; Ikebe, Kazunori; Mihara, Yusuke; Inomata, Chisato; Takeshita, Hajime; Matsuda, Kenichi; Hatta, Kodai; Gondo, Yasuyuki; Masui, Yukie; Inagaki, Hiroki; Arai, Yasumichi; Kamide, Kei; Ishizaki, Tatsuro; Maeda, Yoshinobu
2018-04-10
Oral tactile perception is important for better mastication, appetite, and enjoyment of food. However, previous investigations have not utilized comprehensible variables thought to have negative effect on oral perception, including aging, denture wearing, and cognitive function. The aim of this study was to elucidate the impact of cognitive function on oral perception in independently living older individuals. The study sample was comprised of 987 participants (466 males, 521 females; age 69-71 years). Oral examinations, assessments of cognitive function in preclinical level by Montreal Cognitive Assessment (MoCA)-J, and determination of oral stereognostic ability as an indicator of oral perception were performed. Related variables were selected by univariate analyses; then, multivariate logistic regression model analysis was conducted. Univariate analyses revealed that number of teeth, removable dentures usage, and cognitive function respectively had a significant relationship with stereognostic score. Next, the subjects were classified into good and poor perception groups (lowest 17.4%) according to oral stereognostic ability. Logistic regression analysis revealed that lower cognitive function was significantly associated with poor oral perception (OR = 0.934, p = 0.017) after controlling for other variables. Cognitive decline even in preclinical stage was associated with reduced oral perception after controlling for gender, tooth number and denture use in independent living older people. This study suggested that preclinical level of change in cognitive function affected oral perception. Dental practitioners and caregivers may need to pay attention to reduced oral perception among older people even if they do not have trouble in daily life.
Contreras-Manzano, Alejandra; Villalpando, Salvador; Robledo-Pérez, Ricardo
2017-01-01
To describe the prevalence of Vitamin D deficiency (VDD) and insufficiency (VDI), and the main dietary sources of vitamin D (VD) in a probabilistic sample of Mexican women of reproductive age participating in Ensanut 2012, stratified by sociodemographic factors and body mass index (BMI) categories. Serum concentrations of 25-hydroxyvitamin-D (25-OH-D) were determined using an ELISA technique in 4162 women participants of Ensanut 2012 and classified as VDD, VDI or optimal VD status. Sociodemographic, anthropometric and dietary data were also collected. The association between VDD/VDI and sociodemographic and anthropometric factors was assessed adjusting for potential confounders through an estimation of a multinomial logistic regression model. The prevalence of VDD was 36.8%, and that of VDI was 49.8%. The mean dietary intake of VD was 2.56 μg/d. The relative risk ratio (RRR) of VDD or VDI was calculated by a multinomial logistic regression model in 4162 women. The RRR of VDD or VDI were significantly higher in women with overweight (RRR: 1.85 and 1.44, p<0.05), obesity (RRR: 2.94 and 1.93, p<0.001), urban dwelling (RRR:1.68 and 1.31, p<0.06), belonging to the 3rd tertile of income (RRR: 5.32 and 2.22, p<0.001), or of indigenous ethnicity (RRR: 2.86 and 1.70, p<0.05), respectively. The high prevalence of VDD/VDI in Mexican women calls for stronger actions from the health authorities, strengthening the current policy of food supplementation and recommending a reasonable amount of sun exposure.
Succi, Regina C. M.; Krauss, Margot R.; Harris, D. Robert; Machado, Daisy M.; de Moraes-Pinto, Maria Isabel; Mussi-Pinhata, Marisa M.; Ruz, Noris Pavia; Pierre, Russell B.; Kolevic, Lenka; Joao, Esau; Foradori, Irene; Hazra, Rohan
2013-01-01
Background Perinatally HIV-infected children (PHIV) may be at risk of undervaccination. Vaccination coverage rates among PHIV and HIV-exposed uninfected children (HEU) in Latin America and the Caribbean were compared. Methods All PHIV and HEU children born from 2002–2007 that were enrolled in a multi-site observational study conducted in Latin America and the Caribbean were included in this analysis. Children were classified as up to date (UTD) if they had received the recommended number of doses of each vaccine at the appropriate intervals by 12 and 24 months of age. Fisher’s exact test was used to analyze the data. Covariates potentially associated with a child’s HIV status were considered in multivariable logistic regression modeling. Results Of 1156 eligible children, 768 (66.4%) were HEU and 388 (33.6%) were PHIV. HEU children were significantly (p<0.01) more likely to be UTD by 12 and 24 months of age for all vaccines examined. Statistically significant differences persisted when the analyses were limited to children enrolled prior to 12 months of age. Controlling for birth weight, sex, primary caregiver education and any use of tobacco, alcohol or illegal drugs during pregnancy did not contribute significantly to the logistic regression models. Conclusions PHIV children were significantly less likely than HEU children to be UTD for their childhood vaccinations at 12 and 24 months of age, even when limited to children enrolled before 12 months of age. Strategies to increase vaccination rates in PHIV are needed. PMID:23860480
Patil, Radhika; Uusi-Rasi, Kirsti; Kannus, Pekka; Karinkanta, Saija; Sievänen, Harri
2014-01-01
Fear of falling has been linked to activity restriction, functional decline, decreased quality of life and increased risk of falling. Factors that distinguish persons with a high concern about falling from those with low concern have not been systematically studied. This study aimed to expose potential health-related, functional and psychosocial factors that correlate with fear of falling among independently living older women who had fallen in the past year. Baseline data of 409 women aged 70-80 years recruited to a randomised falls prevention trial (DEX) (NCT00986466) were used. Participants were classified according to their level of concern about falling using the Falls Efficacy Scale International (FES-I). Multinomial logistic regression analyses were performed to explore associations between health-related variables, functional performance tests, amount of physical activity, quality of life and FES-I scores. 68% of the participants reported a moderate to high concern (FES-I ≥ 20) about falls. Multinomial logistic regression showed that highly concerned women were significantly more likely to have poorer health and quality of life and lower functional ability. Reported difficulties in instrumental activities of daily living, balance, outdoor mobility and poorer quality of life contributed independently to a greater concern about falling. Concern about falling was highly prevalent in our sample of community-living older women. In particular, poor perceived general health and mobility constraints contributed independently to the difference between high and low concern of falling. Knowledge of these associations may help in developing interventions to reduce fear of falling and activity avoidance in old age.
Ultrasonographic Diagnosis of Biliary Atresia Based on a Decision-Making Tree Model.
Lee, So Mi; Cheon, Jung-Eun; Choi, Young Hun; Kim, Woo Sun; Cho, Hyun-Hae; Cho, Hyun-Hye; Kim, In-One; You, Sun Kyoung
2015-01-01
To assess the diagnostic value of various ultrasound (US) findings and to make a decision-tree model for US diagnosis of biliary atresia (BA). From March 2008 to January 2014, the following US findings were retrospectively evaluated in 100 infants with cholestatic jaundice (BA, n = 46; non-BA, n = 54): length and morphology of the gallbladder, triangular cord thickness, hepatic artery and portal vein diameters, and visualization of the common bile duct. Logistic regression analyses were performed to determine the features that would be useful in predicting BA. Conditional inference tree analysis was used to generate a decision-making tree for classifying patients into the BA or non-BA groups. Multivariate logistic regression analysis showed that abnormal gallbladder morphology and greater triangular cord thickness were significant predictors of BA (p = 0.003 and 0.001; adjusted odds ratio: 345.6 and 65.6, respectively). In the decision-making tree using conditional inference tree analysis, gallbladder morphology and triangular cord thickness (optimal cutoff value of triangular cord thickness, 3.4 mm) were also selected as significant discriminators for differential diagnosis of BA, and gallbladder morphology was the first discriminator. The diagnostic performance of the decision-making tree was excellent, with sensitivity of 100% (46/46), specificity of 94.4% (51/54), and overall accuracy of 97% (97/100). Abnormal gallbladder morphology and greater triangular cord thickness (> 3.4 mm) were the most useful predictors of BA on US. We suggest that the gallbladder morphology should be evaluated first and that triangular cord thickness should be evaluated subsequently in cases with normal gallbladder morphology.
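The two-step evaluation order suggested above (gallbladder morphology first, then triangular cord thickness against the 3.4 mm cutoff) can be written as a small rule. This is an illustrative reading of the abstract, not the published tree: the leaf assignments and the function name `classify_infant` are assumptions, and the sketch is not a clinical tool. The reported performance figures are recomputed from the counts given in the abstract.

```python
def classify_infant(abnormal_gallbladder: bool, cord_thickness_mm: float,
                    cutoff_mm: float = 3.4) -> str:
    """Hypothetical two-split rule following the order described in the abstract."""
    # First discriminator: gallbladder morphology.
    if abnormal_gallbladder:
        return "BA"
    # Second discriminator, applied only when morphology is normal:
    # triangular cord thickness strictly above the cutoff.
    if cord_thickness_mm > cutoff_mm:
        return "BA"
    return "non-BA"


# Performance recomputed from the counts reported in the abstract.
sensitivity = 46 / 46          # all BA cases identified
specificity = 51 / 54          # non-BA cases correctly excluded
accuracy = (46 + 51) / 100     # overall correct classifications
```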
Muto, Satoru; Sugiura, Syo-Ichiro; Nakajima, Akiko; Horiuchi, Akira; Inoue, Masahiro; Saito, Keisuke; Isotani, Shuji; Yamaguchi, Raizo; Ide, Hisamitsu; Horie, Shigeo
2014-10-01
We aimed to identify patients with a chief complaint of hematuria who could safely avoid unnecessary radiation and instrumentation in the diagnosis of bladder cancer (BC), using automated urine flow cytometry to detect isomorphic red blood cells (RBCs) in urine. We acquired urine samples from 134 patients over the age of 35 years with a chief complaint of hematuria and a positive urine occult blood test or microhematuria. The data were analyzed using the UF-1000i® (Sysmex Co., Ltd., Kobe, Japan) automated urine flow cytometer to determine RBC morphology, which was classified as isomorphic or dysmorphic. The patients were divided into two groups (BC versus non-BC) for statistical analysis. Multivariate logistic regression analysis was used to determine the predictive value of flow cytometry versus urine cytology, the bladder tumor antigen test, occult blood in urine test, and microhematuria test. BC was confirmed in 26 of 134 patients (19.4 %). The area under the curve for RBC count using the automated urine flow cytometer was 0.94, representing the highest reference value obtained in this study. Isomorphic RBCs were detected in all patients in the BC group. On multivariate logistic regression analysis, only isomorphic RBC morphology was significantly predictive for BC (p < 0.001). Analytical parameters such as sensitivity, specificity, positive predictive value, and negative predictive value of isomorphic RBCs in urine were 100.0, 91.7, 74.3, and 100.0 %, respectively. Detection of urinary isomorphic RBCs using automated urine flow cytometry is a reliable method in the diagnosis of BC with hematuria.
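The four reported parameters can be checked against a 2x2 table. The abstract gives 26 BC cases among 134 patients and states that isomorphic RBCs were found in all BC patients; the split of the 108 non-BC patients into 9 false positives and 99 true negatives is not reported and is inferred here as the split consistent with the stated percentages.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# tp and fn follow from the abstract (26 BC cases, all test-positive).
# fp=9 and tn=99 are an inferred split of the 108 non-BC patients, not reported values.
metrics = diagnostic_metrics(tp=26, fp=9, fn=0, tn=99)
percentages = {name: round(value * 100, 1) for name, value in metrics.items()}
```

With these counts the rounded percentages match the reported 100.0, 91.7, 74.3 and 100.0 %, which suggests the inferred table is at least consistent with the abstract.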
Ultrasound based computer-aided-diagnosis of kidneys for pediatric hydronephrosis
NASA Astrophysics Data System (ADS)
Cerrolaza, Juan J.; Peters, Craig A.; Martin, Aaron D.; Myers, Emmarie; Safdar, Nabile; Linguraru, Marius G.
2014-03-01
Ultrasound is the mainstay of imaging for pediatric hydronephrosis, though its potential as a diagnostic tool is limited by its subjective assessment and lack of correlation with renal function. Therefore, all cases showing signs of hydronephrosis undergo further invasive studies, like diuretic renogram, in order to assess the actual renal function. Under the hypothesis that renal morphology is correlated with renal function, a new ultrasound based computer-aided diagnosis (CAD) tool for pediatric hydronephrosis is presented. From 2D ultrasound, a novel set of morphological features of the renal collecting systems and the parenchyma is automatically extracted using image analysis techniques. From the original set of features, including size, geometric and curvature descriptors, a subset of ten features is selected as predictive variables, combining a feature selection technique and area under the curve filtering. Using the washout half time (T1/2) as indicative of renal obstruction, two groups are defined. Those cases whose T1/2 is above 30 minutes are considered to be severe, while the rest would be in the safety zone, where diuretic renography could be avoided. Two different classification techniques are evaluated (logistic regression and support vector machines). Adjusting the probability decision thresholds to operate at the point of maximum sensitivity, i.e., preventing any severe case from being misclassified, specificities of 53% and 75% are achieved for the logistic regression and the support vector machine classifier, respectively. The proposed CAD system makes it possible to establish a link between non-invasive non-ionizing imaging techniques and renal function, limiting the need for invasive and ionizing diuretic renography.
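The idea of operating a probabilistic classifier at the point of maximum sensitivity can be sketched as follows: pick the largest decision threshold that still flags every severe case, then read off the specificity at that threshold. The scores and labels below are synthetic and purely illustrative, and the function names are hypothetical; the study's actual threshold-selection procedure is not described in the abstract.

```python
def max_sensitivity_threshold(scores, labels):
    """Largest threshold t such that every positive (severe) case has score >= t."""
    return min(score for score, label in zip(scores, labels) if label == 1)


def specificity_at(scores, labels, threshold):
    """Fraction of negative cases scored strictly below the threshold."""
    negatives = [score for score, label in zip(scores, labels) if label == 0]
    true_negatives = sum(1 for score in negatives if score < threshold)
    return true_negatives / len(negatives)


# Synthetic classifier outputs (probability of "severe"); 1 = severe, 0 = safe.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]

threshold = max_sensitivity_threshold(scores, labels)
specificity = specificity_at(scores, labels, threshold)
```

Here a case is called severe when its score is at or above the threshold, so no severe case is missed by construction; the price is that any safe case scoring above the lowest-scoring severe case becomes a false positive.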
Zhang, Xinyan; Li, Bingzong; Han, Huiying; Song, Sha; Xu, Hongxia; Hong, Yating; Yi, Nengjun; Zhuang, Wenzhuo
2018-05-10
Multiple myeloma (MM), like other cancers, is caused by the accumulation of genetic abnormalities. Heterogeneity exists in the patients' response to treatments, for example, bortezomib. This urges efforts to identify biomarkers from numerous molecular features and build predictive models for identifying patients that can benefit from a certain treatment scheme. However, previous studies treated the multi-level ordinal drug response as a binary response where only responsive and non-responsive groups are considered. It is desirable to directly analyze the multi-level drug response, rather than combining the response to two groups. In this study, we present a novel method to identify significantly associated biomarkers and then develop ordinal genomic classifier using the hierarchical ordinal logistic model. The proposed hierarchical ordinal logistic model employs the heavy-tailed Cauchy prior on the coefficients and is fitted by an efficient quasi-Newton algorithm. We apply our hierarchical ordinal regression approach to analyze two publicly available datasets for MM with five-level drug response and numerous gene expression measures. Our results show that our method is able to identify genes associated with the multi-level drug response and to generate powerful predictive models for predicting the multi-level response. The proposed method allows us to jointly fit numerous correlated predictors and thus build efficient models for predicting the multi-level drug response. The predictive model for the multi-level drug response can be more informative than the previous approaches. Thus, the proposed approach provides a powerful tool for predicting multi-level drug response and has important impact on cancer studies.
Subclinical hypothyroidism and diabetes as risk factors for postoperative stiff shoulder.
Blonna, Davide; Fissore, Francesca; Bellato, Enrico; La Malfa, Marco; Calò, Michel; Bonasia, Davide Edoardo; Rossi, Roberto; Castoldi, Filippo
2017-07-01
Postoperative stiffness can be a disabling condition after arthroscopic shoulder surgery. The purpose of this study was to analyse the potential contribution of subclinical forms of hypothyroidism and diabetes in the development of postoperative shoulder stiffness. A prospective study was conducted on 65 consecutive patients scheduled for arthroscopic subacromial decompression or rotator cuff tear repair. Patients with preoperative stiffness were excluded. Preoperative measurements of free thyroxine, free triiodothyronine, thyroid-stimulating hormone and fasting glycaemia were taken in all patients to detect subclinical forms of diabetes and hypothyroidism. A follow-up was planned at 30, 60, 90 and 180 days after surgery. According to range of motion measurements, postoperative stiffness was classified as severe or moderate at follow-up. Univariate and logistic regression analyses were performed for the assessment of risk factors for stiffness. The overall incidence of postoperative stiffness was 29 % (19/65) in our cohort. Considering only the arthroscopic rotator cuff repairs, this incidence was 23 % (7/31). A new diagnosis of subclinical forms of diabetes or hypothyroidism was made in five cases. All five of these cases developed postoperative stiffness. The logistic regression analysis demonstrated that hypothyroidism was a risk factor for severe stiffness (RR = 25; p = 0.001) and that diabetes was a risk factor for moderate stiffness (RR = 5.7; p = 0.03). The postoperative stiffness in the majority of patients can be predicted by a careful analysis of past medical history and by detecting subclinical forms of hypothyroidism and diabetes. Prognostic study, Level II.
Frequent hospital admissions in Singapore: clinical risk factors and impact of socioeconomic status.
Low, Lian Leng; Tay, Wei Yi; Ng, Matthew Joo Ming; Tan, Shu Yun; Liu, Nan; Lee, Kheng Hock
2018-01-01
Frequent admitters to hospitals are high-cost patients who strain finite healthcare resources. However, the exact risk factors for frequent admissions, which can be used to guide risk stratification and design effective interventions locally, remain unknown. Our study aimed to identify the clinical and sociodemographic risk factors associated with frequent hospital admissions in Singapore. An observational study was conducted using retrospective 2014 data from the administrative database at Singapore General Hospital, Singapore. Variables were identified a priori and included patient demographics, comorbidities, prior healthcare utilisation, and clinical and laboratory variables during the index admission. Multivariate logistic regression analysis was used to identify independent risk factors for frequent admissions. A total of 16,306 unique patients were analysed and 1,640 (10.1%) patients were classified as frequent admitters. On multivariate logistic regression, 16 variables were independently associated with frequent hospital admissions, including age, cerebrovascular disease, history of malignancy, haemoglobin, serum creatinine, serum albumin, and number of specialist outpatient clinic visits, emergency department visits, admissions preceding index admission and medications dispensed at discharge. Patients staying in public rental housing had a 30% higher risk of being a frequent admitter after adjusting for demographics and clinical conditions. Our study, the first to our knowledge to examine the clinical risk factors for frequent admissions in Singapore, validated the use of public rental housing as a sensitive indicator of area-level socioeconomic status in Singapore. These risk factors can be used to identify high-risk patients in the hospital so that they can receive interventions that reduce readmission risk. Copyright: © Singapore Medical Association
Liu, Jianhua; Zeng, Weiqiang; Huang, Chengzhi; Wang, Junjiang; Xu, Lishu; Ma, Dong
2018-05-01
The present study aimed to investigate whether c-mesenchymal epithelial transition factor (C-MET) overexpression combined with RAS (including KRAS, NRAS and HRAS) or BRAF mutations was associated with late distant metastases and the prognosis of patients with colorectal cancer (CRC). A total of 374 patients with stage III CRC were classified into 4 groups based on RAS/BRAF and C-MET status for comprehensive analysis. Mutations in RAS/BRAF were determined using Sanger sequencing and C-MET expression was examined using immunohistochemistry. The associations between RAS/BRAF mutations in combination with C-MET overexpression and clinicopathological variables including survival were evaluated. In addition, their predictive value for late distant metastases was statistically analyzed via logistic regression and receiver operating characteristic analysis. Among 374 patients, mutations in KRAS, NRAS, HRAS, BRAF and C-MET overexpression were observed in 43.9, 2.4, 0.3, 5.9 and 71.9% of cases, respectively. Considering RAS/BRAF mutations and C-MET overexpression, vascular invasion (P=0.001), high carcino-embryonic antigen level (P=0.031) and late distant metastases (P<0.001) were more likely to occur in patients of group 4. Furthermore, survival analyses revealed RAS/BRAF mutations may have a more powerful impact on survival than C-MET overexpression, although they were both predictive factors for adverse prognosis. Further logistic regression suggested that RAS/BRAF mutations and C-MET overexpression may predict late distant metastases. In conclusion, RAS/BRAF mutations and C-MET overexpression may serve as predictive indicators for metastatic behavior and poor prognosis of CRC.
Montero, Javier; Albaladejo, Alberto; Zalba, José-Ignacio
2014-05-01
To evaluate the influence of dental visiting patterns on the dental status and Oral Health-related Quality of Life (OHQoL) of patients visiting the University Clinic of Salamanca (Spain). This cross-sectional study consisted of a clinical oral examination and a questionnaire-based interview in a consecutive sample of patients seeking a dental examination. Patients were classified as problem-based dental attendees (PB) and regular dental attendees (RB). Clinical and OHQoL (OHIP-14 & OIDP) data were compared between groups. Pair-wise comparisons were performed and a Logistic Regression Model was fitted for predicting the Odds Ratio (OR) of being a PB patient. The sample was composed of 255 patients aged 18 to 87 years (mean age: 63.1 ± 12.7; women: 51.8%). The PB patients had a poorer dental status (i.e. caries, periodontal and prosthetic needs), brushed their teeth less, and were significantly more impaired in their OHQoL according to both instruments. The logistic regression coefficients demonstrated that on average the OR of being a PB patient was high in this dental patient sample, but this OR increased significantly if the patient was male (OR = 1.1-5.0) or reported pain-related impacts according to the OHIP and, additionally, the OR decreased significantly as a function of the number of healthy fillings and the number of sextants coded as CPI=0. Regular dental check-ups are associated with better dental status and a better OHQoL after controlling for potentially related confounding factors.
Association of AKI with adverse outcomes in burned military casualties.
Stewart, Ian J; Tilley, Molly A; Cotant, Casey L; Aden, James K; Gisler, Christopher; Kwan, Hana K; McCorcle, Jeffery; Renz, Evan M; Chung, Kevin K
2012-02-01
Although associated with increased morbidity and mortality, AKI has not been systematically examined in military personnel injured from combat operations in Iraq and Afghanistan. Patients evacuated from Iraq and Afghanistan to a burn unit were examined. AKI was classified by the Acute Kidney Injury Network (AKIN) and Risk-Injury-Failure-Loss-End Stage (RIFLE) schemas. Age, sex, percentage of total body surface area burned (TBSA), percentage of full-thickness burn, inhalation injury, and injury severity score were recorded. Additional data that could be associated with poor outcomes were recorded for patients with TBSA ≥20%. Multivariate logistic regression analyses were performed to determine factors associated with morbidity and mortality. AKI prevalence rates by the RIFLE and AKIN criteria were 23.8% and 29.9%, respectively. After logistic regression, RIFLE categories of risk (odds ratio [OR], 15.34; 95% confidence interval [CI], 1.75-134; P=0.01), injury (OR, 46.28; 95% CI, 5.02-427; P<0.001), and failure (OR, 126; 95% CI, 13.39->999; P<0.001); AKIN-2 (OR, 23.70; 95% CI, 2.32-242; P=0.008); and AKIN-3 (OR, 130; 95% CI, 13.38->999; P<0.001) were significantly associated with death. AKIN-3, injury, and failure remained significant in the subset of patients with ≥20% TBSA. There was also a strong interaction between TBSA and the stage of AKI with respect to ventilator and intensive care unit days. AKI is prevalent in military casualties with burn injury and is independently associated with morbidity and mortality after adjustment for factors associated with injury severity.
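Odds ratios such as those above come from exponentiating logistic regression coefficients. As a rough check, the sketch below backs out a coefficient and standard error from the reported RIFLE "risk" estimate (OR 15.34; 95% CI 1.75-134) and rebuilds the interval. It assumes a Wald-type interval with z = 1.96, which the abstract does not state; the function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence limits from a log-odds coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Reported values for the RIFLE "risk" category (from the abstract).
reported_or, reported_lo, reported_hi = 15.34, 1.75, 134.0

# Assumption: the interval is symmetric on the log scale (Wald interval).
beta = math.log(reported_or)
se = (math.log(reported_hi) - math.log(reported_lo)) / (2 * 1.96)

or_, lo, hi = odds_ratio_ci(beta, se)
print(f"beta={beta:.3f}, se={se:.3f}, OR={or_:.2f}, CI=({lo:.2f}, {hi:.1f})")
```

The rebuilt limits land close to the reported ones, so the published interval is at least consistent with a Wald construction. The width of the interval reflects a large standard error on the log-odds scale.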
NASA Astrophysics Data System (ADS)
Domínguez-Cuesta, María José; Jiménez-Sánchez, Montserrat; Berrezueta, Edgar
2007-09-01
A geomorphological study focussing on slope instability and landslide susceptibility modelling was performed on a 278 km² area in the Nalón River Basin (Central Coalfield, NW Spain). The methodology of the study includes: 1) geomorphological mapping at both 1:5000 and 1:25,000 scales based on air-photo interpretation and field work; 2) Digital Terrain Model (DTM) creation and overlay of geomorphological and DTM layers in a Geographical Information System (GIS); and 3) statistical treatment of variables using SPSS and development of a logistic regression model. A total of 603 mass movements including earth flow and debris flow were inventoried and were classified into two groups according to their size. This study focuses on the first group with small mass movements (10⁰ to 10¹ m in size), which often cause damage to infrastructures and even victims. The detected conditioning factors of these landslides are lithology (soils and colluviums), vegetation (pasture) and topography. DTM analyses show that high instabilities are linked to slopes with NE and SW orientations, curvature values between -6 and -0.7, and slope values from 16° to 30°. Bedrock lithology (Carboniferous sandstone and siltstone), presence of Quaternary soils and sediments, vegetation, and the topographical factors were used to develop a landslide susceptibility model using the logistic regression method. Application of "zoom method" allows us to accurately detect small mass movements using a 5-m grid cell data even if geomorphological mapping is done at a 1:25,000 scale.
Lin, Tao; Meng, Yichen; Li, Tangbo; Jiang, Heng; Gao, Rui; Zhou, Xuhui
2018-01-01
To investigate the factors associated with the recovery process of elderly patients after degenerative lumbar scoliosis surgery. A total of 213 elderly patients who had undergone surgical treatment for degenerative lumbar scoliosis from 2011 to 2015 were included retrospectively in this study. Clinical data and demographics were collected for logistic regression analysis. Among 213 eligible patients, 77 (38.5%) were classified as being in the excellent group, 70 (35%) as showing improvement, 24 (12%) as showing no change, and 29 (14.5%) as having deteriorated. At baseline, patients differed significantly from matched normative data in all Scoliosis Research Society domains. Larger differences from normative values were found for pain and activity domains. After surgery, each domain improved significantly. In the multivariate logistic regression, age 60-70 years (odds ratio [OR], 2.431; 95% confidence interval [CI], 1.143-5.174), and American Society of Anesthesiologists grade <3 (OR, 2.987; 95% CI, 1.519-5.874) may be predictive factors for a satisfying recovery, whereas presence of complications (OR, 0.342; 95% CI, 0.153-0.765), fusion to the sacrum (OR, 0.200; 95% CI, 0.076-0.523), and more osteotomies (OR, 0.360; 95% CI, 0.132-0.985) have negative effects on the recovery process. The factors that affect postoperative recovery in elderly patients with degenerative lumbar scoliosis are age, American Society of Anesthesiologists grade, distal fusion level, presence of complications, and number of osteotomies. Copyright © 2017 Elsevier Inc. All rights reserved.
Contraceptive awareness among men in Bangladesh.
Islam, Mohammad Amirul; Padmadas, Sabu S; Smith, Peter W F
2006-04-01
A considerable gap exists between contraceptive awareness and use. Traditional approaches to measuring awareness are inadequate to properly understand the linkages between awareness and use. The objective of this study was to examine the degree of men's modern contraceptive awareness in Bangladesh and the associated determinants and further testing of a hypothesis that current contraceptive use confers a high degree of method awareness. This study used the couple data set from the Bangladesh Demographic and Health Survey (1999-2000). A two-level, multinomial logistic regression was used with the degree of contraceptive awareness as the dependent variable. The degree of awareness was measured by the reported number of modern contraceptive methods known among men aged 15-59 years. Men's responses on method awareness were classified according to those reported spontaneously and probed. Nearly 100% of the study participants reported having heard of at least one method and about half reported awareness of at least eight different methods of contraception. Multinomial logistic regression analyses showed that older and educated men were more likely to have reported a high degree of awareness. The findings confirmed our hypothesis that current contraceptive use is likely to confer a high degree of modern method awareness among men (p<0.001), after controlling for other important characteristics. Men who had a low degree of contraceptive awareness seem not properly informed of the wide range of contraceptive options. It is imperative that family planning intervention strategies in Bangladesh should focus on the degree and functional knowledge of contraceptive methods to improve the uptake of especially male-based modern methods.
Lim, K K; Chan, Y Y; Noor Ani, A; Rohani, J; Siti Norfadhilah, Z A; Santhi, M R
2017-12-01
The success of the Expanded Program on Immunization among children will greatly reduce the burden of illness and disability from vaccine preventable diseases. The aim of the study was to evaluate the complete immunization coverage and its determinants among children aged 12-23 months in Malaysia. Cross-sectional study. Data on immunization were extracted from the 2016 National Health and Morbidity Survey. Complete immunization coverage was classified as received all recommended primary vaccine doses by the age of 12 months and verified by vaccination cards, and incompletely immunized if they received partially recommended vaccine dose or not received any recommended vaccine dose or had no vaccination card. The multiple logistic regression analyses were conducted to determine the sociodemographic factors associated with complete immunization coverage. The overall complete immunization coverage among children (verified by cards) was 86.4% (n = 8920, 95% confidence interval: 85.4-87.4). Multivariable logistic regression analyses model revealed that factors significantly associated with complete immunization coverage were ethnicity, occupation of the mother, head of household's education level, and head of household's occupation. While sex, citizenship, household income, mother's age, and marital status were not significantly associated with complete immunization coverage. According to the World Health Organization criteria, the present study demonstrated that the immunization coverage of 86.4% is still unsatisfactory. Thus, the current intervention program should be enhanced in order to achieve the 95% coverage for all antigens in the national vaccination program. Copyright © 2017 The Royal Society for Public Health. Published by Elsevier Ltd. All rights reserved.
Secondhand Smoking Is Associated with Poor Mental Health in Korean Adolescents.
Bang, Inho; Jeong, Young-Jin; Park, Young-Yoon; Moon, Na-Yeon; Lee, Junyong; Jeon, Tae-Hee
2017-08-01
In Korea, the prevalence of depression is increasing in adolescents and the most common cause of death of adolescents has been reported as suicide. Although adolescent mental health is worsening, there are few studies on whether secondhand smoking is associated with mental health in adolescents. The objective of this study was to determine whether exposure to secondhand smoke is associated with mental health-related variables, such as depression, stress, and suicide, in Korean adolescents. Data from the eleventh Korea youth risk behavior web-based survey, a nationally representative survey of 62,708 participants (30,964 males and 31,744 females), were analyzed. For students aged 12 to 18 years, extensive data including secondhand smoking, mental health, sociodemographic variables, and physical health were collected. Chi-square analysis, multiple logistic regression analysis and ordered logistic regression analysis were performed to estimate the association and dose-response relation between secondhand smoking and mental health. Compared with the non-exposed group, the odds ratios (OR) of depression, stress, suicidal ideation, suicidal planning and suicidal attempt in the secondhand smoking exposed group were 1.339, 1.192, 1.303, 1.437 and 1.505, respectively (all P < 0.001). When subjects were classified into two secondhand smoke exposure groups, the OR for each mental health-related variable rose with increasing secondhand smoking exposure, in a dose-response relation. Our findings suggest that secondhand smoking is associated with poor mental health such as depression, stress, and suicide, showing a dose-response relation in Korean adolescents.
Grepperud, Sverre; Holman, Per Arne; Wangen, Knut Reidar
2014-12-14
Clinicians at Norwegian community mental health centres assess referrals from general practitioners and classify them into three priority groups (high priority, low priority, and refusal) according to need where need is defined by three prioritization criteria (severity, effect, and cost-effectiveness). In this study, we seek to operationalize the three criteria and analyze to what extent they have an effect on clinical-level priority setting after controlling for clinician characteristics and organisational factors. Twenty anonymous referrals were rated by 42 admission team members employed at 14 community mental health centres in the South-East Health Region of Norway. Intra-class correlation coefficients were calculated and logistic regressions were performed. Variation in clinicians' assessments of the three criteria was highest for effect and cost-effectiveness. An ordered logistic regression model showed that all three criteria for prioritization, three clinician characteristics (education, being a manager or not, and "guideline awareness"), and the centres themselves (fixed effects), explained priority decisions. The relative importance of the explanatory factors, however, depended on the priority decision studied. For the classification of all admitted patients into high- and low-priority groups, all clinician characteristics became insignificant. For the classification of patients, into those admitted and non-admitted, one criterion (effect) and "being a manager or not" became insignificant, while profession ("being a psychiatrist") became significant. Our findings suggest that variation in priority decisions can be reduced by: (i) reducing the disagreement in clinicians' assessments of cost-effectiveness and effect, and (ii) restricting priority decisions to clinicians with a similar background (education, being a manager or not, and "guideline awareness").
Difficulties Reported by HIV-Infected Patients Using Antiretroviral Therapy in Brazil
Guimarães, Mark Drew Crosland; Rocha, Gustavo Machado; Campos, Lorenza Nogueira; de Freitas, Felipe Melo Teixeira; Gualberto, Felipe Augusto Souza; Teixeira, Ramiro d’Ávila Rivelli; de Castilho, Fábio Morato
2008-01-01
OBJECTIVE To describe the degree of difficulty that HIV-infected patients have with therapy treatment. INTRODUCTION Patients’ perceptions about their treatment are a determinant factor for improved adherence and a better quality of life. METHODS Two cross-sectional analyses were conducted in public AIDS referral centers in Brazil among patients initiating treatment. Patients interviewed at baseline, after one month, and after seven months following the beginning of treatment were asked to classify and justify the degree of difficulty with treatment. Logistic regression was used for analysis. RESULTS Among 406 patients initiating treatment, 350 (86.2%) and 209 (51.5%) returned for their first and third visits, respectively. Treatment perceptions ranged from medium to very difficult for 51.4% and 37.3% on the first and third visits, respectively. The main difficulties reported were adverse reactions to the medication and scheduling. A separate logistic regression indicated that the HIV-seropositive status disclosure, symptoms of anxiety, absence of psychotherapy, higher CD4+ cell count (> 200/mm3) and high (> 4) adverse reaction count reported were independently associated with the degree of difficulty in the first visit, while CDC clinical category A, pill burden (> 7 pills), use of other medications, high (> 4) adverse reaction count reported and low understanding of medical orientation showed independent association for the third visit. CONCLUSIONS A significant level of difficulty was observed with treatment. Our analyses suggest the need for early assessment of difficulties with treatment, highlighting the importance of modifiable factors that may contribute to better adherence to the treatment protocol. PMID:18438569
Clustering performance comparison using K-means and expectation maximization algorithms.
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-11-14
Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K-means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K-means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
Delva, J; Spencer, M S; Lin, J K
2000-01-01
This article compares estimates of the relative odds of nitrite use obtained from weighted unconditional logistic regression with estimates obtained from conditional logistic regression after post-stratification and matching of cases with controls by neighborhood of residence. We illustrate these methods by comparing the odds associated with nitrite use among adults of four racial/ethnic groups, with and without a high school education. We used aggregated data from the 1994-B through 1996 National Household Survey on Drug Abuse (NHSDA). Difference between the methods and implications for analysis and inference are discussed.
Distributed neural signatures of natural audiovisual speech and music in the human auditory cortex.
Salmi, Juha; Koistinen, Olli-Pekka; Glerean, Enrico; Jylänki, Pasi; Vehtari, Aki; Jääskeläinen, Iiro P; Mäkelä, Sasu; Nummenmaa, Lauri; Nummi-Kuisma, Katarina; Nummi, Ilari; Sams, Mikko
2017-08-15
During a conversation or when listening to music, auditory and visual information are combined automatically into audiovisual objects. However, it is still poorly understood how specific type of visual information shapes neural processing of sounds in lifelike stimulus environments. Here we applied multi-voxel pattern analysis to investigate how naturally matching visual input modulates supratemporal cortex activity during processing of naturalistic acoustic speech, singing and instrumental music. Bayesian logistic regression classifiers with sparsity-promoting priors were trained to predict whether the stimulus was audiovisual or auditory, and whether it contained piano playing, speech, or singing. The predictive performances of the classifiers were tested by leaving one participant at a time for testing and training the model using the remaining 15 participants. The signature patterns associated with unimodal auditory stimuli encompassed distributed locations mostly in the middle and superior temporal gyrus (STG/MTG). A pattern regression analysis, based on a continuous acoustic model, revealed that activity in some of these MTG and STG areas were associated with acoustic features present in speech and music stimuli. Concurrent visual stimulus modulated activity in bilateral MTG (speech), lateral aspect of right anterior STG (singing), and bilateral parietal opercular cortex (piano). Our results suggest that specific supratemporal brain areas are involved in processing complex natural speech, singing, and piano playing, and other brain areas located in anterior (facial speech) and posterior (music-related hand actions) supratemporal cortex are influenced by related visual information. Those anterior and posterior supratemporal areas have been linked to stimulus identification and sensory-motor integration, respectively. Copyright © 2017 Elsevier Inc. All rights reserved.
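The evaluation scheme described above, testing on one participant while training on the remaining 15, is a leave-one-participant-out cross-validation. A minimal sketch of the fold construction follows. It covers only the data split, not the Bayesian classifiers themselves; the function name and the three samples per participant are hypothetical.

```python
def leave_one_participant_out(participant_ids):
    """Yield (held_out, train_indices, test_indices) for each participant."""
    for held_out in sorted(set(participant_ids)):
        train = [i for i, p in enumerate(participant_ids) if p != held_out]
        test = [i for i, p in enumerate(participant_ids) if p == held_out]
        yield held_out, train, test


# 16 participants (one tested, 15 used for training), each with a
# hypothetical 3 samples; real data would have many more per participant.
participant_ids = [p for p in range(16) for _ in range(3)]

folds = list(leave_one_participant_out(participant_ids))
for held_out, train, test in folds[:2]:
    print(f"held out {held_out}: {len(train)} train, {len(test)} test samples")
```

Splitting by participant rather than by sample keeps all of one person's data on the same side of the split, so the test score reflects generalization to an unseen individual.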
Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V
2012-01-01
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
ERIC Educational Resources Information Center
Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza
2014-01-01
This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
ERIC Educational Resources Information Center
French, Brian F.; Maller, Susan J.
2007-01-01
Two unresolved implementation issues with logistic regression (LR) for differential item functioning (DIF) detection include ability purification and effect size use. Purification is suggested to control inaccuracies in DIF detection as a result of DIF items in the ability estimate. Additionally, effect size use may be beneficial in controlling…
A Note on Three Statistical Tests in the Logistic Regression DIF Procedure
ERIC Educational Resources Information Center
Paek, Insu
2012-01-01
Although logistic regression became one of the well-known methods in detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under the maximum likelihood, do not seem to be consistently distinguished in DIF literature. This paper provides a clarifying…
ERIC Educational Resources Information Center
West, Lindsey M.; Davis, Telsie A.; Thompson, Martie P.; Kaslow, Nadine J.
2011-01-01
Protective factors for fostering reasons for living were examined among low-income, suicidal, African American women. Bivariate logistic regressions revealed that higher levels of optimism, spiritual well-being, and family social support predicted reasons for living. Multivariate logistic regressions indicated that spiritual well-being showed…
Comparison of Two Approaches for Handling Missing Covariates in Logistic Regression
ERIC Educational Resources Information Center
Peng, Chao-Ying Joanne; Zhu, Jin
2008-01-01
For the past 25 years, methodological advances have been made in missing data treatment. Most published work has focused on missing data in dependent variables under various conditions. The present study seeks to fill the void by comparing two approaches for handling missing data in categorical covariates in logistic regression: the…
Comparison of IRT Likelihood Ratio Test and Logistic Regression DIF Detection Procedures
ERIC Educational Resources Information Center
Atar, Burcu; Kamata, Akihito
2011-01-01
The Type I error rates and the power of IRT likelihood ratio test and cumulative logit ordinal logistic regression procedures in detecting differential item functioning (DIF) for polytomously scored items were investigated in this Monte Carlo simulation study. For this purpose, 54 simulation conditions (combinations of 3 sample sizes, 2 sample…
Multiple Logistic Regression Analysis of Cigarette Use among High School Students
ERIC Educational Resources Information Center
Adwere-Boamah, Joseph
2011-01-01
A binary logistic regression analysis was performed to predict high school students' cigarette smoking behavior from selected predictors from 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target student behavior of interest was frequent cigarette use. Five predictor variables included in the model were: a) race, b) frequency of…
ERIC Educational Resources Information Center
Anderson, Carolyn J.; Verkuilen, Jay; Peyton, Buddy L.
2010-01-01
Survey items with multiple response categories and multiple-choice test questions are ubiquitous in psychological and educational research. We illustrate the use of log-multiplicative association (LMA) models that are extensions of the well-known multinomial logistic regression model for multiple dependent outcome variables to reanalyze a set of…
Propensity Score Estimation with Data Mining Techniques: Alternatives to Logistic Regression
ERIC Educational Resources Information Center
Keller, Bryan S. B.; Kim, Jee-Seon; Steiner, Peter M.
2013-01-01
Propensity score analysis (PSA) is a methodological technique which may correct for selection bias in a quasi-experiment by modeling the selection process using observed covariates. Because logistic regression is well understood by researchers in a variety of fields and easy to implement in a number of popular software packages, it has…
Two-factor logistic regression in pediatric liver transplantation
NASA Astrophysics Data System (ADS)
Uzunova, Yordanka; Prodanova, Krasimira; Spasov, Lyubomir
2017-12-01
Using a two-factor logistic regression analysis an estimate is derived for the probability of absence of infections in the early postoperative period after pediatric liver transplantation. The influence of both the bilirubin level and the international normalized ratio of prothrombin time of blood coagulation at the 5th postoperative day is studied.
ERIC Educational Resources Information Center
Courtney, Jon R.; Prophet, Retta
2011-01-01
Placement instability is often associated with a number of negative outcomes for children. To gain state level contextual knowledge of factors associated with placement stability/instability, logistic regression was applied to selected variables from the New Mexico Adoption and Foster Care Administrative Reporting System dataset. Predictors…
Length bias correction in gene ontology enrichment analysis using logistic regression.
Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S; Chang, Jeff H
2012-01-01
When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
Hansson, Lisbeth; Khamis, Harry J
2008-12-01
Simulated data sets are used to evaluate conditional and unconditional maximum likelihood estimation in an individual case-control design with continuous covariates when there are different rates of excluded cases and different levels of other design parameters. The effectiveness of the estimation procedures is measured by method bias, variance of the estimators, root mean square error (RMSE) for logistic regression and the percentage of explained variation. Conditional estimation leads to higher RMSE than unconditional estimation in the presence of missing observations, especially for 1:1 matching. The RMSE is higher for the smaller stratum size, especially for the 1:1 matching. The percentage of explained variation appears to be insensitive to missing data, but is generally higher for the conditional estimation than for the unconditional estimation. It is particularly good for the 1:2 matching design. For minimizing RMSE, a high matching ratio is recommended; in this case, conditional and unconditional logistic regression models yield comparable levels of effectiveness. For maximizing the percentage of explained variation, the 1:2 matching design with the conditional logistic regression model is recommended.
Szekér, Szabolcs; Vathy-Fogarassy, Ágnes
2018-01-01
Logistic regression based propensity score matching is a widely used method in case-control studies to select the individuals of the control group. This method creates a suitable control group if all factors affecting the output variable are known. However, if relevant latent variables exist as well, which are not taken into account during the calculations, the quality of the control group is uncertain. In this paper, we present a statistics-based study in which we try to determine the relationship between the accuracy of the logistic regression model and the uncertainty of the dependent variable of the control group defined by propensity score matching. Our analyses show that there is a linear correlation between the fit of the logistic regression model and the uncertainty of the output variable. In certain cases, a latent binary explanatory variable can result in a relative error of up to 70% in the prediction of the outcome variable. The observed phenomenon draws the attention of analysts to an important point, which must be taken into account when drawing conclusions.
NASA Astrophysics Data System (ADS)
Li, Hai; Kumavor, Patrick; Salman Alqasemi, Umar; Zhu, Quing
2015-01-01
A composite set of ovarian tissue features extracted from photoacoustic spectral data, beam envelope, and co-registered ultrasound and photoacoustic images are used to characterize malignant and normal ovaries using logistic and support vector machine (SVM) classifiers. Normalized power spectra were calculated from the Fourier transform of the photoacoustic beamformed data, from which the spectral slopes and 0-MHz intercepts were extracted. Five features were extracted from the beam envelope and another 10 features were extracted from the photoacoustic images. These 17 features were ranked by their p-values from t-tests on which a filter type of feature selection method was used to determine the optimal feature number for final classification. A total of 169 samples from 19 ex vivo ovaries were randomly distributed into training and testing groups. Both classifiers achieved a minimum value of the mean misclassification error when the seven features with lowest p-values were selected. Using these seven features, the logistic and SVM classifiers obtained sensitivities of 96.39±3.35% and 97.82±2.26%, and specificities of 98.92±1.39% and 100%, respectively, for the training group. For the testing group, logistic and SVM classifiers achieved sensitivities of 92.71±3.55% and 92.64±3.27%, and specificities of 87.52±8.78% and 98.49±2.05%, respectively.
Logistic regression for circular data
NASA Astrophysics Data System (ADS)
Al-Daffaie, Kadhem; Khan, Shahjahan
2017-05-01
This paper considers the relationship between a binary response and a circular predictor. It develops the logistic regression model by employing the linear-circular regression approach. The maximum likelihood method is used to estimate the parameters. The Newton-Raphson numerical method is used to find the estimated values of the parameters. A data set from weather records of Toowoomba city is analysed by the proposed methods. Moreover, a simulation study is considered. The R software is used for all computations and simulations.
Naval Research Logistics Quarterly. Volume 28. Number 3,
1981-09-01
denotes component-wise maximum. f has antitone (isotone) differences on C x D if for c1 < c2 and d1 < d2, ...or negative correlations and linear or nonlinear regressions. Given are the moments to order two and, for special cases, the regression function and...data sets. We designate this bnb distribution as G - B - N(a, 0, v). The distribution admits only of positive correlation and linear regressions
Bond, H S; Sullivan, S G; Cowling, B J
2016-06-01
Influenza vaccination is the most practical means available for preventing influenza virus infection and is widely used in many countries. Because vaccine components and circulating strains frequently change, it is important to continually monitor vaccine effectiveness (VE). The test-negative design is frequently used to estimate VE. In this design, patients meeting the same clinical case definition are recruited and tested for influenza; those who test positive are the cases and those who test negative form the comparison group. When determining VE in these studies, the typical approach has been to use logistic regression, adjusting for potential confounders. Because vaccine coverage and influenza incidence change throughout the season, time is included among these confounders. While most studies use unconditional logistic regression, adjusting for time, an alternative approach is to use conditional logistic regression, matching on time. Here, we used simulation data to examine the potential for both regression approaches to permit accurate and robust estimates of VE. In situations where vaccine coverage changed during the influenza season, the conditional model and unconditional models adjusting for categorical week and using a spline function for week provided more accurate estimates. We illustrated the two approaches on data from a test-negative study of influenza VE against hospitalization in children in Hong Kong which resulted in the conditional logistic regression model providing the best fit to the data.
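As a worked illustration of how an odds ratio from either regression approach is turned into a VE estimate, the sketch below uses the standard test-negative relation VE = (1 − OR) × 100%. The 2×2 counts are invented for illustration only, the odds ratio here is crude (unadjusted for time or other confounders), and the function names are hypothetical.

```python
def odds_ratio(vaccinated_cases, unvaccinated_cases,
               vaccinated_controls, unvaccinated_controls):
    """Crude odds ratio of vaccination among test-positive cases vs test-negative controls."""
    return (vaccinated_cases * unvaccinated_controls) / (
        unvaccinated_cases * vaccinated_controls
    )


def vaccine_effectiveness(or_value):
    """Vaccine effectiveness in percent, using VE = (1 - OR) * 100."""
    return (1.0 - or_value) * 100.0


# Hypothetical counts: 20 of 100 cases vaccinated, 50 of 100 controls vaccinated.
crude_or = odds_ratio(20, 80, 50, 50)
crude_ve = vaccine_effectiveness(crude_or)
```

In an adjusted analysis the odds ratio would instead be the exponentiated vaccination coefficient from the fitted conditional or unconditional logistic model.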
Asghari, Mehdi Poursheikhali; Hayatshahi, Sayyed Hamed Sadat; Abdolmaleki, Parviz
2012-01-01
From both the structural and functional points of view, β-turns play important biological roles in proteins. In the present study, a novel two-stage hybrid procedure has been developed to identify β-turns in proteins. Binary logistic regression was used, for the first time, to select significant sequence parameters for the identification of β-turns by means of a re-substitution test procedure. The sequence parameters consisted of 80 amino acid positional occurrences and 20 amino acid percentages in the sequence. Among these parameters, the most significant ones selected by the binary logistic regression model were the percentages of Gly and Ser and the occurrence of Asn in position i+2, respectively. These significant parameters have the greatest effect on the constitution of a β-turn sequence. A neural network model was then constructed and fed with the parameters selected by binary logistic regression to build a hybrid predictor. The networks were trained and tested on a non-homologous dataset of 565 protein chains. Applying a nine-fold cross-validation test on the dataset, the network reached an overall accuracy (Qtotal) of 74, which is comparable with the results of other β-turn prediction methods. In conclusion, this study proves that the parameter selection ability of binary logistic regression together with the prediction capability of neural networks lead to the development of more precise models for identifying β-turns in proteins. PMID:27418910
Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald
2006-11-01
We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
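The three nested models described above can be written out explicitly. This is a sketch under assumed notation that does not appear in the abstract: Y is the ordinal item response with category index k, θ is the IRT ability estimate, G is the group indicator, and the cumulative-logit sign convention is one common choice rather than necessarily the authors' parameterization.

```latex
\begin{align*}
\text{Model 1:}\quad & \operatorname{logit} P(Y \ge k \mid \theta, G) = \alpha_k + \beta_1 \theta \\
\text{Model 2:}\quad & \operatorname{logit} P(Y \ge k \mid \theta, G) = \alpha_k + \beta_1 \theta + \beta_2 G \\
\text{Model 3:}\quad & \operatorname{logit} P(Y \ge k \mid \theta, G) = \alpha_k + \beta_1 \theta + \beta_2 G + \beta_3 (\theta \times G)
\end{align*}
```

Under this notation, a statistically significant β3 in Model 3 corresponds to nonuniform DIF, and a marked change in the estimate of β1 between Model 1 and Model 2 corresponds to uniform DIF.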
Conditional Poisson models: a flexible alternative to conditional logistic case cross-over analysis.
Armstrong, Ben G; Gasparrini, Antonio; Tobias, Aurelio
2014-11-24
The time stratified case cross-over approach is a popular alternative to conventional time series regression for analysing associations between time series of environmental exposures (air pollution, weather) and counts of health outcomes. These are almost always analyzed using conditional logistic regression on data expanded to case-control (case crossover) format, but this has some limitations. In particular adjusting for overdispersion and auto-correlation in the counts is not possible. It has been established that a Poisson model for counts with stratum indicators gives identical estimates to those from conditional logistic regression and does not have these limitations, but it is little used, probably because of the overheads in estimating many stratum parameters. The conditional Poisson model avoids estimating stratum parameters by conditioning on the total event count in each stratum, thus simplifying the computing and increasing the number of strata for which fitting is feasible compared with the standard unconditional Poisson model. Unlike the conditional logistic model, the conditional Poisson model does not require expanding the data, and can adjust for overdispersion and auto-correlation. It is available in Stata, R, and other packages. By applying to some real data and using simulations, we demonstrate that conditional Poisson models were simpler to code and shorter to run than are conditional logistic analyses and can be fitted to larger data sets than possible with standard Poisson models. Allowing for overdispersion or autocorrelation was possible with the conditional Poisson model but when not required this model gave identical estimates to those from conditional logistic regression. Conditional Poisson regression models provide an alternative to case crossover analysis of stratified time series data with some advantages. 
The conditional Poisson model can also be used in other contexts in which primary control for confounding is by fine stratification.
Fei, Y; Hu, J; Li, W-Q; Wang, W; Zong, G-Q
2017-03-01
Essentials: Predicting the occurrence of portosplenomesenteric vein thrombosis (PSMVT) is difficult. We studied 72 patients with acute pancreatitis. Artificial neural networks modeling was more accurate than logistic regression in predicting PSMVT. Additional predictive factors may be incorporated into artificial neural networks. Objective: To construct and validate artificial neural networks (ANNs) for predicting the occurrence of portosplenomesenteric venous thrombosis (PSMVT) and compare the predictive ability of the ANNs with that of logistic regression. Methods: The ANNs and logistic regression modeling were constructed using simple clinical and laboratory data of 72 acute pancreatitis (AP) patients. The ANNs and logistic modeling were first trained on 48 randomly chosen patients and validated on the remaining 24 patients. The accuracy and the performance characteristics were compared between these two approaches by SPSS17.0 software. Results: The training set and validation set did not differ on any of the 11 variables. After training, the back propagation network training error converged to 1 × 10^-20, and it retained excellent pattern recognition ability. When the ANNs model was applied to the validation set, it revealed a sensitivity of 80%, specificity of 85.7%, a positive predictive value of 77.6% and negative predictive value of 90.7%. The accuracy was 83.3%. Differences could be found between ANNs modeling and logistic regression modeling in these parameters (10.0% [95% CI, -14.3 to 34.3%], 14.3% [95% CI, -8.6 to 37.2%], 15.7% [95% CI, -9.9 to 41.3%], 11.8% [95% CI, -8.2 to 31.8%], 22.6% [95% CI, -1.9 to 47.1%], respectively). When ANNs modeling was used to identify PSMVT, the area under receiver operating characteristic curve was 0.849 (95% CI, 0.807-0.901), which demonstrated better overall properties than logistic regression modeling (AUC = 0.716) (95% CI, 0.679-0.761).
Conclusions: ANNs modeling was a more accurate tool than logistic regression in predicting the occurrence of PSMVT following AP. More clinical factors or biomarkers may be incorporated into ANNs modeling to improve its predictive ability. © 2016 International Society on Thrombosis and Haemostasis.
McLaren, Christine E.; Chen, Wen-Pin; Nie, Ke; Su, Min-Ying
2009-01-01
Rationale and Objectives: Dynamic contrast enhanced MRI (DCE-MRI) is a clinical imaging modality for detection and diagnosis of breast lesions. Analytical methods were compared for diagnostic feature selection and performance of lesion classification to differentiate between malignant and benign lesions in patients. Materials and Methods: The study included 43 malignant and 28 benign histologically-proven lesions. Eight morphological parameters, ten gray level co-occurrence matrices (GLCM) texture features, and fourteen Laws’ texture features were obtained using automated lesion segmentation and quantitative feature extraction. Artificial neural network (ANN) and logistic regression analysis were compared for selection of the best predictors of malignant lesions among the normalized features. Results: Using ANN, the final four selected features were compactness, energy, homogeneity, and Law_LS, with area under the receiver operating characteristic curve (AUC) = 0.82, and accuracy = 0.76. The diagnostic performance of these 4-features computed on the basis of logistic regression yielded AUC = 0.80 (95% CI, 0.688 to 0.905), similar to that of ANN. The analysis also shows that the odds of a malignant lesion decreased by 48% (95% CI, 25% to 92%) for every increase of 1 SD in the Law_LS feature, adjusted for differences in compactness, energy, and homogeneity. Using logistic regression with z-score transformation, a model comprised of compactness, NRL entropy, and gray level sum average was selected, and it had the highest overall accuracy of 0.75 among all models, with AUC = 0.77 (95% CI, 0.660 to 0.880). When logistic modeling of transformations using the Box-Cox method was performed, the most parsimonious model with predictors, compactness and Law_LS, had an AUC of 0.79 (95% CI, 0.672 to 0.898). Conclusion: The diagnostic performance of models selected by ANN and logistic regression was similar.
The analytic methods were found to be roughly equivalent in terms of predictive ability when a small number of variables were chosen. The robust ANN methodology utilizes a sophisticated non-linear model, while logistic regression analysis provides insightful information to enhance interpretation of the model features. PMID:19409817
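The statement that the odds of malignancy fall by 48% per 1 SD increase in Law_LS can be unpacked with a little arithmetic. The sketch below is illustrative only: it assumes the reported percentage maps to an odds ratio of 1 − 0.48 = 0.52 per SD, and the helper names are hypothetical.

```python
import math


def odds_ratio_from_percent_decrease(percent_decrease):
    """Convert 'odds decrease by X%' into an odds ratio per unit of the predictor."""
    return 1.0 - percent_decrease / 100.0


def coefficient_from_odds_ratio(odds_ratio):
    """Logistic regression coefficient (log-odds change per unit) for a given odds ratio."""
    return math.log(odds_ratio)


or_per_sd = odds_ratio_from_percent_decrease(48.0)   # odds ratio per 1 SD of Law_LS
beta_per_sd = coefficient_from_odds_ratio(or_per_sd)  # about -0.654 on the log-odds scale
or_per_two_sd = or_per_sd ** 2                        # odds ratios multiply across units
```

Because the model is linear on the log-odds scale, a 2 SD increase multiplies the odds by 0.52 squared, roughly 0.27, rather than subtracting another 48 percentage points.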
Ai, Zi-Sheng; Gao, You-Shui; Sun, Yuan; Liu, Yue; Zhang, Chang-Qing; Jiang, Cheng-Hua
2013-03-01
Risk factors for femoral neck fracture-induced avascular necrosis of the femoral head have not been elucidated clearly in middle-aged and elderly patients. Moreover, the high incidence of screw removal in China and its effect on the fate of the involved femoral head require statistical methods to reflect their intrinsic relationship. Ninety-nine patients older than 45 years with femoral neck fracture were treated by internal fixation between May 1999 and April 2004. Descriptive analysis, interaction analysis between associated factors, single factor logistic regression, multivariate logistic regression, and detailed interaction analysis were employed to explore potential relationships among associated factors. Avascular necrosis of the femoral head was found in 15 cases (15.2 %). Age × the status of implants (removal vs. maintenance) and gender × the timing of reduction were interactive according to two-factor interactive analysis. Age, the displacement of fractures, the quality of reduction, and the status of implants were found to be significant factors in single factor logistic regression analysis. Age, age × the status of implants, and the quality of reduction were found to be significant factors in multivariate logistic regression analysis. In fine interaction analysis after multivariate logistic regression analysis, implant removal was the most important risk factor for avascular necrosis in 56-to-85-year-old patients, with a risk ratio of 26.00 (95 % CI = 3.076-219.747). The middle-aged and elderly have less incidence of avascular necrosis of the femoral head following femoral neck fractures treated by cannulated screws. The removal of cannulated screws can induce a significantly high incidence of avascular necrosis of the femoral head in elderly patients, while a high-quality reduction is helpful to reduce avascular necrosis.
Zhou, Jinzhe; Zhou, Yanbing; Cao, Shougen; Li, Shikuan; Wang, Hao; Niu, Zhaojian; Chen, Dong; Wang, Dongsheng; Lv, Liang; Zhang, Jian; Li, Yu; Jiao, Xuelong; Tan, Xiaojie; Zhang, Jianli; Wang, Haibo; Zhang, Bingyuan; Lu, Yun; Sun, Zhenqing
2016-01-01
Reporting of surgical complications is common, but few reports provide information about the severity of complications or estimate their risk factors, and those that do lack specificity. We retrospectively analyzed data on 2795 gastric cancer patients who underwent a surgical procedure at the Affiliated Hospital of Qingdao University between June 2007 and June 2012, and established a multivariate logistic regression model to predict risk factors related to postoperative complications according to the Clavien-Dindo classification system. Twenty-four of 86 variables were statistically significant in univariate logistic regression analysis; the 11 significant variables that entered the multivariate analysis were used to produce the risk model. Liver cirrhosis, diabetes mellitus, Child classification, invasion of neighboring organs, combined resection, intraoperative transfusion, Billroth II anastomosis for reconstruction, malnutrition, surgical volume of surgeons, operating time and age were independent risk factors for postoperative complications after gastrectomy. Based on the logistic regression equation, p = exp(∑BiXi) / (1 + exp(∑BiXi)), a multivariate logistic regression predictive model that calculates the risk of postoperative morbidity was developed: p = 1 / (1 + exp(4.810 − 1.287X1 − 0.504X2 − 0.500X3 − 0.474X4 − 0.405X5 − 0.318X6 − 0.316X7 − 0.305X8 − 0.278X9 − 0.255X10 − 0.138X11)). The accuracy, sensitivity and specificity of the model in predicting postoperative complications were 86.7%, 76.2% and 88.6%, respectively. This risk model, based on the Clavien-Dindo system for grading the severity of complications and on logistic regression analysis, can predict severe morbidity specific to an individual patient's risk factors, estimate patients' risks and benefits of gastric surgery as an accurate decision-making tool, and may serve as a template for the development of risk models for other surgical groups.
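The risk equation quoted in this abstract can be evaluated directly. The sketch below only transcribes the published intercept and coefficients; the coding of X1 to X11 is not given in the abstract, so treating each as a 0/1 indicator is an assumption made here for illustration, and `predicted_risk` is a hypothetical helper name.

```python
import math

# Intercept and coefficients transcribed from the abstract's equation,
# rewritten as linear predictor = -4.810 + sum(B_i * X_i).
INTERCEPT = -4.810
COEFFICIENTS = [1.287, 0.504, 0.500, 0.474, 0.405, 0.318,
                0.316, 0.305, 0.278, 0.255, 0.138]


def predicted_risk(x):
    """Predicted probability of postoperative complications for covariates X1..X11."""
    if len(x) != len(COEFFICIENTS):
        raise ValueError("expected 11 covariate values")
    linear = INTERCEPT + sum(b * xi for b, xi in zip(COEFFICIENTS, x))
    return 1.0 / (1.0 + math.exp(-linear))


# Assumed 0/1 coding: no risk factors present vs all eleven present.
baseline_risk = predicted_risk([0] * 11)
all_factors_risk = predicted_risk([1] * 11)
```

With every covariate at zero the predicted risk is below 1%; with all eleven set to one the linear predictor is close to zero and the risk is just under 50%. These figures depend entirely on the assumed coding.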
Hypertension and hematologic parameters in a community near a uranium processing facility
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wagner, Sara E., E-mail: swagner@uga.edu; Burch, James B.; South Carolina Statewide Cancer Prevention and Control Program, Columbia, SC
Background: Environmental uranium exposure originating as a byproduct of uranium processing can impact human health. The Fernald Feed Materials Production Center functioned as a uranium processing facility from 1951 to 1989, and potential health effects among residents living near this plant were investigated via the Fernald Medical Monitoring Program (FMMP). Methods: Data from 8216 adult FMMP participants were used to test the hypothesis that elevated uranium exposure was associated with indicators of hypertension or changes in hematologic parameters at entry into the program. A cumulative uranium exposure estimate, developed by FMMP investigators, was used to classify exposure. Systolic and diastolic blood pressure and physician diagnoses were used to assess hypertension; and red blood cells, platelets, and white blood cell differential counts were used to characterize hematology. The relationship between uranium exposure and hypertension or hematologic parameters was evaluated using generalized linear models and quantile regression for continuous outcomes, and logistic regression or ordinal logistic regression for categorical outcomes, after adjustment for potential confounding factors. Results: Of 8216 adult FMMP participants 4187 (51%) had low cumulative uranium exposure, 1273 (15%) had moderate exposure, and 2756 (34%) were in the high (>0.50 Sievert) cumulative lifetime uranium exposure category. Participants with elevated uranium exposure had decreased white blood cell and lymphocyte counts and increased eosinophil counts. Female participants with higher uranium exposures had elevated systolic blood pressure compared to women with lower exposures. However, no exposure-related changes were observed in diastolic blood pressure or hypertension diagnoses among female or male participants.
Conclusions: Results from this investigation suggest that residents in the vicinity of the Fernald plant with elevated exposure to uranium primarily via inhalation exhibited decreases in white blood cell counts, and small, though statistically significant, gender-specific alterations in systolic blood pressure at entry into the FMMP.
Rank-Optimized Logistic Matrix Regression toward Improved Matrix Data Classification.
Zhang, Jianguang; Jiang, Jianmin
2018-02-01
While existing logistic regression suffers from overfitting and often fails to consider structural information, we propose a novel matrix-based logistic regression to overcome these weaknesses. In the proposed method, 2D matrices are directly used to learn two groups of parameter vectors along each dimension without vectorization, which allows the proposed method to fully exploit the underlying structural information embedded inside the 2D matrices. Further, we add a joint [Formula: see text]-norm on two parameter matrices, which are organized by aligning each group of parameter vectors in columns. This added co-regularization term has two roles: enhancing the effect of regularization and optimizing the rank during the learning process. With our proposed fast iterative solution, we carried out extensive experiments. The results show that in comparison to both the traditional tensor-based methods and the vector-based regression methods, our proposed solution achieves better performance for matrix data classification.
Lam, Lawrence T; Wong, Emmy M Y
2015-03-01
Based on the theoretical framework of Problem Behavior and Stress Reduction theories for problematic Internet use (PIU), this study aimed to investigate the relationship between parental PIU and the PIU among adolescents taking into consideration the stress levels of young people. This was a population-based parent and adolescent dyad health survey utilizing a random sampling technique. PIU for both parents and adolescents was measured by the Internet addiction test designed by Young. The stress level of adolescents was assessed using the stress subscale of the Depression Anxiety Stress Scale (DASS). Data were analyzed using logistic regression modeling techniques with adjustment for potential confounding factors with analysis on the modification effect of stress levels on the relationship between parent and adolescent PIU. Of the total 1,098 parent and adolescent dyads with usable information, 263 adolescents (24.0%) and 62 parents (5.7%) could be classified as moderate and severe problematic users of the Internet. About 14% (n = 157) of adolescents could be classified with moderate-to-severe stress. Regression analysis results suggested a significant interaction between parental PIU and adolescents' stress levels on adolescent PIU. Stratified regression analyses by stress level resulted in a significant parent and adolescent PIU relationship in the low stress group (odds ratio, 3.18; 95% confidence interval 1.65-6.14). However, the association between parent and adolescent PIU in the high stress group became insignificant. There was a significant parent and adolescent PIU relationship; however, this relationship is differentially affected by the stress status of the adolescent. The direct implication of the results is that parental Internet use should also be assessed and included as part of the treatment regime for adolescents. Copyright © 2015 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.
Detecting DIF in Polytomous Items Using MACS, IRT and Ordinal Logistic Regression
ERIC Educational Resources Information Center
Elosua, Paula; Wells, Craig
2013-01-01
The purpose of the present study was to compare the Type I error rate and power of two model-based procedures, the mean and covariance structure model (MACS) and the item response theory (IRT), and an observed-score based procedure, ordinal logistic regression, for detecting differential item functioning (DIF) in polytomous items. A simulation…
ERIC Educational Resources Information Center
Fan, Xitao; Wang, Lin
The Monte Carlo study compared the performance of predictive discriminant analysis (PDA) and that of logistic regression (LR) for the two-group classification problem. Prior probabilities were used for classification, but the cost of misclassification was assumed to be equal. The study used a fully crossed three-factor experimental design (with…
ERIC Educational Resources Information Center
Nguyen, Phuong L.
2006-01-01
This study examines the effects of parental SES, school quality, and community factors on children's enrollment and achievement in rural areas in Viet Nam, using logistic regression and ordered logistic regression. Multivariate analysis reveals significant differences in educational enrollment and outcomes by level of household expenditures and…
School Exits in the Milwaukee Parental Choice Program: Evidence of a Marketplace?
ERIC Educational Resources Information Center
Ford, Michael
2011-01-01
This article examines whether the large number of school exits from the Milwaukee school voucher program is evidence of a marketplace. Two logistic regression and multinomial logistic regression models tested the relation between the inability to draw large numbers of voucher students and the ability for a private school to remain viable. Data on…
Hierarchical Bayesian Logistic Regression to forecast metabolic control in type 2 DM patients.
Dagliati, Arianna; Malovini, Alberto; Decata, Pasquale; Cogni, Giulia; Teliti, Marsida; Sacchi, Lucia; Cerra, Carlo; Chiovato, Luca; Bellazzi, Riccardo
2016-01-01
In this work we present our efforts in building a model able to forecast patients' changes in clinical conditions when repeated measurements are available. In this case the available risk calculators are typically not applicable. We propose a Hierarchical Bayesian Logistic Regression model, which accounts for individual and population variability in the model parameter estimates. The model is used to predict metabolic control and its variation in type 2 diabetes mellitus. In particular we have analyzed a population of more than 1000 Italian type 2 diabetic patients, collected within the European project Mosaic. The results obtained in terms of Matthews Correlation Coefficient are significantly better than those obtained with a standard logistic regression model based on data pooling.
Li, Ji; Gray, B.R.; Bates, D.M.
2008-01-01
Partitioning the variance of a response by design levels is challenging for binomial and other discrete outcomes. Goldstein (2003) proposed four definitions for variance partitioning coefficients (VPC) under a two-level logistic regression model. In this study, we explicitly derived formulae for the multi-level logistic regression model and subsequently studied the distributional properties of the calculated VPCs. Using simulations and a vegetation dataset, we demonstrated associations between different VPC definitions, the importance of methods for estimating VPCs (by comparing VPCs obtained using Laplace and penalized quasi-likelihood methods), and bivariate dependence between VPCs calculated at different levels. Such an empirical study lends immediate support to wider applications of VPCs in scientific data analysis.
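To make the idea of competing VPC definitions concrete, here is a small sketch of two commonly described approaches for a two-level random-intercept logistic model: the latent-variable form, which fixes the level-1 variance at pi^2/3, and a simulation-based form on the probability scale. The function names, the intercept, and the variance value are illustrative assumptions and are not taken from the study.

```python
import math
import random
import statistics

def vpc_latent(sigma_u2):
    """Latent-variable VPC: level-1 variance fixed at pi^2 / 3."""
    return sigma_u2 / (sigma_u2 + math.pi ** 2 / 3)

def vpc_simulation(beta0, sigma_u2, n_draws=20000, seed=0):
    """Simulation-based VPC on the probability scale.

    Draw random intercepts, convert to probabilities, then compare the
    between-group variance of p with the mean binomial variance p(1-p).
    """
    rng = random.Random(seed)
    sd = math.sqrt(sigma_u2)
    probs = [1.0 / (1.0 + math.exp(-(beta0 + rng.gauss(0.0, sd))))
             for _ in range(n_draws)]
    level2 = statistics.pvariance(probs)
    level1 = statistics.fmean(p * (1.0 - p) for p in probs)
    return level2 / (level1 + level2)

latent = vpc_latent(1.0)           # hypothetical level-2 variance of 1.0
simulated = vpc_simulation(0.0, 1.0)
print(f"latent={latent:.3f} simulated={simulated:.3f}")
```

The two definitions generally give different numbers for the same fitted model, which is one reason a comparison across definitions is informative.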
Model building strategy for logistic regression: purposeful selection.
Zhang, Zhongheng
2016-03-01
Logistic regression is one of the most commonly used models to account for confounders in the medical literature. The article introduces how to perform the purposeful selection model-building strategy with R. I stress the use of the likelihood ratio test to see whether deleting a variable has a significant impact on model fit. A deleted variable should also be checked for whether it is an important adjustment of the remaining covariates. Interactions should be checked to disentangle complex relationships between covariates and their synergistic effect on the response variable. The model should be checked for goodness-of-fit (GOF), in other words, how well the fitted model reflects the real data. The Hosmer-Lemeshow GOF test is the most widely used for logistic regression models.
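The likelihood ratio test mentioned above compares the log-likelihoods of two nested models. The article works in R; the sketch below uses Python only for illustration, handles the single-parameter case (one degree of freedom), and uses made-up log-likelihood values.

```python
import math

def likelihood_ratio_test_1df(loglik_full, loglik_reduced):
    """Likelihood ratio test for dropping ONE parameter.

    The statistic 2 * (ll_full - ll_reduced) is referred to a chi-square
    distribution with 1 degree of freedom, whose upper tail probability
    equals erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods for a model with and without one covariate.
stat, p_value = likelihood_ratio_test_1df(-120.5, -123.0)
print(f"LR statistic={stat:.2f}, p={p_value:.4f}")
```

A small p-value suggests that removing the variable worsens the fit, so under purposeful selection the variable would be kept.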
De Gucht, Veronique; Garcia, Franshelis Katerinee; den Engelsman, Marielle; Maes, Stan
2016-10-01
The main research question is: "Do CFS patients differ from fatigued non-CFS patients with respect to physical, cognitive, behavioral, social, and emotional determinants?" In addition, group differences in relevant outcomes were explored. Patients who met the Centers for Disease Control (CDC) criteria for CFS were categorized as CFS; these patients were mainly recruited via a large Dutch patient organization. Primary care patients who were fatigued for at least 1 month and up to 2 years but did not meet the CDC criteria were classified as fatigued non-CFS patients. Both groups were matched by age and gender (N = 192 for each group). CFS patients attributed their fatigue more frequently to external causes, reported a worse physical functioning, more medical visits, and a lower employment rate. The results of a multiple logistic regression analysis showed that patients who believe that their fatigue is associated with more severe consequences, that their fatigue will last longer and is responsible for more additional symptoms are more likely to be classified as CFS, while patients who are more physically active and have higher levels of "all or nothing behavior" are less likely to be classified as having CFS. A longitudinal study should explore the predictive value of the above factors for the transition from medically unexplained fatigue to CFS in order to develop targeted interventions for primary care patients with short-term fatigue complaints.
Evaluation of the Risk Factors for a Rotator Cuff Retear After Repair Surgery.
Lee, Yeong Seok; Jeong, Jeung Yeol; Park, Chan-Deok; Kang, Seung Gyoon; Yoo, Jae Chul
2017-07-01
A retear is a significant clinical problem after rotator cuff repair. However, no study has evaluated the retear rate with regard to the extent of footprint coverage. To evaluate the preoperative and intraoperative factors for a retear after rotator cuff repair, and to confirm the relationship with the extent of footprint coverage. Cohort study; Level of evidence, 3. Data were retrospectively collected from 693 patients who underwent arthroscopic rotator cuff repair between January 2006 and December 2014. All repairs were classified into 4 types of completeness of repair according to the amount of footprint coverage at the end of surgery. All patients underwent magnetic resonance imaging (MRI) after a mean postoperative duration of 5.4 months. Preoperative demographic data, functional scores, range of motion, and global fatty degeneration on preoperative MRI and intraoperative variables including the tear size, completeness of rotator cuff repair, concomitant subscapularis repair, number of suture anchors used, repair technique (single-row or transosseous-equivalent double-row repair), and surgical duration were evaluated. Furthermore, the factors associated with failure using the single-row technique and transosseous-equivalent double-row technique were analyzed separately. The retear rate was 7.22%. Univariate analysis revealed that rotator cuff retears were affected by age; the presence of inflammatory arthritis; the completeness of rotator cuff repair; the initial tear size; the number of suture anchors; mean operative time; functional visual analog scale scores; Simple Shoulder Test findings; American Shoulder and Elbow Surgeons scores; and fatty degeneration of the supraspinatus, infraspinatus, and subscapularis. Multivariate logistic regression analysis revealed patient age, initial tear size, and fatty degeneration of the supraspinatus as independent risk factors for a rotator cuff retear. 
Multivariate logistic regression analysis of the single-row group revealed patient age and fatty degeneration of the supraspinatus as independent risk factors for a rotator cuff retear. Multivariate logistic regression analysis of the transosseous-equivalent double-row group revealed a frozen shoulder as an independent risk factor for a rotator cuff retear. Our results suggest that patient age, initial tear size, and fatty degeneration of the supraspinatus are independent risk factors for a rotator cuff retear, whereas the completeness of rotator cuff repair based on the extent of footprint coverage and repair technique are not.
Flynn-Evans, Erin E.; Lockley, Steven W.
2016-01-01
Study Objectives: There is currently no questionnaire-based pre-screening tool available to detect non-24-hour sleep-wake rhythm disorder (N24HSWD) among blind patients. Our goal was to develop such a tool, derived from gold standard, objective hormonal measures of circadian entrainment status, for the detection of N24HSWD among those with visual impairment. Methods: We evaluated the contribution of 40 variables in their ability to predict N24HSWD among 127 blind women, classified using urinary 6-sulfatoxymelatonin period, an objective marker of circadian entrainment status in this population. We subjected the 40 candidate predictors to 1,000 bootstrapped iterations of a logistic regression forward selection model to predict N24HSWD, with model inclusion set at the p < 0.05 level. We removed any predictors that were not selected at least 1% of the time in the 1,000 bootstrapped models and applied a second round of 1,000 bootstrapped logistic regression forward selection models to the remaining 23 candidate predictors. We included all questions that were selected at least 10% of the time in the final model. We subjected the selected predictors to a final logistic regression model to predict N24HSWD over 1,000 bootstrapped models to calculate the concordance statistic and adjusted optimism of the final model. We used this information to generate a predictive model and determined the sensitivity and specificity of the model. Finally, we applied the model to a cohort of 1,262 blind women who completed the survey, but did not collect urine samples. Results: The final model consisted of eight questions. The concordance statistic, adjusted for bootstrapping, was 0.85. The positive predictive value was 88%, the negative predictive value was 79%. Applying this model to our larger dataset of women, we found that 61% of those without light perception, and 27% with some degree of light perception, would be referred for further screening for N24HSWD.
Conclusions: Our model has predictive utility sufficient to serve as a pre-screening questionnaire for N24HSWD among the blind. Citation: Flynn-Evans EE, Lockley SW. A pre-screening questionnaire to predict non-24-hour sleep-wake rhythm disorder (N24HSWD) among the blind. J Clin Sleep Med 2016;12(5):703–710. PMID:26951421
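The sensitivity, specificity, and predictive values reported for the screening model are all derived from a 2x2 table of predicted versus reference classifications. A minimal sketch follows; the counts are arbitrary illustrations and are not the study's data, and the function name is hypothetical.

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard screening metrics from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),   # true cases flagged
        "specificity": tn / (tn + fp),   # non-cases cleared
        "ppv": tp / (tp + fp),           # flagged who are true cases
        "npv": tn / (tn + fn),           # cleared who are non-cases
    }

# Arbitrary counts chosen only to exercise the formulas.
metrics = screening_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, the predictive values depend on how common the condition is in the group being screened, which matters when a model is applied to a new cohort.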
[Suicide Ideation Among Medical Students: Prevalence and Associated Factors].
Pinzón-Amado, Alexander; Guerrero, Sonia; Moreno, Katherine; Landínez, Carolina; Pinzón, Julie
2013-01-01
It is well documented that physicians have higher rates of suicide than the general population. This risk tends to increase even from the beginning of undergraduate training in medicine. There are few studies evaluating the frequency of suicidal behaviors in undergraduate medical students, particularly in Latin America. To determine the lifetime prevalence and the variables associated with suicidal ideation and suicide attempts in a sample of medical students from the city of Bucaramanga, Colombia. An analytical cross-sectional observational study was conducted to determine the lifetime prevalence of suicidal ideation and suicide attempts in a non-random sample of medical students enrolled in three medical schools in Bucaramanga. A self-administered questionnaire was voluntarily and anonymously answered by the participants. Validated versions of the CES-D and CAGE scales were used to assess the presence of depressive symptoms and problematic alcohol use, respectively. A multivariate logistic regression model was generated in order to adjust the estimates of variables associated with the outcome «suicidal ideation in life». The study sample consisted of 963 medical students, of whom 57% (n=549) were women. The average age was 20.3 years (SD=2.3 years). Having had at least one episode of serious suicidal ideation in their lifetime was reported by 15.7% (n=149) of the students, and 5% (n=47) reported having made at least one suicide attempt. Having taken antidepressants during their medical training was reported by 13.9% (n=131) of the students. The variables associated with the presence of suicidal ideation in the logistic regression model were: clinically significant depressive symptoms (OR: 6.9, 95% CI: 4.54-10.4), history of illicit psychoactive substance use (OR: 2.8, 95% CI: 1.6-4.8), and perception of poor academic performance over the past year (OR: 2.2, 95% CI: 1.4-3.6). 
The logistic regression model correctly classified 85% of the subjects with a history of suicidal ideation. Suicidal ideation is a frequently occurring phenomenon in medical students. Medical schools need to establish screening procedures for early detection and intervention of students with emotional distress and suicide risk. Copyright © 2013 Asociación Colombiana de Psiquiatría. Publicado por Elsevier España. All rights reserved.
NASA Astrophysics Data System (ADS)
Rossi, M.; Apuani, T.; Felletti, F.
2009-04-01
The aim of this paper is to compare the results of two statistical methods for landslide susceptibility analysis: 1) a univariate probabilistic method based on the landslide susceptibility index, 2) a multivariate method (logistic regression). The study area is the Febbraro valley, located in the central Italian Alps, where different types of metamorphic rocks crop out. On the eastern part of the studied basin a Quaternary cover, represented by colluvial and, secondarily, glacial deposits, is dominant. In this study 110 earth flows, mainly located toward the NE portion of the catchment, were analyzed. They involve only the colluvial deposits and their extension mainly ranges from 36 to 3173 m². Both statistical methods require a spatial database, constructed using a Geographical Information System (GIS), in which each landslide is described by several parameters assigned from the values at the central point of its main scarp. Based on a bibliographic review, a total of 15 predisposing factors were utilized. The width of the intervals into which the maps of the predisposing factors have to be reclassified was defined assuming constant intervals for: elevation (100 m), slope (5°), solar radiation (0.1 MJ/cm²/year), profile curvature (1.2 1/m), tangential curvature (2.2 1/m), drainage density (0.5), lineament density (0.00126). For the other parameters, the results of the probability-probability plot analysis and the statistical indexes of the landslide sites were used, in particular: slope length (0 ÷ 2, 2 ÷ 5, 5 ÷ 10, 10 ÷ 20, 20 ÷ 35, 35 ÷ 260), accumulation flow (0 ÷ 1, 1 ÷ 2, 2 ÷ 5, 5 ÷ 12, 12 ÷ 60, 60 ÷ 27265), Topographic Wetness Index (0 ÷ 0.74, 0.74 ÷ 1.94, 1.94 ÷ 2.62, 2.62 ÷ 3.48, 3.48 ÷ 6.00, 6.00 ÷ 9.44), Stream Power Index (0 ÷ 0.64, 0.64 ÷ 1.28, 1.28 ÷ 1.81, 1.81 ÷ 4.20, 4.20 ÷ 9.40). 
The geological map and land use map were also used, considering geological and land use properties as categorical variables. Applying the univariate probabilistic method, the Landslide Susceptibility Index (LSI) is defined as the sum of the ratio Ra/Rb calculated for each predisposing factor, where Ra is the ratio between the number of pixels of the class and the total number of pixels of the study area, and Rb is the ratio between the number of landslides and the number of pixels of the interval area. From the analysis of the Ra/Rb ratio the relationships between landslide occurrence and predisposing factors were defined. Then the equation of LSI was used in GIS to trace the landslide susceptibility maps. The multivariate method for landslide susceptibility analysis, based on logistic regression, was performed starting from the density maps of the predisposing factors, calculated with the intervals defined above using the equation Rb/Rbtot, where Rbtot is the sum of all Rb values. Using stepwise forward algorithms the logistic regression was performed in two successive steps: first a univariate logistic regression is used to choose the most significant predisposing factors, then the multivariate logistic regression can be performed. The univariate regression highlighted the importance of the following factors: elevation, accumulation flow, drainage density, lineament density, geology and land use. When the multivariate regression was applied, the number of controlling factors was reduced by neglecting the geological properties. The resulting final susceptibility equation is: P = 1 / (1 + exp(-(6.46 - 22.34*elevation - 5.33*accumulation flow - 7.99*drainage density - 4.47*lineament density - 17.31*land use))) and using this equation the susceptibility maps were obtained. To easily compare the results of the two methodologies, the susceptibility maps were reclassified in five susceptibility intervals (very high, high, moderate, low and very low) using natural breaks. 
Then the maps were validated using two cumulative distribution curves, one related to the landslides (number of landslides in each susceptibility class) and one to the basin (number of pixels covering each class). Comparing the curves for each method shows that both approaches (univariate and multivariate) are appropriate and provide acceptable results. In both maps the high susceptibility condition is mainly localized on the left slope of the catchment, in agreement with the field evidence. The comparison between the methods was obtained by subtraction of the two maps. This operation shows that about 40% of the basin is assigned the same susceptibility class by both methods. In general the univariate probabilistic method tends to overestimate the areal extension of the high susceptibility class with respect to the maps obtained by the logistic regression method.
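The susceptibility equation quoted in this abstract can be evaluated per map cell as follows. This is a sketch under two assumptions: that the printed "exp-(...)" means exp(-(...)), and that each input is the density value (Rb/Rbtot) of the class the cell falls in, so inputs lie between 0 and 1. The dictionary keys and function name are hypothetical.

```python
import math

# Intercept and coefficients as printed in the abstract.
INTERCEPT = 6.46
COEFFS = {
    "elevation": -22.34,
    "accumulation_flow": -5.33,
    "drainage_density": -7.99,
    "lineament_density": -4.47,
    "land_use": -17.31,
}

def susceptibility(cell):
    """Logistic probability for one cell, given its factor density values."""
    z = INTERCEPT + sum(COEFFS[name] * cell[name] for name in COEFFS)
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative cells only; these density values are not from the study.
baseline = {name: 0.0 for name in COEFFS}
shifted = dict(baseline, elevation=0.3)
print(susceptibility(baseline), susceptibility(shifted))
```

With every coefficient negative, the probability as written falls as any density value rises, so the sign convention of the published equation is worth checking against the original paper before reuse.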
Qiu, Yuchen; Yan, Shiju; Gundreddy, Rohith Reddy; Wang, Yunzhi; Cheng, Samuel; Liu, Hong; Zheng, Bin
2017-01-01
PURPOSE To develop and test a deep learning based computer-aided diagnosis (CAD) scheme of mammograms for classifying between malignant and benign masses. METHODS An image dataset involving 560 regions of interest (ROIs) extracted from digital mammograms was used. After down-sampling each ROI from 512×512 to 64×64 pixel size, we applied an 8 layer deep learning network that involves 3 pairs of convolution-max-pooling layers for automatic feature extraction and a multiple layer perceptron (MLP) classifier for feature categorization to process ROIs. The 3 pairs of convolution layers contain 20, 10, and 5 feature maps, respectively. Each convolution layer is connected with a max-pooling layer to improve the feature robustness. The output of the sixth layer is fully connected with a MLP classifier, which is composed of one hidden layer and one logistic regression layer. The network then generates a classification score to predict the likelihood of ROI depicting a malignant mass. A four-fold cross validation method was applied to train and test this deep learning network. RESULTS The results revealed that this CAD scheme yields an area under the receiver operation characteristic curve (AUC) of 0.696±0.044, 0.802±0.037, 0.836±0.036, and 0.822±0.035 for fold 1 to 4 testing datasets, respectively. The overall AUC of the entire dataset is 0.790±0.019. CONCLUSIONS This study demonstrates the feasibility of applying a deep learning based CAD scheme to classify between malignant and benign breast masses without a lesion segmentation, image feature computation and selection process. PMID:28436410
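The area under the ROC curve used to evaluate the CAD scheme can be computed directly from classification scores as the probability that a randomly chosen malignant case scores higher than a randomly chosen benign one. The sketch below uses this pairwise definition with toy scores; it is not the authors' evaluation code.

```python
def auc_pairwise(scores_malignant, scores_benign):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for m in scores_malignant:
        for b in scores_benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(scores_malignant) * len(scores_benign))

# Toy classification scores, not data from the study.
auc = auc_pairwise([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(f"AUC = {auc:.3f}")
```

In a four-fold cross-validation, this calculation would be repeated on each held-out fold, which is why the abstract reports one AUC per fold as well as an overall value.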
Assessment and monitoring practices of Australian fitness professionals.
Bennie, Jason A; Wiesner, Glen H; van Uffelen, Jannique G Z; Harvey, Jack T; Craike, Melinda J; Biddle, Stuart J H
2018-04-01
Assessment and monitoring of client health and fitness is a key part of fitness professionals' practices. However, little is known about prevalence of this practice. This study describes the assessment/monitoring practices of a large sample of Australian fitness professionals. Cross-sectional. In 2014, 1206 fitness professionals completed an online survey. Respondents reported their frequency (4 point-scale: [1] 'never' to [4] 'always') of assessment/monitoring of eight health and fitness constructs (e.g. body composition, aerobic fitness). This was classified as: (i) 'high' ('always' assessing/monitoring ≥5 constructs); (ii) 'medium' (1-4 constructs); (iii) 'low' (0 constructs). Classifications are reported by demographic and fitness industry characteristics. The odds of being classified as a 'high assessor/monitor' according to social ecological correlates were examined using a multiple-factor logistic regression model. Mean age of respondents was 39.3 (±11.6) years and 71.6% were female. A total of 15.8% (95% CI: 13.7%-17.9%) were classified as a 'high' assessor/monitor. Constructs with the largest proportion of being 'always' assessed were body composition (47.7%; 95% CI: 45.0%-50.1%) and aerobic fitness (42.5%; 95% CI: 39.6%-45.3%). Those with the lowest proportion of being 'always' assessed were balance (24.0%; 95% CI: 24.7%-26.5%) and mental health (20.2%; 95% CI: 18.1%-29.6%). A perceived lack of client interest and fitness professionals not considering assessing their responsibility were associated with lower odds of being classified as a 'high assessor/monitor'. Most fitness professionals do not routinely assess/monitor client fitness and health. Key factors limiting client health assessment and monitoring include a perceived lack of client interest and professionals not considering this their role. Copyright © 2017. Published by Elsevier Ltd.
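The three-level classification rule and the interval around the 'high' proportion can be sketched as below. The classification thresholds come from the abstract; the use of a Wald (normal approximation) interval is an assumption, since the abstract does not name the method, and the function names are hypothetical.

```python
import math

def classify_assessor(n_always):
    """Classify a respondent by how many of the eight constructs
    they report 'always' assessing/monitoring."""
    if n_always >= 5:
        return "high"
    if n_always >= 1:
        return "medium"
    return "low"

def wald_ci(proportion, n, z=1.959964):
    """Normal-approximation confidence interval for a proportion."""
    half_width = z * math.sqrt(proportion * (1.0 - proportion) / n)
    return proportion - half_width, proportion + half_width

# 15.8% of 1206 respondents were classified as 'high'.
low, high = wald_ci(0.158, 1206)
print(classify_assessor(6), f"{low:.3f}-{high:.3f}")
```

Under this assumption the interval comes out at roughly 13.7% to 17.9%, which is consistent with the interval reported for the 'high' group.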
Toovey, Rachel; Reid, Susan M; Rawicki, Barry; Harvey, Adrienne R; Watt, Kerrianne
2017-04-01
Limited information exists on the ability of children with cerebral palsy (CP) to ride a two-wheel bicycle, an activity that may improve health and participation. We aimed to describe bicycle-riding ability and variables associated with ability to ride in children with CP (Gross Motor Functional Classification System [GMFCS] levels I-II) compared with children with typical development. This case-control study surveyed parents of 114 children with CP and 87 children with typical development aged 6 to 15 years (115 males, mean age 9y 11mo, standard deviation [SD] 2y 10mo). Kaplan-Meier methods were used to compare proportions able to ride at any given age between the two groups. Logistic regression was used to assess variables associated with ability to ride for children with CP and typical development separately. The proportion of children with CP able to ride at each level of bicycle-riding ability was substantially lower at each age than peers with typical development (p<0.001). While most children with typical development were able to ride independently by 10 years of age, 51% of children with CP classified as GMFCS level I and 3% of those classified as GMFCS level II had obtained independent riding in the community by 15 years of age. Variables associated with ability to ride for children classified as GMFCS level I were age and parent-rated importance of their child being able to ride. Some independently ambulant children with CP can learn to ride a bicycle, in particular if they are classified as GMFCS level I. Variables associated with ability to ride deserve consideration in shaping future efforts for the majority of this population who are not yet able to ride. © 2016 Mac Keith Press.
Sources of practice knowledge among Australian fitness trainers.
Bennie, Jason A; Wiesner, Glen H; van Uffelen, Jannique G Z; Harvey, Jack T; Biddle, Stuart J H
2017-12-01
Few studies have examined the sources of practice knowledge fitness trainers use to inform their training methods and update knowledge. This study aims to describe sources of practice knowledge among Australian fitness trainers. In July 2014, 9100 Australian fitness trainers were invited to complete an online survey. Respondents reported the frequency of use of eight sources of practice knowledge (e.g. fitness magazines, academic texts). In a separate survey, exercise science experts (n = 27) ranked each source as either (1) 'high-quality' or (2) 'low-quality'. Proportions of users of 'high-quality' sources were calculated across demographic (age, sex) and fitness industry-related characteristics (qualification, setting, role). A multivariate logistic regression analysis assessed the odds of being classified as a user of high-quality sources, adjusting for demographic and fitness industry-related factors. Out of 1185 fitness trainers (response rate = 13.0%), aged 17-72 years, 47.6% (95% CI, 44.7-50.4%) were classified as frequent users of high-quality sources of practice knowledge. In the adjusted analysis, compared to trainers aged 17-26 years, those aged ≥61 years (OR, 2.15; 95% CI, 1.05-4.38) and 40-50 years (OR, 1.54; 95% CI, 1.02-2.31) were more likely to be classified as a user of high-quality sources. When compared to trainers working in large centres, those working in outdoor settings (OR, 1.81; 95% CI, 1.23-2.65) and medium centres (OR, 1.59; 95% CI, 1.12-2.29) were more likely to be classified as users of high-quality sources. Our findings suggest that efforts should be made to improve the quality of knowledge acquisition among Australian fitness trainers.
Quantification of photoacoustic microscopy images for ovarian cancer detection
NASA Astrophysics Data System (ADS)
Wang, Tianheng; Yang, Yi; Alqasemi, Umar; Kumavor, Patrick D.; Wang, Xiaohong; Sanders, Melinda; Brewer, Molly; Zhu, Quing
2014-03-01
In this paper, human ovarian tissues with malignant and benign features were imaged ex vivo by using an optical-resolution photoacoustic microscopy (OR-PAM) system. Several features were quantitatively extracted from PAM images to describe photoacoustic signal distributions and fluctuations. 106 PAM images from 18 human ovaries were classified by applying those extracted features to a logistic prediction model. 57 images from 9 ovaries were used as a training set to train the logistic model, and 49 images from another 9 ovaries were used to test our prediction model. We assumed that if one image from one malignant ovary was classified as malignant, it is sufficient to classify this ovary as malignant. For the training set, we achieved 100% sensitivity and 83.3% specificity; for the testing set, we achieved 100% sensitivity and 66.7% specificity. These preliminary results demonstrate that PAM could be extremely valuable in assisting and guiding surgeons for in vivo evaluation of ovarian tissue.
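The ovary-level decision rule described above (one image called malignant is enough to call the ovary malignant) can be sketched as an "any" aggregation over per-image predictions. The identifiers and the toy labels are hypothetical and are not the study's data.

```python
def ovary_level_calls(image_calls):
    """An ovary is called malignant if ANY of its images is called malignant."""
    return {ovary: any(calls) for ovary, calls in image_calls.items()}

def sensitivity_specificity(predicted, truth):
    """Ovary-level sensitivity and specificity against reference labels."""
    tp = sum(predicted[o] and truth[o] for o in truth)
    fn = sum((not predicted[o]) and truth[o] for o in truth)
    tn = sum((not predicted[o]) and (not truth[o]) for o in truth)
    fp = sum(predicted[o] and (not truth[o]) for o in truth)
    return tp / (tp + fn), tn / (tn + fp)

# Toy per-image predictions (True = malignant) for four ovaries.
image_calls = {
    "A": [False, True, False],
    "B": [False, False],
    "C": [True, True],
    "D": [False, True],
}
truth = {"A": True, "B": False, "C": True, "D": False}

calls = ovary_level_calls(image_calls)
sens, spec = sensitivity_specificity(calls, truth)
print(calls, sens, spec)
```

The rule favors sensitivity: a single false-positive image is enough to misclassify a benign ovary, which is consistent with specificity being the weaker figure in the reported results.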
NASA Astrophysics Data System (ADS)
Ceppi, C.; Mancini, F.; Ritrovato, G.
2009-04-01
This study aims at landslide susceptibility mapping within an area of the Daunia (Apulian Apennines, Italy) by a multivariate statistical method and data manipulation in a Geographical Information System (GIS) environment. Among the variety of existing statistical data analysis techniques, logistic regression was chosen to produce a susceptibility map over an area where small settlements are historically threatened by landslide phenomena. By logistic regression, a best fit between the presence or absence of landslides (dependent variable) and the set of independent variables is obtained on the basis of a maximum likelihood criterion, leading to the estimation of regression coefficients. The reliability of such analysis is therefore due to the ability to quantify the proneness to landslide occurrences by the probability level produced by the analysis. The inventory of dependent and independent variables was managed in a GIS, where geometric properties and attributes have been translated into raster cells in order to proceed with the logistic regression by means of the SPSS (Statistical Package for the Social Sciences) package. A landslide inventory was used to produce the binary dependent variable, whereas the set of independent variables comprised slope, aspect, elevation, curvature, drained area, lithology and land use after their reduction to dummy variables. The effect of independent parameters on landslide occurrence was assessed by the corresponding coefficient in the logistic regression function, highlighting a major role played by the land use variable in determining occurrence and distribution of phenomena. Once the outcomes of the logistic regression are determined, data are re-introduced in the GIS to produce a map reporting the proneness to landslide as predicted level of probability. 
As validation of the results and of the regression model, a cell-by-cell comparison between the susceptibility map and the initial inventory of landslide events was performed, and agreement at the 75% level was achieved.
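A cell-by-cell comparison of the kind used for validation here can be sketched as the share of raster cells where the thresholded predicted probability matches the inventory. The 0.5 threshold is an assumption (the abstract does not say how probabilities were converted to presence or absence), and the values below are toy data.

```python
def cell_agreement(probabilities, inventory, threshold=0.5):
    """Fraction of cells where (probability >= threshold) equals the
    observed landslide presence (1) or absence (0)."""
    if len(probabilities) != len(inventory):
        raise ValueError("maps must cover the same cells")
    matches = sum((p >= threshold) == bool(obs)
                  for p, obs in zip(probabilities, inventory))
    return matches / len(probabilities)

# Toy flattened rasters, not data from the study.
probs = [0.91, 0.12, 0.55, 0.40, 0.78, 0.05, 0.66, 0.30]
observed = [1, 0, 0, 1, 1, 0, 1, 0]
agreement = cell_agreement(probs, observed)
print(f"agreement = {agreement:.2f}")
```

Because landslide cells are usually a small minority of a raster, a raw agreement figure is easier to interpret alongside the separate agreement rates for landslide and non-landslide cells.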
Determination of riverbank erosion probability using Locally Weighted Logistic Regression
NASA Astrophysics Data System (ADS)
Ioannidou, Elena; Flori, Aikaterini; Varouchakis, Emmanouil A.; Giannakis, Georgios; Vozinaki, Anthi Eirini K.; Karatzas, George P.; Nikolaidis, Nikolaos
2015-04-01
Riverbank erosion is a natural geomorphologic process that affects the fluvial environment. The most important issue concerning riverbank erosion is the identification of the vulnerable locations. An alternative to the usual hydrodynamic models to predict vulnerable locations is to quantify the probability of erosion occurrence. This can be achieved by identifying the underlying relations between riverbank erosion and the geomorphological or hydrological variables that prevent or stimulate erosion. Thus, riverbank erosion can be determined by a regression model using independent variables that are considered to affect the erosion process. The impact of such variables may vary spatially, therefore, a non-stationary regression model is preferred instead of a stationary equivalent. Locally Weighted Regression (LWR) is proposed as a suitable choice. This method can be extended to predict the binary presence or absence of erosion based on a series of independent local variables by using the logistic regression model. It is referred to as Locally Weighted Logistic Regression (LWLR). Logistic regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable (e.g. binary response) based on one or more predictor variables. The method can be combined with LWR to assign weights to local independent variables of the dependent one. LWR allows model parameters to vary over space in order to reflect spatial heterogeneity. The probabilities of the possible outcomes are modelled as a function of the independent variables using a logistic function. Logistic regression measures the relationship between a categorical dependent variable and, usually, one or several continuous independent variables by converting the dependent variable to probability scores. Then, a logistic regression is formed, which predicts success or failure of a given binary variable (e.g. erosion presence or absence) for any value of the independent variables. 
The erosion occurrence probability can be calculated in conjunction with the model deviance regarding the independent variables tested. The most straightforward measure for goodness of fit is the G statistic. It is a simple and effective way to study and evaluate the Logistic Regression model efficiency and the reliability of each independent variable. The developed statistical model is applied to the Koiliaris River Basin on the island of Crete, Greece. Two datasets of river bank slope, river cross-section width and indications of erosion were available for the analysis (12 and 8 locations). Two different types of spatial dependence functions, exponential and tricubic, were examined to determine the local spatial dependence of the independent variables at the measurement locations. The results show a significant improvement when the tricubic function is applied as the erosion probability is accurately predicted at all eight validation locations. Results for the model deviance show that cross-section width is more important than bank slope in the estimation of erosion probability along the Koiliaris riverbanks. The proposed statistical model is a useful tool that quantifies the erosion probability along the riverbanks and can be used to assist managing erosion and flooding events. Acknowledgements This work is part of an on-going THALES project (CYBERSENSORS - High Frequency Monitoring System for Integrated Water Resources Management of Rivers). The project has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES. Investing in knowledge society through the European Social Fund.
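The locally weighted logistic regression idea described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `bandwidth` parameter, the small `damping` term and the Newton-Raphson fitting loop are all assumptions introduced here. Only the two kernel shapes (exponential and tricubic) come from the abstract.

```python
import numpy as np

def tricube_weights(dist, bandwidth):
    """Tricubic kernel: (1 - (d/h)^3)^3 inside the bandwidth h, 0 outside."""
    u = np.clip(dist / bandwidth, 0.0, 1.0)
    return (1.0 - u**3) ** 3

def exponential_weights(dist, bandwidth):
    """Exponential kernel: exp(-d/h); weights decay with distance but never reach 0."""
    return np.exp(-dist / bandwidth)

def fit_weighted_logistic(X, y, w, n_iter=50, damping=1e-6):
    """Weighted logistic regression fitted by Newton-Raphson.

    `damping` is a hypothetical stabiliser added to the Hessian only;
    it does not change the solution where the gradient is zero.
    """
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        grad = Xb.T @ (w * (y - p))
        hess = (Xb * (w * p * (1.0 - p))[:, None]).T @ Xb
        hess += damping * np.eye(Xb.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def predict_local(coords, X, y, target_coord, target_x, bandwidth,
                  kernel=tricube_weights):
    """Erosion probability at one location from a model fitted around it."""
    dist = np.linalg.norm(coords - target_coord, axis=1)
    w = kernel(dist, bandwidth)          # nearby observations count more
    beta = fit_weighted_logistic(X, y, w)
    z = beta[0] + target_x @ beta[1:]
    return 1.0 / (1.0 + np.exp(-z))
```

Because a separate model is fitted around each target location, the coefficients are free to vary over space, which is the non-stationarity the abstract refers to. With as few observations as the 12 and 8 locations mentioned, a fit like this can easily become unstable, so the sketch is meant only to show the mechanics.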
Contract management in USA hospitals: service duplication and access within local markets.
Carey, Kathleen; Dor, Avi
2008-08-01
This paper examines the extent to which hospitals that are under external contract management engage in service duplication, as well as the degree to which the various services they offer contribute to or detract from community access. The study incorporates all USA hospitals using data from the American Hospital Association Annual Survey Database, supplemented by county level measures obtained from the area resource file (ARF). Using data on the 3794 hospitals classified as acute care facilities in 2002, we performed a set of logistic regressions that analyzed whether a hospital offered each of 74 distinct services. For each service (regression), key independent variables measured the number of other hospitals in the local market area that also offered the service. Local area market definitions are the areas circumscribed by the hospital within distances of 10 and 20 miles. Results suggest that contract-managed (CM) hospitals display a more competitive pattern (service duplication) than hospitals in general, but CM hospitals that are the sole provider of services locally are less likely to offer services than traditionally managed sole hospital providers. Contract management does not appear to offer any particular advantages in improving access to hospital services.
Dental calculus is associated with death from heart infarction.
Söder, Birgitta; Meurman, Jukka H; Söder, Per-Östen
2014-01-01
We studied whether the amount of dental calculus is associated with death from heart infarction in the dental infection-atherosclerosis paradigm. Participants were 1676 healthy young Swedes followed up from 1985 to 2011. At the beginning of the study all subjects underwent oral clinical examination including dental calculus registration scored with calculus index (CI). Outcome measure was cause of death classified according to WHO International Classification of Diseases. Unpaired t-test, Chi-square tests, and multiple logistic regressions were used. Of the 1676 participants, 2.8% had died during follow-up. Women died at a mean age of 61.5 years and men at 61.7 years. The difference in the CI score between survivors and deceased patients was significant by the year 2009 (P < 0.01). In multiple regression analysis of the relationship between death from heart infarction as the dependent variable and CI as the independent variable, controlling for age, gender, dental visits, dental plaque, periodontal pockets, education, income, socioeconomic status, and pack-years of smoking, CI score appeared to be associated with an odds ratio of 2.3 for cardiac death. The results confirmed our study hypothesis by showing that dental calculus was indeed statistically associated with cardiac death due to infarction.
Automatic Generation of Customized, Model Based Information Systems for Operations Management.
The paper discusses the need for developing a customized, model based system to support management decision making in the field of operations ... management. It provides a critique of the current approaches available, formulates a framework to classify logistics decisions, and suggests an approach for the automatic development of logistics systems. (Author)
NASA Astrophysics Data System (ADS)
Yilmaz, Işık
2009-06-01
The purpose of this study is to compare the landslide susceptibility mapping methods of frequency ratio (FR), logistic regression and artificial neural networks (ANN) applied in the Kat County (Tokat, Turkey). A digital elevation model (DEM) was first constructed using GIS software. Landslide-related factors such as geology, faults, drainage system, topographical elevation, slope angle, slope aspect, topographic wetness index (TWI) and stream power index (SPI) were used in the landslide susceptibility analyses. Landslide susceptibility maps were produced from the frequency ratio, logistic regression and neural networks models, and they were then compared by means of their validations. The higher accuracies of the susceptibility maps for all three models were obtained from the comparison of the landslide susceptibility maps with the known landslide locations. Although the respective area under curve (AUC) values of 0.826, 0.842 and 0.852 for frequency ratio, logistic regression and artificial neural networks showed that the map obtained from the ANN model is more accurate than those from the other models, the accuracies of all models can be considered relatively similar. The results obtained in this study also showed that the frequency ratio model can be used as a simple tool in the assessment of landslide susceptibility when a sufficient amount of data is available. The input process, calculations and output process are very simple and can be readily understood in the frequency ratio model; however, logistic regression and neural networks require the conversion of data to ASCII or other formats. Moreover, it is also very hard to process the large amount of data in the statistical package.
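To illustrate why the frequency ratio model is described above as simple, the calculation can be sketched as follows. This assumes the commonly used definition (share of landslide cells falling in a factor class divided by the share of all cells in that class); the abstract does not spell out the formula, and the function name and toy inputs are hypothetical.

```python
import numpy as np

def frequency_ratio(factor_class, landslide):
    """Frequency ratio per factor class.

    FR = (landslide cells in class / all landslide cells)
         / (cells in class / all cells)
    Values above 1 suggest the class is over-represented among landslides.
    """
    factor_class = np.asarray(factor_class)
    landslide = np.asarray(landslide, dtype=bool)
    ratios = {}
    for c in np.unique(factor_class):
        in_class = factor_class == c
        landslide_share = landslide[in_class].sum() / landslide.sum()
        area_share = in_class.sum() / in_class.size
        ratios[c.item()] = landslide_share / area_share
    return ratios

# Toy raster flattened to 10 cells: slope class and landslide presence (made-up data).
slope_class = ["gentle"] * 6 + ["steep"] * 4
has_landslide = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
print(frequency_ratio(slope_class, has_landslide))
```

In a susceptibility analysis of this kind, the ratios for each factor would typically be summed per cell to form an index. That step is omitted here because the abstract does not describe it.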
Asai, Yumi; Imamura, Kotaro; Kawakami, Norito
2017-06-01
This study aimed to investigate associations of job stressors with panic attack (PA) and panic disorder (PD) among Japanese workers. A cross-sectional online questionnaire survey was conducted of 2060 workers. Job strain, effort/reward imbalance, and workplace social support were measured by the job content questionnaire and effort/reward imbalance questionnaire. These variables were classified into tertiles. PA/PD were measured by self-report based on the mini international neuropsychiatric interview (MINI). Multiple logistic regression was conducted, adjusting for demographic, lifestyle, and health-related covariates. Data from 1965 participants were analyzed. Adjusted odds ratio (OR) of PA/PD was significantly greater for the group with high effort/reward imbalance compared with the group with low effort/reward imbalance (ORs, 2.64 and 2.94, respectively, both P < 0.05). This study found effort/reward imbalance was associated with having PA/PD among Japanese workers.
Maternal occupation and the risk of neural tube defects in offspring.
Kim, Jihye; Langlois, Peter H; Mitchell, Laura E; Agopian, A J
2017-07-19
We evaluated the association between maternal occupation and the risk of neural tube defects (NTDs) in offspring. Data for 491 nonsyndromic cases were obtained from the Texas Birth Defects Registry for deliveries between 1999 and 2009. We randomly selected 2,291 controls among all live births in Texas during this time. Maternal occupations were classified using automated software and manual assignment. Multivariable logistic regression analyses were used to examine the relationship between maternal occupation and risk for any NTD, adjusting for maternal race/ethnicity, any diabetes, and maternal body mass index. These analyses were repeated for spina bifida specifically. Some maternal occupations, particularly those related to business/finance, health care practice, and cleaning/maintenance, were significantly associated with increased risk of spina bifida and/or any NTD. Further research is needed to identify the specific occupational exposures related to NTD risk.
Attell, Brandon K
2017-01-01
Several longitudinal studies show that over time the American public has become more approving of euthanasia and suicide for terminally ill persons. Yet, these previous findings are limited because they derive from biased estimates of disaggregated hierarchical data. Using insights from life course sociological theory and cross-classified logistic regression models, I better account for this liberalization process by disentangling the age, period, and cohort effects that contribute to longitudinal changes in these attitudes. The results of the analysis point toward a continued liberalization of both attitudes over time, although the magnitude of change was greater for suicide compared with euthanasia. More fluctuation in the probability of supporting both measures was exhibited for the age and period effects over the cohort effects. In addition, age-based differences in supporting both measures were found between men and women and various religious affiliations.
Yeung, Pui-Sze; Ho, Connie Suk-Han; Chan, David Wai-Ock; Chung, Kevin Kien-Hoa
2014-05-01
To identify the indicators of persistent reading difficulties among Chinese readers in early elementary grades, the performance of three groups of Chinese children with different reading trajectories ('persistent poor word readers', 'improved poor word readers' and 'skilled word readers') in reading-related measures was analysed in a 3-year longitudinal study. The three groups were classified according to their performance in a standardized Chinese word reading test in Grade 1 and Grade 4. Results of analysis of variance and logistic regression on the reading-related measures revealed that rapid naming and syntactic skills were important indicators of early word reading difficulty. Syntactic skills and morphological awareness were possible markers of persistent reading problems. Chinese persistent poor readers did not differ significantly from skilled readers on the measures of phonological skills. Copyright © 2014 John Wiley & Sons, Ltd.
Clinical reasoning in feline epilepsy: Which combination of clinical information is useful?
Stanciu, Gabriela-Dumitrita; Packer, Rowena Mary Anne; Pakozdy, Akos; Solcan, Gheorghe; Volk, Holger Andreas
2017-07-01
We sought to identify the association between clinical risk factors and the diagnosis of idiopathic epilepsy (IE) or structural epilepsy (SE) in cats, using statistical models to identify combinations of discrete parameters from the patient signalment, history and neurological examination findings that could suggest the most likely diagnosis. Data for 138 cats with recurrent seizures were reviewed, of which 110 were valid for inclusion. Seizure aetiology was classified as IE in 57% and SE in 43% of cats. Binomial logistic regression analyses demonstrated that pedigree status, older age at seizure onset (particularly >7years old), abnormal neurological examinations, and ictal vocalisation were associated with a diagnosis of SE compared to IE, and that ictal salivation was more likely to be associated with a diagnosis of IE than SE. These findings support the importance of considering inter-ictal neurological deficits and seizure history in clinical reasoning. Copyright © 2017 Elsevier Ltd. All rights reserved.
Morphometric-based sexual determination of Bananaquits (Coereba flaveola)
Bibles, Brent D.; Boal, Clint W.
2012-01-01
The Bananaquit (Coereba flaveola) is a common passerine throughout the tropics and has been a convenient species for ecological studies. This species has sexually monomorphic plumage and cannot be reliably sexed unless in breeding condition. This is problematic for demographic and comparative studies, which are contingent upon accurately aging and sexing individuals. Although male Bananaquits are larger than females, there is overlap in both wing chord and mass. We used morphometric data collected over eight years to develop a predictive model based on logistic regression to assign adult Bananaquits to sex. Our model classified 96% of validation individuals to the correct sex. We suggest that this approach may enhance ecological studies of the species by facilitating correct sex determination independent of breeding status. We believe our modeling approach is applicable elsewhere but, because there may be geographical variation across the species distribution, models will need to be customized to local populations.
Decoding memory features from hippocampal spiking activities using sparse classification models.
Dong Song; Hampson, Robert E; Robinson, Brian S; Marmarelis, Vasilis Z; Deadwyler, Sam A; Berger, Theodore W
2016-08-01
To understand how memory information is encoded in the hippocampus, we build classification models to decode memory features from hippocampal CA3 and CA1 spatio-temporal patterns of spikes recorded from epilepsy patients performing a memory-dependent delayed match-to-sample task. The classification model consists of a set of B-spline basis functions for extracting memory features from the spike patterns, and a sparse logistic regression classifier for generating binary categorical output of memory features. Results show that classification models can extract a significant amount of memory information with respect to types of memory tasks and categories of sample images used in the task, despite the high level of variability in prediction accuracy due to the small sample size. These results support the hypothesis that memories are encoded in the hippocampal activities and have important implications for the development of hippocampal memory prostheses.
Development of a Pediatric Ebola Predictive Score, Sierra Leone.
Fitzgerald, Felicity; Wing, Kevin; Naveed, Asad; Gbessay, Musa; Ross, J C G; Checchi, Francesco; Youkee, Daniel; Jalloh, Mohamed Boie; Baion, David E; Mustapha, Ayeshatu; Jah, Hawanatu; Lako, Sandra; Oza, Shefali; Boufkhed, Sabah; Feury, Reynold; Bielicki, Julia; Williamson, Elizabeth; Gibb, Diana M; Klein, Nigel; Sahr, Foday; Yeung, Shunmay
2018-02-01
We compared children who were positive for Ebola virus disease (EVD) with those who were negative to derive a pediatric EVD predictor (PEP) score. We collected data on all children <13 years of age admitted to 11 Ebola holding units in Sierra Leone during August 2014-March 2015 and performed multivariable logistic regression. Among 1,054 children, 309 (29%) were EVD positive and 697 (66%) EVD negative, with 48 (5%) missing. Contact history, conjunctivitis, and age were the strongest positive predictors for EVD. The PEP score had an area under receiver operating characteristics curve of 0.80. A PEP score of 7/10 was 92% specific and 44% sensitive; 3/10 was 30% specific, 94% sensitive. The PEP score could correctly classify 79%-90% of children and could be used to facilitate triage into risk categories, depending on the sensitivity or specificity required.
Donders, Jacobus; Janke, Kelly
2008-07-01
The performance of 40 children with complicated mild to severe traumatic brain injury on the Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV; Wechsler, 2003) was compared with that of 40 demographically matched healthy controls. Of the four WISC-IV factor index scores, only Processing Speed yielded a statistically significant group difference (p < .001) as well as a statistically significant negative correlation with length of coma (p < .01). Logistic regression, using Processing Speed to classify individual children, yielded a sensitivity of 72.50% and a specificity of 62.50%, with false positive and false negative rates both exceeding 30%. We conclude that Processing Speed has acceptable criterion validity in the evaluation of children with complicated mild to severe traumatic brain injury but that the WISC-IV should be supplemented with other measures to assure sufficient accuracy in the diagnostic process.
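The reported classification figures can be checked with a little arithmetic. The sketch below assumes the counts implied by 40 children per group (29 of 40 injured children and 25 of 40 controls correctly classified); these counts are not stated in the abstract. It also shows one reading under which both error rates exceed 30%: errors expressed as a share of the calls made rather than of the true group. Which definition the authors used is not stated, so this is only a plausible reconstruction.

```python
def rates_from_counts(tp, fn, tn, fp):
    """Basic rates from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Errors as a share of the calls made (assumed reading, see lead-in).
        "fp_share_of_positive_calls": fp / (tp + fp),
        "fn_share_of_negative_calls": fn / (tn + fn),
    }

# Counts assumed from 40 TBI children and 40 controls.
rates = rates_from_counts(tp=29, fn=11, tn=25, fp=15)
for name, value in rates.items():
    print(f"{name}: {value:.4f}")
```

Under the more common definition (false negative rate = 1 - sensitivity), the false negative rate would be 27.5%, which is why the alternative reading is shown.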
Trace element analysis of rough diamond by LA-ICP-MS: a case of source discrimination?
Dalpé, Claude; Hudon, Pierre; Ballantyne, David J; Williams, Darrell; Marcotte, Denis
2010-11-01
Current profiling of rough diamond source is performed using different physical and/or morphological techniques that require strong knowledge and experience in the field. More recently, chemical impurities have been used to discriminate diamond source and with the advance of laser ablation-inductively coupled plasma-mass spectrometry (LA-ICP-MS) empirical profiling of rough diamonds is possible to some extent. In this study, we present a LA-ICP-MS methodology that we developed for analyzing ultra-trace element impurities in rough diamond for origin determination ("profiling"). Diamonds from two sources were analyzed by LA-ICP-MS and were statistically classified by accepted methods. For the two diamond populations analyzed in this study, binomial logistic regression produced a better overall correct classification than linear discriminant analysis. The results suggest that an anticipated matrix match reference material would improve the robustness of our methodology for forensic applications. © 2010 American Academy of Forensic Sciences.
Manchia, Mirko; Firinu, Giorgio; Carpiniello, Bernardo; Pinna, Federica
2017-03-31
Severe mental illness (SMI) has considerable excess morbidity and mortality, a proportion of which is explained by cardiovascular diseases, caused in part by antipsychotic (AP) induced QT-related arrhythmias and sudden death by Torsades de Pointes (TdP). The implementation of evidence-based recommendations for cardiac function monitoring might reduce the incidence of these AP-related adverse events. To investigate clinicians' adherence to cardiac function monitoring before and after starting AP, we performed a retrospective assessment of 434 AP-treated SMI patients longitudinally followed-up for 5 years at an academic community mental health center. We classified antipsychotics according to their risk of inducing QT-related arrhythmias and TdP (Center for Research on Therapeutics, University of Arizona). We used univariate tests and multinomial or binary logistic regression models for data analysis. Univariate and multinomial regression analysis showed that psychiatrists were more likely to perform pre-treatment electrocardiogram (ECG) and electrolyte testing with AP carrying higher cardiovascular risk, but not on the basis of AP pharmacological class. Univariate and binomial regression analysis showed that cardiac function parameters (ECG and electrolyte balance) were more frequently monitored during treatment with second generation AP than with first generation AP. Our data show the presence of weaknesses in the cardiac function monitoring of AP-treated SMI patients, and might guide future interventions to tackle them.
Display area, looking north towards the classified storage rooms, D.M. ...
Display area, looking north towards the classified storage rooms, D.M. Logistics and D.O. Offices in northwest corner. Viewing bridge is at upper left, and alert status display at upper right - March Air Force Base, Strategic Air Command, Combat Operations Center, 5220 Riverside Drive, Moreno Valley, Riverside County, CA
ERIC Educational Resources Information Center
Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard
2010-01-01
The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…
Carolyn B. Meyer; Sherri L. Miller; C. John Ralph
2004-01-01
The scale at which habitat variables are measured affects the accuracy of resource selection functions in predicting animal use of sites. We used logistic regression models for a wide-ranging species, the marbled murrelet, (Brachyramphus marmoratus) in a large region in California to address how much changing the spatial or temporal scale of...
ERIC Educational Resources Information Center
Monahan, Patrick O.; McHorney, Colleen A.; Stump, Timothy E.; Perkins, Anthony J.
2007-01-01
Previous methodological and applied studies that used binary logistic regression (LR) for detection of differential item functioning (DIF) in dichotomously scored items either did not report an effect size or did not employ several useful measures of DIF magnitude derived from the LR model. Equations are provided for these effect size indices.…
ERIC Educational Resources Information Center
Magis, David; Raiche, Gilles; Beland, Sebastien; Gerard, Paul
2011-01-01
We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…
Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis
ERIC Educational Resources Information Center
Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John
2012-01-01
Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…
ERIC Educational Resources Information Center
Gordovil-Merino, Amalia; Guardia-Olmos, Joan; Pero-Cebollero, Maribel
2012-01-01
In this paper, we used simulations to compare the performance of classical and Bayesian estimations in logistic regression models using small samples. In the performed simulations, conditions were varied, including the type of relationship between independent and dependent variable values (i.e., unrelated and related values), the type of variable…
Ohlmacher, G.C.; Davis, J.C.
2003-01-01
Landslides in the hilly terrain along the Kansas and Missouri rivers in northeastern Kansas have caused millions of dollars in property damage during the last decade. To address this problem, a statistical method called multiple logistic regression has been used to create a landslide-hazard map for Atchison, Kansas, and surrounding areas. Data included digitized geology, slopes, and landslides, manipulated using ArcView GIS. Logistic regression relates predictor variables to the occurrence or nonoccurrence of landslides within geographic cells and uses the relationship to produce a map showing the probability of future landslides, given local slopes and geologic units. Results indicated that slope is the most important variable for estimating landslide hazard in the study area. Geologic units consisting mostly of shale, siltstone, and sandstone were most susceptible to landslides. Soil type and aspect ratio were considered but excluded from the final analysis because these variables did not significantly add to the predictive power of the logistic regression. Soil types were highly correlated with the geologic units, and no significant relationships existed between landslides and slope aspect. © 2003 Elsevier Science B.V. All rights reserved.
A Method for Calculating the Probability of Successfully Completing a Rocket Propulsion Ground Test
NASA Technical Reports Server (NTRS)
Messer, Bradley
2007-01-01
Propulsion ground test facilities face the daily challenge of scheduling multiple customers into limited facility space and successfully completing their propulsion test projects. Over the last decade NASA's propulsion test facilities have performed hundreds of tests, collected thousands of seconds of test data, and exceeded the capabilities of numerous test facility and test article components. A logistic regression mathematical modeling technique has been developed to predict the probability of successfully completing a rocket propulsion test. A logistic regression model is a mathematical modeling approach that can be used to describe the relationship of several independent predictor variables $X_1, X_2, \ldots, X_k$ to a binary or dichotomous dependent variable Y, where Y can only be one of two possible outcomes, in this case Success or Failure of accomplishing a full duration test. The use of logistic regression modeling is not new; however, modeling propulsion ground test facilities using logistic regression is both a new and unique application of the statistical technique. Results from this type of model provide project managers with insight and confidence into the effectiveness of rocket propulsion ground testing.
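For reference, the standard form of the logistic regression model described above can be written out as follows. The coefficient symbols $\beta_0, \beta_1, \ldots, \beta_k$ are notation assumed here; the abstract names only the predictors and the binary outcome Y (taking Y = 1 to mean Success).

```latex
\[
P(Y = 1 \mid X_1, \ldots, X_k)
  = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{i=1}^{k} \beta_i X_i\right)\right)}
\]
\[
\log\frac{P(Y = 1 \mid X_1, \ldots, X_k)}{1 - P(Y = 1 \mid X_1, \ldots, X_k)}
  = \beta_0 + \sum_{i=1}^{k} \beta_i X_i
\]
```

The second line is the same model on the log-odds scale, where each $\beta_i$ is the change in the log-odds of a successful full duration test per unit change in $X_i$, with the other predictors held fixed.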
Fei, Yang; Hu, Jian; Gao, Kun; Tu, Jianfeng; Li, Wei-Qin; Wang, Wei
2017-06-01
To construct a radial basis function (RBF) artificial neural network (ANN) model to predict the incidence of acute pancreatitis (AP)-induced portal vein thrombosis (PVT). The analysis included 353 patients with AP who had been admitted between January 2011 and December 2015. An RBF ANN model and a logistic regression model were each constructed based on eleven factors relevant to AP. Statistical indexes were used to evaluate the predictive value of the two models. The sensitivity, specificity, positive predictive value, negative predictive value and accuracy of the RBF ANN model for predicting PVT were 73.3%, 91.4%, 68.8%, 93.0% and 87.7%, respectively. There were significant differences between the RBF ANN and logistic regression models in these parameters (P<0.05). In addition, a comparison of the area under receiver operating characteristic curves of the two models showed a statistically significant difference (P<0.05). The RBF ANN model predicted the occurrence of AP-induced PVT better than the logistic regression model. D-dimer, AMY, Hct and PT were important predictive factors for AP-induced PVT. Copyright © 2017 Elsevier Inc. All rights reserved.
Wang, Shuang; Jiang, Xiaoqian; Wu, Yuan; Cui, Lijuan; Cheng, Samuel; Ohno-Machado, Lucila
2013-01-01
We developed an EXpectation Propagation LOgistic REgRession (EXPLORER) model for distributed privacy-preserving online learning. The proposed framework provides a high level guarantee for protecting sensitive information, since the information exchanged between the server and the client is the encrypted posterior distribution of coefficients. Through experimental results, EXPLORER shows the same performance (e.g., discrimination, calibration, feature selection etc.) as the traditional frequentist Logistic Regression model, but provides more flexibility in model updating. That is, EXPLORER can be updated one point at a time rather than having to retrain the entire data set when new observations are recorded. The proposed EXPLORER supports asynchronized communication, which relieves the participants from coordinating with one another, and prevents service breakdown from the absence of participants or interrupted communications. PMID:23562651
Dietary consumption patterns and laryngeal cancer risk.
Vlastarakos, Petros V; Vassileiou, Andrianna; Delicha, Evie; Kikidis, Dimitrios; Protopapas, Dimosthenis; Nikolopoulos, Thomas P
2016-06-01
We conducted a case-control study to investigate the effect of diet on laryngeal carcinogenesis. Our study population was made up of 140 participants: 70 patients with laryngeal cancer (LC) and 70 controls with a non-neoplastic condition that was unrelated to diet, smoking, or alcohol. A food-frequency questionnaire determined the mean consumption of 113 different items during the 3 years prior to symptom onset. Total energy intake and cooking mode were also noted. The relative risk, odds ratio (OR), and 95% confidence interval (CI) were estimated by multiple logistic regression analysis. We found that the total energy intake was significantly higher in the LC group (p < 0.001), and that the difference remained statistically significant after logistic regression analysis (p < 0.001; OR: 118.70). Notably, meat consumption was higher in the LC group (p < 0.001), and the difference remained significant after logistic regression analysis (p = 0.029; OR: 1.16). LC patients also consumed significantly more fried food (p = 0.036); this difference also remained significant in the logistic regression model (p = 0.026; OR: 5.45). The LC group also consumed significantly more seafood (p = 0.012); the difference persisted after logistic regression analysis (p = 0.009; OR: 2.48), with the consumption of shrimp proving detrimental (p = 0.049; OR: 2.18). Finally, the intake of zinc was significantly higher in the LC group before and after logistic regression analysis (p = 0.034 and p = 0.011; OR: 30.15, respectively). Cereal consumption (including pastas) was also higher among the LC patients (p = 0.043), with logistic regression analysis showing that their negative effect was possibly associated with the sauces and dressings that traditionally accompany pasta dishes (p = 0.006; OR: 4.78).
Conversely, a higher consumption of dairy products was found in controls (p < 0.05); logistic regression analysis showed that calcium appeared to be protective at the micronutrient level (p < 0.001; OR: 0.27). We found no difference in the overall consumption of fruits and vegetables between the LC patients and controls; however, the LC patients did have a greater consumption of cooked tomatoes and cooked root vegetables (p = 0.039 for both), and the controls had more consumption of leeks (p = 0.042) and, among controls younger than 65 years, cooked beans (p = 0.037). Lemon (p = 0.037), squeezed fruit juice (p = 0.032), and watermelon (p = 0.018) were also more frequently consumed by the controls. Other differences at the micronutrient level included greater consumption by the LC patients of retinol (p = 0.044), polyunsaturated fats (p = 0.041), and linoleic acid (p = 0.008); LC patients younger than 65 years also had greater intake of riboflavin (p = 0.045). We conclude that the differences in dietary consumption patterns between LC patients and controls indicate a possible role for lifestyle modifications involving nutritional factors as a means of decreasing the risk of laryngeal cancer.
NASA Astrophysics Data System (ADS)
Esposito, Carlo; Barra, Anna; Evans, Stephen G.; Scarascia Mugnozza, Gabriele; Delaney, Keith
2014-05-01
The study of landslide susceptibility by multivariate statistical methods is based on finding a quantitative relationship between controlling factors and landslide occurrence. Such studies have become popular in the last few decades thanks to the development of geographic information systems (GIS) software and the related improved data management. In this work we applied a statistical approach to an area of high landslide susceptibility mainly due to its tropical climate and geological-geomorphological setting. The study area is located in the south-east region of Brazil that has frequently been affected by flood and landslide hazards, especially because of heavy rainfall events during the summer season. In this work we studied a disastrous event that occurred on January 11th and 12th of 2011, which involved Região Serrana (the mountainous region of Rio de Janeiro State) and caused more than 5000 landslides and at least 904 deaths. In order to produce susceptibility maps, we focused our attention on an area of 93.6 km² that includes Nova Friburgo city. We utilized two different multivariate statistical methods: Logistic Regression (LR), already widely used in applied geosciences, and Random Forest (RF), which has only recently been applied to landslide susceptibility analysis. With reference to each mapping unit, the first method (LR) results in a probability of landslide occurrence, while the second one (RF) gives a prediction in terms of % of area susceptible to slope failure. With this aim in mind, a landslide inventory map (related to the studied event) has been drawn up through analyses of high-resolution GeoEye satellite images, in a GIS environment. Data layers of 11 causative factors have been created and processed in order to be used as continuous numerical or discrete categorical variables in statistical analysis.
In particular, the logistic regression method has frequent difficulties in managing numerical continuous and discrete categorical variables together; therefore in our work we tried different methods to process categorical variables until we obtained a statistically significant model. The outcomes of the two statistical methods (RF and LR) have been tested with a spatial validation and gave us two susceptibility maps. The performance of the models is quantified in terms of the area under the ROC curve (AUC: 0.81 for the RF model and 0.72 for the LR model). In the first instance, a graphical comparison of the two methods shows a good correspondence between them. Further, we integrated the results in a single susceptibility map that retains both the probability of occurrence and the % of area of landslide detachment, resulting from LR and RF respectively. In fact, in view of a landslide susceptibility classification of the study area, the former is less accurate but gives easily classifiable results, while the latter is more accurate but the results can be only subjectively classified. The obtained "integrated" susceptibility map preserves information about the probability that a given % of area could fail for each mapping unit.
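The AUC figures quoted in this abstract can be read as the probability that a randomly chosen landslide unit receives a higher score than a randomly chosen stable unit. The following is a minimal sketch of that pairwise computation; the function name `auc` and the toy scores are assumptions made for illustration and are not taken from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) statistic.

    scores: model outputs, higher means "more likely positive".
    labels: 1 for positive units, 0 for negative units.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative unit")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores from two models on the same six mapping units.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
model_b = [0.9, 0.3, 0.4, 0.5, 0.6, 0.1]

auc_a = auc(model_a, labels)
auc_b = auc(model_b, labels)
```

Because the statistic depends only on the ranking of the scores, it can compare a probability output (as from LR) with a differently scaled output (as from RF) on the same validation units.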
Earnst, K S; Marson, D C; Harrell, L E
2000-08-01
To investigate measures of patient cognitive abilities as predictors of physician judgments of medical treatment consent capacity (competency) in patients with Alzheimer's disease (AD). Predictor models of legal standards (LS) and personal competency judgments were developed for each study physician using independent neuropsychological test measures and logistic regression analyses. A university medical center. Five physicians with experience assessing the competency of AD patients were recruited to make competency judgments of videotaped vignettes from 10 older controls and 21 patients with AD (10 with mild and 11 with moderate dementia). The 31 patient and control videotapes of performance on a measure of treatment consent capacity (Capacity to Consent to Treatment Instrument) (CCTI) were rated by the five physicians. The CCTI consists of two clinical vignettes (A-neoplasm and B-cardiac) that test competency under five LS. Each study physician viewed each vignette videotape individually, made judgments of competent or incompetent under each of the LS, and then made his/her own personal competency judgment. Physicians were blinded to participant diagnosis and neuropsychological test performance. Stepwise logistic regression was conducted to identify cognitive predictors of each physician's LS and personal competency judgments for Vignette A using the full sample (n = 31). Classification logistic regression analysis was used to determine how well these cognitive predictor models classified each physician's competency judgments for Vignette A. These classification models were then cross-validated using physician's Vignette B judgments. Cognitive predictor models for Vignette A competency judgments differed across individual physicians, and were related to difficulty of LS and to incompetency outcome rates across LS for AD patients. 
Measures of semantic knowledge and receptive language predicted judgments under less difficult LS of evidencing a treatment choice (LS1) and making the reasonable treatment choice (LS2). Measures of semantic knowledge, short-term verbal recall, and simple reasoning ability predicted judgments under more difficult and clinically relevant LS of appreciating consequences of a treatment choice (LS3), providing rational reasons for a treatment choice (LS4), and understanding the treatment situation and choices (LS5). Cognitive models for physicians' personal competency judgments were virtually identical to their respective models for LS5 judgments. For AD patients, short-term memory predictors were associated with high incompetency outcome rates (over 70%), a simple reasoning measure was associated with moderately high incompetency outcome rates (60-70%), and a semantic knowledge measure was associated with lower incompetency outcome rates (30-60%). Overall, single predictor models were relatively robust, correctly classifying an average of 83% of physician judgments for Vignette A and 80% of judgments for Vignette B. Multiple cognitive functions predicted physicians' LS and personal competency judgments. Declines in semantic knowledge, short-term verbal recall, and simple reasoning ability predicted physicians' judgments on the three most difficult and clinically most relevant LS (LS3-LS5), as well as their personal competency judgments. Our findings suggest that clinical assessment of competency should include evaluation of semantic knowledge, verbal recall, and simple reasoning abilities.
Callan, Daniel; Mills, Lloyd; Nott, Connie; England, Robert; England, Shaun
2014-01-01
Chronic pain is one of the most prevalent health problems in the world today, yet neurological markers, critical to diagnosis of chronic pain, are still largely unknown. The ability to objectively identify individuals with chronic pain using functional magnetic resonance imaging (fMRI) data is important for the advancement of diagnosis, treatment, and theoretical knowledge of brain processes associated with chronic pain. The purpose of our research is to investigate specific neurological markers that could be used to diagnose individuals experiencing chronic pain by using multivariate pattern analysis with fMRI data. We hypothesize that individuals with chronic pain have different patterns of brain activity in response to induced pain. This pattern can be used to classify the presence or absence of chronic pain. The fMRI experiment consisted of alternating 14 seconds of painful electric stimulation (applied to the lower back) with 14 seconds of rest. We analyzed contrast fMRI images in stimulation versus rest in pain-related brain regions to distinguish between the groups of participants: 1) chronic pain and 2) normal controls. We employed supervised machine learning techniques, specifically sparse logistic regression, to train a classifier based on these contrast images using a leave-one-out cross-validation procedure. We correctly classified 92.3% of the chronic pain group (N = 13) and 92.3% of the normal control group (N = 13) by recognizing multivariate patterns of activity in the somatosensory and inferior parietal cortex. This technique demonstrates that differences in the pattern of brain activity to induced pain can be used as a neurological marker to distinguish between individuals with and without chronic pain. Medical, legal and business professionals have recognized the importance of this research topic and of developing objective measures of chronic pain. This method of data analysis was very successful in correctly classifying each of the two groups.
PMID:24905072
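The classification procedure described in this abstract (a sparse logistic regression evaluated by leave-one-out cross-validation) can be sketched as follows. This is an illustrative sketch only: it swaps in a plain L1-penalised logistic regression fitted by proximal gradient descent, which may differ from the sparse method the authors used, and the function names, hyperparameters, and toy data are all assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    """Proximal step for the L1 penalty: shrink w toward zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def fit_sparse_logreg(X, y, lam=0.05, lr=0.5, n_iter=500):
    """L1-penalised logistic regression via proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalised
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def leave_one_out_accuracy(X, y, **fit_kwargs):
    """Hold out each sample once, train on the rest, score the held-out one."""
    correct = 0
    for i in range(len(X)):
        w, b = fit_sparse_logreg(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **fit_kwargs)
        z = b + sum(wj * xj for wj, xj in zip(w, X[i]))
        predicted = 1 if z > 0 else 0
        correct += int(predicted == y[i])
    return correct / len(X)


# Toy data (hypothetical): feature 0 separates the groups, feature 1 is noise.
X = [[2.0, 0.1], [1.5, -0.2], [1.8, 0.3], [2.2, -0.1],
     [-2.0, 0.2], [-1.7, -0.3], [-1.9, 0.1], [-2.1, -0.2]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

accuracy = leave_one_out_accuracy(X, y)
```

With small groups such as the N = 13 per group reported here, leave-one-out keeps the training set as large as possible at the cost of refitting the model once per participant.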
Nuutinen, Mikko; Leskelä, Riikka-Leena; Suojalehto, Ella; Tirronen, Anniina; Komssi, Vesa
2017-04-13
In previous years a substantial number of studies have identified statistically important predictors of nursing home admission (NHA). However, as far as we know, the analyses have been done at the population level. No prior research has analysed the prediction accuracy of an NHA model for individuals. This study is an analysis of 3056 longer-term home care customers in the city of Tampere, Finland. Data were collected from the records of social and health service usage and the RAI-HC (Resident Assessment Instrument - Home Care) assessment system between January 2011 and September 2015. The aim was to find the most efficient variable subsets to predict NHA for individuals and to validate the accuracy. The variable subsets for predicting NHA were searched using the sequential forward selection (SFS) method, a variable ranking metric and the classifiers of logistic regression (LR), support vector machine (SVM) and Gaussian naive Bayes (GNB). The results were validated using randomly balanced data sets and cross-validation. The primary performance metrics for the classifiers were the prediction accuracy and AUC (average area under the curve). The LR and GNB classifiers achieved 78% accuracy for predicting NHA. The most important variables were RAI MAPLE (Method for Assigning Priority Levels), functional impairment (RAI IADL, Activities of Daily Living), cognitive impairment (RAI CPS, Cognitive Performance Scale), memory disorders (diagnoses G30-G32 and F00-F03) and the use of community-based health-service and prior hospital use (emergency visits and periods of care). The accuracy of the classifier for individuals was high enough to convince the officials of the city of Tampere to integrate the predictive model based on the findings of this study as part of the home care information system. Further work needs to be done to evaluate variables that are modifiable and responsive to interventions.
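The sequential forward selection (SFS) search mentioned in this abstract can be sketched generically: start from an empty subset and greedily add whichever variable most improves a validation score, stopping when no addition helps. The sketch below assumes a caller-supplied `score` function (in practice, something like cross-validated accuracy of a classifier); the feature names and the toy scoring function are hypothetical.

```python
def sequential_forward_selection(features, score, max_features=None):
    """Greedy forward search over feature subsets.

    features: iterable of candidate feature names.
    score:    callable taking a list of feature names and returning a number
              (higher is better), e.g. a cross-validated accuracy.
    Returns (selected_features, best_score).
    """
    remaining = list(features)
    selected = []
    best_score = float("-inf")
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Score every one-feature extension of the current subset.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda t: t[0])
        if top_score <= best_score:
            break  # no candidate improves the score; stop early
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Toy scoring function (hypothetical): each feature adds a fixed gain,
# and every included feature costs a small complexity penalty.
GAINS = {"a": 0.30, "b": 0.20, "c": 0.02, "d": 0.00}


def toy_score(subset):
    return 0.5 + sum(GAINS[f] for f in subset) - 0.05 * len(subset)


chosen, chosen_score = sequential_forward_selection(GAINS.keys(), toy_score)
```

SFS evaluates on the order of the square of the number of variables rather than every possible subset, which is why it is practical for wide assessment data; being greedy, it is not guaranteed to find the best subset.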
Spatial patterns of high Aedes aegypti oviposition activity in northwestern Argentina.
Estallo, Elizabet Lilia; Más, Guillermo; Vergara-Cid, Carolina; Lanfri, Mario Alberto; Ludueña-Almeida, Francisco; Scavuzzo, Carlos Marcelo; Introini, María Virginia; Zaidenberg, Mario; Almirón, Walter Ricardo
2013-01-01
In Argentina, dengue has affected mainly the Northern provinces, including Salta. The objective of this study was to analyze the spatial patterns of high Aedes aegypti oviposition activity in San Ramón de la Nueva Orán, northwestern Argentina. The location of clusters as hot spot areas should help control programs to identify priority areas and allocate their resources more effectively. Oviposition activity was detected in Orán City (Salta province) using ovitraps that were replaced weekly (October 2005-2007). Spatial autocorrelation was measured with Moran's Index and depicted through cluster maps to identify hot spots. Total egg numbers were spatially interpolated and a classified map of areas with high Ae. aegypti oviposition activity was produced. Potential breeding and resting (PBR) sites were geo-referenced. A logistic regression analysis of interpolated egg numbers and PBR location was performed to generate a predictive mapping of mosquito oviposition activity. The cluster maps and the predictive map were consistent, both identifying high Ae. aegypti oviposition activity in central and southern areas of the city. A logistic regression model was successfully developed to predict Ae. aegypti oviposition activity based on distance to PBR sites, with tire dumps having the strongest association with mosquito oviposition activity. A predictive map reflecting probability of oviposition activity was produced. The predictive map delimited an area of maximum probability of Ae. aegypti oviposition activity in the south of Orán city where tire dumps predominate. The overall fit of the model was acceptable (ROC = 0.77), with 99% sensitivity and 75.29% specificity. Distance to tire dumps is inversely associated with high mosquito activity, allowing us to identify hot spots. These methodologies are useful for prevention, surveillance, and control of tropical vector-borne diseases and might assist the National Health Ministry to focus resources more effectively.
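Moran's Index, used in this abstract to measure spatial autocorrelation of egg counts, compares each site's deviation from the mean with the deviations of its neighbours. The following is a minimal sketch of the global statistic, assuming a simple binary neighbour matrix; the function name, the weights, and the toy counts are illustrative and are not the study's data.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed at n sites.

    weights[i][j] is the spatial weight between sites i and j
    (here 1 for neighbours, 0 otherwise; the diagonal must be 0).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    variance_sum = sum(d * d for d in dev)
    if total_weight == 0 or variance_sum == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / total_weight) * (cross / variance_sum)


# Four hypothetical trap sites along a street; adjacent sites are neighbours.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]

clustered = morans_i([1, 2, 3, 4], chain)     # similar values sit together
alternating = morans_i([1, 4, 1, 4], chain)   # neighbours are dissimilar
```

Positive values indicate that high counts sit next to high counts (the hot-spot pattern the study looks for), while negative values indicate a checkerboard-like arrangement.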
Ha, Eun Kyo; Baek, Ji Hyeon; Lee, So-Yeon; Park, Yong Mean; Kim, Woo Kyung; Sheen, Youn Ho; Lee, Seung Jin; Bae, Youngoh; Kim, Jihyeon; Lee, Kee-Jae; Ahn, Kangmo; Kwon, Ho-Jang; Han, Man Yong
2016-01-01
Aeroallergen sensitization is related to the coexistence of allergic diseases, but the nature of this relationship is poorly understood. The aim of this study was to clarify the relationship of polysensitization with allergic multimorbidities and the severity of allergic diseases. This study is a cross-sectional analysis of 3,368 Korean children aged 6-7 years. We defined IgE-mediated allergic diseases based on structured questionnaires, and classified the sensitivity to 18 aeroallergens by logistic regression and the Ward hierarchical clustering method. The relationship of polysensitization (positive IgE responses against 2 or more aeroallergen classes) with allergic multimorbidities (coexistence of 2 or more of the following allergic diseases: asthma, rhinitis, eczema, and conjunctivitis) and severity of allergic diseases was determined by ordinal logistic regression analysis. The rate of polysensitization was 13.6% (n = 458, 95% CI 12.4-14.8) and that of allergic multimorbidity was 23.5% (n = 790, 95% CI 22.0-24.9). Children sensitized to more aeroallergens tended to have more allergic diseases (rho = 0.248, p < 0.001), although the agreement between polysensitization and multimorbidity was poor (kappa = 0.11, p < 0.001). The number of allergen classes to which a child was sensitized increased the risk of wheezing attacks (1 allergen: adjusted odds ratio [aOR] 2.22, 4 or more allergens: aOR 9.39), absence from school (1 allergen: aOR 1.96, 3 allergens: aOR 2.08), and severity of nasal symptoms (1 allergen: aOR 1.61, 4 or more allergens: aOR 4.38). Polysensitization was weakly related to multimorbidity. However, the number of allergens to which a child is sensitized is related to the severity of IgE-mediated symptoms. © 2017 S. Karger AG, Basel.
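The kappa statistic reported in this abstract measures agreement between two binary classifications after discounting the agreement expected by chance. A minimal sketch of Cohen's kappa follows; the function name and the ten toy children are assumptions for illustration, not study data.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of category labels."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    categories = set(a) | set(b)
    # Chance agreement: product of the marginal proportions per category.
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical flags for ten children: 1 = present, 0 = absent.
polysensitized = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
multimorbid = [1, 0, 0, 1, 1, 0, 0, 0, 0, 0]

kappa = cohens_kappa(polysensitized, multimorbid)
```

In this toy example the raw agreement is 60%, yet kappa is close to zero because most of that agreement comes from both traits being uncommon, which is the same reason a weak kappa can coexist with a significant rank correlation.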
Babb, James; Xia, Ding; Chang, Gregory; Krasnokutsky, Svetlana; Abramson, Steven B.; Jerschow, Alexej; Regatte, Ravinder R.
2013-01-01
Purpose: To assess the potential use of sodium magnetic resonance (MR) imaging of cartilage, with and without fluid suppression by using an adiabatic pulse, for classifying subjects with versus subjects without osteoarthritis at 7.0 T. Materials and Methods: The study was approved by the institutional review board and was compliant with HIPAA. The knee cartilage of 19 asymptomatic (control subjects) and 28 symptomatic (osteoarthritis patients) subjects underwent 7.0-T sodium MR imaging with use of two different sequences: one without fluid suppression (radial three-dimensional sequence) and one with fluid suppression (inversion recovery [IR] wideband uniform rate and smooth truncation [WURST]). Fluid suppression was obtained by using IR with an adiabatic inversion pulse (WURST pulse). Mean sodium concentrations and their standard deviations were measured in the patellar, femorotibial medial, and lateral cartilage regions over four consecutive sections for each subject. The minimum, maximum, median, and average means and standard deviations were calculated over all measurements for each subject. The utility of these measures in the detection of osteoarthritis was evaluated by using logistic regression and the area under the receiver operating characteristic curve (AUC). Bonferroni correction was applied to the P values obtained with logistic regression. Results: Measurements from IR WURST were found to be significant predictors of all osteoarthritis (Kellgren-Lawrence score of 1–4) and early osteoarthritis (Kellgren-Lawrence score of 1 or 2). The minimum standard deviation provided the highest AUC (0.83) with the highest accuracy (>78%), sensitivity (>82%), and specificity (>74%) for both all osteoarthritis and early osteoarthritis groups. Conclusion: Quantitative sodium MR imaging at 7.0 T with fluid suppression by using adiabatic IR is a potential biomarker for osteoarthritis. © RSNA, 2013 PMID:23468572
Domingueti, Caroline Pereira; Fóscolo, Rodrigo Bastos; Dusse, Luci Maria S; Reis, Janice Sepúlveda; Carvalho, Maria das Graças; Gomes, Karina Braga; Fernandes, Ana Paula
2018-02-01
Objective: This study aimed to evaluate the association between different renal biomarkers and D-Dimer levels in type 1 diabetes mellitus (DM1) patients classified into two groups: low D-Dimer levels (< 318 ng/mL), which included the first and second D-Dimer tertiles, and high D-Dimer levels (≥ 318 ng/mL), which included the third D-Dimer tertile. Materials and methods: D-Dimer and cystatin C were measured by ELISA. Creatinine and urea were determined by an enzymatic method. Estimated glomerular filtration rate (eGFR) was calculated using the CKD-EPI equation. Albuminuria was assessed by immunoturbidimetry. Presence of renal disease was evaluated using each renal biomarker: creatinine, urea, cystatin C, eGFR and albuminuria. Bivariate logistic regression analysis was performed to assess which renal biomarkers are associated with high D-Dimer levels and the odds ratio was calculated. Subsequently, multivariate logistic regression analysis was performed to assess which renal biomarkers are associated with high D-Dimer levels (after adjusting for sex and age) and the odds ratio was calculated. Results: Cystatin C presented a better association [OR of 9.8 (3.8-25.5)] with high D-Dimer levels than albuminuria, creatinine, eGFR and urea [OR of 5.3 (2.2-12.9), 8.4 (2.5-25.4), 9.1 (2.6-31.4) and 3.5 (1.4-8.4), respectively] after adjusting for sex and age. All biomarkers showed a good association with D-Dimer levels, and consequently, with hypercoagulability status, and cystatin C showed the best association among them. Conclusion: Therefore, cystatin C might be useful to detect patients with incipient diabetic kidney disease that present an increased risk of cardiovascular disease, contributing to an early adoption of reno- and cardioprotective therapies.
The role of diabetes mellitus and BMI in the surgical treatment of ankle fractures.
Lanzetti, Riccardo Maria; Lupariello, Domenico; Venditto, Teresa; Guzzini, Matteo; Ponzo, Antonio; De Carli, Angelo; Ferretti, Andrea
2018-02-01
Open reduction and internal fixation is the standard treatment for displaced ankle fractures. However, comorbidities such as diabetes mellitus and elevated body mass index (BMI) are associated with poor bone quality, and these factors may predict the development of postoperative complications. The study aim was to assess the role of diabetes mellitus and BMI in wound healing in patients younger than 65 years who were surgically treated for malleoli fractures. Ninety patients, aged from 18 to 65 years old, with surgically treated ankle fracture, were retrospectively enrolled. Patients were classified in two groups: patients with diabetes (insulin-dependent and non-insulin-dependent) and patients without diabetes. All patients were assessed for wound complications, and the Visual Analogue Scale and Foot and Ankle Disability Index (FADI) were recorded for all patients. Logistic regression was used to identify the risk of wound complications after surgery using the following factors as explanatory variables: age, gender, duration of surgery, BMI, hypercholesterolemia, smoking history, diabetes mellitus, and high blood pressure. In total, 38.9% of patients showed wound complications. Of them, 17.1% were nondiabetics and 82.9% were diabetics. We observed a significant association between DM and wound complications after surgery (P = .005). Logistic regression analysis revealed that DM (P < .001) and BMI (P = .03) were associated with wound complications. The odds of having a postoperative wound complication were increased 0.16 times in the presence of diabetes and 1.14 times for increasing BMI. This study showed that diabetes mellitus and higher BMI delay wound healing and increase the complication rate in young adult patients with surgically treated bimalleolar fractures. Copyright © 2017 John Wiley & Sons, Ltd.
Braga, Larissa; Semelka, Richard C; Pietrobon, Ricardo; Martin, Diego; de Barros, Nestor; Guller, Ulrich
2004-05-01
The aim of our study was to evaluate the association of the vascularity of liver metastases, as characterized by MRI, and disease progression in breast cancer patients. Sixteen breast cancer patients with liver metastases who underwent MRI before and after systemic therapy were retrospectively identified. On the basis of comparison of each MRI examination with the previous examination, disease status of the patients was classified as complete response, partial response, stable disease, or progressive disease. Liver metastases were characterized as hyper- or hypovascular on the basis of the degree of enhancement in the arterial, portal, and interstitial phases of imaging after administration of a contrast agent. Fisher's exact test and ordinal logistic regression models, including the type of systemic therapy, presence of multiple metastases, and hormone receptor status, were used to estimate the unadjusted and risk-adjusted association between the presence of hypervascular liver metastases and disease progression. All patients in our sample (n = 16) were women and most (12/16, 75%) were white. Their median age was 51.5 years. In unadjusted analyses, the association between the presence of hypervascular liver metastases and disease progression was statistically significant (p < 0.0001). In multiple logistic regression analyses, hypervascular liver metastases were found to be an independent predictor of disease progression. Patients with hypervascular liver lesions were 20.5 times more likely to experience disease progression than patients without hypervascular metastases (odds ratio, 20.5; 95% confidence interval, 5.1-83.5; p < 0.0001). Our analysis provides suggestive evidence that disease progression can be predicted through MRI assessment of the vascularity of liver metastases in patients with breast cancer.
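The odds ratio and confidence interval quoted in this abstract can be related to the underlying logistic regression coefficient. The sketch below assumes a Wald-type 95% interval that is symmetric on the log-odds scale; the abstract does not state how the interval was computed, so this is a consistency check under that assumption rather than a reconstruction of the authors' analysis. The function names are hypothetical.

```python
import math


def log_or_standard_error(lower, upper, z=1.96):
    """Standard error of log(OR) implied by a Wald confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def wald_interval(odds_ratio, se, z=1.96):
    """Confidence interval for an odds ratio given the SE of log(OR)."""
    return odds_ratio * math.exp(-z * se), odds_ratio * math.exp(z * se)


# Values quoted in the abstract: OR 20.5, 95% CI 5.1-83.5.
se = log_or_standard_error(5.1, 83.5)

# Under the Wald assumption the point estimate is the geometric mean
# of the interval bounds.
implied_or = math.sqrt(5.1 * 83.5)

# Rebuilding the interval from the quoted OR and the implied SE.
rebuilt_low, rebuilt_high = wald_interval(20.5, se)
```

The implied point estimate (about 20.6) is close to the reported 20.5, which is consistent with a Wald interval up to rounding. The width of the interval also reflects the small sample of 16 patients.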
Chin, Weng-Yee; Wan, Eric Yuk Fai; Dowrick, Christopher; Arroll, Bruce; Lam, Cindy Lo Kuen
2018-04-26
The aim of this study was to explore the relationship between patient self-reported Patient Health Questionnaire-9 (PHQ-9) symptoms and doctor diagnosis of depression using a tree analysis approach. This was a secondary analysis on a dataset obtained from 10 179 adult primary care patients and 59 primary care physicians (PCPs) across Hong Kong. Patients completed a waiting room survey collecting data on socio-demographics and the PHQ-9. Blinded doctors documented whether they thought the patient had depression. Data were analyzed using multiple logistic regression and conditional inference decision tree modeling. PCPs diagnosed 594 patients with depression. Logistic regression identified gender, age, employment status, past history of depression, family history of mental illness and recent doctor visit as factors associated with a depression diagnosis. Tree analyses revealed different pathways of association between PHQ-9 symptoms and depression diagnosis for patients with and without past depression. The PHQ-9 symptom model revealed low mood, sense of worthlessness, fatigue, sleep disturbance and functional impairment as early classifiers. The PHQ-9 total score model revealed that cut-off scores of >12 and >15 were most frequently associated with depression diagnoses in patients with and without past depression. A past history of depression is the most significant factor associated with the diagnosis of depression. PCPs appear to utilize a hypothetico-deductive problem-solving approach incorporating pre-test probability, with different associated factors for patients with and without past depression. Diagnostic thresholds may be too low for patients with past depression and too high for those without, potentially leading to over- and under-diagnosis of depression.
Efficacy of oral moxifloxacin for aerobic vaginitis.
Wang, C; Han, C; Geng, N; Fan, A; Wang, Y; Yue, Y; Zhang, H; Xue, F
2016-01-01
The purpose of this study was to investigate the therapeutic efficacy of oral moxifloxacin for aerobic vaginitis (AV). We also identified factors that are associated with therapeutic efficacy. This prospective study enrolled general gynecological outpatients at Tianjin Medical University General Hospital between September 2012 and May 2014. Women diagnosed with AV (n = 102) were recruited. All enrolled women were treated with oral moxifloxacin, 400 mg once daily for 6 days (one course). Therapeutic efficacy was evaluated based on microscopic criteria, and cure rates were calculated. Women who were microscopically improved (but not cured) received a second course of therapy. Women classified with microscopic failure were treated using other strategies. Univariate and multivariate logistic regression analysis was used to identify factors that may be associated with a cure after one course of therapy. After one course of therapy, 65.7 % (67/102) of women were cured, 29.4 % (30/102) of women were improved (but not cured), 4.9 % (5/102) of women failed to respond to the therapy. After two courses of therapy, 85.3 % (87/102) of women were cured, 9.8 % (10/102) of women were improved, 4.9 % (5/102) of women failed to respond to the therapy, and clinical improvement was achieved in additional women. In the multivariate logistic regression analysis, women with a baseline vaginal pH value of <5.0 had a 3.5-times higher chance of being cured, compared with those with a baseline vaginal pH value of ≥5.0 (OR, 3.503; 95 % CI, 1.278-9.601). Moxifloxacin is an effective therapeutic option for patients with AV. Most women with AV were cured with one course of moxifloxacin. For those with a higher vaginal pH value of ≥5.0 before treatment, two courses of therapy should be considered.
NASA Astrophysics Data System (ADS)
Tan, Maxine; Emaminejad, Nastaran; Qian, Wei; Sun, Shenshen; Kang, Yan; Guan, Yubao; Lure, Fleming; Zheng, Bin
2014-03-01
Stage I non-small-cell lung cancers (NSCLC) usually have favorable prognosis. However, a high percentage of NSCLC patients have cancer relapse after surgery. Accurately predicting cancer prognosis is important to optimally treat and manage the patients to minimize the risk of cancer relapse. Studies have shown that an excision repair cross-complementing 1 (ERCC1) gene was a potentially useful genetic biomarker to predict prognosis of NSCLC patients. Meanwhile, studies also found that chronic obstructive pulmonary disease (COPD) was highly associated with lung cancer prognosis. In this study, we investigated and evaluated the correlations between COPD image features and ERCC1 gene expression. A database involving 106 NSCLC patients was used. Each patient had a thoracic CT examination and ERCC1 genetic test. We applied a computer-aided detection scheme to segment and quantify COPD image features. A logistic regression method and a multilayer perceptron network were applied to analyze the correlation between the computed COPD image features and ERCC1 protein expression. A multilayer perceptron network (MPN) was also developed to test the performance of using COPD-related image features to predict ERCC1 protein expression. A nine-feature logistic regression analysis showed that the average COPD feature values in the low and high ERCC1 protein expression groups are significantly different (p < 0.01). Using a five-fold cross-validation method, the MPN yielded an area under the ROC curve of AUC = 0.669±0.053 in classifying between the low and high ERCC1 expression cases. The study indicates that CT phenotype features are associated with the genetic tests, which may provide supplementary information to help improve accuracy in assessing prognosis of NSCLC patients.
Use of evidence-based management in healthcare administration decision-making.
Guo, Ruiling; Berkshire, Steven D; Fulton, Lawrence V; Hermanson, Patrick M
2017-07-03
Purpose: The purpose of this paper is to examine whether healthcare leaders use evidence-based management (EBMgt) when facing major decisions and what types of evidence healthcare administrators consult during their decision-making. This study also intends to identify any relationship that might exist among adoption of EBMgt in healthcare management, attitudes towards EBMgt, demographic characteristics and organizational characteristics. Design/methodology/approach: A cross-sectional study was conducted among US healthcare leaders. Spearman's correlation and logistic regression were performed using the Statistical Package for the Social Sciences (SPSS) 23.0. Findings: One hundred and fifty-four healthcare leaders completed the survey. The study results indicated that 90 per cent of the participants self-reported having used an EBMgt approach for decision-making. Professional experiences (87 per cent), organizational data (84 per cent) and stakeholders' values (63 per cent) were the top three types of evidence consulted daily and weekly for decision-making. Case study (75 per cent) and scientific research findings (75 per cent) were the top two types of evidence consulted monthly or less than once a month. An exploratory, stepwise logistic regression model correctly classified 75.3 per cent of all observations for a dichotomous "use of EBMgt" response variable using three independent variables: attitude towards EBMgt, number of employees in the organization and the job position. Spearman's correlation indicated statistically significant relationships between healthcare leaders' use of EBMgt and healthcare organization bed size (r_s = 0.217, n = 152, p < 0.01), attitude towards EBMgt (r_s = 0.517, n = 152, p < 0.01), and the number of organization employees (r_s = 0.195, n = 152, p = 0.016). Originality/value: This study generated new research findings on the practice of EBMgt in US healthcare administration decision-making.
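Spearman's correlation, reported in this abstract, is the Pearson correlation computed on ranks, which suits ordinal survey responses. The following is a minimal sketch with average ranks for ties; the function names and the toy responses are assumptions for illustration and do not reproduce the SPSS analysis.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation coefficient."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ordinal responses from five survey participants.
attitude = [1, 2, 3, 4, 5]
usage = [2, 1, 4, 3, 5]

rho = spearman(attitude, usage)
```

Because only the ordering matters, any monotone relationship gives a coefficient of 1 regardless of its shape.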