Logistic regression applied to natural hazards: rare event logistic regression with replications
NASA Astrophysics Data System (ADS)
Guns, M.; Vanacker, V.
2012-06-01
Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, called rare event logistic regression with replications, combines the strengths of probabilistic and statistical methods and overcomes some of the limitations of previous developments through robust variable selection. The technique was developed here for the analysis of landslide controlling factors, but the concept is widely applicable to statistical analyses of natural hazards.
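The replication idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the abstract does not say how samples are drawn or how variables are selected, so the balanced resampling of non-events, the names `replicate_selection` and `toy_select`, and the toy data are all assumptions. `toy_select` is only a stand-in for a real rare event logistic regression fit.

```python
import random
from collections import Counter
from statistics import mean, pstdev

def replicate_selection(events, non_events, select, n_rep=100, seed=0):
    """Return, per variable index, the share of replications that selected it."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_rep):
        # Assumption: each replication pairs all events with a fresh random
        # sample of non-events of the same size.
        controls = rng.sample(non_events, k=len(events))
        counts.update(select(events, controls))
    n_vars = len(events[0])
    return [counts[j] / n_rep for j in range(n_vars)]

def toy_select(events, controls, threshold=1.0):
    """Placeholder selector: keep variables with a large standardized mean difference.

    A real analysis would fit a (rare event) logistic regression here instead.
    """
    chosen = []
    for j in range(len(events[0])):
        a = [row[j] for row in events]
        b = [row[j] for row in controls]
        spread = pstdev(a + b) or 1.0  # avoid dividing by zero for constant columns
        if abs(mean(a) - mean(b)) / spread > threshold:
            chosen.append(j)
    return chosen

# Hypothetical data: variable 0 separates events from non-events, variable 1 is noise.
events = [(5.0 + 0.1 * i, i % 3) for i in range(10)]
non_events = [(0.01 * i, i % 3) for i in range(200)]
freq = replicate_selection(events, non_events, toy_select)
```

A variable that is selected in nearly every replication is a more robust candidate than one that appears only for particular samples, which is the sample dependence the abstract warns about.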
ERIC Educational Resources Information Center
Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza
2014-01-01
This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
A Note on Three Statistical Tests in the Logistic Regression DIF Procedure
ERIC Educational Resources Information Center
Paek, Insu
2012-01-01
Although logistic regression became one of the well-known methods in detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under the maximum likelihood, do not seem to be consistently distinguished in DIF literature. This paper provides a clarifying…
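For orientation, the three tests can be written in their standard textbook forms. The notation below is an assumption and is not taken from the paper: $\ell$ is the log-likelihood, $\hat\theta$ the unrestricted maximum likelihood estimate, $\hat\theta_0$ the estimate under the null hypothesis, $U$ the score vector, $I$ the information matrix, and $\beta$ a single coefficient tested against zero (for example, a group effect in a DIF model).

```latex
\begin{align}
W &= \frac{\hat\beta^{2}}{\widehat{\operatorname{Var}}(\hat\beta)}, \\
G &= 2\,\bigl[\ell(\hat\theta) - \ell(\hat\theta_{0})\bigr], \\
S &= U(\hat\theta_{0})^{\top}\, I(\hat\theta_{0})^{-1}\, U(\hat\theta_{0}).
\end{align}
```

Under the null hypothesis all three statistics are asymptotically chi-square distributed with degrees of freedom equal to the number of restrictions tested (one for $W$ as written), although they can differ in finite samples.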
New robust statistical procedures for the polytomous logistic regression models.
Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro
2018-05-17
This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.
Logistic regression for risk factor modelling in stuttering research.
Reed, Phil; Wu, Yaqionq
2013-06-01
To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed is demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.
Applications of statistics to medical science, III. Correlation and regression.
Watanabe, Hiroshi
2012-01-01
In this third part of a series surveying medical statistics, the concepts of correlation and regression are reviewed. In particular, methods of linear regression and logistic regression are discussed. Arguments related to survival analysis will be made in a subsequent paper.
Upgrade Summer Severe Weather Tool
NASA Technical Reports Server (NTRS)
Watson, Leela
2011-01-01
The goal of this task was to upgrade the existing severe weather database by adding observations from the 2010 warm season, update the verification dataset with results from the 2010 warm season, use statistical logistic regression analysis on the database, and develop a new forecast tool. The AMU analyzed 7 stability parameters that showed the possibility of providing guidance in forecasting severe weather, calculated verification statistics for the Total Threat Score (TTS), and calculated warm season verification statistics for the 2010 season. The AMU also performed statistical logistic regression analysis on the 22-year severe weather database. The results indicated that the logistic regression equation did not show an increase in skill over the previously developed TTS. The equation showed less accuracy than TTS at predicting severe weather, little ability to distinguish between severe and non-severe weather days, and worse standard categorical accuracy measures and skill scores than TTS.
Interpretation of commonly used statistical regression models.
Kasza, Jessica; Wolfe, Rory
2014-01-01
A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.
2003-01-01
Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. 
The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity in each basin, particle size sorting, average storm intensity (millimeters per hour), soil organic matter content, soil permeability, and soil drainage. The results of this study demonstrate that logistic regression is a valuable tool for predicting the probability of debris flows occurring in recently-burned landscapes.
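The "all possible combinations" search in step (3) can be sketched as a best-subset enumeration. This is a minimal Python illustration under stated assumptions: the function `best_subset`, the variable names, the weights and the scoring rule are hypothetical, and a real study would fit a logistic regression for each subset and score it with a fit statistic.

```python
from itertools import combinations

def best_subset(variables, score):
    """Score every non-empty subset of `variables` and return the best one."""
    best, best_score = None, float("-inf")
    for size in range(1, len(variables) + 1):
        for subset in combinations(variables, size):
            s = score(subset)
            if s > best_score:
                best, best_score = subset, s
    return best, best_score

# Hypothetical usefulness weights standing in for a model fit statistic.
weights = {"slope": 3.0, "burn_severity": 2.5, "rain_intensity": 2.0, "aspect": 0.2}

def toy_score(subset):
    # Reward useful variables and charge a fixed cost per variable,
    # so that small models are preferred when the gain is marginal.
    return sum(weights[v] for v in subset) - 1.0 * len(subset)

chosen, value = best_subset(list(weights), toy_score)
```

The number of subsets grows as 2^n - 1, which is about 3.4 × 10^10 for 35 variables, so exhaustive enumeration is only practical when each model is cheap to score or the candidate set is reduced first.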
Valle, Denis; Lima, Joanna M Tucker; Millar, Justin; Amratia, Punam; Haque, Ubydul
2015-11-04
Logistic regression is a statistical model widely used in cross-sectional and cohort studies to identify and quantify the effects of potential disease risk factors. However, the impact of imperfect tests on adjusted odds ratios (and thus on the identification of risk factors) is under-appreciated. The purpose of this article is to draw attention to the problem associated with modelling imperfect diagnostic tests, and propose simple Bayesian models to adequately address this issue. A systematic literature review was conducted to determine the proportion of malaria studies that appropriately accounted for false-negatives/false-positives in a logistic regression setting. Inference from the standard logistic regression was also compared with that from three proposed Bayesian models using simulations and malaria data from the western Brazilian Amazon. A systematic literature review suggests that malaria epidemiologists are largely unaware of the problem of using logistic regression to model imperfect diagnostic test results. Simulation results reveal that statistical inference can be substantially improved when using the proposed Bayesian models versus the standard logistic regression. Finally, analysis of original malaria data with one of the proposed Bayesian models reveals that microscopy sensitivity is strongly influenced by how long people have lived in the study region, and an important risk factor (i.e., participation in forest extractivism) is identified that would have been missed by standard logistic regression. Given the numerous diagnostic methods employed by malaria researchers and the ubiquitous use of logistic regression to model the results of these diagnostic tests, this paper provides critical guidelines to improve data analysis practice in the presence of misclassification error. Easy-to-use code that can be readily adapted to WinBUGS is provided, enabling straightforward implementation of the proposed Bayesian models.
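The core of the misclassification problem can be shown with the standard relation between true and observed positivity. The Python sketch below is not the authors' WinBUGS code. It only evaluates the forward relation P(test positive) = Se · p + (1 − Sp) · (1 − p), with hypothetical coefficients, sensitivity and specificity.

```python
import math

def logistic(z):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def prob_test_positive(true_prob, sensitivity, specificity):
    """Probability of a positive test given the true infection probability."""
    return sensitivity * true_prob + (1.0 - specificity) * (1.0 - true_prob)

# Hypothetical intercept (-1.0), slope (0.8) and covariate value (1.0).
p_true = logistic(-1.0 + 0.8 * 1.0)
p_obs = prob_test_positive(p_true, sensitivity=0.7, specificity=0.95)
```

A standard logistic regression fitted to test results models `p_obs` rather than `p_true`, so its odds ratios describe the test outcome and not the underlying infection status unless the test is perfect.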
NASA Astrophysics Data System (ADS)
Madhu, B.; Ashok, N. C.; Balasubramanian, S.
2014-11-01
Multinomial logistic regression analysis was used to develop a statistical model that can predict the probability of breast cancer in Southern Karnataka using breast cancer occurrence data from 2007-2011. Independent socio-economic variables describing breast cancer occurrence, such as age, education, occupation, parity, type of family, health insurance coverage, residential locality and socioeconomic status, were obtained for each case. The models were developed as follows: i) The urban-rural distribution of breast cancer cases obtained from the Bharat Hospital and Institute of Oncology was visualized spatially. ii) Socio-economic risk factors describing the breast cancer occurrences were compiled for each case. These data were then analysed using multinomial logistic regression in SPSS statistical software, relations between the occurrence of breast cancer across socio-economic status and the influence of other socio-economic variables were evaluated, and multinomial logistic regression models were constructed. iii) The model that best predicted the occurrence of breast cancer was identified. This multivariate logistic regression model was entered into a geographic information system, and maps showing the predicted probability of breast cancer occurrence in Southern Karnataka were created. This study demonstrates that multinomial logistic regression is a valuable tool for developing models that predict the probability of breast cancer occurrence in Southern Karnataka.
NASA Astrophysics Data System (ADS)
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used to select important predictor variables; diagnostics are used to check that the assumptions are valid, which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers; and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographic profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in the family and by the interaction between a student's ethnicity and routine meal intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
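The logit relationship described above can be made concrete with a small Python sketch. The coefficients are hypothetical and are not estimates from this study. The sketch only shows how a linear combination of predictors becomes a probability and how a coefficient becomes an odds ratio.

```python
import math

def probability(intercept, coefs, x):
    """Convert the linear predictor (log odds) into an event probability."""
    log_odds = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-log_odds))

def odds_ratios(coefs):
    """exp(coefficient) is the odds ratio for a one-unit increase in that predictor."""
    return [math.exp(b) for b in coefs]

# Hypothetical coefficients: family history of obesity, routine meal intake.
coefs = [math.log(2.0), -0.5]
p = probability(-1.0, coefs, [1, 0])  # student with family history, no routine meals
ratios = odds_ratios(coefs)
```

Here the first coefficient was chosen as log(2), so the corresponding odds ratio is 2: the odds of the outcome double when that predictor increases by one unit, holding the other fixed.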
Campos-Filho, N; Franco, E L
1989-02-01
A frequent procedure in matched case-control studies is to report results from the multivariate unmatched analyses if they do not differ substantially from the ones obtained after conditioning on the matching variables. Although conceptually simple, this rule requires that an extensive series of logistic regression models be evaluated by both the conditional and unconditional maximum likelihood methods. Most computer programs for logistic regression employ only one maximum likelihood method, which requires that the analyses be performed in separate steps. This paper describes a Pascal microcomputer (IBM PC) program that performs multiple logistic regression by both maximum likelihood estimation methods, which obviates the need for switching between programs to obtain relative risk estimates from both matched and unmatched analyses. The program calculates most standard statistics and allows factoring of categorical or continuous variables by two distinct methods of contrast. A built-in, descriptive statistics option allows the user to inspect the distribution of cases and controls across categories of any given variable.
NASA Astrophysics Data System (ADS)
Ceppi, C.; Mancini, F.; Ritrovato, G.
2009-04-01
This study aims at landslide susceptibility mapping within an area of the Daunia (Apulian Apennines, Italy) using a multivariate statistical method and data manipulation in a Geographical Information System (GIS) environment. Among the variety of existing statistical data analysis techniques, logistic regression was chosen to produce a susceptibility map over an area where small settlements are historically threatened by landslide phenomena. Logistic regression finds the best fit between the presence or absence of landslides (dependent variable) and the set of independent variables on the basis of a maximum likelihood criterion, leading to the estimation of regression coefficients. The reliability of such an analysis therefore rests on its ability to quantify the proneness to landslide occurrence through the probability level it produces. The inventories of dependent and independent variables were managed in a GIS, where geometric properties and attributes were translated into raster cells in order to proceed with the logistic regression by means of the SPSS (Statistical Package for the Social Sciences) package. A landslide inventory was used to produce the binary dependent variable, whereas the set of independent variables comprised slope, aspect, elevation, curvature, drained area, lithology and land use after their reduction to dummy variables. The effect of the independent parameters on landslide occurrence was assessed by the corresponding coefficients in the logistic regression function, highlighting a major role played by the land use variable in determining the occurrence and distribution of phenomena. Once the outcomes of the logistic regression are determined, the data are re-introduced in the GIS to produce a map reporting the proneness to landslides as a predicted level of probability.
To validate the results and the regression model, a cell-by-cell comparison between the susceptibility map and the initial inventory of landslide events was performed, and agreement at the 75% level was achieved.
Robust mislabel logistic regression without modeling mislabel probabilities.
Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun
2018-03-01
Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into consideration. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.
Austin, Peter C
2010-04-22
Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.
Length bias correction in gene ontology enrichment analysis using logistic regression.
Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S; Chang, Jeff H
2012-01-01
When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
NASA Astrophysics Data System (ADS)
Kneringer, Philipp; Dietz, Sebastian; Mayr, Georg J.; Zeileis, Achim
2017-04-01
Low-visibility conditions have a large impact on aviation safety and economic efficiency of airports and airlines. To support decision makers, we develop a statistical probabilistic nowcasting tool for the occurrence of capacity-reducing operations related to low visibility. The probabilities of four different low visibility classes are predicted with an ordered logistic regression model based on time series of meteorological point measurements. Potential predictor variables for the statistical models are visibility, humidity, temperature and wind measurements at several measurement sites. A stepwise variable selection method indicates that visibility and humidity measurements are the most important model inputs. The forecasts are tested with a 30 minute forecast interval up to two hours, which is a sufficient time span for tactical planning at Vienna Airport. The ordered logistic regression models outperform persistence and are competitive with human forecasters.
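How an ordered logistic regression turns one linear predictor into probabilities for four ordered classes can be sketched as follows. This is a minimal Python illustration under one common parametrization, P(Y ≤ k) = logistic(θ_k − η). The thresholds and the predictor value are hypothetical and are not taken from the study.

```python
import math

def ordered_logit_probs(eta, thresholds):
    """Class probabilities under a proportional odds (ordered logit) model.

    eta        -- linear predictor built from the covariates
    thresholds -- increasing cut points, one fewer than the number of classes
    """
    # Cumulative probabilities P(Y <= k); the last class closes the scale at 1.
    cumulative = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        previous = c
    return probs

# Three hypothetical thresholds give four ordered classes.
probs = ordered_logit_probs(eta=0.0, thresholds=[-1.0, 0.0, 1.0])
```

Because all classes share the same `eta`, changing a covariate shifts the whole distribution toward higher or lower classes, which is what keeps the model more parsimonious than a multinomial one.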
Li, Yi; Tseng, Yufeng J.; Pan, Dahua; Liu, Jianzhong; Kern, Petra S.; Gerberick, G. Frank; Hopfinger, Anton J.
2008-01-01
Currently, the only validated methods to identify skin sensitization effects are in vivo models, such as the Local Lymph Node Assay (LLNA) and guinea pig studies. There is a tremendous need, in particular due to novel legislation, to develop animal alternatives, e.g., Quantitative Structure-Activity Relationship (QSAR) models. Here, QSAR models for skin sensitization using LLNA data have been constructed. The descriptors used to generate these models are derived from the 4D-molecular similarity paradigm and are referred to as universal 4D-fingerprints. A training set of 132 structurally diverse compounds and a test set of 15 structurally diverse compounds were used in this study. The statistical methodologies used to build the models are logistic regression (LR) and partial least square coupled logistic regression (PLS-LR), which prove to be effective tools for studying skin sensitization measures expressed in the two categorical terms of sensitizer and non-sensitizer. QSAR models with low values of the Hosmer-Lemeshow goodness-of-fit statistic, χ²_HL, are significant and predictive. For the training set, the cross-validated prediction accuracy of the logistic regression models ranges from 77.3% to 78.0%, while that of PLS-logistic regression models ranges from 87.1% to 89.4%. For the test set, the prediction accuracy of logistic regression models ranges from 80.0% to 86.7%, while that of PLS-logistic regression models ranges from 73.3% to 80.0%. The QSAR models are made up of 4D-fingerprints related to aromatic atoms, hydrogen bond acceptors and negatively partially charged atoms. PMID:17226934
Effect of folic acid on appetite in children: ordinal logistic and fuzzy logistic regressions.
Namdari, Mahshid; Abadi, Alireza; Taheri, S Mahmoud; Rezaei, Mansour; Kalantari, Naser; Omidvar, Nasrin
2014-03-01
Reduced appetite and low food intake are often a concern in preschool children, since they can lead to malnutrition, a leading cause of impaired growth and mortality in childhood. It is occasionally considered that folic acid has a positive effect on appetite enhancement and consequently growth in children. The aim of this study was to assess the effect of folic acid on the appetite of preschool children 3 to 6 y old. The study sample included 127 children ages 3 to 6 who were randomly selected from 20 preschools in the city of Tehran in 2011. Since appetite was measured by linguistic terms, a fuzzy logistic regression was applied for modeling. The obtained results were compared with a statistical ordinal logistic model. After controlling for the potential confounders, in a statistical ordinal logistic model, serum folate showed a significantly positive effect on appetite. A small but positive effect of folate was detected by fuzzy logistic regression. Based on fuzzy regression, the risk for poor appetite in preschool children was related to the employment status of their mothers. In this study, a positive association was detected between the levels of serum folate and improved appetite. For further investigation, a randomized controlled, double-blind clinical trial could be helpful to address causality. Copyright © 2014 Elsevier Inc. All rights reserved.
Practical Session: Logistic Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
An exercise is proposed to illustrate logistic regression. It investigates the different risk factors for the occurrence of coronary heart disease. The exercise was proposed in Chapter 5 of the book by D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science Business Media, LLC (2010), and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr341.pdf). This example is based on data given in the file evans.txt coming from http://www.sph.emory.edu/dkleinb/logreg3.htm#data.
Applied Statistics: From Bivariate through Multivariate Techniques [with CD-ROM]
ERIC Educational Resources Information Center
Warner, Rebecca M.
2007-01-01
This book provides a clear introduction to widely used topics in bivariate and multivariate statistics, including multiple regression, discriminant analysis, MANOVA, factor analysis, and binary logistic regression. The approach is applied and does not require formal mathematics; equations are accompanied by verbal explanations. Students are asked…
Menditto, Anthony A; Linhorst, Donald M; Coleman, James C; Beck, Niels C
2006-04-01
Development of policies and procedures to contend with the risks presented by elopement, aggression, and suicidal behaviors is a long-standing challenge for mental health administrators. Guidance in making such judgments can be obtained through the use of a multivariate statistical technique known as logistic regression. This procedure can be used to develop a predictive equation that is mathematically formulated to use the best combination of predictors, rather than considering just one factor at a time. This paper presents an overview of logistic regression and its utility in mental health administrative decision making. A case example of its application is presented using data on elopements from Missouri's long-term state psychiatric hospitals. Ultimately, the use of statistical prediction analyses tempered with differential qualitative weighting of classification errors can augment decision-making processes in a manner that provides guidance and flexibility while wrestling with the complex problem of risk assessment and decision making.
Fei, Yang; Hu, Jian; Gao, Kun; Tu, Jianfeng; Li, Wei-Qin; Wang, Wei
2017-06-01
To construct a radial basis function (RBF) artificial neural network (ANN) model to predict the incidence of acute pancreatitis (AP)-induced portal vein thrombosis (PVT). The analysis included 353 patients with AP who were admitted between January 2011 and December 2015. An RBF ANN model and a logistic regression model were each constructed based on eleven factors relevant to AP. Statistical indexes were used to evaluate the predictive value of the two models. The sensitivity, specificity, positive predictive value, negative predictive value and accuracy of the RBF ANN model for predicting PVT were 73.3%, 91.4%, 68.8%, 93.0% and 87.7%, respectively. There were significant differences between the RBF ANN and logistic regression models in these parameters (P<0.05). In addition, a comparison of the areas under the receiver operating characteristic curves of the two models showed a statistically significant difference (P<0.05). The RBF ANN model was better able to predict the occurrence of AP-induced PVT than the logistic regression model. D-dimer, AMY, Hct and PT were important predictors of AP-induced PVT. Copyright © 2017 Elsevier Inc. All rights reserved.
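The five indexes reported above all derive from the four cells of a confusion matrix. The Python sketch below shows the standard definitions. The counts are hypothetical and do not reproduce the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard classification indexes from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positives among all diseased
        "specificity": tn / (tn + fp),   # true negatives among all non-diseased
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical counts for illustration only.
m = diagnostic_metrics(tp=40, fp=5, fn=10, tn=45)
```

Sensitivity and specificity depend only on the classifier, whereas the predictive values also depend on how common the condition is in the sample, so they should not be compared across cohorts with different prevalence.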
Deng, Yingyuan; Wang, Tianfu; Chen, Siping; Liu, Weixiang
2017-01-01
The aim of the study is to screen the significant sonographic features by logistic regression analysis and fit a model to diagnose thyroid nodules. A total of 525 pathological thyroid nodules were retrospectively analyzed. All the nodules underwent conventional ultrasonography (US), strain elastosonography (SE), and contrast-enhanced ultrasound (CEUS). Twelve suspicious sonographic features were used to assess the nodules. The features significant for diagnosing thyroid nodules were identified by logistic regression analysis. All variables that were statistically related to the diagnosis of thyroid nodules at a level of p < 0.05 were included in a logistic regression model. The significant features in the logistic regression model for diagnosing thyroid nodules were calcification, suspected cervical lymph node metastasis, hypoenhancement pattern, margin, shape, vascularity, posterior acoustic, echogenicity, and elastography score. From the results of the logistic regression analysis, a formula that could predict whether or not thyroid nodules are malignant was established. The area under the receiver operating characteristic (ROC) curve was 0.930, and the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 83.77%, 89.56%, 87.05%, 86.04%, and 87.79%, respectively. PMID:29228030
Pang, Tiantian; Huang, Leidan; Deng, Yingyuan; Wang, Tianfu; Chen, Siping; Gong, Xuehao; Liu, Weixiang
2017-01-01
The aim of the study is to screen the significant sonographic features by logistic regression analysis and fit a model to diagnose thyroid nodules. A total of 525 pathological thyroid nodules were retrospectively analyzed. All the nodules underwent conventional ultrasonography (US), strain elastosonography (SE), and contrast-enhanced ultrasound (CEUS). Twelve suspicious sonographic features were used to assess the nodules. The features significant for diagnosing thyroid nodules were identified by logistic regression analysis. All variables that were statistically related to the diagnosis of thyroid nodules at a level of p < 0.05 were included in a logistic regression model. The significant features in the logistic regression model for diagnosing thyroid nodules were calcification, suspected cervical lymph node metastasis, hypoenhancement pattern, margin, shape, vascularity, posterior acoustic, echogenicity, and elastography score. From the results of the logistic regression analysis, a formula that could predict whether or not thyroid nodules are malignant was established. The area under the receiver operating characteristic (ROC) curve was 0.930, and the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 83.77%, 89.56%, 87.05%, 86.04%, and 87.79%, respectively.
Discrete post-processing of total cloud cover ensemble forecasts
NASA Astrophysics Data System (ADS)
Hemri, Stephan; Haiden, Thomas; Pappenberger, Florian
2017-04-01
This contribution presents an approach to post-process ensemble forecasts for the discrete and bounded weather variable of total cloud cover. Two methods for discrete statistical post-processing of ensemble predictions are tested. The first approach is based on multinomial logistic regression, the second involves a proportional odds logistic regression model. Applying them to total cloud cover raw ensemble forecasts from the European Centre for Medium-Range Weather Forecasts improves forecast skill significantly. Based on station-wise post-processing of raw ensemble total cloud cover forecasts for a global set of 3330 stations over the period from 2007 to early 2014, the more parsimonious proportional odds logistic regression model proved to slightly outperform the multinomial logistic regression model. Reference Hemri, S., Haiden, T., & Pappenberger, F. (2016). Discrete post-processing of total cloud cover ensemble forecasts. Monthly Weather Review 144, 2565-2577.
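The proportional odds model named in the abstract above can be illustrated with a minimal sketch. The abstract does not give its parameterization, so the cumulative-logit form used here, P(Y <= k | x) = logistic(theta_k - beta·x), and all threshold and coefficient values are assumptions chosen for illustration only.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def proportional_odds_probs(thresholds, beta, x):
    """Category probabilities under a cumulative-logit (proportional odds) model.

    Assumed form: P(Y <= k | x) = logistic(theta_k - beta . x), with increasing
    thresholds theta_1 < ... < theta_{K-1} shared by all categories and a single
    coefficient vector beta (the "proportional odds" restriction).
    """
    eta = sum(b * xi for b, xi in zip(beta, x))
    # Cumulative probabilities for categories 1..K-1, then 1.0 for the last one.
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k-1)
        previous = c
    return probs


# Hypothetical example: 4 ordered categories, one predictor.
example = proportional_odds_probs([-1.0, 0.0, 1.0], [0.5], [0.0])
```

Because one coefficient vector serves every threshold, this model needs far fewer parameters than a multinomial model with a separate vector per category, which is the sense in which the abstract calls it more parsimonious.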
Assistive Technologies for Second-Year Statistics Students Who Are Blind
ERIC Educational Resources Information Center
Erhardt, Robert J.; Shuman, Michael P.
2015-01-01
At Wake Forest University, a student who is blind enrolled in a second course in statistics. The course covered simple and multiple regression, model diagnostics, model selection, data visualization, and elementary logistic regression. These topics required that the student both interpret and produce three sets of materials: mathematical writing,…
Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W
2015-08-01
Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.
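The evaluation measures listed in the abstract above (sensitivity, specificity, positive predictive value, and Youden's index) all follow from a 2x2 confusion matrix. The sketch below uses the standard textbook definitions; the counts are made up for illustration and are not taken from the study.

```python
def classification_summary(tp, fp, tn, fn):
    """Summary measures from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives, and false negatives.
    """
    sensitivity = tp / (tp + fn)   # proportion of events detected
    specificity = tn / (tn + fp)   # proportion of non-events correctly ruled out
    ppv = tp / (tp + fp)           # positive predictive value
    youden_j = sensitivity + specificity - 1.0  # Youden's index
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "youden_j": youden_j,
    }


# Hypothetical counts, for illustration only.
summary = classification_summary(tp=8, fp=1, tn=9, fn=2)
```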
Intermediate and advanced topics in multilevel logistic regression analysis.
Austin, Peter C; Merlo, Juan
2017-09-10
Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher-level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within-cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population-average effect of covariates measured at the subject and cluster level, in contrast to the within-cluster or cluster-specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster-level covariates. We describe the variance partition coefficient and the median odds ratio, which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
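The abstract above names the variance partition coefficient (VPC) and the median odds ratio (MOR) without stating formulas. The sketch below uses the commonly cited latent-variable forms for a two-level random-intercept logistic model, VPC = sigma² / (sigma² + pi²/3) and MOR = exp(sqrt(2·sigma²) · z_0.75). Treating these as the definitions the authors intend is an assumption, and the variance value in the example is hypothetical.

```python
import math
from statistics import NormalDist


def vpc_latent(sigma2):
    """Latent-variable VPC for a two-level random-intercept logistic model.

    sigma2 is the between-cluster (random intercept) variance; the level-1
    variance is fixed at pi^2 / 3, the variance of the standard logistic
    distribution.
    """
    return sigma2 / (sigma2 + math.pi ** 2 / 3.0)


def median_odds_ratio(sigma2):
    """Median odds ratio: exp(sqrt(2 * sigma2) * 75th percentile of N(0, 1))."""
    z75 = NormalDist().inv_cdf(0.75)  # approximately 0.6745
    return math.exp(math.sqrt(2.0 * sigma2) * z75)


# Hypothetical between-cluster variance, for illustration only.
example_vpc = vpc_latent(1.0)
example_mor = median_odds_ratio(1.0)
```

With no between-cluster variance the VPC is 0 and the MOR is 1, and both grow as clusters become more heterogeneous.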
Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q
2016-05-01
Conditional and unconditional logistic regression analyses are commonly used in case-control studies, whereas the Cox proportional hazards model is often used in survival data analysis. Most of the literature refers only to main-effect models. However, the generalized linear model differs from the general linear model, and interaction comprises both multiplicative interaction and additive interaction. The former has only statistical significance, whereas the latter has biological significance. In this paper, macros were written in SAS 9.4 to calculate the contrast ratio, the attributable proportion due to interaction, and the synergy index alongside the interaction terms of logistic and Cox regression models, and Wald, delta, and profile likelihood confidence intervals were used to evaluate additive interaction, as a reference for big data analysis in clinical epidemiology and for the analysis of genetic multiplicative and additive interactions.
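The additive interaction measures mentioned above can be sketched from three relative effect estimates. This sketch uses the standard epidemiological definitions of the relative excess risk due to interaction (RERI), the attributable proportion (AP), and the synergy index (S). Reading the abstract's "contrast ratio" as RERI is an assumption, and the input values are hypothetical. In practice, odds ratios or hazard ratios from the fitted model stand in for the relative risks, and confidence intervals (not shown) need the delta method or profile likelihood.

```python
def additive_interaction(rr10, rr01, rr11):
    """Additive interaction measures from relative effect estimates.

    rr10: effect of exposure A alone (relative to neither exposure)
    rr01: effect of exposure B alone
    rr11: effect of both exposures together
    Returns (RERI, AP, S).
    """
    reri = rr11 - rr10 - rr01 + 1.0   # relative excess risk due to interaction
    ap = reri / rr11                  # attributable proportion due to interaction
    synergy = (rr11 - 1.0) / ((rr10 - 1.0) + (rr01 - 1.0))  # synergy index
    return reri, ap, synergy


# Hypothetical estimates, for illustration only.
reri, ap, s = additive_interaction(rr10=2.0, rr01=3.0, rr11=7.0)
```

No additive interaction corresponds to RERI = 0, AP = 0, and S = 1.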
Use of generalized ordered logistic regression for the analysis of multidrug resistance data.
Agga, Getahun E; Scott, H Morgan
2015-10-01
Statistical analysis of antimicrobial resistance data largely focuses on individual antimicrobial's binary outcome (susceptible or resistant). However, bacteria are becoming increasingly multidrug resistant (MDR). Statistical analysis of MDR data is mostly descriptive often with tabular or graphical presentations. Here we report the applicability of generalized ordinal logistic regression model for the analysis of MDR data. A total of 1,152 Escherichia coli, isolated from the feces of weaned pigs experimentally supplemented with chlortetracycline (CTC) and copper, were tested for susceptibilities against 15 antimicrobials and were binary classified into resistant or susceptible. The 15 antimicrobial agents tested were grouped into eight different antimicrobial classes. We defined MDR as the number of antimicrobial classes to which E. coli isolates were resistant ranging from 0 to 8. Proportionality of the odds assumption of the ordinal logistic regression model was violated only for the effect of treatment period (pre-treatment, during-treatment and post-treatment); but not for the effect of CTC or copper supplementation. Subsequently, a partially constrained generalized ordinal logistic model was built that allows for the effect of treatment period to vary while constraining the effects of treatment (CTC and copper supplementation) to be constant across the levels of MDR classes. Copper (Proportional Odds Ratio [Prop OR]=1.03; 95% CI=0.73-1.47) and CTC (Prop OR=1.1; 95% CI=0.78-1.56) supplementation were not significantly associated with the level of MDR adjusted for the effect of treatment period. MDR generally declined over the trial period. In conclusion, generalized ordered logistic regression can be used for the analysis of ordinal data such as MDR data when the proportionality assumptions for ordered logistic regression are violated. Published by Elsevier B.V.
ERIC Educational Resources Information Center
Hidalgo, Mª Dolores; Gómez-Benito, Juana; Zumbo, Bruno D.
2014-01-01
The authors analyze the effectiveness of the R² and delta log odds ratio effect size measures when using logistic regression analysis to detect differential item functioning (DIF) in dichotomous items. A simulation study was carried out, and the Type I error rate and power estimates under conditions in which only statistical testing…
Park, Ji Hyun; Kim, Hyeon-Young; Lee, Hanna; Yun, Eun Kyoung
2015-12-01
This study compares the performance of the logistic regression and decision tree analysis methods for assessing the risk factors for infection in cancer patients undergoing chemotherapy. The subjects were 732 cancer patients who were receiving chemotherapy at K university hospital in Seoul, Korea. The data were collected between March 2011 and February 2013 and were processed for descriptive analysis, logistic regression and decision tree analysis using the IBM SPSS Statistics 19 and Modeler 15.1 programs. The most common risk factors for infection in cancer patients receiving chemotherapy were identified as alkylating agents, vinca alkaloid and underlying diabetes mellitus. The logistic regression explained 66.7% of the variation in the data in terms of sensitivity and 88.9% in terms of specificity. The decision tree analysis accounted for 55.0% of the variation in the data in terms of sensitivity and 89.0% in terms of specificity. As for the overall classification accuracy, the logistic regression explained 88.0% and the decision tree analysis explained 87.2%. The logistic regression analysis showed a higher degree of sensitivity and classification accuracy. Therefore, logistic regression analysis is concluded to be the more effective and useful method for establishing an infection prediction model for patients undergoing chemotherapy. Copyright © 2015 Elsevier Ltd. All rights reserved.
Valid Statistical Analysis for Logistic Regression with Multiple Sources
NASA Astrophysics Data System (ADS)
Fienberg, Stephen E.; Nardi, Yuval; Slavković, Aleksandra B.
Considerable effort has gone into understanding issues of privacy protection of individual information in single databases, and various solutions have been proposed depending on the nature of the data, the ways in which the database will be used and the precise nature of the privacy protection being offered. Once data are merged across sources, however, the nature of the problem becomes far more complex and a number of privacy issues arise for the linked individual files that go well beyond those that are considered with regard to the data within individual sources. In the paper, we propose an approach that gives full statistical analysis on the combined database without actually combining it. We focus mainly on logistic regression, but the method and tools described may be applied essentially to other statistical models as well.
Kupek, Emil
2006-03-15
Structural equation modelling (SEM) has been increasingly used in medical statistics for solving a system of related regression equations. However, a great obstacle for its wider use has been its difficulty in handling categorical variables within the framework of generalised linear models. A large data set with a known structure among two related outcomes and three independent variables was generated to investigate the use of Yule's transformation of the odds ratio (OR) into the Q-metric, Q = (OR-1)/(OR+1), to approximate Pearson's correlation coefficients between binary variables whose covariance structure can be further analysed by SEM. The percentage of correctly classified events and non-events was compared with the classification obtained by logistic regression. The performance of SEM based on the Q-metric was also checked on a small (N = 100) random sample of the generated data and on a real data set. SEM successfully recovered the generated model structure. SEM of real data suggested a significant influence of a latent confounding variable which would not have been detectable by standard logistic regression. SEM classification performance was broadly similar to that of the logistic regression. The analysis of binary data can be greatly enhanced by Yule's transformation of odds ratios into an estimated correlation matrix that can be further analysed by SEM. The interpretation of results is aided by expressing them as odds ratios, which are the most frequently used measure of effect in medical statistics.
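Yule's transformation quoted in the abstract above, Q = (OR-1)/(OR+1), is simple enough to show directly. The helper that computes an odds ratio from a 2x2 table and the cell counts in the example are illustrative assumptions, not part of the original study.

```python
def odds_ratio_2x2(a, b, c, d):
    """Odds ratio of a 2x2 table [[a, b], [c, d]] (all cells assumed non-zero)."""
    return (a * d) / (b * c)


def yules_q(odds_ratio):
    """Yule's Q: maps an odds ratio in (0, inf) onto the interval (-1, 1)."""
    return (odds_ratio - 1.0) / (odds_ratio + 1.0)


# Hypothetical table, for illustration only: OR = 9, so Q = 0.8.
q = yules_q(odds_ratio_2x2(30, 10, 10, 30))
```

An odds ratio of 1 (no association) maps to Q = 0, and reciprocal odds ratios map to values of Q with opposite sign, which is what lets Q be used like a correlation coefficient.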
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H
2012-01-01
Background: The purpose of this investigation was to compare empirically the predictive ability of an artificial neural network with that of logistic regression in the prediction of low back pain. Methods: Data from the second national health survey were considered in this investigation. These data include information on low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a set of 17294 data and they were validated in a test set of 17295 data. The Hosmer and Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 input, 3 hidden and 1 output neurons was employed. The efficiency of the two models was compared by receiver operating characteristic analysis, root mean square and -2 Loglikelihood criteria. Results: The area under the ROC curve (SE), root mean square and -2 Loglikelihood of the logistic regression were 0.752 (0.004), 0.3832 and 14769.2, respectively. The area under the ROC curve (SE), root mean square and -2 Loglikelihood of the artificial neural network were 0.754 (0.004), 0.3770 and 14757.6, respectively. Conclusions: Based on these three criteria, the artificial neural network would give better performance than logistic regression. Although the difference is statistically significant, it does not seem to be clinically significant. PMID:23113198
Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry
2013-08-01
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages, SAS GLIMMIX Laplace and SuperMix Gaussian quadrature, perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415
Li, Ji; Gray, B.R.; Bates, D.M.
2008-01-01
Partitioning the variance of a response by design levels is challenging for binomial and other discrete outcomes. Goldstein (2003) proposed four definitions for variance partitioning coefficients (VPC) under a two-level logistic regression model. In this study, we explicitly derived formulae for the multi-level logistic regression model and subsequently studied the distributional properties of the calculated VPCs. Using simulations and a vegetation dataset, we demonstrated associations between different VPC definitions, the importance of methods for estimating VPCs (by comparing VPCs obtained using Laplace and penalized quasi-likelihood methods), and bivariate dependence between VPCs calculated at different levels. Such an empirical study lends immediate support to wider applications of VPC in scientific data analysis.
Clustering performance comparison using K-means and expectation maximization algorithms.
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-11-14
Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K-means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K-means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
Austin, Peter C; Reeves, Mathew J
2013-03-01
Background: Hospital report cards, in which outcomes following the provision of medical or surgical care are compared across health care providers, are being published with increasing frequency. Essential to the production of these reports is risk-adjustment, which allows investigators to account for differences in the distribution of patient illness severity across different hospitals. Logistic regression models are frequently used for risk-adjustment in hospital report cards. Many applied researchers use the c-statistic (equivalent to the area under the receiver operating characteristic curve) of the logistic regression model as a measure of the credibility and accuracy of hospital report cards. Objectives: To determine the relationship between the c-statistic of a risk-adjustment model and the accuracy of hospital report cards. Research Design: Monte Carlo simulations were used to examine this issue. We examined the influence of three factors on the accuracy of hospital report cards: the c-statistic of the logistic regression model used for risk-adjustment, the number of hospitals, and the number of patients treated at each hospital. The parameters used to generate the simulated datasets came from analyses of patients hospitalized with a diagnosis of acute myocardial infarction in Ontario, Canada. Results: The c-statistic of the risk-adjustment model had, at most, a very modest impact on the accuracy of hospital report cards, whereas the number of patients treated at each hospital had a much greater impact. Conclusions: The c-statistic of a risk-adjustment model should not be used to assess the accuracy of a hospital report card. PMID:23295579
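The c-statistic discussed above equals the probability that a randomly chosen subject with the event receives a higher predicted risk than a randomly chosen subject without it. A minimal sketch of that pairwise definition follows. The function name and the example data are illustrative assumptions; the O(n²) loop favours readability over speed, and a library routine would normally be used on real data.

```python
def c_statistic(y_true, y_score):
    """Concordance (c) statistic by direct pairwise comparison.

    y_true: sequence of 0/1 outcomes; y_score: predicted risks.
    Ties in predicted risk count as half-concordant.
    """
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical outcomes and predicted risks, for illustration only.
c = c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The definition depends only on how patients are ranked against each other, not on hospital-level sample sizes, which is consistent with the abstract's finding that the c-statistic says little about report card accuracy.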
Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula
2011-01-01
Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
Engoren, Milo; Habib, Robert H; Dooner, John J; Schwann, Thomas A
2013-08-01
As many as 14% of patients undergoing coronary artery bypass surgery are readmitted within 30 days. Readmission is usually the result of morbidity and may lead to death. The purpose of this study is to develop and compare statistical and genetic programming models to predict readmission. Patients were divided into separate Construction and Validation populations. Using 88 variables, logistic regression, genetic programs, and artificial neural nets were used to develop predictive models. Models were first constructed and tested on the Construction population, then validated on the Validation population. Areas under the receiver operator characteristic curves (AU ROC) were used to compare the models. Two hundred and two patients (7.6%) in the 2,644-patient Construction group and 216 (8.0%) of the 2,711-patient Validation group were readmitted within 30 days of CABG surgery. Logistic regression predicted readmission with AU ROC = .675 ± .021 in the Construction group. Genetic programs significantly improved the accuracy (AU ROC = .767 ± .001, p < .001). Artificial neural nets were less accurate, with AU ROC = .597 ± .001 in the Construction group. Predictive accuracy of all three techniques fell in the Validation group. However, the accuracy of genetic programming (AU ROC = .654 ± .001) was still trivially, but not statistically significantly, better than that of the logistic regression (AU ROC = .644 ± .020, p = .61). Genetic programming and logistic regression provide alternative methods to predict readmission that are similarly accurate.
Cakir, Ebru; Kucuk, Ulku; Pala, Emel Ebru; Sezer, Ozlem; Ekin, Rahmi Gokhan; Cakmak, Ozgur
2017-05-01
Conventional cytomorphologic assessment is the first step to establish an accurate diagnosis in urinary cytology. In cytologic preparations, the separation of low-grade urothelial carcinoma (LGUC) from reactive urothelial proliferation (RUP) can be exceedingly difficult. The bladder washing cytologies of 32 LGUC and 29 RUP were reviewed. The cytologic slides were examined for the presence or absence of the 28 cytologic features. The cytologic criteria showing statistical significance in LGUC were increased numbers of monotonous single (non-umbrella) cells, three-dimensional cellular papillary clusters without fibrovascular cores, irregular bordered clusters, atypical single cells, irregular nuclear overlap, cytoplasmic homogeneity, increased N/C ratio, pleomorphism, nuclear border irregularity, nuclear eccentricity, elongated nuclei, and hyperchromasia (p < 0.05), and the cytologic criteria showing statistical significance in RUP were inflammatory background, mixture of small and large urothelial cells, loose monolayer aggregates, and vacuolated cytoplasm (p < 0.05). When these variables were subjected to a stepwise logistic regression analysis, four features were selected to distinguish LGUC from RUP: increased numbers of monotonous single (non-umbrella) cells, increased nuclear cytoplasmic ratio, hyperchromasia, and presence of small and large urothelial cells (p = 0.0001). By this logistic model of the 32 cases with proven LGUC, the stepwise logistic regression analysis correctly predicted 31 (96.9%) patients with this diagnosis, and of the 29 patients with RUP, the logistic model correctly predicted 26 (89.7%) patients as having this disease. There are several cytologic features to separate LGUC from RUP. Stepwise logistic regression analysis is a valuable tool for determining the most useful cytologic criteria to distinguish these entities. © 2017 APMIS. Published by John Wiley & Sons Ltd.
Choi, Seung Hoan; Labadorf, Adam T; Myers, Richard H; Lunetta, Kathryn L; Dupuis, Josée; DeStefano, Anita L
2017-02-06
Next generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate the data adaptive method in order to compare false positive rates. When the sample size is small or the expression levels of a gene are highly dispersed, the NB regression shows inflated Type-I error rates but the Classical logistic and Bayes logistic (BL) regressions are conservative. Firth's logistic (FL) regression performs well or is slightly conservative. Large sample size and low dispersion generally make Type-I error rates of all methods close to nominal alpha levels of 0.05 and 0.01. However, Type-I error rates are controlled after applying the data adaptive method. The NB, BL, and FL regressions gain increased power with large sample size, large log2 fold-change, and low dispersion. The FL regression has comparable power to NB regression. We conclude that implementing the data adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth's logistic regression provides a concise statistical inference process and reduces spurious associations from inaccurately estimated dispersion parameters in the negative binomial framework.
A statistical method for predicting seizure onset zones from human single-neuron recordings
NASA Astrophysics Data System (ADS)
Valdez, André B.; Hickman, Erin N.; Treiman, David M.; Smith, Kris A.; Steinmetz, Peter N.
2013-02-01
Objective. Clinicians often use depth-electrode recordings to localize human epileptogenic foci. To advance the diagnostic value of these recordings, we applied logistic regression models to single-neuron recordings from depth-electrode microwires to predict seizure onset zones (SOZs). Approach. We collected data from 17 epilepsy patients at the Barrow Neurological Institute and developed logistic regression models to calculate the odds of observing SOZs in the hippocampus, amygdala and ventromedial prefrontal cortex, based on statistics such as the burst interspike interval (ISI). Main results. Analysis of these models showed that, for a single-unit increase in burst ISI ratio, the left hippocampus was approximately 12 times more likely to contain a SOZ; and the right amygdala, 14.5 times more likely. Our models were most accurate for the hippocampus bilaterally (at 85% average sensitivity), and performance was comparable with current diagnostics such as electroencephalography. Significance. Logistic regression models can be combined with single-neuron recording to predict likely SOZs in epilepsy patients being evaluated for resective surgery, providing an automated source of clinically useful information.
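The "times more likely" figures in the abstract above are the kind of quantity obtained by exponentiating a logistic regression coefficient. The sketch below shows that conversion; the coefficient value is hypothetical (chosen so that the odds ratio equals 12), and the helper names are illustrative. Note that an odds ratio multiplies the odds, not the probability itself.

```python
import math

def odds_ratio(beta, delta=1.0):
    """Odds ratio implied by a logistic coefficient for a change of
    `delta` units in the predictor."""
    return math.exp(beta * delta)

def probability_after_change(p0, beta, delta=1.0):
    """Probability after multiplying the baseline odds by the odds ratio."""
    odds = p0 / (1.0 - p0) * odds_ratio(beta, delta)
    return odds / (1.0 + odds)

# Hypothetical coefficient: ln(12) gives an odds ratio of 12 per unit increase.
beta = math.log(12.0)
print(odds_ratio(beta))
print(probability_after_change(0.2, beta))
```

With a baseline probability of 0.2 (odds 0.25), a twelvefold increase in odds gives odds of 3, that is, a probability of 0.75 rather than a twelvefold probability.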
Landslide Hazard Mapping in Rwanda Using Logistic Regression
NASA Astrophysics Data System (ADS)
Piller, A.; Anderson, E.; Ballard, H.
2015-12-01
Landslides in the United States cause more than $1 billion in damages and 50 deaths per year (USGS 2014). Globally, figures are much more grave, yet monitoring, mapping and forecasting of these hazards are less than adequate. Seventy-five percent of the population of Rwanda earns a living from farming, mostly subsistence. Loss of farmland, housing, or life, to landslides is a very real hazard. Landslides in Rwanda have an impact at the economic, social, and environmental level. In a developing nation that faces challenges in tracking, cataloging, and predicting the numerous landslides that occur each year, satellite imagery and spatial analysis allow for remote study. We have focused on the development of a landslide inventory and a statistical methodology for assessing landslide hazards. Using logistic regression on approximately 30 test variables (e.g. slope, soil type, land cover) and a sample of over 200 landslides, we determine which variables are statistically most relevant to landslide occurrence in Rwanda. A preliminary predictive hazard map for Rwanda has been produced, using the variables selected from the logistic regression analysis.
Determination of riverbank erosion probability using Locally Weighted Logistic Regression
NASA Astrophysics Data System (ADS)
Ioannidou, Elena; Flori, Aikaterini; Varouchakis, Emmanouil A.; Giannakis, Georgios; Vozinaki, Anthi Eirini K.; Karatzas, George P.; Nikolaidis, Nikolaos
2015-04-01
Riverbank erosion is a natural geomorphologic process that affects the fluvial environment. The most important issue concerning riverbank erosion is the identification of the vulnerable locations. An alternative to the usual hydrodynamic models to predict vulnerable locations is to quantify the probability of erosion occurrence. This can be achieved by identifying the underlying relations between riverbank erosion and the geomorphological or hydrological variables that prevent or stimulate erosion. Thus, riverbank erosion can be determined by a regression model using independent variables that are considered to affect the erosion process. The impact of such variables may vary spatially, therefore, a non-stationary regression model is preferred instead of a stationary equivalent. Locally Weighted Regression (LWR) is proposed as a suitable choice. This method can be extended to predict the binary presence or absence of erosion based on a series of independent local variables by using the logistic regression model. It is referred to as Locally Weighted Logistic Regression (LWLR). Logistic regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable (e.g. binary response) based on one or more predictor variables. The method can be combined with LWR to assign weights to local independent variables of the dependent one. LWR allows model parameters to vary over space in order to reflect spatial heterogeneity. The probabilities of the possible outcomes are modelled as a function of the independent variables using a logistic function. Logistic regression measures the relationship between a categorical dependent variable and, usually, one or several continuous independent variables by converting the dependent variable to probability scores. Then, a logistic regression is formed, which predicts success or failure of a given binary variable (e.g. erosion presence or absence) for any value of the independent variables. 
The erosion occurrence probability can be calculated in conjunction with the model deviance regarding the independent variables tested. The most straightforward measure for goodness of fit is the G statistic. It is a simple and effective way to study and evaluate the Logistic Regression model efficiency and the reliability of each independent variable. The developed statistical model is applied to the Koiliaris River Basin on the island of Crete, Greece. Two datasets of river bank slope, river cross-section width and indications of erosion were available for the analysis (12 and 8 locations). Two different types of spatial dependence functions, exponential and tricubic, were examined to determine the local spatial dependence of the independent variables at the measurement locations. The results show a significant improvement when the tricubic function is applied as the erosion probability is accurately predicted at all eight validation locations. Results for the model deviance show that cross-section width is more important than bank slope in the estimation of erosion probability along the Koiliaris riverbanks. The proposed statistical model is a useful tool that quantifies the erosion probability along the riverbanks and can be used to assist managing erosion and flooding events. Acknowledgements This work is part of an on-going THALES project (CYBERSENSORS - High Frequency Monitoring System for Integrated Water Resources Management of Rivers). The project has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES. Investing in knowledge society through the European Social Fund.
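The locally weighted logistic regression idea described above can be sketched as follows: observations are weighted by a kernel of their distance to the prediction location, and a weighted logistic regression is fitted with those weights. This is a minimal sketch under stated assumptions, not the authors' code: it uses one predictor, one-dimensional site positions, a small ridge term for numerical stability, and made-up data; all function names are hypothetical. The tricube and exponential kernels are written in their common textbook forms.

```python
import math

def tricube_weight(d, h):
    """Tricube kernel: (1 - (d/h)^3)^3 inside bandwidth h, zero outside."""
    u = abs(d) / h
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def exponential_weight(d, h):
    """Exponential kernel: exp(-d/h); decays but never reaches zero."""
    return math.exp(-abs(d) / h)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_weighted_logistic(x, y, w, ridge=1e-2, iters=100, tol=1e-10):
    """Newton-Raphson fit of a weighted logistic regression with an
    intercept and one predictor. The ridge term is an assumption added
    to keep the fit stable on tiny samples. Returns (intercept, slope)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = sigmoid(b0 + b1 * xi)
            r = wi * (yi - p)          # weighted residual
            v = wi * p * (1.0 - p)     # weighted variance term
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        g0 -= ridge * b0
        g1 -= ridge * b1
        h00 += ridge
        h11 += ridge
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def predict_erosion_probability(s0, x0, sites, x, y, h, kernel=tricube_weight):
    """Erosion probability at position s0 for predictor value x0, using
    only observations weighted by their distance to s0."""
    w = [kernel(si - s0, h) for si in sites]
    b0, b1 = fit_weighted_logistic(x, y, w)
    return sigmoid(b0 + b1 * x0)

# Made-up example: site positions, a standardized predictor, erosion (1) or not (0).
sites = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
y = [0, 0, 1, 0, 1, 1]
print(predict_erosion_probability(2.5, 1.0, sites, x, y, h=4.0))
print(predict_erosion_probability(2.5, 1.0, sites, x, y, h=4.0, kernel=exponential_weight))
```

Swapping the `kernel` argument is the only change needed to compare the two spatial dependence functions on the same data.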
Tuuli, Methodius G; Odibo, Anthony O
2011-08-01
The objective of this article is to discuss the rationale for common statistical tests used for the analysis and interpretation of prenatal diagnostic imaging studies. Examples from the literature are used to illustrate descriptive and inferential statistics. The uses and limitations of linear and logistic regression analyses are discussed in detail.
Advanced Statistics for Exotic Animal Practitioners.
Hodsoll, John; Hellier, Jennifer M; Ryan, Elizabeth G
2017-09-01
Correlation and regression assess the association between 2 or more variables. This article reviews the core knowledge needed to understand these analyses, moving from visual analysis in scatter plots through correlation, simple and multiple linear regression, and logistic regression. Correlation estimates the strength and direction of a relationship between 2 variables. Regression can be considered more general and quantifies the numerical relationships between an outcome and 1 or multiple variables in terms of a best-fit line, allowing predictions to be made. Each technique is discussed with examples and the statistical assumptions underlying their correct application. Copyright © 2017 Elsevier Inc. All rights reserved.
2012-09-01
3,435 10,461 9.1 3.1 63 Unmarried with Children+ Unmarried without Children 439,495 0.01 10,350 43,870 10.1 2.2 64 Married with Children+ Married ... logistic regression model was used to predict the probability of eligibility for the survey (known eligibility vs. unknown eligibility). A second logistic ... regression model was used to predict the probability of response among eligible sample members (complete response vs. non-response). CHAID (Chi
Akkus, Zeki; Camdeviren, Handan; Celik, Fatma; Gur, Ali; Nas, Kemal
2005-09-01
To determine the risk factors of osteoporosis using a multiple binary logistic regression method and to assess the risk variables for osteoporosis, which is a major and growing health problem in many countries. We presented a case-control study, consisting of 126 postmenopausal healthy women as control group and 225 postmenopausal osteoporotic women as the case group. The study was carried out in the Department of Physical Medicine and Rehabilitation, Dicle University, Diyarbakir, Turkey between 1999 and 2002. The data from the 351 participants were collected using a standard questionnaire that contains 43 variables. A multiple logistic regression model was then used to evaluate the data and to find the best regression model. We correctly classified 80.1% (281/351) of the participants using the regression model. Furthermore, the specificity value of the model was 67% (84/126) of the control group while the sensitivity value was 88% (197/225) of the case group. We found the distribution of residual values standardized for final model to be exponential using the Kolmogorov-Smirnov test (p=0.193). The receiver operating characteristic curve was found successful to predict patients with risk for osteoporosis. This study suggests that low levels of dietary calcium intake, physical activity, education, and longer duration of menopause are independent predictors of the risk of low bone density in our population. Adequate dietary calcium intake in combination with maintaining a daily physical activity, increasing educational level, decreasing birth rate, and duration of breast-feeding may contribute to healthy bones and play a role in practical prevention of osteoporosis in Southeast Anatolia. In addition, the findings of the present study indicate that the use of a multivariate statistical method such as multiple logistic regression in osteoporosis, which may be influenced by many variables, is better than univariate statistical evaluation.
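The classification figures quoted in this abstract follow directly from the reported counts. The short sketch below reproduces the arithmetic; the function name and the derivation of the false-negative and false-positive counts by subtraction are the only additions.

```python
def classification_summary(tp, fn, tn, fp):
    """Sensitivity, specificity and overall accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Counts from the abstract: 197 of 225 cases and 84 of 126 controls
# were classified correctly.
sens, spec, acc = classification_summary(tp=197, fn=225 - 197, tn=84, fp=126 - 84)
print(round(sens, 3), round(spec, 3), round(acc, 3))
```

The three values round to the 88%, 67% and 80.1% reported in the abstract.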
Assessing risk factors for periodontitis using regression
NASA Astrophysics Data System (ADS)
Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa
2013-10-01
Multivariate statistical analysis is indispensable to assess the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using linear and logistic regression models, we can assess the relevance, as risk factors for periodontitis disease, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking status and Plaque Index. The multiple linear regression analysis model was built to evaluate the influence of IVs on mean Attachment Loss (AL). Thus, the regression coefficients will be obtained along with the respective p-values from the significance tests. The classification of a case (individual) adopted in the logistic model was the extent of the destruction of periodontal tissues defined by an Attachment Loss greater than or equal to 4 mm in 25% (AL≥4mm/≥25%) of sites surveyed. The association measures include the Odds Ratios together with the corresponding 95% confidence intervals.
Correlation and simple linear regression.
Eberly, Lynn E
2007-01-01
This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.
Binary logistic regression-Instrument for assessing museum indoor air impact on exhibits.
Bucur, Elena; Danet, Andrei Florin; Lehr, Carol Blaziu; Lehr, Elena; Nita-Lazar, Mihai
2017-04-01
This paper presents a new way to assess the environmental impact on historical artifacts using binary logistic regression. The prediction of the impact on the exhibits during certain pollution scenarios (environmental impact) was calculated by a mathematical model based on the binary logistic regression; it allows the identification of those environmental parameters from a multitude of possible parameters with a significant impact on exhibitions and ranks them according to the severity of their effect. Air quality (NO2, SO2, O3 and PM2.5) and microclimate parameters (temperature, humidity) monitoring data from a case study conducted within exhibition and storage spaces of the Romanian National Aviation Museum Bucharest have been used for developing and validating the binary logistic regression method and the mathematical model. The logistic regression analysis was used on 794 data combinations (715 to develop the model and 79 to validate it) with the Statistical Package for Social Sciences (SPSS 20.0). The results from the binary logistic regression analysis demonstrated that, of the six parameters taken into consideration, four present a significant effect upon exhibits in the following order: O3 > PM2.5 > NO2 > humidity, followed at a significant distance by the effects of SO2 and temperature. The mathematical model developed in this study correctly predicted 95.1% of the cumulated effect of the environmental parameters upon the exhibits. Moreover, this model could also be used in the decisional process regarding the preventive preservation measures that should be implemented within the exhibition space.
The mathematical model developed on the environmental parameters analyzed by the binary logistic regression method could be useful in a decision-making process establishing the best measures for pollution reduction and preventive preservation of exhibits.
Hao, Chen; LiJun, Chen; Albright, Thomas P.
2007-01-01
Invasive exotic species pose a growing threat to the economy, public health, and ecological integrity of nations worldwide. Explaining and predicting the spatial distribution of invasive exotic species is of great importance to prevention and early warning efforts. We are investigating the potential distribution of invasive exotic species, the environmental factors that influence these distributions, and the ability to predict them using statistical and information-theoretic approaches. For some species, detailed presence/absence occurrence data are available, allowing the use of a variety of standard statistical techniques. However, for most species, absence data are not available. Presented with the challenge of developing a model based on presence-only information, we developed an improved logistic regression approach using Information Theory and Frequency Statistics to produce a relative suitability map. This paper generated a variety of distributions of ragweed (Ambrosia artemisiifolia L.) from logistic regression models applied to herbarium specimen location data and a suite of GIS layers including climatic, topographic, and land cover information. Our logistic regression model was based on Akaike's Information Criterion (AIC) from a suite of ecologically reasonable predictor variables. Based on the results we provided a new Frequency Statistical method to compartmentalize habitat-suitability in the native range. Finally, we used the model and the compartmentalized criterion developed in native ranges to "project" a potential distribution onto the exotic ranges to build habitat-suitability maps. © Science in China Press 2007.
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.; Michael, John A.; Helsel, Dennis R.
2008-01-01
Logistic regression was used to develop statistical models that can be used to predict the probability of debris flows in areas recently burned by wildfires by using data from 14 wildfires that burned in southern California during 2003-2006. Twenty-eight independent variables describing the basin morphology, burn severity, rainfall, and soil properties of 306 drainage basins located within those burned areas were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows soon after the 2003 to 2006 fires were delineated from data in the National Elevation Dataset using a geographic information system; (2) Data describing the basin morphology, burn severity, rainfall, and soil properties were compiled for each basin. These data were then input to a statistics software package for analysis using logistic regression; and (3) Relations between the occurrence or absence of debris flows and the basin morphology, burn severity, rainfall, and soil properties were evaluated, and five multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combinations produced the most effective models, and the multivariate models that best predicted the occurrence of debris flows were identified. Percentage of high burn severity and 3-hour peak rainfall intensity were significant variables in all models. Soil organic matter content and soil clay content were significant variables in all models except Model 5. Soil slope was a significant variable in all models except Model 4. The most suitable model can be selected from these five models on the basis of the availability of independent variables in the particular area of interest and field checking of probability maps. 
The multivariate logistic regression models can be entered into a geographic information system, and maps showing the probability of debris flows can be constructed in recently burned areas of southern California. This study demonstrates that logistic regression is a valuable tool for developing models that predict the probability of debris flows occurring in recently burned landscapes.
Szekér, Szabolcs; Vathy-Fogarassy, Ágnes
2018-01-01
Logistic regression based propensity score matching is a widely used method in case-control studies to select the individuals of the control group. This method creates a suitable control group if all factors affecting the output variable are known. However, if relevant latent variables exist as well, which are not taken into account during the calculations, the quality of the control group is uncertain. In this paper, we present a statistics-based study in which we try to determine the relationship between the accuracy of the logistic regression model and the uncertainty of the dependent variable of the control group defined by propensity score matching. Our analyses show that there is a linear correlation between the fit of the logistic regression model and the uncertainty of the output variable. In certain cases, a latent binary explanatory variable can result in a relative error of up to 70% in the prediction of the outcome variable. The observed phenomenon draws the attention of analysts to an important point, which must be taken into account when drawing conclusions.
Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis
ERIC Educational Resources Information Center
Camilleri, Liberato; Cefai, Carmel
2013-01-01
Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…
Staley, Dennis M.; Negri, Jacquelyn A.; Kean, Jason W.; Laber, Jayme L.; Tillery, Anne C.; Youberg, Ann M.
2016-06-30
Wildfire can significantly alter the hydrologic response of a watershed to the extent that even modest rainstorms can generate dangerous flash floods and debris flows. To reduce public exposure to hazard, the U.S. Geological Survey produces post-fire debris-flow hazard assessments for select fires in the western United States. We use publicly available geospatial data describing basin morphology, burn severity, soil properties, and rainfall characteristics to estimate the statistical likelihood that debris flows will occur in response to a storm of a given rainfall intensity. Using an empirical database and refined geospatial analysis methods, we defined new equations for the prediction of debris-flow likelihood using logistic regression methods. We showed that the new logistic regression model outperformed previous models used to predict debris-flow likelihood.
Multinomial logistic regression in workers' health
NASA Astrophysics Data System (ADS)
Grilo, Luís M.; Grilo, Helena L.; Gonçalves, Sónia P.; Junça, Ana
2017-11-01
In European countries, namely in Portugal, it is common to hear some people mentioning that they are exposed to excessive and continuous psychosocial stressors at work. This is increasing in diverse activity sectors, such as the Services sector. A representative sample was collected from a Portuguese Services organization by applying an internationally validated survey, whose variables were measured in five ordered categories on a Likert-type scale. A multinomial logistic regression model is used to estimate the probability of each category of the dependent variable general health perception where, among other independent variables, burnout appears as statistically significant.
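A multinomial logistic model of the kind described above turns one linear predictor per non-reference category into a full set of category probabilities. The sketch below shows that step for five categories; the coefficient values, the single predictor, and the choice of the first category as reference are hypothetical and are not taken from the study.

```python
import math

def multinomial_probs(x, coefs):
    """Category probabilities for a baseline-category multinomial logit.

    `coefs` holds one (intercept, slope) pair per non-reference category;
    the reference category has linear predictor 0.
    """
    etas = [0.0] + [a + b * x for a, b in coefs]
    m = max(etas)                      # subtract the max for numerical stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients for categories 2..5 of a five-point scale,
# with x standing in for a burnout score.
coefs = [(0.5, -0.2), (1.0, -0.5), (0.8, -0.9), (0.2, -1.3)]
probs = multinomial_probs(2.0, coefs)
print([round(p, 3) for p in probs])
```

The probabilities always sum to one, and with all coefficients set to zero every category receives probability 1/5.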
Supporting Regularized Logistic Regression Privately and Efficiently.
Li, Wenfa; Liu, Hongzhe; Yang, Peng; Xie, Wei
2016-01-01
As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data of human subjects that are contingent upon strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a widely used statistical model that has nonetheless not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as in the form of research consortia or networks as widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributed computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc.
Beyond Reading Alone: The Relationship Between Aural Literacy And Asthma Management
Rosenfeld, Lindsay; Rudd, Rima; Emmons, Karen M.; Acevedo-García, Dolores; Martin, Laurie; Buka, Stephen
2010-01-01
Objectives: To examine the relationship between literacy and asthma management with a focus on the oral exchange. Methods: Study participants, all of whom reported asthma, were drawn from the New England Family Study (NEFS), an examination of links between education and health. NEFS data included reading, oral (speaking), and aural (listening) literacy measures. An additional survey was conducted with this group of study participants related to asthma issues, particularly asthma management. Data analysis focused on bivariate and multivariable logistic regression. Results: In bivariate logistic regression models exploring aural literacy, there was a statistically significant association between those participants with lower aural literacy skills and less successful asthma management (OR: 4.37, 95% CI: 1.11, 17.32). In multivariable logistic regression analyses, controlling for gender, income, and race in separate models (one-at-a-time), there remained a statistically significant association between those participants with lower aural literacy skills and less successful asthma management. Conclusion: Lower aural literacy skills seem to complicate asthma management capabilities. Practice Implications: Greater attention to the oral exchange, in particular the listening skills highlighted by aural literacy, as well as other related literacy skills may help us develop strategies for clear communication related to asthma management. PMID: 20399060
Supporting Regularized Logistic Regression Privately and Efficiently
Li, Wenfa; Liu, Hongzhe; Yang, Peng; Xie, Wei
2016-01-01
As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data of human subjects that are contingent upon strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a widely used statistical model that has nonetheless not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as in the form of research consortia or networks as widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributed computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc. PMID: 27271738
Comparative Research of Navy Voluntary Education at Operational Commands
2017-03-01
return on investment, ROI, logistic regression, multivariate analysis, descriptive statistics, Markov, time-series, linear programming ... B. DESCRIPTIVE STATISTICS TABLES ... C. PRIVACY CONSIDERATIONS ... LIST OF TABLES Table 1. Variables and Descriptions. Adapted from NETC (2016).
Methods for estimating selected low-flow frequency statistics for unregulated streams in Kentucky
Martin, Gary R.; Arihood, Leslie D.
2010-01-01
This report provides estimates of, and presents methods for estimating, selected low-flow frequency statistics for unregulated streams in Kentucky including the 30-day mean low flows for recurrence intervals of 2 and 5 years (30Q2 and 30Q5) and the 7-day mean low flows for recurrence intervals of 2, 10, and 20 years (7Q2, 7Q10, and 7Q20). Estimates of these statistics are provided for 121 U.S. Geological Survey streamflow-gaging stations with data through the 2006 climate year, which is the 12-month period ending March 31 of each year. Data were screened to identify the periods of homogeneous, unregulated flows for use in the analyses. Logistic-regression equations are presented for estimating the annual probability of the selected low-flow frequency statistics being equal to zero. Weighted-least-squares regression equations were developed for estimating the magnitude of the nonzero 30Q2, 30Q5, 7Q2, 7Q10, and 7Q20 low flows. Three low-flow regions were defined for estimating the 7-day low-flow frequency statistics. The explicit explanatory variables in the regression equations include total drainage area and the mapped streamflow-variability index measured from a revised statewide coverage of this characteristic. The percentage of the station low-flow statistics correctly classified as zero or nonzero by use of the logistic-regression equations ranged from 87.5 to 93.8 percent. The average standard errors of prediction of the weighted-least-squares regression equations ranged from 108 to 226 percent. The 30Q2 regression equations have the smallest standard errors of prediction, and the 7Q20 regression equations have the largest standard errors of prediction. The regression equations are applicable only to stream sites with low flows unaffected by regulation from reservoirs and local diversions of flow and to drainage basins in specified ranges of basin characteristics.
Caution is advised when applying the equations for basins with characteristics near the applicable limits and for basins with karst drainage features.
Estimating the Probability of Rare Events Occurring Using a Local Model Averaging.
Chen, Jin-Hua; Chen, Chun-Shu; Huang, Meng-Fan; Lin, Hung-Chih
2016-10-01
In statistical applications, logistic regression is a popular method for analyzing binary data accompanied by explanatory variables. But when one of the two outcomes is rare, the estimation of model parameters has been shown to be severely biased and hence estimating the probability of rare events occurring based on a logistic regression model would be inaccurate. In this article, we focus on estimating the probability of rare events occurring based on logistic regression models. Instead of selecting a best model, we propose a local model averaging procedure based on a data perturbation technique applied to different information criteria to obtain different probability estimates of rare events occurring. Then an approximately unbiased estimator of Kullback-Leibler loss is used to choose the best one among them. We design complete simulations to show the effectiveness of our approach. For illustration, a necrotizing enterocolitis (NEC) data set is analyzed. © 2016 Society for Risk Analysis.
Neurophysiological correlates of depressive symptoms in young adults: A quantitative EEG study.
Lee, Poh Foong; Kan, Donica Pei Xin; Croarkin, Paul; Phang, Cheng Kar; Doruk, Deniz
2018-01-01
There is an unmet need for practical and reliable biomarkers for mood disorders in young adults. Identifying the brain activity associated with the early signs of depressive disorders could have important diagnostic and therapeutic implications. In this study we sought to investigate the EEG characteristics in young adults with newly identified depressive symptoms. Based on the initial screening, a total of 100 participants (n = 50 euthymic, n = 50 depressive) underwent 32-channel EEG acquisition. Simple logistic regression and the C-statistic were used to explore whether EEG power could be used to discriminate between the groups. The strongest EEG predictors of mood were identified using multivariate logistic regression models. Simple logistic regression analysis with subsequent C-statistics revealed that only high-alpha and beta power originating from the left central cortex (C3) have a reliable discriminative value (area under the ROC curve > 0.7) for differentiating the depressive group from the euthymic group. Multivariate regression analysis showed that the single most significant predictor of group (depressive vs. euthymic) is the high-alpha power over C3 (p = 0.03). The present findings suggest that EEG is a useful tool in the identification of neurophysiological correlates of depressive symptoms in young adults with no previous psychiatric history. Our results could guide future studies investigating the early neurophysiological changes and surrogate outcomes in depression. Copyright © 2017 Elsevier Ltd. All rights reserved.
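The C-statistic used in this abstract is the area under the ROC curve, which can be computed as the proportion of case/control pairs in which the case receives the higher predicted score (ties count one half). The sketch below shows that pairwise definition on made-up scores; the function name and data are illustrative only.

```python
def c_statistic(scores_pos, scores_neg):
    """Concordance (area under the ROC curve) from predicted scores of
    the positive group and the negative group."""
    concordant = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                concordant += 1.0
            elif sp == sn:
                concordant += 0.5
    return concordant / (len(scores_pos) * len(scores_neg))

# Made-up predicted probabilities for three depressive and three euthymic participants.
depressive = [0.8, 0.6, 0.4]
euthymic = [0.5, 0.3, 0.2]
print(c_statistic(depressive, euthymic))
```

In this toy example 8 of the 9 pairs are concordant, so the C-statistic is 8/9, above the 0.7 threshold the abstract treats as reliable discrimination.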
Eash, David A.; Barnes, Kimberlee K.
2017-01-01
A statewide study was conducted to develop regression equations for estimating six selected low-flow frequency statistics and harmonic mean flows for ungaged stream sites in Iowa. The estimation equations developed for the six low-flow frequency statistics include: the annual 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years, the annual 30-day mean low flow for a recurrence interval of 5 years, and the seasonal (October 1 through December 31) 1- and 7-day mean low flows for a recurrence interval of 10 years. Estimation equations also were developed for the harmonic-mean-flow statistic. Estimates of these seven selected statistics are provided for 208 U.S. Geological Survey continuous-record streamgages using data through September 30, 2006. The study area comprises streamgages located within Iowa and 50 miles beyond the State's borders. Because trend analyses indicated statistically significant positive trends when considering the entire period of record for the majority of the streamgages, the longest, most recent period of record without a significant trend was determined for each streamgage for use in the study. The median number of years of record used to compute each of these seven selected statistics was 35. Geographic information system software was used to measure 54 selected basin characteristics for each streamgage. Following the removal of two streamgages from the initial data set, data collected for 206 streamgages were compiled to investigate three approaches for regionalization of the seven selected statistics. Regionalization, a process using statistical regression analysis, provides a relation for efficiently transferring information from a group of streamgages in a region to ungaged sites in the region. The three regionalization approaches tested included statewide, regional, and region-of-influence regressions. 
For the regional regression, the study area was divided into three low-flow regions on the basis of hydrologic characteristics, landform regions, and soil regions. A comparison of root mean square errors and average standard errors of prediction for the statewide, regional, and region-of-influence regressions determined that the regional regression provided the best estimates of the seven selected statistics at ungaged sites in Iowa. Because a significant number of streams in Iowa reach zero flow as their minimum flow during low-flow years, four different types of regression analyses were used: left-censored, logistic, generalized-least-squares, and weighted-least-squares regression. A total of 192 streamgages were included in the development of 27 regression equations for the three low-flow regions. For the northeast and northwest regions, a censoring threshold was used to develop 12 left-censored regression equations to estimate the 6 low-flow frequency statistics for each region. For the southern region a total of 12 regression equations were developed; 6 logistic regression equations were developed to estimate the probability of zero flow for the 6 low-flow frequency statistics and 6 generalized least-squares regression equations were developed to estimate the 6 low-flow frequency statistics, if nonzero flow is estimated first by use of the logistic equations. A weighted-least-squares regression equation was developed for each region to estimate the harmonic-mean-flow statistic. Average standard errors of estimate for the left-censored equations for the northeast region range from 64.7 to 88.1 percent and for the northwest region range from 85.8 to 111.8 percent. Misclassification percentages for the logistic equations for the southern region range from 5.6 to 14.0 percent. 
Average standard errors of prediction for generalized least-squares equations for the southern region range from 71.7 to 98.9 percent and pseudo coefficients of determination for the generalized-least-squares equations range from 87.7 to 91.8 percent. Average standard errors of prediction for weighted-least-squares equations developed for estimating the harmonic-mean-flow statistic for each of the three regions range from 66.4 to 80.4 percent. The regression equations are applicable only to stream sites in Iowa with low flows not significantly affected by regulation, diversion, or urbanization and with basin characteristics within the range of those used to develop the equations. If the equations are used at ungaged sites on regulated streams, or on streams affected by water-supply and agricultural withdrawals, then the estimates will need to be adjusted by the amount of regulation or withdrawal to estimate the actual flow conditions if that is of interest. Caution is advised when applying the equations for basins with characteristics near the applicable limits of the equations and for basins located in karst topography. A test of two drainage-area ratio methods using 31 pairs of streamgages, for the annual 7-day mean low-flow statistic for a recurrence interval of 10 years, indicates a weighted drainage-area ratio method provides better estimates than regional regression equations for an ungaged site on a gaged stream in Iowa when the drainage-area ratio is between 0.5 and 1.4. These regression equations will be implemented within the U.S. Geological Survey StreamStats web-based geographic-information-system tool. StreamStats allows users to click on any ungaged site on a river and compute estimates of the seven selected statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged sites also are provided. 
StreamStats also allows users to click on any streamgage in Iowa and estimates computed for these seven selected statistics are provided for the streamgage.
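The drainage-area ratio idea mentioned in the abstract can be illustrated with its simplest, unweighted form: the statistic at the gaged site is scaled by the ratio of drainage areas. This is only a sketch under stated assumptions. The report's weighted method is not specified in the abstract and is not reproduced here; the function name, argument names, and the reuse of the 0.5 to 1.4 ratio range as a guard are choices made for illustration.

```python
def drainage_area_ratio_estimate(gaged_flow, gaged_area, ungaged_area,
                                 ratio_limits=(0.5, 1.4)):
    """Scale a low-flow statistic from a gaged site to an ungaged site
    on the same stream using the plain (unweighted) drainage-area ratio.

    ratio_limits is an assumption: it borrows the range the abstract
    reports for the weighted method and uses it as a sanity check.
    """
    if gaged_area <= 0 or ungaged_area <= 0:
        raise ValueError("drainage areas must be positive")

    ratio = ungaged_area / gaged_area
    low, high = ratio_limits
    if not low <= ratio <= high:
        raise ValueError("drainage-area ratio %.2f is outside %s" % (ratio, ratio_limits))
    return gaged_flow * ratio


# Hypothetical numbers: a statistic of 10.0 at a gage draining 100 square miles,
# transferred to an ungaged site draining 120 square miles.
print(drainage_area_ratio_estimate(10.0, 100.0, 120.0))
```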
NASA Astrophysics Data System (ADS)
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
2015-11-01
The aim of this work is to define reliable susceptibility models for shallow landslides using Logistic Regression and Random Forests multivariate statistical techniques. The study area, located in North-East Sicily, was hit on October 1st 2009 by a severe rainstorm (225 mm of cumulative rainfall in 7 h) which caused flash floods and more than 1000 landslides. Several small villages, such as Giampilieri, were hit with 31 fatalities, 6 missing persons and damage to buildings and transportation infrastructures. Landslides, mainly types such as earth and debris translational slides evolving into debris flows, were triggered on steep slopes and involved colluvium and regolith materials which cover the underlying metamorphic bedrock. The work has been carried out with the following steps: i) realization of a detailed event landslide inventory map through field surveys coupled with observation of high resolution aerial colour orthophoto; ii) identification of landslide source areas; iii) data preparation of landslide controlling factors and descriptive statistics based on a bivariate method (Frequency Ratio) to get an initial overview on existing relationships between causative factors and shallow landslide source areas; iv) choice of criteria for the selection and sizing of the mapping unit; v) implementation of 5 multivariate statistical susceptibility models based on Logistic Regression and Random Forests techniques and focused on landslide source areas; vi) evaluation of the influence of sample size and type of sampling on results and performance of the models; vii) evaluation of the predictive capabilities of the models using ROC curve, AUC and contingency tables; viii) comparison of model results and obtained susceptibility maps; and ix) analysis of temporal variation of landslide susceptibility related to input parameter changes. Models based on Logistic Regression and Random Forests have demonstrated excellent predictive capabilities. 
Land use and wildfire variables were found to have a strong control on the occurrence of very rapid shallow landslides.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yahya, Noorazrul, E-mail: noorazrul.yahya@research.uwa.edu.au; Ebert, Martin A.; Bulsara, Max
Purpose: Given the paucity of available data concerning radiotherapy-induced urinary toxicity, it is important to ensure derivation of the most robust models with superior predictive performance. This work explores multiple statistical-learning strategies for prediction of urinary symptoms following external beam radiotherapy of the prostate. Methods: The performance of logistic regression, elastic-net, support-vector machine, random forest, neural network, and multivariate adaptive regression splines (MARS) to predict urinary symptoms was analyzed using data from 754 participants accrued by TROG03.04-RADAR. Predictive features included dose-surface data, comorbidities, and medication-intake. Four symptoms were analyzed: dysuria, haematuria, incontinence, and frequency, each with three definitions (grade ≥ 1, grade ≥ 2 and longitudinal) with event rate between 2.3% and 76.1%. Repeated cross-validations producing matched models were implemented. A synthetic minority oversampling technique was utilized in endpoints with rare events. Parameter optimization was performed on the training data. Area under the receiver operating characteristic curve (AUROC) was used to compare performance using sample size to detect differences of ≥0.05 at the 95% confidence level. Results: Logistic regression, elastic-net, random forest, MARS, and support-vector machine were the highest-performing statistical-learning strategies in 3, 3, 3, 2, and 1 endpoints, respectively. Logistic regression, MARS, elastic-net, random forest, neural network, and support-vector machine were the best, or were not significantly worse than the best, in 7, 7, 5, 5, 3, and 1 endpoints. The best-performing statistical model was for dysuria grade ≥ 1 with AUROC ± standard deviation of 0.649 ± 0.074 using MARS. For longitudinal frequency and dysuria grade ≥ 1, all strategies produced AUROC>0.6 while all haematuria endpoints and longitudinal incontinence models produced AUROC<0.6. 
Conclusions: Logistic regression and MARS were most likely to be the best-performing strategy for the prediction of urinary symptoms with elastic-net and random forest producing competitive results. The predictive power of the models was modest and endpoint-dependent. New features, including spatial dose maps, may be necessary to achieve better models.
Ohlmacher, G.C.; Davis, J.C.
2003-01-01
Landslides in the hilly terrain along the Kansas and Missouri rivers in northeastern Kansas have caused millions of dollars in property damage during the last decade. To address this problem, a statistical method called multiple logistic regression has been used to create a landslide-hazard map for Atchison, Kansas, and surrounding areas. Data included digitized geology, slopes, and landslides, manipulated using ArcView GIS. Logistic regression relates predictor variables to the occurrence or nonoccurrence of landslides within geographic cells and uses the relationship to produce a map showing the probability of future landslides, given local slopes and geologic units. Results indicated that slope is the most important variable for estimating landslide hazard in the study area. Geologic units consisting mostly of shale, siltstone, and sandstone were most susceptible to landslides. Soil type and aspect ratio were considered but excluded from the final analysis because these variables did not significantly add to the predictive power of the logistic regression. Soil types were highly correlated with the geologic units, and no significant relationships existed between landslides and slope aspect. © 2003 Elsevier Science B.V. All rights reserved.
A Method for Calculating the Probability of Successfully Completing a Rocket Propulsion Ground Test
NASA Technical Reports Server (NTRS)
Messer, Bradley
2007-01-01
Propulsion ground test facilities face the daily challenge of scheduling multiple customers into limited facility space and successfully completing their propulsion test projects. Over the last decade NASA's propulsion test facilities have performed hundreds of tests, collected thousands of seconds of test data, and exceeded the capabilities of numerous test facility and test article components. A logistic regression mathematical modeling technique has been developed to predict the probability of successfully completing a rocket propulsion test. A logistic regression model is a mathematical modeling approach that can be used to describe the relationship of several independent predictor variables X_1, X_2, ..., X_k to a binary or dichotomous dependent variable Y, where Y can only be one of two possible outcomes, in this case Success or Failure of accomplishing a full duration test. The use of logistic regression modeling is not new; however, modeling propulsion ground test facilities using logistic regression is both a new and unique application of the statistical technique. Results from this type of model provide project managers with insight and confidence into the effectiveness of rocket propulsion ground testing.
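The model form described above maps a linear combination of the k predictors to a probability through the logistic function. A minimal sketch follows; the intercept, the coefficients, and the predictor values are hypothetical placeholders, since the abstract does not report the fitted model.

```python
import math


def success_probability(intercept, coefficients, predictors):
    """P(Y = Success | X_1..X_k) under a logistic regression model.

    intercept and coefficients are hypothetical fitted parameters;
    predictors holds the values of X_1..X_k for one planned test.
    """
    if len(coefficients) != len(predictors):
        raise ValueError("one coefficient is needed per predictor")

    linear = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    # Evaluate the logistic function in a form that avoids overflow
    # for large negative or positive linear predictors.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    exp_linear = math.exp(linear)
    return exp_linear / (1.0 + exp_linear)


# Made-up parameters for two predictors, purely for illustration.
print(success_probability(-1.0, [0.8, -0.3], [2.0, 1.0]))
```

Because the output is always strictly between 0 and 1, it can be read directly as an estimated probability of completing a full duration test.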
Kesselmeier, Miriam; Lorenzo Bermejo, Justo
2017-11-01
Logistic regression is the most common technique used for genetic case-control association studies. A disadvantage of standard maximum likelihood estimators of the genotype relative risk (GRR) is their strong dependence on outlier subjects, for example, patients diagnosed at unusually young age. Robust methods are available to constrain outlier influence, but they are scarcely used in genetic studies. This article provides a non-intimidating introduction to robust logistic regression, and investigates its benefits and limitations in genetic association studies. We applied the bounded Huber and extended the R package 'robustbase' with the re-descending Hampel functions to down-weight outlier influence. Computer simulations were carried out to assess the type I error rate, mean squared error (MSE) and statistical power according to major characteristics of the genetic study and investigated markers. Simulations were complemented with the analysis of real data. Both standard and robust estimation controlled type I error rates. Standard logistic regression showed the highest power but standard GRR estimates also showed the largest bias and MSE, in particular for associated rare and recessive variants. For illustration, a recessive variant with a true GRR=6.32 and a minor allele frequency=0.05 investigated in a 1000 case/1000 control study by standard logistic regression resulted in power=0.60 and MSE=16.5. The corresponding figures for Huber-based estimation were power=0.51 and MSE=0.53. Overall, Hampel- and Huber-based GRR estimates did not differ much. Robust logistic regression may represent a valuable alternative to standard maximum likelihood estimation when the focus lies on risk prediction rather than identification of susceptibility variants. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Butler, W.J.; Kalasinski, L.A.
In this paper, a generalized logistic regression model for correlated observations is used to analyze epidemiologic data on the frequency of spontaneous abortion among a group of women office workers. The results are compared to those obtained from the use of the standard logistic regression model that assumes statistical independence among all the pregnancies contributed by one woman. In this example, the correlation among pregnancies from the same woman is fairly small and did not have a substantial impact on the magnitude of estimates of parameters of the model. This is due at least partly to the small average number of pregnancies contributed by each woman.
Disparities in Minority Promotion Rates: A Total Quality Approach
1992-01-01
UCL = p + 3 × √(p(1 − p)/n) data, The statistical theory of logistic regression is beyond the scope of this report. Several computer statistical ... Statistics. Richard D. Irwin, Inc., Homewood IL: 1986. Feagin, J. R., Discrimination American style: Institutional racism and sexism. Englewood Cliffs...current year data and the previous three years. Data for fiscal year One purpose of this project is to provide a statistical 1987, 1988, 1989, 1990, and
Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald
2006-11-01
We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
Intermediate and advanced topics in multilevel logistic regression analysis
Merlo, Juan
2017-01-01
Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher‐level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within‐cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population‐average effect of covariates measured at the subject and cluster level, in contrast to the within‐cluster or cluster‐specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster‐level covariates. We describe the variance partition coefficient and the median odds ratio which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28543517
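Two of the measures named above have simple closed forms for a random-intercept logistic model. The sketch below assumes the commonly used latent-variable formulation, in which the subject-level residual variance is fixed at π²/3, and the usual definition of the median odds ratio based on the 75th percentile of the standard normal distribution. The function names and the example variance are assumptions for illustration and are not taken from the article.

```python
import math
from statistics import NormalDist


def variance_partition_coefficient(cluster_variance):
    """Share of latent-scale outcome variance attributable to clusters.

    Assumes the latent-variable formulation, where the subject-level
    variance of the logistic distribution is pi^2 / 3.
    """
    if cluster_variance < 0:
        raise ValueError("variance cannot be negative")
    return cluster_variance / (cluster_variance + math.pi ** 2 / 3)


def median_odds_ratio(cluster_variance):
    """Median odds ratio between two randomly chosen clusters,
    comparing the higher-risk cluster with the lower-risk one."""
    if cluster_variance < 0:
        raise ValueError("variance cannot be negative")
    z75 = NormalDist().inv_cdf(0.75)  # about 0.6745
    return math.exp(math.sqrt(2 * cluster_variance) * z75)


# Hypothetical random-intercept variance of 0.5 on the log-odds scale.
print(variance_partition_coefficient(0.5))
print(median_odds_ratio(0.5))
```

A cluster variance of zero gives a variance partition coefficient of 0 and a median odds ratio of 1, the values expected when clusters do not differ.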
NASA Technical Reports Server (NTRS)
Smith, Kelly M.; Gay, Robert S.; Stachowiak, Susan J.
2013-01-01
In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.
Learning investment indicators through data extension
NASA Astrophysics Data System (ADS)
Dvořák, Marek
2017-07-01
Stock prices in the form of time series were analysed using univariate and multivariate statistical methods. After simple data preprocessing in the form of logarithmic differences, we augmented this univariate time series to a multivariate representation. This method makes use of sliding windows to calculate several dozen new variables using simple statistical tools such as first and second moments, as well as more complicated statistics such as auto-regression coefficients and residual analysis, followed by an optional quadratic transformation that was further used for data extension. These were used as explanatory variables in a regularized logistic LASSO regression that tried to estimate the Buy-Sell Index (BSI) from real stock market data.
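The preprocessing described above (logarithmic differences followed by sliding-window statistics) can be sketched as follows. This is an illustrative reduction that computes only the window mean and variance. The window width, the function names, and the toy prices are assumptions, and the auto-regression, residual, and quadratic features of the paper are not reproduced.

```python
import math
from statistics import mean, pvariance


def log_differences(prices):
    """Logarithmic differences log(p[t+1] / p[t]) of a price series."""
    return [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]


def window_features(returns, width):
    """Mean and population variance of each sliding window of the series.

    Returns one (mean, variance) pair per window position.
    """
    if width < 1:
        raise ValueError("window width must be at least 1")
    rows = []
    for start in range(len(returns) - width + 1):
        window = returns[start:start + width]
        rows.append((mean(window), pvariance(window)))
    return rows


# Toy price series growing by about 10 percent per step.
prices = [100.0, 110.0, 121.0, 133.1]
returns = log_differences(prices)
for row in window_features(returns, width=2):
    print(row)
```

Each row could then serve as part of the explanatory-variable vector for one time step in a regularized logistic regression.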
The Mantel-Haenszel Procedure Revisited: Models and Generalizations
Fidler, Vaclav; Nagelkerke, Nico
2013-01-01
Several statistical methods have been developed for adjusting the Odds Ratio of the relation between two dichotomous variables X and Y for some confounders Z. With the exception of the Mantel-Haenszel method, commonly used methods, notably binary logistic regression, are not symmetrical in X and Y. The classical Mantel-Haenszel method however only works for confounders with a limited number of discrete strata, which limits its utility, and appears to have no basis in statistical models. Here we revisit the Mantel-Haenszel method and propose an extension to continuous and vector valued Z. The idea is to replace the observed cell entries in strata of the Mantel-Haenszel procedure by subject specific classification probabilities for the four possible values of (X,Y) predicted by a suitable statistical model. For situations where X and Y can be treated symmetrically we propose and explore the multinomial logistic model. Under the homogeneity hypothesis, which states that the odds ratio does not depend on Z, the logarithm of the odds ratio estimator can be expressed as a simple linear combination of three parameters of this model. Methods for testing the homogeneity hypothesis are proposed. The relationship between this method and binary logistic regression is explored. A numerical example using survey data is presented. PMID:23516463
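For reference, the classical Mantel-Haenszel estimator that the article sets out to extend pools stratum-specific 2×2 tables into one common odds ratio. The sketch below implements only that classical stratified estimator, not the proposed model-based extension to continuous Z; the table layout (a, b, c, d) and the example counts are assumptions for illustration.

```python
def mantel_haenszel_odds_ratio(strata):
    """Classical Mantel-Haenszel common odds ratio.

    strata: iterable of 2x2 tables given as (a, b, c, d), where
      a = X=1, Y=1    b = X=1, Y=0
      c = X=0, Y=1    d = X=0, Y=0
    within one stratum of the confounder Z.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        total = a + b + c + d
        if total == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / total
        denominator += b * c / total
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these tables")
    return numerator / denominator


# Two hypothetical strata that share the same stratum-specific odds ratio (2/3).
print(mantel_haenszel_odds_ratio([(10, 20, 30, 40), (20, 40, 60, 80)]))
```

The need for a separate table per stratum is what restricts the classical method to confounders with a limited number of discrete levels.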
Metsemakers, W-J; Handojo, K; Reynders, P; Sermon, A; Vanderschot, P; Nijs, S
2015-04-01
Despite modern advances in the treatment of tibial shaft fractures, complications including nonunion, malunion, and infection remain relatively frequent. A better understanding of these injuries and their complications could lead to prevention rather than treatment strategies. A retrospective study was performed to identify risk factors for deep infection and compromised fracture healing after intramedullary nailing (IMN) of tibial shaft fractures. Between January 2000 and January 2012, 480 consecutive patients with 486 tibial shaft fractures were enrolled in the study. Statistical analysis was performed to determine predictors of deep infection and compromised fracture healing. Compromised fracture healing was subdivided into delayed union and nonunion. The following independent variables were selected for analysis: age, sex, smoking, obesity, diabetes, American Society of Anaesthesiologists (ASA) classification, polytrauma, fracture type, open fractures, Gustilo type, primary external fixation (EF), time to nailing (TTN) and reaming. As primary statistical evaluation we performed a univariate analysis, followed by a multiple logistic regression model. Univariate regression analysis revealed similar risk factors for delayed union and nonunion, including fracture type, open fractures and Gustilo type. Factors affecting the occurrence of deep infection in this model were primary EF, a prolonged TTN, open fractures and Gustilo type. Multiple logistic regression analysis revealed polytrauma as the single risk factor for nonunion. With respect to delayed union, no risk factors could be identified. In the same statistical model, deep infection was correlated with primary EF. The purpose of this study was to evaluate risk factors of poor outcome after IMN of tibial shaft fractures. The univariate regression analysis showed that the nature of complications after tibial shaft nailing could be multifactorial. 
This was not confirmed in a multiple logistic regression model, which only revealed polytrauma and primary EF as risk factors for nonunion and deep infection, respectively. Future strategies should focus on prevention in high-risk populations such as polytrauma patients treated with EF. Copyright © 2014 Elsevier Ltd. All rights reserved.
Detecting Anomalies in Process Control Networks
NASA Astrophysics Data System (ADS)
Rrushi, Julian; Kang, Kyoung-Don
This paper presents the estimation-inspection algorithm, a statistical algorithm for anomaly detection in process control networks. The algorithm determines if the payload of a network packet that is about to be processed by a control system is normal or abnormal based on the effect that the packet will have on a variable stored in control system memory. The estimation part of the algorithm uses logistic regression integrated with maximum likelihood estimation in an inductive machine learning process to estimate a series of statistical parameters; these parameters are used in conjunction with logistic regression formulas to form a probability mass function for each variable stored in control system memory. The inspection part of the algorithm uses the probability mass functions to estimate the normalcy probability of a specific value that a network packet writes to a variable. Experimental results demonstrate that the algorithm is very effective at detecting anomalies in process control networks.
Disconcordance in Statistical Models of Bisphenol A and Chronic Disease Outcomes in NHANES 2003-08
Casey, Martin F.; Neidell, Matthew
2013-01-01
Background: Bisphenol A (BPA), a high production chemical commonly found in plastics, has drawn great attention from researchers due to the substance’s potential toxicity. Using data from three National Health and Nutrition Examination Survey (NHANES) cycles, we explored the consistency and robustness of BPA’s reported effects on coronary heart disease and diabetes. Methods and Findings: We report the use of three different statistical models in the analysis of BPA: (1) logistic regression, (2) log-linear regression, and (3) dose-response logistic regression. In each variation, confounders were added in six blocks to account for demographics, urinary creatinine, source of BPA exposure, healthy behaviours, and phthalate exposure. Results were sensitive to the variations in functional form of our statistical models, but no single model yielded consistent results across NHANES cycles. Reported ORs were also found to be sensitive to inclusion/exclusion criteria. Further, observed effects, which were most pronounced in NHANES 2003-04, could not be explained away by confounding. Conclusions: Limitations in the NHANES data and a poor understanding of the mode of action of BPA have made it difficult to develop informative statistical models. Given the sensitivity of effect estimates to functional form, researchers should report results using multiple specifications with different assumptions about BPA measurement, thus allowing for the identification of potential discrepancies in the data. PMID:24223205
WebGLORE: a web service for Grid LOgistic REgression.
Jiang, Wenchao; Li, Pinghao; Wang, Shuang; Wu, Yuan; Xue, Meng; Ohno-Machado, Lucila; Jiang, Xiaoqian
2013-12-15
WebGLORE is a free web service that enables privacy-preserving construction of a global logistic regression model from distributed datasets that are sensitive. It only transfers aggregated local statistics (from participants) through Hypertext Transfer Protocol Secure to a trusted server, where the global model is synthesized. WebGLORE seamlessly integrates AJAX, JAVA Applet/Servlet and PHP technologies to provide an easy-to-use web service for biomedical researchers to break down policy barriers during information exchange. http://dbmi-engine.ucsd.edu/webglore3/. WebGLORE can be used under the terms of GNU general public license as published by the Free Software Foundation.
Chan, Siew Foong; Deeks, Jonathan J; Macaskill, Petra; Irwig, Les
2008-01-01
To compare three predictive models based on logistic regression to estimate adjusted likelihood ratios allowing for interdependency between diagnostic variables (tests). This study was a review of the theoretical basis, assumptions, and limitations of published models; and a statistical extension of methods and application to a case study of the diagnosis of obstructive airways disease based on history and clinical examination. Albert's method includes an offset term to estimate an adjusted likelihood ratio for combinations of tests. The Spiegelhalter and Knill-Jones method uses the unadjusted likelihood ratio for each test as a predictor and computes shrinkage factors to allow for interdependence. Knottnerus' method differs from the other methods because it requires sequencing of tests, which limits its application to situations where there are few tests and substantial data. Although parameter estimates differed between the models, predicted "posttest" probabilities were generally similar. Construction of predictive models using logistic regression is preferred to the independence Bayes' approach when it is important to adjust for dependency of test errors. Methods to estimate adjusted likelihood ratios from predictive models should be considered in preference to a standard logistic regression model to facilitate ease of interpretation and application. Albert's method provides the most straightforward approach.
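The independence Bayes approach that this abstract contrasts with the logistic-regression methods can be sketched in a few lines. This is only an illustration of the unadjusted calculation (post-test odds = pre-test odds × likelihood ratio), not of Albert's, Spiegelhalter and Knill-Jones' or Knottnerus' adjusted methods; the function names and the example numbers are hypothetical.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must lie strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


def combined_lr_independence(likelihood_ratios):
    """Independence Bayes: multiply unadjusted likelihood ratios.

    This treats the tests as conditionally independent, which is exactly
    the assumption the adjusted methods are designed to relax.
    """
    product = 1.0
    for lr in likelihood_ratios:
        product *= lr
    return product


# Hypothetical example: two positive findings with LR 2.0 and 3.0,
# pre-test probability 0.5.
combined = combined_lr_independence([2.0, 3.0])
probability = posttest_probability(0.5, combined)
```

When test errors are correlated, multiplying unadjusted ratios in this way tends to overstate the evidence, which is the motivation for the adjusted likelihood ratios discussed above.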
Cameron, Isobel M; Scott, Neil W; Adler, Mats; Reid, Ian C
2014-12-01
It is important for clinical practice and research that measurement scales of well-being and quality of life exhibit only minimal differential item functioning (DIF). DIF occurs where different groups of people endorse items in a scale to different extents after being matched by the intended scale attribute. We investigate the equivalence or otherwise of common methods of assessing DIF. Three methods of measuring age- and sex-related DIF (ordinal logistic regression, Rasch analysis and Mantel χ² procedure) were applied to Hospital Anxiety Depression Scale (HADS) data pertaining to a sample of 1,068 patients consulting primary care practitioners. Three items were flagged by all three approaches as having either age- or sex-related DIF with a consistent direction of effect; a further three items identified did not meet stricter criteria for important DIF using at least one method. When applying strict criteria for significant DIF, ordinal logistic regression was slightly less sensitive. Ordinal logistic regression, Rasch analysis and contingency table methods yielded consistent results when identifying DIF in the HADS depression and HADS anxiety scales. Regardless of methods applied, investigators should use a combination of statistical significance, magnitude of the DIF effect and investigator judgement when interpreting the results.
Ordinal logistic regression analysis on the nutritional status of children in KarangKitri village
NASA Astrophysics Data System (ADS)
Ohyver, Margaretha; Yongharto, Kimmy Octavian
2015-09-01
Ordinal logistic regression is a statistical technique that can be used to describe the relationship between an ordinal response variable and one or more independent variables. This method has been used in various fields, including the health field. In this research, ordinal logistic regression is used to describe the relationship between the nutritional status of children and age, gender, height, and family status. Nutritional status of children in this research is divided into over nutrition, well nutrition, less nutrition, and malnutrition. The purpose of this research is to describe the characteristics of children in KarangKitri village and to determine the factors that influence their nutritional status. Three findings were obtained from this research. First, there are still children who are not categorized as having well nutritional status. Second, there are children from families of sufficient economic level who are not of normal status. Third, the factors that affect the nutritional level of children are age, family status, and height.
O'Dwyer, Jean; Morris Downes, Margaret; Adley, Catherine C
2016-02-01
This study analyses the relationship between meteorological phenomena and outbreaks of waterborne-transmitted vero cytotoxin-producing Escherichia coli (VTEC) in the Republic of Ireland over an 8-year period (2005-2012). Data pertaining to the notification of waterborne VTEC outbreaks were extracted from the Computerised Infectious Disease Reporting system, which is administered through the national Health Protection Surveillance Centre as part of the Health Service Executive. Rainfall and temperature data were obtained from the national meteorological office and categorised as cumulative rainfall, heavy rainfall events in the previous 7 days, and mean temperature. Regression analysis was performed using logistic regression (LR) analysis. The LR model was significant (p < 0.001), with all independent variables: cumulative rainfall, heavy rainfall and mean temperature making a statistically significant contribution to the model. The study has found that rainfall, particularly heavy rainfall in the preceding 7 days of an outbreak, is a strong statistical indicator of a waterborne outbreak and that temperature also impacts waterborne VTEC outbreak occurrence.
Bayesian logistic regression in detection of gene-steroid interaction for cancer at PDLIM5 locus.
Wang, Ke-Sheng; Owusu, Daniel; Pan, Yue; Xie, Changchun
2016-06-01
The PDZ and LIM domain 5 (PDLIM5) gene may play a role in cancer, bipolar disorder, major depression, alcohol dependence and schizophrenia; however, little is known about the interaction effect of steroid and PDLIM5 gene on cancer. This study examined 47 single-nucleotide polymorphisms (SNPs) within the PDLIM5 gene in the Marshfield sample with 716 cancer patients (any diagnosed cancer, excluding minor skin cancer) and 2848 noncancer controls. Multiple logistic regression model in PLINK software was used to examine the association of each SNP with cancer. Bayesian logistic regression in PROC GENMOD in SAS statistical software, ver. 9.4 was used to detect gene-steroid interactions influencing cancer. Single marker analysis using PLINK identified 12 SNPs associated with cancer (P < 0.05); in particular, SNP rs6532496 showed the strongest association with cancer (P = 6.84 × 10⁻³), while the next best signal was rs951613 (P = 7.46 × 10⁻³). Classic logistic regression in PROC GENMOD showed that both rs6532496 and rs951613 revealed strong gene-steroid interaction effects (OR=2.18, 95% CI=1.31-3.63 with P = 2.9 × 10⁻³ for rs6532496 and OR=2.07, 95% CI=1.24-3.45 with P = 5.43 × 10⁻³ for rs951613, respectively). Results from Bayesian logistic regression showed stronger interaction effects (OR=2.26, 95% CI=1.2-3.38 for rs6532496 and OR=2.14, 95% CI=1.14-3.2 for rs951613, respectively). All the 12 SNPs associated with cancer revealed significant gene-steroid interaction effects (P < 0.05), whereas 13 SNPs showed gene-steroid interaction effects without main effect on cancer. SNP rs4634230 revealed the strongest gene-steroid interaction effect (OR=2.49, 95% CI=1.5-4.13 with P = 4.0 × 10⁻⁴ based on the classic logistic regression and OR=2.59, 95% CI=1.4-3.97 from Bayesian logistic regression, respectively). This study provides evidence of common genetic variants within the PDLIM5 gene and interactions between PDLIM5 gene polymorphisms and steroid use influencing cancer.
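The odds ratios and intervals reported for the classic logistic regression can be related to the underlying coefficients with a short sketch. It assumes the intervals are Wald intervals on the log-odds scale, exp(β ± 1.96·SE), which the abstract does not state; the function names are hypothetical, and the calculation does not apply to the Bayesian intervals.

```python
import math


def odds_ratio_with_ci(beta: float, standard_error: float, z: float = 1.96):
    """Return (OR, lower, upper) for a logistic regression coefficient."""
    return (
        math.exp(beta),
        math.exp(beta - z * standard_error),
        math.exp(beta + z * standard_error),
    )


def standard_error_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Recover the log-odds standard error from a reported Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


# Reported classic-logistic-regression result for rs6532496:
# OR = 2.18, 95% CI = 1.31-3.63.
se = standard_error_from_ci(1.31, 3.63)
or_hat, ci_lower, ci_upper = odds_ratio_with_ci(math.log(2.18), se)
```

Under that assumption the reported interval is symmetric around ln(2.18) on the log scale, so the round trip reproduces the published bounds to within rounding.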
NASA Astrophysics Data System (ADS)
Yilmaz, Işık
2009-06-01
The purpose of this study is to compare the landslide susceptibility mapping methods of frequency ratio (FR), logistic regression and artificial neural networks (ANN) applied in Kat County (Tokat, Turkey). A digital elevation model (DEM) was first constructed using GIS software. Landslide-related factors such as geology, faults, drainage system, topographical elevation, slope angle, slope aspect, topographic wetness index (TWI) and stream power index (SPI) were used in the landslide susceptibility analyses. Landslide susceptibility maps were produced from the frequency ratio, logistic regression and neural network models, and they were then compared by means of their validations. High accuracies of the susceptibility maps for all three models were obtained from the comparison of the landslide susceptibility maps with the known landslide locations. The respective area under curve (AUC) values of 0.826, 0.842 and 0.852 for frequency ratio, logistic regression and artificial neural networks showed that the map obtained from the ANN model is more accurate than those from the other models, although the accuracies of all models can be regarded as relatively similar. The results obtained in this study also showed that the frequency ratio model can be used as a simple tool in the assessment of landslide susceptibility when a sufficient number of data are obtained. Input process, calculations and output process are very simple and can be readily understood in the frequency ratio model, whereas logistic regression and neural networks require the conversion of data to ASCII or other formats. Moreover, it is also very hard to process the large amount of data in the statistical package.
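The AUC values used to compare the three susceptibility maps can be understood through a small sketch. It computes the AUC as the probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen non-landslide cell (the Mann-Whitney formulation). The function name and the toy scores are hypothetical and are not data from the study.

```python
def auc_from_scores(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 1/2."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy susceptibility scores for four cells; label 1 marks a known landslide.
toy_auc = auc_from_scores([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise loop is quadratic in the number of cells, so for full raster maps a rank-based implementation would be the practical choice; the definition is the same.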
Wright, David K.; MacEachern, Scott; Lee, Jaeyong
2014-01-01
The locations of diy-geδ-bay (DGB) sites in the Mandara Mountains, northern Cameroon are hypothesized to occur as a function of their ability to see and be seen from points on the surrounding landscape. A series of geostatistical, two-way and Bayesian logistic regression analyses were performed to test two hypotheses related to the intervisibility of the sites to one another and their visual prominence on the landscape. We determine that the intervisibility of the sites to one another is highly statistically significant when compared to 10 stratified-random permutations of DGB sites. Bayesian logistic regression additionally demonstrates that the visibility of the sites to points on the surrounding landscape is statistically significant. The location of sites appears to have also been selected on the basis of lower slope than random permutations of sites. Using statistical measures, many of which are not commonly employed in archaeological research, to evaluate aspects of visibility on the landscape, we conclude that the placement of DGB sites improved their conspicuousness for enhanced ritual, social cooperation and/or competition purposes. PMID:25383883
NASA Astrophysics Data System (ADS)
Varouchakis, Emmanouil; Kourgialas, Nektarios; Karatzas, George; Giannakis, Georgios; Lilli, Maria; Nikolaidis, Nikolaos
2014-05-01
Riverbank erosion affects the river morphology and the local habitat and results in riparian land loss, damage to property and infrastructures, ultimately weakening flood defences. An important issue concerning riverbank erosion is the identification of the areas vulnerable to erosion, as it allows for predicting changes and assists with stream management and restoration. One way to predict the areas vulnerable to erosion is to determine the erosion probability by identifying the underlying relations between riverbank erosion and the geomorphological and/or hydrological variables that prevent or stimulate erosion. A statistical model for evaluating the probability of erosion based on a series of independent local variables and by using logistic regression is developed in this work. The main variables affecting erosion are vegetation index (stability), the presence or absence of meanders, bank material (classification), stream power, bank height, river bank slope, riverbed slope, cross section width and water velocities (Luppi et al. 2009). In statistics, logistic regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable, e.g. binary response, based on one or more predictor variables (continuous or categorical). The probabilities of the possible outcomes are modelled as a function of independent variables using a logistic function. Logistic regression measures the relationship between a categorical dependent variable and, usually, one or several continuous independent variables by converting the dependent variable to probability scores. Then, a logistic regression is formed, which predicts success or failure of a given binary variable (e.g. 1 = "presence of erosion" and 0 = "no erosion") for any value of the independent variables. The regression coefficients are estimated by using maximum likelihood estimation.
The erosion occurrence probability can be calculated in conjunction with the model deviance regarding the independent variables tested (Atkinson et al. 2003). The developed statistical model is applied to the Koiliaris River Basin in the island of Crete, Greece. The aim is to determine the probability of erosion along the Koiliaris' riverbanks considering a series of independent geomorphological and/or hydrological variables. Data for the river bank slope and for the river cross section width are available at ten locations along the river. The riverbank has indications of erosion at six of the ten locations while four have remained stable. Based on a recent work, measurements for the two independent variables and data regarding bank stability are available at eight different locations along the river. These locations were used as validation points for the proposed statistical model. The results show a very close agreement between the observed erosion indications and the statistical model as the probability of erosion was accurately predicted at seven out of the eight locations. The next step is to apply the model at more locations along the riverbanks. In November 2013, stakes were inserted at selected locations in order to be able to identify the presence or absence of erosion after the winter period. In April 2014 the presence or absence of erosion will be identified and the model results will be compared to the field data. Our intent is to extend the model by increasing the number of independent variables in order to identify the key factors favouring erosion along the Koiliaris River. We aim at developing an easy to use statistical tool that will provide a quantified measure of the erosion probability along the riverbanks, which could consequently be used to prevent erosion and flooding events.
References: Atkinson, P. M., German, S. E., Sear, D. A. and Clark, M. J. 2003. Exploring the relations between riverbank erosion and geomorphological controls using geographically weighted logistic regression. Geographical Analysis, 35 (1), 58-82. Luppi, L., Rinaldi, M., Teruggi, L. B., Darby, S. E. and Nardi, L. 2009. Monitoring and numerical modelling of riverbank erosion processes: A case study along the Cecina River (central Italy). Earth Surface Processes and Landforms, 34 (4), 530-546.
Acknowledgements: This work is part of an on-going THALES project (CYBERSENSORS - High Frequency Monitoring System for Integrated Water Resources Management of Rivers). The project has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES. Investing in knowledge society through the European Social Fund.
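The modelling step described in this abstract (a binary erosion outcome, two predictors, coefficients estimated by maximum likelihood) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the ten rows are invented, standardized values standing in for bank slope and cross-section width, and plain gradient ascent on the log-likelihood is used in place of whatever optimiser the study applied.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(weights, row) -> float:
    """Probability of erosion for one row (row[0] is the intercept term 1.0)."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)))


def log_likelihood(weights, rows, outcomes) -> float:
    """Bernoulli log-likelihood of the observed outcomes."""
    total = 0.0
    for row, y in zip(rows, outcomes):
        p = min(max(predict(weights, row), 1e-12), 1.0 - 1e-12)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logistic(rows, outcomes, learning_rate=0.1, iterations=5000):
    """Approximate the maximum likelihood estimate by gradient ascent."""
    weights = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(iterations):
        gradient = [0.0] * len(weights)
        for row, y in zip(rows, outcomes):
            error = y - predict(weights, row)
            for j, x in enumerate(row):
                gradient[j] += error * x
        weights = [w + learning_rate * g / n for w, g in zip(weights, gradient)]
    return weights


# Hypothetical data: [intercept, standardized slope, standardized width].
rows = [
    [1.0, 1.2, -0.5], [1.0, 0.8, 0.3], [1.0, 0.5, -1.0], [1.0, -0.2, -0.8],
    [1.0, 0.9, 0.6], [1.0, -0.9, 0.5], [1.0, -1.0, 0.9], [1.0, -0.7, 1.1],
    [1.0, 0.4, 0.7], [1.0, -1.3, -0.1],
]
outcomes = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # 1 = erosion, 0 = stable

weights = fit_logistic(rows, outcomes)
erosion_probabilities = [predict(weights, row) for row in rows]
```

With only ten locations the estimates are very uncertain, which is consistent with the abstract's stated plan to add locations and variables before relying on the model.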
Ye, Jiang-Feng; Zhao, Yu-Xin; Ju, Jian; Wang, Wei
2017-10-01
To discuss the value of the Bedside Index for Severity in Acute Pancreatitis (BISAP), Modified Early Warning Score (MEWS), serum calcium (Ca2+), and red cell distribution width (RDW) for predicting the severity grade of acute pancreatitis (AP) and to develop and verify a more accurate scoring system to predict the severity of AP. In 302 patients with AP, we calculated BISAP and MEWS scores and conducted regression analyses on the relationships of BISAP scoring, RDW, MEWS, and serum Ca2+ with the severity of AP using single-factor logistic regression. The variables with statistical significance in the single-factor logistic regression were used in a multi-factor logistic regression model; forward stepwise regression was used to screen variables and build a multi-factor prediction model. A receiver operating characteristic curve (ROC curve) was constructed, and the performance of the multi- and single-factor prediction models in predicting the severity of AP was evaluated using the area under the ROC curve (AUC). The internal validity of the model was verified through bootstrapping. Among 302 patients with AP, 209 had mild acute pancreatitis (MAP) and 93 had severe acute pancreatitis (SAP). According to single-factor logistic regression analysis, we found that BISAP, MEWS and serum Ca2+ are prediction indexes of the severity of AP (P-value<0.001), whereas RDW is not a prediction index of AP severity (P-value>0.05). The multi-factor logistic regression analysis showed that BISAP and serum Ca2+ are independent prediction indexes of AP severity (P-value<0.001), and MEWS is not an independent prediction index of AP severity (P-value>0.05); BISAP is negatively related to serum Ca2+ (r=-0.330, P-value<0.001). The constructed model is as follows: ln(p/(1-p)) = 7.306 + 1.151*BISAP - 4.516*serum Ca2+, where p is the predicted probability of SAP. The predictive ability of each model for SAP follows the order of the combined BISAP and serum Ca2+ prediction model>Ca2+>BISAP.
The difference in predictive ability between BISAP and serum Ca2+ is not statistically significant (P-value>0.05); however, the difference in predictive ability between the newly built prediction model and BISAP or serum Ca2+ individually is highly significant (P-value<0.01). Verification of the internal validity of the models by bootstrapping is favorable. BISAP and serum Ca2+ have high predictive value for the severity of AP. However, the model built by combining BISAP and serum Ca2+ is remarkably superior to those of BISAP and serum Ca2+ individually. Furthermore, this model is simple, practical and appropriate for clinical use. Copyright © 2016. Published by Elsevier Masson SAS.
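The reported coefficients can be turned into a predicted probability with a short sketch. It rests on two assumptions that the abstract does not confirm: that the left-hand side of the published model is the log-odds of SAP, and that serum Ca2+ is entered in the same units the authors used (the unit is not stated). The function name is hypothetical and the sketch is not a validated clinical tool.

```python
import math

# Coefficients as printed in the abstract.
INTERCEPT = 7.306
BISAP_COEF = 1.151
CALCIUM_COEF = -4.516


def sap_probability(bisap_score: float, serum_calcium: float) -> float:
    """Predicted probability of SAP, assuming the model is a logit."""
    logit = INTERCEPT + BISAP_COEF * bisap_score + CALCIUM_COEF * serum_calcium
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical inputs, chosen only to show the direction of each effect.
higher_bisap = sap_probability(3, 2.0)
lower_bisap = sap_probability(1, 2.0)
lower_calcium = sap_probability(2, 1.8)
higher_calcium = sap_probability(2, 2.3)
```

The signs of the coefficients mean that a higher BISAP score raises the predicted probability and a higher serum Ca2+ lowers it, in line with the negative correlation the abstract reports between the two.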
Nagelkerke, Nico; Fidler, Vaclav
2015-01-01
The problem of discrimination and classification is central to much of epidemiology. Here we consider the estimation of a logistic regression/discrimination function from training samples, when one of the training samples is subject to misclassification or mislabeling, e.g. diseased individuals are incorrectly classified/labeled as healthy controls. We show that this leads to a zero-inflated binomial model with a defective logistic regression or discrimination function, whose parameters can be estimated using standard statistical methods such as maximum likelihood. These parameters can be used to estimate the probability of true group membership among those, possibly erroneously, classified as controls. Two examples are analyzed and discussed. A simulation study explores properties of the maximum likelihood parameter estimates and the estimates of the number of mislabeled observations.
Calibrating random forests for probability estimation.
Dankowski, Theresa; Ziegler, Andreas
2016-09-30
Probabilities can be consistently estimated using random forests. It is, however, unclear how random forests should be updated to make predictions for other centers or at different time points. In this work, we present two approaches for updating random forests for probability estimation. The first method has been proposed by Elkan and may be used for updating any machine learning approach yielding consistent probabilities, so-called probability machines. The second approach is a new strategy specifically developed for random forests. Using the terminal nodes, which represent conditional probabilities, the random forest is first translated to logistic regression models. These are, in turn, used for re-calibration. The two updating strategies were compared in a simulation study and are illustrated with data from the German Stroke Study Collaboration. In most simulation scenarios, both methods led to similar improvements. In the simulation scenario in which the stricter assumptions of Elkan's method were not met, the logistic regression-based re-calibration approach for random forests outperformed Elkan's method. It also performed better on the stroke data than Elkan's method. The strength of Elkan's method is its general applicability to any probability machine. However, if the strict assumptions underlying this approach are not met, the logistic regression-based approach is preferable for updating random forests for probability estimation. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Early warnings for suicide attempt among Chinese rural population.
Lyu, Juncheng; Wang, Yingying; Shi, Hong; Zhang, Jie
2018-06-05
This study aimed to explore the main influencing factors of attempted suicide and to establish an early warning model, so as to put forward prevention strategies for attempted suicide. Data came from a large-scale case-control epidemiological survey. A sample of 659 serious suicide attempters was randomly recruited from 13 rural counties in China. Each case was matched with a community control on gender, age, and residence location. Face-to-face interviews were conducted for all the cases and controls with the same structured questionnaire. Univariate logistic regression was applied to screen the factors and multivariate logistic regression was used to identify the predictors. There were no statistical differences between suicide attempters and the community controls in gender, age, and residence location. The Cronbach's coefficients for all the scales used were above 0.675. The multivariate logistic regression revealed 12 statistically significant variables predicting attempted suicide, including less education, family history of suicide, poor health, mental problem, aspiration strain, hopelessness, impulsivity, depression, and negative life events. On the other hand, social support, coping skills, and healthy community protected the rural residents from suicide attempt. The identified warning predictors have significant clinical meaning for clinical psychiatrists. Crisis intervention strategies in rural China should be informed by the findings from this research. Education, social support, healthy community, and strain reduction are all measures to decrease the likelihood of crises. Copyright © 2018. Published by Elsevier B.V.
Semiparametric time varying coefficient model for matched case-crossover studies.
Ortega-Villa, Ana Maria; Kim, Inyoung; Kim, H
2017-03-15
In matched case-crossover studies, it is generally accepted that the covariates on which a case and associated controls are matched cannot exert a confounding effect on independent predictors included in the conditional logistic regression model. This is because any stratum effect is removed by the conditioning on the fixed number of sets of the case and controls in the stratum. Hence, the conditional logistic regression model is not able to detect any effects associated with the matching covariates by stratum. However, some matching covariates such as time often play an important role as an effect modification leading to incorrect statistical estimation and prediction. Therefore, we propose three approaches to evaluate effect modification by time. The first is a parametric approach, the second is a semiparametric penalized approach, and the third is a semiparametric Bayesian approach. Our parametric approach is a two-stage method, which uses conditional logistic regression in the first stage and then estimates polynomial regression in the second stage. Our semiparametric penalized and Bayesian approaches are one-stage approaches developed by using regression splines. Our semiparametric one-stage approaches allow us to not only detect the parametric relationship between the predictor and binary outcomes, but also evaluate nonparametric relationships between the predictor and time. We demonstrate the advantage of our semiparametric one-stage approaches using both a simulation study and an epidemiological example of a 1-4 bi-directional case-crossover study of childhood aseptic meningitis with drinking water turbidity. We also provide statistical inference for the semiparametric Bayesian approach using Bayes Factors. Copyright © 2016 John Wiley & Sons, Ltd.
NASA Technical Reports Server (NTRS)
Generazio, Edward R.
2014-01-01
Unknown risks are introduced into failure critical systems when probability of detection (POD) capabilities are accepted without a complete understanding of the statistical method applied and the interpretation of the statistical results. The presence of this risk in the nondestructive evaluation (NDE) community is revealed in common statements about POD. These statements are often interpreted in a variety of ways and therefore, the very existence of the statements identifies the need for a more comprehensive understanding of POD methodologies. Statistical methodologies have data requirements to be met, procedures to be followed, and requirements for validation or demonstration of adequacy of the POD estimates. Risks are further enhanced due to the wide range of statistical methodologies used for determining the POD capability. Receiver/Relative Operating Characteristics (ROC) Display, simple binomial, logistic regression, and Bayes' rule POD methodologies are widely used in determining POD capability. This work focuses on Hit-Miss data to reveal the framework of the interrelationships between Receiver/Relative Operating Characteristics Display, simple binomial, logistic regression, and Bayes' Rule methodologies as they are applied to POD. Knowledge of these interrelationships leads to an intuitive and global understanding of the statistical data, procedural and validation requirements for establishing credible POD estimates.
Ai, Zi-Sheng; Gao, You-Shui; Sun, Yuan; Liu, Yue; Zhang, Chang-Qing; Jiang, Cheng-Hua
2013-03-01
Risk factors for femoral neck fracture-induced avascular necrosis of the femoral head have not been elucidated clearly in middle-aged and elderly patients. Moreover, the high incidence of screw removal in China and its effect on the fate of the involved femoral head require statistical methods to reflect their intrinsic relationship. Ninety-nine patients older than 45 years with femoral neck fracture were treated by internal fixation between May 1999 and April 2004. Descriptive analysis, interaction analysis between associated factors, single factor logistic regression, multivariate logistic regression, and detailed interaction analysis were employed to explore potential relationships among associated factors. Avascular necrosis of the femoral head was found in 15 cases (15.2 %). Age × the status of implants (removal vs. maintenance) and gender × the timing of reduction were interactive according to two-factor interactive analysis. Age, the displacement of fractures, the quality of reduction, and the status of implants were found to be significant factors in single factor logistic regression analysis. Age, age × the status of implants, and the quality of reduction were found to be significant factors in multivariate logistic regression analysis. In fine interaction analysis after multivariate logistic regression analysis, implant removal was the most important risk factor for avascular necrosis in 56-to-85-year-old patients, with a risk ratio of 26.00 (95 % CI = 3.076-219.747). The middle-aged and elderly have less incidence of avascular necrosis of the femoral head following femoral neck fractures treated by cannulated screws. The removal of cannulated screws can induce a significantly high incidence of avascular necrosis of the femoral head in elderly patients, while a high-quality reduction is helpful to reduce avascular necrosis.
Zhou, Jinzhe; Zhou, Yanbing; Cao, Shougen; Li, Shikuan; Wang, Hao; Niu, Zhaojian; Chen, Dong; Wang, Dongsheng; Lv, Liang; Zhang, Jian; Li, Yu; Jiao, Xuelong; Tan, Xiaojie; Zhang, Jianli; Wang, Haibo; Zhang, Bingyuan; Lu, Yun; Sun, Zhenqing
2016-01-01
Reports of surgical complications are common, but few provide information about severity or estimate risk factors for complications, and those that do lack specificity. We retrospectively analyzed data on 2795 gastric cancer patients who underwent a surgical procedure at the Affiliated Hospital of Qingdao University between June 2007 and June 2012, and established a multivariate logistic regression model to predict risk factors related to postoperative complications according to the Clavien-Dindo classification system. Twenty-four of 86 variables were identified as statistically significant in univariate logistic regression analysis; 11 significant variables that entered the multivariate analysis were used to produce the risk model. Liver cirrhosis, diabetes mellitus, Child classification, invasion of neighboring organs, combined resection, intraoperative transfusion, Billroth II anastomosis of reconstruction, malnutrition, surgical volume of surgeons, operating time and age were independent risk factors for postoperative complications after gastrectomy. Based on the logistic regression equation p = exp(ΣBiXi) / (1 + exp(ΣBiXi)), a multivariate logistic regression predictive model that calculates the risk of postoperative morbidity was developed: p = 1 / (1 + exp(4.810 - 1.287X1 - 0.504X2 - 0.500X3 - 0.474X4 - 0.405X5 - 0.318X6 - 0.316X7 - 0.305X8 - 0.278X9 - 0.255X10 - 0.138X11)). The accuracy, sensitivity and specificity of the model to predict the postoperative complications were 86.7%, 76.2% and 88.6%, respectively. This risk model, based on the Clavien-Dindo grading system for severity of complications and on logistic regression analysis, can predict severe morbidity specific to an individual patient's risk factors, can estimate patients' risks and benefits of gastric surgery as an accurate decision-making tool, and may serve as a template for the development of risk models for other surgical groups.
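The two expressions for p given in this abstract are the same logistic function written in two ways. The short derivation below makes the link explicit; the symbol η for the linear predictor is introduced here for illustration, and the intercept is treated as a term B0 with X0 = 1, which the abstract implies but does not state.

```latex
\[
p \;=\; \frac{e^{\eta}}{1 + e^{\eta}}
  \;=\; \frac{1}{1 + e^{-\eta}},
\qquad
\eta \;=\; \sum_i B_i X_i .
\]
\[
\begin{aligned}
\eta \;=\; {} & -4.810 + 1.287X_1 + 0.504X_2 + 0.500X_3 + 0.474X_4 + 0.405X_5 \\
              & + 0.318X_6 + 0.316X_7 + 0.305X_8 + 0.278X_9 + 0.255X_{10} + 0.138X_{11}
\end{aligned}
\]
```

Dividing the numerator and denominator of the first form by e^η gives the second, so the exponent in the published model is −η, which is why every coefficient appears there with its sign reversed.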
A Powerful Test for Comparing Multiple Regression Functions.
Maity, Arnab
2012-09-01
In this article, we address the important problem of comparison of two or more population regression functions. Recently, Pardo-Fernández, Van Keilegom and González-Manteiga (2007) developed test statistics for simple nonparametric regression models: Y_ij = θ_j(Z_ij) + σ_j(Z_ij) ε_ij, based on empirical distributions of the errors in each population j = 1, … , J. In this paper, we propose a test for equality of the θ_j(·) based on the concept of generalized likelihood ratio type statistics. We also generalize our test for other nonparametric regression setups, e.g., nonparametric logistic regression, where the loglikelihood for population j is any general smooth function [Formula: see text]. We describe a resampling procedure to obtain the critical values of the test. In addition, we present a simulation study to evaluate the performance of the proposed test and compare our results to those in Pardo-Fernández et al. (2007).
WebGLORE: a Web service for Grid LOgistic REgression
Jiang, Wenchao; Li, Pinghao; Wang, Shuang; Wu, Yuan; Xue, Meng; Ohno-Machado, Lucila; Jiang, Xiaoqian
2013-01-01
WebGLORE is a free web service that enables privacy-preserving construction of a global logistic regression model from distributed datasets that are sensitive. It only transfers aggregated local statistics (from participants) through Hypertext Transfer Protocol Secure to a trusted server, where the global model is synthesized. WebGLORE seamlessly integrates AJAX, JAVA Applet/Servlet and PHP technologies to provide an easy-to-use web service for biomedical researchers to break down policy barriers during information exchange. Availability and implementation: http://dbmi-engine.ucsd.edu/webglore3/. WebGLORE can be used under the terms of GNU general public license as published by the Free Software Foundation. Contact: x1jiang@ucsd.edu PMID:24072732
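The idea of building a global model from aggregated local statistics can be sketched as follows. This is an illustration under assumptions, not WebGLORE's code: it assumes a Newton-Raphson fit with an intercept and one predictor, in which each site shares only its summed gradient and information matrix. All function names and data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def local_statistics(rows, beta):
    """Computed at one site: only these aggregates leave the site."""
    b0, b1 = beta
    grad = [0.0, 0.0]
    info = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in rows:
        p = sigmoid(b0 + b1 * x)
        w = p * (1.0 - p)
        grad[0] += y - p
        grad[1] += (y - p) * x
        info[0][0] += w
        info[0][1] += w * x
        info[1][1] += w * x * x
    info[1][0] = info[0][1]
    return grad, info


def global_fit(sites, iterations=25):
    """Computed at the server: sum the site aggregates, take Newton steps."""
    beta = [0.0, 0.0]
    for _ in range(iterations):
        g = [0.0, 0.0]
        h = [[0.0, 0.0], [0.0, 0.0]]
        for rows in sites:
            gs, hs = local_statistics(rows, beta)
            for i in range(2):
                g[i] += gs[i]
                for j in range(2):
                    h[i][j] += hs[i][j]
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        # beta <- beta + H^{-1} g, using the explicit 2x2 inverse
        beta = [beta[0] + (h[1][1] * g[0] - h[0][1] * g[1]) / det,
                beta[1] + (h[0][0] * g[1] - h[1][0] * g[0]) / det]
    return beta


# Hypothetical (x, y) records held at two separate sites.
site_a = [(0.5, 0), (1.0, 0), (1.5, 1), (2.0, 0)]
site_b = [(2.5, 1), (3.0, 0), (3.5, 1), (4.0, 1)]
beta_distributed = global_fit([site_a, site_b])
beta_pooled = global_fit([site_a + site_b])
```

Because the gradient and information matrix are sums over records, the distributed fit matches the fit on the pooled data without any site revealing individual records.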
Association Between Socio-Demographic Background and Self-Esteem of University Students.
Haq, Muhammad Ahsan Ul
2016-12-01
The purpose of this study was to scrutinize the self-esteem of university students and explore the association of self-esteem with academic achievement, gender and other factors. A sample of 346 students was selected from Punjab University, Lahore, Pakistan. The Rosenberg self-esteem scale with demographic variables was used for data collection. Besides descriptive statistics, binary logistic regression and the t test were used for analysing the data. A significant gender difference was observed: self-esteem was significantly higher in males than in females. Logistic regression indicated that age, medium of instruction, family income, student monthly expenditures, GPA and area of residence have a direct effect on self-esteem, while number of siblings showed an inverse effect.
Current state of the art for statistical modeling of species distributions [Chapter 16]
Troy M. Hegel; Samuel A. Cushman; Jeffrey Evans; Falk Huettmann
2010-01-01
Over the past decade the number of statistical modelling tools available to ecologists to model species' distributions has increased at a rapid pace (e.g. Elith et al. 2006; Austin 2007), as have the number of species distribution models (SDM) published in the literature (e.g. Scott et al. 2002). Ten years ago, basic logistic regression (Hosmer and Lemeshow 2000)...
Regression analysis for solving diagnosis problem of children's health
NASA Astrophysics Data System (ADS)
Cherkashina, Yu A.; Gerget, O. M.
2016-04-01
This paper presents the results of research on applying statistical techniques, namely regression analysis, to assess the health status of children in the neonatal period based on medical data (hemostatic parameters, blood test parameters, gestational age, vascular-endothelial growth factor) measured at 3-5 days of life. The studied medical data are described in detail, and a binary logistic regression procedure is discussed. The basic results of the research are presented. A classification table of predicted and observed values is shown, and the overall percentage of correct recognition is determined. Regression equation coefficients are calculated, and the general regression equation is written based on them. Based on the results of logistic regression, ROC analysis was performed: the sensitivity and specificity of the model were calculated and ROC curves were constructed. These mathematical techniques allow the health of children to be diagnosed with a high quality of recognition. The results make a significant contribution to the development of evidence-based medicine and have high practical importance for the professional activity of the author.
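The evaluation steps named in this abstract (classification table, overall percentage correct, sensitivity, specificity, ROC analysis) can be sketched as below. The outcome labels and predicted probabilities are hypothetical illustration values, not data from the paper, and the 0.5 cutoff is an assumption.

```python
def classification_table(y_true, y_prob, cutoff=0.5):
    """Cross-tabulate observed outcomes against predictions at a cutoff."""
    table = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            table["tp"] += 1
        elif y == 1:
            table["fn"] += 1
        elif predicted == 1:
            table["fp"] += 1
        else:
            table["tn"] += 1
    return table


def summarize(table):
    """Sensitivity, specificity and overall fraction correctly classified."""
    tp, fp, tn, fn = table["tp"], table["fp"], table["tn"], table["fn"]
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def roc_auc(y_true, y_prob):
    """Area under the ROC curve as the share of (positive, negative) pairs
    in which the positive case receives the higher predicted probability."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical observed outcomes and model probabilities.
observed = [1, 1, 1, 0, 0, 0, 1, 0]
predicted_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
summary = summarize(classification_table(observed, predicted_prob))
auc = roc_auc(observed, predicted_prob)
```

The pairwise definition of the area under the curve is used here because it needs no plotting or numerical integration.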
Held, Elizabeth; Cape, Joshua; Tintle, Nathan
2016-01-01
Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
Goldman, S A
1996-10-01
Neurotoxicity in relation to concomitant administration of lithium and neuroleptic drugs, particularly haloperidol, has been an ongoing issue. This study examined whether use of lithium with neuroleptic drugs enhances neurotoxicity leading to permanent sequelae. The Spontaneous Reporting System database of the United States Food and Drug Administration and extant literature were reviewed for spectrum cases of lithium/neuroleptic neurotoxicity. Groups taking lithium alone (Li), lithium/haloperidol (LiHal) and lithium/nonhaloperidol neuroleptics (LiNeuro), each paired for recovery and sequelae, were established for 237 cases. Statistical analyses included pairwise comparisons of lithium levels using the Wilcoxon Rank Sum procedure and logistic regression to analyze the relationship between independent variables and development of sequelae. The Li and LiNeuro groups showed statistically significant differences in median lithium levels between recovery and sequelae pairs, whereas the LiHal pair did not differ significantly. Lithium level was associated with sequelae development overall and within the Li and LiNeuro groups; no such association was evident in the LiHal group. On multivariable logistic regression analysis, lithium level and taking lithium/haloperidol were significant factors in the development of sequelae, with multiple possibly confounding factors (e.g., age, sex) not statistically significant. Multivariable logistic regression analyses with neuroleptic dose as five discrete dose ranges or actual dose did not show an association between development of sequelae and dose. Database limitations notwithstanding, the lack of apparent impact of serum lithium level on the development of sequelae in patients treated with haloperidol contrasts notably with results in the Li and LiNeuro groups. These findings may suggest a possible effect of pharmacodynamic factors in lithium/neuroleptic combination therapy.
Lloyd-Jones, Luke R; Robinson, Matthew R; Yang, Jian; Visscher, Peter M
2018-04-01
Genome-wide association studies (GWAS) have identified thousands of loci that are robustly associated with complex diseases. The use of linear mixed model (LMM) methodology for GWAS is becoming more prevalent due to its ability to control for population structure and cryptic relatedness and to increase power. The odds ratio (OR) is a common measure of the association of a disease with an exposure (e.g., a genetic variant) and is readily available from logistic regression. However, when the LMM is applied to all-or-none traits it provides estimates of genetic effects on the observed 0-1 scale, a different scale to that in logistic regression. This limits the comparability of results across studies, for example in a meta-analysis, and makes the interpretation of the magnitude of an effect from an LMM GWAS difficult. In this study, we derived transformations from the genetic effects estimated under the LMM to the OR that only rely on summary statistics. To test the proposed transformations, we used real genotypes from two large, publicly available data sets to simulate all-or-none phenotypes for a set of scenarios that differ in underlying model, disease prevalence, and heritability. Furthermore, we applied these transformations to GWAS summary statistics for type 2 diabetes generated from 108,042 individuals in the UK Biobank. In both simulation and real-data application, we observed very high concordance between the transformed OR from the LMM and either the simulated truth or estimates from logistic regression. The transformations derived and validated in this study improve the comparability of results from prospective and already performed LMM GWAS on complex diseases by providing a reliable transformation to a common comparative scale for the genetic effects. Copyright © 2018 by the Genetics Society of America.
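The scale difference described here can be illustrated with elementary arithmetic. The sketch below is not the transformation derived in the paper, which is not reproduced in the abstract; it only shows, for a hypothetical baseline risk, how an effect on the 0-1 (risk difference) scale maps to an odds ratio.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def odds_ratio_from_risk_difference(baseline_risk, risk_difference):
    """Odds ratio implied by adding `risk_difference` to `baseline_risk`.

    Illustration only: both arguments are hypothetical inputs on the
    observed 0-1 scale, not quantities taken from the paper.
    """
    exposed_risk = baseline_risk + risk_difference
    if not (0.0 < baseline_risk < 1.0 and 0.0 < exposed_risk < 1.0):
        raise ValueError("risks must lie strictly between 0 and 1")
    return odds(exposed_risk) / odds(baseline_risk)


# The same 0.02 effect on the 0-1 scale at two different baseline risks.
or_common = odds_ratio_from_risk_difference(0.10, 0.02)
or_rare = odds_ratio_from_risk_difference(0.01, 0.02)
```

The same additive effect yields very different odds ratios at different baseline risks, which is why effects on the two scales cannot be compared directly.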
Pinna, Antonio; Contini, Emma Luigia; Carru, Ciriaco; Solinas, Giuliana
2013-01-01
Glucose-6-Phosphate Dehydrogenase (G6PD) deficiency is one of the most common human genetic abnormalities, with a high prevalence in Sardinia, Italy. Evidence indicates that G6PD-deficient patients are protected against vascular disease. Little is known about the relationship between G6PD deficiency and diabetes mellitus. The purpose of this study was to compare G6PD deficiency prevalence in Sardinian diabetic men with severe retinal vascular complications and in age-matched non-diabetic controls and ascertain whether G6PD deficiency may offer protection against this vascular disorder. Erythrocyte G6PD activity was determined using a quantitative assay in 390 diabetic men with proliferative diabetic retinopathy (PDR) and 390 male non-diabetic controls, both aged ≥50 years. Conditional logistic regression models were used to investigate the association between G6PD deficiency and diabetes with severe retinal complications. G6PD deficiency was found in 21 (5.4%) diabetic patients and 33 (8.5%) controls (P=0.09). In a univariate conditional logistic regression model, G6PD deficiency showed a trend for protection against diabetes with PDR, but the odds ratio (OR) fell short of statistical significance (OR=0.6, 95% confidence interval=0.35-1.08, P=0.09). In multivariate conditional logistic regression models, including as covariates G6PD deficiency, plasma glucose, and systemic hypertension or systolic or diastolic blood pressure, G6PD deficiency showed no statistically significant protection against diabetes with PDR. The prevalence of G6PD deficiency in diabetic men with PDR was lower than in age-matched non-diabetic controls. G6PD deficiency showed a trend for protection against diabetes with PDR, but results were not statistically significant.
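As a rough cross-check of the reported odds ratio, the counts in the abstract (21 of 390 patients and 33 of 390 controls with G6PD deficiency) allow a crude, unmatched odds ratio with a Woolf (log-scale) confidence interval. This is a simplification: the study itself used conditional logistic regression on age-matched pairs, which this sketch does not reproduce.

```python
import math


def crude_odds_ratio(exposed_cases, total_cases,
                     exposed_controls, total_controls, z=1.96):
    """Unmatched odds ratio with a Woolf log-scale confidence interval."""
    a = exposed_cases
    b = total_cases - exposed_cases
    c = exposed_controls
    d = total_controls - exposed_controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts taken from the abstract; "exposed" means G6PD deficient.
odds_ratio, lower, upper = crude_odds_ratio(21, 390, 33, 390)
```

The crude estimate comes out near 0.62 with an interval of roughly 0.35 to 1.08, close to the values reported from the conditional model.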
Computing group cardinality constraint solutions for logistic regression problems.
Zhang, Yong; Kwon, Dongjin; Pohl, Kilian M
2017-01-01
We derive an algorithm to directly solve logistic regression under a cardinality constraint with group sparsity, and use it to classify intra-subject MRI sequences (e.g. cine MRIs) of healthy from diseased subjects. Group cardinality constraint models are often applied to medical images in order to avoid overfitting of the classifier to the training data. Solutions within these models are generally determined by relaxing the cardinality constraint to a weighted feature selection scheme. However, these solutions relate to the original sparse problem only under specific assumptions, which generally do not hold for medical image applications. In addition, inferring clinical meaning from features weighted by a classifier is an ongoing topic of discussion. To avoid weighting features, we propose to directly solve the group cardinality constraint logistic regression problem by generalizing the Penalty Decomposition method. To do so, we assume that an intra-subject series of images represents repeated samples of the same disease patterns. We model this assumption by combining the series of measurements created by a feature across time into a single group. Our algorithm then derives a solution within that model by decoupling the minimization of the logistic regression function from enforcing the group sparsity constraint. The minimum of the smooth and convex logistic regression problem is determined via gradient descent, while we derive a closed-form solution for finding a sparse approximation of that minimum. We apply our method to cine MRI of 38 healthy controls and 44 adult patients who received reconstructive surgery for Tetralogy of Fallot (TOF) during infancy. Our method correctly identifies regions impacted by TOF and generally obtains statistically significantly higher classification accuracy than alternative solutions to this model, i.e., ones relaxing group cardinality constraints. Copyright © 2016 Elsevier B.V. All rights reserved.
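The sparse-approximation step can be illustrated with a standard closed form: the Euclidean projection of a weight vector onto the set of vectors with at most k non-zero groups, which keeps the k groups of largest L2 norm and zeroes the rest. Whether the paper's closed-form solution has exactly this form is an assumption; the function name, weights and grouping below are hypothetical.

```python
import math


def project_group_cardinality(weights, groups, k):
    """Keep the k groups with the largest L2 norm and zero all others.

    `groups` is a list of index lists, e.g. all time points of one feature.
    """
    norms = []
    for group_id, indices in enumerate(groups):
        norm = math.sqrt(sum(weights[i] ** 2 for i in indices))
        norms.append((norm, group_id))
    keep = {group_id for _, group_id in sorted(norms, reverse=True)[:k]}
    projected = [0.0] * len(weights)
    for group_id, indices in enumerate(groups):
        if group_id in keep:
            for i in indices:
                projected[i] = weights[i]
    return projected


# Hypothetical weights for three features measured at two time points each.
weights = [0.1, -0.2, 3.0, 4.0, 0.5, 0.5]
groups = [[0, 1], [2, 3], [4, 5]]
one_group = project_group_cardinality(weights, groups, k=1)
two_groups = project_group_cardinality(weights, groups, k=2)
```

In a decoupled scheme like the one the abstract describes, a step of this kind would alternate with gradient descent on the smooth logistic loss.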
Eken, Cenker; Bilge, Ugur; Kartal, Mutlu; Eray, Oktay
2009-06-03
Logistic regression is the most common statistical model for processing multivariate data in the medical literature. Artificial intelligence models such as an artificial neural network (ANN) and a genetic algorithm (GA) may also be useful for interpreting medical data. The purpose of this study was to apply artificial intelligence models to a medical data sheet and compare them with logistic regression. ANN, GA, and logistic regression analysis were carried out on a data sheet of a previously published article regarding patients presenting to an emergency department with flank pain suspicious for renal colic. The study population was composed of 227 patients: 176 patients had a diagnosis of urinary stone, while 51 ultimately had no calculus. The GA found two decision rules in predicting urinary stones. Rule 1 consisted of being male, pain not spreading to back, and no fever. In rule 2, pelvicaliceal dilatation on bedside ultrasonography replaced no fever. ANN, GA rule 1, GA rule 2, and logistic regression had a sensitivity of 94.9, 67.6, 56.8, and 95.5%, a specificity of 78.4, 76.47, 86.3, and 47.1%, a positive likelihood ratio of 4.4, 2.9, 4.1, and 1.8, and a negative likelihood ratio of 0.06, 0.42, 0.5, and 0.09, respectively. The area under the curve was found to be 0.867, 0.720, 0.715, and 0.713 for all applications, respectively. Data mining techniques such as ANN and GA can be used for predicting renal colic in emergency settings and to constitute clinical decision rules. They may be an alternative to conventional multivariate analysis applications used in biostatistics.
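The likelihood ratios quoted in this abstract follow from sensitivity and specificity through the usual definitions, LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The short sketch below recomputes them from the reported percentages; the function and dictionary names are illustrative.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) from proportions in [0, 1]."""
    positive = sensitivity / (1.0 - specificity)
    negative = (1.0 - sensitivity) / specificity
    return positive, negative


# Sensitivity and specificity as reported in the abstract.
reported = {
    "ANN": (0.949, 0.784),
    "GA rule 1": (0.676, 0.7647),
    "GA rule 2": (0.568, 0.863),
    "Logistic regression": (0.955, 0.471),
}
ratios = {name: likelihood_ratios(se, sp)
          for name, (se, sp) in reported.items()}
```

The recomputed values agree with the published ones to within rounding.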
NASA Astrophysics Data System (ADS)
Duman, T. Y.; Can, T.; Gokceoglu, C.; Nefeslioglu, H. A.; Sonmez, H.
2006-11-01
As a result of industrialization, throughout the world, cities have been growing rapidly for the last century. One typical example of these growing cities is Istanbul, the population of which is over 10 million. Due to rapid urbanization, new areas suitable for settlement and engineering structures are necessary. The Cekmece area located west of the Istanbul metropolitan area is studied, because the landslide activity is extensive in this area. The purpose of this study is to develop a model that can be used to characterize landslide susceptibility in map form using logistic regression analysis of an extensive landslide database. A database of landslide activity was constructed using both aerial photography and field studies. About 19.2% of the selected study area is covered by deep-seated landslides. The landslides that occur in the area are primarily located in sandstones with interbedded permeable and impermeable layers such as claystone, siltstone and mudstone. About 31.95% of the total landslide area is located in this unit. To apply logistic regression analyses, a data matrix including 37 variables was constructed. The variables used in the forward stepwise analyses are different measures of slope, aspect, elevation, stream power index (SPI), plan curvature, profile curvature, geology, geomorphology and relative permeability of lithological units. A total of 25 variables were identified as exerting strong influence on landslide occurrence, and were included in the logistic regression equation. Wald statistics values indicate that lithology, SPI and slope are more important than the other parameters in the equation. Beta coefficients of the 25 variables included in the logistic regression equation provide a model for landslide susceptibility in the Cekmece area. This model is used to generate a landslide susceptibility map that correctly classified 83.8% of the landslide-prone areas.
Risk estimation using probability machines
2014-01-01
Background: Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios. Results: We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented. Conclusions: The models we propose make no assumptions about the data structure, and capture the patterns in the data by just specifying the predictors involved and not any particular model structure. So they do not run the same risks of model mis-specification and the resultant estimation biases as a logistic model. This methodology, which we call a “risk machine”, will share properties from the statistical machine that it is derived from. PMID:24581306
Risk estimation using probability machines.
Dasgupta, Abhijit; Szymczak, Silke; Moore, Jason H; Bailey-Wilson, Joan E; Malley, James D
2014-03-01
Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios. We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented. The models we propose make no assumptions about the data structure, and capture the patterns in the data by just specifying the predictors involved and not any particular model structure. So they do not run the same risks of model mis-specification and the resultant estimation biases as a logistic model. This methodology, which we call a "risk machine", will share properties from the statistical machine that it is derived from.
Dipnall, Joanna F.
2016-01-01
Background: Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. Methods: The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009–2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. Results: After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001).
Conclusion: The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin. PMID:26848571
Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny
2016-01-01
Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). 
The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.
NASA Astrophysics Data System (ADS)
Aygunes, Gunes
2017-07-01
The objective of this paper is to survey and determine the macroeconomic factors affecting the level of venture capital (VC) investments in a country. The literature depends on venture capitalists' quality and countries' venture capital investments. The aim of this paper is to describe the relationship between venture capital investment and macroeconomic variables using a statistical computation method. We investigate the countries and macroeconomic variables. Using the statistical computation method, we derive correlations between venture capital investments and macroeconomic variables. According to the logistic regression model (logit regression or logit model), the macroeconomic variables are correlated with each other in three groups. Venture capitalists regard correlations as an indicator. Finally, we give the correlation matrix of our results.
Approaches to Identify Exceedances of Water Quality Thresholds Associated with Ocean Conditions
WED scientists have developed a method to help distinguish whether failures to meet water quality criteria are associated with natural coastal upwelling by using the statistical approach of logistic regression. Estuaries along the west coast of the United States periodically ha...
Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann
2003-01-01
Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.
Methods for estimating drought streamflow probabilities for Virginia streams
Austin, Samuel H.
2014-01-01
Maximum likelihood logistic regression model equations used to estimate drought flow probabilities for Virginia streams are presented for 259 hydrologic basins in Virginia. Winter streamflows were used to estimate the likelihood of streamflows during the subsequent drought-prone summer months. The maximum likelihood logistic regression models identify probable streamflows from 5 to 8 months in advance. More than 5 million streamflow daily values collected over the period of record (January 1, 1900 through May 16, 2012) were compiled and analyzed over a minimum 10-year (maximum 112-year) period of record. The analysis yielded the 46,704 equations with statistically significant fit statistics and parameter ranges published in two tables in this report. These model equations produce summer month (July, August, and September) drought flow threshold probabilities as a function of streamflows during the previous winter months (November, December, January, and February). Example calculations are provided, demonstrating how to use the equations to estimate probable streamflows as much as 8 months in advance.
New machine-learning algorithms for prediction of Parkinson's disease
NASA Astrophysics Data System (ADS)
Mandal, Indrajit; Sairam, N.
2014-03-01
This article presents improved accuracy in predicting the diagnosis of Parkinson's disease (PD), with the aim of preventing delayed diagnosis and misdiagnosis of patients, using the proposed robust inference system. New machine-learning methods are proposed and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The robust methods of treating Parkinson's disease (PD) include sparse multinomial logistic regression, rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, and boosting methods. A new ensemble method comprising the Bayesian network optimised by the Tabu search algorithm as classifier and Haar wavelets as projection filter is used for relevant feature selection and ranking. The highest accuracy obtained by linear logistic regression and sparse multinomial logistic regression is 100%, with sensitivity and specificity of 0.983 and 0.996, respectively. All the experiments are conducted at 95% and 99% confidence levels, and the results are established with corrected t-tests. This work shows a high degree of advancement in software reliability and quality of the computer-aided diagnosis system and experimentally shows the best results with supportive statistical inference.
NASA Astrophysics Data System (ADS)
Oh, Hyun-Joo; Lee, Saro; Chotikasathien, Wisut; Kim, Chang Hwan; Kwon, Ju Hyoung
2009-04-01
For predictive landslide susceptibility mapping, this study applied and verified a probability model (the frequency ratio) and a statistical model (logistic regression) at Pechabun, Thailand, using a geographic information system (GIS) and remote sensing. Landslide locations were identified in the study area from interpretation of aerial photographs and field surveys, and maps of the topography, geology and land cover were compiled into a spatial database. The factors that influence landslide occurrence, such as slope gradient, slope aspect, curvature of topography and distance from drainage, were calculated from the topographic database. Lithology and distance from fault were extracted and calculated from the geology database. Land cover was classified from a Landsat TM satellite image. The frequency ratio and logistic regression coefficients were overlaid as each factor’s ratings for landslide susceptibility mapping. The landslide susceptibility maps were then verified and compared using the existing landslide locations. In the verification, the frequency ratio model showed a prediction accuracy of 76.39% and the logistic regression model 70.42%. The method can be used to reduce hazards associated with landslides and to plan land cover.
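The frequency ratio mentioned here is commonly defined, for each class of a factor, as the share of landslide cells falling in that class divided by the share of all cells falling in that class. The sketch below assumes that definition; the slope classes and cell counts are hypothetical and are not taken from the Pechabun study.

```python
def frequency_ratios(class_cells, class_landslide_cells):
    """Frequency ratio per class: (% of landslide cells) / (% of all cells)."""
    total_cells = sum(class_cells.values())
    total_landslide = sum(class_landslide_cells.values())
    ratios = {}
    for name, cells in class_cells.items():
        landslide_share = class_landslide_cells.get(name, 0) / total_landslide
        area_share = cells / total_cells
        ratios[name] = landslide_share / area_share
    return ratios


# Hypothetical raster counts for three slope-gradient classes (degrees).
cells = {"0-15": 600, "15-30": 300, ">30": 100}
landslide_cells = {"0-15": 10, "15-30": 30, ">30": 20}
slope_ratios = frequency_ratios(cells, landslide_cells)
```

A ratio above 1 marks a class where landslides are over-represented relative to its area. Summing the ratings of all factors at each cell is one common way to build a susceptibility index.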
Quantifying discrimination of Framingham risk functions with different survival C statistics.
Pencina, Michael J; D'Agostino, Ralph B; Song, Linye
2012-07-10
Cardiovascular risk prediction functions offer an important diagnostic tool for clinicians and patients themselves. They are usually constructed with the use of parametric or semi-parametric survival regression models. It is essential to be able to evaluate the performance of these models, preferably with summaries that offer natural and intuitive interpretations. The concept of discrimination, popular in the logistic regression context, has been extended to survival analysis. However, the extension is not unique. In this paper, we define discrimination in survival analysis as the model's ability to separate those with longer event-free survival from those with shorter event-free survival within some time horizon of interest. This definition remains consistent with that used in logistic regression, in the sense that it assesses how well the model-based predictions match the observed data. Practical and conceptual examples and numerical simulations are employed to examine four C statistics proposed in the literature to evaluate the performance of survival models. We observe that they differ in the numerical values and aspects of discrimination that they capture. We conclude that the index proposed by Harrell is the most appropriate to capture discrimination described by the above definition. We suggest researchers report which C statistic they are using, provide a rationale for their selection, and be aware that comparing different indices across studies may not be meaningful. Copyright © 2012 John Wiley & Sons, Ltd.
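The Harrell-type index discussed in this abstract can be sketched in its basic form: among all pairs in which the subject with the shorter follow-up time experienced the event, count the fraction in which that subject also has the higher predicted risk. The sketch omits the time-horizon truncation mentioned in the abstract and treats tied predictions as half-concordant, both of which are simplifying assumptions. The data are hypothetical.

```python
def harrell_c(times, events, risks):
    """Basic Harrell C index for right-censored survival data.

    times  : follow-up time per subject
    events : 1 if the event was observed, 0 if censored
    risks  : model-predicted risk (higher means shorter expected survival)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when subject i has the shorter time
            # and that shorter time ended in an observed event.
            if times[i] < times[j] and events[i] == 1:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical follow-up data for four subjects.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
c_perfect = harrell_c(times, events, [0.9, 0.7, 0.4, 0.2])
c_reversed = harrell_c(times, events, [0.2, 0.4, 0.7, 0.9])
```

A value of 1 means every usable pair is ordered correctly, 0.5 corresponds to uninformative predictions, and 0 means every pair is ordered incorrectly.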
Ren, Y Y; Zhou, L C; Yang, L; Liu, P Y; Zhao, B W; Liu, H X
2016-09-01
The paper highlights the use of the logistic regression (LR) method in the construction of acceptable statistically significant, robust and predictive models for the classification of chemicals according to their aquatic toxic modes of action. Essentials accounting for a reliable model were all considered carefully. The model predictors were selected by stepwise forward discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated by the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The obtained models are simple and easy to interpret. In general, LR performs much better than LDA and seems to be more attractive for the prediction of the more toxic compounds, i.e. compounds that exhibit excess toxicity versus non-polar narcotic compounds and more reactive compounds versus less reactive compounds. In addition, model fit and regression diagnostics were assessed through the influence plot, which reflects the hat-values, studentized residuals, and Cook's distance statistics of each sample. Overdispersion was also checked for the LR model. The relationships between the descriptors and the aquatic toxic behaviour of compounds are also discussed.
ERIC Educational Resources Information Center
Liu, Leping; Maddux, Cleborne D.
2008-01-01
This article presents a study of Web 2.0 articles intended to (a) analyze the content of what is written and (b) develop a statistical model to predict whether authors write about the need for new instructional design strategies and models. Eighty-eight technology articles were subjected to lexical analysis and a logistic regression model was…
Arevalillo, Jorge M; Sztein, Marcelo B; Kotloff, Karen L; Levine, Myron M; Simon, Jakub K
2017-10-01
Immunologic correlates of protection are important in vaccine development because they give insight into mechanisms of protection, assist in the identification of promising vaccine candidates, and serve as endpoints in bridging clinical vaccine studies. Our goal is the development of a methodology to identify immunologic correlates of protection using the Shigella challenge as a model. The proposed methodology utilizes the Random Forests (RF) machine learning algorithm as well as Classification and Regression Trees (CART) to detect immune markers that predict protection, identify interactions between variables, and define optimal cutoffs. Logistic regression modeling is applied to estimate the probability of protection and the confidence interval (CI) for such a probability is computed by bootstrapping the logistic regression models. The results demonstrate that the combination of Classification and Regression Trees and Random Forests complements the standard logistic regression and uncovers subtle immune interactions. Specific levels of immunoglobulin IgG antibody in blood on the day of challenge predicted protection in 75% (95% CI 67-86). Of those subjects that did not have blood IgG at or above a defined threshold, 100% were protected if they had IgA antibody secreting cells above a defined threshold. Comparison with the results obtained by applying only logistic regression modeling with standard Akaike Information Criterion for model selection shows the usefulness of the proposed method. Given the complexity of the immune system, the use of machine learning methods may enhance traditional statistical approaches. When applied together, they offer a novel way to quantify important immune correlates of protection that may help the development of vaccines. Copyright © 2017 Elsevier Inc. All rights reserved.
PAKDD Data Mining Competition 2009: New Ways of Using Known Methods
NASA Astrophysics Data System (ADS)
Linhart, Chaim; Harari, Guy; Abramovich, Sharon; Buchris, Altina
The PAKDD 2009 competition focuses on the problem of credit risk assessment. As required, we had to confront the problem of the robustness of the credit-scoring model against performance degradation caused by gradual market changes along a few years of business operation. We utilized the following standard models: logistic regression, KNN, SVM, GBM and decision tree. The novelty of our approach is two-fold: the integration of existing models, namely feeding the results of KNN as an input variable to the logistic regression, and re-coding categorical variables as numerical values that represent each category's statistical impact on the target label. The best solution we obtained reached 3rd place in the competition, with an AUC score of 0.655.
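The abstract does not say exactly how categorical variables were re-coded. One common reading of "values that represent each category's statistical impact on the target label" is to replace each category by the observed target rate within it. The sketch below shows that variant only, with invented data and a hypothetical function name.

```python
from collections import defaultdict


def target_rate_encoding(categories, labels):
    """Map each category to the mean of a 0/1 target within that category.

    This is one plausible encoding, assumed here for illustration; the
    competition entry may have used a different statistic.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for category, label in zip(categories, labels):
        sums[category] += label
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in counts}


# Hypothetical housing-status column and default indicator.
housing = ["rent", "own", "rent", "own", "rent", "other"]
default = [1, 0, 1, 0, 0, 1]
encoding = target_rate_encoding(housing, default)
encoded_column = [encoding[value] for value in housing]
```

In practice the mapping would be estimated on training data only and then applied to new records, so that the target does not leak into the features being evaluated.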
Real, J; Cleries, R; Forné, C; Roso-Llorach, A; Martínez-Sánchez, J M
In medicine and biomedical research, statistical techniques like logistic, linear, Cox and Poisson regression are widely known. The main objective is to describe the evolution of multivariate techniques used in observational studies indexed in PubMed (1970-2013), and to check the requirements of the STROBE guidelines in the author guidelines of Spanish journals indexed in PubMed. A targeted PubMed search was performed to identify papers that used logistic, linear, Cox and Poisson models. Furthermore, a review was also made of the author guidelines of journals published in Spain and indexed in PubMed and Web of Science. Only 6.1% of the indexed manuscripts included a term related to multivariate analysis, increasing from 0.14% in 1980 to 12.3% in 2013. In 2013, 6.7, 2.5, 3.5, and 0.31% of the manuscripts contained terms related to logistic, linear, Cox and Poisson regression, respectively. On the other hand, 12.8% of journals' author guidelines explicitly recommend following the STROBE guidelines, and 35.9% recommend the CONSORT guideline. A low percentage of Spanish scientific journals indexed in PubMed include the STROBE statement requirement in the author guidelines. Multivariate regression models such as logistic, linear, Cox and Poisson regression are increasingly used in published observational studies, both at the international level and in journals published in Spanish. Copyright © 2015 Sociedad Española de Médicos de Atención Primaria (SEMERGEN). Published by Elsevier España, S.L.U. All rights reserved.
2011-01-01
Introduction: Necrotizing fasciitis (NF) is a life threatening infectious disease with a high mortality rate. We carried out a microbiological characterization of the causative pathogens. We investigated the correlation of mortality in NF with bloodstream infection and with the presence of co-morbidities. Methods: In this retrospective study, we analyzed 323 patients who presented with necrotizing fasciitis at two different institutions. Bloodstream infection (BSI) was defined as a positive blood culture result. The patients were categorized as survivors and non-survivors. Eleven clinically important variables which were statistically significant by univariate analysis were selected for multivariate regression analysis and a stepwise logistic regression model was developed to determine the association between BSI and mortality. Results: Univariate logistic regression analysis showed that patients with hypotension, heart disease, liver disease, presence of Vibrio spp. in wound cultures, presence of fungus in wound cultures, and presence of Streptococcus group A, Aeromonas spp. or Vibrio spp. in blood cultures, had a significantly higher risk of in-hospital mortality. Our multivariate logistic regression analysis showed a higher risk of mortality in patients with pre-existing conditions like hypotension, heart disease, and liver disease. Multivariate logistic regression analysis also showed that presence of Vibrio spp. in wound cultures, and presence of Streptococcus group A in blood cultures were associated with a high risk of mortality while debridement ≥ 3 was associated with improved survival. Conclusions: Mortality in patients with necrotizing fasciitis was significantly associated with the presence of Vibrio in wound cultures and Streptococcus group A in blood cultures. PMID:21693053
Lee, Bum Ju; Kim, Keun Ho; Ku, Boncho; Jang, Jun-Su; Kim, Jong Yeol
2013-05-01
The body mass index (BMI) provides essential medical information related to body weight for the treatment and prognosis prediction of diseases such as cardiovascular disease, diabetes, and stroke. We propose a method for the prediction of normal, overweight, and obese classes based only on the combination of voice features that are associated with BMI status, independently of weight and height measurements. A total of 1568 subjects were divided into 4 groups according to age and gender differences. We performed statistical analyses by analysis of variance (ANOVA) and Scheffe test to find significant features in each group. We predicted BMI status (normal, overweight, and obese) by a logistic regression algorithm and two ensemble classification algorithms (bagging and random forests) based on statistically significant features. In the Female-2030 group (females aged 20-40 years), classification experiments using an imbalanced (original) data set gave area under the receiver operating characteristic curve (AUC) values of 0.569-0.731 by logistic regression, whereas experiments using a balanced data set gave AUC values of 0.893-0.994 by random forests. AUC values in Female-4050 (females aged 41-60 years), Male-2030 (males aged 20-40 years), and Male-4050 (males aged 41-60 years) groups by logistic regression in imbalanced data were 0.585-0.654, 0.581-0.614, and 0.557-0.653, respectively. AUC values in Female-4050, Male-2030, and Male-4050 groups in balanced data were 0.629-0.893 by bagging, 0.707-0.916 by random forests, and 0.695-0.854 by bagging, respectively. In each group, we found discriminatory features showing statistical differences among normal, overweight, and obese classes. The results showed that the classification models built by logistic regression in imbalanced data were better than those built by the other two algorithms, and significant features differed according to age and gender groups. 
Our results could support the development of BMI diagnosis tools for real-time monitoring; such tools are considered helpful in improving automated BMI status diagnosis in remote healthcare or telemedicine and are expected to have applications in forensic and medical science. Copyright © 2013 Elsevier B.V. All rights reserved.
Reitsma, Angela; Chu, Rong; Thorpe, Julia; McDonald, Sarah; Thabane, Lehana; Hutton, Eileen
2014-09-26
Clustering of outcomes at centers involved in multicenter trials is a type of center effect. The Consolidated Standards of Reporting Trials Statement recommends that multicenter randomized controlled trials (RCTs) should account for center effects in their analysis; however, most do not. The Early External Cephalic Version (EECV) trials published in 2003 and 2011 stratified by center at randomization, but did not account for center in the analyses, and due to the nature of the intervention and number of centers, may have been prone to center effects. Using data from the EECV trials, we undertook an empirical study to compare various statistical approaches to account for center effect while estimating the impact of external cephalic version timing (early or delayed) on the outcomes of cesarean section, preterm birth, and non-cephalic presentation at the time of birth. The data from the EECV pilot trial and the EECV2 trial were merged into one dataset. Fisher's exact method was used to test the overall effect of external cephalic version timing unadjusted for center effects. Seven statistical models that accounted for center effects were applied to the data. The models included: i) the Mantel-Haenszel test, ii) logistic regression with fixed center effect and fixed treatment effect, iii) center-size weighted and iv) un-weighted logistic regression with fixed center effect and fixed treatment-by-center interaction, v) logistic regression with random center effect and fixed treatment effect, vi) logistic regression with random center effect and random treatment-by-center interaction, and vii) generalized estimating equations. For each of the three outcomes of interest, approaches to account for center effect did not alter the overall findings of the trial. The results were similar for the majority of the methods used to adjust for center, illustrating the robustness of the findings.
Despite literature that suggests center effect can change the estimate of effect in multicenter trials, this empirical study does not show a difference in the outcomes of the EECV trials when accounting for center effect. The EECV2 trial was registered on 30 July 2005 with Current Controlled Trials: ISRCTN 56498577.
Hein, R; Abbas, S; Seibold, P; Salazar, R; Flesch-Janys, D; Chang-Claude, J
2012-01-01
Menopausal hormone therapy (MHT) is associated with an increased breast cancer risk in postmenopausal women, with combined estrogen-progestagen therapy posing a greater risk than estrogen monotherapy. However, few studies focused on potential effect modification of MHT-associated breast cancer risk by genetic polymorphisms in the progesterone metabolism. We assessed effect modification of MHT use by five coding single nucleotide polymorphisms (SNPs) in the progesterone metabolizing enzymes AKR1C3 (rs7741), AKR1C4 (rs3829125, rs17134592), and SRD5A1 (rs248793, rs3736316) using a two-center population-based case-control study from Germany with 2,502 postmenopausal breast cancer patients and 4,833 matched controls. An empirical-Bayes procedure that tests for interaction using a weighted combination of the prospective and the retrospective case-control estimators as well as standard prospective logistic regression were applied to assess multiplicative statistical interaction between polymorphisms and duration of MHT use with regard to breast cancer risk assuming a log-additive mode of inheritance. No genetic marginal effects were observed. Breast cancer risk associated with duration of combined therapy was significantly modified by SRD5A1_rs3736316, showing a reduced risk elevation in carriers of the minor allele (p (interaction,empirical-Bayes) = 0.006 using the empirical-Bayes method, p (interaction,logistic regression) = 0.013 using logistic regression). The risk associated with duration of use of monotherapy was increased by AKR1C3_rs7741 in minor allele carriers (p (interaction,empirical-Bayes) = 0.083, p (interaction,logistic regression) = 0.029) and decreased in minor allele carriers of two SNPs in AKR1C4 (rs3829125: p (interaction,empirical-Bayes) = 0.07, p (interaction,logistic regression) = 0.021; rs17134592: p (interaction,empirical-Bayes) = 0.101, p (interaction,logistic regression) = 0.038). 
After Bonferroni correction for multiple testing only SRD5A1_rs3736316 assessed using the empirical-Bayes method remained significant. Postmenopausal breast cancer risk associated with combined therapy may be modified by genetic variation in SRD5A1. Further well-powered studies are, however, required to replicate our finding.
The Effectiveness of Edgenuity When Used for Credit Recovery
ERIC Educational Resources Information Center
Eddy, Carri
2013-01-01
This quantitative study used descriptive statistics, logistic regression, and chi-square analysis to determine the impact of using Edgenuity (formerly Education 2020 Virtual Classroom) to assist students in the recovery of lost credits. The sample included a North Texas school district. The Skyward student management system provided archived…
USDA-ARS's Scientific Manuscript database
Probabilistic forecasts of US Drought Monitor (USDM) intensification over two, four and eight week time periods are developed based on recent anomalies in precipitation, evapotranspiration and soil moisture. These statistical forecasts are computed using logistic regression with cross validation. Wh...
An Examination of Master's Student Retention & Completion
ERIC Educational Resources Information Center
Barry, Melissa; Mathies, Charles
2011-01-01
This study was conducted at a research-extensive public university in the southeastern United States. It examined the retention and completion of master's degree students across numerous disciplines. Results were derived from a series of descriptive statistics, T-tests, and a series of binary logistic regression models. The findings from binary…
The Effect of Religiosity and Campus Alcohol Culture on Collegiate Alcohol Consumption
ERIC Educational Resources Information Center
Wells, Gayle M.
2010-01-01
Religiosity and campus culture were examined in relationship to alcohol consumption among college students using reference group theory. Participants and Methods: College students (N = 530) at a religious college and at a state university completed questionnaires on alcohol use and religiosity. Statistical tests and logistic regression were…
Child Mortality in a Developing Country: A Statistical Analysis
ERIC Educational Resources Information Center
Uddin, Md. Jamal; Hossain, Md. Zakir; Ullah, Mohammad Ohid
2009-01-01
This study uses data from the "Bangladesh Demographic and Health Survey (BDHS) 1999-2000" to investigate the predictors of child (age 1-4 years) mortality in a developing country like Bangladesh. The cross-tabulation and multiple logistic regression techniques have been used to estimate the predictors of child mortality. The…
Prediction of cold and heat patterns using anthropometric measures based on machine learning.
Lee, Bum Ju; Lee, Jae Chul; Nam, Jiho; Kim, Jong Yeol
2018-01-01
To examine the association of body shape with cold and heat patterns, to determine which anthropometric measure is the best indicator for discriminating between the two patterns, and to investigate whether using a combination of measures can improve the predictive power to diagnose these patterns. Based on a total of 4,859 subjects (3,000 women and 1,859 men), statistical analyses using binary logistic regression were performed to assess the significance of the difference and the predictive power of each anthropometric measure, and binary logistic regression and Naive Bayes with the variable selection technique were used to assess the improvement in the predictive power of the patterns using the combined measures. In women, the strongest indicators for determining the cold and heat patterns among anthropometric measures were body mass index (BMI) and rib circumference; in men, the best indicator was BMI. In experiments using a combination of measures, the values of the area under the receiver operating characteristic curve in women were 0.776 by Naive Bayes and 0.772 by logistic regression, and the values in men were 0.788 by Naive Bayes and 0.779 by logistic regression. Individuals with a higher BMI have a tendency toward a heat pattern in both women and men. The use of a combination of anthropometric measures can slightly improve the diagnostic accuracy. Our findings can provide fundamental information for the diagnosis of cold and heat patterns based on body shape for personalized medicine.
Teng, Ju-Hsi; Lin, Kuan-Chia; Ho, Bin-Shenq
2007-10-01
A community-based aboriginal study was conducted and analysed to explore the application of classification tree and logistic regression. A total of 1066 aboriginal residents in Yilan County were screened during 2003-2004. The independent variables include demographic characteristics, physical examinations, geographic location, health behaviours, dietary habits and family hereditary diseases history. Risk factors of cardiovascular diseases were selected as the dependent variables in further analysis. The completion rate for the health interview is 88.9%. The classification tree results find that if body mass index is higher than 25.72 kg/m² and the age is above 51 years, the predicted probability for number of cardiovascular risk factors ≥3 is 73.6% and the population is 322. If body mass index is higher than 26.35 kg/m² and geographical latitude of the village is lower than 24°22.8', the predicted probability for number of cardiovascular risk factors ≥4 is 60.8% and the population is 74. The logistic regression results indicate that body mass index, drinking habit and menopause are the top three significant independent variables. The classification tree model specifically shows the discrimination paths and interactions between the risk groups. The logistic regression model presents and analyses the statistical independent factors of cardiovascular risks. Applying both models to specific situations will provide a different angle for the design and management of future health intervention plans after community-based study.
Gong, Xu; Cui, Jianli; Jiang, Ziping; Lu, Laijin; Li, Xiucun
2018-03-01
Few clinical retrospective studies have reported the risk factors of pedicled flap necrosis in hand soft tissue reconstruction. The aim of this study was to identify non-technical risk factors associated with pedicled flap perioperative necrosis in hand soft tissue reconstruction via a multivariate logistic regression analysis. For patients with hand soft tissue reconstruction, we carefully reviewed hospital records and identified 163 patients who met the inclusion criteria. The characteristics of these patients, flap transfer procedures and postoperative complications were recorded. Eleven predictors were identified. The correlations between pedicled flap necrosis and risk factors were analysed using a logistic regression model. Of 163 skin flaps, 125 flaps survived completely without any complications. The pedicled flap necrosis rate in hands was 11.04%, which included partial flap necrosis (7.36%) and total flap necrosis (3.68%). Soft tissue defects in fingers were noted in 68.10% of all cases. The logistic regression analysis indicated that the soft tissue defect site (P = 0.046, odds ratio (OR) = 0.079, confidence interval (CI) (0.006, 0.959)), flap size (P = 0.020, OR = 1.024, CI (1.004, 1.045)) and postoperative wound infection (P < 0.001, OR = 17.407, CI (3.821, 79.303)) were statistically significant risk factors for pedicled flap necrosis of the hand. Soft tissue defect site, flap size and postoperative wound infection were risk factors associated with pedicled flap necrosis in hand soft tissue defect reconstruction. © 2017 Royal Australasian College of Surgeons.
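The odds ratios and intervals reported in this abstract are consistent with the usual construction OR = exp(β) and CI = exp(β ± z·SE). The sketch below assumes the interval is a 95% Wald interval (the abstract does not state this), backs out β and SE from the reported bounds for postoperative wound infection, and rebuilds the odds ratio. The function names are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval (assumed level)


def odds_ratio_with_ci(beta, se, z=Z_95):
    """Odds ratio and Wald interval from a logistic coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_ci(lower, upper, z=Z_95):
    """Recover beta and SE from interval bounds, assuming a Wald interval."""
    log_lower, log_upper = math.log(lower), math.log(upper)
    beta = (log_lower + log_upper) / 2.0
    se = (log_upper - log_lower) / (2.0 * z)
    return beta, se


# Bounds reported in the abstract for postoperative wound infection.
beta, se = coefficient_from_ci(3.821, 79.303)
odds_ratio, ci_lower, ci_upper = odds_ratio_with_ci(beta, se)
```

The recovered odds ratio is the geometric mean of the two bounds and comes out close to the reported 17.407, which is what one would expect if the interval is symmetric on the log-odds scale.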
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hu, H.; Kim, Rokho; Korrick, S.
1996-12-31
In an earlier report based on participants in the Veterans Administration Normative Aging Study, we found a significant association between the risk of hypertension and lead levels in tibia. To examine the possible confounding effects of education and occupation, we considered in this study five levels of education and three levels of occupation as independent variables in the statistical model. Of 1,171 active subjects seen between August 1991 and December 1994, 563 provided complete data for this analysis. In the initial logistic regression model, age and body mass index, family history of hypertension, and dietary sodium intake, but neither cumulative smoking nor alcohol ingestion, conferred increased odds ratios for being hypertensive that were statistically significant. When the lead biomarkers were added separately to this initial logistic model, tibia lead and patella lead levels were associated with significantly elevated odds ratios for hypertension. In the final backward elimination logistic regression model that included categorical variables for education and occupation, the only variables retained were body mass index, family history of hypertension, and tibia lead level. We conclude that education and occupation variables were not confounding the association between the lead biomarkers and hypertension that we reported previously. 27 refs., 3 tabs.
Non-ignorable missingness in logistic regression.
Wang, Joanna J J; Bartlett, Mark; Ryan, Louise
2017-08-30
Nonresponses and missing data are common in observational studies. Ignoring or inadequately handling missing data may lead to biased parameter estimation, incorrect standard errors and, as a consequence, incorrect statistical inference and conclusions. We present a strategy for modelling non-ignorable missingness where the probability of nonresponse depends on the outcome. Using a simple case of logistic regression, we quantify the bias in regression estimates and show the observed likelihood is non-identifiable under non-ignorable missing data mechanism. We then adopt a selection model factorisation of the joint distribution as the basis for a sensitivity analysis to study changes in estimated parameters and the robustness of study conclusions against different assumptions. A Bayesian framework for model estimation is used as it provides a flexible approach for incorporating different missing data assumptions and conducting sensitivity analysis. Using simulated data, we explore the performance of the Bayesian selection model in correcting for bias in a logistic regression. We then implement our strategy using survey data from the 45 and Up Study to investigate factors associated with worsening health from the baseline to follow-up survey. Our findings have practical implications for the use of the 45 and Up Study data to answer important research questions relating to health and quality-of-life. Copyright © 2017 John Wiley & Sons, Ltd.
Prediction model for the return to work of workers with injuries in Hong Kong.
Xu, Yanwen; Chan, Chetwyn C H; Lo, Karen Hui Yu-Ling; Tang, Dan
2008-01-01
This study attempts to formulate a prediction model of return to work for a group of workers who have been suffering from chronic pain and physical injury while also being out of work in Hong Kong. The study used Case-based Reasoning (CBR) method, and compared the result with the statistical method of logistic regression model. The database of the algorithm of CBR was composed of 67 cases who were also used in the logistic regression model. The testing cases were 32 participants who had a similar background and characteristics to those in the database. The methods of setting constraints and Euclidean distance metric were used in CBR to search the closest cases to the trial case based on the matrix. The usefulness of the algorithm was tested on 32 new participants, and the accuracy of predicting return to work outcomes was 62.5%, which was no better than the 71.2% accuracy derived from the logistic regression model. The results of the study would enable us to have a better understanding of the CBR applied in the field of occupational rehabilitation by comparing with the conventional regression analysis. The findings would also shed light on the development of relevant interventions for the return-to-work process of these workers.
[Statistical prediction methods in violence risk assessment and its application].
Liu, Yuan-Yuan; Hu, Jun-Mei; Yang, Min; Li, Xiao-Song
2013-06-01
Improving violence risk assessment is an urgent global problem. Statistical methods are a necessary part of risk assessment and have a marked influence on its results. In this study, prediction methods in violence risk assessment are reviewed from a statistical point of view. The applications of logistic regression as an example of a multivariate statistical model, the decision tree model as an example of a data mining technique, and the neural network model as an example of artificial intelligence technology are all reviewed. This study provides data to contribute to further research on violence risk assessment.
Statistical considerations in the development of injury risk functions.
McMurry, Timothy L; Poplin, Gerald S
2015-01-01
We address 4 frequently misunderstood and important statistical ideas in the construction of injury risk functions. These include the similarities of survival analysis and logistic regression, the correct scale on which to construct pointwise confidence intervals for injury risk, the ability to discern which form of injury risk function is optimal, and the handling of repeated tests on the same subject. The statistical models are explored through simulation and examination of the underlying mathematics. We provide recommendations for the statistically valid construction and correct interpretation of single-predictor injury risk functions. This article aims to provide useful and understandable statistical guidance to improve the practice in constructing injury risk functions.
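The abstract does not say which scale the authors recommend for pointwise confidence intervals. As a hedged illustration of why the choice matters, the sketch below compares an interval built on the logit (linear predictor) scale and then back-transformed with a delta-method interval built directly on the probability scale. The linear predictor and its standard error are invented values.

```python
import math


def expit(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def ci_logit_scale(eta, se_eta, z=1.96):
    """Interval formed on the logit scale, then mapped to probabilities."""
    return expit(eta - z * se_eta), expit(eta + z * se_eta)


def ci_probability_scale(eta, se_eta, z=1.96):
    """Delta-method interval formed directly on the probability scale."""
    p = expit(eta)
    se_p = p * (1.0 - p) * se_eta  # derivative of expit times SE of eta
    return p - z * se_p, p + z * se_p


# Hypothetical fitted value at a high stimulus level: eta = 3.0, SE = 1.5.
eta, se_eta = 3.0, 1.5
logit_lower, logit_upper = ci_logit_scale(eta, se_eta)
prob_lower, prob_upper = ci_probability_scale(eta, se_eta)
```

With these numbers the back-transformed interval stays inside (0, 1), while the probability-scale interval has an upper limit above 1, which cannot be a risk.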
Factors associated with active commuting to work among women.
Bopp, Melissa; Child, Stephanie; Campbell, Matthew
2014-01-01
Active commuting (AC), the act of walking or biking to work, has notable health benefits though rates of AC remain low among women. This study used a social-ecological framework to examine the factors associated with AC among women. A convenience sample of employed, working women (n = 709) completed an online survey about their mode of travel to work. Individual, interpersonal, institutional, community, and environmental influences were assessed. Basic descriptive statistics and frequencies described the sample. Simple logistic regression models examined associations with the independent variables with AC participation and multiple logistic regression analysis determined the relative influence of social ecological factors on AC participation. The sample was primarily middle-aged (44.09±11.38 years) and non-Hispanic White (92%). Univariate analyses revealed several individual, interpersonal, institutional, community and environmental factors significantly associated with AC. The multivariable logistic regression analysis results indicated that significant factors associated with AC included number of children, income, perceived behavioral control, coworker AC, coworker AC normative beliefs, employer and community supports for AC, and traffic. The results of this study contribute to the limited body of knowledge on AC participation for women and may help to inform gender-tailored interventions to enhance AC behavior and improve health.
Wartberg, L; Kriston, L; Kramer, M; Schwedler, A; Lincoln, T M; Kammerl, R
2017-06-01
Internet gaming disorder (IGD) has been included in the Diagnostic and Statistical Manual of Mental Disorders (DSM-5). Currently, associations between IGD in early adolescence and mental health are largely unexplained. In the present study, the relation of IGD with adolescent and parental mental health was investigated for the first time. We surveyed 1095 family dyads (an adolescent aged 12-14 years and a related parent) with a standardized questionnaire for IGD as well as for adolescent and parental mental health. We conducted linear (dimensional approach) and logistic (categorical approach) regression analyses. Both with dimensional and categorical approaches, we observed statistically significant associations between IGD and male gender, a higher degree of adolescent antisocial behavior, anger control problems, emotional distress, self-esteem problems, hyperactivity/inattention and parental anxiety (linear regression model: corrected R² = 0.41, logistic regression model: Nagelkerke's R² = 0.41). IGD appears to be associated with internalizing and externalizing problems in adolescents. Moreover, the findings of the present study provide first evidence that not only adolescent but also parental mental health is relevant to IGD in early adolescence. Adolescent and parental mental health should be considered in prevention and intervention programs for IGD in adolescence. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
NASA Astrophysics Data System (ADS)
Ariffin, Syaiba Balqish; Midi, Habshah
2014-06-01
This article is concerned with the performance of the logistic ridge regression estimation technique in the presence of multicollinearity and high leverage points. In logistic regression, multicollinearity exists among predictors and in the information matrix. The maximum likelihood estimator suffers a huge setback in the presence of multicollinearity, which causes regression estimates to have unduly large standard errors. To remedy this problem, a logistic ridge regression estimator is put forward. It is evident that the logistic ridge regression estimator outperforms the maximum likelihood approach for handling multicollinearity. The effect of high leverage points on the performance of the logistic ridge regression estimator is then investigated through a real data set and a simulation study. The findings signify that the logistic ridge regression estimator fails to provide better parameter estimates in the presence of both high leverage points and multicollinearity.
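To show the multicollinearity part of this abstract in miniature, the sketch below fits a two-predictor logistic model with and without an L2 (ridge) penalty. Everything here is an assumption made for illustration: the toy data, the omission of an intercept, the penalty value, and the use of plain Newton-Raphson. It does not reproduce the article's estimator and does not include high leverage points.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(x1, x2, y, lam=0.0, iterations=50):
    """Newton-Raphson for a two-predictor, no-intercept logistic model
    with ridge penalty (lam / 2) * (w1**2 + w2**2); lam = 0 gives the MLE."""
    w1 = w2 = 0.0
    for _ in range(iterations):
        g1, g2 = lam * w1, lam * w2          # gradient of the penalty
        h11, h12, h22 = lam, 0.0, lam        # Hessian of the penalty
        for a, b, target in zip(x1, x2, y):
            p = sigmoid(w1 * a + w2 * b)
            g1 += (p - target) * a
            g2 += (p - target) * b
            v = p * (1.0 - p)
            h11 += v * a * a
            h12 += v * a * b
            h22 += v * b * b
        det = h11 * h22 - h12 * h12
        step1 = (h22 * g1 - h12 * g2) / det  # first row of H^-1 g
        step2 = (h11 * g2 - h12 * g1) / det  # second row of H^-1 g
        w1, w2 = w1 - step1, w2 - step2
    return w1, w2


# Toy data: x2 is x1 plus a tiny offset, so the predictors are nearly collinear.
x1 = [-1.5, -0.5, 0.5, 1.5, -1.0, 0.0, 1.0, 2.0]
x2 = [a + d for a, d in zip(x1, [0.01] * 4 + [-0.01] * 4)]
y = [0, 1, 0, 1, 0, 1, 0, 1]

mle = fit_logistic(x1, x2, y, lam=0.0)
ridge = fit_logistic(x1, x2, y, lam=1.0)
```

Comparing `math.hypot(*mle)` with `math.hypot(*ridge)` shows the penalized coefficients having the smaller norm, which is the shrinkage that motivates ridge estimation under multicollinearity.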
Introduction to the use of regression models in epidemiology.
Bender, Ralf
2009-01-01
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
Discontinuation among University Students in Southern Thailand
ERIC Educational Resources Information Center
Sittichai, Ruthaychonnee; Tongkumchum, Phattrawan; McNeil, Nittaya
2009-01-01
This study uses a statistical model to account for the pattern of discontinuation of university study at Pattani campus of Prince of Songkla University (PSU) in southern Thailand. University records for 11,408 bachelor degree students enrolled between 1999 and 2006 were used. Discontinuation rates were analyzed by using a logistic regression model…
The Effect of Loans on the Persistence and Attainment of Community College Students
ERIC Educational Resources Information Center
Dowd, Alicia C.; Coury, Tarek
2006-01-01
This study informs public policies regarding the use of subsidized loans as financial aid for community college students. Using logistic regression, it analyzes the National Center for Education Statistics' Beginning Postsecondary Students (BPS 90/94) data to predict persistence to the second year of college and associate's degree attainment over…
Exploring Milk and Yogurt Selection in an Urban Universal School Breakfast Program
ERIC Educational Resources Information Center
Miller, M. Elizabeth; Kwon, Sockju
2015-01-01
Purpose/Objectives: The purpose of this study was to explore milk and yogurt selection among students participating in a School Breakfast Program. Methods: Researchers observed breakfast selection of milk, juice and yogurt in six elementary and four secondary schools. Data were analyzed using descriptive statistics and logistic regression to…
Primary Factors Related to Multiple Placements for Children in Out-of-Home Care
ERIC Educational Resources Information Center
Eggertsen, Lars
2008-01-01
Using an ecological framework, this study identified which factors related to out-of-home placements significantly influenced multiple placements for children in Utah during 2000, 2001, and 2002. Multinomial logistic regression statistical procedures and a geographical information system (GIS) were used to analyze the data. The final model…
Attitudes towards Participation in Business Development Programmes: An Ethnic Comparison in Sweden
ERIC Educational Resources Information Center
Abbasian, Saeid; Yazdanfar, Darush
2015-01-01
Purpose: The aim of the study is to investigate whether there are any differences between the attitudes towards participation in development programmes of entrepreneurs who are immigrants and those who are native-born. Design/methodology/approach: Several statistical methods, including a binary logistic regression model, were used to analyse a…
Logistic regression for southern pine beetle outbreaks with spatial and temporal autocorrelation
M. L. Gumpertz; C.-T. Wu; John M. Pye
2000-01-01
Regional outbreaks of southern pine beetle (Dendroctonus frontalis Zimm.) show marked spatial and temporal patterns. While these patterns are of interest in themselves, we focus on statistical methods for estimating the effects of underlying environmental factors in the presence of spatial and temporal autocorrelation. The most comprehensive available information on...
Factors associated with young adults' knowledge regarding family history of Stroke
Lima, Maria Jose Melo Ramos; Moreira, Thereza Maria Magalhães; Florêncio, Raquel Sampaio; Braga, Predro
2016-01-01
ABSTRACT Objective: to analyze the factors associated with young adults' knowledge regarding family history of stroke. Method: an analytical transversal study, with 579 young adults from state schools, with collection of sociodemographic, clinical and risk factor-related variables, analyzed using logistic regression (backward elimination). Results: a statistical association was detected between age, civil status, and classification of arterial blood pressure and abdominal circumference with knowledge of family history of stroke. In the final logistic regression model, a statistical association was observed between knowledge regarding family history of stroke and the civil status of having a partner (ORa=1.61[1.07-2.42]; p=0.023), abdominal circumference (ORa=0.98[0.96-0.99]; p=0.012) and normal arterial blood pressure (ORa=2.56[1.19-5.52]; p=0.016). Conclusion: an association was observed between socioeconomic factors and risk factors for stroke and knowledge of family history of stroke, suggesting the need for health education or even educational programs on this topic for the clientele in question. PMID:27878217
The effect of the Family Case Management Program on 1996 birth outcomes in Illinois.
Keeton, Kristie; Saunders, Stephen E; Koltun, David
2004-03-01
The purpose of this study was to determine if birth outcomes for Medicaid recipients were improved with participation in the Illinois Family Case Management Program. Health program data files were linked with the 1996 Illinois Vital Records linked birth-death certificate file. Logistic regression was used to characterize the variation in birth outcomes as a function of Family Case Management participation while statistically controlling for measurable factors found to be confounders. Results of the logistic regression analysis show that women who participated in the Family Case Management Program were significantly less likely to give birth to very low birth weight infants (odds ratio [OR] = 0.86, 95% confidence interval [CI] = 0.75, 0.99) and low birth weight infants (OR = 0.83, CI = 0.79, 0.89). For infant mortality, however, the adjusted OR (OR = 0.98, CI = 0.82, 1.17), although under 1, was not statistically significant. These results suggest that the Family Case Management Program may be effective in reducing very low birth weight and low birth weight rates among infants born to low-income women.
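The significance statements in this abstract follow from whether each confidence interval contains 1, the value of an odds ratio under no association. A minimal sketch of that check, using the three intervals reported above (the function name and dictionary layout are illustrative, not from the study):

```python
def ci_excludes_one(lower: float, upper: float) -> bool:
    """True when an odds-ratio confidence interval lies entirely on one side of 1."""
    return upper < 1.0 or lower > 1.0

# (odds ratio, CI lower bound, CI upper bound) as reported in the abstract
reported = {
    "very low birth weight": (0.86, 0.75, 0.99),
    "low birth weight": (0.83, 0.79, 0.89),
    "infant mortality": (0.98, 0.82, 1.17),
}

significant = {
    outcome: ci_excludes_one(lower, upper)
    for outcome, (_, lower, upper) in reported.items()
}

for outcome, flag in significant.items():
    print(f"{outcome}: CI excludes 1 = {flag}")
```

The infant mortality interval spans 1, which is why its odds ratio of 0.98 is not treated as evidence of an effect.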
Wu, Robert; Glen, Peter; Ramsay, Tim; Martel, Guillaume
2014-06-28
Observational studies dominate the surgical literature. Statistical adjustment is an important strategy to account for confounders in observational studies. Research has shown that published articles are often poor in statistical quality, which may jeopardize their conclusions. The Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines have been published to help establish standards for statistical reporting. This study will seek to determine whether the quality of statistical adjustment and the reporting of these methods are adequate in surgical observational studies. We hypothesize that incomplete reporting will be found in all surgical observational studies, and that the quality and reporting of these methods will be of lower quality in surgical journals when compared with medical journals. Finally, this work will seek to identify predictors of high-quality reporting. This work will examine the top five general surgical and medical journals, based on a 5-year impact factor (2007-2012). All observational studies investigating an intervention related to an essential component area of general surgery (defined by the American Board of Surgery), with an exposure, outcome, and comparator, will be included in this systematic review. Essential elements related to statistical reporting and quality were extracted from the SAMPL guidelines and include domains such as intent of analysis, primary analysis, multiple comparisons, numbers and descriptive statistics, association and correlation analyses, linear regression, logistic regression, Cox proportional hazard analysis, analysis of variance, survival analysis, propensity analysis, and independent and correlated analyses. Each article will be scored as a proportion based on fulfilling criteria in relevant analyses used in the study. A logistic regression model will be built to identify variables associated with high-quality reporting.
A comparison will be made between the scores of surgical observational studies published in medical versus surgical journals. Secondary outcomes will pertain to individual domains of analysis. Sensitivity analyses will be conducted. This study will explore the reporting and quality of statistical analyses in surgical observational studies published in the most referenced surgical and medical journals in 2013 and examine whether variables (including the type of journal) can predict high-quality reporting.
Neural network modeling for surgical decisions on traumatic brain injury patients.
Li, Y C; Liu, L; Chiu, W T; Jian, W S
2000-01-01
Computerized medical decision support systems have been a major research topic in recent years. Intelligent computer programs were implemented to aid physicians and other medical professionals in making difficult medical decisions. This report compares three different mathematical models for building a traumatic brain injury (TBI) medical decision support system (MDSS). These models were developed based on a large TBI patient database. This MDSS accepts a set of patient data such as the types of skull fracture, Glasgow Coma Scale (GCS), and episode of convulsion, and returns the chance that a neurosurgeon would recommend an open-skull surgery for this patient. The three mathematical models described in this report include a logistic regression model, a multi-layer perceptron (MLP) neural network and a radial-basis-function (RBF) neural network. From the 12,640 patients selected from the database, a randomly drawn 9480 cases were used as the training group to develop/train our models. The other 3160 cases were in the validation group which we used to evaluate the performance of these models. We used sensitivity, specificity, areas under receiver-operating characteristics (ROC) curve and calibration curves as the indicators of how accurate these models are in predicting a neurosurgeon's decision on open-skull surgery. The results showed that, assuming equal importance of sensitivity and specificity, the logistic regression model had a (sensitivity, specificity) of (73%, 68%), compared to (80%, 80%) from the RBF model and (88%, 80%) from the MLP model. The resultant areas under ROC curve for logistic regression, RBF and MLP neural networks are 0.761, 0.880 and 0.897, respectively (P < 0.05). Among these models, the logistic regression has noticeably poorer calibration. This study demonstrated the feasibility of applying neural networks as the mechanism for TBI decision support systems based on clinical databases.
The results also suggest that neural networks may be a better solution for complex, non-linear medical decision support systems than conventional statistical techniques such as logistic regression.
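The evaluation quantities in this abstract can be sketched in a few lines. The split sizes are taken from the abstract; the confusion-matrix counts are hypothetical, chosen only so that the ratios reproduce the reported 73% sensitivity and 68% specificity for logistic regression. Youden's J is shown as one common way to weight sensitivity and specificity equally; the abstract does not state which criterion the authors used.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Split sizes reported in the abstract
train_n, validation_n = 9480, 3160
total_n = train_n + validation_n
train_share = train_n / total_n

# Hypothetical counts per 100 positives and 100 negatives (not study data)
tp, fn, tn, fp = 73, 27, 68, 32
sens, spec = sensitivity_specificity(tp, fn, tn, fp)

# Youden's J gives equal weight to sensitivity and specificity
youden_j = sens + spec - 1.0

print(f"total={total_n}, training share={train_share:.2f}")
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, J={youden_j:.2f}")
```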
Sample size determination for logistic regression on a logit-normal distribution.
Kim, Seongho; Heath, Elisabeth; Heilbrun, Lance
2017-06-01
Although the sample size for simple logistic regression can be readily determined using currently available methods, the sample size calculation for multiple logistic regression requires some additional information, such as the coefficient of determination (R²) of a covariate of interest with other covariates, which is often unavailable in practice. The response variable of logistic regression follows a logit-normal distribution which can be generated from a logistic transformation of a normal distribution. Using this property of logistic regression, we propose new methods of determining the sample size for simple and multiple logistic regressions using a normal transformation of outcome measures. Simulation studies and a motivating example show several advantages of the proposed methods over the existing methods: (i) no need for R² for multiple logistic regression, (ii) available interim or group-sequential designs, and (iii) much smaller required sample size.
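The logit-normal construction mentioned above, a logistic transformation of a normal variate, can be sketched as follows. The parameter values, sample size, and function name are illustrative assumptions; this shows only how such values are generated, not the authors' sample size method.

```python
import math
import random

def sample_logit_normal(mu: float, sigma: float, n: int, seed: int = 0):
    """Draw n values p = 1 / (1 + exp(-z)) with z ~ Normal(mu, sigma)."""
    rng = random.Random(seed)
    return [1.0 / (1.0 + math.exp(-rng.gauss(mu, sigma))) for _ in range(n)]

# Hypothetical parameters: mu = 0 centres the distribution near 0.5
probs = sample_logit_normal(mu=0.0, sigma=1.0, n=1000)
median_p = sorted(probs)[len(probs) // 2]

print(f"min={min(probs):.3f}, median={median_p:.3f}, max={max(probs):.3f}")
```

Applying the logit, log(p / (1 - p)), to each value recovers the underlying normal variate, which is the property the proposed method relies on.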
Staley, James R; Jones, Edmund; Kaptoge, Stephen; Butterworth, Adam S; Sweeting, Michael J; Wood, Angela M; Howson, Joanna M M
2017-06-01
Logistic regression is often used instead of Cox regression to analyse genome-wide association studies (GWAS) of single-nucleotide polymorphisms (SNPs) and disease outcomes with cohort and case-cohort designs, as it is less computationally expensive. Although Cox and logistic regression models have been compared previously in cohort studies, this work does not completely cover the GWAS setting nor extend to the case-cohort study design. Here, we evaluated Cox and logistic regression applied to cohort and case-cohort genetic association studies using simulated data and genetic data from the EPIC-CVD study. In the cohort setting, there was a modest improvement in power to detect SNP-disease associations using Cox regression compared with logistic regression, which increased as the disease incidence increased. In contrast, logistic regression had more power than (Prentice weighted) Cox regression in the case-cohort setting. Logistic regression yielded inflated effect estimates (assuming the hazard ratio is the underlying measure of association) for both study designs, especially for SNPs with greater effect on disease. Given logistic regression is substantially more computationally efficient than Cox regression in both settings, we propose a two-step approach to GWAS in cohort and case-cohort studies. First to analyse all SNPs with logistic regression to identify associated variants below a pre-defined P-value threshold, and second to fit Cox regression (appropriately weighted in case-cohort studies) to those identified SNPs to ensure accurate estimation of association with disease.
Lee, Shang-Yi; Hung, Chih-Jen; Chen, Chih-Chieh; Wu, Chih-Cheng
2014-11-01
Postoperative nausea and vomiting as well as postoperative pain are two major concerns when patients undergo surgery and receive anesthetics. Various models and predictive methods have been developed to investigate the risk factors of postoperative nausea and vomiting, and different types of preventive managements have subsequently been developed. However, there continues to be a wide variation in the previously reported incidence rates of postoperative nausea and vomiting. This may have occurred because patients were assessed at different time points, coupled with the overall limitation of the statistical methods used. However, using survival analysis with Cox regression, and thus factoring in these time effects, may solve this statistical limitation and reveal risk factors related to the occurrence of postoperative nausea and vomiting in the following period. In this retrospective, observational, uni-institutional study, we analyzed the results of 229 patients who received patient-controlled epidural analgesia following surgery from June 2007 to December 2007. We investigated the risk factors for the occurrence of postoperative nausea and vomiting, and also assessed the effect of evaluating patients at different time points using the Cox proportional hazards model. Furthermore, the results of this inquiry were compared with those results using logistic regression. The overall incidence of postoperative nausea and vomiting in our study was 35.4%. Using logistic regression, we found that only sex, but not the total doses and the average dose of opioids, had significant effects on the occurrence of postoperative nausea and vomiting at some time points. Cox regression showed that, when patients consumed a higher average dose of opioids, this correlated with a higher incidence of postoperative nausea and vomiting with a hazard ratio of 1.286. 
Survival analysis using Cox regression showed that the average consumption of opioids played an important role in postoperative nausea and vomiting, a result not found by logistic regression. Therefore, the incidence of postoperative nausea and vomiting in patients cannot be reliably determined on the basis of a single visit at one point in time. Copyright © 2014. Published by Elsevier Taiwan.
Risk factors for persistent gestational trophoblastic neoplasia.
Kuyumcuoglu, Umur; Guzel, Ali Irfan; Erdemoglu, Mahmut; Celik, Yusuf
2011-01-01
This retrospective study evaluated the risk factors for persistent gestational trophoblastic neoplasia (GTN) and determined their odds ratios. This study included 100 cases with GTN admitted to our clinic. Possible risk factors recorded were age, gravidity, parity, size of the neoplasia, and beta-human chorionic gonadotropin levels (beta-hCG) before and after the procedure. Statistical analyses consisted of the independent sample t-test and logistic regression using the statistical package SPSS ver. 15.0 for Windows (SPSS, Chicago, IL, USA). Twenty of the cases had persistent GTN, and the differences between these and the other cases were evaluated. The size of the neoplasia and histopathological type of GTN had no statistical relationship with persistence, whereas age, gravidity, and beta-hCG levels were significant risk factors for persistent GTN (p < 0.05). The odds ratios (95% confidence interval (CI)) for age, gravidity, and pre- and post-evacuation beta-hCG levels determined using logistic regression were 4.678 (0.97-22.44), 7.315 (1.16-46.16), 2.637 (1.41-4.94), and 2.339 (1.52-3.60), respectively. Patient age, gravidity, and beta-hCG levels were risk factors for persistent GTN, whereas the size of the neoplasia and histopathological type of GTN were not significant risk factors.
Survival Data and Regression Models
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, although we recall some essential results about the ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
The crux of the method: assumptions in ordinary least squares and logistic regression.
Long, Rebecca G
2008-10-01
Logistic regression has increasingly become the tool of choice when analyzing data with a binary dependent variable. While resources relating to the technique are widely available, clear discussions of why logistic regression should be used in place of ordinary least squares regression are difficult to find. The current paper compares and contrasts the assumptions of ordinary least squares with those of logistic regression and explains why logistic regression's looser assumptions make it adept at handling violations of the more important assumptions in ordinary least squares.
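One commonly cited reason for preferring logistic regression with a binary outcome, which may or may not be the paper's own emphasis, is that an ordinary least squares line can predict "probabilities" outside [0, 1]. The sketch below fits OLS by the closed-form formulas to a toy data set and contrasts it with the logistic function. The data and the logistic coefficients are hypothetical; the coefficients are illustrative and not fitted.

```python
import math

def ols_fit(xs, ys):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def logistic(eta: float) -> float:
    """Inverse logit; always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-eta))

# Toy data: binary outcome that switches from 0 to 1 at x = 5
xs = list(range(10))
ys = [0] * 5 + [1] * 5

intercept, slope = ols_fit(xs, ys)
ols_low = intercept + slope * xs[0]     # fitted value at the smallest x
ols_high = intercept + slope * xs[-1]   # fitted value at the largest x

# Illustrative logistic curve (coefficients chosen by hand, not estimated)
logit_low = logistic(-4.5 + 1.0 * xs[0])
logit_high = logistic(-4.5 + 1.0 * xs[-1])

print(f"OLS fitted values: {ols_low:.3f} and {ols_high:.3f}")
print(f"logistic values:   {logit_low:.3f} and {logit_high:.3f}")
```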
Regression: The Apple Does Not Fall Far From the Tree.
Vetter, Thomas R; Schober, Patrick
2018-05-15
Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.
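Regression toward the mean, mentioned in this tutorial abstract, can be illustrated with a small simulation. The setup is an assumption made for illustration: each subject has a stable true score, and two measurements add independent noise. Subjects selected for extreme first measurements score closer to the overall mean on the second one, with no intervention involved.

```python
import random
from statistics import mean

rng = random.Random(42)
n = 5000

# Hypothetical model: measurement = stable true score + independent noise
true_scores = [rng.gauss(0.0, 1.0) for _ in range(n)]
first = [t + rng.gauss(0.0, 1.0) for t in true_scores]
second = [t + rng.gauss(0.0, 1.0) for t in true_scores]

# Select the 10% of subjects with the highest first measurement
top = sorted(range(n), key=lambda i: first[i], reverse=True)[: n // 10]

first_mean_top = mean(first[i] for i in top)
second_mean_top = mean(second[i] for i in top)
overall_second_mean = mean(second)

print(f"top group, first measurement:  {first_mean_top:.2f}")
print(f"top group, second measurement: {second_mean_top:.2f}")
print(f"everyone, second measurement:  {overall_second_mean:.2f}")
```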
Selenium in irrigated agricultural areas of the western United States
Nolan, B.T.; Clark, M.L.
1997-01-01
A logistic regression model was developed to predict the likelihood that Se exceeds the USEPA chronic criterion for aquatic life (5 µg/L) in irrigated agricultural areas of the western USA. Preliminary analysis of explanatory variables used in the model indicated that surface-water Se concentration increased with increasing dissolved solids (DS) concentration and with the presence of Upper Cretaceous, mainly marine sediment. The presence or absence of Cretaceous sediment was the major variable affecting Se concentration in surface-water samples from the National Irrigation Water Quality Program. Median Se concentration was 14 µg/L in samples from areas underlain by Cretaceous sediments and < 1 µg/L in samples from areas underlain by non-Cretaceous sediments. Wilcoxon rank sum tests indicated that elevated Se concentrations in samples from areas with Cretaceous sediments, irrigated areas, and from closed lakes and ponds were statistically significant. Spearman correlations indicated that Se was positively correlated with a binary geology variable (0.64) and DS (0.45). Logistic regression models indicated that the concentration of Se in surface water was almost certain to exceed the Environmental Protection Agency aquatic-life chronic criterion of 5 µg/L when DS was greater than 3000 mg/L in areas with Cretaceous sediments. The 'best' logistic regression model correctly predicted Se exceedances and nonexceedances 84.4% of the time, and model sensitivity was 80.7%. A regional map of Cretaceous sediment showed the location of potential problem areas. The map and logistic regression model are tools that can be used to determine the potential for Se contamination of irrigated agricultural areas in the western USA.
Using Dominance Analysis to Determine Predictor Importance in Logistic Regression
ERIC Educational Resources Information Center
Azen, Razia; Traxel, Nicole
2009-01-01
This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R² analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…
Genomic-Enabled Prediction of Ordinal Data with Bayesian Logistic Ordinal Regression.
Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Burgueño, Juan; Eskridge, Kent
2015-08-18
Most genomic-enabled prediction models developed so far assume that the response variable is continuous and normally distributed. The exception is the probit model, developed for ordered categorical phenotypes. In statistical applications, because of the easy implementation of the Bayesian probit ordinal regression (BPOR) model, Bayesian logistic ordinal regression (BLOR) is implemented rarely in the context of genomic-enabled prediction [sample size (n) is much smaller than the number of parameters (p)]. For this reason, in this paper we propose a BLOR model using the Pólya-Gamma data augmentation approach that produces a Gibbs sampler with similar full conditional distributions of the BPOR model and with the advantage that the BPOR model is a particular case of the BLOR model. We evaluated the proposed model by using simulation and two real data sets. Results indicate that our BLOR model is a good alternative for analyzing ordinal data in the context of genomic-enabled prediction with the probit or logit link. Copyright © 2015 Montesinos-López et al.
Austin, Peter C; Steyerberg, Ewout W
2012-06-20
When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examined the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in the combined sample of those with and without the condition. Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, and uniform in the entire sample of those with and without the condition. The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population.
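The equal-variance binormal case described above can be checked numerically. The sketch relies on the standard result that, when the explanatory variable is Normal(μ0, σ) in those without the condition and Normal(μ1, σ) in those with it, the log-odds ratio per unit is β = (μ1 − μ0)/σ² and the c-statistic is Φ(σβ/√2). The formula is stated here from that standard derivation, not quoted from the abstract, and the means, standard deviation, and sample sizes are hypothetical.

```python
import math
import random
from statistics import NormalDist

def predicted_c(sigma: float, log_odds_ratio: float) -> float:
    """c-statistic under equal-variance binormality: Phi(sigma * beta / sqrt(2))."""
    return NormalDist().cdf(sigma * log_odds_ratio / math.sqrt(2.0))

def empirical_c(cases, controls) -> float:
    """Share of case-control pairs in which the case has the larger value (ties count 1/2)."""
    wins = sum((x1 > x0) + 0.5 * (x1 == x0) for x1 in cases for x0 in controls)
    return wins / (len(cases) * len(controls))

# Hypothetical simulation settings
rng = random.Random(1)
mu0, mu1, sigma = 0.0, 1.0, 1.0
beta = (mu1 - mu0) / sigma ** 2          # log-odds ratio per unit of x

controls = [rng.gauss(mu0, sigma) for _ in range(1000)]
cases = [rng.gauss(mu1, sigma) for _ in range(1000)]

c_pred = predicted_c(sigma, beta)
c_emp = empirical_c(cases, controls)

print(f"predicted c = {c_pred:.3f}, empirical c = {c_emp:.3f}")
```

Because the predicted value depends on the product σβ, the same odds ratio yields a higher c-statistic in a more heterogeneous population, which is the abstract's closing point.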
Martin, Gary R.; Fowler, Kathleen K.; Arihood, Leslie D.
2016-09-06
Information on low-flow characteristics of streams is essential for the management of water resources. This report provides equations for estimating the 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years and the harmonic-mean flow at ungaged, unregulated stream sites in Indiana. These equations were developed using the low-flow statistics and basin characteristics for 108 continuous-record streamgages in Indiana with at least 10 years of daily mean streamflow data through the 2011 climate year (April 1 through March 31). The equations were developed in cooperation with the Indiana Department of Environmental Management. Regression techniques were used to develop the equations for estimating low-flow frequency statistics and the harmonic-mean flows on the basis of drainage-basin characteristics. A geographic information system was used to measure basin characteristics for selected streamgages. A final set of 25 basin characteristics measured at all the streamgages were evaluated to choose the best predictors of the low-flow statistics. Logistic-regression equations applicable statewide are presented for estimating the probability that selected low-flow frequency statistics equal zero. These equations use the explanatory variables total drainage area, average transmissivity of the full thickness of the unconsolidated deposits within 1,000 feet of the stream network, and latitude of the basin outlet. The percentage of the streamgage low-flow statistics correctly classified as zero or nonzero using the logistic-regression equations ranged from 86.1 to 88.9 percent. Generalized-least-squares regression equations applicable statewide for estimating nonzero low-flow frequency statistics use total drainage area, the average hydraulic conductivity of the top 70 feet of unconsolidated deposits, the slope of the basin, and the index of permeability and thickness of the Quaternary surficial sediments as explanatory variables.
The average standard error of prediction of these regression equations ranges from 55.7 to 61.5 percent. Regional weighted-least-squares regression equations were developed for estimating the harmonic-mean flows by dividing the State into three low-flow regions. The Northern region uses total drainage area and the average transmissivity of the entire thickness of unconsolidated deposits as explanatory variables. The Central region uses total drainage area, the average hydraulic conductivity of the entire thickness of unconsolidated deposits, and the index of permeability and thickness of the Quaternary surficial sediments. The Southern region uses total drainage area and the percent of the basin covered by forest. The average standard error of prediction for these equations ranges from 39.3 to 66.7 percent. The regional regression equations are applicable only to stream sites with low flows unaffected by regulation and to stream sites with drainage basin characteristic values within specified limits. Caution is advised when applying the equations for basins with characteristics near the applicable limits and for basins with karst drainage features and for urbanized basins. Extrapolations near and beyond the applicable basin characteristic limits will have unknown errors that may be large. Equations are presented for use in estimating the 90-percent prediction interval of the low-flow statistics estimated by use of the regression equations at a given stream site. The regression equations are to be incorporated into the U.S. Geological Survey StreamStats Web-based application for Indiana. StreamStats allows users to select a stream site on a map and automatically measure the needed basin characteristics and compute the estimated low-flow statistics and associated prediction intervals.
Dietary consumption patterns and laryngeal cancer risk.
Vlastarakos, Petros V; Vassileiou, Andrianna; Delicha, Evie; Kikidis, Dimitrios; Protopapas, Dimosthenis; Nikolopoulos, Thomas P
2016-06-01
We conducted a case-control study to investigate the effect of diet on laryngeal carcinogenesis. Our study population was made up of 140 participants: 70 patients with laryngeal cancer (LC) and 70 controls with a non-neoplastic condition that was unrelated to diet, smoking, or alcohol. A food-frequency questionnaire determined the mean consumption of 113 different items during the 3 years prior to symptom onset. Total energy intake and cooking mode were also noted. The relative risk, odds ratio (OR), and 95% confidence interval (CI) were estimated by multiple logistic regression analysis. We found that the total energy intake was significantly higher in the LC group (p < 0.001), and that the difference remained statistically significant after logistic regression analysis (p < 0.001; OR: 118.70). Notably, meat consumption was higher in the LC group (p < 0.001), and the difference remained significant after logistic regression analysis (p = 0.029; OR: 1.16). LC patients also consumed significantly more fried food (p = 0.036); this difference also remained significant in the logistic regression model (p = 0.026; OR: 5.45). The LC group also consumed significantly more seafood (p = 0.012); the difference persisted after logistic regression analysis (p = 0.009; OR: 2.48), with the consumption of shrimp proving detrimental (p = 0.049; OR: 2.18). Finally, the intake of zinc was significantly higher in the LC group before and after logistic regression analysis (p = 0.034 and p = 0.011; OR: 30.15, respectively). Cereal consumption (including pastas) was also higher among the LC patients (p = 0.043), with logistic regression analysis showing that their negative effect was possibly associated with the sauces and dressings that traditionally accompany pasta dishes (p = 0.006; OR: 4.78).
Conversely, a higher consumption of dairy products was found in controls (p < 0.05); logistic regression analysis showed that calcium appeared to be protective at the micronutrient level (p < 0.001; OR: 0.27). We found no difference in the overall consumption of fruits and vegetables between the LC patients and controls; however, the LC patients did have a greater consumption of cooked tomatoes and cooked root vegetables (p = 0.039 for both), and the controls had more consumption of leeks (p = 0.042) and, among controls younger than 65 years, cooked beans (p = 0.037). Lemon (p = 0.037), squeezed fruit juice (p = 0.032), and watermelon (p = 0.018) were also more frequently consumed by the controls. Other differences at the micronutrient level included greater consumption by the LC patients of retinol (p = 0.044), polyunsaturated fats (p = 0.041), and linoleic acid (p = 0.008); LC patients younger than 65 years also had greater intake of riboflavin (p = 0.045). We conclude that the differences in dietary consumption patterns between LC patients and controls indicate a possible role for lifestyle modifications involving nutritional factors as a means of decreasing the risk of laryngeal cancer.
Applying Kaplan-Meier to Item Response Data
ERIC Educational Resources Information Center
McNeish, Daniel
2018-01-01
Some IRT models can be equivalently modeled in alternative frameworks such as logistic regression. Logistic regression can also model time-to-event data, which concerns the probability of an event occurring over time. Using the relation between time-to-event models and logistic regression and the relation between logistic regression and IRT, this…
Artificial Neural Network for the Prediction of Chromosomal Abnormalities in Azoospermic Males.
Akinsal, Emre Can; Haznedar, Bulent; Baydilli, Numan; Kalinli, Adem; Ozturk, Ahmet; Ekmekçioğlu, Oğuz
2018-02-04
To evaluate whether an artificial neural network helps to diagnose chromosomal abnormalities in azoospermic males. The data of azoospermic males attending a tertiary academic referral center were evaluated retrospectively. Height, total testicular volume, follicle stimulating hormone, luteinising hormone, total testosterone and ejaculate volume of the patients were used for the analyses. For the artificial neural network, the data of 310 azoospermic males were used as the training set and 115 as the test set. Logistic regression analyses and discriminant analyses were performed for statistical analyses. The tests were re-analysed with a neural network. Both logistic regression analyses and the artificial neural network predicted the presence or absence of chromosomal abnormalities with more than 95% accuracy. The artificial neural network model yielded satisfactory results in distinguishing patients with a chromosomal abnormality from those without.
Costa, Andréa A; Serra-Negra, Júnia M; Bendo, Cristiane B; Pordeus, Isabela A; Paiva, Saul M
2016-01-01
To investigate the impact of wearing a fixed orthodontic appliance on oral health-related quality of life (OHRQoL) among adolescents. A case-control study (1 ∶ 2) was carried out with a population-based randomized sample of 327 adolescents aged 11 to 14 years enrolled at public and private schools in the City of Brumadinho, southeast of Brazil. The case group (n = 109) was made up of adolescents with a high negative impact on OHRQoL, and the control group (n = 218) was made up of adolescents with a low negative impact. The outcome variable was the impact on OHRQoL measured by the Brazilian version of the Child Perceptions Questionnaire (CPQ 11-14) - Impact Short Form (ISF:16). The main independent variable was wearing fixed orthodontic appliances. Malocclusion and the type of school were identified as possible confounding variables. Bivariate and multiple conditional logistic regressions were employed in the statistical analysis. A multiple conditional logistic regression model demonstrated that adolescents wearing fixed orthodontic appliances had a 4.88-fold greater chance of presenting high negative impact on OHRQoL (95% CI: 2.93-8.13; P < .001) than those who did not wear fixed orthodontic appliances. A bivariate conditional logistic regression demonstrated that malocclusion was significantly associated with OHRQoL (P = .017), whereas no statistically significant association was found between the type of school and OHRQoL (P = .108). Adolescents who wore fixed orthodontic appliances had a greater chance of reporting a negative impact on OHRQoL than those who did not wear such appliances.
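The reported odds ratio and its interval can be related back to the underlying regression coefficient. The sketch below assumes the interval is a Wald-type 95% CI, symmetric on the log-odds scale, which the abstract does not state. The helper names `se_from_ci` and `wald_ci` are illustrative and not taken from the study.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Back out the standard error of a log-odds coefficient from a Wald CI.

    Assumes the CI was built as exp(beta +/- z * se).
    """
    return (math.log(upper) - math.log(lower)) / (2.0 * z)

def wald_ci(beta, se, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Values reported in the abstract: OR 4.88, 95% CI 2.93-8.13.
beta = math.log(4.88)          # coefficient on the log-odds scale
se = se_from_ci(2.93, 8.13)    # implied standard error (Wald assumption)
odds_ratio, lower, upper = wald_ci(beta, se)
print(f"beta={beta:.3f} se={se:.3f} OR={odds_ratio:.2f} CI=({lower:.2f}, {upper:.2f})")
```

If the reconstructed bounds agree with the reported ones to rounding, the Wald assumption is at least consistent with the published figures.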
Wang, Lian-Hong; Yan, Jin; Yang, Guo-Li; Long, Shuo; Yu, Yong; Wu, Xi-Lin
2015-04-01
Money boys with inconsistent condom use (less than 100% of the time) are at high risk of infection by human immunodeficiency virus (HIV) or sexually transmitted infection (STI), but relatively little research has examined their risk behaviors. We investigated the prevalence of consistent condom use (100% of the time) and associated factors among money boys. A cross-sectional study using a structured questionnaire was conducted among money boys in Changsha, China, between July 2012 and January 2013. Independent variables included socio-demographic data, substance abuse history, work characteristics, and self-reported HIV and STI history. Dependent variables included consistent condom use with different types of sex partners. Among the participants, 82.4% used condoms consistently with male clients, 80.2% with male sex partners, and 77.1% with female sex partners in the past 3 months. A multiple stepwise logistic regression model identified four statistically significant factors associated with lower likelihoods of consistent condom use with male clients: age group, substance abuse, lack of an "employment" arrangement, and having no HIV test within the prior 6 months. In a similar model, only one factor was significantly associated with lower likelihoods of consistent condom use with male sex partners: having no HIV test within the prior 6 months. As for female sex partners, two variables were statistically significant in the multiple stepwise logistic regression analysis: having no HIV test within the prior 6 months and having STI history. Interventions which are linked with more realistic and acceptable HIV prevention methods are greatly warranted and should increase risk awareness and the behavior of consistent condom use in both commercial and personal relationships. © 2015 International Society for Sexual Medicine.
Jin, Meihua; Yang, Zhongrong; Dong, Zhengquan; Han, Jiankang
2013-12-01
There is growing evidence that men who have sex with men (MSM) are currently a group at high risk of HIV infection in China. Our study aimed to identify the factors affecting consistent condom use among MSM recruited through the Internet in Huzhou city. An anonymous cross-sectional study was conducted by recruiting 410 MSM living in Huzhou city via the Internet. The socio-demographic profiles (age, education level, employment status, etc.) and sexual risk behaviors of the respondents were investigated. Bivariate logistic regression analyses were performed to compare the differences between consistent condom users and inconsistent condom users. Variables with significant between-group differences in the bivariate analyses were used as candidate variables in a stepwise multivariate logistic regression model. All statistical analyses were performed using SPSS for Windows 17.0, and a p value < 0.05 was considered to be statistically significant. According to their condom use, sixty-eight respondents were classified into two groups: consistent condom users and inconsistent condom users. Multivariate logistic regression showed that respondents who had a comprehensive knowledge of HIV (OR = 4.08, 95% CI: 1.85-8.99), who had sex with male sex workers (OR = 15.30, 95% CI: 5.89-39.75) and who had not drunk alcohol before sex (OR = 3.10, 95% CI: 1.38-6.95) were more likely to be consistent condom users. Consistent condom use among MSM was associated with comprehensive knowledge of HIV and a lack of alcohol use before sexual contact. As a result, reducing alcohol consumption and enhancing education regarding the risks of HIV among sexually active MSM would be effective in preventing HIV transmission.
Timmermans, Luc; Falez, Freddy; Mélot, Christian; Wespes, Eric
2013-09-01
A urinary incontinence impairment rating must be a highly accurate, non-invasive exploration of the condition using International Classification of Functioning (ICF)-based assessment tools. The objective of this study was to identify the best evaluation test and to determine an impairment rating model of urinary incontinence. In performing a cross-sectional study comparing successive urodynamic tests using both the International Consultation on Incontinence Questionnaire-Urinary Incontinence-Short Form (ICIQ-UI-SF) and the 1-hr pad-weighing test in 120 patients, we performed statistical likelihood ratio analysis and used logistic regression to calculate the probability of urodynamic incontinence using the most significant independent predictors. Subsequently, we created a template that was based on the significant predictors and the probability of urodynamic incontinence. The mean ICIQ-UI-SF score was 13.5 ± 4.6, and the median pad test value was 8 g. The discrimination statistic (receiver operating characteristic) described how well the urodynamic observations matched the ICIQ-UI-SF scores (area under the curve (UDA): 0.689) and the pad test data (UDA: 0.693). Using logistic regression analysis, we demonstrated that the best independent predictors of urodynamic incontinence were the patient's age and the ICIQ-UI-SF score. The logistic regression model permitted us to construct an equation to determine the probability of urodynamic incontinence. Using these tools, we created a template to generate a probability index of urodynamic urinary incontinence. Using this probability index, relative to the patient and to the maximum impairment of the whole person (MIWP) relative to urinary incontinence, we were able to calculate a patient's permanent impairment. Copyright © 2012 Wiley Periodicals, Inc.
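A probability equation of the kind described above can be sketched as follows. The abstract names the two predictors (age and ICIQ-UI-SF score) but does not report the fitted coefficients, so the intercept and slopes below are purely hypothetical placeholders, as is the function name `incontinence_probability`.

```python
import math

def incontinence_probability(age, iciq_score, intercept, b_age, b_iciq):
    """Logistic model: p = 1 / (1 + exp(-(intercept + b_age*age + b_iciq*score)))."""
    linear_predictor = intercept + b_age * age + b_iciq * iciq_score
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical coefficients for illustration only; NOT the study's estimates.
INTERCEPT, B_AGE, B_ICIQ = -4.0, 0.03, 0.15

# A 60-year-old patient with the mean ICIQ-UI-SF score reported in the abstract.
p = incontinence_probability(60, 13.5, INTERCEPT, B_AGE, B_ICIQ)
print(f"estimated probability: {p:.3f}")
```

With real coefficients substituted, the same function would yield the probability index that the template in the abstract is built on.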
Ioannidis, J P; McQueen, P G; Goedert, J J; Kaslow, R A
1998-03-01
Complex immunogenetic associations of disease involving a large number of gene products are difficult to evaluate with traditional statistical methods and may require complex modeling. The authors evaluated the performance of feed-forward backpropagation neural networks in predicting rapid progression to acquired immunodeficiency syndrome (AIDS) for patients with human immunodeficiency virus (HIV) infection on the basis of major histocompatibility complex variables. Networks were trained on data from patients from the Multicenter AIDS Cohort Study (n = 139) and then validated on patients from the DC Gay cohort (n = 102). The outcome of interest was rapid disease progression, defined as progression to AIDS in <6 years from seroconversion. Human leukocyte antigen (HLA) variables were selected as network inputs with multivariate regression and a previously described algorithm selecting markers with extreme point estimates for progression risk. Network performance was compared with that of logistic regression. Networks with 15 HLA inputs and a single hidden layer of five nodes achieved a sensitivity of 87.5% and specificity of 95.6% in the training set, vs. 77.0% and 76.9%, respectively, achieved by logistic regression. When validated on the DC Gay cohort, networks averaged a sensitivity of 59.1% and specificity of 74.3%, vs. 53.1% and 61.4%, respectively, for logistic regression. Neural networks offer further support to the notion that HIV disease progression may be dependent on complex interactions between different class I and class II alleles and transporters associated with antigen processing variants. The effect in the current models is of moderate magnitude, and more data as well as other host and pathogen variables may need to be considered to improve the performance of the models. Artificial intelligence methods may complement linear statistical methods for evaluating immunogenetic associations of disease.
Ren, Anna N; Neher, Robert E; Bell, Tyler; Grimm, James
2018-06-01
Preoperative planning is important to achieve successful implantation in primary total knee arthroplasty (TKA). However, traditional TKA templating techniques are not accurate enough to predict the component size to a very close range. With the goal of developing a general predictive statistical model using patient demographic information, ordinal logistic regression was applied to build a proportional odds model to predict the tibia component size. The study retrospectively collected the data of 1992 primary Persona Knee System TKA procedures. Of them, 199 procedures were randomly selected as testing data and the rest of the data were randomly partitioned between model training data and model evaluation data with a ratio of 7:3. Different models were trained and evaluated on the training and validation data sets after data exploration. The final model had patient gender, age, weight, and height as independent variables and predicted the tibia size within 1 size difference 96% of the time on the validation data, 94% of the time on the testing data, and 92% on a prospective cadaver data set. The study results indicated the statistical model built by ordinal logistic regression can increase the accuracy of tibia sizing information for Persona Knee preoperative templating. This research shows statistical modeling may be used with radiographs to dramatically enhance the templating accuracy, efficiency, and quality. In general, this methodology can be applied to other TKA products when the data are applicable. Copyright © 2018 Elsevier Inc. All rights reserved.
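As a rough illustration of the proportional odds model mentioned above, the sketch below turns a linear predictor and a set of ordered thresholds into one probability per size category, using P(Y ≤ j) = 1 / (1 + exp(-(θ_j - x·β))). The coefficients, thresholds, and standardised inputs are invented for the example; the abstract does not report the fitted values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def category_probabilities(x, beta, thresholds):
    """Proportional odds model: probabilities for len(thresholds) + 1 ordered categories.

    thresholds must be strictly increasing; the same beta applies to every cut point.
    """
    eta = sum(b * v for b, v in zip(beta, x))
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for c in cumulative:
        probabilities.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j-1)
        previous = c
    return probabilities

# Hypothetical model with two standardised predictors (say, height and weight).
BETA = [0.8, 0.5]
THRESHOLDS = [-2.0, -0.5, 1.0, 2.5]   # five ordered component sizes

small_patient = category_probabilities([-1.5, -1.0], BETA, THRESHOLDS)
large_patient = category_probabilities([1.5, 1.0], BETA, THRESHOLDS)
predicted_small = small_patient.index(max(small_patient))
predicted_large = large_patient.index(max(large_patient))
print(predicted_small, predicted_large)
```

A "within 1 size" accuracy such as the one quoted in the abstract would then count a prediction as correct when the predicted index differs from the implanted size by at most one.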
Worku, Yohannes; Muchie, Mammo
2012-01-01
Objective. The objective was to investigate factors that affect the efficient management of solid waste produced by commercial businesses operating in the city of Pretoria, South Africa. Methods. Data was gathered from 1,034 businesses. Efficiency in solid waste management was assessed by using a structural time-based model designed for evaluating efficiency as a function of the length of time required to manage waste. Data analysis was performed using statistical procedures such as frequency tables, Pearson's chi-square tests of association, and binary logistic regression analysis. Odds ratios estimated from logistic regression analysis were used for identifying key factors that affect efficiency in the proper disposal of waste. Results. The study showed that 857 of the 1,034 businesses selected for the study (83%) were found to be efficient enough with regards to the proper collection and disposal of solid waste. Based on odds ratios estimated from binary logistic regression analysis, efficiency in the proper management of solid waste was significantly influenced by 4 predictor variables. These 4 influential predictor variables are lack of adherence to waste management regulations, wrong perception, failure to provide customers with enough trash cans, and operation of businesses by employed managers, in a decreasing order of importance. PMID:23209483
NASA Astrophysics Data System (ADS)
Lin, Yingzhi; Deng, Xiangzheng; Li, Xing; Ma, Enjun
2014-12-01
Spatially explicit simulation of land use change is the basis for estimating the effects of land use and cover change on energy fluxes, ecology and the environment. At the pixel level, logistic regression is one of the most common approaches used in spatially explicit land use allocation models to determine the relationship between land use and its causal factors in driving land use change, and thereby to evaluate land use suitability. However, these models have a drawback in that they do not determine/allocate land use based on the direct relationship between land use change and its driving factors. Consequently, a multinomial logistic regression method was introduced to address this flaw, and thereby, judge the suitability of a type of land use in any given pixel in a case study area of the Jiangxi Province, China. A comparison of the two regression methods indicated that the proportion of correctly allocated pixels using multinomial logistic regression was 92.98%, which was 8.47% higher than that obtained using logistic regression. Paired t-test results also showed that pixels were more clearly distinguished by multinomial logistic regression than by logistic regression. In conclusion, multinomial logistic regression is a more efficient and accurate method for the spatial allocation of land use changes. The application of this method in future land use change studies may improve the accuracy of predicting the effects of land use and cover change on energy fluxes, ecology, and environment.
The use of generalized estimating equations in the analysis of motor vehicle crash data.
Hutchings, Caroline B; Knight, Stacey; Reading, James C
2003-01-01
The purpose of this study was to determine if it is necessary to use generalized estimating equations (GEEs) in the analysis of seat belt effectiveness in preventing injuries in motor vehicle crashes. The 1992 Utah crash dataset was used, excluding crash participants where seat belt use was not appropriate (n=93,633). The model used in the 1996 Report to Congress [Report to congress on benefits of safety belts and motorcycle helmets, based on data from the Crash Outcome Data Evaluation System (CODES). National Center for Statistics and Analysis, NHTSA, Washington, DC, February 1996] was analyzed for all occupants with logistic regression, one level of nesting (occupants within crashes), and two levels of nesting (occupants within vehicles within crashes) to compare the use of GEEs with logistic regression. When using one level of nesting compared to logistic regression, 13 of 16 variance estimates changed more than 10%, and eight of 16 parameter estimates changed more than 10%. In addition, three of the independent variables changed from significant to insignificant (alpha=0.05). With the use of two levels of nesting, two of 16 variance estimates and three of 16 parameter estimates changed more than 10% from the variance and parameter estimates in one level of nesting. One of the independent variables changed from insignificant to significant (alpha=0.05) in the two levels of nesting model; therefore, only two of the independent variables changed from significant to insignificant when the logistic regression model was compared to the two levels of nesting model. The odds ratio of seat belt effectiveness in preventing injuries was 12% lower when a one-level nested model was used. Based on these results, we stress the need to use a nested model and GEEs when analyzing motor vehicle crash data.
Chung, Doo Yong; Cho, Kang Su; Lee, Dae Hun; Han, Jang Hee; Kang, Dong Hyuk; Jung, Hae Do; Kown, Jong Kyou; Ham, Won Sik; Choi, Young Deuk; Lee, Joo Yong
2015-01-01
Purpose: This study was conducted to evaluate colic pain as a prognostic pretreatment factor that can influence ureter stone clearance and to estimate the probability of stone-free status in shock wave lithotripsy (SWL) patients with a ureter stone. Materials and Methods: We retrospectively reviewed the medical records of 1,418 patients who underwent their first SWL between 2005 and 2013. Among these patients, 551 had a ureter stone measuring 4–20 mm and were thus eligible for our analyses. Colic pain as the chief complaint was defined as subjective flank pain during history taking and physical examination. Propensity scores for colic pain were calculated for each patient using multivariate logistic regression based upon the following covariates: age, maximal stone length (MSL), and mean stone density (MSD). Each factor was evaluated as a predictor of stone-free status using Bayesian and non-Bayesian logistic regression models. Results: After propensity-score matching, 217 patients were extracted in each group from the total patient cohort. There were no statistical differences in the variables used in propensity-score matching. One-session success and stone-free rates were also higher in the painful group (73.7% and 71.0%, respectively) than in the painless group (63.6% and 60.4%, respectively). In multivariate non-Bayesian and Bayesian logistic regression models, a painful stone, shorter MSL, and lower MSD were significant factors for one-session stone-free status in patients who underwent SWL. Conclusions: Colic pain in patients with ureter calculi was, together with MSL and MSD, a significant predictor of one-session stone-free status after SWL. PMID:25902059
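The matching step can be pictured with a small sketch. The abstract does not say which matching algorithm was used, so the greedy 1:1 nearest-neighbour rule with a caliper shown here is an assumption, and the patient identifiers and scores are made up. In practice the scores would come from the logistic regression on age, MSL, and MSD.

```python
def match_on_propensity(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, control: dicts mapping a patient id to a propensity score.
    Returns a list of (treated_id, control_id) pairs whose scores differ
    by at most `caliper`.
    """
    available = dict(control)
    pairs = []
    # Process treated patients in order of increasing score.
    for t_id, t_score in sorted(treated.items(), key=lambda item: item[1]):
        if not available:
            break
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]   # each control is used at most once
    return pairs

# Made-up scores: "painful" patients are matched to "painless" ones.
painful = {"p1": 0.30, "p2": 0.62, "p3": 0.90}
painless = {"c1": 0.28, "c2": 0.60, "c3": 0.33, "c4": 0.10}
matched = match_on_propensity(painful, painless)
print(matched)
```

Patients without a sufficiently close counterpart (here `p3`) are dropped, which is why a matched cohort is smaller than the eligible cohort.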
NASA Technical Reports Server (NTRS)
Smith, Kelly M.; Gay, Robert S.; Stachowiak, Susan J.
2013-01-01
In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter. In order to increase overall robustness, the vehicle also has an alternate method of triggering the drogue parachute deployment based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this velocity-based trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers excellent performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.
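The two-stage idea described above (fit a classifier on the ground, then evaluate it cheaply in flight) can be sketched with a one-feature logistic regression trained by batch gradient descent. Everything in the example is an assumption made for illustration: the synthetic data, the single noisy measurement, the noise level, and the function names. None of it reflects the actual Orion flight software or the paper's feature set.

```python
import math
import random

def sigmoid(z):
    # Clamp extreme inputs so math.exp cannot overflow.
    if z < -60.0:
        return 0.0
    if z > 60.0:
        return 1.0
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(samples, labels, learning_rate=1.0, iterations=500):
    """Ground processing: fit weight w and bias b by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(iterations):
        grad_w = grad_b = 0.0
        for x, y in zip(samples, labels):
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b

def should_deploy(measurement, w, b, threshold=0.5):
    """In-flight evaluation: one multiply-add and one sigmoid per call."""
    return sigmoid(w * measurement + b) >= threshold

# Synthetic training set: normalised offset from the target deployment
# condition; deployment is desired once the true offset is at or below zero.
random.seed(0)
true_offsets = [random.uniform(-1.0, 1.0) for _ in range(400)]
labels = [1 if offset <= 0.0 else 0 for offset in true_offsets]
measurements = [offset + random.gauss(0.0, 0.1) for offset in true_offsets]

w, b = train_logistic(measurements, labels)
accuracy = sum(
    should_deploy(m, w, b) == bool(y) for m, y in zip(measurements, labels)
) / len(labels)
print(f"w={w:.2f} b={b:.2f} training accuracy={accuracy:.3f}")
```

Only `w`, `b`, and `should_deploy` would need to live on board in such a scheme; the training loop stays on the ground.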
ERIC Educational Resources Information Center
Yamaguchi, Kazuo
2016-01-01
This article describes (1) the survey methodological and statistical characteristics of the nonrandomized method for surveying sensitive questions for both cross-sectional and panel survey data and (2) the way to use the incompletely observed variable obtained from this survey method in logistic regression and in loglinear and log-multiplicative…
Statistical modeling of landslide hazard using GIS
Peter V. Gorsevski; Randy B. Foltz; Paul E. Gessler; Terrance W. Cundy
2001-01-01
A model for spatial prediction of landslide hazard was applied to a watershed affected by landslide events that occurred during the winter of 1995-96, following heavy rains, and snowmelt. Digital elevation data with 22.86 m x 22.86 m resolution was used for deriving topographic attributes used for modeling. The model is based on the combination of logistic regression...
An Exploration of Teacher Attrition and Mobility in High Poverty Racially Segregated Schools
ERIC Educational Resources Information Center
Djonko-Moore, Cara M.
2016-01-01
The purpose of this study was to examine the mobility (movement to a new school) and attrition (quitting teaching) patterns of teachers in high poverty, racially segregated (HPRS) schools in the US. Using 2007-9 survey data from the National Center for Education Statistics, a multi-level multinomial logistic regression was performed to examine the…
Howard B. Stauffer; Cynthia J. Zabel; Jeffrey R. Dunk
2005-01-01
We compared a set of competing logistic regression habitat selection models for Northern Spotted Owls (Strix occidentalis caurina) in California. The habitat selection models were estimated, compared, evaluated, and tested using multiple sample datasets collected on federal forestlands in northern California. We used Bayesian methods in interpreting...
ERIC Educational Resources Information Center
Duncan, Amie W.; Bishop, Somer L.
2015-01-01
Daily living skills standard scores on the Vineland Adaptive Behavior Scales-2nd edition were examined in 417 adolescents from the Simons Simplex Collection. All participants had at least average intelligence and a diagnosis of autism spectrum disorder. Descriptive statistics and binary logistic regressions were used to examine the prevalence and…
AP® Potential Predicted by PSAT/NMSQT® Scores Using Logistic Regression. Statistical Report 2014-1
ERIC Educational Resources Information Center
Zhang, Xiuyuan; Patel, Priyank; Ewing, Maureen
2014-01-01
AP Potential™ is an educational guidance tool that uses PSAT/NMSQT® scores to identify students who have the potential to do well on one or more Advanced Placement® (AP®) Exams. Students identified as having AP potential, perhaps students who would not have been otherwise identified, should consider enrolling in the corresponding AP course if they…
ERIC Educational Resources Information Center
Porter, Stephen R.
Annual funds face pressures to contact all alumni to maximize participation, but these efforts are costly. This paper uses a logistic regression model to predict likely donors among alumni from the College of Arts & Humanities at the University of Maryland, College Park. Alumni were grouped according to their predicted probability of donating…
Board, Amy R; Suzuki, Sumihiro
2016-01-01
Previous research has documented that parasite infection may increase vulnerability to TB among certain at risk populations. The purpose of this study was to identify whether an association exists between latent tuberculosis infection (LTBI) and intestinal parasite infection among newly resettled refugees in Texas while controlling for additional effects of region of origin, age and sex. Data for all refugees screened for both TB and intestinal parasites between January 2010 and mid-October 2013 were obtained from the Texas Refugee Health Screening Program and were analyzed using logistic regression. A total of 9860 refugees were included. In multivariable logistic regression analysis, pathogenic and non-pathogenic intestinal parasite infections yielded statistically significant reduced odds of LTBI. However, when individual parasite species were analyzed, hookworm infection indicated statistically significant increased odds of LTBI (OR 1.674, CI: 1.126-2.488). A positive association exists between hookworm infection and LTBI in newly arrived refugees to Texas. More research is needed to assess the nature and extent of these associations. © The Author 2015. Published by Oxford University Press on behalf of Royal Society of Tropical Medicine and Hygiene. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
WINPEPI updated: computer programs for epidemiologists, and their teaching potential
2011-01-01
Background: The WINPEPI computer programs for epidemiologists are designed for use in practice and research in the health field and as learning or teaching aids. The programs are free, and can be downloaded from the Internet. Numerous additions have been made in recent years. Implementation: There are now seven WINPEPI programs: DESCRIBE, for use in descriptive epidemiology; COMPARE2, for use in comparisons of two independent groups or samples; PAIRSetc, for use in comparisons of paired and other matched observations; LOGISTIC, for logistic regression analysis; POISSON, for Poisson regression analysis; WHATIS, a "ready reckoner" utility program; and ETCETERA, for miscellaneous other procedures. The programs now contain 122 modules, each of which provides a number, sometimes a large number, of statistical procedures. The programs are accompanied by a Finder that indicates which modules are appropriate for different purposes. The manuals explain the uses, limitations and applicability of the procedures, and furnish formulae and references. Conclusions: WINPEPI is a handy resource for a wide variety of statistical routines used by epidemiologists. Because of its ready availability, portability, ease of use, and versatility, WINPEPI has a considerable potential as a learning and teaching aid, both with respect to practical procedures in the planning and analysis of epidemiological studies, and with respect to important epidemiological concepts. It can also be used as an aid in the teaching of general basic statistics. PMID:21288353
Lewis, Kristin Nicole; Heckman, Bernadette Davantes; Himawan, Lina
2011-08-01
Growth mixture modeling (GMM) identified latent groups based on treatment outcome trajectories of headache disability measures in patients in headache subspecialty treatment clinics. Using a longitudinal design, 219 patients in headache subspecialty clinics in 4 large cities throughout Ohio provided data on their headache disability at pretreatment and 3 follow-up assessments. GMM identified 3 treatment outcome trajectory groups: (1) patients who initiated treatment with elevated disability levels and who reported statistically significant reductions in headache disability (high-disability improvers; 11%); (2) patients who initiated treatment with elevated disability but who reported no reductions in disability (high-disability nonimprovers; 34%); and (3) patients who initiated treatment with moderate disability and who reported statistically significant reductions in headache disability (moderate-disability improvers; 55%). Based on the final multinomial logistic regression model, a dichotomized treatment appointment attendance variable was a statistically significant predictor for differentiating high-disability improvers from high-disability nonimprovers. Three-fourths of patients who initiated treatment with elevated disability levels did not report reductions in disability after 5 months of treatment with new preventive pharmacotherapies. Preventive headache agents may be most efficacious for patients with moderate levels of disability and for patients with high disability levels who attend all treatment appointments. Copyright © 2011 International Association for the Study of Pain. Published by Elsevier B.V. All rights reserved.
Logistic Regression in the Identification of Hazards in Construction
NASA Astrophysics Data System (ADS)
Drozd, Wojciech
2017-10-01
The construction site and its elements create circumstances that are conducive to the formation of risks to safety during the execution of works. Analysis indicates the critical importance of these factors in the set of characteristics that describe the causes of accidents in the construction industry. This article attempts to analyse the characteristics related to the construction site, in order to indicate their importance in defining the circumstances of accidents at work. The study includes sites inspected in 2014 - 2016 by the employees of the District Labour Inspectorate in Krakow (Poland). The analysed set of detailed (disaggregated) data includes both quantitative and qualitative characteristics. The substantive task focused on classification modelling in the identification of hazards in construction and identifying those of the analysed characteristics that are important in an accident. In terms of methodology, resource data analysis using statistical classifiers, in the form of logistic regression, was the method used.
NASA Astrophysics Data System (ADS)
Sánchez, Clara I.; Hornero, Roberto; Mayo, Agustín; García, María
2009-02-01
Diabetic Retinopathy is one of the leading causes of blindness and vision defects in developed countries. An early detection and diagnosis is crucial to avoid visual complication. Microaneurysms are the first ocular signs of the presence of this ocular disease. Their detection is of paramount importance for the development of a computer-aided diagnosis technique which permits a prompt diagnosis of the disease. However, the detection of microaneurysms in retinal images is a difficult task due to the wide variability that these images usually present in screening programs. We propose a statistical approach based on mixture model-based clustering and logistic regression which is robust to the changes in the appearance of retinal fundus images. The method is evaluated on the public database proposed by the Retinal Online Challenge in order to obtain an objective performance measure and to allow a comparative study with other proposed algorithms.
Lee, Gyu-Young; Choi, Yun-Jung
2015-08-01
In a cross-sectional research design, we investigated factors related to suicidal ideation in adolescents using data from the 2013 Online Survey of Youth Health Behavior in Korea. This self-report questionnaire was administered to 72,435 adolescents aged 13-18 years in middle and high school. School characteristics, family characteristics, and mental health variables were analyzed using descriptive statistics, χ² tests, and logistic regression. Both suicidal ideation and behavior were more common in girls. Suicidal ideation was most common in 11th grade for boys and 8th grade for girls. Across the sample, in logistic regression, suicidal ideation was predicted by low socioeconomic status, high stress, inadequate sleep, substance use, alcohol use, and smoking. Living apart from family predicted suicidal ideation in boys but not in girls. Gender- and school-grade-specific intervention programs may be useful for reducing suicidal ideation in students. © 2015 Wiley Periodicals, Inc.
Binary logistic regression modelling: Measuring the probability of relapse cases among drug addict
NASA Astrophysics Data System (ADS)
Ismail, Mohd Tahir; Alias, Siti Nor Shadila
2014-07-01
For many years, Malaysia has faced drug addiction issues. The most serious is the relapse phenomenon among treated drug addicts (those who have undergone the rehabilitation programme at the Narcotic Addiction Rehabilitation Centre, PUSPEN). Thus, the main objective of this study is to find the most significant factors that contribute to relapse. Binary logistic regression analysis was employed to model the relationship between the independent variables (predictors) and the dependent variable. The dependent variable is the status of the drug addict: relapse (Yes, coded as 1) or not (No, coded as 0). The predictors are age, age at first taking drugs, family history, education level, family crisis, community support and self-motivation. The sample comprises 200 cases, with data provided by AADK (National Antidrug Agency). The study found that age and self-motivation are statistically significant predictors of relapse.
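As a minimal sketch of the kind of model this abstract describes, the Python below fits a binary logistic regression with a 0/1 outcome and a single predictor by Newton-Raphson. The data are invented for illustration and do not come from the study; the function name `fit_logistic` and the use of one binary predictor are assumptions made to keep the example short.

```python
import math


def fit_logistic(xs, ys, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson (illustrative helper)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the information matrix X'WX
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (X'WX) * delta = gradient and update.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: x = 1 marks a risk factor, y = 1 marks relapse.
xs = [1] * 30 + [0] * 30
ys = [1] * 20 + [0] * 10 + [1] * 10 + [0] * 20

intercept, slope = fit_logistic(xs, ys)
print("intercept:", round(intercept, 4))
print("slope (log odds ratio):", round(slope, 4))
print("odds ratio:", round(math.exp(slope), 4))
```

With a single binary predictor, the fitted slope equals the log of the odds ratio from the corresponding 2x2 table, which gives a quick way to check the fit by hand.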
Real, Jordi; Forné, Carles; Roso-Llorach, Albert; Martínez-Sánchez, Jose M
2016-05-01
Controlling for confounders is a crucial step in analytical observational studies, and multivariable models are widely used as statistical adjustment techniques. However, the validation of the assumptions of the multivariable regression models (MRMs) should be made clear in scientific reporting. The objective of this study is to review the quality of statistical reporting of the most commonly used MRMs (logistic, linear, and Cox regression) that were applied in analytical observational studies published between 2003 and 2014 by journals indexed in MEDLINE. Review of a representative sample of articles indexed in MEDLINE (n = 428) with observational design and use of MRMs (logistic, linear, and Cox regression). We assessed the quality of reporting about: model assumptions and goodness-of-fit, interactions, sensitivity analysis, crude and adjusted effect estimate, and specification of more than 1 adjusted model. The tests of underlying assumptions or goodness-of-fit of the MRMs used were described in 26.2% (95% CI: 22.0-30.3) of the articles and 18.5% (95% CI: 14.8-22.1) reported the interaction analysis. Reporting of all items assessed was higher in articles published in journals with a higher impact factor. A low percentage of articles indexed in MEDLINE that used multivariable techniques provided information demonstrating rigorous application of the model selected as an adjustment method. Given the importance of these methods to the final results and conclusions of observational studies, greater rigor is required in reporting the use of MRMs in the scientific literature.
Standards for Standardized Logistic Regression Coefficients
ERIC Educational Resources Information Center
Menard, Scott
2011-01-01
Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…
Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E
2013-06-01
Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework. 
© 2013, The International Biometric Society.
Ngo, Long H; Inouye, Sharon K; Jones, Richard N; Travison, Thomas G; Libermann, Towia A; Dillon, Simon T; Kuchel, George A; Vasunilashorn, Sarinnapha M; Alsop, David C; Marcantonio, Edward R
2017-06-06
The nested case-control study (NCC) design within a prospective cohort study is used when outcome data are available for all subjects, but the exposure of interest has not been collected, and is difficult or prohibitively expensive to obtain for all subjects. A NCC analysis with good matching procedures yields estimates that are as efficient and unbiased as estimates from the full cohort study. We present methodological considerations in a matched NCC design and analysis, which include the choice of match algorithms, analysis methods to evaluate the association of exposures of interest with outcomes, and consideration of overmatching. Matched, NCC design within a longitudinal observational prospective cohort study in the setting of two academic hospitals. Study participants are patients aged over 70 years who underwent scheduled major non-cardiac surgery. The primary outcome was postoperative delirium from in-hospital interviews and medical record review. The main exposure was IL-6 concentration (pg/ml) from blood sampled at three time points before delirium occurred. We used a nonparametric signed rank test to test for the median of the paired differences. We used conditional logistic regression to model the risk of IL-6 on delirium incidence. Simulation was used to generate a sample of cohort data on which unconditional multivariable logistic regression was used, and the results were compared to those of the conditional logistic regression. Partial R-square was used to assess the level of overmatching. We found that the optimal match algorithm yielded more matched pairs than the greedy algorithm. The choice of analytic strategy (whether to consider measured cytokine levels as the predictor or outcome) yielded inferences that have different clinical interpretations but similar levels of statistical significance.
Estimation results from NCC design using conditional logistic regression, and from simulated cohort design using unconditional logistic regression, were similar. We found minimal evidence for overmatching. Using a matched NCC approach introduces methodological challenges into the study design and data analysis. Nonetheless, with careful selection of the match algorithm, match factors, and analysis methods, this design is cost effective and, for our study, yields estimates that are similar to those from a prospective cohort study design.
Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson
2010-01-01
Summary. Objective: Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this Review was to assess machine learning alternatives to logistic regression which may accomplish the same goals but with fewer assumptions or greater accuracy. Study Design and Setting: We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. Results: We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (CART), and meta-classifiers (in particular, boosting). Conclusion: While the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and to a lesser extent decision trees (particularly CART) appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. PMID:20630332
Fungible weights in logistic regression.
Jones, Jeff A; Waller, Niels G
2016-06-01
In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson
2010-08-01
Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Perez, Ivan; Chavez, Allison K; Ponce, Dario
2016-01-01
The Ricketts' posteroanterior (PA) cephalometry seems to be the most widely used and it has not been tested by multivariate statistics for sex determination. The objective was to determine the applicability of Ricketts' PA cephalometry for sex determination using the logistic regression analysis. The logistic models were estimated at distinct age cutoffs (all ages, 11 years, 13 years, and 15 years) in a database from 1,296 Hispano American Peruvians between 5 years and 44 years of age. The logistic models were composed by six cephalometric measurements; the accuracy achieved by resubstitution varied between 60% and 70% and all the variables, with one exception, exhibited a direct relationship with the probability of being classified as male; the nasal width exhibited an indirect relationship. The maxillary and facial widths were present in all models and may represent a sexual dimorphism indicator. The accuracy found was lower than the literature and the Ricketts' PA cephalometry may not be adequate for sex determination. The indirect relationship of the nasal width in models with data from patients of 12 years of age or less may be a trait related to age or a characteristic in the studied population, which could be better studied and confirmed.
Should metacognition be measured by logistic regression?
Rausch, Manuel; Zehetleitner, Michael
2017-03-01
Are logistic regression slopes suitable to quantify metacognitive sensitivity, i.e. the efficiency with which subjective reports differentiate between correct and incorrect task responses? We analytically show that logistic regression slopes are independent from rating criteria in one specific model of metacognition, which assumes (i) that rating decisions are based on sensory evidence generated independently of the sensory evidence used for primary task responses and (ii) that the distributions of evidence are logistic. Given a hierarchical model of metacognition, logistic regression slopes depend on rating criteria. According to all considered models, regression slopes depend on the primary task criterion. A reanalysis of previous data revealed that massive numbers of trials are required to distinguish between hierarchical and independent models with tolerable accuracy. It is argued that researchers who wish to use logistic regression as measure of metacognitive sensitivity need to control the primary task criterion and rating criteria. Copyright © 2017 Elsevier Inc. All rights reserved.
Sauzet, Odile; Peacock, Janet L
2017-07-20
The analysis of perinatal outcomes often involves datasets with some multiple births. These are datasets mostly formed of independent observations and a limited number of clusters of size two (twins) and maybe of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes but very little is known about their reliability when only a limited number of small clusters are present. Using simulated data based on a dataset of preterm infants we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for the logistic random intercept models and generalised estimating equations were compared. The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters but a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small and will provide similar estimates to logistic regression. The method which seems to provide the best balance between estimation of the standard error and the parameter for any percentage of twins is the generalised estimating equations. This study has shown that the number of covariates or the level two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins but when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.
Neck-focused panic attacks among Cambodian refugees; a logistic and linear regression analysis.
Hinton, Devon E; Chhean, Dara; Pich, Vuth; Um, Khin; Fama, Jeanne M; Pollack, Mark H
2006-01-01
Consecutive Cambodian refugees attending a psychiatric clinic were assessed for the presence and severity of current--i.e., at least one episode in the last month--neck-focused panic. Among the whole sample (N=130), in a logistic regression analysis, the Anxiety Sensitivity Index (ASI; odds ratio=3.70) and the Clinician-Administered PTSD Scale (CAPS; odds ratio=2.61) significantly predicted the presence of current neck panic (NP). Among the neck panic patients (N=60), in the linear regression analysis, NP severity was significantly predicted by NP-associated flashbacks (beta=.42), NP-associated catastrophic cognitions (beta=.22), and CAPS score (beta=.28). Further analysis revealed the effect of the CAPS score to be significantly mediated (Sobel test [Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182]) by both NP-associated flashbacks and catastrophic cognitions. In the care of traumatized Cambodian refugees, NP severity, as well as NP-associated flashbacks and catastrophic cognitions, should be specifically assessed and treated.
London Measure of Unplanned Pregnancy: guidance for its use as an outcome measure
Hall, Jennifer A; Barrett, Geraldine; Copas, Andrew; Stephenson, Judith
2017-01-01
Background: The London Measure of Unplanned Pregnancy (LMUP) is a psychometrically validated measure of the degree of intention of a current or recent pregnancy. The LMUP is increasingly being used worldwide, and can be used to evaluate family planning or preconception care programs. However, beyond recommending the use of the full LMUP scale, there is no published guidance on how to use the LMUP as an outcome measure. Ordinal logistic regression has been recommended informally, but studies published to date have all used binary logistic regression and dichotomized the scale at different cut points. There is thus a need for evidence-based guidance to provide a standardized methodology for multivariate analysis and to enable comparison of results. This paper makes recommendations for the regression method for analysis of the LMUP as an outcome measure. Materials and methods: Data collected from 4,244 pregnant women in Malawi were used to compare five regression methods: linear, logistic with two cut points, and ordinal logistic with either the full or grouped LMUP score. The recommendations were then tested on the original UK LMUP data. Results: There were small but no important differences in the findings across the regression models. Logistic regression resulted in the largest loss of information, and assumptions were violated for the linear and ordinal logistic regression. Consequently, robust standard errors were used for linear regression and a partial proportional odds ordinal logistic regression model attempted. The latter could only be fitted for grouped LMUP score. Conclusion: We recommend the linear regression model with robust standard errors to make full use of the LMUP score when analyzed as an outcome measure. Ordinal logistic regression could be considered, but a partial proportional odds model with grouped LMUP score may be required. Logistic regression is the least-favored option, due to the loss of information.
For logistic regression, the cut point for un/planned pregnancy should be between nine and ten. These recommendations will standardize the analysis of LMUP data and enhance comparability of results across studies. PMID:28435343
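The recommended cut point can be illustrated with a short sketch. It assumes LMUP scores are integers on a 0 to 12 scale (the range is not stated in the abstract) and that scores of ten or more are coded as planned; the function name `dichotomize_lmup` and the example scores are hypothetical.

```python
def dichotomize_lmup(score, cut=10):
    """Return 1 (planned) if score >= cut, else 0 (unplanned).

    Assumes an integer score on a 0-12 scale; the default cut places the
    boundary between nine and ten.
    """
    if not 0 <= score <= 12:
        raise ValueError("score outside the assumed 0-12 range")
    return 1 if score >= cut else 0


# Hypothetical scores, not taken from the Malawi or UK data.
scores = [2, 9, 10, 12, 5, 11]
planned = [dichotomize_lmup(s) for s in scores]

print(planned)
# Six distinct score values collapse to two categories after dichotomizing,
# which is the loss of information the abstract refers to.
print(len(set(scores)), "distinct scores ->", len(set(planned)), "categories")
```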
Logistic models--an odd(s) kind of regression.
Jupiter, Daniel C
2013-01-01
The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression. Copyright © 2013 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
Comparison of two landslide susceptibility assessments in the Champagne-Ardenne region (France)
NASA Astrophysics Data System (ADS)
Den Eeckhaut, M. Van; Marre, A.; Poesen, J.
2010-02-01
The vineyards of the Montagne de Reims are mostly planted on steep south-oriented cuesta fronts receiving a maximum of sun radiation. Due to the location of the vineyards on steep hillslopes, the viticultural activity is threatened by slope failures. This study attempts to better understand the spatial patterns of landslide susceptibility in the Champagne-Ardenne region by comparing a heuristic (qualitative) and a statistical (quantitative) model in a 1120 km² study area. The heuristic landslide susceptibility model was adopted from the Bureau de Recherches Géologiques et Minières, the GEGEAA - Reims University and the Comité Interprofessionnel du Vin de Champagne. In this model, expert knowledge of the region was used to assign weights to all slope classes and lithologies present in the area, but the final susceptibility map was never evaluated with the location of mapped landslides. For the statistical landslide susceptibility assessment, logistic regression was applied to a dataset of 291 'old' (Holocene) landslides. The robustness of the logistic regression model was evaluated and ROC curves were used for model calibration and validation. With regard to the variables assumed to be important environmental factors controlling landslides, the two models are in agreement. They both indicate that present and future landslides are mainly controlled by slope gradient and lithology. However, the comparison of the two landslide susceptibility maps through (1) an evaluation with the location of mapped 'old' landslides and through (2) a temporal validation with spatial data of 'recent' (1960-1999; n = 48) and 'very recent' (2000-2008; n = 46) landslides showed a better prediction capacity for the statistical model produced in this study compared to the heuristic model. In total, the statistically-derived landslide susceptibility map succeeded in correctly classifying 81.0% of the 'old' and 91.6% of the 'recent' and 'very recent' landslides. 
On the susceptibility map derived from the heuristic model, on the other hand, only 54.6% of the 'old' and 64.0% of the 'recent' and 'very recent' landslides were correctly classified as unstable. Hence, the landslide susceptibility map obtained from logistic regression is a better tool for regional landslide susceptibility analysis in the study area of the Montagne de Reims. The accurate classification of zones with very high and high susceptibility allows delineating zones where viticulturists should be informed and where implementation of precaution measures is needed to secure slope stability.
Is parenting style a predictor of suicide attempts in a representative sample of adolescents?
Donath, Carolin; Graessel, Elmar; Baier, Dirk; Bleich, Stefan; Hillemacher, Thomas
2014-04-26
Suicidal ideation and suicide attempts are serious but not rare conditions in adolescents. However, there are several research and practical suicide-prevention initiatives that discuss the possibility of preventing serious self-harm. Profound knowledge about risk and protective factors is therefore necessary. The aim of this study is a) to clarify the role of parenting behavior and parenting styles in adolescents' suicide attempts and b) to identify other statistically significant and clinically relevant risk and protective factors for suicide attempts in a representative sample of German adolescents. In the years 2007/2008, a representative written survey of N = 44,610 students in the 9th grade of different school types in Germany was conducted. In this survey, the lifetime prevalence of suicide attempts was investigated as well as potential predictors including parenting behavior. A three-step statistical analysis was carried out: I) As basic model, the association between parenting and suicide attempts was explored via binary logistic regression controlled for age and sex. II) The predictive values of 13 additional potential risk/protective factors were analyzed with single binary logistic regression analyses for each predictor alone. Non-significant predictors were excluded in Step III. III) In a multivariate binary logistic regression analysis, all significant predictor variables from Step II and the parenting styles were included after testing for multicollinearity. Three parental variables showed a relevant association with suicide attempts in adolescents - (all protective): mother's warmth and father's warmth in childhood and mother's control in adolescence (Step I). In the full model (Step III), Authoritative parenting (protective: OR: .79) and Rejecting-Neglecting parenting (risk: OR: 1.63) were identified as significant predictors (p < .001) for suicidal attempts. 
Seven further variables were interpreted to be statistically significant and clinically relevant: ADHD, female sex, smoking, Binge Drinking, absenteeism/truancy, migration background, and parental separation events. Parenting style does matter. While children of Authoritative parents profit, children of Rejecting-Neglecting parents are put at risk - as we were able to show for suicide attempts in adolescence. Some of the identified risk factors contribute new knowledge and potential areas of intervention for special groups such as migrants or children diagnosed with ADHD.
A Predictive Model for Readmissions Among Medicare Patients in a California Hospital.
Duncan, Ian; Huynh, Nhan
2017-11-17
Predictive models for hospital readmission rates are in high demand because of the Centers for Medicare & Medicaid Services (CMS) Hospital Readmission Reduction Program (HRRP). The LACE index is one of the most popular predictive tools among hospitals in the United States. The LACE index is a simple tool with 4 parameters: Length of stay, Acuity of admission, Comorbidity, and Emergency visits in the previous 6 months. The authors applied logistic regression to develop a predictive model for a medium-sized not-for-profit community hospital in California using patient-level data with more specific patient information (including 13 explanatory variables). Specifically, the logistic regression is applied to 2 populations: a general population including all patients and the specific group of patients targeted by the CMS penalty (characterized as ages 65 or older with select conditions). The 2 resulting logistic regression models have a higher sensitivity rate compared to the sensitivity of the LACE index. The C statistic values of the model applied to both populations demonstrate moderate levels of predictive power. The authors also build an economic model to demonstrate the potential financial impact of the use of the model for targeting high-risk patients in a sample hospital and demonstrate that, on balance, whether the hospital gains or loses from reducing readmissions depends on its margin and the extent of its readmission penalties.
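Sensitivity, the measure used above to compare the logistic models with the LACE index, can be computed from predicted risks and observed outcomes once a flagging threshold is chosen. The sketch below is illustrative only: the outcomes, risk scores, threshold of 0.5, and function name are all hypothetical and are not taken from the study.

```python
def sensitivity_specificity(outcomes, scores, threshold):
    """Return (sensitivity, specificity) when score >= threshold flags a patient."""
    tp = fn = tn = fp = 0
    for outcome, score in zip(outcomes, scores):
        flagged = score >= threshold
        if outcome == 1:
            if flagged:
                tp += 1   # readmitted and flagged
            else:
                fn += 1   # readmitted but missed
        else:
            if flagged:
                fp += 1   # not readmitted but flagged
            else:
                tn += 1   # not readmitted and not flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical readmission outcomes (1 = readmitted) and predicted risks.
outcomes = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.6, 0.3, 0.2, 0.7, 0.1, 0.4, 0.8]

sens, spec = sensitivity_specificity(outcomes, scores, threshold=0.5)
print("sensitivity:", sens, "specificity:", spec)
```

Lowering the threshold raises sensitivity at the cost of specificity, so a comparison between two tools is only meaningful when the threshold for each is stated.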
Machado-Carvalhais, Helenaura P; Ramos-Jorge, Maria L; Auad, Sheyla M; Martins, Laura H P M; Paiva, Saul M; Pordeus, Isabela A
2008-10-01
The aims of this cross-sectional study were to determine the prevalence of occupational accidents with exposure to biological material among undergraduate students of dentistry and to estimate potential risk factors associated with exposure to blood. Data were collected through a self-administered questionnaire (86.4 percent return rate), which was completed by a sample of 286 undergraduate dental students (mean age 22.4 ± 2.4 years). The students were enrolled in the clinical component of the curriculum, which corresponds to the final six semesters of study. Descriptive, bivariate, simple logistic regression and multiple logistic regression (Forward Stepwise Procedure) analyses were performed. The level of statistical significance was set at 5 percent. Percutaneous and mucous exposures to potentially infectious biological material were reported by 102 individuals (35.6 percent); 26.8 percent reported the occurrence of multiple episodes of exposure. The logistic regression analyses revealed that the incomplete use of individual protection equipment (OR=3.7; 95 percent CI 1.5-9.3), disciplines where surgical procedures are carried out (OR=16.3; 95 percent CI 7.1-37.2), and handling sharp instruments (OR=4.4; 95 percent CI 2.1-9.1), more specifically, hollow-bore needles (OR=6.8; 95 percent CI 2.1-19.0), were independently associated with exposure to blood. Policies of reviewing the procedures during clinical practice are recommended in order to reduce occupational exposure.
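For readers unfamiliar with how an odds ratio and its 95 percent confidence interval are obtained, the sketch below computes an unadjusted odds ratio with a Wald interval from a 2x2 table, which is what a simple logistic regression with one binary predictor yields. The counts are hypothetical and are not the study's data; the adjusted estimates reported above come from a multiple logistic regression and would not match a 2x2 calculation.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only.
or_value, lower, upper = odds_ratio_ci(a=20, b=10, c=10, d=20)
print(f"OR={or_value:.1f}; 95 percent CI {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1, as in every estimate quoted in the abstract, corresponds to statistical significance at the 5 percent level for a Wald test.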
Reboussin, Beth A; Preisser, John S; Song, Eun-Young; Wolfson, Mark
2012-07-01
Under-age drinking is an enormous public health issue in the USA. Evidence that community level structures may impact on under-age drinking has led to a proliferation of efforts to change the environment surrounding the use of alcohol. Although the focus of these efforts is to reduce drinking by individual youths, environmental interventions are typically implemented at the community level with entire communities randomized to the same intervention condition. A distinct feature of these trials is the tendency of the behaviours of individuals residing in the same community to be more alike than that of others residing in different communities, which is herein called 'clustering'. Statistical analyses and sample size calculations must account for this clustering to avoid type I errors and to ensure an appropriately powered trial. Clustering itself may also be of scientific interest. We consider the alternating logistic regressions procedure within the population-averaged modelling framework to estimate the effect of a law enforcement intervention on the prevalence of under-age drinking behaviours while modelling the clustering at multiple levels, e.g. within communities and within neighbourhoods nested within communities, by using pairwise odds ratios. We then derive sample size formulae for estimating intervention effects when planning a post-test-only or repeated cross-sectional community-randomized trial using the alternating logistic regressions procedure.
NASA Technical Reports Server (NTRS)
Wolf, S. F.; Lipschutz, M. E.
1993-01-01
Multivariate statistical analysis techniques (linear discriminant analysis and logistic regression) can provide powerful discrimination tools which are generally unfamiliar to the planetary science community. Fall parameters were used to identify a group of 17 H chondrites (Cluster 1) that were part of a coorbital stream which intersected Earth's orbit in May, from 1855 - 1895, and can be distinguished from all other H chondrite falls. Using multivariate statistical techniques, it was demonstrated that, by a totally different criterion, labile trace element contents - hence thermal histories - of 13 Cluster 1 meteorites are distinguishable from those of 45 non-Cluster 1 H chondrites. Here, we focus upon the principles of multivariate statistical techniques and illustrate their application using non-meteoritic and meteoritic examples.
2012-01-01
Background When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. Methods An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examined the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in the combined sample of those with and without the condition. Results Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, or uniform in the entire sample of those with and without the condition. Conclusions The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population. PMID:22716998
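The equal-variance binormal case described in this abstract can be sketched in a few lines of Python. The sketch assumes the closed form c = Φ(σβ/√2), where σ is the common standard deviation and β the log-odds ratio per unit of the explanatory variable; that form follows from the binormal setup but is not quoted in the abstract, and the function names and numeric values are hypothetical.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predicted_c_statistic(sigma, log_odds_ratio):
    """c-statistic under binormality with equal variances (assumed closed form)."""
    return normal_cdf(sigma * log_odds_ratio / math.sqrt(2.0))


def empirical_c_statistic(cases, controls):
    """Share of case-control pairs in which the case has the higher value."""
    wins = 0.0
    for x1 in cases:
        for x0 in controls:
            if x1 > x0:
                wins += 1.0
            elif x1 == x0:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(cases) * len(controls))


# Hypothetical setup: common SD of 2 and a mean difference of 1.5 between groups.
sigma, mean_diff = 2.0, 1.5
log_odds_ratio = mean_diff / sigma ** 2  # log-odds ratio implied by the binormal model

rng = random.Random(0)
controls = [rng.gauss(0.0, sigma) for _ in range(1000)]
cases = [rng.gauss(mean_diff, sigma) for _ in range(1000)]

predicted = predicted_c_statistic(sigma, log_odds_ratio)
empirical = empirical_c_statistic(cases, controls)
print(f"predicted c = {predicted:.3f}, empirical c = {empirical:.3f}")
```

Holding the log-odds ratio fixed and increasing `sigma` raises the predicted c-statistic, which is one way to read the abstract's conclusion that an odds ratio alone does not determine discriminative ability.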
Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model
NASA Astrophysics Data System (ADS)
Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami
2017-06-01
A regression model represents the relationship between independent variables and a dependent variable. When the dependent variable is categorical, a logistic regression model is used to calculate the odds. When the categories of the dependent variable are ordered levels, the logistic regression model is ordinal. The GWOLR model is an ordinal logistic regression model that is influenced by the geographical location of the observation site. Parameter estimation in the model is needed to infer population values from a sample. The purpose of this research is to estimate the parameters of the GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City. The observation units are 144 villages in Semarang City. The research yields a local GWOLR model for each village and the probabilities of the dengue fever patient-count categories.
The purpose of this report is to provide a reference manual that could be used by investigators for making informed use of logistic regression using two methods (standard logistic regression and MARS). The details for analyses of relationships between a dependent binary response ...
Predicting U.S. Army Reserve Unit Manning Using Market Demographics
2015-06-01
develops linear regression, classification tree, and logistic regression models to determine the ability of the location to support manning requirements... logistic regression model delivers predictive results that allow decision-makers to identify locations with a high probability of meeting unit...manning requirements. The recommendation of this thesis is that the USAR implement the logistic regression model. 14. SUBJECT TERMS U.S
ERIC Educational Resources Information Center
Chen, Chau-Kuang
2005-01-01
Logistic and Cox regression methods are practical tools used to model the relationships between certain student learning outcomes and their relevant explanatory variables. The logistic regression model fits an S-shaped curve into a binary outcome with data points of zero and one. The Cox regression model allows investigators to study the duration…
Schlairet, Maura C; Schlairet, Timothy James; Sauls, Denise H; Bellflowers, Lois
2015-03-01
Establishing the impact of the high-fidelity simulation environment on student performance, as well as identifying factors that could predict learning, would refine simulation outcome expectations among educators. The purpose of this quasi-experimental pilot study was to explore the impact of simulation on emotion and cognitive load among beginning nursing students. Forty baccalaureate nursing students participated in teaching simulations, rated their emotional state and cognitive load, and completed evaluation simulations. Two principal components of emotion were identified representing the pleasant activation and pleasant deactivation components of affect. Mean rating of cognitive load following simulation was high. Linear regression identified slight but statistically nonsignificant positive associations between principal components of emotion and cognitive load. Logistic regression identified a negative but statistically nonsignificant effect of cognitive load on assessment performance. Among lower ability students, a more pronounced effect of cognitive load on assessment performance was observed; this also was statistically nonsignificant. Copyright 2015, SLACK Incorporated.
Yusuf, O B; Bamgboye, E A; Afolabi, R F; Shodimu, M A
2014-09-01
The logistic regression model is widely used in health research for descriptive and predictive purposes. Unfortunately, researchers are sometimes not aware that the underlying principles of the technique have failed when the maximum likelihood algorithm does not converge. Young researchers, particularly postgraduate students, may not know why the separation problem, whether quasi or complete, occurs, how to identify it, and how to fix it. This study was designed to critically evaluate convergence issues in articles that employed logistic regression analysis published in the African Journal of Medicine and Medical Sciences between 2004 and 2013. Problems of quasi or complete separation were described and illustrated with the National Demographic and Health Survey dataset. A critical evaluation of articles that employed logistic regression was conducted. A total of 581 articles were reviewed, of which 40 (6.9%) used binary logistic regression. Twenty-four (60.0%) stated the use of the logistic regression model in the methodology, while none of the articles assessed model fit. Only 3 (12.5%) properly described the procedures. Of the 40 that used the logistic regression model, the problem of convergence occurred in 6 (15.0%) of the articles. Logistic regression tends to be poorly reported in studies published between 2004 and 2013. Our findings showed that the procedure may not be well understood by researchers, since very few described the process in their reports, and that they may be totally unaware of the problem of convergence or how to deal with it.
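As a minimal illustration of how the separation problem mentioned in this abstract might be identified, the sketch below checks a single numeric predictor against a binary outcome. The function name and data are hypothetical, and the check is deliberately narrow: in a model with several predictors, separation can also occur along a linear combination of them, which this sketch does not detect.

```python
def separation_status(x, y):
    """Classify separation of a binary outcome y (0/1) by one numeric predictor x."""
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both outcome classes must be present")
    # A constant predictor cannot separate the classes at all.
    if min(x) == max(x):
        return "none"
    # Strict gap between the classes: some cut-point predicts every outcome perfectly.
    if max(x0) < min(x1) or max(x1) < min(x0):
        return "complete"
    # Classes overlap only at a single boundary value: quasi-complete separation.
    if max(x0) == min(x1) or max(x1) == min(x0):
        return "quasi-complete"
    return "none"


# Hypothetical data for the three situations.
print(separation_status([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))
print(separation_status([1, 2, 3, 3, 4, 5], [0, 0, 0, 1, 1, 1]))
print(separation_status([1, 2, 3, 4, 5, 6], [0, 1, 0, 1, 0, 1]))
```

When such a check reports complete or quasi-complete separation, the maximum likelihood estimate for that predictor does not exist as a finite value, which is why the fitting algorithm fails to converge.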
Míguez, A; Iftimi, A; Montes, F
2016-09-01
Epidemiologists agree that there is a prevailing seasonality in the presentation of epidemic waves of respiratory syncytial virus (RSV) infections and influenza. The aim of this study is to quantify the potential relationship between the activity of RSV, with respect to the influenza virus, in order to use the RSV seasonal curve as a predictor of the evolution of an influenza virus epidemic wave. Two statistical tools, logistic regression and time series, are used for predicting the evolution of influenza. Both logistic models and time series of influenza consider RSV information from previous weeks. Data consist of influenza and confirmed RSV cases reported in Comunitat Valenciana (Spain) during the period from week 40 (2010) to week 8 (2014). Binomial logistic regression models used to predict the two states of influenza wave, basal or peak, result in a rate of correct classification higher than 92% with the validation set. When a finer three-states categorization is established, basal, increasing peak and decreasing peak, the multinomial logistic model performs well in 88% of cases of the validation set. The ARMAX model fits well for influenza waves and shows good performance for short-term forecasts up to 3 weeks. The seasonal evolution of influenza virus can be predicted a minimum of 4 weeks in advance using logistic models based on RSV. It would be necessary to study more inter-pandemic seasons to establish a stronger relationship between the epidemic waves of both viruses.
Ahn, Jae Joon; Kim, Young Min; Yoo, Keunje; Park, Joonhong; Oh, Kyong Joo
2012-11-01
For groundwater conservation and management, it is important to accurately assess groundwater pollution vulnerability. This study proposed an integrated model using ridge regression and a genetic algorithm (GA) to effectively select the major hydro-geological parameters influencing groundwater pollution vulnerability in an aquifer. The GA-Ridge regression method determined that depth to water, net recharge, topography, and the impact of vadose zone media were the hydro-geological parameters that influenced trichloroethene pollution vulnerability in a Korean aquifer. When using these selected hydro-geological parameters, the accuracy was improved for various statistical nonlinear and artificial intelligence (AI) techniques, such as multinomial logistic regression, decision trees, artificial neural networks, and case-based reasoning. These results provide a proof of concept that the GA-Ridge regression is effective at determining influential hydro-geological parameters for the pollution vulnerability of an aquifer, and in turn, improves the AI performance in assessing groundwater pollution vulnerability.
Mocellin, Simone; Ambrosi, Alessandro; Montesco, Maria Cristina; Foletto, Mirto; Zavagno, Giorgio; Nitti, Donato; Lise, Mario; Rossi, Carlo Riccardo
2006-08-01
Currently, approximately 80% of melanoma patients undergoing sentinel node biopsy (SNB) have negative sentinel lymph nodes (SLNs), and no prediction system is reliable enough to be implemented in the clinical setting to reduce the number of SNB procedures. In this study, the predictive power of support vector machine (SVM)-based statistical analysis was tested. The clinical records of 246 patients who underwent SNB at our institution were used for this analysis. The following clinicopathologic variables were considered: the patient's age and sex and the tumor's histological subtype, Breslow thickness, Clark level, ulceration, mitotic index, lymphocyte infiltration, regression, angiolymphatic invasion, microsatellitosis, and growth phase. The results of SVM-based prediction of SLN status were compared with those achieved with logistic regression. The SLN positivity rate was 22% (52 of 234). When the accuracy was ≥ 80%, the negative predictive value, positive predictive value, specificity, and sensitivity were 98%, 54%, 94%, and 77% and 82%, 41%, 69%, and 93% by using SVM and logistic regression, respectively. Moreover, SVM and logistic regression were associated with a diagnostic error and an SNB percentage reduction of (1) 1% and 60% and (2) 15% and 73%, respectively. The results from this pilot study suggest that SVM-based prediction of SLN status might be evaluated as a prognostic method to avoid the SNB procedure in 60% of patients currently eligible, with a very low error rate. If validated in larger series, this strategy would lead to obvious advantages in terms of both patient quality of life and costs for the health care system.
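The four performance measures reported in this abstract all derive from the counts of a two-by-two confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and are not the study's data, and the function name is an assumption made for illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # positives correctly predicted positive
        "specificity": tn / (tn + fp),  # negatives correctly predicted negative
        "ppv": tp / (tp + fp),          # predicted positives that are truly positive
        "npv": tn / (tn + fn),          # predicted negatives that are truly negative
    }


# Hypothetical counts, not taken from the study.
metrics = diagnostic_metrics(tp=40, fp=20, fn=10, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

A high negative predictive value matters most in this setting, because a negative prediction is what would be used to spare a patient the biopsy.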
Li, Xiucun; Cui, Jianli; Maharjan, Suraj; Lu, Laijin; Gong, Xu
2016-01-01
Objective The purpose of this study is to determine the correlation between non-technical risk factors and the perioperative flap survival rate and to evaluate the choice of skin flap for the reconstruction of foot and ankle. Methods This was a clinical retrospective study. Nine variables were identified. The Kaplan-Meier method coupled with a log-rank test and a Cox regression model was used to predict the risk factors that influence the perioperative flap survival rate. The relationship between postoperative wound infection and risk factors was also analyzed using a logistic regression model. Results The overall flap survival rate was 85.42%. The necrosis rates of free flaps and pedicled flaps were 5.26% and 20.69%, respectively. According to the Cox regression model, flap type (hazard ratio [HR] = 2.592; 95% confidence interval [CI] (1.606, 4.184); P < 0.001) and postoperative wound infection (HR = 0.266; 95% CI (0.134, 0.529); P < 0.001) were found to be statistically significant risk factors associated with flap necrosis. Based on the logistic regression model, preoperative wound bed inflammation (odds ratio [OR] = 11.371; 95% CI (3.117, 41.478); P < 0.001) was a statistically significant risk factor for postoperative wound infection. Conclusion Flap type and postoperative wound infection were both independent risk factors influencing the flap survival rate in the foot and ankle. However, postoperative wound infection was a risk factor for the pedicled flap but not for the free flap. Microvascular anastomosis is a major cause of free flap necrosis. To reconstruct complex or wide soft tissue defects of the foot or ankle, free flaps are safer and more reliable than pedicled flaps and should thus be the primary choice. PMID:27930679
Logistic Regression: Concept and Application
ERIC Educational Resources Information Center
Cokluk, Omay
2010-01-01
The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
NASA Astrophysics Data System (ADS)
Pradhan, Biswajeet
2010-05-01
This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis on the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (NDVI), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases of the application of logistic regression coefficients in the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), whereas Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases from the cross application of logistic regression coefficients in the other two areas, the case of Selangor based on the logistic coefficients of Cameron showed the highest prediction accuracy (90%), whereas the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%).
Qualitatively, the cross application model yields reasonable results which can be used for preliminary landslide hazard mapping.
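The cross application described in this abstract amounts to scoring the map cells of one area with logistic regression coefficients fitted in another. The sketch below illustrates the idea with the standard logistic function; the coefficient values, factor names, and cell values are all hypothetical and only three of the ten factors are shown.

```python
import math


def landslide_probability(intercept, coefficients, factors):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(coefficients[name] * factors[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients "fitted" in one study area (not values from the paper).
area_a_intercept = -4.0
area_a_coefficients = {"slope_deg": 0.08, "ndvi": -1.5, "dist_drainage_km": -0.6}

# Hypothetical factor values for two map cells in a different study area.
area_b_cells = [
    {"slope_deg": 35.0, "ndvi": 0.2, "dist_drainage_km": 0.1},
    {"slope_deg": 5.0, "ndvi": 0.7, "dist_drainage_km": 2.0},
]

# Cross application: score the area B cells with the area A coefficients.
hazard = [
    landslide_probability(area_a_intercept, area_a_coefficients, cell)
    for cell in area_b_cells
]
for cell, p in zip(area_b_cells, hazard):
    print(f"slope={cell['slope_deg']:>4} -> landslide probability {p:.3f}")
```

Repeating this for every pairing of three coefficient sets with three areas gives the nine hazard maps the abstract mentions, each of which can then be compared with the field-verified landslide locations.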
Sabel, Michael S; Rice, John D; Griffith, Kent A; Lowe, Lori; Wong, Sandra L; Chang, Alfred E; Johnson, Timothy M; Taylor, Jeremy M G
2012-01-01
To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid sentinel lymph node biopsy (SLNB), several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests, and support vector machines. We sought to validate recently published models meant to predict sentinel node status. We queried our comprehensive, prospectively collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon four published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false-negative rate (FNR). Logistic regression performed comparably with our data when considering NPV (89.4 versus 93.6%); however, the model's specificity was not high enough to significantly reduce the rate of biopsies (SLN reduction rate of 2.9%). When applied to our data, the classification tree produced NPV and reduction in biopsy rates that were lower (87.7 versus 94.1 and 29.8 versus 14.3, respectively). Two published models could not be applied to our data due to model complexity and the use of proprietary software. Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset, or could not be validated. Differences in selection criteria and histopathologic interpretation likely resulted in underperformance. Statistical predictive models must be developed in a clinically applicable manner to allow for both validation and ultimately clinical utility.
Soo, Danielle H E; Pendharkar, Sayali A; Jivanji, Chirag J; Gillies, Nicola A; Windsor, John A; Petrov, Maxim S
2017-10-01
Approximately 40% of patients develop abnormal glucose metabolism after a single episode of acute pancreatitis. This study aimed to develop and validate a prediabetes self-assessment screening score for patients after acute pancreatitis. Data from non-overlapping training (n=82) and validation (n=80) cohorts were analysed. Univariate logistic and linear regression identified variables associated with prediabetes after acute pancreatitis. Multivariate logistic regression developed the score, ranging from 0 to 215. The area under the receiver-operating characteristic curve (AUROC), Hosmer-Lemeshow χ² statistic, and calibration plots were used to assess model discrimination and calibration. The developed score was validated using data from the validation cohort. The score had an AUROC of 0.88 (95% CI, 0.80-0.97) and Hosmer-Lemeshow χ² statistic of 5.75 (p=0.676). Patients with a score of ≥75 had a 94.1% probability of having prediabetes, and were 29 times more likely to have prediabetes than those with a score of <75. The AUROC in the validation cohort was 0.81 (95% CI, 0.70-0.92) and the Hosmer-Lemeshow χ² statistic was 5.50 (p=0.599). The score showed good calibration in both cohorts. The developed and validated score, called PERSEUS, is the first instrument to identify individuals who are at high risk of developing abnormal glucose metabolism following an episode of acute pancreatitis. Copyright © 2017 Editrice Gastroenterologica Italiana S.r.l. Published by Elsevier Ltd. All rights reserved.
Sabel, Michael S.; Rice, John D.; Griffith, Kent A.; Lowe, Lori; Wong, Sandra L.; Chang, Alfred E.; Johnson, Timothy M.; Taylor, Jeremy M.G.
2013-01-01
Introduction To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid SLN biopsy (SLNB), several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests and support vector machines. We sought to validate recently published models meant to predict sentinel node status. Methods We queried our comprehensive, prospectively-collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon 4 published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false negative rate (FNR). Results Logistic regression performed comparably with our data when considering NPV (89.4% vs. 93.6%); however, the model’s specificity was not high enough to significantly reduce the rate of biopsies (SLN reduction rate of 2.9%). When applied to our data, the classification tree produced NPV and reduction in biopsy rates that were lower (87.7% vs. 94.1% and 29.8% vs. 14.3%, respectively). Two published models could not be applied to our data due to model complexity and the use of proprietary software. Conclusions Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset, or could not be validated. Differences in selection criteria and histopathologic interpretation likely resulted in underperformance. Statistical predictive models must be developed in a clinically applicable manner to allow for both validation and ultimately clinical utility. PMID:21822550
Rupert, Michael G.; Plummer, Niel
2009-01-01
This raster data set delineates the predicted probability of elevated volatile organic compound (VOC) concentrations in groundwater in the Eagle River watershed valley-fill aquifer, Eagle County, North-Central Colorado, 2006-2007. This data set was developed by a cooperative project between the U.S. Geological Survey, Eagle County, the Eagle River Water and Sanitation District, the Town of Eagle, the Town of Gypsum, and the Upper Eagle Regional Water Authority. This project was designed to evaluate potential land-development effects on groundwater and surface-water resources so that informed land-use and water management decisions can be made. This groundwater probability map and its associated probability maps were developed as follows: (1) A point data set of wells with groundwater quality and groundwater age data was overlaid with thematic layers of anthropogenic (related to human activities) and hydrogeologic data by using a geographic information system to assign each well values for depth to groundwater, distance to major streams and canals, distance to gypsum beds, precipitation, soils, and well depth. These data then were downloaded to a statistical software package for analysis by logistic regression. (2) Statistical models predicting the probability of elevated nitrate concentrations, the probability of unmixed young water (using chlorofluorocarbon-11 concentrations and tritium activities), and the probability of elevated volatile organic compound concentrations were developed using logistic regression techniques. (3) The statistical models were entered into a GIS and the probability map was constructed.
Rupert, Michael G.; Plummer, Niel
2009-01-01
This raster data set delineates the predicted probability of elevated nitrate concentrations in groundwater in the Eagle River watershed valley-fill aquifer, Eagle County, North-Central Colorado, 2006-2007. This data set was developed by a cooperative project between the U.S. Geological Survey, Eagle County, the Eagle River Water and Sanitation District, the Town of Eagle, the Town of Gypsum, and the Upper Eagle Regional Water Authority. This project was designed to evaluate potential land-development effects on groundwater and surface-water resources so that informed land-use and water management decisions can be made. This groundwater probability map and its associated probability maps were developed as follows: (1) A point data set of wells with groundwater quality and groundwater age data was overlaid with thematic layers of anthropogenic (related to human activities) and hydrogeologic data by using a geographic information system to assign each well values for depth to groundwater, distance to major streams and canals, distance to gypsum beds, precipitation, soils, and well depth. These data then were downloaded to a statistical software package for analysis by logistic regression. (2) Statistical models predicting the probability of elevated nitrate concentrations, the probability of unmixed young water (using chlorofluorocarbon-11 concentrations and tritium activities), and the probability of elevated volatile organic compound concentrations were developed using logistic regression techniques. (3) The statistical models were entered into a GIS and the probability map was constructed.
HIV/AIDS information by African companies: an empirical analysis.
Barako, Dulacha G; Taplin, Ross H; Brown, Alistair M
2010-01-01
This article investigates the extent of Human Immunodeficiency Virus/Acquired Immune Deficiency Syndrome Disclosures (HIV/AIDSD) in online annual reports by 200 listed companies from 10 African countries for the year ending 2006. Descriptive statistics reveal a very low level of overall HIV/AIDSD practices with a mean of 6 per cent disclosure, with half (100 out of 200) of the African companies making no disclosures at all. Logistic regression analysis reveals that company size and country are highly significant predictors of any disclosure of HIV/AIDS in annual reports. Profitability is also statistically significantly associated with the extent of disclosure.
Comparison of V50 Shot Placement on Final Outcome
2014-11-01
molecular-weight polyethylene (UHMWPE). In V50 testing of those types of materials, large delaminations may occur that influence the results. This...placement, a proper evaluation of materials may not be possible. 15. SUBJECT TERMS ballistics, V50 test, logistic regression, statistical inference...from an impact. While this may work with ceramics or metal armor, it is inappropriate for use on composite armors like ultra-high-molecular-weight
Strayhorn, G
2000-04-01
To determine whether students' performances in a pre-admission program predicted whether participants would (1) apply to medical school, (2) get accepted, and (3) graduate. Using prospectively collected data from participants in the University of North Carolina at Chapel Hill's Medical Education Development Program (MEDP) and data from the Association of American Colleges Student and Applicant Information Management System, the author identified 371 underrepresented minority (URM) students who were full-time participants and completed the program between 1984 and 1989, prior to their acceptance into medical school. Logistic regression analysis was used to determine whether MEDP performance significantly predicted (after statistically controlling for traditional predictors of these outcomes) the proportions of URM participants who applied to medical school and were accepted, the timeliness of graduating, and the proportion graduating. Odds ratios with 95% confidence intervals were calculated to determine the associations between the independent and outcome variables. In separate logistic regression models, MEDP performance predicted the study's outcomes after statistically controlling for traditional predictors with 95% confidence intervals. Pre-admission programs with similar outcomes can improve the diversity of the physician workforce and the access to health care for underrepresented minority and economically disadvantaged populations.
Glass-Kaastra, Shiona K; Pearl, David L; Reid-Smith, Richard J; McEwen, Beverly; Slavic, Durda; Fairles, Jim; McEwen, Scott A
2014-10-01
Susceptibility results for Pasteurella multocida and Streptococcus suis isolated from swine clinical samples were obtained from January 1998 to October 2010 from the Animal Health Laboratory at the University of Guelph, Guelph, Ontario, and used to describe variation in antimicrobial resistance (AMR) to 4 drugs of importance in the Ontario swine industry: ampicillin, tetracycline, tiamulin, and trimethoprim-sulfamethoxazole. Four temporal data-analysis options were used: visualization of trends in 12-month rolling averages, logistic-regression modeling, temporal-scan statistics, and a scan with the "What's strange about recent events?" (WSARE) algorithm. The AMR trends varied among the antimicrobial drugs for a single pathogen and between pathogens for a single antimicrobial, suggesting that pathogen-specific AMR surveillance may be preferable to indicator data. The 4 methods provided complementary and, at times, redundant results. The most appropriate combination of analysis methods for surveillance using these data included temporal-scan statistics with a visualization method (rolling-average or predicted-probability plots following logistic-regression models). The WSARE algorithm provided interesting results for quality control and has the potential to detect new resistance patterns; however, missing data created problems for displaying the results in a way that would be meaningful to all surveillance stakeholders.
Glass-Kaastra, Shiona K.; Pearl, David L.; Reid-Smith, Richard J.; McEwen, Beverly; Slavic, Durda; Fairles, Jim; McEwen, Scott A.
2014-01-01
Susceptibility results for Pasteurella multocida and Streptococcus suis isolated from swine clinical samples were obtained from January 1998 to October 2010 from the Animal Health Laboratory at the University of Guelph, Guelph, Ontario, and used to describe variation in antimicrobial resistance (AMR) to 4 drugs of importance in the Ontario swine industry: ampicillin, tetracycline, tiamulin, and trimethoprim–sulfamethoxazole. Four temporal data-analysis options were used: visualization of trends in 12-month rolling averages, logistic-regression modeling, temporal-scan statistics, and a scan with the “What’s strange about recent events?” (WSARE) algorithm. The AMR trends varied among the antimicrobial drugs for a single pathogen and between pathogens for a single antimicrobial, suggesting that pathogen-specific AMR surveillance may be preferable to indicator data. The 4 methods provided complementary and, at times, redundant results. The most appropriate combination of analysis methods for surveillance using these data included temporal-scan statistics with a visualization method (rolling-average or predicted-probability plots following logistic-regression models). The WSARE algorithm provided interesting results for quality control and has the potential to detect new resistance patterns; however, missing data created problems for displaying the results in a way that would be meaningful to all surveillance stakeholders. PMID:25355992
Radiation Exposure and Mortality from Cardiovascular Disease and Cancer in Early NASA Astronauts.
Elgart, S Robin; Little, Mark P; Chappell, Lori J; Milder, Caitlin M; Shavers, Mark R; Huff, Janice L; Patel, Zarana S
2018-05-31
Understanding space radiation health effects is critical due to potential increased morbidity and mortality following spaceflight. We evaluated whether there is evidence for excess cardiovascular disease or cancer mortality in early NASA astronauts and if a correlation exists between space radiation exposure and mortality. Astronauts selected from 1959-1969 were included and followed until death or February 2017, with 39 of 73 individuals still alive at that time. Calculated standardized mortality rates for tested outcomes were significantly below U.S. white male population rates, including all-cardiovascular disease (n = 7, SMR = 33; 95% CI, 14-65) and all-cancer (n = 7, SMR = 43; 95% CI, 18-83), as anticipated in a healthy worker population. Space radiation doses for cohort members ranged from 0-78 mGy. No significant associations between space radiation dose and mortality were found using logistic regression with an internal reference group, adjusting for medical radiation. Statistical power of the logistic regression was <6%, remaining <12% even when expected risk level or observed deaths were assumed to be 10 times higher than currently reported. While no excess radiation-associated cardiovascular or cancer mortality risk was observed, findings must be tempered by the statistical limitations of this cohort; notwithstanding, this small unique cohort provides a foundation for assessment of astronaut health.
NASA Astrophysics Data System (ADS)
Guntur, R. D.; Lobo, M.
2017-02-01
This research investigated the reasons for DOSC and sought a statistical model explaining the factors that influence DOSC in the 7-18 age group in East Nusa Tenggara (ENT) Province. Primary data on out-of-school children were collected through interviews using prepared questionnaires in three selected districts. The data were then analysed using descriptive statistics and logistic regression. Of the 341 samples, 194 were DOSC. The majority of them were male, lived in the countryside, had parents who were farmers, had a family size of 5, and had mothers with only primary-level education. The main reasons for dropping out at the primary and junior education levels were the inability to pay school fees and the willingness to work on farms to help their parents. At the senior education level, the reasons were unaffordable school tuition and a lack of desire for a good education. Both partial and simultaneous parameter tests in the logistic regression model show that living in the countryside, coming from a poor family, and being male were the three factors that significantly affected the number of DOSC in this age group, with odds ratios of 2.48, 2.37, and 1.97, respectively.
Aging, not menopause, is associated with higher prevalence of hyperuricemia among older women.
Krishnan, Eswar; Bennett, Mihoko; Chen, Linjun
2014-11-01
This work aims to study the associations, if any, of hyperuricemia, gout, and menopause status in the US population. Using multiyear data from the National Health and Nutrition Examination Survey, we performed unmatched comparisons and one to three age-matched comparisons of women aged 20 to 70 years with and without hyperuricemia (serum urate ≥6 mg/dL). Analyses were performed using survey-weighted multiple logistic regression and conditional logistic regression, respectively. Overall, there were 1,477 women with hyperuricemia. Age and serum urate were significantly correlated. In unmatched analyses (n = 9,573 controls), postmenopausal women were older, were heavier, and had higher prevalence of renal impairment, hypertension, diabetes, and hyperlipidemia. In multivariable regression, after accounting for age, body mass index, glomerular filtration rate, and diuretic use, menopause was associated with hyperuricemia (odds ratio, 1.36; 95% CI, 1.05-1.76; P = 0.002). In corresponding multivariable regression using age-matched data (n = 4,431 controls), the odds ratio for menopause was 0.94 (95% CI, 0.83-1.06). Current use of hormone therapy was not associated with prevalent hyperuricemia in both unmatched and matched analyses. Age is a better statistical explanation for the higher prevalence of hyperuricemia among older women than menopause status.
Utility-Weighted Modified Rankin Scale as Primary Outcome in Stroke Trials
Voormolen, Daphne C.; Venema, Esmee; Roozenbeek, Bob; Polinder, Suzanne; Haagsma, Juanita A.; Nieboer, Daan; Chalos, Vicky; Yoo, Albert J.; Schreuders, Jennifer; van der Lugt, Aad; Majoie, Charles B.L.M.; Roos, Yvo B.W.E.M.; van Zwam, Wim H.; van Oostenbrugge, Robert J.; Steyerberg, Ewout W.; Dippel, Diederik W.J.; Lingsma, Hester F.
2018-01-01
Background and Purpose— The utility-weighted modified Rankin Scale (UW-mRS) has been proposed as a new patient-centered primary outcome in stroke trials. We aimed to describe utility weights for the mRS health states and to evaluate the statistical efficiency of the UW-mRS to detect treatment effects in stroke intervention trials. Methods— We used data of the 500 patients enrolled in the MR CLEAN (Multicenter Randomized Clinical Trial of Endovascular Treatment for Acute Ischemic Stroke in the Netherlands). Utility values were elicited from the EuroQol Group 5-Dimension Self-Report Questionnaire assessed at 90 days after inclusion, simultaneously with the mRS. Utility weights were determined by averaging the utilities of all patients within each mRS category. We performed simulations to evaluate statistical efficiency. The simulated treatment effect was an odds ratio of 1.65 in favor of the treatment arm, similar for all mRS cutoffs. This treatment effect was analyzed using 3 approaches: linear regression with the UW-mRS as outcome, binary logistic regression with a dichotomized mRS (0–1/2–6, 0–2/3–6, and 0–4/5–6), and proportional odds logistic regression with the ordinal mRS. The statistical power of the 3 approaches was expressed as the proportion of 10 000 simulations that resulted in a statistically significant treatment effect (P≤0.05). Results— The mean utility values (SD) for mRS categories 0 to 6 were: 0.95 (0.08), 0.93 (0.13), 0.83 (0.21), 0.62 (0.27), 0.42 (0.28), 0.11 (0.28), and 0 (0), respectively, but varied substantially between individual patients within each category. The UW-mRS approach was more efficient than the dichotomous approach (power 85% versus 71%) but less efficient than the ordinal approach (power 85% versus 87%). Conclusions— The UW-mRS as primary outcome does not capture individual variation in utility values and may reduce the statistical power of a randomized trial. PMID:29535271
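The power definition used in this abstract (the proportion of simulated trials with P≤0.05) can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: it covers only a single dichotomized outcome, it swaps in a pooled two-proportion z-test for the binary logistic regression, and the function names, the control-arm probability of 0.2, and the arm size of 250 are all assumptions made for the example.

```python
import math
import random


def apply_odds_ratio(p_control, odds_ratio):
    """Probability of a good outcome in the treatment arm for a given odds ratio."""
    odds = p_control / (1.0 - p_control) * odds_ratio
    return odds / (1.0 + odds)


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (stand-in for logistic regression)."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        return 1.0  # no variation at all: nothing to detect
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulated_power(p_control, odds_ratio, n_per_arm, n_sims, alpha=0.05, seed=0):
    """Proportion of simulated trials whose p-value is at or below alpha."""
    rng = random.Random(seed)
    p_treat = apply_odds_ratio(p_control, odds_ratio)
    significant = 0
    for _ in range(n_sims):
        good_control = sum(rng.random() < p_control for _ in range(n_per_arm))
        good_treat = sum(rng.random() < p_treat for _ in range(n_per_arm))
        if two_proportion_p_value(good_treat, n_per_arm, good_control, n_per_arm) <= alpha:
            significant += 1
    return significant / n_sims


# Hypothetical inputs: 20% good outcome in controls, odds ratio 1.65, 250 patients per arm.
print(simulated_power(p_control=0.2, odds_ratio=1.65, n_per_arm=250, n_sims=500))
```

Setting `odds_ratio=1.0` in the same call estimates the type I error rate, which is a quick sanity check on any simulation of this kind.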
An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression
Weiss, Brandi A.; Dardick, William
2015-01-01
This article introduces an entropy-based measure of data–model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify the quality of classification and separation of group membership. Entropy complements preexisting measures of data–model fit and provides unique information not contained in other measures. Hypothetical data scenarios, an applied example, and Monte Carlo simulation results are used to demonstrate the application of entropy in logistic regression. Entropy should be used in conjunction with other measures of data–model fit to assess how well logistic regression models classify cases into observed categories. PMID:29795897
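The abstract does not state the article's formula, so the following is only a sketch of one plausible form: the normalized (relative) entropy index familiar from mixture modeling, applied to the predicted probabilities of a binary logistic regression. The function name and the choice of index are assumptions; values near 1 indicate sharp classification and values near 0 indicate fuzzy classification.

```python
import math


def relative_entropy_index(predicted_probs):
    """Normalized entropy for a two-category outcome.

    Assumed form: 1 - sum_i sum_k(-p_ik * ln p_ik) / (n * ln 2),
    where p_i1 = p_i and p_i0 = 1 - p_i are the fitted probabilities.
    """
    n = len(predicted_probs)
    total_entropy = 0.0
    for p in predicted_probs:
        for q in (p, 1.0 - p):
            if q > 0.0:  # 0 * ln(0) is taken as 0
                total_entropy -= q * math.log(q)
    return 1.0 - total_entropy / (n * math.log(2.0))


# Hypothetical fitted probabilities from some logistic regression model.
print(relative_entropy_index([0.95, 0.10, 0.80, 0.05]))
```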
Large unbalanced credit scoring using Lasso-logistic regression ensemble.
Wang, Hong; Xu, Qingsong; Zhou, Lifeng
2015-01-01
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.
An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression.
Weiss, Brandi A; Dardick, William
2016-12-01
This article introduces an entropy-based measure of data-model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify the quality of classification and separation of group membership. Entropy complements preexisting measures of data-model fit and provides unique information not contained in other measures. Hypothetical data scenarios, an applied example, and Monte Carlo simulation results are used to demonstrate the application of entropy in logistic regression. Entropy should be used in conjunction with other measures of data-model fit to assess how well logistic regression models classify cases into observed categories.
Didarloo, Alireza; Nabilou, Bahram; Khalkhali, Hamid Reza
2017-11-03
Breast cancer is a life-threatening condition affecting women around the world. The early detection of breast lumps using a breast self-examination (BSE) is important for the prevention and control of this disease. The aim of this study was to examine BSE behavior and its predictive factors among female university students using the Health Belief Model (HBM). This investigation was a cross-sectional survey carried out with 334 female students at Urmia University of Medical Sciences in the northwest of Iran. To collect the necessary data, researchers applied a valid and reliable three-part questionnaire. The data were analyzed using descriptive statistics and a chi-square test, in addition to multivariate logistic regression statistics in SPSS software version 16.0 (SPSS Inc., Chicago, IL, USA). The results indicated that 82 of the 334 participants (24.6%) reported practicing BSEs. Multivariate logistic regression analyses showed that high perceived severity [OR = 2.38, 95% CI = (1.02-5.54)], high perceived benefits [OR = 1.94, 95% CI = (1.09-3.46)], and high perceived self-efficacy [OR = 13.15, 95% CI = (3.64-47.51)] were better predictors of BSE behavior (P < 0.05) than low perceived severity, benefits, and self-efficacy. The findings also showed that a high level of knowledge compared to a low level of knowledge [OR = 5.51, 95% CI = (1.79-16.86)] and academic undergraduate and graduate degrees compared to doctoral degrees [OR = 2.90, 95% CI = (1.42-5.92)] of the participants were predictors of BSE performance (P < 0.05). The study revealed that the HBM constructs are able to predict BSE behavior. Among these constructs, self-efficacy was the most important predictor of the behavior. Interventions based on the constructs of perceived self-efficacy, benefits, and severity are recommended for increasing women's regular screening for breast cancer.
Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning
ERIC Educational Resources Information Center
Li, Zhushan
2014-01-01
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
A Methodology for Generating Placement Rules that Utilizes Logistic Regression
ERIC Educational Resources Information Center
Wurtz, Keith
2008-01-01
The purpose of this article is to provide the necessary tools for institutional researchers to conduct a logistic regression analysis and interpret the results. Aspects of the logistic regression procedure that are necessary to evaluate models are presented and discussed with an emphasis on cutoff values and choosing the appropriate number of…
John Hogland; Nedret Billor; Nathaniel Anderson
2013-01-01
Discriminant analysis, referred to as maximum likelihood classification within popular remote sensing software packages, is a common supervised technique used by analysts. Polytomous logistic regression (PLR), also referred to as multinomial logistic regression, is an alternative classification approach that is less restrictive, more flexible, and easy to interpret. To...
Providing written language services in the schools: the time is now.
Fallon, Karen A; Katz, Lauren A
2011-01-01
The current study was conducted to investigate the provision of written language services by school-based speech-language pathologists (SLPs). Specifically, the study examined SLPs' knowledge, attitudes, and collaborative practices in the area of written language services as well as the variables that impact provision of these services. Public school-based SLPs from across the country were solicited for participation in an online, Web-based survey. Data from 645 full-time SLPs from 49 states were evaluated using descriptive statistics and logistic regression. Many school-based SLPs reported not providing any services in the area of written language to students with written language weaknesses. Knowledge, attitudes, and collaborative practices were mixed. A logistic regression revealed three variables likely to predict high levels of service provision in the area of written language. Data from the current study revealed that many struggling readers and writers on school-based SLPs' caseloads are not receiving services from their SLPs. Implications for SLPs' preservice preparation, continuing education, and doctoral preparation are discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hooman, A.; Mohammadzadeh, M
Some medical and epidemiological surveys have been designed to predict a nominal response variable with several levels. With regard to the type of pregnancy there are four possible states: wanted, unwanted by wife, unwanted by husband and unwanted by couple. In this paper, we have predicted the type of pregnancy, as well as the factors influencing it, using three different models and comparing them. Because the type of pregnancy has several levels, we developed a multinomial logistic regression, a neural network and a flexible discrimination based on the data and compared their results using two statistical indices: surface under the curve (ROC) and the kappa coefficient. Based on these two indices, flexible discrimination proved to be a better fit for prediction on the data than the other methods. When the relations among variables are complex, one can use flexible discrimination instead of multinomial logistic regression and neural networks to predict nominal response variables with several levels in order to gain more accurate predictions.
Hayes, Andrew F; Matthes, Jörg
2009-08-01
Researchers often hypothesize moderated effects, in which the effect of an independent variable on an outcome variable depends on the value of a moderator variable. Such an effect reveals itself statistically as an interaction between the independent and moderator variables in a model of the outcome variable. When an interaction is found, it is important to probe the interaction, for theories and hypotheses often predict not just interaction but a specific pattern of effects of the focal independent variable as a function of the moderator. This article describes the familiar pick-a-point approach and the much less familiar Johnson-Neyman technique for probing interactions in linear models and introduces macros for SPSS and SAS to simplify the computations and facilitate the probing of interactions in ordinary least squares and logistic regression. A script version of the SPSS macro is also available for users who prefer a point-and-click user interface rather than command syntax.
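The pick-a-point approach mentioned here amounts to evaluating the conditional effect of the focal predictor at a chosen moderator value. A minimal sketch follows, assuming a model of the form b0 + b1*X + b2*M + b3*X*M; the function name and all coefficient values are hypothetical, and this is not the SPSS or SAS macro described in the article.

```python
import math


def conditional_effect(b_focal, b_interaction, var_focal, var_interaction,
                       cov_focal_interaction, moderator_value):
    """Effect of the focal predictor at one moderator value, with its standard error.

    effect   = b1 + b3 * M
    variance = Var(b1) + M^2 * Var(b3) + 2 * M * Cov(b1, b3)
    """
    effect = b_focal + b_interaction * moderator_value
    variance = (var_focal
                + moderator_value ** 2 * var_interaction
                + 2.0 * moderator_value * cov_focal_interaction)
    return effect, math.sqrt(variance)


# Hypothetical estimates and (co)variances taken from a fitted model's output.
effect, se = conditional_effect(0.5, 0.2, 0.04, 0.01, -0.005, moderator_value=2.0)
print(effect, se, effect / se)  # the ratio is the test statistic at M = 2
```

The Johnson-Neyman technique instead solves for the moderator values at which this ratio equals the critical value, so no moderator value has to be picked in advance.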
Avoiding overstating the strength of forensic evidence: Shrunk likelihood ratios/Bayes factors.
Morrison, Geoffrey Stewart; Poh, Norman
2018-05-01
When strength of forensic evidence is quantified using sample data and statistical models, a concern may be raised as to whether the output of a model overestimates the strength of evidence. This is particularly the case when the amount of sample data is small, and hence sampling variability is high. This concern is related to concern about precision. This paper describes, explores, and tests three procedures which shrink the value of the likelihood ratio or Bayes factor toward the neutral value of one. The procedures are: (1) a Bayesian procedure with uninformative priors, (2) use of empirical lower and upper bounds (ELUB), and (3) a novel form of regularized logistic regression. As a benchmark, they are compared with linear discriminant analysis, and in some instances with non-regularized logistic regression. The behaviours of the procedures are explored using Monte Carlo simulated data, and tested on real data from comparisons of voice recordings, face images, and glass fragments. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
Uchino, Makoto; Hirano, Teruyuki; Satoh, Hiroshi; Arimura, Kimiyoshi; Nakagawa, Masanori; Wakamiya, Jyunji
2005-01-01
Minamata disease (MD) was caused by ingestion of seafood from the methylmercury-contaminated areas. Although 50 years have passed since the discovery of MD, there have been only a few studies on the temporal profile of neurological findings in certified MD patients. Thus, we evaluated changes in neurological symptoms and signs of MD using discriminants by multiple logistic regression analysis. The severity of predictive index declined in 25 years in most of the patients. Only a few patients showed aggravation of neurological findings, which was due to complications such as spino-cerebellar degeneration. Patients with chronic MD aged over 45 years had several concomitant diseases so that their clinical pictures were complicated. It was difficult to differentiate chronic MD using statistically established discriminants based on sensory disturbance alone. In conclusion, the severity of MD declined in 25 years along with the modification by age-related concomitant disorders.
NASA Astrophysics Data System (ADS)
Xu, Wenbo; Jing, Shaocai; Yu, Wenjuan; Wang, Zhaoxian; Zhang, Guoping; Huang, Jianxi
2013-11-01
In this study, the areas of Sichuan Province at high risk of debris flows, Panzhihua and Liangshan Yi Autonomous Prefecture, were taken as the study areas. Using rainfall and environmental factors as predictors, and based on different prior probability combinations of debris flows, the prediction of debris flows in these areas was compared between two statistical methods: logistic regression (LR) and Bayes discriminant analysis (BDA). The comprehensive analysis shows that (a) with the mid-range scale prior probability, the overall predicting accuracy of BDA is higher than that of LR; (b) with equal and extreme prior probabilities, the overall predicting accuracy of LR is higher than that of BDA; (c) regional prediction models using rainfall factors alone perform worse than those that also include environmental factors, and the prediction accuracies for occurrence and nonoccurrence of debris flows change in opposite directions as information is added.
Pearl, D L; Louie, M; Chui, L; Doré, K; Grimsrud, K M; Martin, S W; Michel, P; Svenson, L W; McEwen, S A
2008-04-01
Using multivariable models, we examined whether there were significant differences between reported outbreak and sporadic cases in terms of their sex, age, and mode and site of disease transmission. We also determined the potential role of administrative, temporal, and spatial factors within these models. We compared a variety of approaches to account for clustering of cases in outbreaks, including weighted logistic regression, random effects models, generalized estimating equations, robust variance estimates, and the random selection of one case from each outbreak. Age and mode of transmission were the only epidemiologically and statistically significant covariates in our final models using the above approaches. Weighting observations in a logistic regression model by the inverse of their outbreak size appeared to be a relatively robust and valid means for modelling these data. Some analytical techniques, designed to account for clustering, had difficulty converging or producing realistic measures of association.
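The inverse-outbreak-size weighting described above can be sketched as follows. The function name and the data layout are hypothetical, and giving sporadic cases (marked `None`) a weight of 1 is an assumption rather than something the abstract states. The resulting weights would then be passed to whatever weighted logistic regression routine is in use.

```python
from collections import Counter


def inverse_outbreak_size_weights(outbreak_ids):
    """Return one weight per case: 1 / (size of the case's outbreak).

    Cases with outbreak id None are treated as sporadic and given weight 1
    (an assumption for this sketch).
    """
    sizes = Counter(oid for oid in outbreak_ids if oid is not None)
    return [1.0 if oid is None else 1.0 / sizes[oid] for oid in outbreak_ids]


# Hypothetical case list: outbreak "A" has three cases, "B" has one, two cases are sporadic.
cases = ["A", "A", "A", None, "B", None]
print(inverse_outbreak_size_weights(cases))
```

With this scheme every outbreak contributes a total weight of 1, the same as a single sporadic case, which is why large outbreaks no longer dominate the fit.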
Risk Factors for Developing Scoliosis in Cerebral Palsy: A Cross-Sectional Descriptive Study.
Bertoncelli, Carlo M; Solla, Federico; Loughenbury, Peter R; Tsirikos, Athanasios I; Bertoncelli, Domenico; Rampal, Virginie
2017-06-01
This study aims to identify the risk factors leading to the development of severe scoliosis among children with cerebral palsy. A cross-sectional descriptive study of 70 children (aged 12-18 years) with severe spastic and/or dystonic cerebral palsy treated in a single specialist unit is described. Statistical analysis included Fisher exact test and logistic regression analysis to identify risk factors. Severe scoliosis is more likely to occur in patients with intractable epilepsy (P = .008), poor gross motor functional assessment scores (P = .018), limb spasticity (P = .045), or a history of previous hip surgery (P = .048), and in nonambulatory patients (P = .013). The logistic regression model confirms that the major risk factors are previous hip surgery (P = .001), moderate to severe epilepsy (P = .007), and female gender (P = .03). History of previous hip surgery, intractable epilepsy, and female gender are predictors of developing severe scoliosis in children with cerebral palsy. This knowledge should aid in the early diagnosis of scoliosis and timely referral to specialist services.
Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble
Wang, Hong; Xu, Qingsong; Zhou, Lifeng
2015-01-01
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
Viewing health expenditures, payment and coping mechanisms with an equity lens in Nigeria
2013-01-01
Background: This paper examines socio-economic and geographic differences in payment and payment coping mechanisms for health services in southeast Nigeria. It shows the extent to which the poor and rural dwellers disproportionally bear the burden of health care costs and offers policy recommendations for improvements. Methods: Questionnaires were used to collect data from 3071 randomly selected households in six communities in southeast Nigeria using a four week recall. The sample was divided into quintiles (Q1-Q5) using a socio-economic status (SES) index as well as into geographic groups (rural, peri-urban and urban). Tabulations and logistic regression were used to determine the relationships between payment and payment coping mechanisms and key independent variables. Q1/Q5 and rural/urban ratios were the measures of equity. Results: Most of the respondents used out-of-pocket spending (OOPS) and own money to pay for healthcare. There were statistically significant geographic differences in the use of own money to pay for health services, indicating more use among rural dwellers. Logistic regression showed statistically significant geographic differences in the use of both OOPS and own money when controlling for the effects of potential confounders. Conclusions: This study shows statistically significant geographic differences in the use of OOPS and own money to pay for health services. Though the SES differences were not statistically significant, they showed high equity ratios indicating more use among poor and rural dwellers. The high expenditure incurred on drugs alone highlights the need for expediting pro-poor interventions like exemptions and waivers aimed at improving access to health care for the vulnerable poor and rural dwellers. PMID:23497246
An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression
ERIC Educational Resources Information Center
Weiss, Brandi A.; Dardick, William
2016-01-01
This article introduces an entropy-based measure of data-model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify…
What Are the Odds of that? A Primer on Understanding Logistic Regression
ERIC Educational Resources Information Center
Huang, Francis L.; Moon, Tonya R.
2013-01-01
The purpose of this Methodological Brief is to present a brief primer on logistic regression, a commonly used technique when modeling dichotomous outcomes. Using data from the National Education Longitudinal Study of 1988 (NELS:88), logistic regression techniques were used to investigate student-level variables in eighth grade (i.e., enrolled in a…
Cade, Brian S.; Noon, Barry R.; Scherer, Rick D.; Keane, John J.
2017-01-01
Counts of avian fledglings, nestlings, or clutch size that are bounded below by zero and above by some small integer form a discrete random variable distribution that is not approximated well by conventional parametric count distributions such as the Poisson or negative binomial. We developed a logistic quantile regression model to provide estimates of the empirical conditional distribution of a bounded discrete random variable. The logistic quantile regression model requires that counts are randomly jittered to a continuous random variable, logit transformed to bound them between specified lower and upper values, then estimated in conventional linear quantile regression, repeating the 3 steps and averaging estimates. Back-transformation to the original discrete scale relies on the fact that quantiles are equivariant to monotonic transformations. We demonstrate this statistical procedure by modeling 20 years of California Spotted Owl fledgling production (0−3 per territory) on the Lassen National Forest, California, USA, as related to climate, demographic, and landscape habitat characteristics at territories. Spotted Owl fledgling counts increased nonlinearly with decreasing precipitation in the early nesting period, in the winter prior to nesting, and in the prior growing season; with increasing minimum temperatures in the early nesting period; with adult compared to subadult parents; when there was no fledgling production in the prior year; and when percentage of the landscape surrounding nesting sites (202 ha) with trees ≥25 m height increased. Changes in production were primarily driven by changes in the proportion of territories with 2 or 3 fledglings. Average variances of the discrete cumulative distributions of the estimated fledgling counts indicated that temporal changes in climate and parent age class explained 18% of the annual variance in owl fledgling production, which was 34% of the total variance. Prior fledgling production explained as much of the variance in the fledgling counts as climate, parent age class, and landscape habitat predictors. Our logistic quantile regression model can be used for any discrete response variables with fixed upper and lower bounds.
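The three steps named in this abstract (jitter, logit transform, quantile estimation, then averaging and back-transforming) can be sketched without covariates. This is an illustration under stated assumptions, not the authors' model: an intercept-only sample quantile stands in for the linear quantile regression, the jitter is uniform on [0, 1), and the bounds -0.001 and 4.001 and all function names are hypothetical.

```python
import math
import random


def logit_transform(z, lower, upper):
    """Map a value strictly between lower and upper onto the real line."""
    return math.log((z - lower) / (upper - z))


def inverse_logit_transform(h, lower, upper):
    """Map a real number back into the interval (lower, upper)."""
    return (upper * math.exp(h) + lower) / (1.0 + math.exp(h))


def order_statistic_quantile(values, tau):
    """Sample quantile taken as an order statistic (no interpolation)."""
    ordered = sorted(values)
    index = max(0, math.ceil(tau * len(ordered)) - 1)
    return ordered[index]


def jittered_logit_quantile(counts, tau, lower, upper, n_jitters=50, seed=0):
    """Jitter, transform, estimate the quantile, repeat, average, back-transform."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_jitters):
        jittered = [y + rng.random() for y in counts]                       # step 1
        transformed = [logit_transform(z, lower, upper) for z in jittered]  # step 2
        estimates.append(order_statistic_quantile(transformed, tau))        # step 3 (stand-in)
    mean_estimate = sum(estimates) / len(estimates)
    return inverse_logit_transform(mean_estimate, lower, upper)


# Hypothetical fledgling counts bounded between 0 and 3.
counts = [0, 0, 1, 1, 2, 2, 2, 3]
print(jittered_logit_quantile(counts, tau=0.5, lower=-0.001, upper=4.001))
```

Because the logit transform is monotonic, the quantile of the transformed values is the transform of the quantile, which is what makes the final back-transformation legitimate.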
On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis
ERIC Educational Resources Information Center
Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas
2011-01-01
The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…
Guo, Yanyong; Li, Zhibin; Wu, Yao; Xu, Chengcheng
2018-06-01
Bicyclists running the red light at crossing facilities increase the potential of colliding with motor vehicles. Exploring the contributing factors could improve the prediction of running red-light probability and develop countermeasures to reduce such behaviors. However, individuals could have unobserved heterogeneities in running a red light, which make the accurate prediction more challenging. Traditional models assume that factor parameters are fixed and cannot capture the varying impacts on red-light running behaviors. In this study, we employed the full Bayesian random parameters logistic regression approach to account for the unobserved heterogeneous effects. Two types of crossing facilities were considered which were the signalized intersection crosswalks and the road segment crosswalks. Electric and conventional bikes were distinguished in the modeling. Data were collected from 16 crosswalks in urban area of Nanjing, China. Factors such as individual characteristics, road geometric design, environmental features, and traffic variables were examined. Model comparison indicates that the full Bayesian random parameters logistic regression approach is statistically superior to the standard logistic regression model. More red-light runners are predicted at signalized intersection crosswalks than at road segment crosswalks. Factors affecting red-light running behaviors are gender, age, bike type, road width, presence of raised median, separation width, signal type, green ratio, bike and vehicle volume, and average vehicle speed. Factors associated with the unobserved heterogeneity are gender, bike type, signal type, separation width, and bike volume. Copyright © 2018 Elsevier Ltd. All rights reserved.
Rhodes, Darson L; Kirchofer, Gregg; Hammig, Bart J; Ogletree, Roberta J
2013-05-01
This study examined the impact of professional preparation and class structure on sexuality topics taught and use of practice-based instructional strategies in US middle and high school health classes. Data from the classroom-level file of the 2006 School Health Policies and Programs were used. A series of multivariable logistic regression models were employed to determine if sexuality content taught was dependent on professional preparation and/or class structure (HE only versus HE/another subject combined). Additional multivariable logistic regression models were employed to determine if use of practice-based instructional strategies was dependent upon professional preparation and/or class structure. Years of teaching health topics and size of the school district were included as covariates in the multivariable logistic regression models. Findings indicated professionally prepared health educators were significantly more likely to teach 7 of the 13 sexuality topics as compared to nonprofessionally prepared health educators. There was no statistically significant difference in the instructional strategies used by professionally prepared and nonprofessionally prepared health educators. Exclusively health education classes versus combined classes were significantly more likely to have included 6 of the 13 topics and to have incorporated practice-based instructional strategies in the curricula. This study indicated professional preparation and class structure impacted sexuality content taught. Class structure also impacted whether opportunities for students to practice skills were made available. Results support the need for continued advocacy for professionally prepared health educators and health-only courses. © 2013, American School Health Association.
Raines, G.L.; Mihalasky, M.J.
2002-01-01
The U.S. Geological Survey (USGS) is proposing to conduct a global mineral-resource assessment using geologic maps, significant deposits, and exploration history as minimal data requirements. Using a geologic map and locations of significant pluton-related deposits, the pluton-related-deposit tract maps from the USGS national mineral-resource assessment have been reproduced with GIS-based analysis and modeling techniques. Agreement, kappa, and Jaccard's C correlation statistics between the expert USGS and calculated tract maps of 87%, 40%, and 28%, respectively, have been achieved using a combination of weights-of-evidence and weighted logistic regression methods. Between the experts' and calculated maps, the ranking of states measured by total permissive area correlates at 84%. The disagreement between the experts' and calculated results can be explained primarily by tracts defined by geophysical evidence not considered in the calculations, generalization of tracts by the experts, differences in map scales, and the experts' inclusion of large tracts that are arguably not permissive. This analysis shows that tracts for regional mineral-resource assessment approximating those delineated by USGS experts can be calculated using weights of evidence and weighted logistic regression, a geologic map, and the location of significant deposits. Weights of evidence and weighted logistic regression applied to a global geologic map could quickly provide a useful reconnaissance definition of tracts for mineral assessment that is tied to the data and is reproducible. © 2002 International Association for Mathematical Geology.
[Use of data display screens and ocular hypertension in local public sector workers].
Abellán Torró, Rosana; Merelles Tormo, Antoni
2014-01-01
The main objective of this study is to examine the association between work with data display screens (DDS) and ocular hypertension (OHT). A cross-sectional study was conducted among local public sector workers (Diputación Provincial de Valencia). Data from 620 people were collected over 25 months, from periodic medical examinations performed at an occupational health unit. Intraocular pressure (IOP) was obtained with a portable puff tonometer validated for screening, establishing the cut-off point for OHT at 22 mmHg. Both biological characteristics and other work-related variables were taken into account as covariates. Descriptive statistics of the data were obtained, together with nonparametric tests at the 95% confidence level and logistic regression with p < 0.1 as the level of significance of the likelihood test. The average age of the study population is 52.8 years. The prevalence of OHT was 3.5% (5.1% among men and 1.2% among women; p=0.012). No significant associations were found between hours of DDS-related work and OHT (p=0.395). Logistic regression corroborated the association between gender and OHT, with women less affected (OR = 0.234; 95% CI: 0.068 - 0.799; p=0.020). In our study, no associations were found between time of exposure to data display screens and ocular hypertension. Logistic regression points to a certain association between ocular hypertension and gender, with men being more predisposed. Copyright belongs to the Societat Catalana de Salut Laboral.
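The odds ratio and interval reported in the abstract above can be related to the underlying logistic coefficient. The following is a minimal Python sketch, assuming a Wald-type interval of the form exp(beta ± 1.96·SE); the abstract does not state how its interval was computed, and the function name `odds_ratio_ci` and the back-calculated standard error are illustrative only.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a logistic coefficient, assuming a Wald interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Values reported in the abstract (women vs. men).
reported_or, reported_lo, reported_hi = 0.234, 0.068, 0.799

# Back-calculate the coefficient and its standard error on the log-odds scale.
# This is an assumption-based reconstruction, not data from the study.
beta = math.log(reported_or)
se = (math.log(reported_hi) - math.log(reported_lo)) / (2 * 1.96)

or_hat, ci_lo, ci_hi = odds_ratio_ci(beta, se)

# Inverting the odds ratio expresses the same association with men as the exposed group.
or_men_vs_women = 1.0 / reported_or
```

Under this assumption the reconstructed interval stays close to the reported one, and the inverted odds ratio (roughly 4.3) matches the abstract's statement that men are more predisposed.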
Stata Modules for Calculating Novel Predictive Performance Indices for Logistic Models.
Barkhordari, Mahnaz; Padyab, Mojgan; Hadaegh, Farzad; Azizi, Fereidoun; Bozorgmanesh, Mohammadreza
2016-01-01
Prediction is a fundamental part of prevention of cardiovascular diseases (CVD). The development of prediction algorithms based on multivariate regression models began several decades ago. In parallel with the development of predictive models, biomarker research emerged on an impressively large scale. The key question is how best to assess and quantify the improvement in risk prediction offered by new biomarkers or, more basically, how to assess the performance of a risk prediction model. Discrimination, calibration, and added predictive value have recently been suggested for comparing the predictive performances of predictive models with and without novel biomarkers. Lack of user-friendly statistical software has restricted implementation of novel model assessment methods when examining novel biomarkers. We intended, thus, to develop user-friendly software that could be used by researchers with few programming skills. We have written a Stata command that is intended to help researchers obtain cut point-free and cut point-based net reclassification improvement index (NRI) and relative and absolute integrated discriminatory improvement index (IDI) for logistic-based regression analyses. We applied the commands to real data on women participating in the Tehran lipid and glucose study (TLGS) to examine if information on a family history of premature CVD, waist circumference, and fasting plasma glucose can improve predictive performance of the Framingham "general CVD risk" algorithm. The command is addpred for logistic regression models. The Stata package provided herein can encourage the use of novel methods in examining the predictive capacity of the ever-emerging plethora of novel biomarkers.
Inferring microhabitat preferences of Lilium catesbaei (Liliaceae).
Sommers, Kristen Penney; Elswick, Michael; Herrick, Gabriel I; Fox, Gordon A
2011-05-01
Microhabitat studies use varied statistical methods, some treating site occupancy as a dependent and others as an independent variable. Using the rare Lilium catesbaei as an example, we show why approaches to testing hypotheses of differences between occupied and unoccupied sites can lead to erroneous conclusions about habitat preferences. Predictive approaches like logistic regression can better lead to understanding of habitat requirements. Using 32 lily locations and 30 random locations >2 m from a lily (complete data: 31 lily and 28 random spots), we measured physical conditions--photosynthetically active radiation (PAR), canopy cover, litter depth, distance to and height of nearest shrub, and soil moisture--and number and identity of neighboring plants. Twelve lilies were used to estimate a photosynthetic assimilation curve. Analyses used logistic regression, discriminant function analysis (DFA), (multivariate) analysis of variance, and resampled Wilcoxon tests. Logistic regression and DFA found identical predictors of presence (PAR, canopy cover, distance to shrub, litter), but hypothesis tests pointed to a different set (PAR, litter, canopy cover, height of nearest shrub). Lilies are mainly in high-PAR spots, often close to light saturation. By contrast, PAR in random spots was often near the lily light compensation point. Lilies were near Serenoa repens less than at random; otherwise, neighbor identity had no significant effect. Predictive methods are more useful in this context than the hypothesis tests. Light availability plays a big role in lily presence, which may help to explain increases in flowering and emergence after fire and roller-chopping.
Peng, Yong; Peng, Shuangling; Wang, Xinghua; Tan, Shiyang
2018-06-01
This study aims to identify the effects of characteristics of vehicle, roadway, driver, and environment on fatality of drivers in vehicle-fixed object accidents on expressways in the Changsha-Zhuzhou-Xiangtan district of Hunan province in China by developing multinomial logistic regression models. For this purpose, 121 vehicle-fixed object accidents from 2011-2017 are included in the modeling process. First, descriptive statistical analysis is performed to understand the main characteristics of the vehicle-fixed object crashes. Then, 19 explanatory variables are selected, and pairwise correlation analysis of the variables is conducted to choose the variables to be included. Finally, five multinomial logistic regression models including different independent variables are compared, and the model with the best fit and prediction capability is chosen as the final model. The results showed that the turning direction in avoiding fixed objects raised the possibility that drivers would die. About 64% of the drivers who died in the accidents were found to have been ejected from the car, of whom 50% were not wearing a seatbelt before the fatal accident. Drivers are likely to die when they encounter bad weather on the expressway. Drivers with less than 10 years of driving experience are more likely to die in these accidents. Fatigue or distracted driving is also a significant factor in fatality of drivers. Findings from this research provide an insight into reducing fatality of drivers in vehicle-fixed object accidents.
Mocellin, Simone; Thompson, John F; Pasquali, Sandro; Montesco, Maria C; Pilati, Pierluigi; Nitti, Donato; Saw, Robyn P; Scolyer, Richard A; Stretch, Jonathan R; Rossi, Carlo R
2009-12-01
To improve selection for sentinel node (SN) biopsy (SNB) in patients with cutaneous melanoma using statistical models predicting SN status. About 80% of patients currently undergoing SNB are node negative. In the absence of conclusive evidence of an SNB-associated survival benefit, these patients may be over-treated. Here, we tested the efficiency of 4 different models in predicting SN status. The clinicopathologic data (age, gender, tumor thickness, Clark level, regression, ulceration, histologic subtype, and mitotic index) of 1132 melanoma patients who had undergone SNB at institutions in Italy and Australia were analyzed. Logistic regression, classification tree, random forest, and support vector machine models were fitted to the data. The predictive models were built with the aim of maximizing the negative predictive value (NPV) and reducing the rate of SNB procedures through minimizing the error rate. After cross-validation, the logistic regression, classification tree, random forest, and support vector machine predictive models obtained clinically relevant NPV (93.6%, 94.0%, 97.1%, and 93.0%, respectively), SNB reduction (27.5%, 29.8%, 18.2%, and 30.1%, respectively), and error rates (1.8%, 1.8%, 0.5%, and 2.1%, respectively). Using commonly available clinicopathologic variables, predictive models can preoperatively identify a proportion of patients (approximately 25%) who might be spared SNB, with an acceptable (1%-2%) error. If validated in large prospective series, these models might be implemented in the clinical setting for improved patient selection, which ultimately would lead to better quality of life for patients and optimization of resource allocation for the health care system.
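The three reported metrics in the abstract above appear to be arithmetically linked. The Python sketch below checks one plausible reading, which is an assumption and not a definition given in the abstract: "SNB reduction" is the share of all patients predicted node negative, and "error rate" is the share of all patients predicted negative who are in fact node positive, so that error rate = SNB reduction × (1 − NPV). The helper name `implied_error_rate` is hypothetical.

```python
# Cross-validated figures (percent) as reported in the abstract:
# model name -> (NPV, SNB reduction, error rate)
reported = {
    "logistic regression": (93.6, 27.5, 1.8),
    "classification tree": (94.0, 29.8, 1.8),
    "random forest": (97.1, 18.2, 0.5),
    "support vector machine": (93.0, 30.1, 2.1),
}

def implied_error_rate(npv_pct, reduction_pct):
    """Percent of all patients spared SNB despite a positive node (assumed definition)."""
    return reduction_pct * (1.0 - npv_pct / 100.0)

# Error rate implied by NPV and SNB reduction, per model.
implied = {
    name: implied_error_rate(npv, reduction)
    for name, (npv, reduction, _error) in reported.items()
}

# Absolute gap between the implied and the reported error rate, per model.
gaps = {name: abs(implied[name] - reported[name][2]) for name in reported}
```

Under this reading, the implied values agree with the reported error rates to within rounding for all four models, which supports (but does not prove) the assumed definitions.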
NASA Astrophysics Data System (ADS)
Nanus, L.; Campbell, D. H.; Williams, M. W.
2004-12-01
Acidification of high-elevation lakes in the Western United States is of concern because of the storage and release of pollutants in snowmelt runoff combined with steep topography, granitic bedrock, and limited soils and biota. Land use managers have limited resources for sampling and thus need direction on how best to design monitoring programs. We evaluated the sensitivity of 400 lakes in Grand Teton (GRTE) and Yellowstone (YELL) National Parks to acidification from atmospheric deposition of nitrogen and sulfur based on statistical relations between acid-neutralizing capacity (ANC) concentrations and basin characteristics to aid in the design of a long-term monitoring plan for Outstanding Natural Resource Waters. ANC concentrations that were measured at 52 lakes in GRTE and 23 lakes in YELL during synoptic surveys were used to calibrate the statistical models. Basin-characteristic information was derived from Geographic Information System data sets. The explanatory variables that were considered included bedrock type, basin slope, basin aspect, basin elevation, lake area, basin area, inorganic nitrogen (N) deposition, sulfate deposition, hydrogen ion deposition, basin precipitation, soil type, and vegetation type. A logistic regression model was developed and applied to lake basins greater than 1 hectare (ha) in GRTE (n=106) and YELL (n=294). For GRTE, 36 percent of lakes had a greater than 60-percent probability of having ANC concentrations less than 100 microequivalents per liter, and 14 percent of lakes had a greater than 80-percent probability of having ANC concentrations less than 100 microequivalents per liter. The elevation of the lake outlet and the area of the basin with northeast aspects were determined to be statistically significant and were used as the explanatory variables in the multivariate logistic regression model. 
For YELL, results indicated that 13 percent of lakes had a greater than 60-percent probability of having ANC concentrations less than 100 microequivalents per liter, and 9 percent of lakes had a greater than 80-percent probability of having ANC concentrations less than 100 microequivalents per liter. Only the elevation of the lake outlet was determined to be statistically significant and was used as the explanatory variable in the multivariate logistic regression model. The lakes that exceeded 80-percent probability of having an ANC concentration less than 100 microequivalents per liter, and therefore had the greatest sensitivity to acidification from atmospheric deposition, are located at elevations greater than 2,810 meters (m) in GRTE, and greater than 2,655 m in YELL.
NASA Astrophysics Data System (ADS)
Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.
2013-02-01
Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R² = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related to agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19.
The results from GWR indicated a significant spatial variation in the local parameter estimates for all the variables and an important reduction of the autocorrelation in the residuals of the GW linear model. Despite the fitting improvement of local models, GW regression, more than an alternative to "global" or traditional regression modelling, seems to be a valuable complement to explore the non-stationary relationships between the response variable and the explanatory variables. The synergy of global and local modelling provides insights into fire management and policy and helps further our understanding of the fire problem over large areas while at the same time recognizing its local character.
NASA Astrophysics Data System (ADS)
Rossi, M.; Apuani, T.; Felletti, F.
2009-04-01
The aim of this paper is to compare the results of two statistical methods for landslide susceptibility analysis: 1) a univariate probabilistic method based on the landslide susceptibility index, and 2) a multivariate method (logistic regression). The study area is the Febbraro valley, located in the central Italian Alps, where different types of metamorphic rocks crop out. In the eastern part of the studied basin, a Quaternary cover represented by colluvial and, secondarily, glacial deposits is dominant. In this study, 110 earth flows, mainly located toward the NE portion of the catchment, were analyzed. They involve only the colluvial deposits, and their extent mainly ranges from 36 to 3173 m². Both statistical methods require a spatial database, constructed using a Geographical Information System (GIS), in which each landslide is described by several parameters assigned from the values at the central point of its main scarp. Based on a bibliographic review, a total of 15 predisposing factors were used. The width of the intervals into which the maps of the predisposing factors are reclassified was defined assuming constant intervals for: elevation (100 m), slope (5°), solar radiation (0.1 MJ/cm²/year), profile curvature (1.2 1/m), tangential curvature (2.2 1/m), drainage density (0.5), and lineament density (0.00126). For the other parameters, the results of the probability-probability plot analysis and the statistical indexes of the landslide sites were used, in particular: slope length (0 ÷ 2, 2 ÷ 5, 5 ÷ 10, 10 ÷ 20, 20 ÷ 35, 35 ÷ 260), accumulation flow (0 ÷ 1, 1 ÷ 2, 2 ÷ 5, 5 ÷ 12, 12 ÷ 60, 60 ÷ 27265), Topographic Wetness Index (0 ÷ 0.74, 0.74 ÷ 1.94, 1.94 ÷ 2.62, 2.62 ÷ 3.48, 3.48 ÷ 6.00, 6.00 ÷ 9.44), and Stream Power Index (0 ÷ 0.64, 0.64 ÷ 1.28, 1.28 ÷ 1.81, 1.81 ÷ 4.20, 4.20 ÷ 9.40).
Geological and land use maps were also used, treating geological and land use properties as categorical variables. Applying the univariate probabilistic method, the Landslide Susceptibility Index (LSI) is defined as the sum of the ratio Ra/Rb calculated for each predisposing factor, where Ra is the ratio between the number of pixels of a class and the total number of pixels of the study area, and Rb is the ratio between the number of landslides and the number of pixels of the interval area. From the analysis of the Ra/Rb ratio, the relationships between landslide occurrence and predisposing factors were defined. The LSI equation was then used in GIS to produce the landslide susceptibility maps. The multivariate method for landslide susceptibility analysis, based on logistic regression, was performed starting from the density maps of the predisposing factors, calculated with the intervals defined above using the equation Rb/Rbtot, where Rbtot is the sum of all Rb values. Using forward stepwise algorithms, the logistic regression was performed in two successive steps: first, a univariate logistic regression was used to choose the most significant predisposing factors; then the multivariate logistic regression was performed. The univariate regression highlighted the importance of the following factors: elevation, accumulation flow, drainage density, lineament density, geology and land use. When the multivariate regression was applied, the number of controlling factors was reduced by dropping the geological properties. The resulting final susceptibility equation is: P = 1 / (1 + exp(-(6.46 - 22.34*elevation - 5.33*accumulation flow - 7.99*drainage density - 4.47*lineament density - 17.31*land use))), and the susceptibility maps were obtained using this equation. To compare the results of the two methodologies easily, the susceptibility maps were reclassified into five susceptibility intervals (very high, high, moderate, low and very low) using natural breaks.
Then the maps were validated using two cumulative distribution curves, one related to the landslides (number of landslides in each susceptibility class) and one to the basin (number of pixels covering each class). Comparing the curves for each method shows that the two approaches (univariate and multivariate) are appropriate and provide acceptable results. In both maps, high susceptibility conditions are mainly localized on the left slope of the catchment, in agreement with the field evidence. The two methods were compared by subtracting one map from the other. This operation shows that about 40% of the basin is assigned the same susceptibility class by both methods. In general, the univariate probabilistic method tends to overestimate the areal extent of the high susceptibility class with respect to the maps obtained by the logistic regression method.
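The final susceptibility equation quoted in the abstract above can be evaluated directly. The following Python sketch simply transcribes the reported intercept and coefficients; the dictionary keys and the function name `susceptibility` are illustrative, and the inputs are assumed to be the per-factor density values (Rb/Rbtot) described in the abstract, which lie between 0 and 1.

```python
import math

# Intercept and coefficients as reported in the abstract's final equation.
INTERCEPT = 6.46
COEFFICIENTS = {
    "elevation": -22.34,
    "accumulation_flow": -5.33,
    "drainage_density": -7.99,
    "lineament_density": -4.47,
    "land_use": -17.31,
}

def susceptibility(densities):
    """Evaluate P = 1 / (1 + exp(-z)) with z the linear predictor from the abstract."""
    z = INTERCEPT + sum(
        COEFFICIENTS[name] * densities[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical input values, chosen only to show the call; not data from the study.
example_pixel = {
    "elevation": 0.10,
    "accumulation_flow": 0.20,
    "drainage_density": 0.15,
    "lineament_density": 0.05,
    "land_use": 0.10,
}
p_example = susceptibility(example_pixel)
```

Because every coefficient is negative, a larger density value for any factor lowers the linear predictor and therefore lowers P in this parameterization.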
Dynamic Dimensionality Selection for Bayesian Classifier Ensembles
2015-03-19
learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very competitive with Logistic Regression but much more...classifier, Generative learning, Discriminative learning, Naïve Bayes, Feature selection, Logistic regression, higher order attribute independence 16...discriminative learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very competitive with Logistic Regression but
Travis Woolley; David C. Shaw; Lisa M. Ganio; Stephen Fitzgerald
2012-01-01
Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed burns and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate...
Preserving Institutional Privacy in Distributed binary Logistic Regression.
Wu, Yuan; Jiang, Xiaoqian; Ohno-Machado, Lucila
2012-01-01
Privacy is becoming a major concern when sharing biomedical data across institutions. Although methods for protecting privacy of individual patients have been proposed, it is not clear how to protect the institutional privacy, which is often a critical concern of data custodians. Building upon our previous work, Grid Binary LOgistic REgression (GLORE), we developed an Institutional Privacy-preserving Distributed binary Logistic Regression model (IPDLR) that considers both individual and institutional privacy for building a logistic regression model in a distributed manner. We tested our method using both simulated and clinical data, showing how it is possible to protect the privacy of individuals and of institutions using a distributed strategy.
Covariate Imbalance and Adjustment for Logistic Regression Analysis of Clinical Trial Data
Ciolino, Jody D.; Martin, Reneé H.; Zhao, Wenle; Jauch, Edward C.; Hill, Michael D.; Palesch, Yuko Y.
2014-01-01
In logistic regression analysis for binary clinical trial data, adjusted treatment effect estimates are often not equivalent to unadjusted estimates in the presence of influential covariates. This paper uses simulation to quantify the benefit of covariate adjustment in logistic regression. However, International Conference on Harmonization guidelines suggest that covariate adjustment be pre-specified. Unplanned adjusted analyses should be considered secondary. Results suggest that if adjustment is not possible or unplanned in a logistic setting, balance in continuous covariates can alleviate some (but never all) of the shortcomings of unadjusted analyses. The case of log binomial regression is also explored. PMID:24138438
Differentially private distributed logistic regression using private and public data.
Ji, Zhanglong; Jiang, Xiaoqian; Wang, Shuang; Xiong, Li; Ohno-Machado, Lucila
2014-01-01
Privacy protection is an important issue in medical informatics, and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise and make outputs less useful. Given available public data in medical research (e.g. from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. In this paper, we modify the update step in the Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. We try our algorithm on three different data sets, and show its advantage over: (1) a logistic regression model based solely on public data, and (2) a differentially private distributed logistic regression model based on private data under various scenarios. Logistic regression models built with our new algorithm based on both private and public datasets demonstrate better utility than models trained on private or public datasets alone, without sacrificing the rigorous privacy guarantee.
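For orientation, the standard (non-private) Newton-Raphson update for logistic regression, which the abstract above says the authors modify, can be sketched as follows. This Python example is a plain single-site version with one predictor and no privacy noise; it does not reproduce the paper's modified update step, and the function names and toy data are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic_newton(xs, ys, max_iter=25, tol=1e-10):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by Newton-Raphson (no privacy mechanism)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Score vector (g0, g1) and 2x2 information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            raise ValueError("information matrix is singular")
        # Newton step: solve the 2x2 system (information matrix) * delta = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Toy, non-separable data used only to exercise the routine.
toy_x = [0, 1, 2, 3, 4, 5, 6, 7]
toy_y = [0, 0, 1, 0, 1, 0, 1, 1]
intercept, slope = fit_logistic_newton(toy_x, toy_y)
```

In a distributed setting the per-record sums for the score and information matrix are the quantities that sites could aggregate, which is presumably where a privacy mechanism would act; that connection is an inference from the abstract, not a description of the authors' algorithm.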
Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi
2017-06-01
Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB (p < 0.05). Identifying and training mothers at risk as well as improving prenatal care may reduce the PTB rate. We also recommend that statisticians utilize the logistic regression model for the classification of risk groups for PTB.
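The evaluation indices named in the abstract above (sensitivity, specificity, and area under the curve) can be computed from predicted probabilities as sketched below. This is a generic Python illustration with made-up scores, not the study's data or code; the 0.5 threshold and the function names are assumptions, and the AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half.

```python
def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Return (sensitivity, specificity) after thresholding the scores."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up labels and predicted probabilities, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
sens, spec = sensitivity_specificity(labels, scores)
area = auc(labels, scores)
```

With a low event rate such as the 5.5% reported here, a fixed 0.5 threshold tends to give low sensitivity, so threshold-free measures such as the AUC are a natural complement; this remark is general guidance rather than a finding of the study.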
Nowcasting sunshine number using logistic modeling
NASA Astrophysics Data System (ADS)
Brabec, Marek; Badescu, Viorel; Paulescu, Marius
2013-04-01
In this paper, we present a formalized approach to statistical modeling of the sunshine number, binary indicator of whether the Sun is covered by clouds introduced previously by Badescu (Theor Appl Climatol 72:127-136, 2002). Our statistical approach is based on Markov chain and logistic regression and yields fully specified probability models that are relatively easily identified (and their unknown parameters estimated) from a set of empirical data (observed sunshine number and sunshine stability number series). We discuss general structure of the model and its advantages, demonstrate its performance on real data and compare its results to classical ARIMA approach as to a competitor. Since the model parameters have clear interpretation, we also illustrate how, e.g., their inter-seasonal stability can be tested. We conclude with an outlook to future developments oriented to construction of models allowing for practically desirable smooth transition between data observed with different frequencies and with a short discussion of technical problems that such a goal brings.
Is parenting style a predictor of suicide attempts in a representative sample of adolescents?
2014-01-01
Background: Suicidal ideation and suicide attempts are serious but not rare conditions in adolescents. However, there are several research and practical suicide-prevention initiatives that discuss the possibility of preventing serious self-harm. Profound knowledge about risk and protective factors is therefore necessary. The aim of this study is a) to clarify the role of parenting behavior and parenting styles in adolescents’ suicide attempts and b) to identify other statistically significant and clinically relevant risk and protective factors for suicide attempts in a representative sample of German adolescents. Methods: In the years 2007/2008, a representative written survey of N = 44,610 students in the 9th grade of different school types in Germany was conducted. In this survey, the lifetime prevalence of suicide attempts was investigated as well as potential predictors including parenting behavior. A three-step statistical analysis was carried out: I) As basic model, the association between parenting and suicide attempts was explored via binary logistic regression controlled for age and sex. II) The predictive values of 13 additional potential risk/protective factors were analyzed with single binary logistic regression analyses for each predictor alone. Non-significant predictors were excluded in Step III. III) In a multivariate binary logistic regression analysis, all significant predictor variables from Step II and the parenting styles were included after testing for multicollinearity. Results: Three parental variables (all protective) showed a relevant association with suicide attempts in adolescents: mother’s warmth and father’s warmth in childhood and mother’s control in adolescence (Step I). In the full model (Step III), Authoritative parenting (protective: OR: .79) and Rejecting-Neglecting parenting (risk: OR: 1.63) were identified as significant predictors (p < .001) for suicide attempts.
Seven further variables were interpreted to be statistically significant and clinically relevant: ADHD, female sex, smoking, Binge Drinking, absenteeism/truancy, migration background, and parental separation events. Conclusions: Parenting style does matter. While children of Authoritative parents profit, children of Rejecting-Neglecting parents are put at risk, as we were able to show for suicide attempts in adolescence. Some of the identified risk factors contribute new knowledge and potential areas of intervention for special groups such as migrants or children diagnosed with ADHD. PMID:24766881
Sun, Z W; Shi, T T; Fu, P X
2017-02-01
To explore the characteristics of homicide behaviors in patients with schizophrenia and the factors influencing the assessment of their criminal capacity. Indicators such as demographic and clinical data, characteristics of criminal behaviors, and criminal capacity were collected with a self-made investigation form from suspects with homicide behavior who were diagnosed by forensic psychiatry as having schizophrenia (n = 110) or as mentally normal (n = 70), and the two groups were compared. The factors influencing the assessment of criminal capacity in the suspects diagnosed with schizophrenia were also analyzed using logistic regression analysis. There were no statistically significant differences between the schizophrenic group and the mentally normal group concerning age, gender, education and marital status (P > 0.05). There were statistically significant differences between the two groups concerning thought disorder, emotion state and social function before crime (P < 0.05), and in some characteristics of the case such as aggressive history (P < 0.05), cue, trigger, plan, criminal incentives, object of crime, circumstance cognition and self-protection (P < 0.05). Multivariate logistic regression analysis suggested that thought disorder, emotion state, social function, criminal incentives, plan and self-protection before crime of the schizophrenic group were positively correlated with the criminal capacity (P < 0.05). The relevant influences of psychopathology and crime characteristics should be considered comprehensively to improve the accuracy of criminal capacity evaluation in suspects with homicide behavior who are diagnosed with schizophrenia. Copyright© by the Editorial Department of Journal of Forensic Medicine
Rupert, Michael G.; Plummer, Niel
2009-01-01
This raster data set delineates the predicted probability of unmixed young groundwater (defined using chlorofluorocarbon-11 concentrations and tritium activities) in groundwater in the Eagle River watershed valley-fill aquifer, Eagle County, North-Central Colorado, 2006-2007. This data set was developed by a cooperative project between the U.S. Geological Survey, Eagle County, the Eagle River Water and Sanitation District, the Town of Eagle, the Town of Gypsum, and the Upper Eagle Regional Water Authority. This project was designed to evaluate potential land-development effects on groundwater and surface-water resources so that informed land-use and water management decisions can be made. This groundwater probability map and its associated probability maps were developed as follows: (1) A point data set of wells with groundwater quality and groundwater age data was overlaid with thematic layers of anthropogenic (related to human activities) and hydrogeologic data by using a geographic information system to assign each well values for depth to groundwater, distance to major streams and canals, distance to gypsum beds, precipitation, soils, and well depth. These data then were downloaded to a statistical software package for analysis by logistic regression. (2) Statistical models predicting the probability of elevated nitrate concentrations, the probability of unmixed young water (using chlorofluorocarbon-11 concentrations and tritium activities), and the probability of elevated volatile organic compound concentrations were developed using logistic regression techniques. (3) The statistical models were entered into a GIS and the probability map was constructed.
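Step (3) above amounts to evaluating a fitted logistic model at every grid cell. A minimal sketch of that idea in Python follows; the coefficient values, the two predictor names, and the toy grid are hypothetical and are not taken from the USGS models.

```python
import math

# Hypothetical coefficients for illustration only (not from the study).
INTERCEPT = 1.0
COEFFICIENTS = {"depth_to_groundwater_m": -0.05, "distance_to_stream_m": -0.002}

def predicted_probability(cell):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = INTERCEPT + sum(b * cell[name] for name, b in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-linear))

def probability_map(grid):
    """Apply the model to each cell of a 2-D grid (a list of rows of dicts)."""
    return [[predicted_probability(cell) for cell in row] for row in grid]

# Toy one-row grid with two cells (made-up values).
grid = [
    [{"depth_to_groundwater_m": 5.0, "distance_to_stream_m": 100.0},
     {"depth_to_groundwater_m": 40.0, "distance_to_stream_m": 900.0}],
]
print(probability_map(grid))
```

In a real GIS workflow the same expression would typically be evaluated with raster algebra rather than a Python loop.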
Wartberg, Lutz; Kriston, Levente; Kammerl, Rudolf
2017-07-01
Internet Gaming Disorder (IGD) has been included in the current edition of the Diagnostic and Statistical Manual of Mental Disorders-Fifth Edition (DSM-5). In the present study, the relationship among social support, friends only known through the Internet, health-related quality of life, and IGD in adolescence was explored for the first time. For this purpose, 1,095 adolescents aged from 12 to 14 years were surveyed with a standardized questionnaire concerning IGD, self-perceived social support, proportion of friends only known through the Internet, and health-related quality of life. The authors conducted unpaired t-tests, a chi-square test, as well as correlation and logistic regression analyses. According to the statistical analyses, adolescents with IGD reported lower self-perceived social support, more friends only known through the Internet, and a lower health-related quality of life compared with the group without IGD. Both in bivariate and multivariate logistic regression models, statistically significant associations between IGD and male gender, a higher proportion of friends only known through the Internet, and a lower health-related quality of life (multivariate model: Nagelkerke's R² = 0.37) were revealed. Lower self-perceived social support was related to IGD in the bivariate model only. In summary, quality of life and social aspects seem to be important factors for IGD in adolescence and therefore should be incorporated in further (longitudinal) studies. The findings of the present survey may provide starting points for the development of prevention and intervention programs for adolescents affected by IGD.
Craven, Stephen; Shirsat, Nishikant; Whelan, Jessica; Glennon, Brian
2013-01-01
A Monod kinetic model, logistic equation model, and statistical regression model were developed for a Chinese hamster ovary cell bioprocess operated under three different modes of operation (batch, bolus fed-batch, and continuous fed-batch) and grown on two different bioreactor scales (3 L bench-top and 15 L pilot-scale). The Monod kinetic model was developed for all modes of operation under study and predicted cell density, glucose, glutamine, lactate, and ammonia concentrations well for the bioprocess. However, it was computationally demanding due to the large number of parameters necessary to produce a good model fit. The transferability of the Monod kinetic model structure and parameter set across bioreactor scales and modes of operation was investigated and a parameter sensitivity analysis performed. The experimentally determined parameters had the greatest influence on model performance. They changed with scale and mode of operation, but were easily calculated. The remaining parameters, which were fitted using a differential evolutionary algorithm, were not as crucial. Logistic equation and statistical regression models were investigated as alternatives to the Monod kinetic model. They were less computationally intensive to develop due to the absence of a large parameter set. However, modeling of the nutrient and metabolite concentrations proved to be troublesome due to the logistic equation model structure and the inability of both models to incorporate a feed. The complexity, computational load, and effort required for model development have to be balanced with the necessary level of model sophistication when choosing which model type to develop for a particular application. Copyright © 2012 American Institute of Chemical Engineers (AIChE).
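The two model families contrasted above can be written in their textbook forms, sketched here in Python. The function and parameter names are illustrative, and the paper's full Monod model has many more terms than the single-substrate rate law shown.

```python
import math

def monod_growth_rate(substrate, mu_max, k_s):
    """Monod kinetics: specific growth rate mu = mu_max * S / (K_s + S)."""
    return mu_max * substrate / (k_s + substrate)

def logistic_cell_density(t, x0, x_max, rate):
    """Closed-form solution of dX/dt = rate * X * (1 - X / x_max) with X(0) = x0."""
    return x_max / (1.0 + ((x_max - x0) / x0) * math.exp(-rate * t))

# Illustrative values only: the growth rate is half of mu_max when S equals K_s.
print(monod_growth_rate(substrate=2.0, mu_max=0.04, k_s=2.0))
print(logistic_cell_density(t=0.0, x0=0.3, x_max=10.0, rate=0.05))
```

The logistic form depends only on cell density and time, with no substrate term, which is consistent with the reported difficulty of representing a feed.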
Item Response Theory Modeling of the Philadelphia Naming Test.
Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D
2015-06-01
In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating explanatory variables to item difficulty. This article describes the statistical model underlying the computer adaptive PNT presented in a companion article (Hula, Kellough, & Fergadiotis, 2015). Using archival data, we evaluated the fit of the PNT to 1- and 2-parameter logistic models and examined the precision of the resulting parameter estimates. We regressed the item difficulty estimates on three predictor variables: word length, age of acquisition, and contextual diversity. The 2-parameter logistic model demonstrated marginally better fit, but the fit of the 1-parameter logistic model was adequate. Precision was excellent for both person ability and item difficulty estimates. Word length, age of acquisition, and contextual diversity all independently contributed to variance in item difficulty. Item-response-theory methods can be productively used to analyze and quantify anomia severity in aphasia. Regression of item difficulty on lexical variables supported the validity of the PNT and interpretation of anomia severity scores in the context of current word-finding models.
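For readers unfamiliar with the 1- and 2-parameter logistic models, the following Python sketch shows the standard item response function. The parameter names are illustrative, and the scaling convention (no 1.7 constant) is an assumption rather than something stated in the abstract.

```python
import math

def irt_probability(ability, difficulty, discrimination=1.0):
    """Probability of a correct response under a logistic IRT model.

    With discrimination fixed at 1.0 for every item this is the
    1-parameter model; letting it vary by item gives the 2-parameter model.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

# A person whose ability equals the item difficulty has a 50% chance of success.
print(irt_probability(ability=0.0, difficulty=0.0))
# A more discriminating item separates abilities around its difficulty more sharply.
print(irt_probability(ability=1.0, difficulty=0.0, discrimination=2.0))
```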
Gao, Lei; Xi, Qian Qian; Wu, Jun; Han, Yu; Dai, Wei; Su, Yuan Yuan; Zhang, Xin
2015-09-01
To investigate the association between autism and prenatal environmental risk factors. A questionnaire-based case-control study was conducted in Tianjin from 2007 to 2012 among 193 children with autism from special educational schools and 733 typically developing controls matched by age and gender. Statistical analysis included quick unbiased efficient statistical tree (QUEST) and logistic regression in SPSS 20.0. QUEST and the logistic regression analysis identified four predictors: maternal air conditioner use during pregnancy (OR=0.316, 95% CI: 0.215-0.463) was the single first-level node (χ²=50.994, P=0.000); newborn complications (OR=4.277, 95% CI: 2.314-7.908) and paternal consumption of freshwater fish (OR=0.383, 95% CI: 0.256-0.573) were second-layer predictors (χ²=45.248, P=0.000; χ²=24.212, P=0.000); and maternal depression (OR=4.822, 95% CI: 3.047-7.631) was the single third-level predictor (χ²=23.835, P=0.000). The prediction accuracy of the tree was 89.2%. Air conditioner use during pregnancy and a paternal freshwater fish diet might be beneficial for the prevention of autism, while newborn complications and maternal depression might be risk factors. Copyright © 2015 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.
Misra-Hebert, Anita D; Santurri, Laura; DeChant, Richard; Watts, Brook; Sehgal, Ashwini R; Aron, David C
2015-10-01
To assess health status among student veterans at a community college utilizing a partnership between a Veterans Affairs Medical Center and a community college. Student veterans at Cuyahoga Community College in Cleveland, Ohio, in January to April 2013. A health assessment survey was sent to 978 veteran students. Descriptive analyses to assess prevalence of clinical diagnoses and health behaviors were performed. Logistic regression analyses were performed to assess for independent predictors of functional limitations. 204 students participated in the survey (21% response rate). Self-reported depression and unhealthy behaviors were high. Physical and emotional limitations (45% and 35%, respectively), and pain interfering with work (42%) were reported. Logistic regression analyses confirmed the independent association of self-reported depression with functional limitation (odds ratio [OR] = 3.3, 95% confidence interval [CI] 1.4-7.8, p < 0.05, and C statistic 0.72) and of post-traumatic stress disorder with pain interfering with work (OR 3.9, CI 1.1-13.6, p < 0.05, and C statistic 0.75). A health assessment survey identified priority areas to inform targeted health promotion for student veterans at a community college. A partnership between a Veterans Affairs Medical Center and a community college can be utilized to help understand the health needs of veteran students. Reprint & Copyright © 2015 Association of Military Surgeons of the U.S.
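As a reading aid for the odds ratios and confidence intervals quoted above, here is a minimal Python sketch of an unadjusted odds ratio with a Wald 95% interval from a 2x2 table. The counts are invented for illustration; the study's estimates come from logistic regression models, not from this calculation.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome,   b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, not data from the survey.
print(odds_ratio_with_ci(20, 10, 10, 20))
```

An interval that excludes 1 corresponds to a two-sided test significant at roughly the 5% level.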
A 3-Year Study of Predictive Factors for Positive and Negative Appendicectomies.
Chang, Dwayne T S; Maluda, Melissa; Lee, Lisa; Premaratne, Chandrasiri; Khamhing, Srisongham
2018-03-06
Early and accurate identification or exclusion of acute appendicitis is the key to avoid the morbidity of delayed treatment for true appendicitis or unnecessary appendicectomy, respectively. We aim (i) to identify potential predictive factors for positive and negative appendicectomies; and (ii) to analyse the use of ultrasound scans (US) and computed tomography (CT) scans for acute appendicitis. All appendicectomies that took place at our hospital from the 1st of January 2013 to the 31st of December 2015 were retrospectively recorded. Test results of potential predictive factors of acute appendicitis were recorded. Statistical analysis was performed using Fisher exact test, logistic regression analysis, sensitivity, specificity, and positive and negative predictive values calculation. 208 patients were included in this study. 184 patients had histologically proven acute appendicitis. The other 24 patients had either nonappendicitis pathology or normal appendix. Logistic regression analysis showed statistically significant associations between appendicitis and white cell count, neutrophil count, C-reactive protein, and bilirubin. Neutrophil count was the test with the highest sensitivity and negative predictive values, whereas bilirubin was the test with the highest specificity and positive predictive values (PPV). US and CT scans had high sensitivity and PPV for diagnosing appendicitis. No single test was sufficient to diagnose or exclude acute appendicitis by itself. Combining tests with high sensitivity (abnormal neutrophil count, and US and CT scans) and high specificity (raised bilirubin) may predict acute appendicitis more accurately.
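The four test-performance measures named above follow directly from a 2x2 table of test result against histology. A short Python sketch, using invented counts rather than the study's data:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # positives correctly detected
        "specificity": tn / (tn + fp),  # negatives correctly excluded
        "ppv": tp / (tp + fp),          # probability of disease given a positive test
        "npv": tn / (tn + fn),          # probability of no disease given a negative test
    }

# Hypothetical counts for illustration only.
print(diagnostic_metrics(tp=90, fp=5, fn=10, tn=15))
```

PPV and NPV depend on how common appendicitis is in the sample, so values from an operated cohort would not transfer directly to all patients with abdominal pain.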
Sanson, R L; Gloster, J; Burgin, L
2011-09-24
The aims of this study were to statistically reassess the likelihood that windborne spread of foot-and-mouth disease (FMD) virus (FMDV) occurred at the start of the UK 1967 to 1968 FMD epidemic at Oswestry, Shropshire, and to derive dose-response probability of infection curves for farms exposed to airborne FMDV. To enable this, data on all farms present in 1967 in the parishes near Oswestry were assembled. Cases were infected premises whose date of appearance of first clinical signs was within 14 days of the depopulation of the index farm. Logistic regression was used to evaluate the association between infection status and distance and direction from the index farm. The UK Met Office's NAME atmospheric dispersion model (ADM) was used to generate plumes for each day that FMDV was excreted from the index farm based on actual historical weather records from October 1967. Daily airborne FMDV exposure rates for all farms in the study area were calculated using a geographical information system. Probit analyses were used to calculate dose-response probability of infection curves to FMDV, using relative exposure rates on case and control farms. Both the logistic regression and probit analyses gave strong statistical support to the hypothesis that airborne spread occurred. There was some evidence that incubation period was inversely proportional to the exposure rate.
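A probit dose-response curve of the kind described can be sketched as follows in Python. The use of log10 exposure and the coefficient values are assumptions for illustration; the abstract does not give the fitted form.

```python
import math

def standard_normal_cdf(z):
    """Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_infection_probability(exposure, intercept, slope):
    """P(infection) = Phi(intercept + slope * log10(exposure)); exposure must be > 0."""
    return standard_normal_cdf(intercept + slope * math.log10(exposure))

# Hypothetical coefficients: probability rises with relative exposure.
for exposure in (0.1, 1.0, 10.0):
    print(exposure, probit_infection_probability(exposure, intercept=0.0, slope=1.0))
```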
Kalp, Ericka L; Harris, Jeanette J; Zawistowski, Grace
2018-06-06
The 2015 APIC MegaSurvey was completed by 4,078 members to assess infection prevention practices. This study's purpose was to examine MegaSurvey results to relate infection preventionist (IP) certification status with demographic characteristics, organizational structure, compensation benefits, and practice and competency factors. Descriptive statistics were used to examine population characteristics and certification status. Bivariate logistic regression was performed to evaluate relationships between independent variables and certification status. Variables demonstrating statistical significance (P <.05) were included in multivariable logistic regression analyses. Forty-seven percent of survey respondents had their CIC. IPs were less likely certified if their educational attainment was less than a bachelor's degree, they were aged 18-45 years, they worked in rural facilities, they had <16 years' experience in health care before becoming an IP, and the percentage of job dedicated to infection prevention was <75%. However, certification was associated with CIC benefit paid fully by employer, self-rating as proficient and expert-advanced, and surveillance and epidemiologic investigation competency obtained via professional development and training. CIC attainment was associated with IP characteristics. Additional research should focus on identifying strategies to increase certification among noncertified IPs because CIC is a measure of proficiency that should be a goal for all IPs. Copyright © 2018 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
García-Díaz, J. Carlos
2009-11-01
Fault detection and diagnosis is an important problem in process engineering. Process equipment is subject to malfunctions during operation. Galvanized steel is a value-added product, furnishing effective performance by combining the corrosion resistance of zinc with the strength and formability of steel. Fault detection and diagnosis is an important problem in continuous hot-dip galvanizing, and the increasingly stringent quality requirements of the automotive industry have also demanded ongoing efforts in process control to make the process more robust. When faults occur, they change the relationship among the observed variables. This work compares different statistical regression models proposed in the literature for estimating the quality of galvanized steel coils on the basis of short time histories. Data for 26 batches were available. Five variables were selected for monitoring the process: the steel strip velocity, four bath temperatures and bath level. The entire data set, consisting of 48 galvanized steel coils, was divided into two sets. The first (training) data set comprised 25 conforming coils and the second data set comprised 23 nonconforming coils. Logistic regression is a modeling tool in which the dependent variable is categorical. In most applications, the dependent variable is binary. The results show that the logistic generalized linear models do provide good estimates of coil quality and can be useful for quality control in manufacturing processes.
Lin, Hanxiao; Zhang, Hua; Yan, Yuxia; Liu, Duan; Zhang, Ruyi; Liu, Yeungyeung; Chen, Pei; Zhang, Jincai; Xuan, Dongying
2014-12-01
This study aimed to compare the opinions of dentists and endocrinologists regarding diabetes mellitus (DM) and periodontitis, and to investigate the possible effects on their practice. Cross-sectional data were collected from 297 endocrinologists and 134 dentists practicing in southern China using two separate questionnaires. Questions were close-ended or Likert-scaled. Statistical analyses comprised descriptive statistics and bivariate and binary logistic regression analyses. Compared with endocrinologists, dentists presented more favorable attitudes toward the relationship between DM and periodontitis (P<0.001). 61.2% of dentists reported they would frequently refer patients with severe periodontitis for DM evaluation, while only 26.6% of endocrinologists reported they would frequently advise patients with DM to visit a dentist. Nearly all of the respondents (94.4%) agreed that interdisciplinary collaboration should be strengthened. The logistic regression analysis showed that respondents with more favorable attitudes were more likely to advise a dental visit (P=0.003) or to screen for DM (P=0.006). Endocrinologists and dentists are not equally equipped with knowledge about the relationship between DM and periodontitis, and there is a wide gap between their practice and the current evidence, especially for endocrinologists. Measures to develop interdisciplinary education and collaboration among health care providers are urgently needed. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Li, Gaoming; Yi, Dali; Wu, Xiaojiao; Liu, Xiaoyu; Zhang, Yanqi; Liu, Ling; Yi, Dong
2015-01-01
Background Although a substantial number of studies focus on the teaching and application of medical statistics in China, few studies comprehensively evaluate the recognition of and demand for medical statistics. In addition, the results of these various studies differ and are insufficiently comprehensive and systematic. Objectives This investigation aimed to evaluate the general cognition of and demand for medical statistics by undergraduates, graduates, and medical staff in China. Methods We performed a comprehensive database search related to the cognition of and demand for medical statistics from January 2007 to July 2014 and conducted a meta-analysis of non-controlled studies with sub-group analysis for undergraduates, graduates, and medical staff. Results There are substantial differences with respect to the cognition of theory in medical statistics among undergraduates (73.5%), graduates (60.7%), and medical staff (39.6%). The demand for theory in medical statistics is high among graduates (94.6%), undergraduates (86.1%), and medical staff (88.3%). Regarding specific statistical methods, the cognition of basic statistical methods is higher than of advanced statistical methods. The demand for certain advanced statistical methods, including (but not limited to) multiple analysis of variance (ANOVA), multiple linear regression, and logistic regression, is higher than that for basic statistical methods. The use rates of the Statistical Package for the Social Sciences (SPSS) software and statistical analysis software (SAS) are only 55% and 15%, respectively. Conclusion The overall statistical competence of undergraduates, graduates, and medical staff is insufficient, and their ability to practically apply their statistical knowledge is limited, which constitutes an unsatisfactory state of affairs for medical statistics education. Because the demand for skills in this area is increasing, the need to reform medical statistics education in China has become urgent. 
PMID:26053876
Wu, Yazhou; Zhou, Liang; Li, Gaoming; Yi, Dali; Wu, Xiaojiao; Liu, Xiaoyu; Zhang, Yanqi; Liu, Ling; Yi, Dong
2015-01-01
Although a substantial number of studies focus on the teaching and application of medical statistics in China, few studies comprehensively evaluate the recognition of and demand for medical statistics. In addition, the results of these various studies differ and are insufficiently comprehensive and systematic. This investigation aimed to evaluate the general cognition of and demand for medical statistics by undergraduates, graduates, and medical staff in China. We performed a comprehensive database search related to the cognition of and demand for medical statistics from January 2007 to July 2014 and conducted a meta-analysis of non-controlled studies with sub-group analysis for undergraduates, graduates, and medical staff. There are substantial differences with respect to the cognition of theory in medical statistics among undergraduates (73.5%), graduates (60.7%), and medical staff (39.6%). The demand for theory in medical statistics is high among graduates (94.6%), undergraduates (86.1%), and medical staff (88.3%). Regarding specific statistical methods, the cognition of basic statistical methods is higher than of advanced statistical methods. The demand for certain advanced statistical methods, including (but not limited to) multiple analysis of variance (ANOVA), multiple linear regression, and logistic regression, is higher than that for basic statistical methods. The use rates of the Statistical Package for the Social Sciences (SPSS) software and statistical analysis software (SAS) are only 55% and 15%, respectively. The overall statistical competence of undergraduates, graduates, and medical staff is insufficient, and their ability to practically apply their statistical knowledge is limited, which constitutes an unsatisfactory state of affairs for medical statistics education. Because the demand for skills in this area is increasing, the need to reform medical statistics education in China has become urgent.
Logistic regression for dichotomized counts.
Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W
2016-12-01
Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.
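The link between a count and its dichotomized version can be made explicit. As a general illustration (not the paper's shared-parameter specification, whose details are not given in the abstract), if a count $Y_i$ is assumed to be Poisson with mean $\mu_i$, the indicator $Z_i = 1\{Y_i > 0\}$ has the distribution below, whereas ordinary logistic regression models the same probability on the logit scale. The symbols $Z_i$, $\mu_i$, $x_i$ and $\beta$ are notation introduced here.

```latex
\begin{align}
  \Pr(Z_i = 1) &= \Pr(Y_i > 0) = 1 - e^{-\mu_i}
    && \text{(assumed Poisson count)} \\
  \operatorname{logit}\Pr(Z_i = 1) &= x_i^{\top}\beta
    && \text{(ordinary logistic regression)}
\end{align}
```

Ordinary logistic regression uses only $Z_i$ and discards the size of the positive counts, which is the source of the efficiency loss the abstract refers to.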
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. We explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model.
Furthermore, the developed CLR models tend to achieve sensitivity more than 10% higher than that of the standard classification models, which translates to correct labeling of an additional 400-500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictors identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise the awareness of collecting data on additional markers and developing necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
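The under-sampling step mentioned in the methods can be illustrated with a short Python sketch. It shows plain random under-sampling of the majority class; the record layout, the `readmitted` key, and the 1:1 target ratio are assumptions, since the abstract does not say which variant was used.

```python
import random

def undersample(records, label_key="readmitted", seed=0):
    """Randomly drop majority-class records until both classes are the same size."""
    rng = random.Random(seed)
    positives = [r for r in records if r[label_key] == 1]
    negatives = [r for r in records if r[label_key] == 0]
    minority, majority = sorted((positives, negatives), key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced

# Toy data: 10 readmissions among 100 discharges (made-up records).
records = [{"id": i, "readmitted": 1 if i < 10 else 0} for i in range(100)]
balanced = undersample(records)
print(len(balanced), sum(r["readmitted"] for r in balanced))
```

Under-sampling changes the apparent readmission rate, so predicted probabilities from a model trained this way are not calibrated to the original population without a correction.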
Harvey, H Benjamin; Liu, Catherine; Ai, Jing; Jaworsky, Cristina; Guerrier, Claude Emmanuel; Flores, Efren; Pianykh, Oleg
2017-10-01
To test whether data elements available in the electronic medical record (EMR) can be effectively leveraged to predict failure to attend a scheduled radiology examination. Using data from a large academic medical center, we identified all patients with a diagnostic imaging examination scheduled from January 1, 2016, to April 1, 2016, and determined whether the patient successfully attended the examination. Demographic, clinical, and health services utilization variables available in the EMR potentially relevant to examination attendance were recorded for each patient. We used descriptive statistics and logistic regression models to test whether these data elements could predict failure to attend a scheduled radiology examination. The predictive accuracy of the regression models was determined by calculating the area under the receiver operating characteristic curve. Among the 54,652 patient appointments with radiology examinations scheduled during the study period, 6.5% were no-shows. No-show rates were highest for the modalities of mammography and CT and lowest for PET and MRI. Logistic regression indicated that 16 of the 27 demographic, clinical, and health services utilization factors were significantly associated with failure to attend a scheduled radiology examination (P ≤ .05). Stepwise logistic regression analysis demonstrated that previous no-shows, days between scheduling and appointments, modality type, and insurance type were most strongly predictive of no-show. A model considering all 16 data elements had good ability to predict radiology no-shows (area under the receiver operating characteristic curve = 0.753). The predictive ability was similar or improved when these models were analyzed by modality. Patient and examination information readily available in the EMR can be successfully used to predict radiology no-shows.
Moving forward, this information can be proactively leveraged to identify patients who might benefit from additional patient engagement through appointment reminders or other targeted interventions to avoid no-shows. Copyright © 2017 American College of Radiology. Published by Elsevier Inc. All rights reserved.
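The area under the receiver operating characteristic curve reported above can be read as the probability that a randomly chosen no-show receives a higher predicted risk than a randomly chosen attended appointment. A small Python sketch of that pairwise definition, using toy scores rather than study data:

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Toy predicted no-show risks and outcomes (1 = no-show).
print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))
```

The pairwise loop is quadratic in the number of records; for a data set the size of the study's, a rank-based implementation from a statistics library would be the practical choice.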
Differentially private distributed logistic regression using private and public data
2014-01-01
Background Privacy protection is an important issue in medical informatics and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise and make outputs less useful. Given available public data in medical research (e.g. from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. Methodology In this paper, we modify the update step in the Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. Experiments and results We try our algorithm on three different data sets, and show its advantage over: (1) a logistic regression model based solely on public data, and (2) a differentially private distributed logistic regression model based on private data under various scenarios. Conclusion Logistic regression models built with our new algorithm based on both private and public datasets demonstrate better utility than models trained on private or public datasets alone, without sacrificing the rigorous privacy guarantee. PMID:25079786
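For orientation, the standard (non-private) Newton-Raphson update for logistic regression is shown below. The paper's modified, differentially private update is not given in the abstract and is not reproduced here; the notation ($X$ for the design matrix, $y$ for the 0/1 outcomes, $\beta^{(t)}$ for the coefficients at iteration $t$) is introduced for this illustration.

```latex
\begin{align}
  p_i^{(t)} &= \frac{1}{1 + \exp\!\left(-x_i^{\top}\beta^{(t)}\right)}, \qquad
  W^{(t)} = \operatorname{diag}\!\left(p_i^{(t)}\bigl(1 - p_i^{(t)}\bigr)\right), \\
  \beta^{(t+1)} &= \beta^{(t)} + \left(X^{\top} W^{(t)} X\right)^{-1} X^{\top}\!\left(y - p^{(t)}\right).
\end{align}
```

Both $X^{\top} W^{(t)} X$ and $X^{\top}(y - p^{(t)})$ are sums over records, so in a distributed setting each site can compute its own contribution and share only these aggregates.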
Yang, Lixue; Chen, Kean
2015-11-01
To improve the design of underwater target recognition systems based on auditory perception, this study compared human listeners with automatic classifiers. Performance measures and strategies in three discrimination experiments, including discriminations between man-made and natural targets, between ships and submarines, and among three types of ships, were used. In the experiments, the subjects were asked to assign a score to each sound based on how confident they were about the category to which it belonged, and logistic regression, which represents linear discriminative models, also completed three similar tasks by utilizing many auditory features. The results indicated that the performance of logistic regression improved as the ratio between inter- and intra-class differences became larger, whereas the performance of the human subjects was limited by their unfamiliarity with the targets. Logistic regression performed better than the human subjects in all tasks but the discrimination between man-made and natural targets, and the strategies employed by excellent human subjects were similar to those of logistic regression. Logistic regression and several human subjects demonstrated similar performance when discriminating man-made and natural targets, but in this case, their strategies were not similar. An appropriate fusion of their strategies led to further improvement in recognition accuracy.
NASA Technical Reports Server (NTRS)
Duda, David P.; Minnis, Patrick
2009-01-01
Straightforward application of the Schmidt-Appleman contrail formation criteria to diagnose persistent contrail occurrence from numerical weather prediction data is hindered by significant bias errors in the upper tropospheric humidity. Logistic models of contrail occurrence have been proposed to overcome this problem, but basic questions remain about how random measurement error may affect their accuracy. A set of 5000 synthetic contrail observations is created to study the effects of random error in these probabilistic models. The simulated observations are based on distributions of temperature, humidity, and vertical velocity derived from Advanced Regional Prediction System (ARPS) weather analyses. The logistic models created from the simulated observations were evaluated using two common statistical measures of model accuracy, the percent correct (PC) and the Hanssen-Kuipers discriminant (HKD). To convert the probabilistic results of the logistic models into a dichotomous yes/no choice suitable for the statistical measures, two critical probability thresholds are considered. The HKD scores are higher when the climatological frequency of contrail occurrence is used as the critical threshold, while the PC scores are higher when the critical probability threshold is 0.5. For both thresholds, typical random errors in temperature, relative humidity, and vertical velocity are found to be small enough to allow for accurate logistic models of contrail occurrence. The accuracy of the models developed from synthetic data is over 85 percent for both the prediction of contrail occurrence and non-occurrence, although in practice, larger errors would be anticipated.
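The two accuracy measures named in this abstract can be computed from a 2x2 contingency table once a probability threshold turns each forecast into yes/no. The sketch below uses the usual definitions (percent correct = fraction of correct yes/no forecasts; Hanssen-Kuipers discriminant = hit rate minus false alarm rate). The ten probabilities and observations are hypothetical, not the study's synthetic data.

```python
def contingency(probs, observed, threshold):
    """Return (hits, misses, false_alarms, correct_negatives) for a yes/no cut."""
    hits = misses = false_alarms = correct_negatives = 0
    for p, obs in zip(probs, observed):
        forecast_yes = p >= threshold
        if forecast_yes and obs:
            hits += 1
        elif obs:
            misses += 1
        elif forecast_yes:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives

def percent_correct(hits, misses, false_alarms, correct_negatives):
    total = hits + misses + false_alarms + correct_negatives
    return (hits + correct_negatives) / total

def hanssen_kuipers(hits, misses, false_alarms, correct_negatives):
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_negatives)
    return hit_rate - false_alarm_rate

# Hypothetical model probabilities and observed contrail occurrence (1 = yes)
probs = [0.02, 0.05, 0.08, 0.10, 0.15, 0.25, 0.30, 0.35, 0.45, 0.70]
observed = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]

climatological = sum(observed) / len(observed)  # observed event frequency
for threshold in (0.5, climatological):
    table = contingency(probs, observed, threshold)
    print(threshold, percent_correct(*table), hanssen_kuipers(*table))
```

With these made-up values the 0.5 threshold gives the higher percent correct and the climatological threshold gives the higher HKD, the same direction of trade-off the abstract reports; that agreement is a property of the toy data and should not be read as a reproduction of the study.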
NASA Astrophysics Data System (ADS)
Mei, Zhixiong; Wu, Hao; Li, Shiyun
2018-06-01
The Conversion of Land Use and its Effects at Small regional extent (CLUE-S), which is a widely used model for land-use simulation, utilizes logistic regression to estimate the relationships between land use and its drivers, and thus, predict land-use change probabilities. However, logistic regression disregards possible spatial autocorrelation and self-organization in land-use data. Autologistic regression can depict spatial autocorrelation but cannot address self-organization, while logistic regression by considering only self-organization (NE-logistic regression) fails to capture spatial autocorrelation. Therefore, this study developed a regression (NE-autologistic regression) method, which incorporated both spatial autocorrelation and self-organization, to improve CLUE-S. The Zengcheng District of Guangzhou, China was selected as the study area. The land-use data of 2001, 2005, and 2009, as well as 10 typical driving factors, were used to validate the proposed regression method and the improved CLUE-S model. Then, three future land-use scenarios for 2020 (the natural growth scenario, ecological protection scenario, and economic development scenario) were simulated using the improved model. Validation results showed that NE-autologistic regression performed better than logistic regression, autologistic regression, and NE-logistic regression in predicting land-use change probabilities. The spatial allocation accuracy and kappa values of NE-autologistic-CLUE-S were higher than those of logistic-CLUE-S, autologistic-CLUE-S, and NE-logistic-CLUE-S for the simulations of two periods, 2001-2009 and 2005-2009, which proved that the improved CLUE-S model achieved the best simulation and was thereby effective to a certain extent. The scenario simulation results indicated that under all three scenarios, traffic land and residential/industrial land would increase, whereas arable land and unused land would decrease during 2009-2020. 
Apparent differences also existed in the simulated change sizes and locations of each land-use type under different scenarios. The results not only demonstrate the validity of the improved model but also provide a valuable reference for relevant policy-makers.
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Determining factors influencing survival of breast cancer by fuzzy logistic regression model.
Nikbakht, Roya; Bahrampour, Abbas
2017-01-01
A fuzzy logistic regression model can be used to determine the influential factors of a disease. This study explores the factors that predict survival in patients with breast cancer. We used breast cancer data collected by the cancer registry of Kerman University of Medical Sciences during the period 2000-2007. Variables such as morphology, grade, age, and treatments (surgery, radiotherapy, and chemotherapy) were included in the fuzzy logistic regression model. Model performance was assessed in terms of the mean degree of membership (MDM). The results showed that almost 41% of patients were in the neoplasm and malignant group and more than two-thirds of them were still alive after 5-year follow-up. Based on the fuzzy logistic model, the most important factors influencing survival were chemotherapy, morphology, and radiotherapy, in that order. Furthermore, the MDM criterion shows that the fuzzy logistic regression has a good fit to the data (MDM = 0.86). The fuzzy logistic regression model showed that chemotherapy is more important than radiotherapy in the survival of patients with breast cancer. In addition, the model can calculate possibilistic odds of survival in cancer patients. The results of this study can be applied in clinical research. Because few studies have applied fuzzy logistic models, we recommend using this model in various research areas.
Statistical analysis of subjective preferences for video enhancement
NASA Astrophysics Data System (ADS)
Woods, Russell L.; Satgunam, PremNandhini; Bronstad, P. Matthew; Peli, Eli
2010-02-01
Measuring preferences for moving video quality is harder than for static images due to the fleeting and variable nature of moving video. Subjective preferences for image quality can be tested by observers indicating their preference for one image over another. Such pairwise comparisons can be analyzed using Thurstone scaling (Farrell, 1999). Thurstone (1927) scaling is widely used in applied psychology, marketing, food tasting and advertising research. Thurstone analysis constructs an arbitrary perceptual scale for the items that are compared (e.g. enhancement levels). However, Thurstone scaling does not determine the statistical significance of the differences between items on that perceptual scale. Recent papers have provided inferential statistical methods that produce an outcome similar to Thurstone scaling (Lipovetsky and Conklin, 2004). Here, we demonstrate that binary logistic regression can analyze preferences for enhanced video.
Stata Modules for Calculating Novel Predictive Performance Indices for Logistic Models
Barkhordari, Mahnaz; Padyab, Mojgan; Hadaegh, Farzad; Azizi, Fereidoun; Bozorgmanesh, Mohammadreza
2016-01-01
Background Prediction is a fundamental part of prevention of cardiovascular diseases (CVD). The development of prediction algorithms based on multivariate regression models began several decades ago. In parallel with the development of predictive models, biomarker research has expanded on an impressive scale. The key question is how best to assess and quantify the improvement in risk prediction offered by new biomarkers or, more basically, how to assess the performance of a risk prediction model. Discrimination, calibration, and added predictive value have recently been suggested for comparing the predictive performance of models with and without novel biomarkers. Objectives Lack of user-friendly statistical software has restricted implementation of novel model assessment methods while examining novel biomarkers. We intended, thus, to develop user-friendly software that could be used by researchers with few programming skills. Materials and Methods We have written a Stata command that is intended to help researchers obtain the cut point-free and cut point-based net reclassification improvement index (NRI) and the relative and absolute integrated discriminatory improvement index (IDI) for logistic-based regression analyses. We applied the commands to real data on women participating in the Tehran lipid and glucose study (TLGS) to examine whether information on a family history of premature CVD, waist circumference, and fasting plasma glucose can improve the predictive performance of the Framingham “general CVD risk” algorithm. Results The command is addpred for logistic regression models. Conclusions The Stata package provided herein can encourage the use of novel methods in examining predictive capacity of ever-emerging plethora of novel biomarkers. PMID:27279830
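The two indices named in this abstract have simple definitions that can be sketched directly from the predicted probabilities of a baseline model and an extended model. The code below is an illustration in Python, not the addpred implementation (which is a Stata command whose internals are not described here); it covers the cut point-free NRI and the absolute IDI only, and the six subjects are hypothetical.

```python
def absolute_idi(p_old, p_new, events):
    """Absolute IDI: mean change in predicted risk among events
    minus mean change in predicted risk among non-events."""
    ev = [n - o for o, n, y in zip(p_old, p_new, events) if y == 1]
    ne = [n - o for o, n, y in zip(p_old, p_new, events) if y == 0]
    return sum(ev) / len(ev) - sum(ne) / len(ne)

def cut_point_free_nri(p_old, p_new, events):
    """Cut point-free NRI: net proportion of events whose risk moved up
    plus net proportion of non-events whose risk moved down."""
    up_ev = down_ev = n_ev = up_ne = down_ne = n_ne = 0
    for o, n, y in zip(p_old, p_new, events):
        if y == 1:
            n_ev += 1
            up_ev += n > o
            down_ev += n < o
        else:
            n_ne += 1
            up_ne += n > o
            down_ne += n < o
    return (up_ev - down_ev) / n_ev + (down_ne - up_ne) / n_ne

# Hypothetical predicted risks without (p_old) and with (p_new) a new biomarker
p_old = [0.10, 0.20, 0.30, 0.30, 0.40, 0.50]
p_new = [0.05, 0.25, 0.20, 0.25, 0.55, 0.60]
events = [0, 0, 0, 1, 1, 1]

print("IDI:", absolute_idi(p_old, p_new, events))
print("NRI:", cut_point_free_nri(p_old, p_new, events))
```

Positive values of either index indicate that the extended model moves predicted risk in the right direction on average; neither index says anything about calibration.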
Huang, Jinxi; Wang, Chenghu; Yuan, Weiwei; Zhang, Zhandong; Chen, Beibei; Zhang, Xiefu
2017-01-01
Background This study was conducted to investigate the risk factors of anastomotic fistula after the radical resection of esophageal‐cardiac cancer. Methods Five hundred and forty‐four esophageal‐cardiac cancer patients who underwent surgery and had complete clinical data were included in the study. Fifty patients diagnosed with postoperative anastomotic fistula were considered the case group and the remaining 494 subjects who did not develop postoperative anastomotic fistula were considered the control group. Potential risk factors for anastomotic fistula, such as age, gender, diabetes history, and smoking history, were collected and compared between the groups. Statistically significant variables were entered into logistic regression to further evaluate the independent risk factors for postoperative anastomotic fistulas in esophageal‐cardiac cancer. Results The incidence of anastomotic fistulas was 9.2% (50/544). Logistic regression analysis revealed that female gender (P < 0.05), laparoscopic surgery (P < 0.05), decreased postoperative albumin (P < 0.05), and postoperative renal dysfunction (P < 0.05) were independent risk factors for anastomotic fistulas in patients who received surgery for esophageal‐cardiac cancer. Of the 50 anastomotic fistulas, 16 cases were small fistulas, which were only discovered by conventional imaging examination and did not present clinical symptoms. All of the anastomotic fistulas occurred within seven days after surgery. Five of the patients with anastomotic fistulas underwent a second surgery and three died. Conclusion Female patients with esophageal‐cardiac cancer treated with endoscopic surgery and suffering from postoperative hypoproteinemia and renal dysfunction were susceptible to postoperative anastomotic fistula. PMID:28940985
Fischer, John P; Nelson, Jonas A; Shang, Eric K; Wink, Jason D; Wingate, Nicholas A; Woo, Edward Y; Jackson, Benjamin M; Kovach, Stephen J; Kanchwala, Suhail
2014-12-01
Groin wound complications after open vascular surgery procedures are common, morbid, and costly. The purpose of this study was to generate a simple, validated, clinically usable risk assessment tool for predicting groin wound morbidity after infra-inguinal vascular surgery. A retrospective review of consecutive patients undergoing groin cutdowns for femoral access between 2005-2011 was performed. Patients necessitating salvage flaps were compared to those who did not, and a stepwise logistic regression was performed and validated using a bootstrap technique. Utilising this analysis, a simplified risk score was developed to predict the risk of developing a wound which would necessitate salvage. A total of 925 patients were included in the study. The salvage flap rate was 11.2% (n = 104). Predictors determined by logistic regression included prior groin surgery (OR = 4.0, p < 0.001), prosthetic graft (OR = 2.7, p < 0.001), coronary artery disease (OR = 1.8, p = 0.019), peripheral arterial disease (OR = 5.0, p < 0.001), and obesity (OR = 1.7, p = 0.039). Based upon the respective logistic coefficients, a simplified scoring system was developed to enable the preoperative risk stratification regarding the likelihood of a significant complication which would require a salvage muscle flap. The c-statistic for the regression demonstrated excellent discrimination at 0.89. This study presents a simple, internally validated risk assessment tool that accurately predicts wound morbidity requiring flap salvage in open groin vascular surgery patients. The preoperatively high-risk patient can be identified and selectively targeted as a candidate for a prophylactic muscle flap.
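One common way to turn logistic coefficients into a bedside score is to scale each log odds ratio by the smallest one and round to whole points. The sketch below applies that convention to the odds ratios reported in this abstract. It is an assumption about how such a score could be built; the abstract does not state the authors' actual point assignments, so the resulting points are illustrative only.

```python
import math

# Odds ratios as reported in the abstract
odds_ratios = {
    "prior groin surgery": 4.0,
    "prosthetic graft": 2.7,
    "coronary artery disease": 1.8,
    "peripheral arterial disease": 5.0,
    "obesity": 1.7,
}

# Logistic coefficients are log odds ratios; scale by the smallest coefficient
# so the weakest predictor is worth one point (an assumed convention).
smallest = min(math.log(v) for v in odds_ratios.values())
points = {name: round(math.log(v) / smallest) for name, v in odds_ratios.items()}

def risk_score(present_factors):
    """Sum the points for the risk factors a patient has."""
    return sum(points[name] for name in present_factors)

print(points)
print(risk_score(["prior groin surgery", "obesity"]))
```

Rounding trades a small loss of discrimination for a score that can be added up without a calculator, which is the usual motivation for simplified scores of this kind.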
Austin, P C; Shah, B R; Newman, A; Anderson, G M
2012-09-01
There are limited validated methods to ascertain comorbidities for risk adjustment in ambulatory populations of patients with diabetes using administrative health-care databases. The objective was to examine the ability of the Johns Hopkins' Aggregated Diagnosis Groups to predict mortality in population-based ambulatory samples of both incident and prevalent subjects with diabetes. Retrospective cohorts were constructed using population-based administrative data. The incident cohort consisted of all 346,297 subjects diagnosed with diabetes between 1 April 2004 and 31 March 2008. The prevalent cohort consisted of all 879,849 subjects with pre-existing diabetes on 1 January 2007. The outcome was death within 1 year of the subject's index date. A logistic regression model consisting of age, sex and indicator variables for 22 of the 32 Johns Hopkins' Aggregated Diagnosis Group categories had excellent discrimination for predicting mortality in incident diabetes patients: the c-statistic was 0.87 in an independent validation sample. A similar model had excellent discrimination for predicting mortality in prevalent diabetes patients: the c-statistic was 0.84 in an independent validation sample. Both models demonstrated very good calibration, denoting good agreement between observed and predicted mortality across the range of predicted mortality in which the large majority of subjects lay. For comparative purposes, regression models incorporating the Charlson comorbidity index, age and sex; age and sex; and age alone had poorer discrimination than the model that incorporated the Johns Hopkins' Aggregated Diagnosis Groups. Logistic regression models using age, sex and the Johns Hopkins' Aggregated Diagnosis Groups were able to accurately predict 1-year mortality in population-based samples of patients with diabetes. © 2011 The Authors. Diabetic Medicine © 2011 Diabetes UK.
Cardiorespiratory Fitness, Waist Circumference and Alanine Aminotransferase in Youth
Trilk, Jennifer L.; Ortaglia, Andrew; Blair, Steven N.; Bottai, Matteo; Church, Timothy S.; Pate, Russell R.
2012-01-01
Non-alcoholic fatty liver disease (NAFLD) is considered the liver component of the metabolic syndrome and is strongly associated with cardiometabolic diseases. In adults, cardiorespiratory fitness (CRF) is inversely associated with alanine aminotransferase (ALT), a blood biomarker for NAFLD. However, information regarding these associations is scarce for youth. Purpose To examine associations between CRF, waist circumference (WC) and ALT in youth. Methods Data were obtained from youth (n=2844, 12-19 years) in the National Health and Nutrition Examination Survey (NHANES) 2001-2004. CRF was dichotomized into youth FITNESSGRAM® categories of “low” and “adequate” CRF. Logistic and quantile regression were used for a comprehensive analysis of associations, and variables with previously-reported associations with ALT were a priori included in the models. Results Results from logistic regression suggested that youth with low CRF had 1.5 times the odds of having an ALT>30 than youth with adequate CRF, although the association was not statistically significant (P=0.09). However, quantile regression demonstrated that youth with low CRF had statistically significantly higher ALT (+1.04, +1.05, and +2.57 U/L) at the upper end of the ALT distribution (80th, 85th, and 90th percentiles, respectively) than youth with adequate CRF. For every 1-cm increase in WC, the odds of having an ALT>30 increased by 1.06 (P<0.001), and the strength of this association increased across the ALT distribution. Conclusions Future studies should examine whether interventions to improve CRF can decrease hepatic fat and liver enzyme concentrations in youth with ALT ≥80th percentile or in youth diagnosed with NAFLD. PMID:23190589
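A short worked example may help with the waist circumference result. Reading "increased by 1.06" as an odds ratio of 1.06 per 1-cm increase (an interpretation; the abstract does not report the coefficient itself), odds ratios compound multiplicatively on the odds scale, so a difference of $k$ centimetres corresponds to:

```latex
\mathrm{OR}_{k\,\mathrm{cm}} = \left(\mathrm{OR}_{1\,\mathrm{cm}}\right)^{k} = e^{k\beta},
\qquad \beta = \ln(1.06) \approx 0.058,
\qquad \mathrm{OR}_{10\,\mathrm{cm}} = 1.06^{10} \approx 1.79 .
```

Under that reading, a youth whose waist circumference is 10 cm larger would have roughly 1.8 times the odds of ALT>30, all other covariates being equal.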
Maciejewski, Conrad C; Haines, Trevor; Rourke, Keith F
2017-05-01
To identify factors that predict patient satisfaction after urethroplasty by prospectively examining patient-reported quality of life scores using 3 validated instruments. A 3-part prospective survey consisting of the International Prostate Symptom Score (IPSS), the International Index of Erectile Function (IIEF) score, and a urethroplasty quality of life survey was completed by patients who underwent urethroplasty preoperatively and at 6 months postoperatively. The quality of life score included questions on genitourinary pain, urinary tract infection (UTI), postvoid dribbling, chordee, shortening, overall satisfaction, and overall health. Data were analyzed using descriptive statistics, paired t test, univariate and multivariate logistic regression analyses, and Wilcoxon signed-rank analysis. Patients were enrolled in the study from February 2011 to December 2014, and a total of 94 patients who underwent a total of 102 urethroplasties completed the study. Patients reported statistically significant improvements in IPSS (P < .001). Ordinal linear regression analysis revealed no association between age, IPSS, or IIEF score and patient satisfaction. Wilcoxon signed-rank analysis revealed significant improvements in pain scores (P = .02), UTI (P < .001), perceived overall health (P = .01), and satisfaction (P < .001). Univariate logistic regression identified a length >4 cm and the absence of UTI, pain, shortening, and chordee as predictors of patient satisfaction. Multivariate analysis of quality of life domain scores identified absence of shortening and absence of chordee as independent predictors of patient satisfaction following urethroplasty (P < .01). Patient voiding function and quality of life improve significantly following urethroplasty, but improvement in voiding function is not associated with patient satisfaction. Chordee status and perceived penile shortening impact patient satisfaction, and should be included in patient-reported outcome measures. 
Copyright © 2017 Elsevier Inc. All rights reserved.
A Method for Calculating the Probability of Successfully Completing a Rocket Propulsion Ground Test
NASA Technical Reports Server (NTRS)
Messer, Bradley P.
2004-01-01
Propulsion ground test facilities face the daily challenges of scheduling multiple customers into limited facility space and successfully completing their propulsion test projects. Due to budgetary and schedule constraints, NASA and industry customers are pushing to test more components, for less money, in a shorter period of time. As these new rocket engine component test programs are undertaken, the lack of technology maturity in the test articles, combined with pushing the test facilities' capabilities to their limits, tends to lead to an increase in facility breakdowns and unsuccessful tests. Over the last five years Stennis Space Center's propulsion test facilities have performed hundreds of tests, collected thousands of seconds of test data, and broken numerous test facility and test article parts. While various initiatives have been implemented to provide better propulsion test techniques and improve the quality, reliability, and maintainability of goods and parts used in the propulsion test facilities, unexpected failures during testing still occur quite regularly due to the harsh environment in which the propulsion test facilities operate. Previous attempts at modeling the lifecycle of a propulsion component test project have met with little success. Each of the attempts suffered from incomplete or inconsistent data on which to base the models. By focusing on the actual test phase of the test project rather than the formulation, design or construction phases of the test project, the quality and quantity of available data increase dramatically. A logistic regression model has been developed from the data collected over the last five years, allowing the probability of successfully completing a rocket propulsion component test to be calculated. 
A logistic regression model is a mathematical modeling approach that can be used to describe the relationship of several independent predictor variables X_1, X_2, ..., X_k to a binary or dichotomous dependent variable Y, where Y can only be one of two possible outcomes, in this case Success or Failure. Logistic regression has primarily been used in the fields of epidemiology and biomedical research, but lends itself to many other applications. As indicated, the use of logistic regression is not new; however, modeling propulsion ground test facilities using logistic regression is both a new and unique application of the statistical technique. Results from the models provide project managers with insight and confidence into the effectiveness of rocket engine component ground test projects. The initial success in modeling rocket propulsion ground test projects clears the way for more complex models to be developed in this area.
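The relationship described in this abstract has the standard logistic form shown below. The coefficients $\beta_0, \ldots, \beta_k$ are notation assumed here; the abstract does not report the fitted values or name the predictors.

```latex
P(Y = \text{Success} \mid X_1, \ldots, X_k)
  = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)\bigr)},
\qquad
\ln\frac{P}{1 - P} = \beta_0 + \sum_{i=1}^{k} \beta_i X_i .
```

The second expression shows why each coefficient is read as a change in the log odds of success per unit change in its predictor.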
Mixed conditional logistic regression for habitat selection studies.
Duchesne, Thierry; Fortin, Daniel; Courbin, Nicolas
2010-05-01
1. Resource selection functions (RSFs) are becoming a dominant tool in habitat selection studies. RSF coefficients can be estimated with unconditional (standard) and conditional logistic regressions. While the advantage of mixed-effects models is recognized for standard logistic regression, mixed conditional logistic regression remains largely overlooked in ecological studies. 2. We demonstrate the significance of mixed conditional logistic regression for habitat selection studies. First, we use spatially explicit models to illustrate how mixed-effects RSFs can be useful in the presence of inter-individual heterogeneity in selection and when the assumption of independence from irrelevant alternatives (IIA) is violated. The IIA hypothesis states that the strength of preference for habitat type A over habitat type B does not depend on the other habitat types also available. Secondly, we demonstrate the significance of mixed-effects models to evaluate habitat selection of free-ranging bison Bison bison. 3. When movement rules were homogeneous among individuals and the IIA assumption was respected, fixed-effects RSFs adequately described habitat selection by simulated animals. In situations violating the inter-individual homogeneity and IIA assumptions, however, RSFs were best estimated with mixed-effects regressions, and fixed-effects models could even provide faulty conclusions. 4. Mixed-effects models indicate that bison did not select farmlands, but exhibited strong inter-individual variations in their response to farmlands. Less than half of the bison preferred farmlands over forests. Conversely, the fixed-effect model simply suggested an overall selection for farmlands. 5. Conditional logistic regression is recognized as a powerful approach to evaluate habitat selection when resource availability changes. This regression is increasingly used in ecological studies, but almost exclusively in the context of fixed-effects models. 
Fitness maximization can imply differences in trade-offs among individuals, which can yield inter-individual differences in selection and lead to departure from IIA. These situations are best modelled with mixed-effects models. Mixed-effects conditional logistic regression should become a valuable tool for ecological research.
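The IIA property discussed in this abstract can be seen directly in the fixed-effects conditional logit form, where the probability of choosing a habitat is its exponentiated score divided by the sum over the available habitats. The sketch below uses hypothetical habitat names and selection scores (not values from the bison study) to show that the preference ratio between two habitats does not change when a third becomes available.

```python
import math

def choice_probabilities(scores):
    """Conditional-logit choice probabilities for one availability set.
    `scores` maps each available habitat to its linear predictor x'beta."""
    top = max(scores.values())                 # subtract max for stability
    weights = {h: math.exp(s - top) for h, s in scores.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Hypothetical selection scores (illustration only)
two_options = choice_probabilities({"forest": 0.2, "farmland": 1.0})
three_options = choice_probabilities({"forest": 0.2, "farmland": 1.0, "meadow": 1.5})

ratio_two = two_options["farmland"] / two_options["forest"]
ratio_three = three_options["farmland"] / three_options["forest"]
print(ratio_two, ratio_three)  # both equal exp(1.0 - 0.2) up to rounding
```

Because the ratio depends only on the two scores involved, a single fixed coefficient vector forces IIA on every individual; letting coefficients vary among individuals, as mixed-effects models do, is the standard way to relax it.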
Advanced colorectal neoplasia risk stratification by penalized logistic regression.
Lin, Yunzhi; Yu, Menggang; Wang, Sijian; Chappell, Richard; Imperiale, Thomas F
2016-08-01
Colorectal cancer is the second leading cause of death from cancer in the United States. To facilitate the efficiency of colorectal cancer screening, there is a need to stratify risk for colorectal cancer among the 90% of US residents who are considered "average risk." In this article, we investigate such risk stratification rules for advanced colorectal neoplasia (colorectal cancer and advanced, precancerous polyps). We use a recently completed large cohort study of subjects who underwent a first screening colonoscopy. Logistic regression models have been used in the literature to estimate the risk of advanced colorectal neoplasia based on quantifiable risk factors. However, logistic regression may be prone to overfitting and instability in variable selection. Since most of the risk factors in our study have several categories, it was tempting to collapse these categories into fewer risk groups. We propose a penalized logistic regression method that automatically and simultaneously selects variables, groups categories, and estimates their coefficients by penalizing the [Formula: see text]-norm of both the coefficients and their differences. Hence, it encourages sparsity in the categories, i.e. grouping of the categories, and sparsity in the variables, i.e. variable selection. We apply the penalized logistic regression method to our data. The important variables are selected, with close categories simultaneously grouped, by penalized regression models with and without the interaction terms. The models are validated with 10-fold cross-validation. The receiver operating characteristic curves of the penalized regression models dominate the receiver operating characteristic curve of naive logistic regressions, indicating a superior discriminative performance. © The Author(s) 2013.
Analyzing thresholds and efficiency with hierarchical Bayesian logistic regression.
Houpt, Joseph W; Bittner, Jennifer L
2018-07-01
Ideal observer analysis is a fundamental tool used widely in vision science for analyzing the efficiency with which a cognitive or perceptual system uses available information. The performance of an ideal observer provides a formal measure of the amount of information in a given experiment. The ratio of human to ideal performance is then used to compute efficiency, a construct that can be directly compared across experimental conditions while controlling for the differences due to the stimuli and/or task specific demands. In previous research using ideal observer analysis, the effects of varying experimental conditions on efficiency have been tested using ANOVAs and pairwise comparisons. In this work, we present a model that combines Bayesian estimates of psychometric functions with hierarchical logistic regression for inference about both unadjusted human performance metrics and efficiencies. Our approach improves upon the existing methods by constraining the statistical analysis using a standard model connecting stimulus intensity to human observer accuracy and by accounting for variability in the estimates of human and ideal observer performance scores. This allows for both individual and group level inferences. Copyright © 2018 Elsevier Ltd. All rights reserved.
Sampaolo, Letizia; Tommaso, Giulia; Gherardi, Bianca; Carrozzi, Giuliano; Freni Sterrantino, Anna; Ottone, Marta; Goldoni, Carlo Alberto; Bertozzi, Nicoletta; Scaringi, Meri; Bolognesi, Lara; Masocco, Maria; Salmaso, Stefania; Lauriola, Paolo
2017-01-01
OBJECTIVES: To identify groups of people in relation to their perception of environmental risk and to assess their main characteristics using data collected in the environmental module of the Italian Behavioral Risk Factor Surveillance System (PASSI) surveillance network. Perceptive profiles were identified using latent class analysis; they were then included as the outcome in multinomial logistic regression models to assess the association between environmental risk perception and demographic, health, socio-economic, and behavioural variables. The latent class analysis made it possible to split the sample into "worried", "indifferent", and "positive" people. The multinomial logistic regression model showed that the "worried" profile typically includes people of Italian nationality, living in highly urbanized areas, with a high level of education, and with economic difficulties; they pay special attention to their own health and fitness, but they have a negative perception of their own psychophysical state. The application of advanced statistical analysis makes it possible to use PASSI data to characterize the perception of environmental risk, and thus to plan interventions related to risk communication.
Effort test failure: toward a predictive model.
Webb, James W; Batchelor, Jennifer; Meares, Susanne; Taylor, Alan; Marsh, Nigel V
2012-01-01
Predictors of effort test failure were examined in an archival sample of 555 traumatically brain-injured (TBI) adults. Logistic regression models were used to examine whether compensation-seeking, injury-related, psychological, demographic, and cultural factors predicted effort test failure (ETF). ETF was significantly associated with compensation-seeking (OR = 3.51, 95% CI [1.25, 9.79]), low education (OR: 0.83 [0.74, 0.94]), self-reported mood disorder (OR: 5.53 [3.10, 9.85]), exaggerated displays of behavior (OR: 5.84 [2.15, 15.84]), psychotic illness (OR: 12.86 [3.21, 51.44]), being foreign-born (OR: 5.10 [2.35, 11.06]), having sustained a workplace accident (OR: 4.60 [2.40, 8.81]), and mild traumatic brain injury severity compared with very severe traumatic brain injury severity (OR: 0.37 [0.13, 0.995]). ETF was associated with a broader range of statistical predictors than has previously been identified and the relative importance of psychological and behavioral predictors of ETF was evident in the logistic regression model. Variables that might potentially extend the model of ETF are identified for future research efforts.
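Odds ratios and intervals like those in this abstract come from exponentiating a logistic coefficient and its Wald interval. The sketch below shows both directions, using the reported compensation-seeking result (OR = 3.51, 95% CI [1.25, 9.79]) as input. It assumes a symmetric Wald interval on the log-odds scale with z = 1.96, which the abstract does not state, so the recovered coefficient and standard error are approximations.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence limits from a logistic coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def coefficient_from_ci(lower, upper, z=1.96):
    """Approximate coefficient and standard error implied by a reported CI,
    assuming the interval is symmetric on the log-odds scale."""
    beta = (math.log(lower) + math.log(upper)) / 2.0
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se

# Reported interval for compensation-seeking
beta, se = coefficient_from_ci(1.25, 9.79)
print("coefficient:", beta, "standard error:", se)
print("round trip:", odds_ratio_with_ci(beta, se))
```

The interval is symmetric around the coefficient on the log scale, which is why the reported odds ratio sits closer to the lower limit than to the upper one.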
Occupational exposures and non-Hodgkin's lymphoma: Canadian case-control study.
Karunanayake, Chandima P; McDuffie, Helen H; Dosman, James A; Spinelli, John J; Pahwa, Punam
2008-08-07
The objective was to study the association between Non-Hodgkin's Lymphoma (NHL) and occupational exposures related to long held occupations among males in six provinces of Canada. A population based case-control study was conducted from 1991 to 1994. Males with newly diagnosed NHL (ICD-10) were stratified by province of residence and age group. A total of 513 incident cases and 1506 population based controls were included in the analysis. Conditional logistic regression was conducted to fit statistical models. Based on conditional logistic regression modeling, the following factors independently increased the risk of NHL: farmer and machinist as long held occupations; constant exposure to diesel exhaust fumes; constant exposure to ionizing radiation (radium); and personal history of another cancer. Men who had worked for 20 years or more as farmer and machinist were the most likely to develop NHL. An increased risk of developing NHL is associated with the following: long held occupations of farmer and machinist; exposure to diesel fumes; and exposure to ionizing radiation (radium). The risk of NHL increased with the duration of employment as a farmer or machinist.
Comparing Methods for Assessing Reliability Uncertainty Based on Pass/Fail Data Collected Over Time
Abes, Jeff I.; Hamada, Michael S.; Hills, Charles R.
2017-12-20
In this paper, we compare statistical methods for analyzing pass/fail data collected over time; some methods are traditional and one (the RADAR or Rationale for Assessing Degradation Arriving at Random) was recently developed. These methods are used to provide uncertainty bounds on reliability. We make observations about the methods' assumptions and properties. Finally, we illustrate the differences between two traditional methods, logistic regression and Weibull failure time analysis, and the RADAR method using a numerical example.
2007-03-01
simulation are analyzed using regression, statistical and marginal benefit techniques to show how the MOEs are affected by varying levels of the...being supported by the seabase increases. A large marginal benefit is realized in reducing a unit’s frequency and time spent in a balk state by...units. SOF units operate within the range of sea-based helicopter assets; therefore the risk of a ‘ bingo ’ (i.e., near empty) fuel state is nearly
2016-09-01
noise density and temperature sensitivity of these devices are all on the same order of magnitude. Even the worst-case noise density of the GCDC...accelerations from a handgun firing were distinct from other impulsive events on the wrist, such as using a hammer. Loeffler first identified potential shots by...spikes, taking various statistical parameters. He used a logistic regression model on these parameters and was able to classify 98.9% of shots
Fusion of multiscale wavelet-based fractal analysis on retina image for stroke prediction.
Che Azemin, M Z; Kumar, Dinesh K; Wong, T Y; Wang, J J; Kawasaki, R; Mitchell, P; Arjunan, Sridhar P
2010-01-01
In this paper, we present a novel method of analyzing retinal vasculature using Fourier Fractal Dimension to extract the complexity of the retinal vasculature enhanced at different wavelet scales. Logistic regression was used as a fusion method to model the classifier for 5-year stroke prediction. The efficacy of this technique has been tested using standard pattern recognition performance evaluation, receiver operating characteristic (ROC) analysis, and a medical prediction statistic, the odds ratio. A stroke prediction model was developed using the proposed system.
Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon
2015-01-01
Background: Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. Methods: In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. Results: The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Conclusion: Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended. PMID:26793655
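The comparison indicators named in this abstract (sensitivity, specificity, percentage of correct predictions, and area under the ROC curve) can be sketched in a few lines. This is a minimal illustration on made-up labels and scores, not the study's data or its SPSS procedure; the function names and the 0.5 cut-off are assumptions.

```python
def confusion_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity and accuracy at a fixed probability cut-off."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy

def auc(y_true, y_score):
    """Area under the ROC curve as the probability that a random positive
    case scores higher than a random negative case (ties count one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = event) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

sens, spec, acc = confusion_metrics(y_true, y_score)
area = auc(y_true, y_score)
print(sens, spec, acc, area)
```

Unlike sensitivity and specificity, the AUC does not depend on the chosen cut-off, which is one reason it is a convenient single number for comparing models such as logistic, probit, and discriminant analysis.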
Kempe, P T; van Oppen, P; de Haan, E; Twisk, J W R; Sluis, A; Smit, J H; van Dyck, R; van Balkom, A J L M
2007-09-01
Two methods for predicting remissions in obsessive-compulsive disorder (OCD) treatment are evaluated. Y-BOCS measurements of 88 patients with a primary OCD (DSM-III-R) diagnosis were performed over a 16-week treatment period, and during three follow-ups. Remission at any measurement was defined as a Y-BOCS score lower than thirteen combined with a reduction of seven points when compared with baseline. Logistic regression models were compared with a Cox regression for recurrent events model. Logistic regression yielded different models at different evaluation times. The recurrent events model remained stable when fewer measurements were used. Higher baseline levels of neuroticism and more severe OCD symptoms were associated with a lower chance of remission, early age of onset and more depressive symptoms with a higher chance. Choice of outcome time affects logistic regression prediction models. Recurrent events analysis uses all information on remissions and relapses. Short- and long-term predictors for OCD remission show overlap.
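The remission definition used here is a simple two-part rule and can be written down directly. The sketch below assumes that "a reduction of seven points" means a drop of at least seven points from baseline, which the abstract does not spell out; the function name and the score series are hypothetical.

```python
def is_remission(baseline, current, cutoff=13, min_drop=7):
    """Remission rule from the abstract: Y-BOCS score below the cutoff
    combined with a reduction from baseline. Treating the reduction as
    'at least min_drop points' is an assumption."""
    return current < cutoff and (baseline - current) >= min_drop

# Hypothetical Y-BOCS scores for one patient: baseline, then follow-ups.
scores = [26, 20, 12, 15, 11]
baseline = scores[0]
flags = [is_remission(baseline, s) for s in scores[1:]]
print(flags)
```

Because the rule is evaluated at every measurement, a patient can move in and out of remission over time, which is the kind of information a recurrent events model uses and a single-time-point logistic regression does not.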
Koltun, G.F.; Kula, Stephanie P.
2013-01-01
This report presents the results of a study to develop methods for estimating selected low-flow statistics and for determining annual flow-duration statistics for Ohio streams. Regression techniques were used to develop equations for estimating 10-year recurrence-interval (10-percent annual-nonexceedance probability) low-flow yields, in cubic feet per second per square mile, with averaging periods of 1, 7, 30, and 90-day(s), and for estimating the yield corresponding to the long-term 80-percent duration flow. These equations, which estimate low-flow yields as a function of a streamflow-variability index, are based on previously published low-flow statistics for 79 long-term continuous-record streamgages with at least 10 years of data collected through water year 1997. When applied to the calibration dataset, average absolute percent errors for the regression equations ranged from 15.8 to 42.0 percent. The regression results have been incorporated into the U.S. Geological Survey (USGS) StreamStats application for Ohio (http://water.usgs.gov/osw/streamstats/ohio.html) in the form of a yield grid to facilitate estimation of the corresponding streamflow statistics in cubic feet per second. Logistic-regression equations also were developed and incorporated into the USGS StreamStats application for Ohio for selected low-flow statistics to help identify occurrences of zero-valued statistics. Quantiles of daily and 7-day mean streamflows were determined for annual and annual-seasonal (September–November) periods for each complete climatic year of streamflow-gaging station record for 110 selected streamflow-gaging stations with 20 or more years of record. The quantiles determined for each climatic year were the 99-, 98-, 95-, 90-, 80-, 75-, 70-, 60-, 50-, 40-, 30-, 25-, 20-, 10-, 5-, 2-, and 1-percent exceedance streamflows. 
Selected exceedance percentiles of the annual-exceedance percentiles were subsequently computed and tabulated to help facilitate consideration of the annual risk of exceedance or nonexceedance of annual and annual-seasonal-period flow-duration values. The quantiles are based on streamflow data collected through climatic year 2008.
Estimating the exceedance probability of rain rate by logistic regression
NASA Technical Reports Server (NTRS)
Chiu, Long S.; Kedem, Benjamin
1990-01-01
Recent studies have shown that the fraction of an area with rain intensity above a fixed threshold is highly correlated with the area-averaged rain rate. To estimate the fractional rainy area, a logistic regression model, which estimates the conditional probability that rain rate over an area exceeds a fixed threshold given the values of related covariates, is developed. The problem of dependency in the data in the estimation procedure is bypassed by the method of partial likelihood. Analyses of simulated scanning multichannel microwave radiometer and observed electrically scanning microwave radiometer data during the Global Atlantic Tropical Experiment period show that the use of logistic regression in pixel classification is superior to multiple regression in predicting whether rain rate at each pixel exceeds a given threshold, even in the presence of noisy data. The potential of the logistic regression technique in satellite rain rate estimation is discussed.
NASA Astrophysics Data System (ADS)
Roşca, S.; Bilaşco, Ş.; Petrea, D.; Fodorean, I.; Vescan, I.; Filip, S.; Măguţ, F.-L.
2015-11-01
The existence of a large number of GIS models for the identification of landslide occurrence probability makes it difficult to select a specific one. The present study focuses on the application of two quantitative models: the logistic and the BSA models. The comparative analysis of the results aims at identifying the most suitable model. The territory corresponding to the Niraj Mic Basin (87 km²) is an area characterised by a wide variety of landforms with their morphometric, morphographical and geological characteristics, as well as by a high complexity of land use types where active landslides exist. For this reason it serves as the test area for applying the two models and for comparing the results. The large complexity of input variables is illustrated by 16 factors which were represented as 72 dummy variables, analysed on the basis of their importance within the model structures. The testing of the statistical significance corresponding to each variable reduced the number of dummy variables to 12 which were considered significant for the test area within the logistic model, whereas for the BSA model all the variables were employed. The predictability degree of the models was tested through the identification of the area under the ROC curve which indicated a good accuracy (AUROC = 0.86 for the testing area) and predictability of the logistic model (AUROC = 0.63 for the validation area).
NASA Astrophysics Data System (ADS)
Cary, Theodore W.; Cwanger, Alyssa; Venkatesh, Santosh S.; Conant, Emily F.; Sehgal, Chandra M.
2012-03-01
This study compares the performance of two proven but very different machine learners, Naïve Bayes and logistic regression, for differentiating malignant and benign breast masses using ultrasound imaging. Ultrasound images of 266 masses were analyzed quantitatively for shape, echogenicity, margin characteristics, and texture features. These features along with patient age, race, and mammographic BI-RADS category were used to train Naïve Bayes and logistic regression classifiers to diagnose lesions as malignant or benign. ROC analysis was performed using all of the features and using only a subset that maximized information gain. Performance was determined by the area under the ROC curve, Az, obtained from leave-one-out cross validation. Naïve Bayes showed significant variation (Az 0.733 ± 0.035 to 0.840 ± 0.029, P < 0.002) with the choice of features, but the performance of logistic regression was relatively unchanged under feature selection (Az 0.839 ± 0.029 to 0.859 ± 0.028, P = 0.605). Out of 34 features, a subset of 6 gave the highest information gain: brightness difference, margin sharpness, depth-to-width, mammographic BI-RADS, age, and race. The probabilities of malignancy determined by Naïve Bayes and logistic regression after feature selection showed significant correlation (R² = 0.87, P < 0.0001). The diagnostic performance of Naïve Bayes and logistic regression can be comparable, but logistic regression is more robust. Since probability of malignancy cannot be measured directly, high correlation between the probabilities derived from two basic but dissimilar models increases confidence in the predictive power of machine learning models for characterizing solid breast masses on ultrasound.
Wang, Qingliang; Li, Xiaojie; Hu, Kunpeng; Zhao, Kun; Yang, Peisheng; Liu, Bo
2015-05-12
To explore the risk factors of portal hypertensive gastropathy (PHG) in patients with hepatitis B associated cirrhosis and establish a logistic regression model for noninvasive prediction. The clinical data of 234 hospitalized patients with hepatitis B associated cirrhosis from March 2012 to March 2014 were analyzed retrospectively. The dependent variable was the occurrence of PHG, while the independent variables were screened by binary logistic analysis. Multivariate logistic regression was used for further analysis of significant noninvasive independent variables. A logistic regression model was established and the odds ratio was calculated for each factor. The accuracy, sensitivity and specificity of the model were evaluated by the receiver operating characteristic (ROC) curve. According to univariate logistic regression, the risk factors included hepatic dysfunction, albumin (ALB), bilirubin (TB), prothrombin time (PT), platelet (PLT), white blood cell (WBC), portal vein diameter, spleen index, splenic vein diameter, diameter ratio, PLT to spleen volume ratio, esophageal varices (EV) and gastric varices (GV). Multivariate analysis showed that hepatic dysfunction (X1), TB (X2), PLT (X3) and splenic vein diameter (X4) were the main factors associated with the occurrence of PHG. The established regression model was Logit P = -2.667 + 2.186X1 - 2.167X2 + 0.725X3 + 0.976X4. The accuracy of the model for PHG was 79.1%, with a sensitivity of 77.2% and a specificity of 80.8%. Hepatic dysfunction, TB, PLT and splenic vein diameter are risk factors for PHG, and the noninvasive predictive logistic regression model was Logit P = -2.667 + 2.186X1 - 2.167X2 + 0.725X3 + 0.976X4.
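To show how a fitted equation of this form turns into a predicted probability, the sketch below applies the inverse logit to the coefficients reported in the abstract. The abstract does not say how X1 to X4 are coded or scaled, so the input values used here are purely hypothetical placeholders and the printed probabilities are not clinical estimates.

```python
import math

# Coefficients as reported in the abstract.
INTERCEPT = -2.667
COEFS = (2.186, -2.167, 0.725, 0.976)  # X1, X2, X3, X4

def predicted_probability(x1, x2, x3, x4):
    """Inverse-logit transform: P = 1 / (1 + exp(-logit))."""
    logit = INTERCEPT + sum(b * x for b, x in zip(COEFS, (x1, x2, x3, x4)))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical inputs; the coding of the predictors is an assumption.
p_reference = predicted_probability(0, 0, 0, 0)
p_x1_only = predicted_probability(1, 0, 0, 0)
print(f"{p_reference:.3f} {p_x1_only:.3f}")

# exp(coefficient) is the odds ratio per one-unit increase in a predictor.
odds_ratio_x1 = math.exp(COEFS[0])
```

Each coefficient acts additively on the log-odds scale, so a one-unit change in a predictor multiplies the odds by the exponential of its coefficient regardless of the other inputs.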
Risk factors for lesions of the knee menisci among workers in South Korea's national parks.
Shin, Donghee; Youn, Kanwoo; Lee, Eunja; Lee, Myeongjun; Chung, Hweemin; Kim, Deokweon
2016-01-01
This study was designed to investigate the prevalence of menisci lesions in national park workers and the work factors affecting this prevalence. The study subjects were 698 workers who worked in 20 Korean national parks in 2014. An orthopedist visited each national park and performed physical examinations. Knee MRI was performed if the McMurray test or Apley test was positive and there was a complaint of pain in the knee area. An orthopedist and a radiologist each read these images of the menisci using a grading system based on the MRI signals. To calculate the cumulative intensity of trekking of the workers, the mean trail distance, the difficulty of the trail, the tenure at each national park, and the number of treks per month for each worker from the start of work until the present were investigated. Chi-square tests were performed to see if there were differences in the menisci lesion grade according to the variables. The variables used in the Chi-square test were evaluated using simple logistic regression analysis to get crude odds ratios, and adjusted odds ratios and 95% confidence intervals were calculated using multivariate logistic regression analysis after establishing three different models according to the adjusted variables. According to the MRI signal grades of menisci, 29% were grade 0, 11.3% were grade 1, 46.0% were grade 2, and 13.7% were grade 3. The differences in the MRI signal grades of menisci according to age and the intensity of trekking as calculated by the three different methods were statistically significant. Multiple logistic regression analysis was performed for three models. In model 1, there was no statistically significant factor affecting the menisci lesions. In model 2, among the factors affecting the menisci lesions, the OR of a high cumulative intensity of trekking was 4.08 (95% CI 1.00-16.61), and in model 3, the OR of a high cumulative intensity of trekking was 5.84 (95% CI 1.09-31.26). 
The factor that most affected the menisci lesions among the workers in Korean national park was a high cumulative intensity of trekking.
Variable Selection in Logistic Regression.
1987-06-01
Z. D. Bai, P. R. Krishnaiah and L. C. Zhao. Contract or grant number F49620-85-C-0008...Variable Selection in Logistic Regression. Z. D. Bai, P. R. Krishnaiah and L. C. Zhao, Center for Multivariate Analysis...University of Pittsburgh
Understanding logistic regression analysis.
Sperandei, Sandro
2014-01-01
Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.
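A minimal sketch of the procedure described here, written in pure Python so that it has no dependencies: a logistic model with two binary explanatory variables is fitted by Newton-Raphson maximum likelihood, and the coefficients are exponentiated to give adjusted odds ratios. The grouped counts, the variable names (exposure, confounder), and the helper functions are all hypothetical; a real analysis would normally use a statistics package.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_logistic(X, y, iterations=25):
    """Newton-Raphson fit; an intercept column is added to each row of X."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for r, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * r[i] * r[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

# Hypothetical grouped data: (exposure, confounder, events, non-events).
groups = [(0, 0, 10, 40), (1, 0, 20, 30), (0, 1, 15, 35), (1, 1, 30, 20)]
X, y = [], []
for exposure, confounder, events, non_events in groups:
    X += [(exposure, confounder)] * (events + non_events)
    y += [1] * events + [0] * non_events

beta = fit_logistic(X, y)
odds_ratios = [math.exp(b) for b in beta[1:]]  # skip the intercept
print("adjusted ORs (exposure, confounder):", odds_ratios)
```

Because both variables enter the model together, the exposure odds ratio is adjusted for the second variable, which is the sense in which the procedure helps avoid confounding.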
Ignjatović, Aleksandra; Stojanović, Miodrag; Milošević, Zoran; Anđelković Apostolović, Marija
2017-12-02
Developing risk models in medicine is appealing, but it is also associated with many obstacles in different aspects of predictive model development. Initially, the association of a biomarker, or of several markers, with a specific outcome was demonstrated by statistical significance, but novel and demanding questions required the development of new and more complex statistical techniques. The progress of statistical analysis in biomedical research is best observed through the history of the Framingham study and the development of the Framingham score. Evaluation of predictive models relies on a combination of the results of several metrics. When using logistic regression and Cox proportional hazards regression analysis, the calibration test and ROC curve analysis should be mandatory and eliminatory, and the central place should be taken by some new statistical techniques. In order to obtain complete information related to a new marker in the model, it has recently been recommended to use reclassification tables by calculating the net reclassification index and the integrated discrimination improvement. Decision curve analysis is a novel method for evaluating the clinical usefulness of a predictive model. It may be noted that customizing and fine-tuning of the Framingham risk score initiated the development of statistical analysis. A clinically applicable predictive model should be a trade-off between all of the abovementioned statistical metrics, a trade-off between calibration and discrimination, accuracy and decision-making, costs and benefits, and quality and quantity of the patient's life.
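The quantity plotted in decision curve analysis is the net benefit at a chosen threshold probability, computed as true positives per patient minus false positives per patient weighted by the odds of the threshold. The sketch below illustrates that formula on made-up outcomes and predicted risks; the function names and data are hypothetical and are not taken from the article.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat everyone regardless of predicted risk."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)

# Hypothetical outcomes (1 = event) and model-predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

nb_model = net_benefit(y_true, y_prob, 0.5)
nb_all = net_benefit_treat_all(y_true, 0.5)
print(nb_model, nb_all)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero by definition.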
Nazir, Muhammad Ashraf; Almas, Khalid; Majeed, Muhammad Irfan
2017-01-01
To evaluate the prevalence of halitosis and the factors associated with it among dental students and interns in Lahore, Pakistan. A cross-sectional study design was chosen, and a sample of dental students and interns was collected from seven dental colleges in Lahore, Pakistan. A total of 833 participants were approached in person as a convenience sample. A self-reported questionnaire was administered and informed consent was obtained. The associations between oral malodor and different variables of the study were explored using analytical statistics (Chi-square test and logistic regression analysis). Statistical significance was determined using a 95% confidence interval (CI). Six hundred and fifteen participants (aged 19-27 years) completed the survey with a response rate of 73.8%. The prevalence of self-reported halitosis was 75.1%. More female (51.4%) than male students (23.7%) reported oral malodor, and most participants (61%) reported early morning halitosis. Thirteen percent of respondents had been examined for oral malodor by a dentist and 37.6% treated the condition with self-medication. A binary logistic regression model showed that male gender (odds ratio [OR] = 0.44, CI = 0.22-0.87), daily use of dental floss (OR = 0.28, CI = 0.13-0.58), and drinking tea with mint (OR = 0.44, CI = 0.22-0.89) were significantly associated with oral malodor. The participants with tongue coating had higher odds (OR = 2.75, CI = 1.13-6.69) of having oral malodor than those without tongue coating, and the association was statistically significant. The study identified a high prevalence of oral malodor among dental students and interns. They should receive appropriate diagnosis and management of the condition from a dentist. The regular use of dental floss and removal of tongue coating can significantly reduce halitosis.
[An evaluation of clinical characteristics and prognosis of brain-stem infarction in diabetics].
Lu, Zheng-qi; Li, Hai-yan; Hu, Xue-qiang; Zhang, Bing-jun
2011-01-01
To analyze the relationship between diabetes and the onset, clinical outcomes and prognosis of brainstem infarction, and to evaluate the impact of diabetes on brainstem infarction. A total of 172 cases of acute brainstem infarction in patients with or without diabetes were compared. The risk factors associated with brainstem infarction in diabetic patients were analyzed by multivariate logistic regression. The National Institutes of Health Stroke Scale (NIHSS) score, the modified Rankin Scale (mRS) score, condition and outcome of the two groups were compared at different time points. The systolic blood pressure (SBP), TG, LDL-C, apolipoprotein B (Apo B), glutamyl transpeptidase (γ-GT), fibrinogen (Fb), fasting blood glucose (FPG) and glycosylated hemoglobin (HbA1c) in the diabetic group were significantly higher than those in the non-diabetic group (P < 0.05). In the multivariate logistic regression analysis, γ-GT, Apo B and FPG were risk predictors of brainstem infarction with diabetes (OR = 1.017, 4.667 and 3.173, respectively), while HDL-C was protective (OR = 0.288). HbA1c was a risk predictor of severity for acute brainstem infarction (OR = 1.299), while Apo A was beneficial (OR = 0.212). Compared with the non-diabetic group, the diabetic group showed no statistically significant difference in NIHSS score or intensive care therapy on admission, whereas the differences in NIHSS score at discharge and in outcome at 6 months of follow-up were statistically significant. Diabetes is closely associated with brainstem infarction. Brainstem infarction with diabetes leads to more rapid progression, poorer prognosis, higher rates of mortality and disability, and a higher recurrence rate of cerebral infarction.
Peters, L L; Boter, H; Burgerhof, J G M; Slaets, J P J; Buskens, E
2015-09-01
The primary objective of the present study was to evaluate the validity of the Groningen Frailty Indicator (GFI) in a sample of Dutch elderly persons participating in LifeLines, a large population-based cohort study. Additional aims were to assess differences between frail and non-frail elderly and examine which individual characteristics were associated with frailty. By December 2012, 5712 elderly persons were enrolled in LifeLines and complied with the inclusion criteria of the present study. Mann-Whitney U or Kruskal-Wallis tests were used to assess the variability of GFI-scores among elderly subgroups that differed in demographic characteristics, morbidity, obesity, and healthcare utilization. Within subgroups Kruskal-Wallis tests were also used to examine differences in GFI-scores across age groups. Multivariate logistic regression analyses were performed to assess associations between individual characteristics and frailty. The GFI discriminated between subgroups: statistically significantly higher GFI-median scores (interquartile range) were found in e.g. males (1 [0-2]), the oldest old (2 [1-3]), in elderly who were single (1 [0-2]), with lower socioeconomic status (1 [0-3]), with increasing co-morbidity (2 [1-3]), who were obese (2 [1-3]), and used more healthcare (2 [1-4]). Overall age had an independent and statistically significant association with GFI scores. Compared with the non-frail, frail elderly persons experienced statistically significantly more chronic stress and more social/psychological related problems. In the multivariate logistic regression model, psychological morbidity had the strongest association with frailty. The present study supports the construct validity of the GFI and provides an insight into the characteristics of (non)frail community-dwelling elderly persons participating in LifeLines. Copyright © 2015 Elsevier Inc. All rights reserved.
Popko, Janusz; Karpiński, Michał; Chojnowska, Sylwia; Maresz, Katarzyna; Milewski, Robert; Badmaev, Vladimir; Schurgers, Leon J
2018-06-06
In the past decades, an increased interest in the roles of vitamin D and K has become evident, in particular in relation to bone health and prevention of bone fractures. The aim of the current study was to evaluate vitamin D and K status in children with low-energy fractures and in children without fractures. The study group of 20 children (14 boys, 6 girls) aged 5 to 15 years old, with radiologically confirmed low-energy fractures, was compared with the control group of 19 healthy children (9 boys, 10 girls), aged 7 to 17 years old, without fractures. Total vitamin D (25(OH)D3 plus 25(OH)D2), calcium, BALP (bone alkaline phosphatase), NTx (N-terminal telopeptide), and uncarboxylated (ucOC) and carboxylated osteocalcin (cOC) serum concentrations were evaluated. The ratio of serum uncarboxylated osteocalcin to serum carboxylated osteocalcin, ucOC:cOC (UCR), was used as an indicator of bone vitamin K status. Logistic regression models were created to establish the influence of UCR on the odds of low-energy fractures in both groups. There were no statistically significant differences in the serum calcium, NTx, BALP, or total vitamin D levels between the two groups. There was, however, a statistically significant difference in the UCR ratio. The median UCR in the fracture group was 0.471 compared with the control group value of 0.245 (p < 0.0001). In the logistic regression analysis, the odds ratio of low-energy fractures for UCR was calculated, indicating an approximately 78.3-fold increase in the risk of fractures. In this pilot study, better vitamin K status, expressed as the ratio of ucOC:cOC (UCR), is positively and statistically significantly correlated with a lower rate of low-energy fracture incidence.
Wei, Peng; Tang, Hongwei; Li, Donghui
2014-01-01
Most complex human diseases are likely the consequence of the joint actions of genetic and environmental factors. Identification of gene-environment (GxE) interactions not only contributes to a better understanding of the disease mechanisms, but also improves disease risk prediction and targeted intervention. In contrast to the large number of genetic susceptibility loci discovered by genome-wide association studies, there have been very few successes in identifying GxE interactions which may be partly due to limited statistical power and inaccurately measured exposures. While existing statistical methods only consider interactions between genes and static environmental exposures, many environmental/lifestyle factors, such as air pollution and diet, change over time, and cannot be accurately captured at one measurement time point or by simply categorizing into static exposure categories. There is a dearth of statistical methods for detecting gene by time-varying environmental exposure interactions. Here we propose a powerful functional logistic regression (FLR) approach to model the time-varying effect of longitudinal environmental exposure and its interaction with genetic factors on disease risk. Capitalizing on the powerful functional data analysis framework, our proposed FLR model is capable of accommodating longitudinal exposures measured at irregular time points and contaminated by measurement errors, commonly encountered in observational studies. We use extensive simulations to show that the proposed method can control the Type I error and is more powerful than alternative ad hoc methods. We demonstrate the utility of this new method using data from a case-control study of pancreatic cancer to identify the windows of vulnerability of lifetime body mass index on the risk of pancreatic cancer as well as genes which may modify this association. PMID:25219575
Predicting Rotator Cuff Tears Using Data Mining and Bayesian Likelihood Ratios
Lu, Hsueh-Yi; Huang, Chen-Yuan; Su, Chwen-Tzeng; Lin, Chen-Chiang
2014-01-01
Objectives: Rotator cuff tear is a common cause of shoulder diseases. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone. Methods: In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation followed by confirmatory MRI between 2007 and 2011 were identified. MRI was used as a reference standard to classify rotator cuff tears. The predictor variable was the clinical assessment results, which consisted of 16 attributes. This study employed 2 data mining methods (ANN and the decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into “tear” and “no tear” groups. Likelihood ratio and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models. Results: Our proposed data mining procedures outperformed the classic statistical method. The correction rate, sensitivity, specificity and area under the ROC curve of predicting a rotator cuff tear were statistically better in the ANN and decision tree models compared to logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability that a patient has a rotator cuff tear using a pretest probability and a prediction result (tear or no tear). Conclusions: Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools to classify rotator cuff tears as well as determine the probability of the presence of the disease to enhance diagnostic decision making for rotator cuff tears. PMID:24733553
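The likelihood-ratio step described in this abstract can be illustrated with a minimal Python sketch of the arithmetic that Fagan's nomogram performs graphically: pretest probability is converted to odds, multiplied by a likelihood ratio, and converted back. The function names and the sensitivity and specificity values are hypothetical illustrations, not figures reported by the study.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a binary test result such as tear / no tear."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pretest_prob: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pretest odds * LR."""
    if not 0.0 < pretest_prob < 1.0:
        raise ValueError("pretest_prob must be strictly between 0 and 1")
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical classifier performance (assumed values, not from the abstract).
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.8, specificity=0.9)
print(post_test_probability(0.5, lr_pos))  # probability after a "tear" prediction
print(post_test_probability(0.5, lr_neg))  # probability after a "no tear" prediction
```

With these assumed inputs, a pretest probability of 0.5 and a positive likelihood ratio of 8 correspond to post-test odds of 8, that is, a post-test probability of 8/9.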
Wei, Peng; Tang, Hongwei; Li, Donghui
2014-11-01
Most complex human diseases are likely the consequence of the joint actions of genetic and environmental factors. Identification of gene-environment (G × E) interactions not only contributes to a better understanding of the disease mechanisms, but also improves disease risk prediction and targeted intervention. In contrast to the large number of genetic susceptibility loci discovered by genome-wide association studies, there have been very few successes in identifying G × E interactions, which may be partly due to limited statistical power and inaccurately measured exposures. Although existing statistical methods only consider interactions between genes and static environmental exposures, many environmental/lifestyle factors, such as air pollution and diet, change over time, and cannot be accurately captured at one measurement time point or by simply categorizing into static exposure categories. There is a dearth of statistical methods for detecting gene by time-varying environmental exposure interactions. Here, we propose a powerful functional logistic regression (FLR) approach to model the time-varying effect of longitudinal environmental exposure and its interaction with genetic factors on disease risk. Capitalizing on the powerful functional data analysis framework, our proposed FLR model is capable of accommodating longitudinal exposures measured at irregular time points and contaminated by measurement errors, commonly encountered in observational studies. We use extensive simulations to show that the proposed method can control the Type I error and is more powerful than alternative ad hoc methods. We demonstrate the utility of this new method using data from a case-control study of pancreatic cancer to identify the windows of vulnerability of lifetime body mass index on the risk of pancreatic cancer as well as genes that may modify this association. © 2014 Wiley Periodicals, Inc.
Orthotopic bladder substitution in men revisited: identification of continence predictors.
Koraitim, M M; Atta, M A; Foda, M K
2006-11-01
We determined the impact of the functional characteristics of the neobladder and urethral sphincter on continence results, and determined the most significant predictors of continence. A total of 88 male patients 29 to 70 years old underwent orthotopic bladder substitution with tubularized ileocecal segment (40) and detubularized sigmoid (25) or ileum (23). Uroflowmetry, cystometry and urethral pressure profilometry were performed at 13 to 36 months (mean 19) postoperatively. The correlation between urinary continence and 28 urodynamic variables was assessed. Parameters that correlated significantly with continence were entered into a multivariate analysis using a logistic regression model to determine the most significant predictors of continence. Maximum urethral closure pressure was the only parameter that showed a statistically significant correlation with diurnal continence. Nocturnal continence had not only a statistically significant positive correlation with maximum urethral closure pressure, but also statistically significant negative correlations with maximum contraction amplitude, and baseline pressure at mid and maximum capacity. Three of these 4 parameters, including maximum urethral closure pressure, maximum contraction amplitude and baseline pressure at mid capacity, proved to be significant predictors of continence on multivariate analysis. While daytime continence is determined by maximum urethral closure pressure, during the night it is the net result of 2 forces that have about equal influence but in opposite directions, that is maximum urethral closure pressure vs maximum contraction amplitude plus baseline pressure at mid capacity. 
Two equations were derived from the logistic regression model to predict the probability of continence after orthotopic bladder substitution, including Z1 (diurnal) = 0.605 + 0.0085 maximum urethral closure pressure and Z2 (nocturnal) = 0.841 + 0.01 [maximum urethral closure pressure - (maximum contraction amplitude + baseline pressure at mid capacity)].
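As a rough illustration of how the two linear predictors above might be turned into probabilities, the following Python sketch applies the standard inverse-logit transform p = 1 / (1 + exp(-Z)). The abstract does not state the link function or the measurement units, so the transform, the function names, and the input values are all assumptions made for illustration only.

```python
import math


def inverse_logit(z: float) -> float:
    """Standard logistic transform; assumed, since the abstract does not state it."""
    return 1.0 / (1.0 + math.exp(-z))


def z_diurnal(mucp: float) -> float:
    """Z1 from the abstract; mucp is maximum urethral closure pressure."""
    return 0.605 + 0.0085 * mucp


def z_nocturnal(mucp: float, mca: float, bp_mid: float) -> float:
    """Z2 from the abstract; mca is maximum contraction amplitude and
    bp_mid is baseline pressure at mid capacity."""
    return 0.841 + 0.01 * (mucp - (mca + bp_mid))


# Hypothetical urodynamic values, chosen only to exercise the formulas.
mucp, mca, bp_mid = 60.0, 30.0, 10.0
print(inverse_logit(z_diurnal(mucp)))
print(inverse_logit(z_nocturnal(mucp, mca, bp_mid)))
```

The nocturnal predictor makes the opposing forces explicit: closure pressure raises Z2, while contraction amplitude and baseline pressure lower it by the same coefficient.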
ERIC Educational Resources Information Center
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
Ngwa, Julius S; Cabral, Howard J; Cheng, Debbie M; Pencina, Michael J; Gagnon, David R; LaValley, Michael P; Cupples, L Adrienne
2016-11-03
Typical survival studies follow individuals to an event and measure explanatory variables for that event, sometimes repeatedly over the course of follow up. The Cox regression model has been used widely in the analyses of time to diagnosis or death from disease. The associations between the survival outcome and time dependent measures may be biased unless they are modeled appropriately. In this paper we explore the Time Dependent Cox Regression Model (TDCM), which quantifies the effect of repeated measures of covariates in the analysis of time to event data. This model is commonly used in biomedical research but sometimes does not explicitly adjust for the times at which time dependent explanatory variables are measured. This approach can yield different estimates of association compared to a model that adjusts for these times. In order to address the question of how different these estimates are from a statistical perspective, we compare the TDCM to Pooled Logistic Regression (PLR) and Cross Sectional Pooling (CSP), considering models that adjust and do not adjust for time in PLR and CSP. In a series of simulations we found that time adjusted CSP provided identical results to the TDCM while the PLR showed larger parameter estimates compared to the time adjusted CSP and the TDCM in scenarios with high event rates. We also observed upwardly biased estimates in the unadjusted CSP and unadjusted PLR methods. The time adjusted PLR had a positive bias in the time dependent Age effect with reduced bias when the event rate is low. The PLR methods showed a negative bias in the Sex effect, a subject level covariate, when compared to the other methods. The Cox models yielded reliable estimates for the Sex effect in all scenarios considered. We conclude that survival analyses that explicitly account in the statistical model for the times at which time dependent covariates are measured provide more reliable estimates compared to unadjusted analyses. 
We present results from the Framingham Heart Study in which lipid measurements and myocardial infarction data events were collected over a period of 26 years.
2013-01-01
Background: Malnutrition is one of the principal causes of child mortality in developing countries, including Bangladesh. To our knowledge, most of the available studies that addressed malnutrition among under-five children considered categorical (dichotomous/polychotomous) outcome variables and applied logistic regression (binary/multinomial) to find their predictors. In this study, the malnutrition variable (i.e., the outcome) is defined as the number of under-five malnourished children in a family, which is a non-negative count variable. The purposes of the study are (i) to demonstrate the applicability of the generalized Poisson regression (GPR) model as an alternative to other statistical methods and (ii) to find some predictors of this outcome variable. Methods: The data are extracted from the Bangladesh Demographic and Health Survey (BDHS) 2007. Briefly, this survey employs a nationally representative sample based on a two-stage stratified sample of households. A total of 4,460 under-five children are analysed using various statistical techniques, namely the Chi-square test and the GPR model. Results: The GPR model (as compared to the standard Poisson regression and negative binomial regression) is found to be justified for studying the above-mentioned outcome variable because of its under-dispersion (variance < mean) property. Our study also identifies several significant predictors of the outcome variable, namely mother’s education, father’s education, wealth index, sanitation status, source of drinking water, and total number of children ever born to a woman. Conclusions: The consistency of our findings with many other studies suggests that the GPR model is an ideal alternative to other statistical models for analysing the number of under-five malnourished children in a family. Strategies based on significant predictors may improve the nutritional status of children in Bangladesh. PMID:23297699
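The under-dispersion criterion mentioned in this abstract (variance < mean) can be checked with a simple dispersion index, sketched below in Python. The counts are invented for illustration and are not BDHS data; the function name is likewise hypothetical. A standard Poisson model implies an index near 1, so a value clearly below 1 is the kind of evidence that would motivate a generalized Poisson model.

```python
import statistics


def dispersion_index(counts: list[int]) -> float:
    """Sample variance divided by sample mean.

    About 1 is consistent with a Poisson model, above 1 suggests
    over-dispersion, below 1 suggests under-dispersion.
    """
    return statistics.variance(counts) / statistics.mean(counts)


# Hypothetical numbers of malnourished under-five children per family.
counts = [0, 1, 1, 1, 2, 1, 0, 1, 1, 2]
index = dispersion_index(counts)
print("under-dispersed" if index < 1 else "not under-dispersed", index)
```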
What are hierarchical models and how do we analyze them?
Royle, Andy
2016-01-01
In this chapter we provide a basic definition of hierarchical models and introduce the two canonical hierarchical models in this book: site occupancy and N-mixture models. The former is a hierarchical extension of logistic regression and the latter is a hierarchical extension of Poisson regression. We introduce basic concepts of probability modeling and statistical inference including likelihood and Bayesian perspectives. We go through the mechanics of maximizing the likelihood and characterizing the posterior distribution by Markov chain Monte Carlo (MCMC) methods. We give a general perspective on topics such as model selection and assessment of model fit, although we demonstrate these topics in practice in later chapters (especially Chapters 5, 6, 7, and 10).
Threshold altitude resulting in decompression sickness
NASA Technical Reports Server (NTRS)
Kumar, K. V.; Waligora, James M.; Calkins, Dick S.
1990-01-01
A review of case reports, hypobaric chamber training data, and experimental evidence indicated that the threshold for incidence of altitude decompression sickness (DCS) was influenced by various factors such as prior denitrogenation, exercise or rest, and period of exposure, in addition to individual susceptibility. Fitting these data with appropriate statistical models makes it possible to examine the influence of various factors on the threshold for DCS. This approach was illustrated by logistic regression analysis on the incidence of DCS below 9144 m. Estimations using these regressions showed that, under a noprebreathe, 6-h exposure, simulated EVA profile, the threshold for symptoms occurred at approximately 3353 m; while under a noprebreathe, 2-h exposure profile with knee-bends exercise, the threshold occurred at 7925 m.
Thomas, Christoph; Brodoefel, Harald; Tsiflikas, Ilias; Bruckner, Friederike; Reimann, Anja; Ketelsen, Dominik; Drosch, Tanja; Claussen, Claus D; Kopp, Andreas; Heuschmid, Martin; Burgstahler, Christof
2010-02-01
To prospectively evaluate the influence of the clinical pretest probability assessed by the Morise score onto image quality and diagnostic accuracy in coronary dual-source computed tomography angiography (DSCTA). In 61 patients, DSCTA and invasive coronary angiography were performed. Subjective image quality and accuracy for stenosis detection (>50%) of DSCTA with invasive coronary angiography as gold standard were evaluated. The influence of pretest probability onto image quality and accuracy was assessed by logistic regression and chi-square testing. Correlations of image quality and accuracy with the Morise score were determined using linear regression. Thirty-eight patients were categorized into the high, 21 into the intermediate, and 2 into the low probability group. Accuracies for the detection of significant stenoses were 0.94, 0.97, and 1.00, respectively. Logistic regressions and chi-square tests showed statistically significant correlations between Morise score and image quality (P < .0001 and P < .001) and accuracy (P = .0049 and P = .027). Linear regression revealed a cutoff Morise score for a good image quality of 16 and a cutoff for a barely diagnostic image quality beyond the upper Morise scale. Pretest probability is a weak predictor of image quality and diagnostic accuracy in coronary DSCTA. A sufficient image quality for diagnostic images can be reached with all pretest probabilities. Therefore, coronary DSCTA might be suitable also for patients with a high pretest probability. Copyright 2010 AUR. Published by Elsevier Inc. All rights reserved.
2017-03-23
PUBLIC RELEASE; DISTRIBUTION UNLIMITED Using Multiple and Logistic Regression to Estimate the Median Will- Cost and Probability of Cost and... Cost and Probability of Cost and Schedule Overrun for Program Managers Ryan C. Trudelle Follow this and additional works at: https://scholar.afit.edu...afit.edu. Recommended Citation Trudelle, Ryan C., "Using Multiple and Logistic Regression to Estimate the Median Will- Cost and Probability of Cost and
2013-11-01
Ptrend 0.78 0.62 0.75 Unconditional logistic regression was used to estimate odds ratios (OR) and 95 % confidence intervals (CI) for risk of node...Ptrend 0.71 0.67 Unconditional logistic regression was used to estimate odds ratios (OR) and 95 % confidence intervals (CI) for risk of high-grade tumors... logistic regression was used to estimate odds ratios (OR) and 95 % confidence intervals (CI) for the associations between each of the seven SNPs and
Dhiman, Paula; Kai, Joe; Horsfall, Laura; Walters, Kate; Qureshi, Nadeem
2014-01-01
The potential to use data on family history of premature disease to assess disease risk is increasingly recognised, particularly in scoring risk for coronary heart disease (CHD). However, the quality of family health information in primary care records is unclear. To assess the availability and quality of family history of CHD documented in electronic primary care records. Cross-sectional study. 537 UK family practices contributing to The Health Improvement Network database. Data were obtained from patients aged 20 years or more, registered with their current practice between 1st January 1998 and 31st December 2008, for at least one year. The availability and quality of recorded CHD family history were assessed using multilevel logistic and ordinal logistic regression, respectively. In a cross-section of 1,504,535 patients, 19% had a positive or negative family history of CHD recorded. Multilevel logistic regression showed that patients aged 50-59 had higher odds of having their family history recorded compared to those aged 20-29 (OR: 1.23 (1.21 to 1.25)); however, the most deprived patients had lower odds compared to those least deprived (OR: 0.86 (0.85 to 0.88)). Of the 140,058 patients with a positive family history recorded (9% of total cohort), age of onset was available in 45%, with data specifying both age of onset and relative affected available in only 11% of records. Multilevel ordinal logistic regression confirmed no statistical association between the quality of family history recording and age, gender, deprivation and year of registration. Family history of CHD is documented in a small proportion of primary care records, and where positive family history is documented the details are insufficient to assess familial risk or populate cardiovascular risk assessment tools. Data capture needs to be improved, particularly for more disadvantaged patients who may be most likely to benefit from CHD risk assessment.
Kim, Sun Mi; Kim, Yongdai; Jeong, Kuhwan; Jeong, Heeyeong; Kim, Jiyoung
2018-01-01
The aim of this study was to compare the performance of image analysis for predicting breast cancer using two distinct regression models and to evaluate the usefulness of incorporating clinical and demographic data (CDD) into the image analysis in order to improve the diagnosis of breast cancer. This study included 139 solid masses from 139 patients who underwent an ultrasonography-guided core biopsy and had available CDD between June 2009 and April 2010. Three breast radiologists retrospectively reviewed 139 breast masses and described each lesion using the Breast Imaging Reporting and Data System (BI-RADS) lexicon. We applied and compared two regression methods, stepwise logistic (SL) regression and logistic least absolute shrinkage and selection operator (LASSO) regression, in which the BI-RADS descriptors and CDD were used as covariates. We investigated the performances of these regression methods and the agreement of radiologists in terms of test misclassification error and the area under the curve (AUC) of the tests. Logistic LASSO regression was superior (P<0.05) to SL regression, regardless of whether CDD was included in the covariates, in terms of test misclassification errors (0.234 vs. 0.253, without CDD; 0.196 vs. 0.258, with CDD) and AUC (0.785 vs. 0.759, without CDD; 0.873 vs. 0.735, with CDD). However, it was inferior (P<0.05) to the agreement of three radiologists in terms of test misclassification errors (0.234 vs. 0.168, without CDD; 0.196 vs. 0.088, with CDD) and the AUC without CDD (0.785 vs. 0.844, P<0.001), but was comparable to the AUC with CDD (0.873 vs. 0.880, P=0.141). Logistic LASSO regression based on BI-RADS descriptors and CDD showed better performance than SL in predicting the presence of breast cancer. The use of CDD as a supplement to the BI-RADS descriptors significantly improved the prediction of breast cancer using logistic LASSO regression.
Yu, Yuanyuan; Li, Hongkai; Sun, Xiaoru; Su, Ping; Wang, Tingting; Liu, Yi; Yuan, Zhongshang; Liu, Yanxun; Xue, Fuzhong
2017-12-28
Confounders can produce spurious associations between exposure and outcome in observational studies. For the majority of epidemiologists, adjusting for confounders with a logistic regression model is the habitual method, though it has some problems in accuracy and precision. It is therefore important to highlight the problems of logistic regression and to search for an alternative method. Four causal diagram models were defined to summarize confounding equivalence. Both theoretical proofs and simulation studies were performed to verify whether conditioning on different confounding equivalence sets had the same bias-reducing potential and then to select the optimum adjusting strategy, in which the logistic regression model and the inverse probability weighting based marginal structural model (IPW-based-MSM) were compared. The "do-calculus" was used to calculate the true causal effect of exposure on outcome, then the bias and standard error were used to evaluate the performances of different strategies. Adjusting for different sets of confounding equivalence, as judged by identical Markov boundaries, produced different bias-reducing potential in the logistic regression model. For the sets that satisfied G-admissibility, adjusting for the set including all the confounders reduced bias to the same extent as adjusting for the set containing the parent nodes of the outcome, while the bias after adjusting for the parent nodes of exposure was not equivalent to them. In addition, all causal effect estimations through logistic regression were biased, although the estimation after adjusting for the parent nodes of exposure was nearest to the true causal effect. However, conditioning on different confounding equivalence sets had the same bias-reducing potential under IPW-based-MSM.
Compared with logistic regression, the IPW-based-MSM could obtain unbiased causal effect estimation when the adjusted confounders satisfied G-admissibility, and the optimal strategy was to adjust for the parent nodes of outcome, which obtained the highest precision. All adjustment strategies through logistic regression were biased for causal effect estimation, while IPW-based-MSM could always obtain unbiased estimation when the adjusted set satisfied G-admissibility. Thus, IPW-based-MSM was recommended for adjusting for a confounder set.
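The idea behind inverse probability weighting can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' simulation: there is a single binary confounder C, the propensity P(A = a | C = c) is estimated from stratum frequencies rather than from a fitted model, and the effect measure is a risk difference. The data set and all function names are hypothetical.

```python
from collections import Counter

# (confounder C, exposure A, stratum size, number of events): invented counts.
COUNTS = [(0, 0, 80, 16), (0, 1, 20, 8), (1, 0, 20, 10), (1, 1, 80, 56)]


def make_rows(counts):
    """Expand stratum counts into individual (c, a, y) records."""
    rows = []
    for c, a, n, events in counts:
        rows += [(c, a, 1)] * events + [(c, a, 0)] * (n - events)
    return rows


def crude_risk_difference(rows):
    """Unadjusted risk difference, confounded by C."""
    def risk(a):
        ys = [y for _, ai, y in rows if ai == a]
        return sum(ys) / len(ys)
    return risk(1) - risk(0)


def ipw_risk_difference(rows):
    """Risk difference after weighting each record by 1 / P(A = a | C = c)."""
    n_c = Counter(c for c, _, _ in rows)
    n_ca = Counter((c, a) for c, a, _ in rows)

    def weighted_risk(a):
        num = den = 0.0
        for c, ai, y in rows:
            if ai != a:
                continue
            w = n_c[c] / n_ca[(c, a)]  # inverse of the stratum propensity
            num += w * y
            den += w
        return num / den

    return weighted_risk(1) - weighted_risk(0)


rows = make_rows(COUNTS)
print(crude_risk_difference(rows), ipw_risk_difference(rows))
```

In this toy data set the within-stratum risk difference is 0.2 in both strata of C, so the weighted estimate can be compared against that value, while the crude estimate is inflated because exposure is more common where baseline risk is higher.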
Use and interpretation of logistic regression in habitat-selection studies
Keating, Kim A.; Cherry, Steve
2004-01-01
Logistic regression is an important tool for wildlife habitat-selection studies, but the method frequently has been misapplied due to an inadequate understanding of the logistic model, its interpretation, and the influence of sampling design. To promote better use of this method, we review its application and interpretation under 3 sampling designs: random, case-control, and use-availability. Logistic regression is appropriate for habitat use-nonuse studies employing random sampling and can be used to directly model the conditional probability of use in such cases. Logistic regression also is appropriate for studies employing case-control sampling designs, but careful attention is required to interpret results correctly. Unless bias can be estimated or probability of use is small for all habitats, results of case-control studies should be interpreted as odds ratios, rather than probability of use or relative probability of use. When data are gathered under a use-availability design, logistic regression can be used to estimate approximate odds ratios if probability of use is small, at least on average. More generally, however, logistic regression is inappropriate for modeling habitat selection in use-availability studies. In particular, using logistic regression to fit the exponential model of Manly et al. (2002:100) does not guarantee maximum-likelihood estimates, valid probabilities, or valid likelihoods. We show that the resource selection function (RSF) commonly used for the exponential model is proportional to a logistic discriminant function. Thus, it may be used to rank habitats with respect to probability of use and to identify important habitat characteristics or their surrogates, but it is not guaranteed to be proportional to probability of use. Other problems associated with the exponential model also are discussed. 
We describe an alternative model based on Lancaster and Imbens (1996) that offers a method for estimating conditional probability of use in use-availability studies. Although promising, this model fails to converge to a unique solution in some important situations. Further work is needed to obtain a robust method that is broadly applicable to use-availability studies.
Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R
2012-01-01
The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz, etc.). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially, the data were aggregated in PROC GENMOD. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data were then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed, also in ArcGIS, using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). These data were overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61 m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS.
Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process in AUTOREG. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high and low ABR-stratified clusters. The analyses also revealed that the estimators levels of turbidity and presence of rocks were statistically significant for the high-ABR-stratified clusters, while the estimators distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established, while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.
Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression
NASA Astrophysics Data System (ADS)
Khikmah, L.; Wijayanto, H.; Syafitri, U. D.
2017-04-01
The problems most often encountered in logistic regression modeling are multicollinearity problems. Multicollinearity between explanatory variables biases the parameter estimates and also results in classification errors. In general, stepwise regression is used to overcome multicollinearity. Another method to overcome multicollinearity, which involves all variables for prediction, is Principal Component Analysis (PCA). However, classical PCA applies only to numeric data. When the data are categorical, one method to solve the problem is Categorical Principal Component Analysis (CATPCA). The data used in this research were a part of the data of the Demographic and Population Survey Indonesia (IDHS) 2012. This research focuses on the characteristics of women using contraceptive methods. Classification results were evaluated using Area Under Curve (AUC) values; the higher the AUC value, the better. Based on AUC values, the classification of the contraceptive method using the stepwise method (58.66%) is better than the logistic regression model (57.39%) and CATPCA (57.39%). Evaluation of the results using sensitivity shows the opposite: the CATPCA method (99.79%) is better than the logistic regression method (92.43%) and stepwise (92.05%). Because this study focuses on classification of the major class (using a contraceptive method), the selected model is CATPCA, because it can raise the accuracy for the major class.
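The two evaluation measures contrasted in this abstract, sensitivity and AUC, can be computed by hand as in the following Python sketch. The labels and scores are invented for illustration and do not come from the IDHS data; the function names are hypothetical. AUC is computed here as the proportion of positive-negative pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def sensitivity(y_true: list[int], y_pred: list[int]) -> float:
    """True positives divided by all actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = major class) and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(sensitivity(y_true, y_pred), auc(y_true, scores))
print(sensitivity(y_true, [1] * len(y_true)))  # labelling everything positive
```

The last line shows why the two measures can disagree: a rule that assigns every case to the major class attains perfect sensitivity without ranking cases any better, which is one possible reading of a model with very high sensitivity but a modest AUC.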
Zgheib, Nathalie K; Sleiman, Fatima; Nasreddine, Lara; Nasrallah, Mona; Nakhoul, Nancy; Isma’eel, Hussain; Tamim, Hani
2018-01-01
In Lebanon, data stemming from national cross-sectional surveys indicated significant increasing trends in the prevalence of cardiovascular diseases and associated behavioral and age-related risk factors. To our knowledge, no data are available on relative telomere length (RTL) as a potential biomarker for age-related diseases in a Lebanese population. The aim of this study was to evaluate whether there is an association between RTL and demographic characteristics, lifestyle habits and diseases in the Lebanese. This was a cross-sectional study of 497 Lebanese subjects. Peripheral blood RTL was measured by amplifying telomere and single copy gene using real-time PCR. Mean ± SD RTL was 1.42 ± 0.83, and it was categorized into 3 tertiles. Older age (P=0.002) and wider waist circumference (WC) (P=0.001) were statistically significantly associated with shorter RTL. Multinomial logistic regression showed that subjects who had some level of sleeping difficulty had a statistically significantly shorter RTL when compared to those with no sleeping difficulties at all [OR (95% CI): 2.01 (1.11-3.62) in the first RTL tertile]. Importantly, statistically significantly shorter RTL was found with every additional 10 cm of WC [OR (95% CI): 1.30 (1.11-1.52) for first RTL tertile]. In addition, and after performing the multivariate logistic regression and adjusting for “predictors” of RTL, the odds of having hypertension or being treated for hypertension were higher in patients who had shorter RTL: OR (95% CI): 2.45 (1.36-4.44) and 2.28 (1.22-4.26) in the first RTL tertiles respectively with a similar trend, though not statistically significant, in the second RTL tertiles. This is the first study in Lebanon to show an association between age, central obesity, poor sleep and hypertension and RTL. It is hoped that telomere length measurement be potentially used as a biomarker for biological age and age-related diseases and progression in the Lebanese. PMID:29392083
Association factor analysis between osteoporosis with cerebral artery disease: The STROBE study.
Jin, Eun-Sun; Jeong, Je Hoon; Lee, Bora; Im, Soo Bin
2017-03-01
The purpose of this study was to determine the clinical association factors between osteoporosis and cerebral artery disease in a Korean population. Two hundred nineteen postmenopausal women and men undergoing cerebral computed tomography angiography were enrolled in this cross-sectional study to evaluate cerebral artery disease. Cerebral artery disease was diagnosed if there was narrowing of 50% or more of the diameter in one or more cerebral arteries or presence of vascular calcification. History of osteoporotic fracture was assessed using medical records and radiographic data such as simple radiography, MRI, and bone scan. Bone mineral density was checked by dual-energy x-ray absorptiometry. We reviewed clinical characteristics in all patients and also performed subgroup analysis for total or extracranial/intracranial cerebral artery disease group retrospectively. We performed statistical analysis by means of chi-square test or Fisher's exact test for categorical variables and Student's t-test or Wilcoxon's rank sum test for continuous variables. Univariate and multivariate logistic regression analyses were also conducted to assess the factors associated with the prevalence of cerebral artery disease. A two-tailed p-value of less than 0.05 was considered as statistically significant. All statistical analyses were performed using R (version 3.1.3; The R Foundation for Statistical Computing, Vienna, Austria) and SPSS (version 14.0; SPSS, Inc, Chicago, Ill, USA). Of the 219 patients, 142 had cerebral artery disease. All vertebral fracture was observed in 29 (13.24%) patients. There was significant difference in hip fracture according to the presence or absence of cerebral artery disease. In logistic regression analysis, osteoporotic hip fracture was significantly associated with extracranial cerebral artery disease after adjusting for multiple risk factors. 
Females with osteoporotic hip fracture were associated with total calcified cerebral artery disease. Some clinical factors such as age, hypertension, and osteoporotic hip fracture, smoking history and anti-osteoporosis drug use were associated with cerebral artery disease.
Jakovljevic, Aleksandar; Lazic, Emira; Soldatovic, Ivan; Nedeljkovic, Nenad; Andric, Miroslav
2015-07-01
To analyze radiographic predictors for lower third molar eruption among subjects with different anteroposterior skeletal relations and of different age groups. In total, 300 lower third molars were recorded on diagnostic digital orthopantomograms (DPTs) and lateral cephalograms (LCs). The radiographs were grouped according to sagittal intermaxillary angle (ANB), subject age, and level of lower third molar eruption. The DPT was used to analyze retromolar space, mesiodistal crown width, space/width ratio, third and second molar angulation (α, γ), third molar inclination (β), and gonion angle. The LC was used to determine ANB, angles of maxillar and mandibular prognathism (SNA, SNB), mandibular plane angle (SN/MP), and mandibular lengths. A logistic regression model was created using the statistically significant predictors. The logistic regression analysis revealed a statistically significant impact of β angle and distance between gonion and gnathion (Go-Gn) on the level of lower third molar eruption (P < .001 and P < .015, respectively). The retromolar space was significantly increased in the adult subgroup for all skeletal classes. The lower third molar impaction rate was significantly higher in the adult subgroup with the Class II (62.3%) compared with Class III subjects (31.7%; P < .013). The most favorable values of linear and angular predictors of mandibular third molar eruption were measured in Class III subjects. For valid estimation of mandibular third molar eruption, certain linear and angular measures (β angle, Go-Gn), as well as the size of the retromolar space, need to be considered.
Efficace, Fabio; Breccia, Massimo; Cottone, Francesco; Okumura, Iris; Doro, Maribel; Riccardi, Francesca; Rosti, Gianantonio; Baccarani, Michele
2016-12-01
The main objective of this study was to investigate whether social support is independently associated with psychological well-being in chronic myeloid leukemia (CML) patients. Secondary objectives were to compare the psychological well-being profile of CML patients with that of their peers in general population and to examine possible age- and sex-related differences. Analysis was performed on 417 patients in treatment with lifelong molecularly targeted therapies. Mean age of patients analyzed was 56 years (range 19-87 years) and 247 (59 %) were male and 170 (41 %) were female. Social support was assessed with the Multidimensional Scale of Perceived Social Support and psychological well-being was evaluated with the short version of the Psychological General Well-Being Index. Descriptive statistics and multivariate logistic regression analyses were used. Multivariate logistic regression analysis revealed that a greater social support was independently associated with lower anxiety and depression, as well as with higher positive well-being, self-control, and vitality (p < 0.001). Female patients reported statistically significant worse outcomes in all dimensions of psychological well-being. Age- and sex-adjusted comparisons with population norms revealed that depression (ES = -0.42, p < 0.001) and self-control (ES = -0.48, p < 0.001) were the two main impaired psychological dimensions. This study indicates that social support is a critical factor associated with psychological well-being of CML patients treated with modern lifelong targeted therapies.
Is there a relationship between periodontal conditions and number of medications among the elderly?
Natto, Zuhair S; Aladmawy, Majdi; Alshaeri, Heba K; Alasqah, Mohammed; Papas, Athena
2016-03-01
To investigate possible correlations of clinical attachment level and pocket depth with number of medications in elderly individuals. Intra-oral examinations for 139 patients visiting Tufts dental clinic were done. Periodontal assessments were performed with a manual UNC-15 periodontal probe to measure probing depth (PD) and clinical attachment level (CAL) at 6 sites. Complete lists of patients' medications were obtained during the examinations. Statistical analysis involved Kruskal-Wallis, chi square and multivariate logistic regression analyses. Age and health status attained statistical significance (p< 0.05), in contingency table analysis with number of medications. Number of medications had an effect on CAL: increased attachment loss was observed when 4 or more medications were being taken by the patient. Number of medications did not have any effect on periodontal PD. In multivariate logistic regression analysis, 6 or more medications had a higher risk of attachment loss (>3mm) when compared to the no-medication group, in crude OR (1.20, 95% CI:0.22-6.64), and age adjusted (OR=1.16, 95% CI:0.21-6.45), but not with the multivariate model (OR=0.71, 95% CI:0.11-4.39). CAL seems to be more sensitive to the number of medications taken, when compared to PD. However, it is not possible to discriminate at exactly what number of drug combinations the breakdown in CAL will happen. We need to do further analysis, including more subjects, to understand the possible synergistic mechanisms for different drug and periodontal responses.
Artificial Intelligence Systems as Prognostic and Predictive Tools in Ovarian Cancer.
Enshaei, A; Robson, C N; Edmondson, R J
2015-11-01
The ability to provide accurate prognostic and predictive information to patients is becoming increasingly important as clinicians enter an era of personalized medicine. For a disease as heterogeneous as epithelial ovarian cancer, conventional algorithms become too complex for routine clinical use. This study therefore investigated the potential for an artificial intelligence model to provide this information and compared it with conventional statistical approaches. The authors created a database comprising 668 cases of epithelial ovarian cancer during a 10-year period and collected data routinely available in a clinical environment. They also collected survival data for all the patients, then constructed an artificial intelligence model capable of comparing a variety of algorithms and classifiers alongside conventional statistical approaches such as logistic regression. The model was used to predict overall survival and demonstrated that an artificial neural network (ANN) algorithm was capable of predicting survival with high accuracy (93 %) and an area under the curve (AUC) of 0.74 and that this outperformed logistic regression. The model also was used to predict the outcome of surgery and again showed that ANN could predict outcome (complete/optimal cytoreduction vs. suboptimal cytoreduction) with 77 % accuracy and an AUC of 0.73. These data are encouraging and demonstrate that artificial intelligence systems may have a role in providing prognostic and predictive data for patients. The performance of these systems likely will improve with increasing data set size, and this needs further investigation.
Scannapieco, Frank A; Ho, Alex W; DiTolla, Maris; Chen, Casey; Dentino, Andrew R
2004-03-01
To determine if the prevalence of respiratory disease among dental students and dental residents varies with their exposure to the clinical dental environment. A detailed questionnaire was administered to 817 students at 3 dental schools. The questionnaire sought information concerning demographic characteristics, school year, exposure to the dental environment and dental procedures, and history of respiratory disease. The data obtained were subjected to bivariate and multiple logistic regression analysis. Respondents reported experiencing the following respiratory conditions during the previous year: asthma (26 cases), bronchitis (11 cases), chronic lung disease (6 cases), pneumonia (5 cases) and streptococcal pharyngitis (50 cases). Bivariate statistical analyses indicated no significant associations between the prevalence of any of the respiratory conditions and year in dental school, except for asthma, for which there was a significantly higher prevalence at 1 school compared to the other 2 schools. When all cases of respiratory disease were combined as a composite variable and subjected to multivariate logistic regression analysis controlling for age, sex, race, dental school, smoking history and alcohol consumption, no statistically significant association was observed between respiratory condition and year in dental school or exposure to the dental environment as a dental patient. No association was found between the prevalence of respiratory disease and a student's year in dental school or previous exposure to the dental environment as a patient. These results suggest that exposure to the dental environment does not increase the risk for respiratory infection in healthy dental health care workers.
Risk factors for repetitive strain injuries among school teachers in Thailand.
Chaiklieng, Sunisa; Suggaravetsiri, Pornnapa
2012-01-01
Prolonged posture, static work and repetition have previously been reported as causes of repetitive strain injuries (RSIs) among workers including teachers. This cross-sectional analytic study aimed to investigate the prevalence and risk factors of RSIs among school teachers. Participants were 452 full-time school teachers in Thailand. Data were collected by structured questionnaires, illuminance measurements and physical fitness tests. Descriptive statistics and inferential statistics (chi-square test and multiple logistic regression analysis) were used. Most teachers in this study were females (57.3%); the mean work experience was 22.6 ± 10.4 years. The six-month prevalence of RSIs was 73.7%. The univariate analysis identified the following risk factors related to RSIs: chronic disease (OR=1.8; 95% CI = 1.16-2.73), history of trauma (OR=2.0; 95% CI = 1.02-4.01), a family member with RSIs (OR=2.0; 95% CI = 1.02-4.01), stretching to write on the board (OR=1.7; 95% CI = 1.06-1.70) and high-heeled shoes >2 inches (OR=1.6; 95% CI = 1.03-2.51). Multiple logistic regression analysis showed that chronic diseases and high-heeled shoes >2 inches were significantly related to the development of RSIs. Poor grip strength and back muscle flexibility significantly affected RSIs of teachers. In conclusion, RSIs were highly prevalent among school teachers, who should be made aware of health promotion measures to prevent RSIs.
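The univariate odds ratios and 95% confidence intervals reported above are the kind of quantity that can be computed from a 2×2 exposure-by-outcome table. The following is a minimal sketch using the standard Wald (log odds ratio) interval; the function name and the counts are hypothetical illustrations, not data from the study, and the abstract does not state which interval method the authors used.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome,   b: exposed without outcome
    c: unexposed with outcome, d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, chosen only to illustrate the calculation.
or_value, ci_low, ci_high = odds_ratio_ci(40, 20, 60, 60)
print(f"OR={or_value:.2f}; 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1 corresponds to a statistically significant association at roughly the 5% level in a univariate analysis of this kind.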
Llamas-Carreras, José María; Amarilla, Almudena; Espinar-Escalona, Eduardo; Castellanos-Cosano, Lizett; Martín-González, Jenifer; Sánchez-Domínguez, Benito; López-Frías, Francisco Javier
2012-05-01
The purpose of this study was to compare, in a split mouth design, the external apical root resorption (EARR) associated with orthodontic treatment in root-filled maxillary incisors and their contralateral teeth with vital pulps. The study sample consisted of 38 patients (14 males and 24 females), who had one root-filled incisor before completion of multiband/bracket orthodontic therapy for at least 1 year. For each patient, digital panoramic radiographs taken before and after orthodontic treatment were used to determine the root resorption and the proportion of external root resorption (PRR), defined as the ratio between the root resorption in the endodontically treated incisor and that in its contralateral incisor with a vital pulp. The Student's t-test, chi-square test and logistic regression analysis were used to determine statistical significance. There was no statistically significant difference (p > 0.05) between EARR in vital teeth (1.1 ± 1.0 mm) and endodontically treated incisors (1.1 ± 0.8 mm). Twenty-six patients (68.4%) showed greater resorption of the endodontically treated incisor than its homologous vital tooth (p > 0.05). The mean and standard deviation of PRR were 1.0 ± 0.2. Multivariate logistic regression suggested that PRR does not correlate with any of the variables analyzed. There was no significant difference in the amount or severity of external root resorption during orthodontic movement between root-filled incisors and their contralateral teeth with vital pulps.
Kim, Dong Wook; Kim, Hwiyoung; Nam, Woong; Kim, Hyung Jun; Cha, In-Ho
2018-04-23
The aim of this study was to build and validate five types of machine learning models that can predict the occurrence of BRONJ associated with dental extraction in patients taking bisphosphonates for the management of osteoporosis. A retrospective review of the medical records was conducted to obtain cases and controls for the study. Total 125 patients consisting of 41 cases and 84 controls were selected for the study. Five machine learning prediction algorithms including multivariable logistic regression model, decision tree, support vector machine, artificial neural network, and random forest were implemented. The outputs of these models were compared with each other and also with conventional methods, such as serum CTX level. Area under the receiver operating characteristic (ROC) curve (AUC) was used to compare the results. The performance of machine learning models was significantly superior to conventional statistical methods and single predictors. The random forest model yielded the best performance (AUC = 0.973), followed by artificial neural network (AUC = 0.915), support vector machine (AUC = 0.882), logistic regression (AUC = 0.844), decision tree (AUC = 0.821), drug holiday alone (AUC = 0.810), and CTX level alone (AUC = 0.630). Machine learning methods showed superior performance in predicting BRONJ associated with dental extraction compared to conventional statistical methods using drug holiday and serum CTX level. Machine learning can thus be applied in a wide range of clinical studies. Copyright © 2017. Published by Elsevier Inc.
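The models above are ranked by area under the ROC curve (AUC). As a minimal sketch of what that number measures, the AUC equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The function name and the scores below are hypothetical and are not taken from the study.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties between a case score and a control score count as half a win.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical predicted probabilities for three cases and three controls.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
print(auc_from_scores(cases, controls))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the figures reported above are compared.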
Logistic regression models of factors influencing the location of bioenergy and biofuels plants
T.M. Young; R.L. Zaretzki; J.H. Perdue; F.M. Guess; X. Liu
2011-01-01
Logistic regression models were developed to identify significant factors that influence the location of existing wood-using bioenergy/biofuels plants and traditional wood-using facilities. Logistic models provided quantitative insight for variables influencing the location of woody biomass-using facilities. Availability of "thinnings to a basal area of 31.7m2/ha...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wenzel, Tom
NHTSA recently completed a logistic regression analysis updating its 2003, 2010, and 2012 studies of the relationship between vehicle mass and US fatality risk per vehicle mile traveled (VMT; Kahane 2010, Kahane 2012, Puckett 2016). The new study updates the 2012 analysis using FARS data from 2005 to 2011 for model year 2003 to 2010. Using the updated databases, NHTSA estimates that reducing vehicle mass by 100 pounds while holding footprint fixed would increase fatality risk per VMT by 1.49% for lighter-than-average cars and by 0.50% for heavier-than-average cars, but reduce risk by 0.10% for lighter-than-average light-duty trucks, by 0.71% for heavier-than-average light-duty trucks, and by 0.99% for CUVs/minivans. Using a jackknife method to estimate the statistical uncertainty of these point estimates, NHTSA finds that none of these estimates are statistically significant at the 95% confidence level; however, the 1.49% increase in risk associated with mass reduction in lighter-than-average cars, and the 0.71% and 0.99% decreases in risk associated with mass reduction in heavier-than-average light trucks and CUVs/minivans, are statistically significant at the 90% confidence interval. The effect of mass reduction on risk that NHTSA estimated in 2016 is more beneficial than in its 2012 study, particularly for light trucks and CUVs/minivans. The 2016 NHTSA analysis estimates that reducing vehicle footprint by one square foot while holding mass constant would increase fatality risk per VMT by 0.28% in cars, by 0.38% in light trucks, and by 1.18% in CUVs and minivans. This report replicates the 2016 NHTSA analysis, and reproduces their main results. This report uses the confidence intervals output by the logistic regression models, which are smaller than the intervals NHTSA estimated using a jackknife technique that accounts for the sampling error in the FARS fatality and state crash data. 
In addition to reproducing the NHTSA results, this report also examines the NHTSA data in slightly different ways to get a deeper understanding of the relationship between vehicle weight, footprint, and safety. The results of the NHTSA baseline results, and these alternative analyses, are summarized in Table ES.1; statistically significant estimates, based on the confidence intervals output by the logistic regression models, are shown in red in the tables. We found that NHTSA’s reasonable assumption that all vehicles will have ESC installed by 2017 in its baseline regression model slightly increases the estimated increase in risk from mass reduction in cars, but substantially decreases the estimated increase in risk from footprint reduction in all three vehicle types (Alternative 1 in Table ES.1; explained in more detail in Section 2.1 of this report). This is because NHTSA projects ESC to substantially reduce the number of fatalities in rollovers and crashes with stationary objects, and mass reduction appears to reduce risk, while footprint reduction appears to increase risk, in these types of crashes, particularly in cars and CUVs/minivans. A single regression model including all crash types results in slightly different estimates of the relationship between decreasing mass and risk, as shown in Alternative 2 in Table ES.1.
Ozge, C; Toros, F; Bayramkaya, E; Camdeviren, H; Sasmaz, T
2006-08-01
The purpose of this study is to evaluate the most important sociodemographic factors on smoking status of high school students using a broad randomised epidemiological survey. Using an in-class, self-administered questionnaire about their sociodemographic variables and smoking behaviour, a representative sample of 3304 students of preparatory, 9th, 10th, and 11th grades, from 22 randomly selected schools of Mersin, was evaluated and discriminative factors were determined using appropriate statistics. In addition to binary logistic regression analysis, the study evaluated combined effects of these factors using classification and regression tree methodology, as a new statistical method. The data showed that 38% of the students reported lifetime smoking and 16.9% of them reported current smoking, with a male predominance and increasing prevalence by age. Second hand smoking was reported at a 74.3% frequency with father predominance (56.6%). The significant factors that affect current smoking in these age groups were increased household size, late birth rank, certain school types, low academic performance, increased second hand smoking, and stress (especially reported as separation from a close friend or because of violence at home). Classification and regression tree methodology showed the importance of some neglected sociodemographic factors with a good classification capacity. It was concluded that, as closely related with sociocultural factors, smoking was a common problem in this young population, generating important academic and social burden in youth life and with increasing data about this behaviour and using new statistical methods, effective coping strategies could be composed.
Fuzzy multinomial logistic regression analysis: A multi-objective programming approach
NASA Astrophysics Data System (ADS)
Abdalla, Hesham A.; El-Sayed, Amany A.; Hamed, Ramadan
2017-05-01
Parameter estimation for multinomial logistic regression is usually based on maximizing the likelihood function. For large well-balanced datasets, Maximum Likelihood (ML) estimation is a satisfactory approach. Unfortunately, ML can fail completely or at least produce poor results in terms of estimated probabilities and confidence intervals of parameters, especially for small datasets. In this study, a new approach based on fuzzy concepts is proposed to estimate parameters of the multinomial logistic regression. The study assumes that the parameters of multinomial logistic regression are fuzzy. Based on the extension principle stated by Zadeh and Bárdossy's proposition, a multi-objective programming approach is suggested to estimate these fuzzy parameters. A simulation study is used to evaluate the performance of the new approach versus the Maximum Likelihood (ML) approach. Results show that the new proposed model outperforms ML in cases of small datasets.
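For orientation, the conventional (crisp) likelihood that ML estimation maximizes can be sketched as follows. This is only the standard baseline-category multinomial logit, not the fuzzy multi-objective method proposed in the paper; the function names, the choice of category 0 as reference, and the toy data are assumptions made for illustration.

```python
import math

def category_probabilities(x, betas):
    """Baseline-category multinomial logit probabilities for one observation.

    x: list of predictor values (include a leading 1.0 for an intercept)
    betas: one coefficient list per non-reference category; the reference
           category (index 0) has its linear predictor fixed at 0.
    """
    etas = [0.0] + [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    shift = max(etas)  # subtract the maximum for numerical stability
    exps = [math.exp(eta - shift) for eta in etas]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood(rows, labels, betas):
    """Sum of log-probabilities of the observed categories."""
    return sum(
        math.log(category_probabilities(x, betas)[y])
        for x, y in zip(rows, labels)
    )

# Toy data (hypothetical): intercept plus one predictor, three categories.
rows = [[1.0, 0.5], [1.0, -1.2]]
labels = [0, 2]
zero_betas = [[0.0, 0.0], [0.0, 0.0]]
print(category_probabilities(rows[0], zero_betas))
print(log_likelihood(rows, labels, zero_betas))
```

With few observations per category this function can be flat or unbounded in some directions, which is the small-sample difficulty the abstract describes.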
NASA Astrophysics Data System (ADS)
Esposito, Carlo; Barra, Anna; Evans, Stephen G.; Scarascia Mugnozza, Gabriele; Delaney, Keith
2014-05-01
The study of landslide susceptibility by multivariate statistical methods is based on finding a quantitative relationship between controlling factors and landslide occurrence. Such studies have become popular in the last few decades thanks to the development of geographic information systems (GIS) software and the related improved data management. In this work we applied a statistical approach to an area of high landslide susceptibility mainly due to its tropical climate and geological-geomorphological setting. The study area is located in the south-east region of Brazil that has frequently been affected by flood and landslide hazard, especially because of heavy rainfall events during the summer season. In this work we studied a disastrous event that occurred on January 11th and 12th of 2011, which involved Região Serrana (the mountainous region of Rio de Janeiro State) and caused more than 5000 landslides and at least 904 deaths. In order to produce susceptibility maps, we focused our attention on an area of 93,6 km2 that includes Nova Friburgo city. We utilized two different multivariate statistic methods: Logistic Regression (LR), already widely used in applied geosciences, and Random Forest (RF), which has only recently been applied to landslide susceptibility analysis. With reference to each mapping unit, the first method (LR) results in a probability of landslide occurrence, while the second one (RF) gives a prediction in terms of % of area susceptible to slope failure. With this aim in mind, a landslide inventory map (related to the studied event) has been drawn up through analyses of high-resolution GeoEye satellite images, in a GIS environment. Data layers of 11 causative factors have been created and processed in order to be used as continuous numerical or discrete categorical variables in statistical analysis. 
In particular, the logistic regression method has frequent difficulties in managing numerical continuous and discrete categorical variables together; therefore in our work we tried different methods to process categorical variables, until we obtained a statistically significant model. The outcomes of the two statistical methods (RF and LR) have been tested with a spatial validation and gave us two susceptibility maps. The significance of the models is quantified in terms of Area Under ROC Curve (AUC resulted in 0.81 for RF model and in 0.72 for LR model). In the first instance, a graphical comparison of the two methods shows a good correspondence between them. Further, we integrated results in a unique susceptibility map which maintains both information of probability of occurrence and % of area of landslide detachment, resulting from LR and RF respectively. In fact, in view of a landslide susceptibility classification of the study area, the former is less accurate but gives easily classifiable results, while the latter is more accurate but the results can be only subjectively classified. The obtained "integrated" susceptibility map preserves information about the probability that a given % of area could fail for each mapping unit.
T2 relaxation time is related to liver fibrosis severity
Siqueira, Luiz; Uppal, Ritika; Alford, Jamu; Fuchs, Bryan C.; Yamada, Suguru; Tanabe, Kenneth; Chung, Raymond T.; Lauwers, Gregory; Chew, Michael L.; Boland, Giles W.; Sahani, Duhyant V.; Vangel, Mark; Hahn, Peter F.; Caravan, Peter
2016-01-01
Background The grading of liver fibrosis relies on liver biopsy. Imaging techniques, including elastography and relaxometric techniques, have had varying success in diagnosing moderate fibrosis. The goal of this study was to determine if there is a relationship between the T2-relaxation time of hepatic parenchyma and the histologic grade of liver fibrosis in patients with hepatitis C undergoing both routine liver MRI and liver biopsy, and to validate our methodology with phantoms and in a rat model of liver fibrosis. Methods This study is composed of three parts: (I) 123 patients who underwent both routine clinical liver MRI and biopsy within a 6-month period between July 1999 and January 2010 were enrolled in a retrospective study. MR imaging was performed at 1.5 T using a dual-echo turbo-spin echo equivalent pulse sequence. T2 relaxation time of liver parenchyma in patients was calculated by mono-exponential fit of a region of interest (ROI) within the right lobe correlating to histopathologic grading (Ishak 0–6) and routine serum liver inflammation [aspartate aminotransferase (AST) and alanine aminotransferase (ALT)]. Statistical comparison was performed using ordinary logistic and ordinal logistic regression and ANOVA comparing T2 to Ishak fibrosis without and using AST and ALT as covariates; (II) a phantom was prepared using serial dilutions of dextran coated magnetic iron oxide nanoparticles. T2-weighted imaging was performed by comparing a dual echo fast spin echo sequence to a Carr-Purcell-Meiboom-Gill (CPMG) multi-echo sequence at 1.5 T. Statistical comparison was performed using a paired t-test; (III) male Wistar rats receiving weekly intraperitoneal injections of phosphate buffer solution (PBS) control (n=4 rats); diethylnitrosamine (DEN) for either 5 (n=5 rats) or 8 weeks (n=4 rats) were MR imaged on a Bruker Pharmascan 4.7 T magnet with a home-built bird-cage coil. 
T2 was quantified by using a mono-exponential fitting algorithm on multi-slice multi echo T2 weighted data. Statistical comparison was performed using ANOVA. Results (I) Histopathologic evaluation of both rat and human livers demonstrated no evidence of steatosis or hemochromatosis. There was a monotonic increase in mean T2 value with increasing degree of fibrosis (control 65.4±2.9 ms, n=6 patients); mild (Ishak 1–2) 66.7±1.9 ms (n=30); moderate (Ishak 3–4) 71.6±1.7 ms (n=26); severe (Ishak 5–6) 72.4±1.4 ms (n=61); with relatively low standard error (~2.9 ms). There was a statistically significant difference between degrees of mild (Ishak <4) vs. moderate to severe fibrosis (Ishak >4) (P=0.03) based on logistic regression of T2 and Ishak, which became insignificant (P=0.07) when using inflammatory markers as covariates. Expanding on this model using ordinal logistic regression, there was significance amongst all 4 groups comparing T2 to Ishak (P=0.01), with significance using inflammation as a covariate (P=0.03) and approaching statistical significance amongst all groups by ANOVA (P=0.07); (II) there was a monotonic increase in T2 and statistical significance (ANOVA P<0.0001) between each rat subgroup [phosphate buffer solution (PBS) 25.2±0.8, DEN 5-week (31.1±1.5), and DEN 9-week (49.4±0.4) ms]; (III) the phantoms that had T2 values within the relevant range for the human liver (e.g., 20–100 ms), demonstrated no statistical difference between two point fits on turbo spin echo (TSE) data and multi-echo CPMG data (P=0.9). Conclusions The finding of increased T2 with liver fibrosis may relate to inflammation that may be an alternative or adjunct to other noninvasive MR imaging based approaches for assessing liver fibrosis. PMID:27190762
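The two-point mono-exponential fit mentioned for the dual-echo data has a closed form, sketched below under the usual signal model S(TE) = S0·exp(−TE/T2). The function name, echo times and signal values are hypothetical; the abstract does not give the actual acquisition parameters, and real data would also need noise handling that this sketch omits.

```python
import math

def t2_two_point(signal_1, signal_2, te_1, te_2):
    """Estimate T2 (same units as the echo times) from two echoes.

    Assumes S(TE) = S0 * exp(-TE / T2), so that
    T2 = (TE2 - TE1) / ln(S1 / S2).
    """
    if te_2 <= te_1:
        raise ValueError("te_2 must be later than te_1")
    if signal_2 <= 0 or signal_1 <= signal_2:
        raise ValueError("signal must be positive and decay between echoes")
    return (te_2 - te_1) / math.log(signal_1 / signal_2)

# Hypothetical echo times (ms) and noise-free signals for a true T2 of 70 ms.
true_t2, s0 = 70.0, 1000.0
te_first, te_second = 20.0, 80.0
s_first = s0 * math.exp(-te_first / true_t2)
s_second = s0 * math.exp(-te_second / true_t2)
estimated_t2 = t2_two_point(s_first, s_second, te_first, te_second)
print(f"{estimated_t2:.1f} ms")
```

With more than two echoes, as in a CPMG acquisition, the same model is normally fitted by least squares instead of solved directly.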
A Primer on Logistic Regression.
ERIC Educational Resources Information Center
Woldbeck, Tanya
This paper introduces logistic regression as a viable alternative when the researcher is faced with variables that are not continuous. If one is to use simple regression, the dependent variable must be measured on a continuous scale. In the behavioral sciences, it may not always be appropriate or possible to have a measured dependent variable on a…
A Solution to Separation and Multicollinearity in Multiple Logistic Regression
Shen, Jianzhao; Gao, Sujuan
2010-01-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27–38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. The ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither solves the problems for each other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth’s penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using a current screening data from a community-based dementia study. PMID:20376286
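The estimator described above can be written compactly. The Firth term below is the standard form from the cited 1993 paper; the exact form and scaling of the ridge term are an assumption made here for illustration, since the abstract does not state them, and the symbols $\ell$, $I(\beta)$ and $\lambda$ are notation introduced for this sketch.

```latex
% Ordinary log-likelihood for logistic regression with coefficients \beta
\ell(\beta) = \sum_{i=1}^{n} \bigl[ y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \bigr],
\qquad \pi_i = \frac{1}{1 + e^{-x_i^{\top}\beta}}

% Firth's penalized log-likelihood (I(\beta) is the Fisher information matrix)
\ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2} \log \bigl| I(\beta) \bigr|

% Double penalization: Firth term plus a ridge term with tuning parameter \lambda \ge 0
% (assumed form; the paper may scale the ridge penalty differently)
\ell^{**}(\beta) = \ell(\beta) + \tfrac{1}{2} \log \bigl| I(\beta) \bigr| - \lambda \sum_{j=1}^{p} \beta_j^{2}
```

Read this way, the Firth term addresses separation (it keeps estimates finite), while the ridge term shrinks coefficients of highly correlated items, which matches the abstract's point that neither penalty alone solves both problems.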
A Solution to Separation and Multicollinearity in Multiple Logistic Regression.
Shen, Jianzhao; Gao, Sujuan
2008-10-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. The ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither solves the problems for each other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using a current screening data from a community-based dementia study.
Axial Myopia Is Associated with Visual Field Prognosis of Primary Open-Angle Glaucoma
Qiu, Chen; Qian, Shaohong; Sun, Xinghuai; Zhou, Chuandi; Meng, Fanrong
2015-01-01
Purpose: To identify whether myopia was associated with the visual field (VF) progression of primary open-angle glaucoma (POAG). Methods: A total of 270 eyes of 270 POAG patients followed up for more than 3 years with ≥9 reliable VFs by Octopus perimetry were retrospectively reviewed. Myopia was divided into: mild myopia (-2.99 diopter [D], 0), moderate myopia (-5.99, -3.00 D), marked myopia (-9.00, -6.00 D) and non-myopia (0 D or more). An annual change in the mean defect (MD) slope >0.22 dB/y and 0.30 dB/y was defined as fast progression, respectively. Logistic regression was performed to determine prognostic factors for VF progression. Results: For the cutoff threshold at 0.22 dB/y, logistic regression showed that vertical cup-to-disk ratio (VCDR; p = 0.004) and the extent of myopia (p = 0.002) were statistically significant. When logistic regression was repeated after excluding the extent of myopia, axial length (AL; p = 0.008, odds ratio [OR] = 0.796) reached significance, as did VCDR (p = 0.001). Compared to eyes with AL≤23 mm, the OR values were 0.334 (p = 0.059), 0.309 (p = 0.044), 0.266 (p = 0.019), 0.260 (p = 0.018), respectively, for 23
Bejaei, M; Wiseman, K; Cheng, K M
2015-01-01
Consumers' interest in specialty eggs appears to be growing in Europe and North America. The objective of this research was to develop logistic regression models that utilise purchaser attributes and demographics to predict the probability of a consumer purchasing a specific type of table egg including regular (white and brown), non-caged (free-run, free-range and organic) or nutrient-enhanced eggs. These purchase prediction models, together with the purchasers' attributes, can be used to assess market opportunities of different egg types specifically in British Columbia (BC). An online survey was used to gather data for the models. A total of 702 completed questionnaires were submitted by BC residents. Selected independent variables were included in the logistic regression models developed for the different egg types to predict the probability of a consumer purchasing a specific type of table egg. The variables used in the models accounted for 54% and 49% of the variance in the purchase of regular and non-caged eggs, respectively. Research results indicate that consumers of different egg types exhibit a set of unique and statistically significant characteristics and/or demographics. For example, consumers of regular eggs were less educated, older, price sensitive, major chain store buyers, and store flyer users, and had lower awareness about different types of eggs and less concern regarding animal welfare issues. However, most of the non-caged egg consumers were less concerned about price, had higher awareness about different types of table eggs, purchased their eggs from local/organic grocery stores, farm gates or farmers markets, and were more concerned about care and feeding of hens compared to consumers of other egg types.
Amiresmaili, Mohammadreza; Khosravi, Sajad; Feyzabadi, Vahid Yazdi
2014-01-01
Background: The rural family physician program, a recent reform of the Iranian health system, has been implemented since 2005. Its success depends heavily on physician retention. The present study aimed to identify factors influencing physicians’ willingness to leave this program in Kerman province. Methods: The present cross-sectional study was performed in Kerman province in 2011. All family physicians working in this program (n = 271) were studied using a questionnaire. Data analysis was carried out using descriptive statistics and logistic regression through SPSS version 18.0. Results: Twenty-six percent (70) of the physicians had left the program in the past. In addition, 77.3% (208) intended to leave in the near future. Opportunity for continuing education, inappropriate and long working hours, unsuitable requirements of salary, irregular payments, lack of job security and high working responsibility were regarded, in that order, as the most important reasons for having left the program in the past and for intending to leave in the future. According to univariate logistic regression, younger physicians (odds ratio [OR] = 2.479; 95% confidence interval [CI]: 1.261-4.872) and physicians who had older children (OR = 4.743; 95% CI: 1.441-15.607) were more willing to leave the program in the near future; however, these associations were not significant in multivariate logistic regression. Conclusions: Physician retention in the family physician program is in serious doubt for a variety of reasons. The success of the program is endangered because of the pivotal role of human resources. Hence, revision of the program's human resources policies seems necessary in order to reduce physician attrition and improve its effectiveness. PMID:25400891
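A reported odds ratio and its confidence interval can be sanity-checked with a few lines of arithmetic. The sketch below assumes the intervals in the abstract above are Wald intervals (symmetric on the log-odds scale), which the abstract does not state. Under that assumption the geometric midpoint of the interval should reproduce the reported OR, and the interval width gives the standard error of log(OR). The function names are illustrative.

```python
import math

def point_from_ci(lower, upper):
    """Geometric midpoint of the interval; equals the OR if the CI is a Wald CI."""
    return math.sqrt(lower * upper)

def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR) implied by a 95% Wald confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Intervals as reported in the abstract above.
younger_physicians = (1.261, 4.872)   # reported OR = 2.479
older_children = (1.441, 15.607)      # reported OR = 4.743

for lower, upper in (younger_physicians, older_children):
    print(round(point_from_ci(lower, upper), 3), round(se_from_ci(lower, upper), 3))
```

Both midpoints agree with the reported odds ratios to within rounding, which is consistent with the Wald assumption.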
Jarvis, J; Seed, M; Elton, R; Sawyer, L; Agius, R
2005-01-01
Aims: To investigate quantitatively, relationships between chemical structure and reported occupational asthma hazard for low molecular weight (LMW) organic compounds; to develop and validate a model linking asthma hazard with chemical substructure; and to generate mechanistic hypotheses that might explain the relationships. Methods: A learning dataset used 78 LMW chemical asthmagens reported in the literature before 1995, and 301 control compounds with recognised occupational exposures and hazards other than respiratory sensitisation. The chemical structures of the asthmagens and control compounds were characterised by the presence of chemical substructure fragments. Odds ratios were calculated for these fragments to determine which were associated with a likelihood of being reported as an occupational asthmagen. Logistic regression modelling was used to identify the independent contribution of these substructures. A post-1995 set of 21 asthmagens and 77 controls were selected to externally validate the model. Results: Nitrogen or oxygen containing functional groups such as isocyanate, amine, acid anhydride, and carbonyl were associated with an occupational asthma hazard, particularly when the functional group was present twice or more in the same molecule. A logistic regression model using only statistically significant independent variables for occupational asthma hazard correctly assigned 90% of the model development set. The external validation showed a sensitivity of 86% and specificity of 99%. Conclusions: Although a wide variety of chemical structures are associated with occupational asthma, bifunctional reactivity is strongly associated with occupational asthma hazard across a range of chemical substructures. This suggests that chemical cross-linking is an important molecular mechanism leading to the development of occupational asthma. 
The logistic regression model is freely available on the internet and may offer a useful but inexpensive adjunct to the prediction of occupational asthma hazard. PMID:15778257
González-Madroño, A; Mancha, A; Rodríguez, F J; Culebras, J; de Ulibarri, J I
2012-01-01
To ratify previous validations of the CONUT nutritional screening tool by developing two probabilistic models using the parameters included in the CONUT, in order to see whether the CONUT's effectiveness could be improved. This is a two-step prospective study. In Step 1, 101 patients were randomly selected, and SGA and CONUT were performed. With the data obtained, an unconditional logistic regression model was developed, and two variants of CONUT were constructed: Model 1 was built by logistic regression. Model 2 was built by dividing the probabilities of undernutrition obtained in Model 1 into seven regular intervals. In Step 2, 60 patients were selected and underwent the SGA, the original CONUT and the new models developed. The diagnostic efficacy of the original CONUT and the new models was tested by means of ROC curves. Samples 1 and 2 were pooled to measure the degree of agreement between the original CONUT and SGA, and diagnostic efficacy parameters were calculated. No statistically significant differences were found between samples 1 and 2 regarding age, sex and medical/surgical distribution, and undernutrition rates were similar (over 40%). The AUCs of the ROC curves were 0.862 for the original CONUT, and 0.839 and 0.874 for Models 1 and 2, respectively. The kappa index for the CONUT and SGA was 0.680. The CONUT, with the original scores assigned by the authors, is as good as the mathematical models and is thus a valuable tool, highly useful and efficient for clinical undernutrition screening.
McKechnie, Duncan; Fisher, Murray J; Pryor, Julie; Bonser, Melissa; Jesus, Jhoven De
2018-03-01
To develop a falls risk screening tool (FRST) sensitive to the traumatic brain injury rehabilitation population. Falls are the most frequently recorded patient safety incident within the hospital context. The inpatient traumatic brain injury rehabilitation population is one particular population that has been identified as at high risk of falls. However, no FRST has been developed for this patient population. Consequently, in the traumatic brain injury rehabilitation population, there is the real possibility that nurses are using falls risk screening tools that have poor clinical utility. Multisite prospective cohort study. Univariate and multiple logistic regression modelling techniques (backward elimination, elastic net and hierarchical) were used to examine each variable's association with patients who fell. The resulting FRST's clinical validity was examined. Of the 140 patients in the study, 41 (29%) fell. Through multiple logistic regression modelling, 11 variables were identified as predictors for falls. Using hierarchical logistic regression, five of these were identified for inclusion in the resulting falls risk screening tool: prescribed mobility aid (such as a wheelchair or frame), a fall since admission to hospital, impulsive behaviour, impaired orientation and bladder and/or bowel incontinence. The resulting FRST has good clinical validity (sensitivity = 0.9; specificity = 0.62; area under the curve = 0.87; Youden index = 0.54). The tool was significantly more accurate (p = .037 on DeLong test) in discriminating fallers from nonfallers than the Ontario Modified STRATIFY FRST. A FRST has been developed using a comprehensive statistical framework, and evidence has been provided of this tool's clinical validity. The developed tool, the Sydney Falls Risk Screening Tool, should be considered for use in brain injury rehabilitation populations. © 2017 John Wiley & Sons Ltd.
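The Youden index quoted above is defined as sensitivity plus specificity minus one. The minimal sketch below applies that definition to the rounded sensitivity and specificity from the abstract, which gives about 0.52. The small gap from the reported 0.54 is presumably because the authors used unrounded values, though the abstract does not confirm this. The function name is illustrative.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0

# Rounded values as reported in the abstract above.
j = youden_index(0.90, 0.62)
print(round(j, 2))
```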
Huang, Jinxi; Zhou, Yi; Wang, Chenghu; Yuan, Weiwei; Zhang, Zhandong; Chen, Beibei; Zhang, Xiefu
2017-11-01
This study was conducted to investigate the risk factors of anastomotic fistula after the radical resection of esophageal-cardiac cancer. Five hundred and forty-four esophageal-cardiac cancer patients who underwent surgery and had complete clinical data were included in the study. Fifty patients diagnosed with postoperative anastomotic fistula were considered the case group and the remaining 494 subjects who did not develop postoperative anastomotic fistula were considered the control group. The potential risk factors for anastomotic fistula, such as age, gender, diabetes history, and smoking history, were collected and compared between the groups. Statistically significant variables were entered into logistic regression to further evaluate the independent risk factors for postoperative anastomotic fistulas in esophageal-cardiac cancer. The incidence of anastomotic fistulas was 9.2% (50/544). Logistic regression analysis revealed that female gender (P < 0.05), laparoscopic surgery (P < 0.05), decreased postoperative albumin (P < 0.05), and postoperative renal dysfunction (P < 0.05) were independent risk factors for anastomotic fistulas in patients who received surgery for esophageal-cardiac cancer. Of the 50 anastomotic fistulas, 16 cases were small fistulas, which were only discovered by conventional imaging examination and did not present clinical symptoms. All of the anastomotic fistulas occurred within seven days after surgery. Five of the patients with anastomotic fistulas underwent a second surgery and three died. Female patients with esophageal-cardiac cancer treated with endoscopic surgery and suffering from postoperative hypoproteinemia and renal dysfunction were susceptible to postoperative anastomotic fistula. © 2017 The Authors. Thoracic Cancer published by China Lung Oncology Group and John Wiley & Sons Australia, Ltd.
NASA Astrophysics Data System (ADS)
Demir, Gökhan; Aytekin, Mustafa; Banu Ikizler, Sabriye; Angın, Zekai
2013-04-01
The North Anatolian Fault is known as one of the most active and destructive fault zones, which has produced many earthquakes with high magnitudes. Along this fault zone, the morphology and the lithological features are prone to landsliding. Many earthquake-induced landslides have been recorded by several studies along this fault zone, and these landslides caused both injuries and loss of life. Therefore, a detailed landslide susceptibility assessment for this area is indispensable. In this context, a landslide susceptibility assessment for the 1445 km² area in the Kelkit River valley, a part of the North Anatolian Fault zone (Eastern Black Sea region of Turkey), was undertaken in this study, and the results are summarized here. For this purpose, a geographical information system (GIS) and a bivariate statistical model were used. Initially, landslide inventory maps were prepared using landslide data determined by field surveys and landslide data taken from the General Directorate of Mineral Research and Exploration. The landslide conditioning factors considered were lithology, slope gradient, slope aspect, topographical elevation, distance to streams, distance to roads, distance to faults, drainage density and fault density. The ArcGIS package was used to manipulate and analyze all the collected data. The logistic regression method was applied to create a landslide susceptibility map. The landslide susceptibility map was divided into five susceptibility classes: very low, low, moderate, high and very high. The result of the analysis was verified using the inventoried landslide locations and compared with the produced probability model. For this purpose, the Area Under the Curve (AUC) approach was applied, and an AUC value was obtained. Based on this AUC value, the landslide susceptibility map was judged satisfactory. Keywords: North Anatolian Fault Zone, Landslide susceptibility map, Geographical Information Systems, Logistic Regression Analysis.
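The AUC used to validate a susceptibility map can be read as the probability that a randomly chosen landslide cell receives a higher predicted score than a randomly chosen non-landslide cell. The sketch below computes it by direct pairwise comparison, which is a general illustration of that reading and not the authors' GIS workflow. The scores are hypothetical.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve by pairwise comparison (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical susceptibility scores for landslide and non-landslide cells.
landslide_cells = [0.9, 0.8, 0.4]
stable_cells = [0.7, 0.3, 0.2]
print(round(auc(landslide_cells, stable_cells), 3))
```

The pairwise form is quadratic in the number of cells, so for a full raster a rank-based computation would be the practical choice.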
Ye, Dong-qing; Hu, Yi-song; Li, Xiang-pei; Huang, Fen; Yang, Shi-gui; Hao, Jia-hu; Yin, Jing; Zhang, Guo-qing; Liu, Hui-hui
2004-11-01
To explore the impact of environmental factors, daily lifestyle, psycho-social factors and the interactions between environmental factors and chemokine genes on systemic lupus erythematosus (SLE). A case-control study was carried out, and environmental factors for SLE were analyzed by univariate and multivariate unconditional logistic regression. Interactions between environmental factors and chemokine polymorphisms contributing to systemic lupus erythematosus were also analyzed by logistic regression. Nineteen factors were associated with SLE in univariate unconditional logistic regression. However, in multivariate unconditional logistic regression, only five factors remained associated with the disease: drinking well water (OR=0.099) was a protective factor for SLE, and multiple drug allergy (OR=8.174), over-exposure to sunshine (OR=18.339), taking antibiotics (OR=9.630) and oral contraceptives were risk factors for SLE. The unconditional logistic regression model showed an interaction between eating irritable food and the -2518MCP-1G/G genotype (OR=4.387). No interaction among environmental factors contributing to SLE was found in this study. Many environmental factors were related to SLE, and there was an interaction between the -2518MCP-1G/G genotype and eating irritable food.
Mielniczuk, Jan; Teisseyre, Paweł
2018-03-01
Detection of gene-gene interactions is one of the most important challenges in genome-wide case-control studies. Besides traditional logistic regression analysis, entropy-based methods have recently attracted significant attention. Among entropy-based methods, interaction information is one of the most promising measures, having many desirable properties. Although both logistic regression and interaction information have been used in several genome-wide association studies, the relationship between them has not been thoroughly investigated theoretically. The present paper attempts to fill this gap. We show that although certain connections between the two methods exist, in general they refer to two different concepts of dependence, and looking for interactions in those two senses leads to different approaches to interaction detection. We introduce an ordering between interaction measures and specify conditions for independent and dependent genes under which interaction information is a more discriminative measure than logistic regression. Moreover, we show that for so-called perfect distributions those measures are equivalent. The numerical experiments illustrate the theoretical findings, indicating that interaction information and its modified version are more universal tools for detecting various types of interaction than logistic regression and linkage disequilibrium measures. © 2017 WILEY PERIODICALS, INC.
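For orientation, interaction information for two genes and a binary outcome can be written in terms of mutual information. The form below follows one common sign convention. The notation (X_1, X_2 for genotypes, Y for case-control status, I for mutual information) is assumed here and may differ from the paper's.

```latex
II(X_1; X_2; Y) \;=\; I\bigl((X_1, X_2); Y\bigr) \;-\; I(X_1; Y) \;-\; I(X_2; Y)
```

A positive value indicates that the pair (X_1, X_2) carries more information about Y than the two genes do separately, which is the sense of "interaction" that the abstract contrasts with an interaction term in logistic regression.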
ERIC Educational Resources Information Center
Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung
2014-01-01
The simultaneous item bias test (SIBTEST) method regression procedure and the differential item functioning (DIF)-free-then-DIF strategy are applied to the logistic regression (LR) method simultaneously in this study. These procedures are used to adjust the effects of matching true score on observed score and to better control the Type I error…
Stochastic modeling of sunshine number data
NASA Astrophysics Data System (ADS)
Brabec, Marek; Paulescu, Marius; Badescu, Viorel
2013-11-01
In this paper, we will present a unified statistical modeling framework for estimation and forecasting of sunshine number (SSN) data. Sunshine number has been proposed earlier to describe sunshine time series in qualitative terms (Theor Appl Climatol 72 (2002) 127-136) and since then, it was shown to be useful not only for theoretical purposes but also for practical considerations, e.g. those related to the development of photovoltaic energy production. Statistical modeling and prediction of SSN as a binary time series has been a challenging problem, however. Our statistical model for SSN time series is based on an underlying stochastic process formulation of Markov chain type. We will show how its transition probabilities can be efficiently estimated within a logistic regression framework. In fact, our logistic Markovian model can be relatively easily fitted via a maximum likelihood approach. This is optimal in many respects and it also enables us to use formalized statistical inference theory to obtain not only the point estimates of transition probabilities and their functions of interest, but also related uncertainties, as well as to test various hypotheses of practical interest, etc. It is straightforward to deal with non-homogeneous transition probabilities in this framework. Very importantly from both physical and practical points of view, the logistic Markov model class allows us to test hypotheses about how SSN depends on various external covariates (e.g. elevation angle, solar time, etc.) and about details of the dynamic model (order and functional shape of the Markov kernel, etc.). Therefore, using a generalized additive model (GAM) approach, we can fit and compare models of various complexity which insist on keeping physical interpretation of the statistical model and its parts. 
After introducing the Markovian model and general approach for identification of its parameters, we will illustrate its use and performance on high resolution SSN data from the Solar Radiation Monitoring Station of the West University of Timisoara.
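The link between a Markov chain and logistic regression can be illustrated in the simplest case. The sketch below assumes a homogeneous first-order chain with no covariates, which is far simpler than the models described in the abstract. With only an intercept and a lag-1 term the logistic model is saturated, so its maximum likelihood fit reproduces the empirical transition frequencies and the coefficients follow directly from their logits. The series and function names are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def fit_first_order_chain(series):
    """Return (p01, p11, beta0, beta1) for a binary series.

    p01 = P(Y_t = 1 | Y_{t-1} = 0), p11 = P(Y_t = 1 | Y_{t-1} = 1).
    beta0 and beta1 are the coefficients of the equivalent logistic model
    logit P(Y_t = 1) = beta0 + beta1 * Y_{t-1}.
    """
    counts = {0: [0, 0], 1: [0, 0]}  # counts[previous][current]
    for prev, cur in zip(series, series[1:]):
        counts[prev][cur] += 1
    p01 = counts[0][1] / sum(counts[0])
    p11 = counts[1][1] / sum(counts[1])
    beta0 = logit(p01)
    beta1 = logit(p11) - logit(p01)
    return p01, p11, beta0, beta1

# Hypothetical sunshine number series (1 = sun not covered, 0 = covered).
ssn = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1]
p01, p11, beta0, beta1 = fit_first_order_chain(ssn)
print(p01, p11, round(beta0, 3), round(beta1, 3))
```

Adding covariates such as elevation angle breaks the saturation, at which point the coefficients would have to be estimated by iterative maximum likelihood.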
Stochastic modeling of sunshine number data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brabec, Marek, E-mail: mbrabec@cs.cas.cz; Paulescu, Marius; Badescu, Viorel
2013-11-13
In this paper, we will present a unified statistical modeling framework for estimation and forecasting of sunshine number (SSN) data. Sunshine number has been proposed earlier to describe sunshine time series in qualitative terms (Theor Appl Climatol 72 (2002) 127-136) and since then, it was shown to be useful not only for theoretical purposes but also for practical considerations, e.g. those related to the development of photovoltaic energy production. Statistical modeling and prediction of SSN as a binary time series has been a challenging problem, however. Our statistical model for SSN time series is based on an underlying stochastic process formulation of Markov chain type. We will show how its transition probabilities can be efficiently estimated within a logistic regression framework. In fact, our logistic Markovian model can be relatively easily fitted via a maximum likelihood approach. This is optimal in many respects and it also enables us to use formalized statistical inference theory to obtain not only the point estimates of transition probabilities and their functions of interest, but also related uncertainties, as well as to test various hypotheses of practical interest, etc. It is straightforward to deal with non-homogeneous transition probabilities in this framework. Very importantly from both physical and practical points of view, the logistic Markov model class allows us to test hypotheses about how SSN depends on various external covariates (e.g. elevation angle, solar time, etc.) and about details of the dynamic model (order and functional shape of the Markov kernel, etc.). Therefore, using a generalized additive model (GAM) approach, we can fit and compare models of various complexity which insist on keeping physical interpretation of the statistical model and its parts. 
After introducing the Markovian model and general approach for identification of its parameters, we will illustrate its use and performance on high resolution SSN data from the Solar Radiation Monitoring Station of the West University of Timisoara.
Hermes, Ilarraza-Lomelí; Marianna, García-Saldivia; Jessica, Rojano-Castillo; Carlos, Barrera-Ramírez; Rafael, Chávez-Domínguez; María Dolores, Rius-Suárez; Pedro, Iturralde
2016-10-01
Mortality due to cardiovascular disease is often associated with ventricular arrhythmias. Nowadays, patients with cardiovascular disease are increasingly encouraged to take part in physical training programs. Nevertheless, high-intensity exercise is associated with a higher risk of sudden death, even in apparently healthy people. During exercise testing (ET), health care professionals provide patients, in a controlled scenario, with an intense physiological stimulus that could precipitate cardiac arrhythmia in high-risk individuals. There is still no clinical or statistical tool to predict this incidence. The aim of this study was to develop a statistical model to predict the incidence of exercise-induced potentially life-threatening ventricular arrhythmia (PLVA) during high-intensity exercise. 6415 patients underwent a symptom-limited ET with a Balke ramp protocol. A multivariate logistic regression model with PLVA as the primary outcome was fitted. Incidence of PLVA was 548 cases (8.5%). In the bivariate analysis, thirty-one clinical or ergometric variables were statistically associated with PLVA and were included in the regression model. In the multivariate model, 13 of these variables were found to be statistically significant. A regression model (G) with a χ² of 283.987 and p<0.001 was constructed. Significant variables included: heart failure, antiarrhythmic drugs, myocardial lower-VD, age and use of digoxin, nitrates, among others. This study allows clinicians to identify patients at risk of ventricular tachycardia or couplets during exercise, and to take preventive measures or provide appropriate supervision. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Wang, Yunzhi; Qiu, Yuchen; Thai, Theresa; More, Kathleen; Ding, Kai; Liu, Hong; Zheng, Bin
2016-03-01
How to rationally identify epithelial ovarian cancer (EOC) patients who will benefit from bevacizumab or other antiangiogenic therapies is a critical issue in EOC treatment. The motivation of this study is to quantitatively measure adiposity features from CT images and to investigate the feasibility of predicting the potential benefit for EOC patients with or without bevacizumab-based chemotherapy using multivariate statistical models built on quantitative adiposity image features. A dataset of CT images from 59 advanced EOC patients was included. Among them, 32 patients received maintenance bevacizumab after primary chemotherapy and the remaining 27 patients did not. We developed a computer-aided detection (CAD) scheme to automatically segment subcutaneous fat areas (SFA) and visceral fat areas (VFA) and then extracted 7 adiposity-related quantitative features. Three multivariate data analysis models (linear regression, logistic regression and Cox proportional hazards regression) were applied to investigate the potential association between the model-generated prediction results and the patients' progression-free survival (PFS) and overall survival (OS). The results show that, for all 3 statistical models, a statistically significant association was detected between the model-generated results and both clinical outcomes in the group of patients receiving maintenance bevacizumab (p<0.01), while there was no significant association for either PFS or OS in the group of patients not receiving maintenance bevacizumab. Therefore, this study demonstrated the feasibility of using statistical prediction models based on quantitative adiposity-related CT image features to generate a new clinical marker and predict the clinical outcome of EOC patients receiving maintenance bevacizumab-based chemotherapy.
Access disparities to Magnet hospitals for patients undergoing neurosurgical operations
Missios, Symeon; Bekelis, Kimon
2017-01-01
Background: Centers of excellence focusing on quality improvement have demonstrated superior outcomes for a variety of surgical interventions. We investigated the presence of access disparities to hospitals recognized by the Magnet Recognition Program of the American Nurses Credentialing Center (ANCC) for patients undergoing neurosurgical operations. Methods: We performed a cohort study of all neurosurgery patients who were registered in the New York Statewide Planning and Research Cooperative System (SPARCS) database from 2009–2013. We examined the association of African-American race and lack of insurance with Magnet status hospitalization for neurosurgical procedures. A mixed effects propensity adjusted multivariable regression analysis was used to control for confounding. Results: During the study period, 190,535 neurosurgical patients met the inclusion criteria. Using a multivariable logistic regression, we demonstrate that African-Americans had lower admission rates to Magnet institutions (OR 0.62; 95% CI, 0.58–0.67). This persisted in a mixed effects logistic regression model (OR 0.77; 95% CI, 0.70–0.83) to adjust for clustering at the patient county level, and a propensity score adjusted logistic regression model (OR 0.75; 95% CI, 0.69–0.82). Additionally, lack of insurance was associated with lower admission rates to Magnet institutions (OR 0.71; 95% CI, 0.68–0.73), in a multivariable logistic regression model. This persisted in a mixed effects logistic regression model (OR 0.72; 95% CI, 0.69–0.74), and a propensity score adjusted logistic regression model (OR 0.72; 95% CI, 0.69–0.75). Conclusions: Using a comprehensive all-payer cohort of neurosurgery patients in New York State we identified an association of African-American race and lack of insurance with lower rates of admission to Magnet hospitals. PMID:28684152
Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models.
Schmidt, Amand F; Klungel, Olaf H; Groenwold, Rolf H H
2016-01-01
Postlaunch data on medical treatments can be analyzed to explore adverse events or relative effectiveness in real-life settings. These analyses are often complicated by the number of potential confounders and the possibility of model misspecification. We conducted a simulation study to compare the performance of logistic regression, propensity score, disease risk score, and stabilized inverse probability weighting methods to adjust for confounding. Model misspecification was induced in the independent derivation dataset. We evaluated performance using relative bias and confidence interval coverage of the true effect, among other metrics. At low events per coefficient (1.0 and 0.5), the logistic regression estimates had a large relative bias (greater than -100%). Bias of the disease risk score estimates was at most 13.48% and 18.83%. For the propensity score model, this was 8.74% and >100%, respectively. At events per coefficient of 1.0 and 0.5, inverse probability weighting frequently failed or reduced to a crude regression, resulting in biases of -8.49% and 24.55%. Coverage of logistic regression estimates became less than the nominal level at events per coefficient ≤5. For the disease risk score, inverse probability weighting, and propensity score, coverage became less than nominal at events per coefficient ≤2.5, ≤1.0, and ≤1.0, respectively. Bias of misspecified disease risk score models was 16.55%. In settings with low events/exposed subjects per coefficient, disease risk score methods can be useful alternatives to logistic regression models, especially when propensity score models cannot be used. Despite better performance of disease risk score methods than logistic regression and propensity score models in small events per coefficient settings, bias and coverage still deviated from nominal.
Espelt, Albert; Marí-Dell'Olmo, Marc; Penelo, Eva; Bosque-Prous, Marina
2016-06-14
To examine the differences between Prevalence Ratio (PR) and Odds Ratio (OR) in a cross-sectional study and to provide tools to calculate PR using two statistical packages widely used in substance use research (STATA and R). We used cross-sectional data from 41,263 participants of 16 European countries participating in the Survey on Health, Ageing and Retirement in Europe (SHARE). The dependent variable, hazardous drinking, was calculated using the Alcohol Use Disorders Identification Test - Consumption (AUDIT-C). The main independent variable was gender. Other variables used were: age, educational level and country of residence. PR of hazardous drinking in men relative to women was estimated using the Mantel-Haenszel method, log-binomial regression models and Poisson regression models with robust variance. These estimates were compared to the OR calculated using logistic regression models. Prevalence of hazardous drinkers varied among countries. Generally, men had a higher prevalence of hazardous drinking than women [PR=1.43 (1.38-1.47)]. Estimated PR was identical regardless of the method and the statistical package used. However, OR overestimated PR, to a degree that depended on the prevalence of hazardous drinking in the country. In cross-sectional studies, where comparisons between countries with differences in the prevalence of the disease or condition are made, it is advisable to use PR instead of OR.
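The gap between the two measures is easy to see on a 2x2 table. The sketch below uses hypothetical counts, not the SHARE data, and computes crude, unadjusted ratios instead of the regression-based estimates described in the abstract. It shows that the OR is close to the PR when the outcome is rare and drifts above it as prevalence rises.

```python
def prevalence_ratio(a, b, c, d):
    """PR from a 2x2 table.

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    """
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """OR from the same 2x2 table (cross-product ratio)."""
    return (a * d) / (b * c)

# Hypothetical counts: a rare outcome and a common outcome.
rare = (10, 990, 5, 995)
common = (300, 700, 200, 800)

for table in (rare, common):
    print(round(prevalence_ratio(*table), 3), round(odds_ratio(*table), 3))
```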
Pfeiffer, R M; Riedl, R
2015-08-15
We assess the asymptotic bias of estimates of exposure effects conditional on covariates when summary scores of confounders, instead of the confounders themselves, are used to analyze observational data. First, we study regression models for cohort data that are adjusted for summary scores. Second, we derive the asymptotic bias for case-control studies when cases and controls are matched on a summary score, and then analyzed either using conditional logistic regression or by unconditional logistic regression adjusted for the summary score. Two scores, the propensity score (PS) and the disease risk score (DRS), are studied in detail. For cohort analysis, when regression models are adjusted for the PS, the estimated conditional treatment effect is unbiased only for linear models, or at the null for non-linear models. Adjustment of cohort data for DRS yields unbiased estimates only for linear regression; all other estimates of exposure effects are biased. Matching cases and controls on DRS and analyzing them using conditional logistic regression yields unbiased estimates of exposure effect, whereas adjusting for the DRS in unconditional logistic regression yields biased estimates, even under the null hypothesis of no association. Matching cases and controls on the PS yields unbiased estimates only under the null for both conditional and unconditional logistic regression, adjusted for the PS. We study the bias for various confounding scenarios and compare our asymptotic results with those from simulations with limited sample sizes. To create realistic correlations among multiple confounders, we also based simulations on a real dataset. Copyright © 2015 John Wiley & Sons, Ltd.
Afifi, Tracie O; Cox, Brian J; Martens, Patricia J; Sareen, Jitender; Enns, Murray W
2010-01-01
Gambling has become an increasingly common activity among women since the widespread growth of the gambling industry. Currently, our knowledge of the relationship between problem gambling among women and mental and physical correlates is limited. Therefore, important relationships between problem gambling and health and functioning, mental disorders, physical health conditions, and help-seeking behaviours among women were examined using a nationally representative Canadian sample. Data were from the nationally representative Canadian Community Health Survey Cycle 1.2 (CCHS 1.2; n = 10,056 women aged 15 years and older; data collected in 2002). The statistical analysis included binary logistic regression, multinomial logistic regression, and linear regression models. Past 12-month problem gambling was associated with a significantly higher probability of current lower general health, suicidal ideation and attempts, decreased psychological well-being, increased distress, depression, mania, panic attacks, social phobia, agoraphobia, alcohol dependence, any mental disorder, comorbidity of mental disorders, chronic bronchitis, fibromyalgia, migraine headaches, help-seeking from a professional, attending a self-help group, and calling a telephone help line (odds ratios ranged from 1.5 to 8.2). Problem gambling was associated with a broad range of negative health correlates among women. Problem gambling is an important public health concern. These findings can be used to inform healthy public policies on gambling.
Modifiable Lifestyle Behaviors Are Associated With Metabolic Syndrome in a Taiwanese Population.
Lin, Kuei-Man; Chiou, Jeng-Yuan; Ko, Shu-Hua; Tan, Jung-Ying; Huang, Chien-Ning; Liao, Wen-Chun
2015-11-01
To explore associations between metabolic syndrome and modifiable lifestyle behaviors among the adult population in Taiwan. This cross-sectional study analyzed data from a nationally representative sample that participated in the 2005-2008 Nutrition and Health Survey in Taiwan. The sample (2,337 participants older than 19 years) provided data on demographic characteristics, modifiable lifestyle behaviors, anthropometric measurements, and blood chemistry panel. These data were analyzed by descriptive statistics, univariate logistic regression, and multivariate logistic regression to determine factors associated with metabolic syndrome. Metabolic syndrome had a prevalence of 25.2%, and this prevalence increased with age. In univariate regression analysis, metabolic syndrome was associated with age, living with family members, educational level, and modifiable lifestyle behaviors (smoking, drinking, betel quid chewing, and physical activity). Individuals with a smoking history and currently chewing betel quid had the highest risk for metabolic syndrome. The risk for metabolic syndrome might be reduced by public health campaigns to encourage people to quit smoking cigarettes and chewing betel quid. Implementing more modifiable lifestyle behaviors in daily life will decrease metabolic syndrome in Taiwan. Considering that betel quid chewing and tobacco smoking interact to adversely affect metabolic syndrome risk, public health campaigns against both behaviors seem to be a cost-effective and efficient health promotion strategy to reduce the prevalence rate of metabolic syndrome. © 2015 Sigma Theta Tau International.
Veazey, Lindsay M; Franklin, Erik C; Kelley, Christopher; Rooney, John; Frazer, L Neil; Toonen, Robert J
2016-01-01
Predictive habitat suitability models are powerful tools for cost-effective, statistically robust assessment of the environmental drivers of species distributions. The aim of this study was to develop predictive habitat suitability models for two genera of scleractinian corals (Leptoseris and Montipora) found within the mesophotic zone across the main Hawaiian Islands. The mesophotic zone (30-180 m) is challenging to reach, and therefore historically understudied, because it falls between the maximum limit of SCUBA divers and the minimum typical working depth of submersible vehicles. Here, we implement a logistic regression with rare events corrections to account for the scarcity of presence observations within the dataset. These corrections reduced the coefficient error and improved overall prediction success (73.6% and 74.3%) for both original regression models. The final models included depth, rugosity, slope, mean current velocity, and wave height as the best environmental covariates for predicting the occurrence of the two genera in the mesophotic zone. Using an objectively selected theta ("presence") threshold, the predicted presence probability values (average of 0.051 for Leptoseris and 0.040 for Montipora) were translated to spatially-explicit habitat suitability maps of the main Hawaiian Islands at 25 m grid cell resolution. Our maps are the first of their kind to use extant presence and absence data to examine the habitat preferences of these two dominant mesophotic coral genera across Hawai'i.
Hay, Peter D; Smith, Julie; O'Connor, Richard A
2016-02-01
The aim of this study was to evaluate the benefits to SPECT bone scan image quality when applying resolution recovery (RR) during image reconstruction using software provided by a third-party supplier. Bone SPECT data from 90 clinical studies were reconstructed retrospectively using software supplied independent of the gamma camera manufacturer. The current clinical datasets contain 120×10 s projections and are reconstructed using an iterative method with a Butterworth postfilter. Five further reconstructions were created with the following characteristics: 10 s projections with a Butterworth postfilter (to assess intraobserver variation); 10 s projections with a Gaussian postfilter with and without RR; and 5 s projections with a Gaussian postfilter with and without RR. Two expert observers were asked to rate image quality on a five-point scale relative to our current clinical reconstruction. Datasets were anonymized and presented in random order. The benefits of RR on image scores were evaluated using ordinal logistic regression (visual grading regression). The application of RR during reconstruction increased the probability of both observers of scoring image quality as better than the current clinical reconstruction even where the dataset contained half the normal counts. Type of reconstruction and observer were both statistically significant variables in the ordinal logistic regression model. Visual grading regression was found to be a useful method for validating the local introduction of technological developments in nuclear medicine imaging. RR, as implemented by the independent software supplier, improved bone SPECT image quality when applied during image reconstruction. In the majority of clinical cases, acquisition times for bone SPECT intended for the purposes of localization can safely be halved (from 10 s projections to 5 s) when RR is applied.
No rationale for 1 variable per 10 events criterion for binary logistic regression analysis.
van Smeden, Maarten; de Groot, Joris A H; Moons, Karel G M; Collins, Gary S; Altman, Douglas G; Eijkemans, Marinus J C; Reitsma, Johannes B
2016-11-24
Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth's correction, are compared. The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect ('separation'). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
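EPV and separation can both be illustrated on a single 2x2 table. The sketch below is not the simulation design of the paper; the table counts and function names are hypothetical. It relies on one special case: with a single binary covariate, the Firth-penalized estimate equals the estimate obtained after adding 0.5 to every cell. That shortcut is a property of this two-group model and does not carry over to models with more covariates.

```python
import math


def events_per_variable(n_events, n_nonevents, n_parameters):
    """EPV: size of the smaller outcome group per candidate parameter."""
    return min(n_events, n_nonevents) / n_parameters


def log_odds_ratio(a, b, c, d, add=0.0):
    """Log odds ratio of a 2x2 table.

    a, b: events and non-events among the exposed
    c, d: events and non-events among the unexposed
    add:  constant added to every cell (0.5 reproduces Firth's estimate
          for this single-binary-covariate case only)
    """
    a, b, c, d = (x + add for x in (a, b, c, d))
    numerator, denominator = a * d, b * c
    if numerator == 0 and denominator == 0:
        return math.nan            # not identifiable from the table
    if denominator == 0:
        return math.inf            # separation: ML estimate diverges upward
    if numerator == 0:
        return -math.inf           # separation: ML estimate diverges downward
    return math.log(numerator / denominator)


# Hypothetical table with separation: every exposed subject has the event.
table = dict(a=6, b=0, c=2, d=12)
print(events_per_variable(n_events=8, n_nonevents=12, n_parameters=1))  # 8.0
print(log_odds_ratio(**table))             # inf (maximum likelihood)
print(log_odds_ratio(**table, add=0.5))    # finite penalized estimate
```

The finite value in the last line is log(65): (6.5 x 12.5) / (0.5 x 2.5) = 65.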
MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION
Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...
The impact of young drivers' lifestyle on their road traffic accident risk in greater Athens area.
Chliaoutakis, J E; Darviri, C; Demakakos, P T
1999-11-01
Young drivers (18-24) both in Greece and elsewhere appear to have high rates of road traffic accidents. Many factors contribute to these high road traffic accident rates. It has been suggested that lifestyle is an important one. The main objective of this study is to clarify the (potential) relationship between young drivers' lifestyle and the road traffic accident risk they face, and to examine whether all young drivers have the same elevated risk on the road. The sample consisted of 241 young Greek drivers of both sexes. The statistical analysis included factor analysis and logistic regression analysis. Through the principal component analysis a ten factor scale was created which included the basic lifestyle traits of young Greek drivers. The logistic regression analysis showed that young drivers whose dominant lifestyle trait is alcohol consumption or driving without destination have high accident risk, while those whose dominant lifestyle trait is culture face low accident risk. Furthermore, young drivers who are religious in one way or another seem to have low accident risk. Finally, some preliminary observations on how health promotion should be put into practice are discussed.
Mak, Kwok-Kei; Kim, Dae-Hwan; Leigh, J Paul
2015-01-01
Few population-based studies have used an econometric approach to understand the association between two cancer risk factors, obesity and stress. This study investigated sociodemographic differences in the association between obesity and stress among Korean adults (6,546 men and 8,473 women). Data were drawn from the Korean National Health and Nutrition Examination Survey for 2008, 2009, and 2010. Ordered logistic regression models and propensity score matching methods were used to examine the associations between obesity and stress, stratified by gender and age groups. In women, the stress level of the obese group was found to be 27.6% higher than the nonobese group in the ordered logistic regression; the obesity effect on stress was statistically significant in the propensity score-matched analysis. Corresponding evidence for the effect of obesity on stress was lacking among men. Participants who were young, well-educated, and working were more likely to report stress. In Korea, obesity causes stress in women but not in men. Young women are susceptible to a disproportionate level of stress. More cancer prevention programs targeting young and obese women are encouraged in developed Asian countries.
Díaz Villegas, Gregory Mishell; Runzer Colmenares, Fernando
2015-01-01
To evaluate the association between calf circumference and gait speed in elderly patients 65 years or older at the geriatric day clinic of the Peruvian Centro Médico Naval. Cross-sectional, retrospective study. We assessed 139 participants aged 65 years or older at the Peruvian Centro Médico Naval, recording calf circumference, gait speed and Short Physical Performance Battery. We used bivariate analyses and a logistic regression model to look for associations between variables. The mean age was 79.37 years (SD: 8.71); 59.71% were male, 30.97% had a slow walking speed, and the mean calf circumference was 33.42 cm (SD: 5.61). In the bivariate analysis, mean calf circumference was 30.35 cm (SD: 3.74) in the slow gait speed group and 33.51 cm (SD: 3.26) in the normal gait speed group, a statistically significant difference. Logistic regression for slow gait speed, adjusted for disability and age, also gave statistically significant results. Low calf circumference is associated with slow gait speed in people over 65 years old. Copyright © 2014. Published by Elsevier Espana.
Factors associated with preventable infant death: a multiple logistic regression.
Vidal E Silva, Sandra Maria Cunha; Tuon, Rogério Antonio; Probst, Livia Fernandes; Gondinho, Brunna Verna Castro; Pereira, Antonio Carlos; Meneghim, Marcelo de Castro; Cortellazzi, Karine Laura; Ambrosano, Glaucia Maria Bovi
2018-01-01
OBJECTIVE To identify and analyze factors associated with preventable child deaths. METHODS This analytical cross-sectional study had preventable child mortality as dependent variable. From a population of 34,284 live births, we have selected a systematic sample of 4,402 children who did not die compared to 272 children who died from preventable causes during the period studied. The independent variables were analyzed in four hierarchical blocks: sociodemographic factors, the characteristics of the mother, prenatal and delivery care, and health conditions of the patient and neonatal care. We performed a descriptive statistical analysis and estimated multiple hierarchical logistic regression models. RESULTS Approximately 35.3% of the deaths could have been prevented with the early diagnosis and treatment of diseases during pregnancy and 26.8% of them could have been prevented with better care conditions for pregnant women. CONCLUSIONS The following characteristics of the mother are determinant for the higher mortality of children before the first year of life: living in neighborhoods with an average family income lower than four minimum wages, being aged ≤ 19 years, having one or more alive children, having a child with low APGAR level at the fifth minute of life, and having a child with low birth weight.
Sawamoto, Ryoko; Nozaki, Takehiro; Furukawa, Tomokazu; Tanahashi, Tokusei; Morita, Chihiro; Hata, Tomokazu; Komaki, Gen; Sudo, Nobuyuki
2016-01-01
To investigate predictors of dropout from a group cognitive behavioral therapy (CBT) intervention for overweight or obese women. 119 overweight and obese Japanese women aged 25-65 years who attended an outpatient weight loss intervention were followed throughout the 7-month weight loss phase. Somatic characteristics, socioeconomic status, obesity-related diseases, diet and exercise habits, and psychological variables (depression, anxiety, self-esteem, alexithymia, parenting style, perfectionism, and eating attitude) were assessed at baseline. Significant variables, extracted by univariate statistical analysis, were then used as independent variables in a stepwise multiple logistic regression analysis with dropout as the dependent variable. 90 participants completed the weight loss phase, giving a dropout rate of 24.4%. The multiple logistic regression analysis demonstrated that compared to completers the dropouts had significantly stronger body shape concern, tended to not have jobs, perceived their mothers to be less caring, and were more disorganized in temperament. Of all these factors, the best predictor of dropout was shape concern. Shape concern, job condition, parenting care, and organization predicted dropout from the group CBT weight loss intervention for overweight or obese Japanese women. © 2016 S. Karger GmbH, Freiburg.
Spatial analysis of alcohol-related motor vehicle crash injuries in southeastern Michigan.
Meliker, Jaymie R; Maio, Ronald F; Zimmerman, Marc A; Kim, Hyungjin Myra; Smith, Sarah C; Wilson, Mark L
2004-11-01
Temporal, behavioral and social risk factors that affect injuries resulting from alcohol-related motor vehicle crashes have been characterized in previous research. Much less is known about spatial patterns and environmental associations of alcohol-related motor vehicle crashes. The aim of this study was to evaluate geographic patterns of alcohol-related motor vehicle crashes and to determine if locations of alcohol outlets are associated with those crashes. In addition, we sought to demonstrate the value of integrating spatial and traditional statistical techniques in the analysis of this preventable public health risk. The study design was a cross-sectional analysis of individual-level blood alcohol content, traffic report information, census block group data, and alcohol distribution outlets. Besag and Newell's spatial analysis and traditional logistic regression both indicated that areas of low population density had more alcohol-related motor vehicle crashes than expected (P < 0.05). There was no significant association between alcohol outlets and alcohol-related motor vehicle crashes using distance analyses, logistic regression, and Chi-square. Differences in environmental or behavioral factors characteristic of areas of low population density may be responsible for the higher proportion of alcohol-related crashes occurring in these areas.
Sawamoto, Ryoko; Nozaki, Takehiro; Furukawa, Tomokazu; Tanahashi, Tokusei; Morita, Chihiro; Hata, Tomokazu; Komaki, Gen; Sudo, Nobuyuki
2016-01-01
Objective To investigate predictors of dropout from a group cognitive behavioral therapy (CBT) intervention for overweight or obese women. Methods 119 overweight and obese Japanese women aged 25-65 years who attended an outpatient weight loss intervention were followed throughout the 7-month weight loss phase. Somatic characteristics, socioeconomic status, obesity-related diseases, diet and exercise habits, and psychological variables (depression, anxiety, self-esteem, alexithymia, parenting style, perfectionism, and eating attitude) were assessed at baseline. Significant variables, extracted by univariate statistical analysis, were then used as independent variables in a stepwise multiple logistic regression analysis with dropout as the dependent variable. Results 90 participants completed the weight loss phase, giving a dropout rate of 24.4%. The multiple logistic regression analysis demonstrated that compared to completers the dropouts had significantly stronger body shape concern, tended to not have jobs, perceived their mothers to be less caring, and were more disorganized in temperament. Of all these factors, the best predictor of dropout was shape concern. Conclusion Shape concern, job condition, parenting care, and organization predicted dropout from the group CBT weight loss intervention for overweight or obese Japanese women. PMID:26745715
Winkler, Petr; Horáček, Jiří; Weissová, Aneta; Šustr, Martin; Brunovský, Martin
2015-01-01
Comorbidities associated with depression have been researched in a number of contexts. However, the epidemiological situation in clinical practice is understudied, especially in the post-Communist Central and Eastern Europe region. The aim of this study was to assess physical comorbidities in depression, and to identify whether there are increased odds of physical comorbidities associated with co-occurring depressive and anxiety disorders. Data on 4264 patients aged 18–98 were collected among medical doctors in the Czech Republic between 2010 and 2011. Descriptive statistics were calculated and multiple logistic regressions were performed to assess comorbidities among patients with depressive disorder. Among patients treated for depression in Czech primary care, 51.29% had a physical comorbidity and 45.5% had a comorbid anxiety disorder. Results of logistic regressions show that odds of having pain, hypertension or diabetes mellitus are particularly elevated among those who have co-occurring depressive and anxiety disorder. Our findings demonstrate that comorbidities associated with depressive disorders are highly prevalent in primary health care practice, and that physical comorbidities are particularly frequent among those with co-occurring depressive and anxiety disorders. PMID:26690458
Factors Influencing Cecal Intubation Time during Retrograde Approach Single-Balloon Enteroscopy
Chen, Peng-Jen; Shih, Yu-Lueng; Huang, Hsin-Hung; Hsieh, Tsai-Yuan
2014-01-01
Background and Aim. The predisposing factors for prolonged cecal intubation time (CIT) during colonoscopy have been well identified. However, the factors influencing CIT during retrograde SBE have not been addressed. The aim of this study was to determine the factors influencing CIT during retrograde SBE. Methods. We investigated patients who underwent retrograde SBE at a medical center from January 2011 to March 2014. The medical charts and SBE reports were reviewed. The patients' characteristics and procedure-associated data were recorded. These data were analyzed with univariate analysis as well as multivariate logistic regression analysis to identify the possible predisposing factors. Results. We enrolled 66 patients into this study. The median CIT was 17.4 minutes. With univariate analysis, there was no statistical difference in age, sex, BMI, or history of abdominal surgery, except for bowel preparation (P = 0.021). Multivariate logistic regression analysis showed that inadequate bowel preparation (odds ratio 30.2, 95% confidence interval 4.63–196.54; P < 0.001) was the independent predisposing factors for prolonged CIT during retrograde SBE. Conclusions. For experienced endoscopist, inadequate bowel preparation was the independent predisposing factor for prolonged CIT during retrograde SBE. PMID:25505904
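The width of the interval reported above (odds ratio 30.2, 95% CI 4.63–196.54) reflects a large standard error on the log-odds scale, which is expected with 66 patients. The sketch below recovers that standard error from the published interval. It assumes a Wald interval that is symmetric on the log scale; the abstract does not state how the interval was computed, and the function name is illustrative.

```python
import math


def wald_summary(lower, upper, z=1.959964):
    """Recover the point estimate and log-scale SE from a Wald CI for an OR.

    Assumes the interval is exp(beta +/- z * se), so it is symmetric
    around beta on the log scale.
    """
    log_lower, log_upper = math.log(lower), math.log(upper)
    beta = (log_lower + log_upper) / 2.0        # midpoint on the log scale
    se = (log_upper - log_lower) / (2.0 * z)    # half-width divided by z
    return math.exp(beta), se


or_hat, se = wald_summary(4.63, 196.54)
print(f"implied OR = {or_hat:.1f}, SE of log OR = {se:.2f}")
```

The implied odds ratio agrees with the reported 30.2 to within rounding, which is consistent with the Wald assumption, and the standard error of roughly 0.96 on the log scale explains why the upper limit is more than 40 times the lower limit.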
Relation between serum creatinine and postoperative results of open-heart surgery.
Ezeldin, Tamer H
2013-10-01
To determine the impact of preoperative serum creatinine level in non-dialyzable patients on postoperative morbidity and mortality. This is a prospective study, where serum creatinine was used to give primary assessment on renal function status preoperatively. This study includes 1,033 patients who underwent coronary artery bypass grafting or valve(s) operations. The study took place at Al-Hada Military Hospital, Taif, Kingdom of Saudi Arabia, between May 2008 and January 2012. Data were statistically analyzed using Chi square (x2) test and multivariable logistic regression, to evaluate the postoperative morbidity and mortality risks associated with low serum creatinine levels. Postoperative mortality increased with high serum creatinine level >1.8 mg/dL (p=0.0005). Multivariable logistic regression, adjusting for potentially confounding variables, demonstrated that a creatinine level of more than 1.8 mg/dL was associated with increased risk of re-operation for bleeding, postoperative renal failure, prolonged ventilatory support, ICU stay, and total hospital stay. Perioperative serum creatinine is strongly related to postoperative morbidity and mortality in open heart surgery. High serum creatinine in non-dialyzable patients can predict the increased morbidity and mortality after cardiac operations.
Biomarker combinations for diagnosis and prognosis in multicenter studies: Principles and methods.
Meisner, Allison; Parikh, Chirag R; Kerr, Kathleen F
2017-01-01
Many investigators are interested in combining biomarkers to predict a binary outcome or detect underlying disease. This endeavor is complicated by the fact that many biomarker studies involve data from multiple centers. Depending upon the relationship between center, the biomarkers, and the target of prediction, care must be taken when constructing and evaluating combinations of biomarkers. We introduce a taxonomy to describe the role of center and consider how a biomarker combination should be constructed and evaluated. We show that ignoring center, which is frequently done by clinical researchers, is often not appropriate. The limited statistical literature proposes using random intercept logistic regression models, an approach that we demonstrate is generally inadequate and may be misleading. We instead propose using fixed intercept logistic regression, which appropriately accounts for center without relying on untenable assumptions. After constructing the biomarker combination, we recommend using performance measures that account for the multicenter nature of the data, namely the center-adjusted area under the receiver operating characteristic curve. We apply these methods to data from a multicenter study of acute kidney injury after cardiac surgery. Appropriately accounting for center, both in construction and evaluation, may increase the likelihood of identifying clinically useful biomarker combinations.
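The idea behind a center-adjusted AUC can be sketched by comparing cases and controls only within the same center. The code below is a rough illustration under stated assumptions, not the authors' estimator: weighting each center by its number of case-control pairs is an assumption made here, and the records are hypothetical scores from two centers whose baseline levels differ.

```python
from collections import defaultdict


def auc(case_scores, control_scores):
    """P(case score > control score), counting ties as one half."""
    wins = 0.0
    for s1 in case_scores:
        for s0 in control_scores:
            wins += 1.0 if s1 > s0 else 0.5 if s1 == s0 else 0.0
    return wins / (len(case_scores) * len(control_scores))


def center_adjusted_auc(records):
    """records: iterable of (center, outcome, score) with outcome 0 or 1.

    Averages within-center AUCs, weighting each center by its number of
    case-control pairs (an assumed weighting for this sketch).
    """
    by_center = defaultdict(lambda: ([], []))   # center -> (controls, cases)
    for center, outcome, score in records:
        by_center[center][outcome].append(score)
    total_pairs, weighted_sum = 0, 0.0
    for controls, cases in by_center.values():
        pairs = len(cases) * len(controls)
        if pairs == 0:
            continue                            # center has no comparable pairs
        weighted_sum += pairs * auc(cases, controls)
        total_pairs += pairs
    return weighted_sum / total_pairs


# Hypothetical data: center B scores run higher than center A overall.
records = [
    ("A", 1, 0.3), ("A", 1, 0.4), ("A", 0, 0.1), ("A", 0, 0.2),
    ("B", 1, 0.8), ("B", 1, 0.9), ("B", 0, 0.6), ("B", 0, 0.7),
]
pooled = auc([s for _, y, s in records if y == 1],
             [s for _, y, s in records if y == 0])
print(pooled, center_adjusted_auc(records))     # 0.75 1.0
```

In this toy example the score separates cases from controls perfectly inside each center, yet the pooled AUC is only 0.75 because controls from center B outrank cases from center A. Ignoring center can distort the apparent performance in either direction.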
Rajbongshi, Nijara; Mahanta, Lipi B; Nath, Dilip C
2015-06-01
Breast cancer is the most commonly diagnosed cancer among the female population of Assam, India. Chewing of betel quid with or without tobacco is common practice among the female population of this region. Moreover, the method of preparing the betel quid is different from other parts of the country. A matched case-control study was therefore conducted to analyse whether betel quid chewing plays a significant role in the high incidence of breast cancer in Assam. Controls were matched to the cases by age at diagnosis (±5 years), family income and place of residence, with a matching ratio of 1:1. Conditional logistic regression models and odds ratios (OR) were used to draw conclusions. Cases were observed to be more habituated to chewing than the controls. Further, the conditional logistic regression analysis reveals that a betel quid chewer faces 2.353 times the risk of having breast cancer compared with a non-chewer, with p value 0.0003 (95% CI 1.334-4.150). Though the female population in Assam usually does not smoke, the addictive habits typical to this region have equal effect on the occurrence of breast cancer.
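For 1:1 matched pairs and a single binary exposure, the conditional logistic regression estimate of the odds ratio reduces to the ratio of the two kinds of discordant pairs. The sketch below shows that calculation with a Wald interval on the log scale. The pair counts (40 and 17) are hypothetical: the abstract does not report them, and they were chosen only because they reproduce the published odds ratio and interval.

```python
import math


def matched_pair_or(case_only_exposed, control_only_exposed, z=1.959964):
    """OR and Wald CI for 1:1 matched pairs with one binary exposure.

    case_only_exposed:    pairs where only the case is exposed
    control_only_exposed: pairs where only the control is exposed
    Concordant pairs carry no information about the OR and are ignored.
    """
    b, c = case_only_exposed, control_only_exposed
    log_or = math.log(b / c)
    se = math.sqrt(1.0 / b + 1.0 / c)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Hypothetical discordant-pair counts (not reported in the abstract).
odds_ratio, ci_low, ci_high = matched_pair_or(40, 17)
print(f"OR = {odds_ratio:.3f} (95% CI {ci_low:.3f}-{ci_high:.3f})")
```

A model with additional covariates would not reduce to this closed form, so agreement with the published figures should be read as an illustration of the method rather than a reconstruction of the study data.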
Bayesian logistic regression approaches to predict incorrect DRG assignment.
Suleiman, Mani; Demirhan, Haydar; Boyd, Leanne; Girosi, Federico; Aksakalli, Vural
2018-05-07
Episodes of care involving similar diagnoses and treatments and requiring similar levels of resource utilisation are grouped to the same Diagnosis-Related Group (DRG). In jurisdictions which implement DRG based payment systems, DRGs are a major determinant of funding for inpatient care. Hence, service providers often dedicate auditing staff to the task of checking that episodes have been coded to the correct DRG. The use of statistical models to estimate an episode's probability of DRG error can significantly improve the efficiency of clinical coding audits. This study implements Bayesian logistic regression models with weakly informative prior distributions to estimate the likelihood that episodes require a DRG revision, comparing these models with each other and to classical maximum likelihood estimates. All Bayesian approaches had more stable model parameters than maximum likelihood. The best performing Bayesian model improved overall classification performance by 6% compared to maximum likelihood, with a 34% gain compared to random classification, respectively. We found that the original DRG, coder and the day of coding all have a significant effect on the likelihood of DRG error. Use of Bayesian approaches has improved model parameter stability and classification accuracy. This method has already led to improved audit efficiency in an operational capacity.
Factors associated with vocal fold pathologies in teachers.
Souza, Carla Lima de; Carvalho, Fernando Martins; Araújo, Tânia Maria de; Reis, Eduardo José Farias Borges Dos; Lima, Verônica Maria Cadena; Porto, Lauro Antonio
2011-10-01
To analyze factors associated with the prevalence of the medical diagnosis of vocal fold pathologies in teachers. A census-based epidemiological, cross-sectional study was conducted with 4,495 public primary and secondary school teachers in the city of Salvador, Northeastern Brazil, between March and April 2006. The dependent variable was the self-reported medical diagnosis of vocal fold pathologies and the independent variables were sociodemographic characteristics; professional activity; work organization/interpersonal relationships; physical work environment characteristics; frequency of common mental disorders, measured by the Self-Reporting Questionnaire-20 (SRQ-20 >7); and general health conditions. Descriptive statistical, bivariate and multiple logistic regression analysis techniques were used. The prevalence of self-reported medical diagnosis of vocal fold pathologies was 18.9%. In the logistic regression analysis, the variables that remained associated with this medical diagnosis were as follows: being female, having worked as a teacher for more than seven years, excessive voice use, reporting more than five unfavorable physical work environment characteristics and presence of common mental disorders. The presence of self-reported vocal fold pathologies was associated with factors that point out the need of actions that promote teachers' vocal health and changes in their work structure and organization.
Sirichotiratana, Nithat; Yogi, Subash; Prutipinyo, Chardsumon
2013-08-30
This study was conducted during February-March 2012 to determine the perception and support regarding smoke-free policy among tourists at Suvarnabhumi International Airport, Bangkok, Thailand. In this cross-sectional study, 200 tourists (n = 200) were enrolled by convenience sampling and interviewed using a structured questionnaire. Descriptive statistics, chi-square, and multinomial logistic regression were used in the study. Results revealed that half (50%) of the tourists were current smokers and 55% had visited Thailand twice or more. Three quarters (76%) of tourists indicated that they would visit Thailand again even if it had a 100% smoke-free regulation. Almost all (99%) of the tourists supported the smoke-free policy (partial ban and total ban), and current smokers had a higher percentage of support than non-smokers. Two factors, current smoking status and knowledge level, were significantly associated with perception level. Multinomial logistic regression showed that perception, country group, and presence of a designated smoking room (DSR) were associated with smoke-free policy. At the institution level, an effective monitoring system is needed at the airport. At the policy level, an effective comprehensive policy needs to be emphasized to ensure a smoke-free airport environment.
Association between developmental enamel defects in the primary and permanent dentitions.
Casanova-Rosado, A J; Medina-Solís, C E; Casanova-Rosado, J F; Vallejos-Sánchez, A A; Martinez-Mier, E A; Loyola-Rodríguez, J P; Islas-Márquez, A J; Maupomé, G
2011-09-01
To determine if the presence of developmental enamel defects (DED) in the primary dentition is a risk indicator for the presence of DED in the permanent dentition in children with mixed dentition, as well as other factors. A cross-sectional study was undertaken in 1296 school children ages 6 to 12 years. The DED [FDI; 1982] in both dentitions were identified by means of an oral exam scoring enamel opacities [classified as demarcated or diffuse] and enamel hypoplasia. Sociodemographic and socioeconomic variables were collected through a questionnaire. Socioeconomic status (SES) was determined based on the occupation and maximum level of education of parents. Statistical analysis included logistic regression. Mean age of participants was 8.40 +/- 1.68 years; 51.6% were boys. DED prevalence was 7.5% in the permanent dentition and 10.0% in the primary dentition. The logistic regression model, adjusting for sociodemographic and socioeconomic variables, showed that for each primary tooth with DED, the odds of observing DED in the permanent dentition increased 1.38 times [95% CI = 1.17-1.64; p < 0.001]. An association between DED presence in both permanent and primary dentitions was observed. Further studies are necessary to fully characterise this relationship.
Brenn, T; Arnesen, E
1985-01-01
For comparative evaluation, discriminant analysis, logistic regression and Cox's model were used to select risk factors for total and coronary deaths among 6595 men aged 20-49 followed for 9 years. Groups with mortality between 5 and 93 per 1000 were considered. Discriminant analysis selected variable sets only marginally different from the logistic and Cox methods which always selected the same sets. A time-saving option, offered for both the logistic and Cox selection, showed no advantage compared with discriminant analysis. Analysing more than 3800 subjects, the logistic and Cox methods consumed, respectively, 80 and 10 times more computer time than discriminant analysis. When including the same set of variables in non-stepwise analyses, all methods estimated coefficients that in most cases were almost identical. In conclusion, discriminant analysis is advocated for preliminary or stepwise analysis, otherwise Cox's method should be used.
ERIC Educational Resources Information Center
DeMars, Christine E.
2009-01-01
The Mantel-Haenszel (MH) and logistic regression (LR) differential item functioning (DIF) procedures have inflated Type I error rates when there are large mean group differences, short tests, and large sample sizes. When there are large group differences in mean score, groups matched on the observed number-correct score differ on true score,…
Satellite rainfall retrieval by logistic regression
NASA Technical Reports Server (NTRS)
Chiu, Long S.
1986-01-01
The potential use of logistic regression in rainfall estimation from satellite measurements is investigated. Satellite measurements provide covariate information in terms of radiances from different remote sensors. The logistic regression technique can effectively accommodate many covariates and test their significance in the estimation. The outcome from the logistic model is the probability that the rainrate of a satellite pixel is above a certain threshold. By varying the thresholds, a rainrate histogram can be obtained, from which the mean and the variance can be estimated. A logistic model is developed and applied to rainfall data collected during GATE, using as covariates the fractional rain area and a radiance measurement which is deduced from a microwave temperature-rainrate relation. It is demonstrated that the fractional rain area is an important covariate in the model, consistent with the use of the so-called Area Time Integral in estimating total rain volume in other studies. To calibrate the logistic model, simulated rain fields generated by rainfield models with prescribed parameters are needed. A stringent test of the logistic model is its ability to recover the prescribed parameters of simulated rain fields. A rain field simulation model which preserves the fractional rain area and lognormality of rainrates as found in GATE is developed. A stochastic regression model of branching and immigration whose solutions are lognormally distributed in some asymptotic limits has also been developed.
Statistical text classifier to detect specific type of medical incidents.
Wong, Zoie Shui-Yee; Akiyama, Masanori
2013-01-01
WHO Patient Safety has focused on increasing the coherence and expressiveness of patient safety classification through the foundation of the International Classification for Patient Safety (ICPS). Text classification and statistical approaches have been shown to be successful in identifying safety problems in the aviation industry using incident text information. It has been challenging to comprehend the taxonomy of medical incidents in a structured manner. Independent reporting mechanisms for patient safety incidents have been established in the UK, Canada, Australia, Japan, Hong Kong, etc. This research demonstrates the potential to construct statistical text classifiers to detect specific types of medical incidents using incident text data. An illustrative example for classifying look-alike sound-alike (LASA) medication incidents using structured text from 227 advisories related to medication errors from Global Patient Safety Alerts (GPSA) is shown in this poster presentation. The classifier was built using a logistic regression model. The ROC curve and the AUC value indicated that this is a satisfactorily good model.
Tahir, M Ramzan; Tran, Quang X; Nikulin, Mikhail S
2017-05-30
We studied the problem of testing a hypothesized distribution in survival regression models when the data is right censored and survival times are influenced by covariates. A modified chi-squared type test, known as Nikulin-Rao-Robson statistic, is applied for the comparison of accelerated failure time models. This statistic is used to test the goodness-of-fit for hypertabastic survival model and four other unimodal hazard rate functions. The results of simulation study showed that the hypertabastic distribution can be used as an alternative to log-logistic and log-normal distribution. In statistical modeling, because of its flexible shape of hazard functions, this distribution can also be used as a competitor of Birnbaum-Saunders and inverse Gaussian distributions. The results for the real data application are shown. Copyright © 2017 John Wiley & Sons, Ltd.
The extension of total gain (TG) statistic in survival models: properties and applications.
Choodari-Oskooei, Babak; Royston, Patrick; Parmar, Mahesh K B
2015-07-01
The results of multivariable regression models are usually summarized in the form of parameter estimates for the covariates, goodness-of-fit statistics, and the relevant p-values. These statistics do not inform us about whether covariate information will lead to any substantial improvement in prediction. Predictive ability measures can be used for this purpose since they provide important information about the practical significance of prognostic factors. R²-type indices are the most familiar forms of such measures in survival models, but they all have limitations and none is widely used. In this paper, we extend the total gain (TG) measure, proposed for a logistic regression model, to survival models and explore its properties using simulations and real data. TG is based on the binary regression quantile plot, otherwise known as the predictiveness curve. Standardised TG ranges from 0 (no explanatory power) to 1 ('perfect' explanatory power). The results of our simulations show that unlike many of the other R²-type predictive ability measures, TG is independent of random censoring. It increases as the effect of a covariate increases and can be applied to different types of survival models, including models with time-dependent covariate effects. We also apply TG to quantify the predictive ability of multivariable prognostic models developed in several disease areas. Overall, TG performs well in our simulation studies and can be recommended as a measure to quantify the predictive ability in survival models.
Bomfim, Rafael Aiello; Crosato, Edgard; Mazzilli, Luiz Eugênio Nigro; Frias, Antonio Carlos
2015-01-01
This study evaluates the prevalence and risk factors of non-carious cervical lesions (NCCLs) in a Brazilian population of workers exposed and non-exposed to acid mists and chemical products. One hundred workers (46 exposed and 54 non-exposed) were evaluated in a Centro de Referência em Saúde do Trabalhador - CEREST (Worker's Health Reference Center). The workers responded to questionnaires regarding their personal information and about alcohol consumption and tobacco use. A clinical examination was conducted to evaluate the presence of NCCLs, according to WHO parameters. Statistical analyses were performed by unconditional logistic regression and multiple linear regression, with the critical level of p < 0.05. NCCLs were significantly associated with age groups (18-34, 35-44, 45-68 years). The unconditional logistic regression showed that the presence of NCCLs was better explained by age group (OR = 4.04; CI 95% 1.77-9.22) and occupational exposure to acid mists and chemical products (OR = 3.84; CI 95% 1.10-13.49), whereas the linear multiple regression revealed that NCCLs were better explained by years of smoking (p = 0.01) and age group (p = 0.04). The prevalence of NCCLs in the study population was particularly high (76.84%), and the risk factors for NCCLs were age, exposure to acid mists and smoking habit. Controlling risk factors through preventive and educative measures, allied to the use of personal protective equipment to prevent the occupational exposure to acid mists, may contribute to minimizing the prevalence of NCCLs.
Chen, Chen; Xie, Yuanchang
2014-12-01
Driving hours and rest breaks are closely related to driver fatigue, which is a major contributor to truck crashes. This study investigates the effects of driving hours and rest breaks on commercial truck driver safety. A discrete-time logistic regression model is used to evaluate the crash odds ratios of driving hours and rest breaks. Driving time is divided into 11 one-hour intervals. These intervals and rest breaks are modeled as dummy variables. In addition, a Cox proportional hazards regression model with time-dependent covariates is used to assess the transient effects of rest breaks, which consist of a fixed effect and a variable effect. Data collected from two national truckload carriers in 2009 and 2010 are used. The discrete-time logistic regression result indicates that only the crash odds ratio of the 11th driving hour is statistically significant. Taking one, two, and three rest breaks can reduce drivers' crash odds by 68%, 83%, and 85%, respectively, compared to drivers who did not take any rest breaks. The Cox regression result shows clear transient effects for rest breaks. It also suggests that drivers may need some time to adjust themselves to normal driving tasks after a rest break. Overall, the third rest break's safety benefit is very limited based on the results of both models. The findings of this research can help policy makers better understand the impact of driving time and rest breaks and develop more effective rules to improve commercial truck safety. Copyright © 2014 National Safety Council and Elsevier Ltd. All rights reserved.
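As a rough illustration of the data layout behind a discrete-time logistic regression like the one described above, the sketch below expands each trip into one row per driving hour at risk, with hour dummies and an event indicator. The input format `(hours_driven, crashed)`, the function names, and the choice of hour 1 as the reference level are assumptions made for illustration, not details taken from the study; rest-break dummies are omitted for brevity.

```python
def person_period_rows(trips, max_hours=11):
    """Expand trips into person-period rows (one row per driving hour at risk).

    trips: list of (hours_driven, crashed) tuples -- a hypothetical input format.
    """
    rows = []
    for trip_id, (hours_driven, crashed) in enumerate(trips):
        last_hour = min(hours_driven, max_hours)
        for hour in range(1, last_hour + 1):
            # One dummy per driving hour 2..max_hours; hour 1 is the reference level.
            dummies = [1 if hour == h else 0 for h in range(2, max_hours + 1)]
            # The event indicator is 1 only in the final observed hour of a crashed trip.
            event = 1 if (crashed and hour == last_hour) else 0
            rows.append({"trip": trip_id, "hour": hour,
                         "dummies": dummies, "event": event})
    return rows


def odds_reduction_to_ratio(percent_reduction):
    """Convert a reported percent reduction in odds into an odds ratio."""
    return 1.0 - percent_reduction / 100.0
```

Under this reading, the reported 68%, 83%, and 85% reductions in crash odds correspond to odds ratios of about 0.32, 0.17, and 0.15 relative to drivers who took no rest break. The expanded rows would then be passed to any ordinary logistic regression routine.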
Are low wages risk factors for hypertension?
Du, Juan
2012-01-01
Objective: Socio-economic status (SES) is strongly correlated with hypertension. But SES has several components, including income and correlations in cross-sectional data need not imply SES is a risk factor. This study investigates whether wages—the largest category within income—are risk factors. Methods: We analysed longitudinal, nationally representative US data from four waves (1999, 2001, 2003 and 2005) of the Panel Study of Income Dynamics. The overall sample was restricted to employed persons age 25–65 years, n = 17 295. Separate subsamples were constructed of persons within two age groups (25–44 and 45–65 years) and genders. Hypertension incidence was self-reported based on physician diagnosis. Our study was prospective since data from three base years (1999, 2001, 2003) were used to predict newly diagnosed hypertension for three subsequent years (2001, 2003, 2005). In separate analyses, data from the first base year were used to predict time-to-reporting hypertension. Logistic regressions with random effects and Cox proportional hazards regressions were run. Results: Negative and strongly statistically significant correlations between wages and hypertension were found both in logistic and Cox regressions, especially for subsamples containing the younger age group (25–44 years) and women. Correlations were stronger when three health variables—obesity, subjective measures of health and number of co-morbidities—were excluded from regressions. Doubling the wage was associated with 25–30% lower chances of hypertension for persons aged 25–44 years. Conclusions: The strongest evidence for low wages being risk factors for hypertension among working people were for women and persons aged 25–44 years. PMID:22262559
Are low wages risk factors for hypertension?
Leigh, J Paul; Du, Juan
2012-12-01
Socio-economic status (SES) is strongly correlated with hypertension. But SES has several components, including income and correlations in cross-sectional data need not imply SES is a risk factor. This study investigates whether wages-the largest category within income-are risk factors. We analysed longitudinal, nationally representative US data from four waves (1999, 2001, 2003 and 2005) of the Panel Study of Income Dynamics. The overall sample was restricted to employed persons age 25-65 years, n = 17 295. Separate subsamples were constructed of persons within two age groups (25-44 and 45-65 years) and genders. Hypertension incidence was self-reported based on physician diagnosis. Our study was prospective since data from three base years (1999, 2001, 2003) were used to predict newly diagnosed hypertension for three subsequent years (2001, 2003, 2005). In separate analyses, data from the first base year were used to predict time-to-reporting hypertension. Logistic regressions with random effects and Cox proportional hazards regressions were run. Negative and strongly statistically significant correlations between wages and hypertension were found both in logistic and Cox regressions, especially for subsamples containing the younger age group (25-44 years) and women. Correlations were stronger when three health variables-obesity, subjective measures of health and number of co-morbidities-were excluded from regressions. Doubling the wage was associated with 25-30% lower chances of hypertension for persons aged 25-44 years. The strongest evidence for low wages being risk factors for hypertension among working people were for women and persons aged 25-44 years.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.
NASA Astrophysics Data System (ADS)
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam
2015-10-01
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.
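To make the "q-1 logit equations" concrete, the following minimal sketch turns q-1 sets of coefficients (each comparing one category against a reference category) into q predicted probabilities. The function name, the coefficient layout, and the choice of the first category as the reference are assumptions for illustration only; the abstract does not specify them.

```python
import math


def multinomial_probabilities(x, coefficients):
    """Predicted probabilities for q categories from q-1 baseline-category logits.

    x: list of predictor values.
    coefficients: q-1 lists [intercept, b1, ..., bp], one per non-reference
                  category (hypothetical layout; category 0 is the reference).
    """
    # Linear predictor of each logit equation: log(P(category k) / P(reference)).
    linear = [c[0] + sum(b * xi for b, xi in zip(c[1:], x)) for c in coefficients]
    denominator = 1.0 + sum(math.exp(eta) for eta in linear)
    reference = 1.0 / denominator
    return [reference] + [math.exp(eta) / denominator for eta in linear]
```

With three response categories, two coefficient lists are supplied and the function returns three probabilities that sum to one; the log of each probability ratio against the reference recovers the corresponding linear predictor.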
The cross-validated AUC for MCP-logistic regression with high-dimensional data.
Jiang, Dingfeng; Huang, Jian; Zhang, Ying
2013-10-01
We propose a cross-validated area under the receiving operator characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and its comparison with the existing methods including the Akaike information criterion (AIC), Bayesian information criterion (BIC) or Extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and smaller classification error than those with tuning parameters selected using the AIC, BIC or EBIC. We illustrate the application of the MCP-logistic regression with the CV-AUC criterion on three microarray datasets from the studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
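A minimal sketch of choosing a tuning parameter by cross-validated AUC is given below. It assumes held-out predictions have already been produced for each candidate tuning value and each fold, and it averages the per-fold AUCs; that averaging rule, the function names, and the input layout are assumptions, since the abstract does not give the exact definition of the CV-AUC criterion. The MCP fitting step itself is not shown.

```python
def auc(labels, scores):
    """Area under the ROC curve as the probability that a positive outscores a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(positives) * len(negatives))


def select_by_cv_auc(fold_predictions):
    """Pick the tuning value with the largest mean held-out AUC.

    fold_predictions: {tuning_value: [(labels, scores), ...]} with one
    (labels, scores) pair per held-out fold -- a hypothetical layout.
    """
    mean_auc = {
        value: sum(auc(y, s) for y, s in folds) / len(folds)
        for value, folds in fold_predictions.items()
    }
    best = max(mean_auc, key=mean_auc.get)
    return best, mean_auc
```

The same `auc` function also computes the C statistic reported for binary-outcome prediction models.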
GIS and statistical analysis for landslide susceptibility mapping in the Daunia area, Italy
NASA Astrophysics Data System (ADS)
Mancini, F.; Ceppi, C.; Ritrovato, G.
2010-09-01
This study focuses on landslide susceptibility mapping in the Daunia area (Apulian Apennines, Italy) and achieves this by using a multivariate statistical method and data processing in a Geographical Information System (GIS). The Logistic Regression (hereafter LR) method was chosen to produce a susceptibility map over an area of 130 000 ha where small settlements are historically threatened by landslide phenomena. By means of LR analysis, the tendency to landslide occurrences was, therefore, assessed by relating a landslide inventory (dependent variable) to a series of causal factors (independent variables) which were managed in the GIS, while the statistical analyses were performed by means of the SPSS (Statistical Package for the Social Sciences) software. The LR analysis produced a reliable susceptibility map of the investigated area and the probability level of landslide occurrence was ranked in four classes. The overall performance achieved by the LR analysis was assessed by local comparison between the expected susceptibility and an independent dataset extrapolated from the landslide inventory. Of the samples classified as susceptible to landslide occurrences, 85% correspond to areas where landslide phenomena have actually occurred. In addition, the consideration of the regression coefficients provided by the analysis demonstrated that a major role is played by the "land cover" and "lithology" causal factors in determining the occurrence and distribution of landslide phenomena in the Apulian Apennines.
Endoscopic third ventriculostomy in the treatment of childhood hydrocephalus.
Kulkarni, Abhaya V; Drake, James M; Mallucci, Conor L; Sgouros, Spyros; Roth, Jonathan; Constantini, Shlomi
2009-08-01
To develop a model to predict the probability of endoscopic third ventriculostomy (ETV) success in the treatment for hydrocephalus on the basis of a child's individual characteristics. We analyzed 618 ETVs performed consecutively on children at 12 international institutions to identify predictors of ETV success at 6 months. A multivariable logistic regression model was developed on 70% of the dataset (training set) and validated on 30% of the dataset (validation set). In the training set, 305/455 ETVs (67.0%) were successful. The regression model (containing patient age, cause of hydrocephalus, and previous cerebrospinal fluid shunt) demonstrated good fit (Hosmer-Lemeshow, P = .78) and discrimination (C statistic = 0.70). In the validation set, 105/163 ETVs (64.4%) were successful and the model maintained good fit (Hosmer-Lemeshow, P = .45), discrimination (C statistic = 0.68), and calibration (calibration slope = 0.88). A simplified ETV Success Score was devised that closely approximates the predicted probability of ETV success. Children most likely to succeed with ETV can now be accurately identified and spared the long-term complications of CSF shunting.
Austin, Peter C; Wagner, Philippe; Merlo, Juan
2017-03-15
Multilevel data occurs frequently in many research areas like health services research and epidemiology. A suitable way to analyze such data is through the use of multilevel regression models (MLRM). MLRM incorporate cluster-specific random effects which allow one to partition the total individual variance into between-cluster variation and between-individual variation. Statistically, MLRM account for the dependency of the data within clusters and provide correct estimates of uncertainty around regression coefficients. Substantively, the magnitude of the effect of clustering provides a measure of the General Contextual Effect (GCE). When outcomes are binary, the GCE can also be quantified by measures of heterogeneity like the Median Odds Ratio (MOR) calculated from a multilevel logistic regression model. Time-to-event outcomes within a multilevel structure occur commonly in epidemiological and medical research. However, the Median Hazard Ratio (MHR) that corresponds to the MOR in multilevel (i.e., 'frailty') Cox proportional hazards regression is rarely used. Analogously to the MOR, the MHR is the median relative change in the hazard of the occurrence of the outcome when comparing identical subjects from two randomly selected different clusters that are ordered by risk. We illustrate the application and interpretation of the MHR in a case study analyzing the hazard of mortality in patients hospitalized for acute myocardial infarction at hospitals in Ontario, Canada. We provide R code for computing the MHR. The MHR is a useful and intuitive measure for expressing cluster heterogeneity in the outcome and, thereby, estimating general contextual effects in multilevel survival analysis. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
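For readers who want the arithmetic behind the MOR and MHR, the sketch below uses the commonly cited closed form exp(sqrt(2σ²) · Φ⁻¹(0.75)), where σ² is the variance of the cluster-level random effects on the log-odds or log-hazard scale. It assumes normally distributed random effects, and it is not the authors' R code; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def median_ratio(random_effect_variance):
    """Median odds ratio (or median hazard ratio) from a random-effect variance.

    Assumes normally distributed cluster effects on the log scale.
    """
    if random_effect_variance < 0:
        raise ValueError("variance must be non-negative")
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal
    return math.exp(math.sqrt(2.0 * random_effect_variance) * z75)
```

A variance of zero gives a ratio of exactly 1 (no between-cluster heterogeneity), and the ratio grows as the variance increases.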
Özge, C; Toros, F; Bayramkaya, E; Çamdeviren, H; Şaşmaz, T
2006-01-01
Background: The purpose of this study is to evaluate the most important sociodemographic factors affecting the smoking status of high school students, using a broad randomised epidemiological survey. Methods: Using an in-class, self-administered questionnaire about their sociodemographic variables and smoking behaviour, a representative sample of 3304 students in the preparatory, 9th, 10th, and 11th grades, from 22 randomly selected schools of Mersin, was evaluated, and discriminative factors were determined using appropriate statistics. In addition to binary logistic regression analysis, the study evaluated combined effects of these factors using classification and regression tree methodology, as a new statistical method. Results: The data showed that 38% of the students reported lifetime smoking and 16.9% of them reported current smoking, with a male predominance and increasing prevalence by age. Second hand smoking was reported at a 74.3% frequency, with father predominance (56.6%). The significant factors affecting current smoking in these age groups were increased household size, late birth rank, certain school types, low academic performance, increased second hand smoking, and stress (especially reported as separation from a close friend or because of violence at home). Classification and regression tree methodology showed the importance of some neglected sociodemographic factors with a good classification capacity. Conclusions: It was concluded that smoking, being closely related to sociocultural factors, was a common problem in this young population, generating an important academic and social burden in youth life, and that with increasing data about this behaviour and the use of new statistical methods, effective coping strategies could be developed. PMID:16891446
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
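The equivalent two-sample idea for logistic regression can be sketched as follows: find two response probabilities whose log-odds differ by the slope times twice the covariate standard deviation while keeping their average equal to the overall response probability, then apply a two-sample sample size formula. The bisection solver, the function names, and the particular two-proportion formula used here are assumptions for illustration; the paper may use a different two-sample formula.

```python
import math
from statistics import NormalDist


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def equivalent_two_sample(slope, sd_x, overall_prob):
    """Return (p1, p2) with logit(p2) - logit(p1) = slope * 2 * sd_x
    and (p1 + p2) / 2 = overall_prob (equal group sizes)."""
    d = slope * 2.0 * sd_x
    lo, hi = -50.0, 50.0
    for _ in range(200):  # bisection on the log-odds of the first group
        mid = (lo + hi) / 2.0
        if (expit(mid) + expit(mid + d)) / 2.0 < overall_prob:
            lo = mid
        else:
            hi = mid
    a = (lo + hi) / 2.0
    return expit(a), expit(a + d)


def total_sample_size(slope, sd_x, overall_prob, alpha=0.05, power=0.8):
    """Total N from a standard two-proportion formula (an assumed choice)."""
    p1, p2 = equivalent_two_sample(slope, sd_x, overall_prob)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    pooled = math.sqrt(2.0 * overall_prob * (1.0 - overall_prob))
    unpooled = math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    n_per_group = math.ceil((z_alpha * pooled + z_beta * unpooled) ** 2
                            / (p1 - p2) ** 2)
    return 2 * n_per_group
```

For example, `total_sample_size(0.5, 1.0, 0.3)` treats a slope of 0.5 per unit of a covariate with standard deviation 1 as a two-group comparison whose log-odds differ by 1.0, with an overall response probability of 0.3. A slope of zero is not meaningful here, since the two groups would coincide.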
San-Martín, Montserrat; Delgado-Bolton, Roberto; Vivanco, Luis
2017-01-01
Background: Empathy in the context of patient care is defined as a predominantly cognitive attribute that involves an understanding of the patient's experiences, concerns, and perspectives, combined with a capacity to communicate this understanding and an intention to help. In medical education, it is recognized that empathy can be improved by interventional approaches. In this sense, a semiotic-based curriculum could be an important didactic tool for improving medical empathy. The main purpose of this study was to determine if in medical schools where a semiotic-based curriculum is offered, the empathetic orientation of medical students improves as a consequence of the acquisition and development of students' communication skills that are required in clinician-patient encounters. Design: This quasi-experimental study was conducted in three medical schools of the Dominican Republic that offer three different medical curricula: (i) a theoretical and practical semiotic-based curriculum; (ii) a theoretical semiotic-based curriculum; and (iii) a curriculum without semiotic courses. The Jefferson scale of empathy was administered at two different moments to students enrolled in pre-clinical cycles of those institutions. Data was subjected to comparative statistical analysis and logistic regression analysis. Results: The study included 165 students (55 male and 110 female). Comparison analysis showed statistically significant differences in the development of empathy among groups (p < 0.001). Logistic regression confirmed that gender, age, and a semiotic-based curriculum contributed toward the enhancement of empathy. Conclusion: These findings demonstrate the importance of medical semiotics as a didactic teaching method for improving beginners' empathetic orientation in patients' care.
Adverse Effects of Prolonged Sitting Behavior on the General Health of Office Workers.
Daneshmandi, Hadi; Choobineh, Alireza; Ghaem, Haleh; Karimi, Mehran
2017-07-01
Excessive sitting behavior is a risk factor for many adverse health outcomes. This study aimed to survey the prevalence of sitting behavior and its adverse effects among Iranian office workers. This cross-sectional study included 447 Iranian office workers. A two-part questionnaire was used as the data collection tool. The first part surveyed the demographic characteristics and general health of the respondents, while the second part contained the Nordic Musculoskeletal Questionnaire (NMQ) to assess symptoms. Statistical analyses were performed using the Statistical Package for the Social Sciences software using Mann-Whitney U and Chi-square tests and multiple logistic regression analysis. The respondents spent an average of 6.29 hours of an 8-hour working shift in a sitting position. The results showed that 48.8% of the participants did not feel comfortable with their workstations and 73.6% felt exhausted during the workday. Additionally, 6.3% suffered from hypertension, and 11.2% of them reported hyperlipidemia. The results of the NMQ showed that neck (53.5%), lower back (53.2%) and shoulder (51.6%) symptoms were the most prevalent problem among office workers. Based upon a multiple logistic regression, only sex had a significant association with prolonged sitting behavior (odds ratio = 3.084). Our results indicated that long sitting times were associated with exhaustion during the working day, decreased job satisfaction, hypertension, and musculoskeletal disorder symptoms in the shoulders, lower back, thighs, and knees of office workers. Sitting behavior had adverse effects on office workers. Active workstations are therefore recommended to improve working conditions.
Wang, Jiao; Luo, Gong-tang; Niu, Wei-jing; Gong, Man-man; Liu, Lu; Zhou, Jie; Zhou, Xue-wei; He, Li-hua
2013-12-18
To explore the risk and protective factors of kidney calculi in order to provide a theoretical basis for preventive and control measures. A 1:1 matched case-control study was performed using data from a hospital in Beijing. The case group included 100 inpatients who were diagnosed with kidney calculi by B-mode ultrasound, X-ray and intravenous pyelography during the survey; the control group comprised 100 other inpatients without urolithiasis or endocrine disease who were of the same sex as, and within five years of, the case group inpatients. A face-to-face survey was conducted with self-made questionnaires covering demographic characteristics, water issues, dietary habits, and genetic and medical history. Epidata 3.0 was used to build the database and SPSS 19.0 for the statistical analysis. In the univariate logistic regression analysis, ten variables showed statistical significance. In the multivariate logistic regression analysis, the variables left in the model were labor intensity (OR=0.622, 95%CI: 0.435-0.889), preferring to drink after dinner (OR=0.316, 95%CI: 0.122-0.815), loving drinking (OR=0.232, 95%CI: 0.084-0.642), drinking tea regularly (OR=1.463, 95%CI: 1.033-2.071), eating more vegetables (OR=0.571, 95%CI: 0.328-0.993), and a history of urolithiasis (OR=2.127, 95%CI: 1.065-90.145). Drinking tea regularly, a history of urolithiasis and brain work are risk factors for kidney calculi, while loving drinking and eating more vegetables are protective factors.
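The odds ratios and confidence intervals reported above are conventionally obtained by exponentiating a logistic regression coefficient and its Wald interval. The following is a minimal sketch of that conversion; the coefficient and standard error are hypothetical values chosen for illustration and are not taken from the study.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a logistic regression coefficient and its standard error
    into an odds ratio with a Wald confidence interval (95% for z=1.96)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and standard error (assumptions, not study data).
odds_ratio, lower, upper = odds_ratio_ci(beta=-0.475, se=0.182)
print(f"OR={odds_ratio:.3f}, 95% CI: {lower:.3f}-{upper:.3f}")
```

An odds ratio below 1 with an interval that excludes 1 would be read as a protective factor, and one above 1 as a risk factor.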
Datamining approaches for modeling tumor control probability.
Naqa, Issam El; Deasy, Joseph O; Mu, Yi; Huang, Ellen; Hope, Andrew J; Lindsay, Patricia E; Apte, Aditya; Alaly, James; Bradley, Jeffrey D
2010-11-01
Tumor control probability (TCP) to radiotherapy is determined by complex interactions between tumor biology, tumor microenvironment, radiation dosimetry, and patient-related variables. The complexity of these heterogeneous variable interactions constitutes a challenge for building predictive models for routine clinical practice. We describe a datamining framework that can unravel the higher order relationships among dosimetric dose-volume prognostic variables, interrogate various radiobiological processes, and generalize to previously unseen data when applied prospectively. Several datamining approaches are discussed that include dose-volume metrics, equivalent uniform dose, mechanistic Poisson model, and model building methods using statistical regression and machine learning techniques. Institutional datasets of non-small cell lung cancer (NSCLC) patients are used to demonstrate these methods. The performance of the different methods was evaluated using bivariate Spearman rank correlations (rs). Over-fitting was controlled via resampling methods. Using a dataset of 56 patients with primary NSCLC tumors and 23 candidate variables, we estimated GTV volume and V75 to be the best model parameters for predicting TCP using statistical resampling and a logistic model. Using these variables, the support vector machine (SVM) kernel method provided superior performance for TCP prediction with an rs=0.68 on leave-one-out testing compared to logistic regression (rs=0.4), Poisson-based TCP (rs=0.33), and cell kill equivalent uniform dose model (rs=0.17). The prediction of treatment response can be improved by utilizing datamining approaches, which are able to unravel important non-linear complex interactions among model variables and have the capacity to predict on unseen data for prospective clinical applications.
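The Spearman rank correlation (rs) used above to score predictions against observed outcomes can be sketched as the Pearson correlation of average ranks. This is an illustrative stdlib-only version; the prediction and outcome values are hypothetical and are not the study's data.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical held-out predictions and binary tumor-control outcomes.
predicted = [0.1, 0.4, 0.35, 0.8]
observed = [0, 0, 1, 1]
print(round(spearman(predicted, observed), 3))
```

In a leave-one-out setting, each entry of `predicted` would come from a model fitted without that patient.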
Santos, Frédéric; Guyomarc'h, Pierre; Bruzek, Jaroslav
2014-12-01
Accuracy of identification tools in forensic anthropology primarily relies upon the variations inherent in the data upon which they are built. Sex determination methods based on craniometrics are widely used and known to be specific to several factors (e.g. sample distribution, population, age, secular trends, measurement technique, etc.). The goal of this study is to discuss the potential variations linked to the statistical treatment of the data. Traditional craniometrics of four samples extracted from documented osteological collections (from Portugal, France, the U.S.A., and Thailand) were used to test three different classification methods: linear discriminant analysis (LDA), logistic regression (LR), and support vector machines (SVM). The Portuguese sample was set as a training model on which the other samples were applied in order to assess the validity and reliability of the different models. The tests were performed using different parameters: some included the selection of the best predictors; some included a strict decision threshold (sex assessed only if the related posterior probability was high, including the notion of indeterminate result); and some used an unbalanced sex-ratio. Results indicated that LR tends to perform slightly better than the other techniques and offers a better selection of predictors. Also, the use of a decision threshold (i.e. p>0.95) is essential to ensure an acceptable reliability of sex determination methods based on craniometrics. Although the Portuguese, French, and American samples share a similar sexual dimorphism, application of Western models on the Thai sample (that displayed a lower degree of dimorphism) was unsuccessful. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
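The strict decision threshold described above (sex assessed only when the posterior probability exceeds 0.95, otherwise an indeterminate result) can be sketched as follows. The function name and the convention that the classifier outputs the posterior probability of "male" are assumptions made for illustration.

```python
def classify_sex(p_male, threshold=0.95):
    """Assign a sex only when the posterior probability exceeds the threshold.

    p_male is assumed to be the classifier's posterior probability of "male";
    the posterior probability of "female" is then 1 - p_male.
    """
    if p_male > threshold:
        return "male"
    if 1.0 - p_male > threshold:
        return "female"
    return "indeterminate"


# Hypothetical posterior probabilities for three crania.
for p in (0.97, 0.60, 0.02):
    print(p, classify_sex(p))
```

Raising the threshold trades a higher share of indeterminate results for fewer misclassified individuals.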
San-Martín, Montserrat; Delgado-Bolton, Roberto; Vivanco, Luis
2017-01-01
Background: Empathy in the context of patient care is defined as a predominantly cognitive attribute that involves an understanding of the patient’s experiences, concerns, and perspectives, combined with a capacity to communicate this understanding and an intention to help. In medical education, it is recognized that empathy can be improved by interventional approaches. In this sense, a semiotic-based curriculum could be an important didactic tool for improving medical empathy. The main purpose of this study was to determine if in medical schools where a semiotic-based curriculum is offered, the empathetic orientation of medical students improves as a consequence of the acquisition and development of students’ communication skills that are required in clinician–patient encounters. Design: This quasi-experimental study was conducted in three medical schools of the Dominican Republic that offer three different medical curricula: (i) a theoretical and practical semiotic-based curriculum; (ii) a theoretical semiotic-based curriculum; and (iii) a curriculum without semiotic courses. The Jefferson scale of empathy was administered in two different moments to students enrolled in pre-clinical cycles of those institutions. Data was subjected to comparative statistical analysis and logistic regression analysis. Results: The study included 165 students (55 male and 110 female). Comparison analysis showed statistically significant differences in the development of empathy among groups (p < 0.001). Logistic regression confirmed that gender, age, and a semiotic-based curriculum contributed toward the enhancement of empathy. Conclusion: These findings demonstrate the importance of medical semiotics as a didactic teaching method for improving beginners’ empathetic orientation in patients’ care. PMID:29209252
Probability of foliar injury for Acer sp. based on foliar fluoride concentrations.
McDonough, Andrew M; Dixon, Murray J; Terry, Debbie T; Todd, Aaron K; Luciani, Michael A; Williamson, Michele L; Roszak, Danuta S; Farias, Kim A
2016-12-01
Fluoride is considered one of the most phytotoxic elements to plants, and indicative fluoride injury has been associated with a wide range of foliar fluoride concentrations. The aim of this study was to determine the probability of indicative foliar fluoride injury based on Acer sp. foliar fluoride concentrations using a logistic regression model. Foliage from Acer negundo, Acer saccharinum, Acer saccharum and Acer platanoides was collected along a distance gradient from three separate brick manufacturing facilities in southern Ontario as part of a long-term monitoring programme between 1995 and 2014. Hydrogen fluoride is the major emission source associated with the manufacturing facilities, resulting in highly elevated foliar fluoride close to the facilities and decreasing with distance. Consistent with other studies, indicative fluoride injury was observed over a wide range of foliar concentrations (9.9-480.0 μg F⁻ g⁻¹). The logistic regression model was statistically significant for the Acer sp. group, A. negundo and A. saccharinum, with A. negundo being the most sensitive species among the group. In addition, A. saccharum and A. platanoides were not statistically significant within the model. We are unaware of published foliar fluoride values for Acer sp. within Canada, and this research provides policy makers and scientists with probabilities of indicative foliar injury for common urban Acer sp. trees that can help guide decisions about emissions controls. Further research should focus on mechanisms driving indicative fluoride injury over wide ranging foliar fluoride concentrations and help determine foliar fluoride thresholds for damage.
Developing and Testing a Model to Predict Outcomes of Organizational Change
Gustafson, David H; Sainfort, François; Eichler, Mary; Adams, Laura; Bisognano, Maureen; Steudel, Harold
2003-01-01
Objective To test the effectiveness of a Bayesian model employing subjective probability estimates for predicting success and failure of health care improvement projects. Data Sources Experts' subjective assessment data for model development and independent retrospective data on 221 healthcare improvement projects in the United States, Canada, and the Netherlands collected between 1996 and 2000 for validation. Methods A panel of theoretical and practical experts and literature in organizational change were used to identify factors predicting the outcome of improvement efforts. A Bayesian model was developed to estimate probability of successful change using subjective estimates of likelihood ratios and prior odds elicited from the panel of experts. A subsequent retrospective empirical analysis of change efforts in 198 health care organizations was performed to validate the model. Logistic regression and ROC analysis were used to evaluate the model's performance using three alternative definitions of success. Data Collection For the model development, experts' subjective assessments were elicited using an integrative group process. For the validation study, a staff person intimately involved in each improvement project responded to a written survey asking questions about model factors and project outcomes. Results Logistic regression chi-square statistics and areas under the ROC curve demonstrated a high level of model performance in predicting success. Chi-square statistics were significant at the 0.001 level and areas under the ROC curve were greater than 0.84. Conclusions A subjective Bayesian model was effective in predicting the outcome of actual improvement projects. Additional prospective evaluations as well as testing the impact of this model as an intervention are warranted. PMID:12785571
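The Bayesian calculation described above, combining prior odds with elicited likelihood ratios, is commonly written in odds form: posterior odds equal prior odds times the product of the likelihood ratios. The sketch below assumes the factors are treated as conditionally independent; the prior odds and likelihood ratios are hypothetical, not the values elicited from the expert panel.

```python
import math


def posterior_probability(prior_odds, likelihood_ratios):
    """Odds-form Bayes update: multiply prior odds by each likelihood ratio,
    then convert the posterior odds to a probability.

    Assumes the factors are conditionally independent (an assumption of this
    sketch, not a statement about the published model).
    """
    odds = prior_odds * math.prod(likelihood_ratios)
    return odds / (1.0 + odds)


# Hypothetical prior odds of success and per-factor likelihood ratios.
p_success = posterior_probability(prior_odds=1.0, likelihood_ratios=[2.0, 3.0, 0.5])
print(p_success)
```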
Amarilla, Almudena; Espinar-Escalona, Eduardo; Castellanos-Cosano, Lizett; Martín-González, Jenifer; Sánchez-Domínguez, Benito; López-Frías, Francisco J.
2012-01-01
Introduction: The purpose of this study was to compare, in a split mouth design, the external apical root resorption (EARR) associated with orthodontic treatment in root-filled maxillary incisors and their contralateral teeth with vital pulps. Methodology: The study sample consisted of 38 patients (14 males and 24 females), who had one root-filled incisor before completion of multiband/bracket orthodontic therapy for at least 1 year. For each patient, digital panoramic radiographs taken before and after orthodontic treatment were used to determine the root resorption and the proportion of external root resorption (PRR), defined as the ratio between the root resorption in the endodontically treated incisor and that in its contralateral incisor with a vital pulp. Student’s t-test, the chi-square test and logistic regression analysis were used to determine statistical significance. Results: There was no statistically significant difference (p > 0.05) between EARR in vital teeth (1.1 ± 1.0 mm) and endodontically treated incisors (1.1 ± 0.8 mm). Twenty-six patients (68.4%) showed greater resorption of the endodontically treated incisor than its homolog vital tooth (p > 0.05). The mean and standard deviation of PRR were 1.0 ± 0.2. Multivariate logistic regression suggested that PRR does not correlate with any of the variables analyzed. Conclusions: There was no significant difference in the amount or severity of external root resorption during orthodontic movement between root-filled incisors and their contralateral teeth with vital pulps. Key words: Endodontics, orthodontics, root canal treatment, root resorption. PMID:22143731
Zgheib, Sandy M; Kacim, Mohammad; Kostev, Karel
2017-12-01
During the last decades, there has been an alarming and dramatic increase in the number of cesarean births in both developed and undeveloped countries. This increase has not been clinically justified but, nevertheless, has raised an important number of issues. The aim of this study was to determine the risk factors associated with the high cesarean section rates in Lebanon. This study is based on a sample of 29,270 Lebanese women who were pregnant between 2000 and 2015. Among these, 14,327 gave birth by cesarean section and 14,943 gave birth vaginally. To identify the risk factors of cesarean section, logistic regression was applied as a statistical method using the SPSS statistical package. Of the 29,270 pregnant women included in the study, 49% had cesarean sections while 51% gave birth vaginally. Repeat cesarean section accounted for 23% while vaginal birth after cesarean accounted for only 0.2% of deliveries. In addition, weekdays were associated with a preference of providers to carry out more cesarean sections. According to an analysis of our data using logistic regression, the risk factors associated with the increase in cesarean section rates were advanced maternal age, elective cesarean section, malpresentation of fetus, multiple birth, prolonged pregnancy, prolonged labor, and fetal distress. Based on these results, it is recommended that a new health policy be implemented to reduce the number of unnecessary cesarean deliveries in Lebanon. Copyright © 2017 Australian College of Midwives. Published by Elsevier Ltd. All rights reserved.
Beyene, Belay Bezabih; Yalew, Woyneshet Gelaye; Demilew, Ermias; Abie, Getent; Tewabe, Tsehaye; Abera, Bayeh
2016-03-01
Malaria is one of the leading public health challenges in Ethiopia. To address this, the Federal Ministry of Ethiopia launched a laboratory diagnosis programme for promoting use of either rapid diagnostic tests (RDTs) or Giemsa microscopy to all suspected malaria cases. This study was conducted to assess the performance of RDT and influencing factors for Giemsa microscopic diagnosis in Amhara region. A cross-sectional study was conducted in 10 high burden malaria districts of Amhara region from 15 May to 15 June 2014. Data were collected using structured questionnaire. Blood samples were collected from 1000 malaria suspected cases in 10 health centers. RDT (SD BIOLINE) and Giemsa microscopy were performed as per standard procedures. Kappa value, logistic regression and chi-square test were used for statistical analysis. The overall positivity rate (PR) of malaria parasites by RDT and Giemsa microscopy was 17.1 and 16.5% respectively. Compared to Giemsa microscopy as "gold standard", RDT showed 83.9% sensitivity and 96% specificity. The level of agreement between first reader and second reader for blood film microscopy was moderate (Kappa value = 0.74). Logistic regression showed that male, under five year of age and having fever more than 24 h prior to malaria diagnosis had statistically significant association with malaria positivity rate for malaria parasites. The overall specificity and negative predictive values of RDT for malaria diagnosis were excellent. However, the sensitivity and positive predictive values of RDT were low. Therefore, in-service training, quality monitoring of RDTs, and adequate laboratory supplies for diagnostic services of malaria would be crucial for effective intervention measures.
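The sensitivity, specificity and predictive values quoted above all derive from a 2x2 table of test result against the reference standard. The following sketch shows the arithmetic; the counts are hypothetical and do not reproduce the study's table.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard accuracy measures of an index test (e.g. an RDT)
    against a reference standard (e.g. microscopy) from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among reference positives
        "specificity": tn / (tn + fp),  # true negatives among reference negatives
        "ppv": tp / (tp + fp),          # reference positives among test positives
        "npv": tn / (tn + fn),          # reference negatives among test negatives
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=90, fp=20, fn=10, tn=880)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, the predictive values depend on prevalence, so they should not be carried over to settings with a different positivity rate.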
Quantitative Analysis of Land Loss in Coastal Louisiana Using Remote Sensing
NASA Astrophysics Data System (ADS)
Wales, P. M.; Kuszmaul, J.; Roberts, C.
2005-12-01
For the past thirty-five years the land loss along the Louisiana Coast has been recognized as a growing problem. One of the clearest indicators of this land loss is that in 2000 smooth cord grass (Spartina alterniflora) was turning brown well before its normal hibernation period. Over 100,000 acres of marsh were affected by the 2000 browning. In 2001 data were collected using low altitude helicopter based transects of the coast, with 7,400 data points being collected by researchers at the USGS, National Wetlands Research Center, and Louisiana Department of Natural Resources. The surveys contained data describing the characteristics of the marsh, including latitude, longitude, marsh condition, marsh color, percent vegetated, and marsh die-back. Creating a model that combines remote sensing images, field data, and statistical analysis to develop a methodology for estimating the margin of error in measurements of coastal land loss (erosion) is the ultimate goal of the study. A model was successfully created using a series of band combinations (used as predictive variables). The most successful band combinations or predictive variables were the braud value [(Sum Visible TM Bands - Sum Infrared TM Bands)/(Sum Visible TM Bands + Sum Infrared TM Bands)], TM band 7/TM band 2, brightness, NDVI, wetness, vegetation index, and a 7x7 autocovariate nearest neighbor floating window. The model values were used to generate the logistic regression model. A new image was created based on the logistic regression probability equation where each pixel represents the probability of finding water or non-water at that location in each image. Pixels within each image that have a high probability of representing water have a value close to 1 and pixels with a low probability of representing water have a value close to 0. A logistic regression model is proposed that uses seven independent variables. This model yields an accurate classification in 86.5% of the locations considered in the 1997 and 2001 survey locations. When the logistic regression was modeled to the satellite imagery of the entire Louisiana Coast study area, a statewide loss was estimated to be 358 mi² to 368 mi², from 1997 to 2001, using two different methods for estimating land loss.
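The per-pixel probability image described above can be sketched in two steps: compute band-derived predictors such as the normalized difference given in brackets, then pass a linear combination of the predictors through the logistic function. The coefficients, intercept and band sums below are hypothetical placeholders, not the fitted model.

```python
import math


def normalized_difference(visible_sum, infrared_sum):
    """(Sum visible - sum infrared) / (sum visible + sum infrared),
    the form of the bracketed band combination in the abstract."""
    return (visible_sum - infrared_sum) / (visible_sum + infrared_sum)


def water_probability(predictors, coefficients, intercept):
    """Logistic regression probability that a pixel is water."""
    z = intercept + sum(c * x for c, x in zip(coefficients, predictors))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical band sums and model parameters for a single pixel.
index = normalized_difference(visible_sum=3.0, infrared_sum=1.0)
p_water = water_probability([index], coefficients=[4.0], intercept=-1.0)
print(index, round(p_water, 3))
```

Applying `water_probability` to every pixel yields an image whose values lie between 0 (non-water) and 1 (water).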
Sampson, Maureen L; Gounden, Verena; van Deventer, Hendrik E; Remaley, Alan T
2016-02-01
The main drawback of the periodic analysis of quality control (QC) material is that test performance is not monitored in time periods between QC analyses, potentially leading to the reporting of faulty test results. The objective of this study was to develop a patient based QC procedure for the more timely detection of test errors. Results from a Chem-14 panel measured on the Beckman LX20 analyzer were used to develop the model. Each test result was predicted from the other 13 members of the panel by multiple regression, which resulted in correlation coefficients between the predicted and measured result of >0.7 for 8 of the 14 tests. A logistic regression model, which utilized the measured test result, the predicted test result, the day of the week and time of day, was then developed for predicting test errors. The output of the logistic regression was tallied by a daily CUSUM approach and used to predict test errors, with a fixed specificity of 90%. The mean average run length (ARL) before error detection by CUSUM-Logistic Regression (CSLR) was 20 with a mean sensitivity of 97%, which was considerably shorter than the mean ARL of 53 (sensitivity 87.5%) for a simple prediction model that only used the measured result for error detection. A CUSUM-Logistic Regression analysis of patient laboratory data can be an effective approach for the rapid and sensitive detection of clinical laboratory errors. Published by Elsevier Inc.
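Tallying the logistic regression output with a CUSUM, as described above, can be sketched with the textbook one-sided form: accumulate each predicted error probability minus a reference value, reset at zero, and signal when the sum exceeds a limit. The reference value and limit here are hypothetical tuning parameters, and the published procedure may differ in detail.

```python
def cusum_alarm(error_probs, reference=0.5, limit=1.0):
    """One-sided CUSUM over predicted error probabilities.

    Returns the 0-based index of the first result at which the cumulative
    sum exceeds the limit, or None if no alarm is raised.
    """
    s = 0.0
    for i, p in enumerate(error_probs):
        s = max(0.0, s + (p - reference))  # never let the sum go negative
        if s > limit:
            return i
    return None


# Hypothetical logistic regression outputs for consecutive patient results.
in_control = [0.25, 0.0, 0.25, 0.5]
drifting = [0.25, 1.0, 1.0, 1.0]
print(cusum_alarm(in_control), cusum_alarm(drifting))
```

The number of results processed before an alarm corresponds to the run length; a lower limit shortens it at the cost of more false alarms.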
Automatic prediction of solar flares and super geomagnetic storms
NASA Astrophysics Data System (ADS)
Song, Hui
Space weather is the response of our space environment to the constantly changing Sun. As new technology advances, mankind has become more and more dependent on space systems and satellite-based services. A geomagnetic storm, a disturbance in Earth's magnetosphere, may produce many harmful effects on Earth. Solar flares and Coronal Mass Ejections (CMEs) are believed to be the major causes of geomagnetic storms. Thus, establishing a real time forecasting method for them is very important in space weather study. The topics covered in this dissertation are: the relationship between magnetic gradient and magnetic shear of solar active regions; the relationship between solar flare index and magnetic features of solar active regions; based on these relationships a statistical ordinal logistic regression model is developed to predict the probability of solar flare occurrences in the next 24 hours; and finally the relationship between magnetic structures of CME source regions and geomagnetic storms, in particular, the super storms when the Dst index decreases below -200 nT is studied and proved to be able to predict those super storms. The results are briefly summarized as follows: (1) There is a significant correlation between magnetic gradient and magnetic shear of active region. Furthermore, compared with magnetic shear, magnetic gradient might be a better proxy to locate where a large flare occurs. It appears to be more accurate in identification of sources of X-class flares than M-class flares; (2) Flare index, defined by weighting the SXR flares, is proved to have positive correlation with three magnetic features of active region; (3) A statistical ordinal logistic regression model is proposed for solar flare prediction. The results are much better than those data published in the NASA/SDAC service, and comparable to the data provided by the NOAA/SEC complicated expert system. To our knowledge, this is the first time that logistic regression model has been applied in solar physics to predict flare occurrences; (4) The magnetic orientation angle θ, determined from a potential field model, is proved to be able to predict the probability of super geomagnetic storms (Dst ≤ -200 nT). The results show that those active regions associated with |θ| < 90° are more likely to cause a super geomagnetic storm.
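An ordinal logistic regression of the kind mentioned above is often formulated as a proportional-odds model, in which the cumulative probability of being at or below category j is a logistic function of a threshold minus the linear predictor. The sketch below assumes that formulation and sign convention; the thresholds and linear predictor are hypothetical and are not the dissertation's fitted values.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probabilities(eta, thresholds):
    """Category probabilities under a proportional-odds model.

    P(Y <= j) = sigmoid(thresholds[j] - eta), with thresholds in increasing
    order; individual category probabilities are successive differences.
    """
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    probs = [cumulative[0]]
    probs += [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]
    return probs


# Hypothetical linear predictor from magnetic features and two thresholds,
# giving three ordered flare classes (for example: weaker, M-class, X-class).
probs = ordinal_probabilities(eta=0.3, thresholds=[-1.0, 1.0])
print([round(p, 3) for p in probs])
```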
Flynn-Evans, Erin E.; Lockley, Steven W.
2016-01-01
Study Objectives: There is currently no questionnaire-based pre-screening tool available to detect non-24-hour sleep-wake rhythm disorder (N24HSWD) among blind patients. Our goal was to develop such a tool, derived from gold standard, objective hormonal measures of circadian entrainment status, for the detection of N24HSWD among those with visual impairment. Methods: We evaluated the contribution of 40 variables in their ability to predict N24HSWD among 127 blind women, classified using urinary 6-sulfatoxymelatonin period, an objective marker of circadian entrainment status in this population. We subjected the 40 candidate predictors to 1,000 bootstrapped iterations of a logistic regression forward selection model to predict N24HSWD, with model inclusion set at the p < 0.05 level. We removed any predictors that were not selected at least 1% of the time in the 1,000 bootstrapped models and applied a second round of 1,000 bootstrapped logistic regression forward selection models to the remaining 23 candidate predictors. We included all questions that were selected at least 10% of the time in the final model. We subjected the selected predictors to a final logistic regression model to predict N24HSWD over 1,000 bootstrapped models to calculate the concordance statistic and adjusted optimism of the final model. We used this information to generate a predictive model and determined the sensitivity and specificity of the model. Finally, we applied the model to a cohort of 1,262 blind women who completed the survey, but did not collect urine samples. Results: The final model consisted of eight questions. The concordance statistic, adjusted for bootstrapping, was 0.85. The positive predictive value was 88%, the negative predictive value was 79%. Applying this model to our larger dataset of women, we found that 61% of those without light perception, and 27% with some degree of light perception, would be referred for further screening for N24HSWD.
Conclusions: Our model has predictive utility sufficient to serve as a pre-screening questionnaire for N24HSWD among the blind. Citation: Flynn-Evans EE, Lockley SW. A pre-screening questionnaire to predict non-24-hour sleep-wake rhythm disorder (N24HSWD) among the blind. J Clin Sleep Med 2016;12(5):703–710. PMID:26951421
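The bootstrap step described in the Methods above, counting how often each predictor is selected across resampled datasets, can be sketched generically. Here `select` is a hypothetical callable standing in for the forward-selection logistic regression, and `toy_select` is a deliberately simple placeholder rule; neither reproduces the authors' procedure.

```python
import random
from collections import Counter


def selection_frequency(rows, select, n_boot=1000, seed=0):
    """Fraction of bootstrap resamples in which each predictor is selected.

    `select` takes a resampled list of rows and returns the names of the
    predictors it keeps (a stand-in for forward selection).
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]  # resample with replacement
        counts.update(set(select(sample)))
    return {name: count / n_boot for name, count in counts.items()}


def toy_select(sample):
    """Placeholder rule: keep "q1" when its mean differs between outcome groups."""
    cases = [r["q1"] for r in sample if r["y"] == 1]
    controls = [r["q1"] for r in sample if r["y"] == 0]
    if not cases or not controls:
        return []
    diff = abs(sum(cases) / len(cases) - sum(controls) / len(controls))
    return ["q1"] if diff > 0.3 else []


# Hypothetical questionnaire rows: one binary answer and a binary outcome.
rows = [{"q1": 1, "y": 1}] * 8 + [{"q1": 0, "y": 1}] * 2 + \
       [{"q1": 0, "y": 0}] * 7 + [{"q1": 1, "y": 0}] * 3
freq = selection_frequency(rows, toy_select, n_boot=200)
print(freq)
```

Predictors whose frequency falls below a chosen cutoff (1% and then 10% in the abstract) would be dropped before the final model is fitted.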
Over- and undersupply in home care: a representative multicenter correlational study.
Lahmann, Nils A; Suhr, Ralf; Kuntz, Simone; Kottner, Jan
2015-04-01
Quality assurance and funding of care become a major challenge against the background of demographic changes in western societies. The primary aim of the study was to identify possible misclassification, respectively over and undersupply of care by comparing the Barthel Index of clients of home care service with the level of care (Stage 0, I, II, III) according to the statutory German long-term care insurance. In 2012, a multi-center point prevalence study of 878 randomly selected clients of 100 randomly selected home care services across Germany was conducted. According to a standardized study protocol, demographics, the Barthel Index and the nurses' professional judgment-whether a client requires more nursing care-were assessed. Associations of the Barthel items and professional judgment were analyzed using univariate (Chi-square) and multivariate (logistic regression and classification-regression-tree-models) statistics. In each level of care, the Barthel Index showed large variability e.g. in level II ranging from 0 to 100 points. Multivariate logistic regression regarding possible under- and oversupply revealed occasionally fecal incontinence (2.1; 95 % CI 1.2-3.7), urinary incontinence (2.0; 95 % CI 1.1-3.6), feeding (1.7; 95 % CI 1.0-2.9), immobility (0.2; 95 % CI 0.1-0.6) and to be female (1.8; 95 % CI 1.2-2.6) to be statistically significantly associated. The variability in Barthel Index in each level of care found in this study indicated a large general misclassification of home care clients according to their actual need of care. Professional caregivers identified occasional incontinence, help with eating and drinking and mobility (especially in female clients) as areas of possible under- and oversupply of care. The statutory German long-term care insurance classification should be modified according to the above finding to increase the quality of care in home care clients.
Comparison of statistical tests for association between rare variants and binary traits.
Bacanu, Silviu-Alin; Nelson, Matthew R; Whittaker, John C
2012-01-01
Genome-wide association studies have found thousands of common genetic variants associated with a wide variety of diseases and other complex traits. However, a large portion of the predicted genetic contribution to many traits remains unknown. One plausible explanation is that some of the missing variation is due to the effects of rare variants. Nonetheless, the statistical analysis of rare variants is challenging. A commonly used method is to contrast, within the same region (gene), the frequency of minor alleles at rare variants between cases and controls. However, this strategy is most useful under the assumption that the tested variants have similar effects. We previously proposed a method that can accommodate heterogeneous effects in the analysis of quantitative traits. Here we extend this method to include binary traits that can accommodate covariates. We use simulations for a variety of causal and covariate impact scenarios to compare the performance of the proposed method to standard logistic regression, C-alpha, SKAT, and EREC. We found that i) logistic regression methods perform well when the heterogeneity of the effects is not extreme and ii) SKAT and EREC have good performance under all tested scenarios but they can be computationally intensive. Consequently, it would be more computationally desirable to use a two-step strategy by (i) selecting promising genes by faster methods and ii) analyzing selected genes using SKAT/EREC. To select promising genes one can use (1) regression methods when effect heterogeneity is assumed to be low and the covariates explain a non-negligible part of trait variability, (2) C-alpha when heterogeneity is assumed to be large and covariates explain a small fraction of trait's variability and (3) the proposed trend and heterogeneity test when the heterogeneity is assumed to be non-trivial and the covariates explain a large fraction of trait variability.
Nonconvex Sparse Logistic Regression With Weakly Convex Regularization
NASA Astrophysics Data System (ADS)
Shen, Xinyue; Gu, Yuantao
2018-06-01
In this work we propose to fit a sparse logistic regression model by a weakly convex regularized nonconvex optimization problem. The idea is based on the finding that a weakly convex function as an approximation of the $\ell_0$ pseudo norm is able to better induce sparsity than the commonly used $\ell_1$ norm. For a class of weakly convex sparsity inducing functions, we prove the nonconvexity of the corresponding sparse logistic regression problem, and study its local optimality conditions and the choice of the regularization parameter to exclude trivial solutions. Despite the nonconvexity, a method based on proximal gradient descent is used to solve the general weakly convex sparse logistic regression, and its convergence behavior is studied theoretically. Then the general framework is applied to a specific weakly convex function, and a necessary and sufficient local optimality condition is provided. The solution method is instantiated in this case as an iterative firm-shrinkage algorithm, and its effectiveness is demonstrated in numerical experiments by both randomly generated and real datasets.
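The firm-shrinkage step named above can be illustrated with the standard firm thresholding operator, which zeroes small entries, leaves large entries untouched, and interpolates linearly in between. This sketch uses the textbook two-parameter form with hypothetical parameter names `lam` and `mu`; the paper's exact parameterization and step-size scaling may differ.

```python
import math


def firm_shrink(x, lam, mu):
    """Standard firm thresholding with 0 < lam < mu.

    |x| <= lam       -> 0
    lam < |x| <= mu  -> sign(x) * mu * (|x| - lam) / (mu - lam)
    |x| > mu         -> x (no shrinkage, unlike soft thresholding)
    """
    a = abs(x)
    if a <= lam:
        return 0.0
    if a <= mu:
        return math.copysign(mu * (a - lam) / (mu - lam), x)
    return x


# Apply the operator elementwise to a hypothetical gradient-step iterate.
iterate = [0.5, -1.5, 3.0]
print([firm_shrink(v, lam=1.0, mu=2.0) for v in iterate])
```

In a proximal gradient scheme, this operator would be applied elementwise to the weights after each gradient step on the logistic loss.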
A comparative study on entrepreneurial attitudes modeled with logistic regression and Bayes nets.
López Puga, Jorge; García García, Juan
2012-11-01
Entrepreneurship research is receiving increasing attention in our context, as entrepreneurs are key social agents involved in economic development. We compare the success of the dichotomic logistic regression model and the Bayes simple classifier to predict entrepreneurship, after manipulating the percentage of missing data and the level of categorization in predictors. A sample of undergraduate university students (N = 1230) completed five scales (motivation, attitude towards business creation, obstacles, deficiencies, and training needs) and we found that each of them predicted different aspects of the tendency to business creation. Additionally, our results show that the receiver operating characteristic (ROC) curve is affected by the rate of missing data in both techniques, but logistic regression seems to be more vulnerable when faced with missing data, whereas Bayes nets underperform slightly when categorization has been manipulated. Our study sheds light on the potential entrepreneur profile and we propose to use Bayesian networks as an additional alternative to overcome the weaknesses of logistic regression when missing data are present in applied research.
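The ROC comparison mentioned above is usually summarized by the area under the curve, which equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following stdlib-only sketch computes it by pairwise comparison; the scores and labels are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and
    negative scores; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of entrepreneurial intention.
scores = [0.9, 0.7, 0.4, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
print(auc(scores, labels))
```

Computing this value for each classifier at each level of missing data would give the kind of comparison the abstract describes.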
Comparison of cranial sex determination by discriminant analysis and logistic regression.
Amores-Ampuero, Anabel; Alemán, Inmaculada
2016-04-05
Various methods have been proposed for estimating dimorphism. The objective of this study was to compare sex determination results from cranial measurements using discriminant analysis or logistic regression. The study sample comprised 130 individuals (70 males) of known sex, age, and cause of death from San José cemetery in Granada (Spain). Measurements of 19 neurocranial dimensions and 11 splanchnocranial dimensions were subjected to discriminant analysis and logistic regression, and the percentages of correct classification were compared between the sex functions obtained with each method. The discriminant capacity of the selected variables was evaluated with a cross-validation procedure. The percentage accuracy with discriminant analysis was 78.2% for the neurocranium (82.4% in females and 74.6% in males) and 73.7% for the splanchnocranium (79.6% in females and 68.8% in males). These percentages were higher with logistic regression analysis: 85.7% for the neurocranium (in both sexes) and 94.1% for the splanchnocranium (100% in females and 91.7% in males).
Research design and statistical methods in Pakistan Journal of Medical Sciences (PJMS).
Akhtar, Sohail; Shah, Syed Wadood Ali; Rafiq, M; Khan, Ajmal
2016-01-01
This article compares the study designs and statistical methods used in 2005, 2010 and 2015 in Pakistan Journal of Medical Sciences (PJMS). Only original articles of PJMS were considered for the analysis. The articles were carefully reviewed for statistical methods and designs, and then recorded accordingly. The frequency of each statistical method and research design was estimated and compared with previous years. A total of 429 articles were evaluated (n=74 in 2005, n=179 in 2010, n=176 in 2015), of which 171 (40%) were cross-sectional and 116 (27%) were prospective study designs. A variety of statistical methods were found in the analysis. The most frequent methods include: descriptive statistics (n=315, 73.4%), chi-square/Fisher's exact tests (n=205, 47.8%) and Student's t-test (n=186, 43.4%). There was a significant increase in the use of statistical methods over the time period: t-test, chi-square/Fisher's exact test, logistic regression, epidemiological statistics, and non-parametric tests. This study shows that a diverse variety of statistical methods have been used in the research articles of PJMS and that their frequency increased from 2005 to 2015. However, descriptive statistics was the most frequent method of statistical analysis in the published articles, while cross-sectional was the most common study design.
Li, Baoyue; Lingsma, Hester F; Steyerberg, Ewout W; Lesaffre, Emmanuel
2011-05-23
Logistic random effects models are a popular tool to analyze multilevel (also called hierarchical) data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR; Bayesian approaches included WinBUGS, MLwiN (MCMC), R package MCMCglmm and SAS experimental procedure MCMC. Three data sets (the full data set and two sub-datasets) were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set also a proportional odds model with a random center effect was fitted. The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on a relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability.
There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient. On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. Thus, for a large data set there seems to be no explicit preference (of course if there is no preference from a philosophical point of view) for either a frequentist or Bayesian approach (if based on vague priors). The choice for a particular implementation may largely depend on the desired flexibility, and the usability of the package. For small data sets the random effects variances are difficult to estimate. In the frequentist approaches the MLE of this variance was often estimated as zero with a standard error that is either zero or could not be determined, while for Bayesian methods the estimates could depend on the chosen "non-informative" prior of the variance parameter. The starting value for the variance parameter may also be critical for the convergence of the Markov chain.
Hill, Andrew; Loh, Po-Ru; Bharadwaj, Ragu B.; Pons, Pascal; Shang, Jingbo; Guinan, Eva; Lakhani, Karim; Kilty, Iain
2017-01-01
Background: The association of differing genotypes with disease-related phenotypic traits offers great potential to both help identify new therapeutic targets and support stratification of patients who would gain the greatest benefit from specific drug classes. Development of low-cost genotyping and sequencing has made collecting large-scale genotyping data routine in population and therapeutic intervention studies. In addition, a range of new technologies is being used to capture numerous new and complex phenotypic descriptors. As a result, genotype and phenotype datasets have grown exponentially. Genome-wide association studies associate genotypes and phenotypes using methods such as logistic regression. As existing tools for association analysis limit the efficiency by which value can be extracted from increasing volumes of data, there is a pressing need for new software tools that can accelerate association analyses on large genotype-phenotype datasets. Results: Using open innovation (OI) and contest-based crowdsourcing, the logistic regression analysis in a leading, community-standard genetics software package (PLINK 1.07) was substantially accelerated. OI allowed us to do this in <6 months by providing rapid access to highly skilled programmers with specialized, difficult-to-find skill sets. Through a crowd-based contest a combination of computational, numeric, and algorithmic approaches was identified that accelerated the logistic regression in PLINK 1.07 by 18- to 45-fold. Combining contest-derived logistic regression code with coarse-grained parallelization, multithreading, and associated changes to data initialization code further developed through distributed innovation, we achieved an end-to-end speedup of 591-fold for a data set size of 6678 subjects by 645 863 variants, compared to PLINK 1.07's logistic regression. This represents a reduction in run time from 4.8 hours to 29 seconds.
Accelerated logistic regression code developed in this project has been incorporated into the PLINK2 project. Conclusions: Using iterative competition-based OI, we have developed a new, faster implementation of logistic regression for genome-wide association studies analysis. We present lessons learned and recommendations on running a successful OI process for bioinformatics. PMID:28327993
Hill, Andrew; Loh, Po-Ru; Bharadwaj, Ragu B; Pons, Pascal; Shang, Jingbo; Guinan, Eva; Lakhani, Karim; Kilty, Iain; Jelinsky, Scott A
2017-05-01
The association of differing genotypes with disease-related phenotypic traits offers great potential to both help identify new therapeutic targets and support stratification of patients who would gain the greatest benefit from specific drug classes. Development of low-cost genotyping and sequencing has made collecting large-scale genotyping data routine in population and therapeutic intervention studies. In addition, a range of new technologies is being used to capture numerous new and complex phenotypic descriptors. As a result, genotype and phenotype datasets have grown exponentially. Genome-wide association studies associate genotypes and phenotypes using methods such as logistic regression. As existing tools for association analysis limit the efficiency by which value can be extracted from increasing volumes of data, there is a pressing need for new software tools that can accelerate association analyses on large genotype-phenotype datasets. Using open innovation (OI) and contest-based crowdsourcing, the logistic regression analysis in a leading, community-standard genetics software package (PLINK 1.07) was substantially accelerated. OI allowed us to do this in <6 months by providing rapid access to highly skilled programmers with specialized, difficult-to-find skill sets. Through a crowd-based contest a combination of computational, numeric, and algorithmic approaches was identified that accelerated the logistic regression in PLINK 1.07 by 18- to 45-fold. Combining contest-derived logistic regression code with coarse-grained parallelization, multithreading, and associated changes to data initialization code further developed through distributed innovation, we achieved an end-to-end speedup of 591-fold for a data set size of 6678 subjects by 645 863 variants, compared to PLINK 1.07's logistic regression. This represents a reduction in run time from 4.8 hours to 29 seconds. 
Accelerated logistic regression code developed in this project has been incorporated into the PLINK2 project. Using iterative competition-based OI, we have developed a new, faster implementation of logistic regression for genome-wide association studies analysis. We present lessons learned and recommendations on running a successful OI process for bioinformatics. © The Author 2017. Published by Oxford University Press.
Lin, Chao-Cheng; Bai, Ya-Mei; Chen, Jen-Yeu; Hwang, Tzung-Jeng; Chen, Tzu-Ting; Chiu, Hung-Wen; Li, Yu-Chuan
2010-03-01
Metabolic syndrome (MetS) is an important side effect of second-generation antipsychotics (SGAs). However, many SGA-treated patients with MetS remain undetected. In this study, we trained and validated artificial neural network (ANN) and multiple logistic regression models without biochemical parameters to rapidly identify MetS in patients with SGA treatment. A total of 383 patients with a diagnosis of schizophrenia or schizoaffective disorder (DSM-IV criteria) with SGA treatment for more than 6 months were investigated to determine whether they met the MetS criteria according to the International Diabetes Federation. The data for these patients were collected between March 2005 and September 2005. The input variables of ANN and logistic regression were limited to demographic and anthropometric data only. All models were trained by randomly selecting two-thirds of the patient data and were internally validated with the remaining one-third of the data. The models were then externally validated with data from 69 patients from another hospital, collected between March 2008 and June 2008. The area under the receiver operating characteristic curve (AUC) was used to measure the performance of all models. Both the final ANN and logistic regression models had high accuracy (88.3% vs 83.6%), sensitivity (93.1% vs 86.2%), and specificity (86.9% vs 83.8%) to identify MetS in the internal validation set. The mean +/- SD AUC was high for both the ANN and logistic regression models (0.934 +/- 0.033 vs 0.922 +/- 0.035, P = .63). During external validation, high AUC was still obtained for both models. Waist circumference and diastolic blood pressure were the common variables that were left in the final ANN and logistic regression models. Our study developed accurate ANN and logistic regression models to detect MetS in patients with SGA treatment. The models are likely to provide a noninvasive tool for large-scale screening of MetS in this group of patients. 
(c) 2010 Physicians Postgraduate Press, Inc.
Analysis strategies for longitudinal attachment loss data.
Beck, J D; Elter, J R
2000-02-01
The purpose of this invited review is to describe and discuss methods currently in use to quantify the progression of attachment loss in epidemiological studies of periodontal disease, and to make recommendations for specific analytic methods based upon the particular design of the study and structure of the data. The review concentrates on the definition of incident attachment loss (ALOSS) and its component parts; measurement issues including thresholds and regression to the mean; methods of accounting for longitudinal change, including changes in means, changes in proportions of affected sites, incidence density, the effect of tooth loss and reversals, and repeated events; statistical models of longitudinal change, including the incorporation of the time element, use of linear, logistic or Poisson regression or survival analysis, and statistical tests; site vs person level of analysis, including statistical adjustment for correlated data; the strengths and limitations of ALOSS data. Examples from the Piedmont 65+ Dental Study are used to illustrate specific concepts. We conclude that incidence density is the preferred methodology to use for periodontal studies with more than one period of follow-up and that the use of studies not employing methods for dealing with complex samples, correlated data, and repeated measures does not take advantage of our current understanding of the site- and person-level variables important in periodontal disease and may generate biased results.
Asano, Junichi; Hirakawa, Akihiro
2017-01-01
The Cox proportional hazards cure model is a survival model incorporating a cure rate with the assumption that the population contains both uncured and cured individuals. It contains a logistic regression for the cure rate, and a Cox regression to estimate the hazard for uncured patients. A single predictive model for both the cure and hazard can be developed by using a cure model that simultaneously predicts the cure rate and hazards for uncured patients; however, model selection is a challenge because of the lack of a measure for quantifying the predictive accuracy of a cure model. Recently, we developed an area under the receiver operating characteristic curve (AUC) for determining the cure rate in a cure model (Asano et al., 2014), but the hazards measure for uncured patients was not resolved. In this article, we propose novel C-statistics that are weighted by the patients' cure status (i.e., cured, uncured, or censored cases) for the cure model. The operating characteristics of the proposed C-statistics and their confidence interval were examined by simulation analyses. We also illustrate methods for predictive model selection and for further interpretation of variables using the proposed AUCs and C-statistics via application to breast cancer data.
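For orientation, the structure described above can be written out as a standard mixture cure model. This is a sketch in conventional notation rather than the authors' own: the symbols $\pi$, $\gamma$, $\beta$, $z$, $x$ and $S_0$ are assumptions introduced here, and the logistic part is written for the probability of being uncured, so the cure rate is $1 - \pi(z)$.

```latex
% Population survival: cured individuals never experience the event.
S_{\mathrm{pop}}(t \mid x, z) = \bigl(1 - \pi(z)\bigr) + \pi(z)\, S_u(t \mid x)

% Logistic regression for the probability of being uncured.
\pi(z) = \frac{\exp(\gamma^{\top} z)}{1 + \exp(\gamma^{\top} z)}

% Cox proportional hazards model for the uncured subpopulation.
S_u(t \mid x) = S_0(t)^{\exp(\beta^{\top} x)}
```

An AUC evaluates how well $\pi(z)$ separates cured from uncured patients, while a C-statistic evaluates how well $\beta^{\top} x$ orders event times among the uncured, which is why the two parts need separate accuracy measures.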
Deletion Diagnostics for Alternating Logistic Regressions
Preisser, John S.; By, Kunthel; Perin, Jamie; Qaqish, Bahjat F.
2013-01-01
Deletion diagnostics are introduced for the regression analysis of clustered binary outcomes estimated with alternating logistic regressions, an implementation of generalized estimating equations (GEE) that estimates regression coefficients in a marginal mean model and in a model for the intracluster association given by the log odds ratio. The diagnostics are developed within an estimating equations framework that recasts the estimating functions for association parameters based upon conditional residuals into equivalent functions based upon marginal residuals. Extensions of earlier work on GEE diagnostics follow directly, including computational formulae for one-step deletion diagnostics that measure the influence of a cluster of observations on the estimated regression parameters and on the overall marginal mean or association model fit. The diagnostic formulae are evaluated with simulation studies and with an application concerning an assessment of factors associated with health maintenance visits in primary care medical practices. The application and the simulations demonstrate that the proposed cluster-deletion diagnostics for alternating logistic regressions are good approximations of their exact fully iterated counterparts. PMID:22777960
Logistic model analysis of neurological findings in Minamata disease and the predicting index.
Nakagawa, Masanori; Kodama, Tomoko; Akiba, Suminori; Arimura, Kimiyoshi; Wakamiya, Junji; Futatsuka, Makoto; Kitano, Takao; Osame, Mitsuhiro
2002-01-01
To establish a statistical diagnostic method to identify patients with Minamata disease (MD) considering factors of aging and sex, we analyzed the neurological findings in MD patients, inhabitants in a methylmercury polluted (MP) area, and inhabitants in a non-MP area. We compared the neurological findings in MD patients and inhabitants aged more than 40 years in the non-MP area. Based on the different frequencies of the neurological signs in the two groups, we devised the following formula to calculate the predicting index for MD: predicting index = $\frac{1}{1+e^{-x}} \times 100$ (the value of $x$ was calculated using the regression coefficients of each neurological finding obtained from logistic analysis; an index of 100 indicated MD, and 0, non-MD). Using this method, we found that 100% of male and 98% of female patients with MD (95 cases) gave predicting indices higher than 95. Five percent of the aged inhabitants in the MP area (598 inhabitants) and 0.2% of those in the non-MP area (558 inhabitants) gave predicting indices of 50 or higher. Our statistical diagnostic method for MD was useful in distinguishing MD patients from healthy elders based on their neurological findings.
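The predicting index is the logistic function of a linear score, scaled to 0-100. The sketch below shows the arithmetic only; the function name, the dictionary layout, and any coefficient values passed to it are hypothetical, since the abstract does not report the fitted coefficients.

```python
import math

def predicting_index(findings, coefficients, intercept):
    """Logistic predicting index on a 0-100 scale.

    findings:     dict mapping a finding name to its coded value (e.g. 0/1)
    coefficients: dict mapping the same names to regression coefficients
                  (hypothetical values; not taken from the study)
    intercept:    intercept of the logistic model
    """
    # Linear predictor x = intercept + sum of coefficient * finding.
    x = intercept + sum(coefficients[name] * value
                        for name, value in findings.items())
    return 100.0 / (1.0 + math.exp(-x))
```

A linear score of zero gives an index of exactly 50, which corresponds to the threshold of "50 or higher" mentioned for the inhabitants of the two areas.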
Knol, Mirjam J; van der Tweel, Ingeborg; Grobbee, Diederick E; Numans, Mattijs E; Geerlings, Mirjam I
2007-10-01
To determine the presence of interaction in epidemiologic research, typically a product term is added to the regression model. In linear regression, the regression coefficient of the product term reflects interaction as departure from additivity. However, in logistic regression it refers to interaction as departure from multiplicativity. Rothman has argued that interaction estimated as departure from additivity better reflects biologic interaction. So far, literature on estimating interaction on an additive scale using logistic regression only focused on dichotomous determinants. The objective of the present study was to provide the methods to estimate interaction between continuous determinants and to illustrate these methods with a clinical example. From the existing literature we derived the formulas to quantify interaction as departure from additivity between one continuous and one dichotomous determinant and between two continuous determinants using logistic regression. Bootstrapping was used to calculate the corresponding confidence intervals. To illustrate the theory with an empirical example, data from the Utrecht Health Project were used, with age and body mass index as risk factors for elevated diastolic blood pressure. The methods and formulas presented in this article are intended to assist epidemiologists to calculate interaction on an additive scale between two variables on a certain outcome. The proposed methods are included in a spreadsheet which is freely available at: http://www.juliuscenter.nl/additive-interaction.xls.
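As a point of reference, the well-known measure for two dichotomous determinants is the relative excess risk due to interaction (RERI), computed from the coefficients of a logistic model with a product term. The sketch below covers only that dichotomous case; it is not the continuous-determinant formula derived in the article, and the function and argument names are illustrative assumptions.

```python
import math

def reri(b1, b2, b3):
    """Relative excess risk due to interaction for two dichotomous
    determinants, using odds ratios as approximations of relative risks.

    b1, b2: log odds ratios of the two determinants
    b3:     coefficient of their product term
    RERI = OR11 - OR10 - OR01 + 1; zero means no additive interaction.
    """
    return math.exp(b1 + b2 + b3) - math.exp(b1) - math.exp(b2) + 1.0

# With odds ratios 2 and 3 and no product term (exact multiplicativity),
# the joint odds ratio is 6 and RERI = 6 - 2 - 3 + 1 = 2.
example = reri(math.log(2.0), math.log(3.0), 0.0)
```

The example illustrates the distinction drawn above: a product-term coefficient of zero means no departure from multiplicativity, yet the same model can still show a positive departure from additivity.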
Statistical primer: propensity score matching and its alternatives.
Benedetto, Umberto; Head, Stuart J; Angelini, Gianni D; Blackstone, Eugene H
2018-06-01
Propensity score (PS) methods offer certain advantages over more traditional regression methods to control for confounding by indication in observational studies. Although multivariable regression models adjust for confounders by modelling the relationship between covariates and outcome, the PS methods estimate the treatment effect by modelling the relationship between confounders and treatment assignment. Therefore, methods based on the PS are not limited by the number of events, and their use may be warranted when the number of confounders is large, or the number of outcomes is small. The PS is the probability for a subject to receive a treatment conditional on a set of baseline characteristics (confounders). The PS is commonly estimated using logistic regression, and it is used to match patients with similar distribution of confounders so that difference in outcomes gives unbiased estimate of treatment effect. This review summarizes basic concepts of the PS matching and provides guidance in implementing matching and other methods based on the PS, such as stratification, weighting and covariate adjustment.
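The matching step can be sketched as follows, assuming the propensity scores have already been estimated (for example by logistic regression). This is a minimal greedy 1:1 nearest-neighbour matcher without replacement; the function name, the dictionary inputs, and the caliper value of 0.05 are illustrative assumptions, not recommendations from the review.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score.

    treated, controls: dicts mapping a subject id to its propensity score
    caliper:           maximum allowed absolute difference in scores
    Returns a list of (treated_id, control_id) pairs; each control is
    used at most once, and treated subjects with no control inside the
    caliper are left unmatched.
    """
    available = dict(controls)
    pairs = []
    # Process treated subjects in order of increasing score.
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control on the propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs
```

Greedy matching depends on processing order and is not guaranteed to minimise the total distance; optimal matching, stratification, or weighting are the alternatives the review goes on to mention.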
Liu, Jian; Gao, Yun-Hua; Li, Ding-Dong; Gao, Yan-Chun; Hou, Ling-Mi; Xie, Ting
2014-01-01
To compare the value of contrast-enhanced ultrasound (CEUS) qualitative and quantitative analysis in the identification of breast tumor lumps. Qualitative and quantitative indicators of CEUS for 73 cases of breast tumor lumps were retrospectively analyzed by univariate and multivariate approaches. Logistic regression was applied and ROC curves were drawn for evaluation and comparison. The CEUS qualitative indicator-generated regression equation contained three indicators, namely enhanced homogeneity, diameter line expansion and peak intensity grading, which demonstrated prediction accuracy for benign and malignant breast tumor lumps of 91.8%; the quantitative indicator-generated regression equation only contained one indicator, namely the relative peak intensity, and its prediction accuracy was 61.5%. The corresponding areas under the ROC curve for qualitative and quantitative analyses were 91.3% and 75.7%, respectively, which exhibited a statistically significant difference by the Z test (P<0.05). The ability of CEUS qualitative analysis to identify breast tumor lumps is better than with quantitative analysis.
Engvall, Karin; Hult, M; Corner, R; Lampa, E; Norbäck, D; Emenius, G
2010-01-01
The aim was to develop a new model to identify residential buildings with higher frequencies of "SBS" than expected, "risk buildings". In 2005, 481 multi-family buildings with 10,506 dwellings in Stockholm were studied by a new stratified random sampling. A standardised self-administered questionnaire was used to assess "SBS", atopy and personal factors. The response rate was 73%. Statistical analysis was performed by multiple logistic regressions. Dwellers owning their building reported less "SBS" than those renting. There was a strong relationship between socio-economic factors and ownership. The regression model ended up with high explanatory values for age, gender, atopy and ownership. Applying our model, 9% of all residential buildings in Stockholm were classified as "risk buildings", with the highest proportion in houses built 1961-1975 (26%) and the lowest in houses built 1985-1990 (4%). To identify "risk buildings", it is necessary to adjust for ownership and population characteristics.
ERIC Educational Resources Information Center
Osborne, Jason W.
2012-01-01
Logistic regression is slowly gaining acceptance in the social sciences, and fills an important niche in the researcher's toolkit: being able to predict important outcomes that are not continuous in nature. While OLS regression is a valuable tool, it cannot routinely be used to predict outcomes that are binary or categorical in nature. These…
Warton, David I; Thibaut, Loïc; Wang, Yi Alice
2017-01-01
Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)-common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of "model-free bootstrap", adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods.
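The resampling idea can be sketched for the simplest discrete case, a Bernoulli response with fitted probabilities. This is a rough illustration based only on the description above, not the authors' implementation: the function names are hypothetical, and the randomised form of the PIT residual and the back-transformation through the fitted distribution are assumptions about how the method would be instantiated here.

```python
import random

def pit_residuals(y, p, rng):
    """Randomised PIT residuals for Bernoulli data.

    For each observation the residual is drawn uniformly between
    F(y - 1) and F(y), where F is the fitted Bernoulli CDF with
    success probability p_i.
    """
    residuals = []
    for yi, pi in zip(y, p):
        lower = 0.0 if yi == 0 else 1.0 - pi   # F(y - 1)
        upper = 1.0 - pi if yi == 0 else 1.0   # F(y)
        residuals.append(lower + rng.random() * (upper - lower))
    return residuals

def pit_trap_sample(y, p, rng):
    """One bootstrap data set: resample PIT residuals with replacement,
    then map each back through the fitted CDF of its own observation."""
    u = pit_residuals(y, p, rng)
    u_star = [rng.choice(u) for _ in u]
    # y* = 1 when the resampled residual exceeds F_i(0) = 1 - p_i.
    return [1 if us > 1.0 - pi else 0 for us, pi in zip(u_star, p)]
```

For multivariate data the abstract describes resampling whole rows of residuals, which is what would preserve the correlation between responses; the univariate sketch above does not show that step.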
Thibaut, Loïc; Wang, Yi Alice
2017-01-01
Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)—common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of “model-free bootstrap”, adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods. PMID:28738071
Prediction of sickness absence: development of a screening instrument
Duijts, S F A; Kant, IJ; Landeweerd, J A; Swaen, G M H
2006-01-01
Objectives: To develop a concise screening instrument for early identification of employees at risk for sickness absence due to psychosocial health complaints. Methods: Data from the Maastricht Cohort Study on “Fatigue at Work” were used to identify items to be associated with an increased risk of sickness absence. The analytical procedures univariate logistic regression, backward stepwise linear regression, and multiple logistic regression were successively applied. For both men and women, sum scores were calculated, and sensitivity and specificity rates of different cut‐off points on the screening instrument were defined. Results: In women, results suggested that feeling depressed, having a burnout, being tired, being less interested in work, experiencing obligatory change in working days, and living alone, were strong predictors of sickness absence due to psychosocial health complaints. In men, statistically significant predictors were having a history of sickness absence, compulsive thinking, being mentally fatigued, finding it hard to relax, lack of supervisor support, and having no hobbies. A potential cut‐off point of 10 on the screening instrument resulted in a sensitivity score of 41.7% for women and 38.9% for men, and a specificity score of 91.3% for women and 90.6% for men. Conclusions: This study shows that it is possible to identify predictive factors for sickness absence and to develop an instrument for early identification of employees at risk for sickness absence. The results of this study increase the possibility for both employers and policymakers to implement interventions directed at the prevention of sickness absence. PMID:16698807
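How sensitivity and specificity follow from a cut-off on a sum score can be shown with a short sketch. The function name and the convention that a score at or above the cut-off counts as a positive screen are assumptions made for this example; the abstract does not state which side of the cut-off is inclusive.

```python
def sensitivity_specificity(scores, outcomes, cutoff):
    """Sensitivity and specificity of a screening score at a cut-off.

    scores:   sum scores on the screening instrument
    outcomes: 1 if the person had the outcome (e.g. sickness absence), else 0
    cutoff:   a score >= cutoff is treated as a positive screen (assumption)
    """
    tp = fn = tn = fp = 0
    for score, outcome in zip(scores, outcomes):
        positive = score >= cutoff
        if outcome == 1:
            if positive:
                tp += 1
            else:
                fn += 1
        else:
            if positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```

Raising the cut-off trades sensitivity for specificity, which is consistent with the pattern reported above of high specificity (about 91%) alongside modest sensitivity (about 40%).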
Vincent, Agnès; Ayzac, Louis; Girard, Raphaële; Caillat-Vallet, Emmanuelle; Chapuis, Catherine; Depaix, Florence; Dumas, Anne-Marie; Gignoux, Chantal; Haond, Catherine; Lafarge-Leboucher, Joëlle; Launay, Carine; Tissot-Guerraz, Françoise; Fabry, Jacques
2008-03-01
To evaluate whether the adjusted rates of surgical site infection (SSI) and urinary tract infection (UTI) after cesarean delivery decrease in maternity units that perform active healthcare-associated infection surveillance. Trend analysis by means of multiple logistic regression. A total of 80 maternity units participating in the Mater Sud-Est surveillance network. A total of 37,074 cesarean deliveries were included in the surveillance from January 1, 1997, through December 31, 2003. We used a logistic regression model to estimate risk-adjusted post-cesarean delivery infection odds ratios. The variables included were the maternity units' annual rate of operative procedures, the level of dispensed neonatal care, the year of delivery, maternal risk factors, and the characteristics of cesarean delivery. The trend of risk-adjusted odds ratios for SSI and UTI during the study period was studied by linear regression. The crude rates of SSI and UTI after cesarean delivery were 1.5% (571 of 37,074 patients) and 1.8% (685 of 37,074 patients), respectively. During the study period, the decrease in SSI and UTI adjusted odds ratios was statistically significant (R=-0.823 [P=.023] and R=-0.906 [P=.005], respectively). Reductions of 48% in the SSI rate and 52% in the UTI rate were observed in the maternity units. These unbiased trends could be related to progress in preventive practices as a result of the increased dissemination of national standards and a collaborative surveillance with benchmarking of rates.
Ali Morowatisharifabad, Mohammad; Abdolkarimi, Mahdi; Asadpour, Mohammad; Fathollahi, Mahmood Sheikh; Balaee, Parisa
2018-04-15
Theory-based education tailored to target behaviour and group can be effective in promoting physical activity. The purpose of this study was to examine the predictive power of Protection Motivation Theory on intent and behaviour of physical activity in patients with type 2 diabetes. This descriptive study was conducted on 250 patients in Rafsanjan, Iran. To examine the scores of protection motivation theory structures, a researcher-made questionnaire was used. Its validity and reliability were confirmed. The level of physical activity was also measured by the International Short-form Physical Activity Inventory. Its validity and reliability were also approved. Data were analysed by statistical tests including correlation coefficient, chi-square, logistic regression and linear regression. The results revealed that there was a significant correlation between all the protection motivation theory constructs and the intention to do physical activity. The results showed that the theory structures were able to predict 60% of the variance of physical activity intention. The results of logistic regression demonstrated that increases in the scores of physical activity intent and self-efficacy increased the chance of a higher level of physical activity by 3.4 and 1.5 times, respectively (OR = 3.39 and 1.54). Considering the ability of protection motivation theory structures to explain the physical activity behaviour, interventional designs are suggested based on the structures of this theory, especially to improve self-efficacy as the most powerful factor in predicting physical activity intention and behaviour.
Competing risks models and time-dependent covariates
Barnett, Adrian; Graves, Nick
2008-01-01
New statistical models for analysing survival data in an intensive care unit context have recently been developed. Two models that offer significant advantages over standard survival analyses are competing risks models and multistate models. Wolkewitz and colleagues used a competing risks model to examine survival times for nosocomial pneumonia and mortality. Their model was able to incorporate time-dependent covariates and so examine how risk factors that changed with time affected the chances of infection or death. We briefly explain how an alternative modelling technique (using logistic regression) can more fully exploit time-dependent covariates for this type of data. PMID:18423067
Kumar, Amit; Karmarkar, Amol; Downer, Brian; Vashist, Amit; Adhikari, Deepak; Al Snih, Soham; Ottenbacher, Kenneth
2017-11-01
To compare the performances of 3 comorbidity indices, the Charlson Comorbidity Index, the Elixhauser Comorbidity Index, and the Centers for Medicare & Medicaid Services (CMS) risk adjustment model, Hierarchical Condition Category (HCC), in predicting post-acute discharge settings and hospital readmission for patients after joint replacement. A retrospective study of Medicare beneficiaries with total knee replacement (TKR) or total hip replacement (THR) discharged from hospitals in 2009-2011 (n = 607,349) was performed. Study outcomes were post-acute discharge setting and unplanned 30-, 60-, and 90-day hospital readmissions. Logistic regression models were built to compare the performance of the 3 comorbidity indices using C statistics. The base model included patient demographics and hospital use. Subsequent models included 1 of the 3 comorbidity indices. Additional multivariable logistic regression models were built to identify individual comorbid conditions associated with high risk of hospital readmissions. The 30-, 60-, and 90-day unplanned hospital readmission rates were 5.3%, 7.2%, and 8.5%, respectively. Patients were most frequently discharged to home health (46.3%), followed by skilled nursing facility (40.9%) and inpatient rehabilitation facility (12.7%). The C statistics for the base model in predicting post-acute discharge setting and 30-, 60-, and 90-day readmission in TKR and THR were between 0.63 and 0.67. Adding the Charlson Comorbidity Index, the Elixhauser Comorbidity Index, or HCC increased the C statistic minimally from the base model for predicting both discharge settings and hospital readmission. The health conditions most frequently associated with hospital readmission were diabetes mellitus, pulmonary disease, arrhythmias, and heart disease. The comorbidity indices and CMS-HCC demonstrated weak discriminatory ability to predict post-acute discharge settings and hospital readmission following joint replacement. 
© 2017, American College of Rheumatology.
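The C statistic used in the abstract above to compare the comorbidity indices can be read as the share of (event, non-event) pairs in which the model assigns the higher predicted risk to the event. The sketch below is a plain pairwise implementation under that definition. The scores and labels are hypothetical toy values, not data from the study, and the function name is an assumption for illustration.

```python
from itertools import product


def c_statistic(scores, labels):
    """Fraction of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical predicted probabilities and observed outcomes (1 = readmitted).
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
print(c_statistic(toy_scores, toy_labels))  # 3 of 4 pairs concordant -> 0.75
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is why values between 0.63 and 0.67 are described as weak discrimination.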
Foster, Guy M.; Graham, Jennifer L.
2016-04-06
The Kansas River is a primary source of drinking water for about 800,000 people in northeastern Kansas. Source-water supplies are treated by a combination of chemical and physical processes to remove contaminants before distribution. Advanced notification of changing water-quality conditions and cyanobacteria and associated toxin and taste-and-odor compounds provides drinking-water treatment facilities time to develop and implement adequate treatment strategies. The U.S. Geological Survey (USGS), in cooperation with the Kansas Water Office (funded in part through the Kansas State Water Plan Fund), and the City of Lawrence, the City of Topeka, the City of Olathe, and Johnson County Water One, began a study in July 2012 to develop statistical models at two Kansas River sites located upstream from drinking-water intakes. Continuous water-quality monitors have been operated and discrete-water quality samples have been collected on the Kansas River at Wamego (USGS site number 06887500) and De Soto (USGS site number 06892350) since July 2012. Continuous and discrete water-quality data collected during July 2012 through June 2015 were used to develop statistical models for constituents of interest at the Wamego and De Soto sites. Logistic models to continuously estimate the probability of occurrence above selected thresholds were developed for cyanobacteria, microcystin, and geosmin. Linear regression models to continuously estimate constituent concentrations were developed for major ions, dissolved solids, alkalinity, nutrients (nitrogen and phosphorus species), suspended sediment, indicator bacteria (Escherichia coli, fecal coliform, and enterococci), and actinomycetes bacteria. These models will be used to provide real-time estimates of the probability that cyanobacteria and associated compounds exceed thresholds and of the concentrations of other water-quality constituents in the Kansas River. 
The models documented in this report are useful for characterizing changes in water-quality conditions through time, characterizing potentially harmful cyanobacterial events, and indicating changes in water-quality conditions that may affect drinking-water treatment processes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boutilier, J; Chan, T; Lee, T
2014-06-15
Purpose: To develop a statistical model that predicts optimization objective function weights from patient geometry for intensity-modulation radiotherapy (IMRT) of prostate cancer. Methods: A previously developed inverse optimization method (IOM) is applied retrospectively to determine optimal weights for 51 treated patients. We use an overlap volume ratio (OVR) of bladder and rectum for different PTV expansions in order to quantify patient geometry in explanatory variables. Using the optimal weights as ground truth, we develop and train a logistic regression (LR) model to predict the rectum weight and thus the bladder weight. Post hoc, we fix the weights of the left femoral head, right femoral head, and an artificial structure that encourages conformity to the population average while normalizing the bladder and rectum weights accordingly. The population average of objective function weights is used for comparison. Results: The OVR at 0.7cm was found to be the most predictive of the rectum weights. The LR model performance is statistically significant when compared to the population average over a range of clinical metrics including bladder/rectum V53Gy, bladder/rectum V70Gy, and mean voxel dose to the bladder, rectum, CTV, and PTV. On average, the LR model predicted bladder and rectum weights that are both 63% closer to the optimal weights compared to the population average. The treatment plans resulting from the LR weights have, on average, a rectum V70Gy that is 35% closer to the clinical plan and a bladder V70Gy that is 43% closer. Similar results are seen for bladder V54Gy and rectum V54Gy. Conclusion: Statistical modelling from patient anatomy can be used to determine objective function weights in IMRT for prostate cancer. Our method allows the treatment planners to begin the personalization process from an informed starting point, which may lead to more consistent clinical plans and reduce overall planning time.
[Dementia, depression and activity of daily living as risk factors for falls in elderly patients].
Gostynski, M; Ajdacic-Gross, V; Heusser-Gretler, R; Gutzwiller, F; Michel, J P; Herrmann, F
2001-01-01
Falls among the elderly are a well-recognised public health problem. The purpose of the present study was to explore the relation between dementia, number of depressive symptoms, activities of daily living, setting, and risk of falling. Data for the analysis came from a cross-sectional study about dementia, depression, and disabilities, carried out in 1995/96 in Zurich and Geneva. The random sample, stratified by age and gender, consisted of 921 subjects aged 65 and more. The interview was conducted by means of the Canberra Interview for the Elderly, extended by a short questionnaire. The subject was classified as a faller if the subject and/or the informant had reported a fall within the last 12 months prior to the interview. Logistic regression analysis was used to determine the independent impact of dementia, depressive symptoms, and ADL-score on risk of falling. The stepwise logistic regression analysis revealed a statistically significant association between dementia (OR 2.14, 95% CI 1.15-3.96), two or three depressive symptoms (OR 1.64, 95% CI 1.04-2.60) as well as four or more depressive symptoms (OR 2.64, 95% CI 1.39-5.02) and the risk of falling. There was no statistically significant relationship between the studied risk factors and the risk of being a one-time faller. However, we found a strong positive association between dementia (OR 3.92, 95% CI 1.75-8.79), four or more depressive symptoms (OR 3.90, 95% CI 1.55-9.83) and the risk of being a recurrent faller. Moreover, residents of nursing homes (OR 8.50, 95% CI 2.18-33.22) and elderly aged 85 or more (OR 2.29, 95% CI 1.08-4.87) were at statistically significantly higher risk of sustaining recurrent falls. The results of the present study confirm that dementia and depression substantially increase the risk of falling.
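The odds ratios and confidence intervals reported above relate to logistic regression coefficients through the exponential function. The sketch below assumes a Wald-type interval, exp(beta ± z·SE); the abstract does not state how its intervals were computed, so this is an assumption. Under it, the code backs out the coefficient and standard error implied by the dementia estimate (OR 2.14, 95% CI 1.15-3.96) and rebuilds the interval as a consistency check. Function names are illustrative.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def coef_from_odds_ratio(or_point, ci_low, ci_high):
    """Recover (beta, SE) on the log-odds scale, assuming a Wald interval."""
    beta = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    return beta, se


def odds_ratio_with_ci(beta, se):
    """Return (OR, lower, upper) for a log-odds coefficient and its SE."""
    return (
        math.exp(beta),
        math.exp(beta - Z_95 * se),
        math.exp(beta + Z_95 * se),
    )


# Dementia and risk of falling, as reported in the abstract.
beta, se = coef_from_odds_ratio(2.14, 1.15, 3.96)
or_point, low, high = odds_ratio_with_ci(beta, se)
print(f"beta={beta:.3f} SE={se:.3f} OR={or_point:.2f} CI=({low:.2f}, {high:.2f})")
```

The rebuilt bounds land within rounding distance of the published ones, which is what one would expect if the interval is symmetric on the log-odds scale.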
Predicting Social Trust with Binary Logistic Regression
ERIC Educational Resources Information Center
Adwere-Boamah, Joseph; Hufstedler, Shirley
2015-01-01
This study used binary logistic regression to predict social trust with five demographic variables from a national sample of adult individuals who participated in The General Social Survey (GSS) in 2012. The five predictor variables were respondents' highest degree earned, race, sex, general happiness and the importance of personally assisting…
Species Composition at the Sub-Meter Level in Discontinuous Permafrost in Subarctic Sweden
NASA Astrophysics Data System (ADS)
Anderson, S. M.; Palace, M. W.; Layne, M.; Varner, R. K.; Crill, P. M.
2013-12-01
Northern latitudes are experiencing rapid warming. Wetlands underlain by permafrost are particularly vulnerable to warming, which results in changes in vegetative cover. Specific species have been associated with greenhouse gas emissions; therefore, knowledge of species compositional shift allows for the systematic change and quantification of emissions and changes in such emissions. Species composition varies on the sub-meter scale based on topography and other microsite environmental parameters. This complexity and the need to scale vegetation to the landscape level prove vital in our estimation of carbon dioxide (CO2) and methane (CH4) emissions and dynamics. Stordalen Mire (68°21'N, 18°49'E) in Abisko is located at the edge of the discontinuous permafrost zone. This provides a unique opportunity to analyze multiple vegetation communities in close proximity. To do this, we randomly selected 25 1x1 meter plots that were representative of five major cover types: semi-wet, wet, hummock, tall graminoid, and tall shrub. We used a quadrat with 64 subplots and measured areal percent cover for 24 species. We collected ground-based remote sensing (RS) data at each plot to determine species composition using an ADC-lite (near infrared, red, green) and GoPro (red, blue, green). We normalized each image based on a Teflon white chip placed in each image. Textural analysis was conducted on each image for entropy, angular second moment, and lacunarity. A logistic regression was developed to examine vegetation cover types and remote sensing parameters. We used a multiple linear regression with forward stepwise variable selection. We found statistical differences in species composition and diversity indices between vegetation cover types. In addition, we were able to build a regression model to significantly estimate vegetation cover type as well as percent cover for specific key vegetative species.
This ground-based remote sensing allows for quick quantification of vegetation cover and species and also provides the framework for scaling to satellite image data to estimate species composition and shift on the landscape level. To determine diversity within our plots we calculated species richness and the Shannon Index. We found that there were statistically different species compositions within each vegetation cover type and also determined which species were indicative of cover type. Our logistic regression was able to significantly classify vegetation cover types based on RS parameters. Our multiple regression analysis indicated that Betula nana (Dwarf Birch) (r2 = 0.48, p < 0.0001) and Sphagnum (r2 = 0.59, p < 0.0001) were statistically significant with respect to RS parameters. We suggest that ground-based remote sensing methods may provide a unique and efficient method to quantify vegetation across the landscape in northern latitude wetlands.
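The two diversity measures named above, species richness and the Shannon Index, can be sketched as follows. Richness is the count of species with non-zero cover, and the Shannon Index is H = -Σ p_i ln p_i over the cover proportions p_i. The plot values and the placeholder species names "species C" and "species D" are hypothetical, and the natural logarithm is an assumption since the abstract does not state the base.

```python
import math


def species_richness(cover):
    """Number of species with non-zero cover in a plot."""
    return sum(1 for c in cover.values() if c > 0)


def shannon_index(cover):
    """Shannon Index H = -sum(p * ln p) over species cover proportions."""
    total = sum(cover.values())
    if total <= 0:
        raise ValueError("plot has no vegetation cover")
    return -sum(
        (c / total) * math.log(c / total) for c in cover.values() if c > 0
    )


# Hypothetical areal percent cover for a single 1x1 m plot.
toy_plot = {
    "Betula nana": 25.0,
    "Sphagnum": 25.0,
    "species C": 25.0,
    "species D": 25.0,
}
print(species_richness(toy_plot), shannon_index(toy_plot))
```

With four equally abundant species the index equals ln 4, its maximum for a richness of four; uneven cover lowers it.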
Otwombe, Kennedy N.; Petzold, Max; Martinson, Neil; Chirwa, Tobias
2014-01-01
Background: Research in the predictors of all-cause mortality in HIV-infected people has widely been reported in literature. Making an informed decision requires understanding the methods used. Objectives: We present a review on study designs, statistical methods and their appropriateness in original articles reporting on predictors of all-cause mortality in HIV-infected people between January 2002 and December 2011. Statistical methods were compared between 2002–2006 and 2007–2011. Time-to-event analysis techniques were considered appropriate. Data Sources: Pubmed/Medline. Study Eligibility Criteria: Original English-language articles were abstracted. Letters to the editor, editorials, reviews, systematic reviews, meta-analysis, case reports and any other ineligible articles were excluded. Results: A total of 189 studies were identified (n = 91 in 2002–2006 and n = 98 in 2007–2011) out of which 130 (69%) were prospective and 56 (30%) were retrospective. One hundred and eighty-two (96%) studies described their sample using descriptive statistics while 32 (17%) made comparisons using t-tests. Kaplan-Meier methods for time-to-event analysis were commonly used in the earlier period (n = 69, 76% vs. n = 53, 54%, p = 0.002). Predictors of mortality in the two periods were commonly determined using Cox regression analysis (n = 67, 75% vs. n = 63, 64%, p = 0.12). Only 7 (4%) used advanced survival analysis methods of Cox regression analysis with frailty in which 6 (3%) were used in the later period. Thirty-two (17%) used logistic regression while 8 (4%) used other methods. There were significantly more articles from the first period using appropriate methods compared to the second (n = 80, 88% vs. n = 69, 70%, p-value = 0.003). Conclusion: Descriptive statistics and survival analysis techniques remain the most common methods of analysis in publications on predictors of all-cause mortality in HIV-infected cohorts while prospective research designs are favoured.
Sophisticated techniques of time-dependent Cox regression and Cox regression with frailty are scarce. This motivates for more training in the use of advanced time-to-event methods. PMID:24498313
Statistical models to predict type 2 diabetes remission after bariatric surgery.
Ramos-Levi, Ana M; Matia, Pilar; Cabrerizo, Lucio; Barabash, Ana; Sanchez-Pernaute, Andres; Calle-Pascual, Alfonso L; Torres, Antonio J; Rubio, Miguel A
2014-09-01
Type 2 diabetes (T2D) remission may be achieved after bariatric surgery (BS), but rates vary according to patients' baseline characteristics. The present study evaluates the relevance of several preoperative factors and develops statistical models to predict T2D remission 1 year after BS. We retrospectively studied 141 patients (57.4% women), with a preoperative diagnosis of T2D, who underwent BS in a single center (2006-2011). Anthropometric and glucose metabolism parameters before surgery and at 1-year follow-up were recorded. Remission of T2D was defined according to consensus criteria: HbA1c <6%, fasting glucose (FG) <100 mg/dL, absence of pharmacologic treatment. The influence of several preoperative factors was explored and different statistical models to predict T2D remission were elaborated using logistic regression analysis. Three preoperative characteristics considered individually were identified as the most powerful predictors of T2D remission: C-peptide (R2 = 0.249; odds ratio [OR] 1.652, 95% confidence interval [CI] 1.181-2.309; P = 0.003), T2D duration (R2 = 0.197; OR 0.869, 95% CI 0.808-0.935; P < 0.001), and previous insulin therapy (R2 = 0.165; OR 4.670, 95% CI 2.257-9.665; P < 0.001). High C-peptide levels, a shorter duration of T2D, and the absence of insulin therapy favored remission. Different multivariate logistic regression models were designed. When considering sex, T2D duration, and insulin treatment, remission was correctly predicted in 72.4% of cases. The model that included age, FG and C-peptide levels resulted in 83.7% correct classifications. When sex, FG, C-peptide, insulin treatment, and percentage weight loss were considered, correct classification of T2D remission was achieved in 95.9% of cases. Preoperative characteristics determine T2D remission rates after BS to different extents. The use of statistical models may help clinicians reliably predict T2D remission rates after BS. 
© 2014 Ruijin Hospital, Shanghai Jiaotong University School of Medicine and Wiley Publishing Asia Pty Ltd.
Rural-urban disparity in oral health-related quality of life.
Gaber, Amal; Galarneau, Chantal; Feine, Jocelyne S; Emami, Elham
2018-04-01
The objective of this population-based cross-sectional study was to estimate rural-urban disparity in the oral health-related quality of life (OHRQoL) of the Quebec adult population. A 2-stage sampling design was used to collect data from the 1788 parents/caregivers of schoolchildren living in the 8 regions of the province of Quebec in Canada. Andersen's behavioural model for health services utilization was used as a conceptual framework. Place of residency was defined according to the Statistics Canada Census Metropolitan Area and Census Agglomeration Influenced Zone classification. The outcome of interest was OHRQoL measured using the Oral Health Impact Profile (OHIP)-14 validated questionnaire. Data weighting was applied, and the prevalence, extent and severity of negative oral health impacts were calculated. Statistical analyses included descriptive statistics, bivariate analyses and binary logistic regression. The prevalence of poor OHRQoL was significantly higher in rural areas than in urban zones (P = .02). Rural residents reported a significantly higher prevalence of negative daily-life impacts in the pain, psychological discomfort and social disability OHIP domains (P < .05). Additionally, the rural population showed a greater number of negative oral health impacts (P = .03). There was no significant rural-urban difference in the severity of poor oral health. Logistic regression indicated that the prevalence of poor OHRQoL was significantly related to place of residency (OR = 1.6; 95% CI = 1.1-2.5; P = .022), perceived oral health (OR = 9.4; 95% CI = 5.7-15.5; P < .001), dental treatment needs factors (perceived need for dental treatment, pain, dental care seeking) (OR = 8.7; 95% CI = 4.8-15.6; P < .001) and education (OR = 2.7; 95% CI = 1.8-3.9; P < .001).
The results of this study suggest a potential difference in OHRQoL of Quebec rural and urban populations, and a need to develop strategies to promote oral health outcomes, specifically for rural residents. Further studies are needed to confirm these results. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Hung, Shih-Chiang; Kung, Chia-Te; Hung, Chih-Wei; Liu, Ber-Ming; Liu, Jien-Wei; Chew, Ghee; Chuang, Hung-Yi; Lee, Wen-Huei; Lee, Tzu-Chi
2014-08-23
The adverse effects of delayed admission to the intensive care unit (ICU) have been recognized in previous studies. However, the definition of delayed admission varies across studies. This study proposed a model to define "delayed admission", and explored the effect of ICU-waiting time on patients' outcomes. This retrospective cohort study included non-traumatic adult patients on mechanical ventilation in the emergency department (ED), from July 2009 to June 2010. The primary outcome measures were 21-ventilator-day mortality and prolonged hospital stay (over 30 days). Cox regression and logistic regression models were used for multivariate analysis. Non-delayed ICU waiting was defined as a period in which the time effect on mortality was not statistically significant in a Cox regression model. To identify a suitable cut-off point between "delayed" and "non-delayed", subsets of the overall data were formed based on ICU-waiting time, and the hazard ratio per ICU-waiting hour in each subset was iteratively calculated. The cut-off time was then used to evaluate the impact of delayed ICU admission on mortality and prolonged length of hospital stay. The final analysis included 1,242 patients. The time effect on mortality emerged after 4 hours; thus, we defined an ICU-waiting time in the ED of > 4 hours as delayed. By logistic regression analysis, delayed ICU admission affected the outcomes of 21-ventilator-day mortality and prolonged hospital stay, with odds ratios of 1.41 (95% confidence interval, 1.05 to 1.89) and 1.56 (95% confidence interval, 1.07 to 2.27), respectively. For patients on mechanical ventilation at the ED, delayed ICU admission is associated with a higher probability of mortality and additional resource expenditure. A benchmark waiting time of no more than 4 hours for ICU admission is recommended.
Why do some studies find that CPR fraction is not a predictor of survival?
Wik, Lars; Olsen, Jan-Aage; Persse, David; Sterz, Fritz; Lozano, Michael; Brouwer, Marc A; Westfall, Mark; Souders, Chris M; Travis, David T; Herken, Ulrich R; Lerner, E Brooke
2016-07-01
An 80% chest compression fraction (CCF) during resuscitation is recommended. However, heterogeneous results in CCF studies were found during the 2015 Consensus on Science (CoS), which may be because chest compressions are stopped for a wide variety of reasons including providing lifesaving care, provider distraction, fatigue, confusion, and inability to perform lifesaving skills efficiently. We examined the effect of confounding variables on the ability of CCF to predict cardiac arrest survival. This was a secondary analysis of emergency medical services (EMS) treated out-of-hospital cardiac arrest (OHCA) patients who received manual compressions. CCF (percent of time patients received compressions) was determined from electronic defibrillator files. Two-sample Wilcoxon rank sum tests or regression determined statistical associations between CCF and age, gender, bystander CPR, public location, witnessed arrest, shockable rhythm, resuscitation duration, study site, and number of shocks. Univariate and multivariate logistic regressions were used to determine the effect of CCF on survival. Of 2132 patients with manual compressions, 1997 had complete data. Shockable rhythm (p<0.001), public location (p<0.004), treatment duration (p<0.001), and number of shocks (p<0.001) were associated with lower CCF. Univariate logistic regression found that CCF was inversely associated with survival (OR 0.07; 95% CI 0.01-0.36). Multivariate regression controlling for factors associated with survival and/or CCF found that increasing CCF was associated with survival (OR 6.34; 95% CI 1.02-39.5). CCF cannot be looked at in isolation as a predictor of survival, but in the context of other resuscitation activities. When controlling for the effects of other resuscitation activities, a higher CCF is predictive of survival. This may explain the heterogeneity of findings during the CoS review. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
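The reversal described above, where an unadjusted odds ratio falls below 1 and the adjusted one rises above 1, can be illustrated with a small stratified example. The sketch below is not the study's method: it swaps in a Mantel-Haenszel stratified odds ratio for the multivariate logistic regression, and every count is hypothetical, chosen only so that the confounder (shockable rhythm) is linked to both lower CCF and higher survival.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a, b = exposed survived/died; c, d = unexposed."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata of 2x2 tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical counts per stratum:
# (high-CCF survived, high-CCF died, low-CCF survived, low-CCF died)
shockable = (30, 70, 100, 300)
non_shockable = (20, 380, 4, 96)

# Collapsing the strata ignores rhythm and gives the crude table.
crude = tuple(x + y for x, y in zip(shockable, non_shockable))

print("crude OR:", round(odds_ratio(*crude), 3))
print("shockable OR:", round(odds_ratio(*shockable), 3))
print("non-shockable OR:", round(odds_ratio(*non_shockable), 3))
print("Mantel-Haenszel OR:", round(mantel_haenszel_or([shockable, non_shockable]), 3))
```

In this toy setup high CCF looks harmful in the crude table and beneficial within each rhythm stratum, which mirrors the direction of the abstract's finding without reproducing its estimates.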
Somma, Francesco; Cammarota, Giuseppe; Plotino, Gianluca; Grande, Nicola M; Pameijer, Cornelis H
2008-04-01
The aim of this study was to compare the effectiveness of the Mtwo R (Sweden & Martina, Padova, Italy), ProTaper retreatment files (Dentsply-Maillefer, Ballaigues, Switzerland), and a Hedström manual technique in the removal of three different filling materials (gutta-percha, Resilon [Resilon Research LLC, Madison, CT], and EndoRez [Ultradent Products Inc, South Jordan, UT]) during retreatment. Ninety single-rooted straight premolars were instrumented and randomly divided into 9 groups of 10 teeth each (n = 10) with regards to filling material and instrument used. For all roots, the following data were recorded: procedural errors, time of retreatment, apically extruded material, canal wall cleanliness through optical stereomicroscopy (OSM), and scanning electron microscopy (SEM). A linear regression analysis and three logistic regression analyses were performed to assess the level of significance set at p = 0.05. The results indicated that the overall regression models were statistically significant. The Mtwo R, ProTaper retreatment files, and Resilon filling material had a positive impact in reducing the time for retreatment. Both ProTaper retreatment files and Mtwo R showed a greater extrusion of debris. For both OSM and SEM logistic regression models, the root canal apical third had the greatest impact on the score values. EndoRez filling material resulted in cleaner root canal walls using OSM analysis, whereas Resilon filling material and both engine-driven NiTi rotary techniques resulted in less clean root canal walls according to SEM analysis. In conclusion, all instruments left remnants of filling material and debris on the root canal walls irrespective of the root filling material used. Both the engine-driven NiTi rotary systems proved to be safe and fast devices for the removal of endodontic filling material.
Delva, J; Spencer, M S; Lin, J K
2000-01-01
This article compares estimates of the relative odds of nitrite use obtained from weighted unconditional logistic regression with estimates obtained from conditional logistic regression after post-stratification and matching of cases with controls by neighborhood of residence. We illustrate these methods by comparing the odds associated with nitrite use among adults of four racial/ethnic groups, with and without a high school education. We used aggregated data from the 1994-B through 1996 National Household Survey on Drug Abuse (NHSDA). Differences between the methods and implications for analysis and inference are discussed.