Algamal, Zakariya Yahya; Lee, Muhammad Hisyam
2015-12-01
Cancer classification and gene selection in high-dimensional data have been popular research topics in genetics and molecular biology. Recently, adaptive regularized logistic regression using the elastic net regularization, known as the adaptive elastic net, has been successfully applied in high-dimensional cancer classification to estimate gene coefficients and perform gene selection simultaneously. The adaptive elastic net originally used elastic net estimates as the initial weight; however, this weight may be undesirable for two reasons: first, the elastic net estimator is biased in selecting genes; second, it does not perform well when the pairwise correlations between variables are not high. Adjusted adaptive regularized logistic regression (AAElastic) is proposed to address these issues while simultaneously encouraging grouping effects. The real-data results indicate that AAElastic selects genes more consistently than the three competing regularization methods. Additionally, the classification performance of AAElastic is comparable to the adaptive elastic net and better than the other regularization methods. Thus, we conclude that AAElastic is a reliable adaptive regularized logistic regression method for high-dimensional cancer classification.
Li, Li; Brumback, Babette A; Weppelmann, Thomas A; Morris, J Glenn; Ali, Afsar
2016-08-15
Motivated by an investigation of the effect of surface water temperature on the presence of Vibrio cholerae in water samples collected from different fixed surface water monitoring sites in Haiti in different months, we investigated methods to adjust for unmeasured confounding due to either of the two crossed factors site and month. In the process, we extended previous methods that adjust for unmeasured confounding due to one nesting factor (such as site, which nests the water samples from different months) to the case of two crossed factors. First, we developed a conditional pseudolikelihood estimator that eliminates fixed effects for the levels of each of the crossed factors from the estimating equation. Using the theory of U-Statistics for independent but non-identically distributed vectors, we show that our estimator is consistent and asymptotically normal, but that its variance depends on the nuisance parameters and thus cannot be easily estimated. Consequently, we apply our estimator in conjunction with a permutation test, and we investigate use of the pigeonhole bootstrap and the jackknife for constructing confidence intervals. We also incorporate our estimator into a diagnostic test for a logistic mixed model with crossed random effects and no unmeasured confounding. For comparison, we investigate between-within models extended to two crossed factors. These generalized linear mixed models include covariate means for each level of each factor in order to adjust for the unmeasured confounding. We conduct simulation studies, and we apply the methods to the Haitian data. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26892025
Practical Session: Logistic Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
An exercise is proposed to illustrate logistic regression. It investigates the different risk factors in the onset of coronary heart disease. The exercise was proposed in Chapter 5 of the book by D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science+Business Media, LLC (2010), and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr341.pdf). The example is based on data given in the file evans.txt from http://www.sph.emory.edu/dkleinb/logreg3.htm#data.
[Understanding logistic regression].
El Sanharawi, M; Naudet, F
2013-10-01
Logistic regression is one of the most common multivariate analysis models used in epidemiology. It allows measurement of the association between the occurrence of an event (the qualitative dependent variable) and factors susceptible to influence it (the explanatory variables). The choice of explanatory variables to include in the logistic regression model is based on prior knowledge of the disease pathophysiology and on the statistical association between the variable and the event, as measured by the odds ratio. The main steps of the procedure, the conditions of application, and the essential tools for its interpretation are discussed concisely. We also discuss the importance of the choice of variables that must be included and retained in the regression model in order to avoid the omission of important confounding factors. Finally, by way of illustration, we provide an example from the literature, which should help the reader test his or her knowledge.
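To make the odds-ratio interpretation above concrete, here is a minimal sketch in plain NumPy (hypothetical 2x2 cohort data, not taken from the article): the model is fit by Newton-Raphson, and exponentiating the slope recovers the sample odds ratio.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; returns [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))    # fitted probabilities
        W = p * (1.0 - p)                       # IRLS weights
        beta += np.linalg.solve(X1.T @ (W[:, None] * X1), X1.T @ (y - p))
    return beta

# Hypothetical cohort: 30/100 events among exposed, 10/100 among unexposed.
x = np.array([1] * 100 + [0] * 100, dtype=float)
y = np.array([1] * 30 + [0] * 70 + [1] * 10 + [0] * 90, dtype=float)
beta = fit_logistic(x, y)
odds_ratio = np.exp(beta[1])  # equals the sample OR (30*90)/(70*10) = 27/7
```

With a single binary exposure the fitted model is saturated, so the exponentiated slope reproduces the sample odds ratio exactly.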
Logistic Regression: Concept and Application
ERIC Educational Resources Information Center
Cokluk, Omay
2010-01-01
The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
Fungible weights in logistic regression.
Jones, Jeff A; Waller, Niels G
2016-06-01
In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights.
Logistic regression: a brief primer.
Stoltzfus, Jill C
2011-10-01
Regression techniques are versatile in their application to medical research because they can measure associations, predict outcomes, and control for confounding variable effects. As one such technique, logistic regression is an efficient and powerful way to analyze the effect of a group of independent variables on a binary outcome by quantifying each independent variable's unique contribution. Using components of linear regression reflected in the logit scale, logistic regression iteratively identifies the strongest linear combination of variables with the greatest probability of detecting the observed outcome. Important considerations when conducting logistic regression include selecting independent variables, ensuring that relevant assumptions are met, and choosing an appropriate model building strategy. For independent variable selection, one should be guided by such factors as accepted theory, previous empirical investigations, clinical considerations, and univariate statistical analyses, with acknowledgement of potential confounding variables that should be accounted for. Basic assumptions that must be met for logistic regression include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers. Additionally, there should be an adequate number of events per independent variable to avoid an overfit model, with commonly recommended minimum "rules of thumb" ranging from 10 to 20 events per covariate. Regarding model building strategies, the three general types are direct/standard, sequential/hierarchical, and stepwise/statistical, with each having a different emphasis and purpose. Before reaching definitive conclusions from the results of any of these methods, one should formally quantify the model's internal validity (i.e., replicability within the same data set) and external validity (i.e., generalizability beyond the current sample). The resulting logistic regression model
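The events-per-variable (EPV) rule of thumb quoted above can be checked with a few lines (hypothetical counts; the 10-20 range is the guideline cited in the abstract):

```python
import numpy as np

def events_per_variable(y, n_predictors):
    """EPV: count of the rarer outcome class divided by the number of covariates."""
    y = np.asarray(y)
    events = int(min(y.sum(), len(y) - y.sum()))  # rarer class drives the limit
    return events / n_predictors

# Hypothetical study: 200 patients, 40 events, 5 candidate predictors.
y = np.array([1] * 40 + [0] * 160)
epv = events_per_variable(y, 5)  # 40 / 5 = 8.0, below the 10-20 guideline
```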
Transfer Learning Based on Logistic Regression
NASA Astrophysics Data System (ADS)
Paul, A.; Rottensteiner, F.; Heipke, C.
2015-08-01
In this paper we address the problem of classification of remote sensing images in the framework of transfer learning, with a focus on domain adaptation. The main novel contribution is a method for transductive transfer learning in remote sensing on the basis of logistic regression. Logistic regression is a discriminative probabilistic classifier of low computational complexity which can deal with multiclass problems. This research area deals with methods that solve problems in which labelled training data sets are assumed to be available only for a source domain, while classification is needed in a target domain with different, yet related, characteristics. Classification takes place with a model of weight coefficients for hyperplanes that separate features in the transformed feature space. In terms of logistic regression, our domain adaptation method adjusts the model parameters by iterative labelling of the target test data set. These labelled data features are iteratively added to the current training set, which, at the beginning, contains only source features; simultaneously, a number of source features are deleted from the current training set. Experimental results based on a test series with synthetic and real data constitute a first proof of concept of the proposed method.
Standards for Standardized Logistic Regression Coefficients
ERIC Educational Resources Information Center
Menard, Scott
2011-01-01
Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…
Computing measures of explained variation for logistic regression models.
Mittlböck, M; Schemper, M
1999-01-01
The proportion of explained variation (R2) is frequently used in the general linear model but in logistic regression no standard definition of R2 exists. We present a SAS macro which calculates two R2-measures based on Pearson and on deviance residuals for logistic regression. Also, adjusted versions for both measures are given, which should prevent the inflation of R2 in small samples. PMID:10195643
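A sketch of deviance- and Pearson-style R² analogues along the lines described above (plain NumPy on hypothetical data; the exact definitions implemented in the SAS macro may differ in detail):

```python
import numpy as np

def deviance(y, p, eps=1e-12):
    """Binomial deviance of fitted probabilities p against binary outcomes y."""
    p = np.clip(p, eps, 1 - eps)
    return -2.0 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def r2_deviance(y, p):
    """Proportion of null deviance explained: 1 - D(model) / D(null)."""
    p_null = np.full(len(y), y.mean())
    return 1.0 - deviance(y, p) / deviance(y, p_null)

def r2_pearson(y, p):
    """Analogue based on squared residuals about the observed mean."""
    return 1.0 - np.sum((y - p) ** 2) / np.sum((y - y.mean()) ** 2)

y = np.array([1, 1, 1, 0, 0, 0, 1, 0], dtype=float)
p_good = np.where(y == 1, 0.9, 0.1)   # well-calibrated fitted probabilities
p_null = np.full(len(y), y.mean())    # intercept-only model
r2_dev_good = r2_deviance(y, p_good)  # large: the model explains most deviance
r2_dev_null = r2_deviance(y, p_null)  # zero by construction
```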
Satellite rainfall retrieval by logistic regression
NASA Technical Reports Server (NTRS)
Chiu, Long S.
1986-01-01
The potential use of logistic regression in rainfall estimation from satellite measurements is investigated. Satellite measurements provide covariate information in terms of radiances from different remote sensors. The logistic regression technique can effectively accommodate many covariates and test their significance in the estimation. The outcome from the logistic model is the probability that the rain rate of a satellite pixel is above a certain threshold. By varying the thresholds, a rain-rate histogram can be obtained, from which the mean and the variance can be estimated. A logistic model is developed and applied to rainfall data collected during GATE, using as covariates the fractional rain area and a radiance measurement deduced from a microwave temperature-rain-rate relation. It is demonstrated that the fractional rain area is an important covariate in the model, consistent with the use of the so-called Area Time Integral in estimating total rain volume in other studies. To calibrate the logistic model, simulated rain fields generated by rain-field models with prescribed parameters are needed. A stringent test of the logistic model is its ability to recover the prescribed parameters of simulated rain fields. A rain-field simulation model which preserves the fractional rain area and lognormality of rain rates as found in GATE is developed. A stochastic regression model of branching and immigration, whose solutions are lognormally distributed in some asymptotic limits, has also been developed.
Predicting Social Trust with Binary Logistic Regression
ERIC Educational Resources Information Center
Adwere-Boamah, Joseph; Hufstedler, Shirley
2015-01-01
This study used binary logistic regression to predict social trust with five demographic variables from a national sample of adult individuals who participated in The General Social Survey (GSS) in 2012. The five predictor variables were respondents' highest degree earned, race, sex, general happiness and the importance of personally assisting…
Logistic regression when binary predictor variables are highly correlated.
Barker, L; Brown, C
Standard logistic regression can produce estimates having large mean square error when predictor variables are multicollinear. Ridge regression and principal components regression can reduce the impact of multicollinearity in ordinary least squares regression. Generalizations of these, applicable in the logistic regression framework, are alternatives to standard logistic regression. It is shown that estimates obtained via ridge and principal components logistic regression can have smaller mean square error than estimates obtained through standard logistic regression. Recommendations for choosing among standard, ridge and principal components logistic regression are developed. Published in 2001 by John Wiley & Sons, Ltd.
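A ridge-penalized logistic fit of the kind discussed above can be sketched with penalized Newton steps (hypothetical collinear data; the penalty values and simulation setup are illustrative, not from the paper):

```python
import numpy as np

def fit_ridge_logistic(X, y, lam, n_iter=50):
    """Logistic regression with an L2 (ridge) penalty, fit by penalized Newton
    steps; the intercept is left unpenalized."""
    X1 = np.column_stack([np.ones(len(y)), X])
    d = X1.shape[1]
    P = lam * np.eye(d)
    P[0, 0] = 0.0                            # do not shrink the intercept
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        W = p * (1.0 - p)
        grad = X1.T @ (y - p) - P @ beta     # penalized score
        hess = X1.T @ (W[:, None] * X1) + P  # penalized information
        beta += np.linalg.solve(hess, grad)
    return beta

# Hypothetical data with two nearly collinear predictors.
rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
x2 = x1 + 0.05 * rng.normal(size=300)        # strong multicollinearity
X = np.column_stack([x1, x2])
logits = 0.5 * x1 + 0.5 * x2
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

beta_light = fit_ridge_logistic(X, y, lam=0.1)
beta_heavy = fit_ridge_logistic(X, y, lam=10.0)
# Heavier shrinkage gives smaller slope coefficients, trading bias for variance,
# which is how ridge tames the instability caused by the collinear pair.
```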
Spatial correlation in Bayesian logistic regression with misclassification.
Bihrmann, Kristine; Toft, Nils; Nielsen, Søren Saxmose; Ersbøll, Annette Kjær
2014-06-01
Standard logistic regression assumes that the outcome is measured perfectly. In practice, this is often not the case, which could lead to biased estimates if not accounted for. This study presents Bayesian logistic regression with adjustment for misclassification of the outcome applied to data with spatial correlation. The models assessed include a fixed effects model, an independent random effects model, and models with spatially correlated random effects modelled using conditional autoregressive prior distributions (ICAR and ICAR(ρ)). Performance of these models was evaluated in a simulation study. Parameters were estimated by Markov Chain Monte Carlo methods, using slice sampling to improve convergence. The results demonstrated that adjustment for misclassification must be included to produce unbiased regression estimates. With strong correlation the ICAR model performed best. With weak or moderate correlation the ICAR(ρ) performed best. With unknown spatial correlation the recommended model would be the ICAR(ρ), assuming convergence can be obtained. PMID:24889989
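The misclassification adjustment rests on a standard identity linking the true event probability to the observed (apparent) one through the test's sensitivity and specificity; a minimal illustration with hypothetical values (the paper's spatial Bayesian machinery is not shown):

```python
def apparent_prob(p_true, se, sp):
    """Probability of a positive *observed* outcome given the true event
    probability p_true, sensitivity se, and specificity sp."""
    return se * p_true + (1.0 - sp) * (1.0 - p_true)

# Hypothetical: true probability 0.30, sensitivity 0.90, specificity 0.95.
p_obs = apparent_prob(0.30, 0.90, 0.95)  # 0.9*0.3 + 0.05*0.7 = 0.305
```

Ignoring this identity and regressing on the observed outcome directly is what produces the biased estimates the simulation study documents.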
Model selection for logistic regression models
NASA Astrophysics Data System (ADS)
Duller, Christine
2012-09-01
Model selection for logistic regression models decides which of the given potential regressors have an effect and hence should be included in the final model. A second interesting question is whether a certain factor is heterogeneous among some subsets, i.e. whether the model should include a random intercept or not. In this paper these questions are answered with classical as well as with Bayesian methods. The applications show some results of recent research projects in medicine and business administration.
Logistic models--an odd(s) kind of regression.
Jupiter, Daniel C
2013-01-01
The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression.
Logistic Regression Applied to Seismic Discrimination
BG Amindan; DN Hagedorn
1998-10-08
The usefulness of logistic discrimination was examined in an effort to learn how it performs in a regional seismic setting. Logistic discrimination provides an easily understood method, works with user-defined models and few assumptions about the population distributions, and handles both continuous and discrete data. Seismic event measurements from a data set compiled by Los Alamos National Laboratory (LANL) of Chinese events recorded at station WMQ were used in this demonstration study. PNNL applied logistic regression techniques to the data. All possible combinations of the Lg and Pg measurements were tried, and a best-fit logistic model was created. The best combination of Lg and Pg frequencies for predicting the source of a seismic event (earthquake or explosion) used Lg(3.0-6.0 Hz) and Pg(3.0-6.0 Hz) as the predictor variables. A cross-validation test showed that this model was able to correctly predict 99.7% of earthquakes and 98.0% of explosions in the given data set. Two other models were identified that used Pg and Lg measurements from the 1.5-3.0 Hz frequency range. Although these other models did a good job of correctly predicting the earthquakes, they were not as effective at predicting the explosions. Two possible biases were discovered which affect the predicted probabilities for each outcome. The first bias was due to this being a case-controlled study: the sampling fractions biased the probabilities calculated with the models. The second bias is caused by a change in the proportions for each event: if at a later date the proportions (a priori probabilities) of explosions versus earthquakes change, the predicted probability for an event would be biased. When using logistic regression, the user needs to be aware of the possible biases and what effect they will have on the predicted probabilities.
Supporting Regularized Logistic Regression Privately and Efficiently
Li, Wenfa; Liu, Hongzhe; Yang, Peng; Xie, Wei
2016-01-01
As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data on human subjects that are contingent upon strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a widely used statistical model that has not yet been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as research consortia or networks, widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributed computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency, and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications in various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc. PMID:27271738
Jackknife bias reduction for polychotomous logistic regression.
Bull, S B; Greenwood, C M; Hauck, W W
1997-03-15
Despite theoretical and empirical evidence that the usual MLEs can be misleading in finite samples, and some evidence that bias-reduced estimates are less biased and more efficient, such estimates have not seen wide application in practice. One can obtain bias-reduced estimates by jackknife methods, with or without full iteration, or by using higher-order terms in a Taylor series expansion of the log-likelihood to approximate the asymptotic bias. We provide details of these methods for polychotomous logistic regression with a nominal categorical response. We conducted a Monte Carlo comparison of the jackknife and Taylor series estimates in moderate sample sizes in a general logistic regression setting, investigating dichotomous and trichotomous responses and a mixture of correlated and uncorrelated binary and normal covariates. We found the approximate two-step jackknife and the Taylor series methods useful when the ratio of the number of observations to the number of parameters is greater than 15, but we cannot recommend the two-step and the fully iterated jackknife estimates when this ratio is less than 20, especially when there are large effects, binary covariates, or multicollinearity in the covariates.
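The fully iterated jackknife mentioned above can be sketched as follows (plain NumPy, hypothetical binary-response data; the paper's approximate two-step variant and polychotomous case are not shown):

```python
import numpy as np

def fit_logistic(X, y, n_iter=30):
    """Plain Newton-Raphson logistic fit; returns [intercept, slope...]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        W = p * (1.0 - p)
        beta += np.linalg.solve(X1.T @ (W[:, None] * X1), X1.T @ (y - p))
    return beta

def jackknife_logistic(X, y):
    """Jackknife bias reduction: beta_J = n*beta_full - (n-1)*mean(LOO fits)."""
    n = len(y)
    beta_full = fit_logistic(X, y)
    loo = np.array([fit_logistic(np.delete(X, i, axis=0), np.delete(y, i))
                    for i in range(n)])
    return n * beta_full - (n - 1) * loo.mean(axis=0)

# Hypothetical moderate sample: n = 60, one normal covariate.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 1))
y = (rng.random(60) < 1.0 / (1.0 + np.exp(-(0.5 + X[:, 0])))).astype(float)
beta_j = jackknife_logistic(X, y)
```

The n leave-one-out refits make this the expensive, fully iterated version; the two-step approximation trades refitting cost for a small loss of accuracy.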
Using Dominance Analysis to Determine Predictor Importance in Logistic Regression
ERIC Educational Resources Information Center
Azen, Razia; Traxel, Nicole
2009-01-01
This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R[superscript 2] analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…
[Logistic regression versus a divergent Bayesian network].
Sánchez Trujillo, Noel Antonio
2015-02-03
This article is a discussion of two statistical tools used for prediction and causality assessment: logistic regression and Bayesian networks. Using data from a simulated example from a study assessing factors that might predict pulmonary emphysema (where fingertip pigmentation and smoking are considered), we posed the following questions. Is pigmentation a confounding, causal or predictive factor? Is there perhaps another factor, like smoking, that confounds? Is there a synergy between pigmentation and smoking? The results, in terms of prediction, are similar with the two techniques; regarding causation, differences arise. We conclude that, in decision-making, combining a statistical tool, used with common sense, with prior evidence that may have taken years or even centuries to develop is better than the automatic and exclusive use of statistical resources.
Logistic Regression: Going beyond Point-and-Click.
ERIC Educational Resources Information Center
King, Jason E.
A review of the literature reveals that important statistical algorithms and indices pertaining to logistic regression are being underused. This paper describes logistic regression in comparison with discriminant analysis and linear regression, and suggests that some techniques only accessible through computer syntax should be consulted in…
A Methodology for Generating Placement Rules that Utilizes Logistic Regression
ERIC Educational Resources Information Center
Wurtz, Keith
2008-01-01
The purpose of this article is to provide the necessary tools for institutional researchers to conduct a logistic regression analysis and interpret the results. Aspects of the logistic regression procedure that are necessary to evaluate models are presented and discussed with an emphasis on cutoff values and choosing the appropriate number of…
Matrix variate logistic regression model with application to EEG data.
Hung, Hung; Wang, Chen-Chien
2013-01-01
Logistic regression has been widely applied in the field of biomedical research for a long time. In some applications, the covariates of interest have a natural structure, such as that of a matrix, at the time of collection. The rows and columns of the covariate matrix then have certain physical meanings, and they must contain useful information regarding the response. If we simply stack the covariate matrix as a vector and fit a conventional logistic regression model, relevant information can be lost and the problem of inefficiency arises. Motivated by these considerations, we propose in this paper the matrix variate logistic (MV-logistic) regression model. Its advantages include the preservation of the inherent matrix structure of the covariates and the parsimony of the parameters needed. In the EEG Database Data Set, we successfully extract the structural effects of the covariate matrix, and a high classification accuracy is achieved.
Preserving Institutional Privacy in Distributed binary Logistic Regression.
Wu, Yuan; Jiang, Xiaoqian; Ohno-Machado, Lucila
2012-01-01
Privacy is becoming a major concern when sharing biomedical data across institutions. Although methods for protecting the privacy of individual patients have been proposed, it is not clear how to protect institutional privacy, which is often a critical concern of data custodians. Building upon our previous work, Grid Binary LOgistic REgression (GLORE), we developed an Institutional Privacy-preserving Distributed binary Logistic Regression model (IPDLR) that considers both individual and institutional privacy when building a logistic regression model in a distributed manner. We tested our method using both simulated and clinical data, showing how it is possible to protect the privacy of individuals and of institutions using a distributed strategy.
Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble
Wang, Hong; Xu, Qingsong; Zhou, Lifeng
2015-01-01
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988
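A toy version of the bagging-plus-Lasso-logistic idea described above (here just bootstrap bagging over an ISTA-style L1 fit on hypothetical data; the paper's full pipeline also includes clustering and the credit-scoring evaluation):

```python
import numpy as np

def l1_logistic(X, y, lam=0.02, n_iter=2000):
    """L1-penalized logistic regression via proximal gradient descent (ISTA)."""
    n, d = X.shape
    step = 1.0 / (0.25 * np.linalg.norm(X, 2) ** 2 / n)  # 1 / Lipschitz constant
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        z = beta - step * (X.T @ (p - y) / n)            # gradient step
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return beta

def bagged_predict(X, y, X_new, n_models=10, seed=0):
    """Average predicted probabilities over bootstrap-resampled L1-logistic fits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    probs = np.zeros(len(X_new))
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)                 # bootstrap sample
        beta = l1_logistic(X[idx], y[idx])
        probs += 1.0 / (1.0 + np.exp(-X_new @ beta))
    return probs / n_models

# Hypothetical data: 5 features, only the first two informative.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-(2 * X[:, 0] - 2 * X[:, 1])))).astype(float)
probs = bagged_predict(X, y, X)
acc = np.mean((probs > 0.5) == (y == 1))  # training accuracy of the ensemble
```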
Sparse logistic regression with Lp penalty for biomarker identification.
Liu, Zhenqiu; Jiang, Feng; Tian, Guoliang; Wang, Suna; Sato, Fumiaki; Meltzer, Stephen J; Tan, Ming
2007-01-01
In this paper, we propose a novel method for sparse logistic regression with non-convex regularization Lp (p < 1). Based on smooth approximation, we develop several fast algorithms for learning the classifier that are applicable to high-dimensional datasets such as gene expression. To the best of our knowledge, these are the first algorithms to perform sparse logistic regression with an Lp and elastic net (Le) penalty. The regularization parameters are decided by maximizing the area under the ROC curve (AUC) on the test data. Experimental results on methylation and microarray data attest to the accuracy, sparsity, and efficiency of the proposed algorithms. Biomarkers identified with our methods are compared with those in the literature. Our computational results show that Lp logistic regression (p < 1) outperforms L1 logistic regression and the SCAD SVM. Software is available upon request from the first author. PMID:17402921
MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION
Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...
Target detection in hyperspectral Imaging using logistic regression
NASA Astrophysics Data System (ADS)
Lo, Edisanter; Ientilucci, Emmett
2016-05-01
Target detection is an important application in hyperspectral imaging. Conventional algorithms for target detection assume that the pixels have a multivariate normal distribution. The pixels in most images do not have multivariate normal distributions. The logistic regression model, which does not require the assumption of multivariate normal distribution, is proposed in this paper as a target detection algorithm. Experimental results show that the logistic regression model can work well in target detection.
[Unconditioned logistic regression and sample size: a bibliographic review].
Ortega Calvo, Manuel; Cayuela Domínguez, Aurelio
2002-01-01
Unconditioned logistic regression is a highly useful risk prediction method in epidemiology. This article reviews the different solutions provided by different authors concerning the interface between the calculation of the sample size and the use of logistic regression. Based on the information initially provided, a review is made of the customized regression and predictive constriction phenomenon, the design of an ordinal exposition with a binary output, the events-of-interest-per-variable concept, the indicator variables, the classic Freeman equation, etc. Some skeptical ideas regarding this subject are also included. PMID:12025266
Estimating the exceedance probability of rain rate by logistic regression
NASA Technical Reports Server (NTRS)
Chiu, Long S.; Kedem, Benjamin
1990-01-01
Recent studies have shown that the fraction of an area with rain intensity above a fixed threshold is highly correlated with the area-averaged rain rate. To estimate the fractional rainy area, a logistic regression model, which estimates the conditional probability that rain rate over an area exceeds a fixed threshold given the values of related covariates, is developed. The problem of dependency in the data in the estimation procedure is bypassed by the method of partial likelihood. Analyses of simulated scanning multichannel microwave radiometer and observed electrically scanning microwave radiometer data during the Global Atlantic Tropical Experiment period show that the use of logistic regression in pixel classification is superior to multiple regression in predicting whether rain rate at each pixel exceeds a given threshold, even in the presence of noisy data. The potential of the logistic regression technique in satellite rain rate estimation is discussed.
Differentially private distributed logistic regression using private and public data
2014-01-01
Background Privacy protection is an important issue in medical informatics, and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise and make outputs less useful. Given available public data in medical research (e.g. from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. Methodology In this paper, we modify the update step in the Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. Experiments and results We try our algorithm on three different data sets, and show its advantage over: (1) a logistic regression model based solely on public data, and (2) a differentially private distributed logistic regression model based on private data under various scenarios. Conclusion Logistic regression models built with our new algorithm based on both private and public datasets demonstrate better utility than models trained on private or public datasets alone, without sacrificing the rigorous privacy guarantee. PMID:25079786
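For context, the (non-private) Newton-Raphson update that such methods modify looks like the sketch below; the comment marks the step where a differentially private variant would inject calibrated noise. This is a generic baseline, not the authors' algorithm, and the simulated data are an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, n_iter=25):
    """Plain Newton-Raphson (IRLS) for logistic regression.  A
    differentially private variant would perturb the update step
    marked below; this is only the non-private baseline."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = sigmoid(X @ beta)
        W = mu * (1.0 - mu)                 # IRLS weights
        grad = X.T @ (y - mu)               # score vector
        H = X.T @ (X * W[:, None])          # Fisher information
        # <-- a private variant would add calibrated noise to this update
        beta = beta + np.linalg.solve(H, grad)
    return beta

# simulated data with intercept -0.5 and slope 1.2
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = (rng.random(500) < sigmoid(X @ np.array([-0.5, 1.2]))).astype(float)
beta_hat = newton_logistic(X, y)
```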
Preserving Institutional Privacy in Distributed Binary Logistic Regression
Wu, Yuan; Jiang, Xiaoqian; Ohno-Machado, Lucila
2012-01-01
Privacy is becoming a major concern when sharing biomedical data across institutions. Although methods for protecting privacy of individual patients have been proposed, it is not clear how to protect the institutional privacy, which is many times a critical concern of data custodians. Built upon our previous work, Grid Binary LOgistic REgression (GLORE) 1 , we developed an Institutional Privacy-preserving Distributed binary Logistic Regression model (IPDLR) that considers both individual and institutional privacy for building a logistic regression model in a distributed manner. We tested our method using both simulated and clinical data, showing how it is possible to protect the privacy of individuals and of institutions using a distributed strategy. PMID:23304425
Advanced colorectal neoplasia risk stratification by penalized logistic regression.
Lin, Yunzhi; Yu, Menggang; Wang, Sijian; Chappell, Richard; Imperiale, Thomas F
2016-08-01
Colorectal cancer is the second leading cause of death from cancer in the United States. To facilitate the efficiency of colorectal cancer screening, there is a need to stratify risk for colorectal cancer among the 90% of US residents who are considered "average risk." In this article, we investigate such risk stratification rules for advanced colorectal neoplasia (colorectal cancer and advanced, precancerous polyps). We use a recently completed large cohort study of subjects who underwent a first screening colonoscopy. Logistic regression models have been used in the literature to estimate the risk of advanced colorectal neoplasia based on quantifiable risk factors. However, logistic regression may be prone to overfitting and instability in variable selection. Since most of the risk factors in our study have several categories, it was tempting to collapse these categories into fewer risk groups. We propose a penalized logistic regression method that automatically and simultaneously selects variables, groups categories, and estimates their coefficients by penalizing the [Formula: see text]-norm of both the coefficients and their differences. Hence, it encourages sparsity in the categories, i.e. grouping of the categories, and sparsity in the variables, i.e. variable selection. We apply the penalized logistic regression method to our data. The important variables are selected, with close categories simultaneously grouped, by penalized regression models with and without interaction terms. The models are validated with 10-fold cross-validation. The receiver operating characteristic curves of the penalized regression models dominate the receiver operating characteristic curve of naive logistic regressions, indicating a superior discriminative performance.
Hierarchical Logistic Regression: Accounting for Multilevel Data in DIF Detection
ERIC Educational Resources Information Center
French, Brian F.; Finch, W. Holmes
2010-01-01
The purpose of this study was to examine the performance of differential item functioning (DIF) assessment in the presence of a multilevel structure that often underlies data from large-scale testing programs. Analyses were conducted using logistic regression (LR), a popular, flexible, and effective tool for DIF detection. Data were simulated…
Classification of microarray data with penalized logistic regression
NASA Astrophysics Data System (ADS)
Eilers, Paul H. C.; Boer, Judith M.; van Ommen, Gert-Jan; van Houwelingen, Hans C.
2001-06-01
Classification of microarray data needs a firm statistical basis. In principle, logistic regression can provide it, modeling the probability of class membership with (transforms of) linear combinations of explanatory variables. However, classical logistic regression does not work for microarrays, because generally there will be far more variables than observations. One problem is multicollinearity: the estimating equations become singular and have no unique and stable solution. A second problem is over-fitting: a model may fit a data set well, yet perform badly when used to classify new data. We propose penalized likelihood as a solution to both problems. The values of the regression coefficients are constrained in a similar way as in ridge regression. All variables play an equal role; there is no ad hoc selection of the most relevant or most expressed genes. The dimension of the resulting system of equations is equal to the number of variables, and generally will be too large for most computers, but it can be reduced dramatically with the singular value decomposition of some matrices. The penalty is optimized with AIC (Akaike's Information Criterion), which essentially is a measure of prediction performance. We find that penalized logistic regression performs well on a public data set (the MIT ALL/AML data).
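A minimal sketch of the ridge-type penalized likelihood idea (without the SVD speed-up the abstract mentions): the penalty makes the otherwise singular p >> n estimating equations solvable. The simulated "microarray-like" data and penalty value are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ridge_logistic(X, y, lam=5.0, n_iter=50):
    """Ridge-penalized logistic regression fit by IRLS: the penalty
    lam*||beta||^2/2 makes the p >> n system solvable.  The intercept
    is effectively unpenalized."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])
    pen = lam * np.eye(p + 1)
    pen[0, 0] = 1e-6          # tiny value keeps H invertible; intercept essentially free
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        mu = sigmoid(Xd @ beta)
        W = mu * (1.0 - mu)
        grad = Xd.T @ (y - mu) - pen @ beta
        H = Xd.T @ (Xd * W[:, None]) + pen
        beta = beta + np.linalg.solve(H, grad)
    return beta

# toy data: many more variables (p=200) than observations (n=60),
# with only the first two "genes" informative
rng = np.random.default_rng(1)
n, p = 60, 200
X = rng.normal(size=(n, p))
y = (rng.random(n) < sigmoid(X[:, 0] - X[:, 1])).astype(float)
beta = ridge_logistic(X, y)
```

Without the `pen` term the Hessian `H` would have rank at most n and `np.linalg.solve` would fail, which is exactly the multicollinearity problem the abstract describes.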
A Solution to Separation and Multicollinearity in Multiple Logistic Regression.
Shen, Jianzhao; Gao, Sujuan
2008-10-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often suffer serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models that was shown to reduce bias and the non-existence problem. Ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither method solves both problems. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using current screening data from a community-based dementia study.
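A rough sketch of a double-penalized fit: Firth's bias-reducing hat-value adjustment of the score combined with a ridge term. How exactly the two penalties interact in the proposed estimator may differ from this (here the ridge also enters the information matrix used for the hat values, which is my assumption); the point of the toy example is that on completely separated data, where plain maximum likelihood diverges, the combined penalty still yields finite estimates.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def firth_ridge_logistic(X, y, lam=0.5, n_iter=50):
    """Illustrative double-penalized logistic regression: Firth's
    adjustment enters the score through the hat values h, and a
    ridge term lam*beta is subtracted as well.  Not the authors'
    exact estimator."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = sigmoid(X @ beta)
        W = mu * (1.0 - mu)
        XW = X * W[:, None]
        I_mat = X.T @ XW + lam * np.eye(p)   # ridge-stabilized information
        # hat values h_i of W^{1/2} X I^{-1} X' W^{1/2}
        h = np.einsum('ij,ji->i', XW, np.linalg.solve(I_mat, X.T))
        # Firth-adjusted score minus the ridge gradient
        grad = X.T @ (y - mu + h * (0.5 - mu)) - lam * beta
        beta = beta + np.linalg.solve(I_mat, grad)
    return beta

# completely separated toy data: plain ML estimates would diverge
X = np.column_stack([np.ones(8), np.array([-4.0, -3, -2, -1, 1, 2, 3, 4])])
y = np.array([0.0, 0, 0, 0, 1, 1, 1, 1])
beta = firth_ridge_logistic(X, y)
```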
ERIC Educational Resources Information Center
Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung
2014-01-01
The simultaneous item bias test (SIBTEST) method regression procedure and the differential item functioning (DIF)-free-then-DIF strategy are applied to the logistic regression (LR) method simultaneously in this study. These procedures are used to adjust the effects of matching true score on observed score and to better control the Type I error…
Determination of riverbank erosion probability using Locally Weighted Logistic Regression
NASA Astrophysics Data System (ADS)
Ioannidou, Elena; Flori, Aikaterini; Varouchakis, Emmanouil A.; Giannakis, Georgios; Vozinaki, Anthi Eirini K.; Karatzas, George P.; Nikolaidis, Nikolaos
2015-04-01
Riverbank erosion is a natural geomorphologic process that affects the fluvial environment. The most important issue concerning riverbank erosion is the identification of the vulnerable locations. An alternative to the usual hydrodynamic models to predict vulnerable locations is to quantify the probability of erosion occurrence. This can be achieved by identifying the underlying relations between riverbank erosion and the geomorphological or hydrological variables that prevent or stimulate erosion. Thus, riverbank erosion can be determined by a regression model using independent variables that are considered to affect the erosion process. The impact of such variables may vary spatially; therefore, a non-stationary regression model is preferred instead of a stationary equivalent. Locally Weighted Regression (LWR) is proposed as a suitable choice. This method can be extended to predict the binary presence or absence of erosion based on a series of independent local variables by using the logistic regression model. It is referred to as Locally Weighted Logistic Regression (LWLR). Logistic regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable (e.g. binary response) based on one or more predictor variables. The method can be combined with LWR to assign weights to local independent variables of the dependent one. LWR allows model parameters to vary over space in order to reflect spatial heterogeneity. The probabilities of the possible outcomes are modelled as a function of the independent variables using a logistic function. Logistic regression measures the relationship between a categorical dependent variable and, usually, one or several continuous independent variables by converting the dependent variable to probability scores. Then, a logistic regression is formed, which predicts success or failure of a given binary variable (e.g. erosion presence or absence) for any value of the independent variables.
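The LWLR idea can be sketched as IRLS with kernel weights attached to each observation: sites near the query location dominate the local fit. The Gaussian kernel, bandwidth, and toy data below are illustrative assumptions, not the study's choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lwlr_fit(X, y, d, bandwidth=1.0, n_iter=30):
    """Locally weighted logistic regression at a single query
    location: distances d to the query point become Gaussian kernel
    weights k, and the k-weighted likelihood is maximized by IRLS."""
    k = np.exp(-0.5 * (d / bandwidth) ** 2)   # spatial kernel weights
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        mu = sigmoid(Xd @ beta)
        W = k * mu * (1.0 - mu)
        grad = Xd.T @ (k * (y - mu))
        H = Xd.T @ (Xd * W[:, None]) + 1e-8 * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(H, grad)
    return beta

# toy data: erosion presence driven by one covariate (e.g. bank slope),
# observed at sites scattered around a query location
rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
d = rng.uniform(0, 3, size=n)      # distance of each site to the query point
y = (rng.random(n) < sigmoid(0.8 * x)).astype(float)
beta = lwlr_fit(x.reshape(-1, 1), y, d, bandwidth=1.5)
```

Refitting at many query locations, each with its own distance vector `d`, yields the spatially varying coefficient surface that LWR is designed to capture.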
Classifying machinery condition using oil samples and binary logistic regression
NASA Astrophysics Data System (ADS)
Phillips, J.; Cripps, E.; Lau, John W.; Hodkiewicz, M. R.
2015-08-01
The era of big data has resulted in an explosion of condition monitoring information. The result is an increasing motivation to automate the costly and time-consuming human elements involved in the classification of machine health. When working with industry it is important to build an understanding of, and hence some trust in, the classification scheme for those who use the analysis to initiate maintenance tasks. Typically, "black box" approaches such as artificial neural networks (ANN) and support vector machines (SVM) are difficult to interpret. In contrast, this paper argues that logistic regression offers easy interpretability to industry experts, providing insight into the drivers of the human classification process and into the ramifications of potential misclassification. Of course, accuracy is of foremost importance in any automated classification scheme, so we also provide a comparative study based on the predictive performance of logistic regression, ANN and SVM. A real-world oil analysis data set from engines on mining trucks is presented, and using cross-validation we demonstrate that logistic regression outperforms the ANN and SVM approaches in terms of prediction for healthy/not healthy engines.
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination.
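The logistic regression case of the construction can be sketched numerically: the two equivalent groups differ by slope × 2 × SD(x) on the logit scale, here centred on the overall response probability as an approximation to keeping the expected number of events unchanged. The two-sample step uses the standard normal-approximation formula for comparing two proportions; all numeric inputs are illustrative.

```python
from statistics import NormalDist
import math

def logit(p):
    return math.log(p / (1 - p))

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def n_two_sample(p1, p2, alpha=0.05, power=0.8):
    """Normal-approximation total sample size (two equal groups)
    for detecting the difference between proportions p1 and p2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    n_per_group = (z_a + z_b) ** 2 * var / (p1 - p2) ** 2
    return 2 * n_per_group

def logistic_sample_size(slope, sd_x, p_overall, alpha=0.05, power=0.8):
    """Equivalent two-sample construction: the two groups' parameters
    differ by slope * 2 * sd_x on the logit scale, centred on the
    overall response probability (an approximation to preserving the
    expected number of events)."""
    delta = slope * 2 * sd_x
    p1 = expit(logit(p_overall) - delta / 2)
    p2 = expit(logit(p_overall) + delta / 2)
    return n_two_sample(p1, p2, alpha, power)

# e.g. slope 0.5 per unit of x, SD(x) = 1, overall response probability 0.3
n_total = logistic_sample_size(slope=0.5, sd_x=1.0, p_overall=0.3)
```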
Comparison of a Bayesian network with a logistic regression model to forecast IgA nephropathy.
Ducher, Michel; Kalbacher, Emilie; Combarnous, François; Finaz de Vilaine, Jérome; McGregor, Brigitte; Fouque, Denis; Fauvel, Jean Pierre
2013-01-01
Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN) from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n = 155) performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristics (ROC) curves. IgAN was found (on pathology) in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached (100% versus 67%) and specificity (73% versus 95%) using the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation. PMID:24328031
Risk factors for mortality after bereavement: a logistic regression analysis
Bowling, Ann; Charlton, John
1987-01-01
A national sample of elderly widowed people was followed up for six years. Excess mortality was found for men aged 75 years and over in the first six months of bereavement compared with men of the same age in the general population. Logistic regression analysis, controlling for age and sex together, demonstrated that the best independent predictors of mortality among the elderly widowed were: interviewer assessment of low happiness level; interviewer assessed and self-reported problems with nerves and depression; and lack of telephone contacts. The general practitioner is well placed to assess levels of depression and unhappiness among the widowed and to check that they have adequate social support. PMID:3503942
Autoregressive Logistic Regression Applied to Atmospheric Circulation Patterns
NASA Astrophysics Data System (ADS)
Guanche, Yanira; Mínguez, Roberto; Méndez, Fernando J.
2013-04-01
The study of atmospheric patterns, weather types or circulation patterns is a topic deeply studied by climatologists, and it is widely accepted practice to disaggregate the atmospheric conditions over a region into a certain number of representative states. This consensus simplifies the study of climate conditions, improving weather predictions and the knowledge of the influence of anthropogenic activities on the climate system. Once the atmospheric conditions have been reduced to a catalogue of representative states, it is desirable to have numerical models available to improve our understanding of weather dynamics, i.e. i) to analyze climate change by studying trends in the probability of occurrence of weather types, ii) to study seasonality and iii) to analyze the possible influence of previous states (autoregressive terms or Markov chains). This work introduces the mathematical framework to analyze those effects from a qualitative point of view. In particular, an autoregressive logistic regression model, which has been successfully applied in medical and pharmacological research fields, is presented. The main advantages of autoregressive logistic regression are that i) it can be used to model polytomous outcome variables, such as circulation types, and ii) standard statistical software can be used for fitting purposes. To show the potential of this kind of model for analyzing atmospheric conditions, a case study located in the Northeastern Atlantic is described. Results obtained show how the model is capable of dealing simultaneously with predictors related to different time scales, which can be used to simulate the behaviour of circulation patterns.
Landslide Hazard Mapping in Rwanda Using Logistic Regression
NASA Astrophysics Data System (ADS)
Piller, A.; Anderson, E.; Ballard, H.
2015-12-01
Landslides in the United States cause more than $1 billion in damages and 50 deaths per year (USGS 2014). Globally, figures are much more grave, yet monitoring, mapping and forecasting of these hazards are less than adequate. Seventy-five percent of the population of Rwanda earns a living from farming, mostly subsistence. Loss of farmland, housing, or life, to landslides is a very real hazard. Landslides in Rwanda have an impact at the economic, social, and environmental level. In a developing nation that faces challenges in tracking, cataloging, and predicting the numerous landslides that occur each year, satellite imagery and spatial analysis allow for remote study. We have focused on the development of a landslide inventory and a statistical methodology for assessing landslide hazards. Using logistic regression on approximately 30 test variables (i.e. slope, soil type, land cover, etc.) and a sample of over 200 landslides, we determine which variables are statistically most relevant to landslide occurrence in Rwanda. A preliminary predictive hazard map for Rwanda has been produced, using the variables selected from the logistic regression analysis.
Cluster-localized sparse logistic regression for SNP data.
Binder, Harald; Müller, Tina; Schwender, Holger; Golka, Klaus; Steffens, Michael; Hengstler, Jan G; Ickstadt, Katja; Schumacher, Martin
2012-08-14
The task of analyzing high-dimensional single nucleotide polymorphism (SNP) data in a case-control design using multivariable techniques has only recently been tackled. While many available approaches investigate only main effects in a high-dimensional setting, we propose a more flexible technique, cluster-localized regression (CLR), based on localized logistic regression models, that allows different SNPs to have an effect for different groups of individuals. Separate multivariable regression models are fitted for the different groups of individuals by incorporating weights into componentwise boosting, which provides simultaneous variable selection, hence sparse fits. For model fitting, these groups of individuals are identified using a clustering approach, where each group may be defined via different SNPs. This allows for representing complex interaction patterns, such as compositional epistasis, that might not be detected by a single main effects model. In a simulation study, the CLR approach results in improved prediction performance, compared to the main effects approach, and identification of important SNPs in several scenarios. Improved prediction performance is also obtained for an application example considering urinary bladder cancer. Some of the identified SNPs are predictive for all individuals, while others are only relevant for a specific group. Together with the sets of SNPs that define the groups, potential interaction patterns are uncovered.
Asymptotically Unbiased Estimation of Exposure Odds Ratios in Complete Records Logistic Regression.
Bartlett, Jonathan W; Harel, Ofer; Carpenter, James R
2015-10-15
Missing data are a commonly occurring threat to the validity and efficiency of epidemiologic studies. Perhaps the most common approach to handling missing data is to simply drop those records with 1 or more missing values, in so-called "complete records" or "complete case" analysis. In this paper, we bring together earlier-derived yet perhaps now somewhat neglected results which show that a logistic regression complete records analysis can provide asymptotically unbiased estimates of the association of an exposure of interest with an outcome, adjusted for a number of confounders, under a surprisingly wide range of missing-data assumptions. We give detailed guidance describing how the observed data can be used to judge the plausibility of these assumptions. The results mean that in large epidemiologic studies which are affected by missing data and analyzed by logistic regression, exposure associations may be estimated without bias in a number of settings where researchers might otherwise assume that bias would occur. PMID:26429998
Logistic regression analysis of cadmium-induced renal abnormalities
Ellis, K.J.; Yuen, K.; Cohn, S.H.
1986-02-01
Cases of renal dysfunction associated with cadmium exposure have been reported in Belgium, Great Britain, Japan, the United States, and Sweden. Indirect estimates of body burden were often based on the measurement of environmental exposure conditions or on tissue concentrations in urine, blood, saliva, or hair clippings. More recently, however, the direct in vivo assessment of liver and kidney cadmium burden in humans has provided additional data. Sufficient data on humans do exist, however, to make reasonable estimates of the increased risk for cadmium-induced renal dysfunction. In the present paper, a linear logistic regression model has been developed on the basis of liver and kidney cadmium burden. These relationships are discussed with respect to the concept of a critical concentration for the renal cortex. 14 refs., 3 figs., 2 tabs.
Sparse Multinomial Logistic Regression via Approximate Message Passing
NASA Astrophysics Data System (ADS)
Byrne, Evan; Schniter, Philip
2016-11-01
For the problem of multi-class linear classification and feature selection, we propose approximate message passing approaches to sparse multinomial logistic regression (MLR). First, we propose two algorithms based on the Hybrid Generalized Approximate Message Passing (HyGAMP) framework: one finds the maximum a posteriori (MAP) linear classifier and the other finds an approximation of the test-error-rate minimizing linear classifier. Then we design computationally simplified variants of these two algorithms. Next, we detail methods to tune the hyperparameters of their assumed statistical models using Stein's unbiased risk estimate (SURE) and expectation-maximization (EM), respectively. Finally, using both synthetic and real-world datasets, we demonstrate improved error-rate and runtime performance relative to existing state-of-the-art approaches to sparse MLR.
Ordinal logistic regression models: application in quality of life studies.
Abreu, Mery Natali Silva; Siqueira, Arminda Lucia; Cardoso, Clareci Silva; Caiaffa, Waleska Teixeira
2008-01-01
Quality of life has been increasingly emphasized in public health research in recent years. Typically, the results of quality of life are measured by means of ordinal scales. In these situations, specific statistical methods are necessary because procedures such as either dichotomization or misinformation on the distribution of the outcome variable may complicate the inferential process. Ordinal logistic regression models are appropriate in many of these situations. This article presents a review of the proportional odds model, partial proportional odds model, continuation ratio model, and stereotype model. The fit, statistical inference, and comparisons between models are illustrated with data from a study on quality of life in 273 patients with schizophrenia. All tested models showed good fit, but the proportional odds or partial proportional odds models proved to be the best choice due to the nature of the data and ease of interpretation of the results. Ordinal logistic models perform differently depending on categorization of outcome, adequacy in relation to assumptions, goodness-of-fit, and parsimony.
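For reference, the proportional odds model reviewed above maps one shared linear predictor and K-1 increasing cutpoints to K category probabilities; the cutpoint values below are illustrative assumptions.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def prop_odds_probs(eta, cutpoints):
    """Category probabilities under the proportional odds model:
    P(Y <= k) = expit(c_k - eta) for increasing cutpoints
    c_1 < ... < c_{K-1}, with a single linear predictor eta shared
    by all cutpoints (the 'proportional odds' assumption).
    Differences of adjacent cumulative probabilities give the K
    category probabilities."""
    cum = expit(np.asarray(cutpoints, dtype=float) - eta)
    cum = np.concatenate([[0.0], cum, [1.0]])
    return np.diff(cum)

# e.g. a 4-category ordinal quality-of-life score
probs = prop_odds_probs(eta=0.7, cutpoints=[-1.0, 0.0, 1.5])
```

Increasing `eta` shifts probability mass toward the higher categories, which is how covariate effects act in this model.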
Logistic Regression-HSMM-Based Heart Sound Segmentation.
Springer, David B; Tarassenko, Lionel; Clifford, Gari D
2016-04-01
The identification of the exact positions of the first and second heart sounds within a phonocardiogram (PCG), or heart sound segmentation, is an essential step in the automatic analysis of heart sound recordings, allowing for the classification of pathological events. While threshold-based segmentation methods have shown modest success, probabilistic models, such as hidden Markov models, have recently been shown to surpass the capabilities of previous methods. Segmentation performance is further improved when a priori information about the expected duration of the states is incorporated into the model, such as in a hidden semi-Markov model (HSMM). This paper addresses the problem of the accurate segmentation of the first and second heart sound within noisy real-world PCG recordings using an HSMM, extended with the use of logistic regression for emission probability estimation. In addition, we implement a modified Viterbi algorithm for decoding the most likely sequence of states, and evaluate this method on a large dataset of 10,172 s of PCG recorded from 112 patients (including 12,181 first and 11,627 second heart sounds). The proposed method achieved an average F1 score of 95.63 ± 0.85%, while the current state of the art achieved 86.28 ± 1.55% when evaluated on unseen test recordings. The greater discrimination between states afforded by using logistic regression, as opposed to the previous Gaussian-distribution-based emission probability estimation, as well as the use of an extended Viterbi algorithm, allows this method to significantly outperform the current state-of-the-art method based on a two-sided paired t-test.
Use of generalized ordered logistic regression for the analysis of multidrug resistance data.
Agga, Getahun E; Scott, H Morgan
2015-10-01
Statistical analysis of antimicrobial resistance data largely focuses on the binary outcome (susceptible or resistant) for individual antimicrobials. However, bacteria are becoming increasingly multidrug resistant (MDR). Statistical analysis of MDR data is mostly descriptive, often with tabular or graphical presentations. Here we report the applicability of the generalized ordinal logistic regression model for the analysis of MDR data. A total of 1,152 Escherichia coli, isolated from the feces of weaned pigs experimentally supplemented with chlortetracycline (CTC) and copper, were tested for susceptibility to 15 antimicrobials and binary classified as resistant or susceptible. The 15 antimicrobial agents tested were grouped into eight antimicrobial classes, and we defined MDR as the number of antimicrobial classes to which E. coli isolates were resistant, ranging from 0 to 8. The proportional odds assumption of the ordinal logistic regression model was violated only for the effect of treatment period (pre-treatment, during-treatment and post-treatment), not for the effects of CTC or copper supplementation. Subsequently, a partially constrained generalized ordinal logistic model was built that allows the effect of treatment period to vary while constraining the effects of treatment (CTC and copper supplementation) to be constant across the levels of the MDR classes. Copper (Proportional Odds Ratio [Prop OR]=1.03; 95% CI=0.73-1.47) and CTC (Prop OR=1.1; 95% CI=0.78-1.56) supplementation were not significantly associated with the level of MDR after adjusting for the effect of treatment period. MDR generally declined over the trial period. In conclusion, generalized ordered logistic regression can be used for the analysis of ordinal data such as MDR data when the proportional odds assumption of ordered logistic regression is violated.
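The cumulative logit structure underlying these models can be sketched directly. The function below computes category probabilities from assumed (not estimated) cutpoints and coefficients; passing one shared coefficient vector enforces proportional odds, while passing one vector per cutpoint corresponds to the generalized (non-proportional) model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(x, alphas, betas):
    """P(Y = j) under a cumulative logit model.
    alphas: increasing cutpoints a_1 < ... < a_{J-1}.
    betas: one coefficient vector shared across cutpoints
    (proportional odds) or one vector per cutpoint (generalized)."""
    J = len(alphas) + 1
    per_cut = betas if isinstance(betas[0], (list, tuple)) else [betas] * (J - 1)
    cum = [sigmoid(a - sum(b * xi for b, xi in zip(bs, x)))
           for a, bs in zip(alphas, per_cut)] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, J)]

# Demo: three MDR-like ordinal levels, one covariate, invented values
probs = category_probs([1.0], alphas=[-1.0, 1.0], betas=[0.5])
```

Under proportional odds the cumulative probabilities are automatically ordered; relaxing the constraint per cutpoint, as the generalized model does, can produce crossing cumulative curves, which is why partially constrained variants are attractive.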
Length bias correction in gene ontology enrichment analysis using logistic regression.
Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S; Chang, Jeff H
2012-01-01
When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible. PMID:23056249
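The adjustment described here amounts to an ordinary logistic regression with an extra column for transcript length. A minimal sketch follows, using plain gradient ascent on toy data; the membership, length, and significance values are invented for illustration and this is not the authors' data or software.

```python
import math

def fit_logistic(X, y, lr=0.5, iters=2000):
    """Maximum likelihood for logistic regression by plain gradient
    ascent; each row of X should start with 1.0 for the intercept."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(w, xi))))
            for j, xj in enumerate(xi):
                grad[j] += (yi - p) * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Columns: intercept, GO-category membership, centred log transcript
# length; outcome: gene called significant by the DE test (toy values)
X = [
    [1.0, 1.0, -0.75], [1.0, 1.0, 0.25], [1.0, 1.0, 0.75], [1.0, 1.0, -0.25],
    [1.0, 0.0, -0.75], [1.0, 0.0, 0.25], [1.0, 0.0, 0.75], [1.0, 0.0, -0.25],
]
y = [1, 1, 1, 0, 0, 0, 0, 1]
w = fit_logistic(X, y)
```

The coefficient on membership (w[1]) then measures the association between GO membership and significance conditional on length, which is the adjusted quantity the paper targets.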
Preisser, John S; Arcury, Thomas A; Quandt, Sara A
2003-09-01
In longitudinal surveillance studies of occupational illnesses, sickness episodes are recorded for workers over time. Since observations on the same worker are typically more similar than observations from different workers, statistical analysis must take into account the intraworker association due to workers' repeated measures. Additionally, when workers are employed in groups or clusters, observations from workers in the same workplace are typically more similar than observations from workers in different workplaces. For such cluster-correlated longitudinal data, alternating logistic regressions may be used to model the pattern of occupational illness clustering. Data on 182 Latino farm workers from a 1999 North Carolina study on green tobacco sickness provided an estimated pairwise odds ratio for within-worker clustering of 3.15 (95% confidence interval (CI): 1.84, 5.41) and an estimated pairwise odds ratio for within-camp clustering of 1.90 (95% CI: 1.22, 2.97). After adjustment for risk factors, the estimated pairwise odds ratios were 2.13 (95% CI: 1.18, 3.86) and 1.41 (95% CI: 0.89, 2.24), respectively. In this paper, a comparative analysis of alternating logistic regressions with generalized estimating equations and random-effects logistic regression is presented, and the relative strengths of the three methods are discussed. PMID:12936905
Metadata for ReVA logistic regression dataset
LaMotte, Andrew E.
2004-01-01
The U.S. Geological Survey in cooperation with the U.S. Environmental Protection Agency's Regional Vulnerability Assessment Program, has developed a set of statistical tools to support regional-scale, ground-water quality and vulnerability assessments. The Regional Vulnerability Assessment Program goals are to develop and demonstrate approaches to comprehensive, regional-scale assessments that effectively inform water-resources managers and decision-makers as to the magnitude, extent, distribution, and uncertainty of current and anticipated environmental risks. The U.S. Geological Survey is developing and exploring the use of statistical probability models to characterize the relation between ground-water quality and geographic factors in the Mid-Atlantic Region. Available water-quality data obtained from U.S. Geological Survey National Water-Quality Assessment Program studies conducted in the Mid-Atlantic Region were used in association with geographic data (land cover, geology, soils, and others) to develop logistic-regression equations that use explanatory variables to predict the presence of a selected water-quality parameter exceeding specified management concentration thresholds. The resulting logistic-regression equations were transformed to determine the probability, P(X), of a water-quality parameter exceeding a specified management threshold. Additional statistical procedures modified by the U.S. Geological Survey were used to compare the observed values to model-predicted values at each sample point. In addition, procedures to evaluate the confidence of the model predictions and estimate the uncertainty of the probability value were developed and applied. The resulting logistic-regression models were applied to the Mid-Atlantic Region to predict the spatial probability of nitrate concentrations exceeding specified management thresholds. These thresholds are usually set or established by regulators or managers at national or local levels. At management
NASA Astrophysics Data System (ADS)
Ariffin, Syaiba Balqish; Midi, Habshah
2014-06-01
This article is concerned with the performance of the logistic ridge regression estimation technique in the presence of multicollinearity and high leverage points. In logistic regression, multicollinearity among the predictors is reflected in the information matrix. The maximum likelihood estimator suffers a huge setback in the presence of multicollinearity, which causes regression estimates to have unduly large standard errors. To remedy this problem, a logistic ridge regression estimator is put forward. It is evident that the logistic ridge regression estimator outperforms the maximum likelihood approach in handling multicollinearity. The effect of high leverage points on the performance of the logistic ridge regression estimator is then investigated through a real data set and a simulation study. The findings signify that the logistic ridge regression estimator fails to provide better parameter estimates in the presence of both high leverage points and multicollinearity.
Drought Patterns Forecasting using an Auto-Regressive Logistic Model
NASA Astrophysics Data System (ADS)
del Jesus, M.; Sheffield, J.; Méndez Incera, F. J.; Losada, I. J.; Espejo, A.
2014-12-01
Drought is characterized by a water deficit that may manifest across a large range of spatial and temporal scales. Drought may create important socio-economic consequences, often of catastrophic dimensions. A quantifiable definition of drought is elusive because, depending on its impacts, consequences and generation mechanism, different water deficit periods may be identified as a drought under some definitions but not others. Droughts are linked to the water cycle and, although a climate change signal may not have emerged yet, they are also intimately linked to climate. In this work we develop an auto-regressive logistic model for drought prediction at different temporal scales that makes use of a spatially explicit framework. Our model allows the inclusion of covariates, continuous or categorical, to improve the performance of the auto-regressive component. Our approach makes use of dimensionality reduction (principal component analysis) and classification techniques (K-means and maximum dissimilarity) to simplify the representation of complex climatic patterns, such as sea surface temperature (SST) and sea level pressure (SLP), while retaining information on their spatial structure, i.e. considering their spatial patterns. This procedure allows us to include in the analysis multivariate representations of complex climatic phenomena, such as the El Niño-Southern Oscillation. We also explore the impact of other climate-related variables such as sun spots. The model allows us to quantify the uncertainty of the forecasts and can be easily adapted to make predictions under future climatic scenarios. The framework presented here may be extended to other applications such as flash flood analysis or risk assessment of natural hazards.
On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis
ERIC Educational Resources Information Center
Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas
2011-01-01
The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…
The crux of the method: assumptions in ordinary least squares and logistic regression.
Long, Rebecca G
2008-10-01
Logistic regression has increasingly become the tool of choice when analyzing data with a binary dependent variable. While resources relating to the technique are widely available, clear discussions of why logistic regression should be used in place of ordinary least squares regression are difficult to find. The current paper compares and contrasts the assumptions of ordinary least squares with those of logistic regression and explains why logistic regression's looser assumptions make it adept at handling violations of the more important assumptions in ordinary least squares.
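One concrete assumption difference can be shown in a few lines: ordinary least squares applied to a 0/1 outcome (the linear probability model) can produce fitted values outside [0, 1], while the logistic link cannot. The coefficients below are illustrative, not estimated from any data set.

```python
import math

# A 0/1 outcome regressed on x, with invented coefficients
b0, b1 = 0.1, 0.15

def ols_fitted(x):
    """Linear probability model: fitted value is unbounded."""
    return b0 + b1 * x

def logistic_fitted(x):
    """Logistic link: fitted value is always in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

bad = ols_fitted(10)       # greater than 1: not a valid probability
ok = logistic_fitted(10)   # still a valid probability
```

This bounding, along with the binomial rather than constant-variance error structure, is why the looser assumptions of logistic regression suit binary dependent variables.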
Risk factors for temporomandibular disorder: Binary logistic regression analysis
Magalhães, Bruno G.; de-Sousa, Stéphanie T.; de Mello, Victor V C.; da-Silva-Barbosa, André C.; de-Assis-Morais, Mariana P L.; Barbosa-Vasconcelos, Márcia M V.
2014-01-01
Objectives: To analyze the influence of socioeconomic and demographic factors (gender, economic class, age and marital status) on the occurrence of temporomandibular disorder. Study Design: One hundred individuals from urban areas in the city of Recife (Brazil) registered at Family Health Units was examined using Axis I of the Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD) which addresses myofascial pain and joint problems (disc displacement, arthralgia, osteoarthritis and oesteoarthrosis). The Brazilian Economic Classification Criteria (CCEB) was used for the collection of socioeconomic and demographic data. Then, it was categorized as Class A (high social class), Classes B/C (middle class) and Classes D/E (very poor social class). The results were analyzed using Pearson’s chi-square test for proportions, Fisher’s exact test, nonparametric Mann-Whitney test and Binary logistic regression analysis. Results: None of the participants belonged to Class A, 72% belonged to Classes B/C and 28% belonged to Classes D/E. The multivariate analysis revealed that participants from Classes D/E had a 4.35-fold greater chance of exhibiting myofascial pain and 11.3-fold greater chance of exhibiting joint problems. Conclusions: Poverty is a important condition to exhibit myofascial pain and joint problems. Key words:Temporomandibular joint disorders, risk factors, prevalence. PMID:24316706
Actigraphy-based scratch detection using logistic regression.
Petersen, Johanna; Austin, Daniel; Sack, Robert; Hayes, Tamara L
2013-03-01
Incessant scratching as a result of diseases such as atopic dermatitis causes skin breakdown, poor sleep quality, and reduced quality of life for affected individuals. In order to develop more effective therapies, there is a need for objective measures to detect scratching. Wrist actigraphy, which detects wrist movements over time using micro-accelerometers, has shown great promise in detecting scratch because it is lightweight, usable in the home environment, can record longitudinally, and does not require any wires. However, current actigraphy-based scratch-detection methods are limited in their ability to discriminate scratch from other nighttime activities. Our previous work demonstrated the separability of scratch from both walking and restless sleep using a clustering technique which employed four features derived from the actigraphic data: number of accelerations above 0.01 g, epoch variance, peak frequency, and autocorrelation at lag one. In this paper, we extended these results by employing these same features as independent variables in a logistic regression model. This allows us to directly estimate the conditional probability of scratching for each epoch. Our approach outperforms competing actigraphy-based approaches and has both high sensitivity (0.96) and specificity (0.92) for identifying scratch as validated on experimental data collected from 12 healthy subjects. The model must still be fully validated on clinical data, but shows promise for applications to clinical trials and longitudinal studies of scratch.
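The four features named above can be re-implemented from their definitions. The sketch below is our own illustrative version, not the authors' code: the peak frequency uses a naive DFT, and only the 0.01 g threshold is taken from the abstract.

```python
import math

def epoch_features(signal, fs):
    """Count above 0.01 g, epoch variance, peak frequency (Hz), and
    lag-1 autocorrelation for one epoch of accelerometer magnitudes
    (in g) sampled at fs Hz."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((s - mean) ** 2 for s in signal) / n
    count = sum(1 for s in signal if abs(s) > 0.01)
    num = sum((signal[i] - mean) * (signal[i + 1] - mean) for i in range(n - 1))
    acf1 = num / (var * n) if var > 0 else 0.0
    # Peak frequency from a naive DFT magnitude (skipping the DC bin)
    mags = []
    for k in range(1, n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(signal))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(signal))
        mags.append(math.hypot(re, im))
    peak_hz = (mags.index(max(mags)) + 1) * fs / n
    return count, var, peak_hz, acf1

# Demo: a pure 2 Hz oscillation sampled at 20 Hz for 2 seconds
fs = 20
sig = [0.5 * math.sin(2 * math.pi * 2 * i / fs) for i in range(40)]
count, var, peak_hz, acf1 = epoch_features(sig, fs)
```

Each epoch's four features would then feed the logistic regression as independent variables.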
Autoregressive logistic regression applied to atmospheric circulation patterns
NASA Astrophysics Data System (ADS)
Guanche, Y.; Mínguez, R.; Méndez, F. J.
2014-01-01
Autoregressive logistic regression models have been successfully applied in medical and pharmacology research fields, and in simple models to analyze weather types. The main purpose of this paper is to introduce a general framework to study atmospheric circulation patterns capable of dealing simultaneously with: seasonality, interannual variability, long-term trends, and autocorrelation of different orders. To show its effectiveness on modeling performance, daily atmospheric circulation patterns identified from observed sea level pressure fields over the Northeastern Atlantic have been analyzed using this framework. Model predictions are compared with probabilities from the historical database, showing very good fitting diagnostics. In addition, the fitted model is used to simulate the evolution over time of atmospheric circulation patterns using the Monte Carlo method. Simulation results are statistically consistent with the historical sequence in terms of (1) probability of occurrence of the different weather types, (2) transition probabilities and (3) persistence. The proposed model constitutes an easy-to-use and powerful tool for a better understanding of the climate system.
Application of Bayesian Logistic Regression to Mining Biomedical Data
Avali, Viji R.; Cooper, Gregory F.; Gopalakrishnan, Vanathi
2014-01-01
Mining high dimensional biomedical data with existing classifiers is challenging and the predictions are often inaccurate. We investigated the use of Bayesian Logistic Regression (B-LR) for mining such data to predict and classify various disease conditions. The analysis was done on twelve biomedical datasets with binary class variables and the performance of B-LR was compared to those from other popular classifiers on these datasets with 10-fold cross validation using the WEKA data mining toolkit. The statistical significance of the results was analyzed by two-tailed paired t-tests and non-parametric Wilcoxon signed-rank tests. We observed overall that B-LR with non-informative Gaussian priors performed on par with other classifiers in terms of accuracy, balanced accuracy and AUC. These results suggest that it is worthwhile to explore the application of B-LR to predictive modeling tasks in bioinformatics using informative biological prior probabilities. With informative prior probabilities, we conjecture that the performance of B-LR will improve. PMID:25954328
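A Gaussian prior on the coefficients makes the B-LR posterior mode equivalent to L2-penalised (ridge) logistic regression. Below is a minimal MAP sketch on toy data, assuming independent N(0, prior_var) priors; it is not the WEKA implementation, and the data are invented.

```python
import math

def map_logistic(X, y, prior_var, lr=0.5, iters=3000):
    """MAP estimate for logistic regression with independent
    N(0, prior_var) priors on the coefficients; equivalent to ridge
    logistic regression with penalty 1/prior_var. A large prior_var
    approximates a non-informative prior."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        grad = [-wj / prior_var for wj in w]        # log-prior gradient
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(w, xi))))
            for j, xj in enumerate(xi):
                grad[j] += (yi - p) * xj            # log-likelihood gradient
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Toy binary data: columns are intercept and one predictor
X = [[1.0, 0.0]] * 5 + [[1.0, 1.0]] * 5
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
w_diffuse = map_logistic(X, y, prior_var=100.0)  # near non-informative
w_tight = map_logistic(X, y, prior_var=0.1)      # strongly informative
```

An informative prior pulls the coefficients toward its mean, which is the mechanism behind the conjecture that biologically informed priors could improve B-LR.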
Efficient logistic regression designs under an imperfect population identifier.
Albert, Paul S; Liu, Aiyi; Nansel, Tonja
2014-03-01
Motivated by actual study designs, this article considers efficient logistic regression designs where the population is identified with a binary test that is subject to diagnostic error. We consider the case where the imperfect test is obtained on all participants, while the gold standard test is measured on a small chosen subsample. Under maximum-likelihood estimation, we evaluate the optimal design in terms of sample selection as well as verification. We show that there may be substantial efficiency gains by choosing a small percentage of individuals who test negative on the imperfect test for inclusion in the sample (e.g., verifying 90% test-positive cases). We also show that a two-stage design may be a good practical alternative to a fixed design in some situations. Under optimal and nearly optimal designs, we compare maximum-likelihood and semi-parametric efficient estimators under correct and misspecified models with simulations. The methodology is illustrated with an analysis from a diabetes behavioral intervention trial.
ERIC Educational Resources Information Center
Chen, Chau-Kuang
2005-01-01
Logistic and Cox regression methods are practical tools used to model the relationships between certain student learning outcomes and their relevant explanatory variables. The logistic regression model fits an S-shaped curve into a binary outcome with data points of zero and one. The Cox regression model allows investigators to study the duration…
ERIC Educational Resources Information Center
Schulz, E. Matthew; Betebenner, Damian; Ahn, Meeyeon
This study was performed to determine whether hierarchical logistic regression models could reduce the sample size requirements of ordinary (nonhierarchical) logistic regression models. Data from courses with varying class size were randomly partitioned into two halves per course. Grades of students in college algebra courses were obtained from 40…
Mixed-Effects Logistic Regression Models for Indirectly Observed Discrete Outcome Variables
ERIC Educational Resources Information Center
Vermunt, Jeroen K.
2005-01-01
A well-established approach to modeling clustered data introduces random effects in the model of interest. Mixed-effects logistic regression models can be used to predict discrete outcome variables when observations are correlated. An extension of the mixed-effects logistic regression model is presented in which the dependent variable is a latent…
Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning
ERIC Educational Resources Information Center
Li, Zhushan
2014-01-01
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
What Are the Odds of that? A Primer on Understanding Logistic Regression
ERIC Educational Resources Information Center
Huang, Francis L.; Moon, Tonya R.
2013-01-01
The purpose of this Methodological Brief is to present a brief primer on logistic regression, a commonly used technique when modeling dichotomous outcomes. Using data from the National Education Longitudinal Study of 1988 (NELS:88), logistic regression techniques were used to investigate student-level variables in eighth grade (i.e., enrolled in a…
ERIC Educational Resources Information Center
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules…
Asymptotically Unbiased Estimation of Exposure Odds Ratios in Complete Records Logistic Regression
Bartlett, Jonathan W.; Harel, Ofer; Carpenter, James R.
2015-01-01
Missing data are a commonly occurring threat to the validity and efficiency of epidemiologic studies. Perhaps the most common approach to handling missing data is to simply drop those records with 1 or more missing values, in so-called “complete records” or “complete case” analysis. In this paper, we bring together earlier-derived yet perhaps now somewhat neglected results which show that a logistic regression complete records analysis can provide asymptotically unbiased estimates of the association of an exposure of interest with an outcome, adjusted for a number of confounders, under a surprisingly wide range of missing-data assumptions. We give detailed guidance describing how the observed data can be used to judge the plausibility of these assumptions. The results mean that in large epidemiologic studies which are affected by missing data and analyzed by logistic regression, exposure associations may be estimated without bias in a number of settings where researchers might otherwise assume that bias would occur. PMID:26429998
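In the simplest unadjusted case (one binary exposure, no confounders), a complete-records logistic analysis reduces to the sample odds ratio from the 2x2 table of complete records; the logistic slope is its logarithm. A small sketch with invented records:

```python
def complete_records_odds_ratio(records):
    """records: (exposure, outcome) pairs where either entry may be
    None (missing). Incomplete records are dropped, then the sample
    odds ratio is computed from the 2x2 table."""
    complete = [(e, o) for e, o in records if e is not None and o is not None]
    a = sum(1 for e, o in complete if e == 1 and o == 1)
    b = sum(1 for e, o in complete if e == 1 and o == 0)
    c = sum(1 for e, o in complete if e == 0 and o == 1)
    d = sum(1 for e, o in complete if e == 0 and o == 0)
    return (a * d) / (b * c)

# Invented data: the two trailing records are incomplete and dropped
records = ([(1, 1)] * 6 + [(1, 0)] * 2 + [(0, 1)] * 3 + [(0, 0)] * 9
           + [(None, 1), (1, None)])
or_hat = complete_records_odds_ratio(records)
```

The paper's point is that, under a fairly wide range of missingness mechanisms, this complete-records exposure odds ratio remains asymptotically unbiased.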
Modeling data for pancreatitis in presence of a duodenal diverticula using logistic regression
NASA Astrophysics Data System (ADS)
Dineva, S.; Prodanova, K.; Mlachkova, D.
2013-12-01
The presence of a periampullary duodenal diverticulum (PDD) is often observed during upper digestive tract barium meal studies and endoscopic retrograde cholangiopancreatography (ERCP). A few papers have reported that the diverticulum is associated with the incidence of pancreatitis. The aim of this study is to investigate whether the presence of duodenal diverticula predisposes to the development of pancreatic disease. A total of 3966 patients who had undergone ERCP were studied retrospectively. They were divided into two groups, with and without PDD. Patients with a duodenal diverticulum had a higher rate of acute pancreatitis, and duodenal diverticula are a risk factor for acute idiopathic pancreatitis. A multiple logistic regression was performed to obtain adjusted estimates of odds and to identify whether a PDD is a predictor of acute or chronic pancreatitis. The software package STATISTICA 10.0 was used for analyzing the real data.
Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E
2013-06-01
Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework.
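The proportional-odds property under a logistic sampling model can be verified numerically: if Y given x follows a logistic distribution with location b0 + b1*x and scale s, then dichotomizing at any cutoff c induces a logistic regression whose slope is b1/s, independent of c. A sketch with made-up parameters:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_exceed(x, c, b0, b1, s):
    """P(Y > c | x) when Y | x ~ Logistic(location b0 + b1*x, scale s)."""
    return sigmoid((b0 + b1 * x - c) / s)

def slope_at_cutoff(c, x, b0, b1, s):
    """Log-odds change per unit x for the dichotomized outcome 1{Y > c}."""
    lo = lambda p: math.log(p / (1.0 - p))
    return lo(p_exceed(x + 1.0, c, b0, b1, s)) - lo(p_exceed(x, c, b0, b1, s))

# Invented parameters; the slope should equal b1/s at every cutoff
b0, b1, s = 0.5, 0.8, 1.3
slopes = [slope_at_cutoff(c, 0.0, b0, b1, s) for c in (-1.0, 0.0, 2.0)]
```

When the continuous data are not logistic, this cutoff-invariance fails, which is exactly the departure the proposed Bayes-factor goodness-of-fit test is designed to detect.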
Overlapping Group Logistic Regression with Applications to Genetic Pathway Selection
Zeng, Yaohui; Breheny, Patrick
2016-01-01
Discovering important genes that account for the phenotype of interest has long been a challenge in genome-wide expression analysis. Analyses such as gene set enrichment analysis (GSEA) that incorporate pathway information have become widespread in hypothesis testing, but pathway-based approaches have been largely absent from regression methods due to the challenges of dealing with overlapping pathways and the resulting lack of available software. The R package grpreg is widely used to fit group lasso and other group-penalized regression models; in this study, we develop an extension, grpregOverlap, to allow for overlapping group structure using a latent variable approach. We compare this approach to the ordinary lasso and to GSEA using both simulated and real data. We find that incorporation of prior pathway information can substantially improve the accuracy of gene expression classifiers, and we shed light on several ways in which hypothesis-testing approaches such as GSEA differ from regression approaches with respect to the analysis of pathway data. PMID:27679461
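The latent variable approach mentioned here is commonly implemented by duplicating each feature once per group that contains it, so that software for non-overlapping group penalties can be reused; each original coefficient is then the sum of its latent copies. A schematic sketch (our own illustration, not the grpregOverlap internals):

```python
def expand_overlapping_groups(row, groups):
    """Latent-variable expansion for overlapping group penalties:
    duplicate each feature once per group (pathway) containing it.
    groups: mapping of group name -> list of feature indices.
    Returns the expanded row and a (group, feature) map for each
    expanded column."""
    expanded, mapping = [], []
    for name, idx in groups.items():
        for j in idx:
            expanded.append(row[j])
            mapping.append((name, j))
    return expanded, mapping

# One sample with 3 gene-expression values; gene 1 sits in both pathways
row = [0.4, -1.2, 2.0]
groups = {"pathway_A": [0, 1], "pathway_B": [1, 2]}
exp_row, mapping = expand_overlapping_groups(row, groups)
```

On the expanded design the groups no longer overlap, so an ordinary group lasso can be fitted and the latent coefficients summed back per gene.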
ERIC Educational Resources Information Center
DeMars, Christine E.
2009-01-01
The Mantel-Haenszel (MH) and logistic regression (LR) differential item functioning (DIF) procedures have inflated Type I error rates when there are large mean group differences, short tests, and large sample sizes. When there are large group differences in mean score, groups matched on the observed number-correct score differ on true score,…
Analysis of Differential Item Functioning (DIF) Using Hierarchical Logistic Regression Models.
ERIC Educational Resources Information Center
Swanson, David B.; Clauser, Brian E.; Case, Susan M.; Nungester, Ronald J.; Featherman, Carol
2002-01-01
Outlines an approach to differential item functioning (DIF) analysis using hierarchical linear regression that makes it possible to combine results of logistic regression analyses across items to identify consistent sources of DIF, to quantify the proportion of explained variation in DIF coefficients, and to compare the predictive accuracy of…
Estimation of adjusted rate differences using additive negative binomial regression.
Donoghoe, Mark W; Marschner, Ian C
2016-08-15
Rate differences are an important effect measure in biostatistics and provide an alternative perspective to rate ratios. When the data are event counts observed during an exposure period, adjusted rate differences may be estimated using an identity-link Poisson generalised linear model, also known as additive Poisson regression. A problem with this approach is that the assumption of equality of mean and variance rarely holds in real data, which often show overdispersion. An additive negative binomial model is the natural alternative to account for this; however, standard model-fitting methods are often unable to cope with the constrained parameter space arising from the non-negativity restrictions of the additive model. In this paper, we propose a novel solution to this problem using a variant of the expectation-conditional maximisation-either algorithm. Our method provides a reliable way to fit an additive negative binomial regression model and also permits flexible generalisations using semi-parametric regression functions. We illustrate the method using a placebo-controlled clinical trial of fenofibrate treatment in patients with type II diabetes, where the outcome is the number of laser therapy courses administered to treat diabetic retinopathy. An R package is available that implements the proposed method. Copyright © 2016 John Wiley & Sons, Ltd. PMID:27073156
Kesselmeier, Miriam; Legrand, Carine; Peil, Barbara; Kabisch, Maria; Fischer, Christine; Hamann, Ute; Lorenzo Bermejo, Justo
2014-01-01
Logistic regression is usually applied to investigate the association between inherited genetic variants and a binary disease phenotype. A limitation of standard methods used to estimate the parameters of logistic regression models is their strong dependence on a few observations deviating from the majority of the data. We used data from the Genetic Analysis Workshop 18 to explore the possible benefit of robust logistic regression to estimate the genetic risk of hypertension. The comparison between standard and robust methods relied on the influence of departing hypertension profiles (outliers) on the estimated odds ratios, areas under the receiver operating characteristic curves, and clinical net benefit. Our results confirmed that single outliers may substantially affect the estimated genotype relative risks. The ranking of variants by probability values was different in standard and in robust logistic regression. For cutoff probabilities between 0.2 and 0.6, the clinical net benefit estimated by leave-one-out cross-validation in the investigated sample was slightly larger under robust regression, but the overall area under the receiver operating characteristic curve was larger for standard logistic regression. The potential advantage of robust statistics in the context of genetic association studies should be investigated in future analyses based on real and simulated data.
NASA Astrophysics Data System (ADS)
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model, the log odds of the dichotomous outcome are modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression describes the probabilistic relationship between the variables and the outcome. In conducting logistic regression, selection procedures are used to select important predictor variables; diagnostics are used to check that the assumptions are valid, including independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers; and a test statistic is calculated to determine the aptness of the model. This study used a binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographic profile, medical history, diet and lifestyle. The results indicate that overweight and obesity among students are influenced by obesity in the family and by the interaction between a student's ethnicity and routine meal intake. The odds of a student being overweight or obese are higher for a student with a family history of obesity and for a non-Malay student who frequently takes routine meals, as compared to a Malay student.
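The core relationship described above — log odds modeled as a linear combination of predictors, with exponentiated coefficients read as odds ratios — can be sketched as follows. The predictors loosely mirror the study's family-history and meal-frequency factors, but the data here are simulated and purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical binary predictors for an overweight/obesity outcome.
n = 1000
family_history = rng.integers(0, 2, n)
frequent_meals = rng.integers(0, 2, n)

# Simulate the outcome from a known logit: the log odds are a linear
# combination of the predictors (the defining assumption of the model).
log_odds = -1.5 + 1.2 * family_history + 0.8 * frequent_meals
p = 1 / (1 + np.exp(-log_odds))
y = rng.random(n) < p

X = np.column_stack([family_history, frequent_meals])
model = LogisticRegression().fit(X, y)

# exp(coefficient) is the odds ratio for a one-unit change in that predictor.
odds_ratios = np.exp(model.coef_[0])
print(model.intercept_, odds_ratios)
```

The fitted odds ratios should land near the simulated values of exp(1.2) ≈ 3.3 and exp(0.8) ≈ 2.2, up to sampling noise and the default ridge shrinkage.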
Comparison of artificial neural networks with logistic regression for detection of obesity.
Heydari, Seyed Taghi; Ayatollahi, Seyed Mohammad Taghi; Zare, Najaf
2012-08-01
Obesity is a common nutritional problem in both developed and developing countries. The aim of this study was to classify obesity by artificial neural networks and logistic regression. This cross-sectional study comprised 414 healthy military personnel in southern Iran. All subjects completed questionnaires on their socio-economic status, and their anthropometric measures were taken by a trained nurse. Classification of obesity was done by artificial neural networks and logistic regression. The mean age ± SD of participants was 34.4 ± 7.5 years. A total of 187 (45.2%) were obese. For logistic regression and neural networks respectively, the percentage correctly classified was 80.2% and 81.2%, sensitivity was 80.2 and 79.7, and specificity was 81.9 and 83.7; the areas under the Receiver-Operating Characteristic (ROC) curve were 0.888 and 0.884, and the Kappa statistics were 0.600 and 0.629. We conclude that neural networks and logistic regression were both good classifiers for obesity detection, but they were not significantly different in classification performance.
Yang, Lixue; Chen, Kean
2015-11-01
To improve the design of underwater target recognition systems based on auditory perception, this study compared human listeners with automatic classifiers. Performance measures and strategies were examined in three discrimination experiments: between man-made and natural targets, between ships and submarines, and among three types of ships. In the experiments, the subjects were asked to assign a score to each sound based on how confident they were about the category to which it belonged; logistic regression, representing linear discriminative models, completed three similar tasks using many auditory features. The results indicated that the performance of logistic regression improved as the ratio between inter- and intra-class differences became larger, whereas the performance of the human subjects was limited by their unfamiliarity with the targets. Logistic regression performed better than the human subjects in all tasks except the discrimination between man-made and natural targets, and the strategies employed by excellent human subjects were similar to that of logistic regression. Logistic regression and several human subjects demonstrated similar performance when discriminating man-made from natural targets, but in this case their strategies were not similar. An appropriate fusion of their strategies led to further improvement in recognition accuracy.
Solving Logistic Regression with Group Cardinality Constraints for Time Series Analysis
Zhang, Yong; Pohl, Kilian M.
2016-01-01
We propose an algorithm to distinguish 3D+t images of healthy from diseased subjects by solving logistic regression based on cardinality-constrained group sparsity. This method reduces the risk of overfitting by providing an elegant solution to identifying the anatomical regions most impacted by disease. It also ensures consistent identification across the time series by grouping each image feature across time and counting the number of non-zero groupings. While popular in medical imaging, group cardinality constrained problems are generally solved by relaxing counting with summing over the groupings. We instead solve the original problem by generalizing a penalty decomposition algorithm, which alternates between minimizing a logistic regression function with a regularizer based on the Frobenius norm and enforcing sparsity. Applied to 86 cine MRIs of healthy subjects and subjects with Tetralogy of Fallot (TOF), our method correctly identifies regions impacted by TOF and obtains a statistically significantly higher classification accuracy than logistic regression without and with relaxed group sparsity constraints.
Mohammed, Mohammed A; Manktelow, Bradley N; Hofer, Timothy P
2016-04-01
There is interest in deriving case-mix adjusted standardised mortality ratios so that comparisons between healthcare providers, such as hospitals, can be undertaken in the controversial belief that variability in standardised mortality ratios reflects quality of care. Typically standardised mortality ratios are derived using a fixed effects logistic regression model, without a hospital term in the model. This fails to account for the hierarchical structure of the data - patients nested within hospitals - and so a hierarchical logistic regression model is more appropriate. However, four methods have been advocated for deriving standardised mortality ratios from a hierarchical logistic regression model, but their agreement is not known and neither do we know which is to be preferred. We found significant differences between the four types of standardised mortality ratios because they reflect a range of underlying conceptual issues. The most subtle issue is the distinction between asking how an average patient fares in different hospitals versus how patients at a given hospital fare at an average hospital. Since the answers to these questions are not the same and since the choice between these two approaches is not obvious, the extent to which profiling hospitals on mortality can be undertaken safely and reliably, without resolving these methodological issues, remains questionable.
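The standard (fixed-effects, no hospital term) recipe the abstract critiques can be sketched in a few lines: fit a case-mix logistic model that ignores the hospital, then take each hospital's observed deaths over its model-expected deaths. Everything below is simulated for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Hypothetical patients nested in 5 hospitals, with one case-mix covariate.
n = 5000
hospital = rng.integers(0, 5, n)
severity = rng.normal(size=n)
# Hospital 0 has genuinely elevated mortality; the others share a baseline.
logit = -2.0 + 0.8 * severity + np.where(hospital == 0, 0.7, 0.0)
died = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Case-mix model WITHOUT a hospital term, as in the standard SMR recipe.
casemix = LogisticRegression().fit(severity[:, None], died)
expected = casemix.predict_proba(severity[:, None])[:, 1]

# SMR per hospital = observed deaths / expected deaths under the model.
smrs = []
for h in range(5):
    mask = hospital == h
    smrs.append(died[mask].sum() / expected[mask].sum())
print([round(s, 2) for s in smrs])
```

In this toy setup only hospital 0 has genuinely higher mortality, so its SMR should exceed 1 while the others hover near 1; the conceptual questions raised in the abstract concern what such a ratio actually compares once hospital effects enter the model.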
Aoyama, Shigeru; Kino, Koji; Kobayashi, Jyunji; Yoshimasu, Hidemi; Amagasa, Teruo
2005-06-01
This study compares temporomandibular joint dysfunction (TMD) symptoms before and after bilateral sagittal split ramus osteotomy, and identifies predictive factors for postoperative TMD symptoms by assessing adjusted odds ratios using multiple logistic regression analysis. A consecutive series of 37 cases treated only with bilateral sagittal split ramus osteotomy was evaluated. New postoperative TMD symptoms appeared in 9 cases, preoperative TMD symptoms disappeared in 6 cases, and TMD symptoms were unchanged in 5 cases. The median period until the interincisal opening range attained 40 mm was 5 months (range, 2 to 15 months). Age was a positive factor in patients with postoperative TMD symptoms, with an odds ratio of 1.43 (95 percent confidence interval, 1.05 to 1.93). In addition, a maximum bilateral setback distance of more than 9 mm was a positive factor, with an odds ratio of 6.95 (95 percent confidence interval, 1.06 to 45.42). We concluded that surgical correction of skeletal malocclusion may affect temporomandibular joint dysfunction symptoms. PMID:16187616
[Clinical research XX. From clinical judgment to multiple logistic regression model].
Berea-Baltierra, Ricardo; Rivas-Ruiz, Rodolfo; Pérez-Rodríguez, Marcela; Palacios-Cruz, Lino; Moreno, Jorge; Talavera, Juan O
2014-01-01
The complexity of the causality phenomenon in clinical practice implies that the result of a maneuver is not caused solely by the maneuver, but by the interaction between the maneuver and other baseline factors or variables occurring during the maneuver. This requires methodological designs that allow the evaluation of these variables. When the outcome is a binary variable, we use the multiple logistic regression model (MLRM). This multivariate model is useful when we want to predict or explain the effect of a maneuver or exposure on the outcome, adjusting for the effect of several risk factors. In order to perform an MLRM, the outcome or dependent variable must be a binary variable whose two categories are mutually exclusive (e.g. alive/dead, healthy/ill); independent variables or risk factors may be either qualitative or quantitative. The effect measure obtained from this model is the odds ratio (OR) with 95% confidence intervals (CI), from which we can estimate the proportion of the outcome's variability explained by the risk factors. For these reasons, the MLRM is used in clinical research, since one of the main objectives in clinical practice is the ability to predict or explain an event taking different risk or prognostic factors into account.
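As a concrete sketch of the OR-with-95%-CI output the abstract describes, the following fits a logistic regression by Newton-Raphson on simulated data with a known effect and forms a Wald interval on the odds-ratio scale (all data and names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical cohort: binary exposure, binary outcome generated with a
# true odds ratio of exp(1.0), roughly 2.72.
n = 2000
exposure = rng.integers(0, 2, n)
logit = -1.0 + 1.0 * exposure
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

X = np.column_stack([np.ones(n), exposure])
beta = np.zeros(2)

# Newton-Raphson iterations for the logistic log-likelihood.
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    W = p * (1 - p)
    hessian = X.T @ (X * W[:, None])          # observed information
    beta += np.linalg.solve(hessian, X.T @ (y - p))

# Wald 95% CI: beta +/- 1.96 * SE, then exponentiate to the OR scale.
se = np.sqrt(np.diag(np.linalg.inv(hessian)))
or_hat = np.exp(beta[1])
ci_low = np.exp(beta[1] - 1.96 * se[1])
ci_high = np.exp(beta[1] + 1.96 * se[1])
print(or_hat, (ci_low, ci_high))
```

The estimated OR should land near the simulated exp(1.0) ≈ 2.72, with a confidence interval that excludes 1, reflecting a genuine exposure effect.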
Bakhtiyari, Mahmood; Mehmandar, Mohammad Reza; Mirbagheri, Babak; Hariri, Gholam Reza; Delpisheh, Ali; Soori, Hamid
2014-01-01
Risk factors of human-related traffic crashes are among the most important and preventable challenges for community health, due to their noteworthy burden in developing countries in particular. The present study aims to investigate the role of human risk factors in road traffic crashes in Iran. Through a cross-sectional study using the COM 114 data collection forms, the police records of almost 600,000 crashes that occurred in 2010 are investigated. Binary logistic regression and proportional odds regression models are used, and the odds ratio for each risk factor is calculated. These models are adjusted for known confounding factors including age, sex and driving time. The traffic crash reports of 537,688 men (90.8%) and 54,480 women (9.2%) are analysed. The mean age is 34.1 ± 14 years. Not maintaining eyes on the road (53.7%) and losing control of the vehicle (21.4%) are the main causes of drivers' deaths in traffic crashes within cities. Not maintaining eyes on the road is also the most frequent human risk factor for road traffic crashes outside cities. Sudden lane excursion (OR = 9.9, 95% CI: 8.2-11.9), seat belt non-compliance (OR = 8.7, CI: 6.7-10.1), exceeding the authorised speed (OR = 17.9, CI: 12.7-25.1) and exceeding a safe speed (OR = 9.7, CI: 7.2-13.2) are the most significant human risk factors for traffic crashes in Iran. The high mortality rate of 39 people per 100,000 population emphasises the importance of traffic crashes in Iran. Considering the important role of human risk factors in traffic crashes, strenuous efforts are required to control dangerous driving behaviours such as speeding, illegal overtaking and not maintaining eyes on the road.
Use and interpretation of logistic regression in habitat-selection studies
Keating, Kim A.; Cherry, Steve
2004-01-01
Logistic regression is an important tool for wildlife habitat-selection studies, but the method frequently has been misapplied due to an inadequate understanding of the logistic model, its interpretation, and the influence of sampling design. To promote better use of this method, we review its application and interpretation under 3 sampling designs: random, case-control, and use-availability. Logistic regression is appropriate for habitat use-nonuse studies employing random sampling and can be used to directly model the conditional probability of use in such cases. Logistic regression also is appropriate for studies employing case-control sampling designs, but careful attention is required to interpret results correctly. Unless bias can be estimated or probability of use is small for all habitats, results of case-control studies should be interpreted as odds ratios, rather than probability of use or relative probability of use. When data are gathered under a use-availability design, logistic regression can be used to estimate approximate odds ratios if probability of use is small, at least on average. More generally, however, logistic regression is inappropriate for modeling habitat selection in use-availability studies. In particular, using logistic regression to fit the exponential model of Manly et al. (2002:100) does not guarantee maximum-likelihood estimates, valid probabilities, or valid likelihoods. We show that the resource selection function (RSF) commonly used for the exponential model is proportional to a logistic discriminant function. Thus, it may be used to rank habitats with respect to probability of use and to identify important habitat characteristics or their surrogates, but it is not guaranteed to be proportional to probability of use. Other problems associated with the exponential model also are discussed. We describe an alternative model based on Lancaster and Imbens (1996) that offers a method for estimating conditional probability of use in use-availability studies.
ERIC Educational Resources Information Center
Ozechowski, Timothy J.; Turner, Charles W.; Hops, Hyman
2007-01-01
This article demonstrates the use of mixed-effects logistic regression (MLR) for conducting sequential analyses of binary observational data. MLR is a special case of the mixed-effects logit modeling framework, which may be applied to multicategorical observational data. The MLR approach is motivated in part by G. A. Dagne, G. W. Howe, C. H.…
The cross-validated AUC for MCP-logistic regression with high-dimensional data.
Jiang, Dingfeng; Huang, Jian; Zhang, Ying
2013-10-01
We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed to optimize the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite-sample performance of the proposed method and to compare it with existing methods, including the Akaike information criterion (AIC), Bayesian information criterion (BIC) and extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and smaller classification error than those with tuning parameters selected using the AIC, BIC or EBIC. We illustrate the application of MCP-logistic regression with the CV-AUC criterion on three microarray datasets from studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
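A minimal sketch of the CV-AUC idea: fit a sparse penalized logistic regression over a grid of penalty levels and keep the level with the largest cross-validated AUC. scikit-learn does not implement the MCP penalty, so an L1 (lasso) penalty stands in for it here; the data are simulated and purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)

# Sparse high-dimensional setup: 100 samples, 200 features, 5 informative.
n, p = 100, 200
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 1.5
y = (rng.random(n) < 1 / (1 + np.exp(-(X @ beta)))).astype(int)

# CV-AUC criterion: pick the penalty level maximising cross-validated AUC.
# (L1 replaces MCP here, since scikit-learn has no MCP penalty.)
grid = [0.01, 0.05, 0.1, 0.5, 1.0]
cv_auc = {
    C: cross_val_score(
        LogisticRegression(penalty="l1", solver="liblinear", C=C),
        X, y, cv=5, scoring="roc_auc",
    ).mean()
    for C in grid
}
best_C = max(cv_auc, key=cv_auc.get)
print(best_C, round(cv_auc[best_C], 3))
```

As the abstract argues, selecting the tuning parameter on AUC directly targets ranking/classification performance for binary outcomes, rather than the likelihood-plus-complexity trade-off that AIC/BIC/EBIC optimize.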
Detecting DIF in Polytomous Items Using MACS, IRT and Ordinal Logistic Regression
ERIC Educational Resources Information Center
Elosua, Paula; Wells, Craig
2013-01-01
The purpose of the present study was to compare the Type I error rate and power of two model-based procedures, the mean and covariance structure model (MACS) and the item response theory (IRT), and an observed-score based procedure, ordinal logistic regression, for detecting differential item functioning (DIF) in polytomous items. A simulation…
Multiple Logistic Regression Analysis of Cigarette Use among High School Students
ERIC Educational Resources Information Center
Adwere-Boamah, Joseph
2011-01-01
A binary logistic regression analysis was performed to predict high school students' cigarette smoking behavior from selected predictors from 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target student behavior of interest was frequent cigarette use. Five predictor variables included in the model were: a) race, b) frequency of…
Heteroscedastic Extended Logistic Regression for Post-Processing of Ensemble Guidance
NASA Astrophysics Data System (ADS)
Messner, Jakob W.; Mayr, Georg J.; Wilks, Daniel S.; Zeileis, Achim
2014-05-01
To achieve well-calibrated probabilistic weather forecasts, numerical ensemble forecasts are often statistically post-processed. One recent ensemble-calibration method is extended logistic regression, which extends the popular logistic regression to yield full probability distribution forecasts. Although the purpose of this method is to post-process ensemble forecasts, usually only the ensemble mean is used as a predictor variable, whereas the ensemble spread is neglected because it does not improve the forecasts. In this study we show that when simply used as an ordinary predictor variable in extended logistic regression, the ensemble spread only affects the location but not the variance of the predictive distribution. Uncertainty information contained in the ensemble spread is therefore not utilized appropriately. To address this drawback we propose a new approach in which the ensemble spread is directly used to predict the dispersion of the predictive distribution. With wind speed data and ensemble forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) we show that with this approach, the ensemble spread can be used effectively to improve forecasts from extended logistic regression.
ERIC Educational Resources Information Center
Courtney, Jon R.; Prophet, Retta
2011-01-01
Placement instability is often associated with a number of negative outcomes for children. To gain state level contextual knowledge of factors associated with placement stability/instability, logistic regression was applied to selected variables from the New Mexico Adoption and Foster Care Administrative Reporting System dataset. Predictors…
An Additional Measure of Overall Effect Size for Logistic Regression Models
ERIC Educational Resources Information Center
Allen, Jeff; Le, Huy
2008-01-01
Users of logistic regression models often need to describe the overall predictive strength, or effect size, of the model's predictors. Analogs of R² have been developed, but none of these measures are interpretable on the same scale as effects of individual predictors. Furthermore, R² analogs are not invariant to the…
ERIC Educational Resources Information Center
Carney, Michelle Mohr; Buttell, Frederick P.; Muldoon, John
2006-01-01
The purpose of this study was to (1) create a predictive model that would correctly identify men at greatest risk of dropping out of a court-mandated, batterer intervention program; and, (2) explore the creation of such a logistic regression model using a set of instruments that were different from those used in previous research. Method: The…
ERIC Educational Resources Information Center
Anderson, Carolyn J.; Verkuilen, Jay; Peyton, Buddy L.
2010-01-01
Survey items with multiple response categories and multiple-choice test questions are ubiquitous in psychological and educational research. We illustrate the use of log-multiplicative association (LMA) models that are extensions of the well-known multinomial logistic regression model for multiple dependent outcome variables to reanalyze a set of…
ERIC Educational Resources Information Center
Rudner, Lawrence
2016-01-01
In the machine learning literature, it is commonly accepted as fact that as calibration sample sizes increase, Naïve Bayes classifiers initially outperform Logistic Regression classifiers in terms of classification accuracy. Applied to subtests from an on-line final examination and from a highly regarded certification examination, this study shows…
Propensity Score Estimation with Data Mining Techniques: Alternatives to Logistic Regression
ERIC Educational Resources Information Center
Keller, Bryan S. B.; Kim, Jee-Seon; Steiner, Peter M.
2013-01-01
Propensity score analysis (PSA) is a methodological technique which may correct for selection bias in a quasi-experiment by modeling the selection process using observed covariates. Because logistic regression is well understood by researchers in a variety of fields and easy to implement in a number of popular software packages, it has…
ERIC Educational Resources Information Center
French, Brian F.; Maller, Susan J.
2007-01-01
Two unresolved implementation issues with logistic regression (LR) for differential item functioning (DIF) detection include ability purification and effect size use. Purification is suggested to control inaccuracies in DIF detection as a result of DIF items in the ability estimate. Additionally, effect size use may be beneficial in controlling…
ERIC Educational Resources Information Center
Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza
2014-01-01
This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
Comparison of IRT Likelihood Ratio Test and Logistic Regression DIF Detection Procedures
ERIC Educational Resources Information Center
Atar, Burcu; Kamata, Akihito
2011-01-01
The Type I error rates and the power of IRT likelihood ratio test and cumulative logit ordinal logistic regression procedures in detecting differential item functioning (DIF) for polytomously scored items were investigated in this Monte Carlo simulation study. For this purpose, 54 simulation conditions (combinations of 3 sample sizes, 2 sample…
Comparison of Two Approaches for Handling Missing Covariates in Logistic Regression
ERIC Educational Resources Information Center
Peng, Chao-Ying Joanne; Zhu, Jin
2008-01-01
For the past 25 years, methodological advances have been made in missing data treatment. Most published work has focused on missing data in dependent variables under various conditions. The present study seeks to fill the void by comparing two approaches for handling missing data in categorical covariates in logistic regression: the…
ERIC Educational Resources Information Center
Magis, David; Raiche, Gilles; Beland, Sebastien; Gerard, Paul
2011-01-01
We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…
A Note on Three Statistical Tests in the Logistic Regression DIF Procedure
ERIC Educational Resources Information Center
Paek, Insu
2012-01-01
Although logistic regression became one of the well-known methods in detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under the maximum likelihood, do not seem to be consistently distinguished in DIF literature. This paper provides a clarifying…
ERIC Educational Resources Information Center
Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard
2010-01-01
The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…
Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis
ERIC Educational Resources Information Center
Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John
2012-01-01
Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…
The purpose of this report is to provide a reference manual that could be used by investigators for making informed use of logistic regression using two methods (standard logistic regression and MARS). The details for analyses of relationships between a dependent binary response ...
Logistic regression by means of evolutionary radial basis function neural networks.
Gutierrez, Pedro Antonio; Hervas-Martinez, César; Martinez-Estudillo, Francisco J
2011-02-01
This paper proposes a hybrid multilogistic methodology, named logistic regression using initial and radial basis function (RBF) covariates. The process for obtaining the coefficients is carried out in three steps. First, an evolutionary programming (EP) algorithm is applied in order to produce an RBF neural network (RBFNN) with a reduced number of RBF transformations and the simplest structure possible. Then, the initial attribute space (or, as it is commonly known in the logistic regression literature, the covariate space) is transformed by adding the nonlinear transformations of the input variables given by the RBFs of the best individual in the final generation. Finally, a maximum likelihood optimization method determines the coefficients associated with a multilogistic regression model built in this augmented covariate space. In this final step, two different multilogistic regression algorithms are applied: one considers all initial and RBF covariates (multilogistic initial-RBF regression), and the other incrementally constructs the model and applies cross-validation, resulting in automatic covariate selection [simplelogistic initial-RBF regression (SLIRBF)]. Both methods include a regularization parameter, which has also been optimized. The proposed methodology is tested using 18 benchmark classification problems from well-known machine-learning repositories and two real agronomical problems. The results are compared with the corresponding multilogistic regression methods applied to the initial covariate space, with the RBFNNs obtained by the EP algorithm, and with other probabilistic classifiers, including different RBFNN design methods [e.g., relaxed variable kernel density estimation, support vector machines, a sparse classifier (sparse multinomial logistic regression)] and a procedure similar to SLIRBF but using product unit basis functions. The SLIRBF models are found to be competitive when compared with the corresponding multilogistic regression methods and the
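The covariate-augmentation step at the heart of this methodology can be sketched independently of the evolutionary programming part: compute Gaussian RBF transformations of the inputs, append them to the initial covariates, and fit ordinary maximum-likelihood logistic regression on the augmented space. Here a fixed grid of centres stands in for the paper's EP-evolved RBFs, on a toy radial problem that plain logistic regression cannot solve:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)

# A radially symmetric problem that defeats plain (linear) logistic
# regression: class 1 is a disk around the origin.
n = 600
X = rng.normal(size=(n, 2))
y = (np.linalg.norm(X, axis=1) < 1.0).astype(int)

# Gaussian RBF transformations on a small fixed grid of centres (the paper
# instead selects the RBFs with an evolutionary programming algorithm).
centres = np.array([[0.0, 0.0], [1.5, 0.0], [-1.5, 0.0], [0.0, 1.5], [0.0, -1.5]])

def rbf_features(X, centres, width=1.0):
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * width ** 2))

# Augmented covariate space = initial covariates + RBF covariates, followed
# by ordinary maximum-likelihood logistic regression on top.
X_aug = np.hstack([X, rbf_features(X, centres)])
linear_acc = LogisticRegression().fit(X, y).score(X, y)
rbf_acc = LogisticRegression(max_iter=1000).fit(X_aug, y).score(X_aug, y)
print(round(linear_acc, 3), round(rbf_acc, 3))
```

On this disk-shaped class the linear model can do little better than predict the majority class, while the RBF-augmented model recovers the radial boundary almost exactly.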
Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
As fraudulent financial statements become an increasingly serious problem, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study uses logistic regression, support vector machines, and decision trees to construct classification models for comparison. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research subjects are companies whose fraudulent or nonfraudulent financial statements occurred between 1998 and 2012. The findings are that financial and nonfinancial information are effectively used to distinguish fraudulent financial statements, and that the C5.0 decision tree has the best classification accuracy, at 85.71%. PMID:25302338
Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula
2011-01-01
Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
Nonlinear logistic regression model for outcomes after endourologic procedures: a novel predictor.
Kadlec, Adam O; Ohlander, Samuel; Hotaling, James; Hannick, Jessica; Niederberger, Craig; Turk, Thomas M
2014-08-01
The purpose of this study was to design a thorough and practical nonlinear logistic regression model that can be used for outcome prediction after various forms of endourologic intervention. Input variables and outcome data from 382 renal units endourologically treated at a single institution were used to build and cross-validate an independently designed nonlinear logistic regression model. Model outcomes were stone-free status and need for a secondary procedure. The model predicted stone-free status with sensitivity 75.3% and specificity 60.4%, yielding a positive predictive value (PPV) of 75.3% and negative predictive value (NPV) of 60.4%, with classification accuracy of 69.6%. Receiver operating characteristic area under the curve (ROC AUC) was 0.749. The model predicted the need for a secondary procedure with sensitivity 30% and specificity 98.3%, yielding a PPV of 60% and NPV of 94.2%. ROC AUC was 0.863. The model had equivalent predictive value to a traditional logistic regression model for the secondary procedure outcome. This study is proof-of-concept that a nonlinear regression model adequately predicts key clinical outcomes after shockwave lithotripsy, ureteroscopic lithotripsy, and percutaneous nephrolithotomy. This model holds promise for further optimization via dataset expansion, preferably with multi-institutional data, and could be developed into a predictive nomogram in the future.
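The reported sensitivity/specificity and PPV/NPV pairs follow mechanically from the four confusion-matrix counts. A minimal helper illustrates the arithmetic (the example counts below are illustrative, not the study's 382 renal units):

```python
def binary_metrics(tp, fp, tn, fn):
    """Diagnostic accuracy measures from confusion-matrix counts."""
    sens = tp / (tp + fn)            # sensitivity (true-positive rate)
    spec = tn / (tn + fp)            # specificity (true-negative rate)
    ppv = tp / (tp + fp)             # positive predictive value
    npv = tn / (tn + fn)             # negative predictive value
    acc = (tp + tn) / (tp + fp + tn + fn)
    return sens, spec, ppv, npv, acc

# Hypothetical counts: 30 true positives, 10 false positives,
# 40 true negatives, 10 false negatives.
m = binary_metrics(30, 10, 40, 10)
```

Note that when the case mix of the validation sample matches the population, PPV and NPV are determined entirely by sensitivity, specificity, and prevalence.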
Aranda, Diana; Lopez, Jose V; Solo-Gabriele, Helena M; Fleisher, Jay M
2016-02-01
Recreational water quality surveillance involves comparing bacterial levels to set threshold values to determine beach closure. Bacterial levels can be predicted through models which are traditionally based upon multiple linear regression. The objective of this study was to evaluate exceedance probabilities, as opposed to bacterial levels, as an alternate method to express beach risk. Data were incorporated into a logistic regression for the purpose of identifying environmental parameters most closely correlated with exceedance probabilities. The analysis was based on 7,422 historical sample data points from the years 2000-2010 for 15 South Florida beach sample sites. Probability analyses showed which beaches in the dataset were most susceptible to exceedances. No yearly trends were observed nor were any relationships apparent with monthly rainfall or hurricanes. Results from logistic regression analyses found that among the environmental parameters evaluated, tide was most closely associated with exceedances, with exceedances 2.475 times more likely to occur at high tide compared to low tide. The logistic regression methodology proved useful for predicting future exceedances at a beach location in terms of probability and modeling water quality environmental parameters with dependence on a binary response. This methodology can be used by beach managers for allocating resources when sampling more than one beach. PMID:26837832
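The "2.475 times more likely" figure is an odds ratio, which in a logistic model is simply the exponential of the tide coefficient. A sketch with a hypothetical intercept (the paper reports only the odds ratio, not the fitted coefficients):

```python
import math

def p_exceed(intercept, beta_tide, high_tide):
    """Logistic model for P(exceedance) with a binary tide indicator."""
    z = intercept + beta_tide * (1 if high_tide else 0)
    return 1.0 / (1.0 + math.exp(-z))

# A tide coefficient of ln(2.475) reproduces the reported odds ratio:
# high tide multiplies the odds of an exceedance by 2.475, whatever
# the (here hypothetical) intercept is.
beta = math.log(2.475)
odds = lambda p: p / (1.0 - p)
ratio = odds(p_exceed(-2.0, beta, True)) / odds(p_exceed(-2.0, beta, False))
```

Because the odds ratio is invariant to the intercept, the same multiplicative interpretation holds at any baseline exceedance rate, while the absolute probabilities differ.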
Bayesian logistic regression in detection of gene-steroid interaction for cancer at PDLIM5 locus.
Wang, Ke-Sheng; Owusu, Daniel; Pan, Yue; Xie, Changchun
2016-06-01
The PDZ and LIM domain 5 (PDLIM5) gene may play a role in cancer, bipolar disorder, major depression, alcohol dependence and schizophrenia; however, little is known about the interaction effect of steroid use and the PDLIM5 gene on cancer. This study examined 47 single-nucleotide polymorphisms (SNPs) within the PDLIM5 gene in the Marshfield sample of 716 cancer patients (any diagnosed cancer, excluding minor skin cancer) and 2848 noncancer controls. A multiple logistic regression model in PLINK software was used to examine the association of each SNP with cancer. Bayesian logistic regression in PROC GENMOD in SAS statistical software, ver. 9.4, was used to detect gene-steroid interactions influencing cancer. Single-marker analysis using PLINK identified 12 SNPs associated with cancer (P < 0.05); in particular, SNP rs6532496 showed the strongest association with cancer (P = 6.84 × 10⁻³), while the next best signal was rs951613 (P = 7.46 × 10⁻³). Classic logistic regression in PROC GENMOD showed that both rs6532496 and rs951613 revealed strong gene-steroid interaction effects (OR=2.18, 95% CI=1.31-3.63 with P = 2.9 × 10⁻³ for rs6532496 and OR=2.07, 95% CI=1.24-3.45 with P = 5.43 × 10⁻³ for rs951613, respectively). Results from Bayesian logistic regression showed stronger interaction effects (OR=2.26, 95% CI=1.2-3.38 for rs6532496 and OR=2.14, 95% CI=1.14-3.2 for rs951613, respectively). All 12 SNPs associated with cancer revealed significant gene-steroid interaction effects (P < 0.05), whereas 13 SNPs showed gene-steroid interaction effects without a main effect on cancer. SNP rs4634230 revealed the strongest gene-steroid interaction effect (OR=2.49, 95% CI=1.5-4.13 with P = 4.0 × 10⁻⁴ based on the classic logistic regression and OR=2.59, 95% CI=1.4-3.97 from Bayesian logistic regression, respectively). This study provides evidence of common genetic variants within the PDLIM5 gene and interactions between PDLIM5 gene polymorphisms and steroid use
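A gene-environment interaction of this kind is modeled by adding a product term to the linear predictor; the interaction odds ratio is the exponential of that term's coefficient. A sketch with caller-supplied (hypothetical) coefficients:

```python
import math

def p_cancer(snp, steroid, b0, b_snp, b_ster, b_int):
    """Logistic model with a gene-by-steroid product (interaction) term.
    All coefficient values supplied by the caller are hypothetical."""
    z = b0 + b_snp * snp + b_ster * steroid + b_int * snp * steroid
    return 1.0 / (1.0 + math.exp(-z))

# exp(b_int) is the interaction odds ratio; b_int = ln(2.18) would
# correspond to the OR of 2.18 reported for rs6532496.
b_int = math.log(2.18)
```

With an interaction term present, the effect of the SNP on the odds scale differs between steroid users and non-users, which is exactly what the single main-effect coefficient cannot express.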
Phillips, Kirk T.; Street, W. Nick
2005-01-01
The purpose of this study is to determine the best prediction of heart failure outcomes, resulting from two methods -- standard epidemiologic analysis with logistic regression and knowledge discovery with supervised learning/data mining. Heart failure was chosen for this study as it exhibits higher prevalence and cost of treatment than most other hospitalized diseases. The prevalence of heart failure has exceeded 4 million cases in the U.S. Findings of this study should be useful for the design of quality improvement initiatives, as particular aspects of patient comorbidity and treatment are found to be associated with mortality. This is also a proof-of-concept study, considering the feasibility of emerging health informatics methods of data mining in conjunction with or in lieu of traditional logistic regression methods of prediction. Findings may also support the design of decision support systems and quality improvement programming for other diseases. PMID:16779367
Wang, Shuang; Jiang, Xiaoqian; Wu, Yuan; Cui, Lijuan; Cheng, Samuel; Ohno-Machado, Lucila
2013-01-01
We developed an EXpectation Propagation LOgistic REgRession (EXPLORER) model for distributed privacy-preserving online learning. The proposed framework provides a high level guarantee for protecting sensitive information, since the information exchanged between the server and the client is the encrypted posterior distribution of coefficients. Through experimental results, EXPLORER shows the same performance (e.g., discrimination, calibration, feature selection etc.) as the traditional frequentist Logistic Regression model, but provides more flexibility in model updating. That is, EXPLORER can be updated one point at a time rather than having to retrain the entire data set when new observations are recorded. The proposed EXPLORER supports asynchronized communication, which relieves the participants from coordinating with one another, and prevents service breakdown from the absence of participants or interrupted communications. PMID:23562651
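EXPLORER's "one point at a time" updating can be illustrated, in a much simpler form, by a single-observation gradient step for logistic regression. This sketch is not the paper's expectation propagation on encrypted posteriors; it only shows the online-update idea (learning rate and data are hypothetical):

```python
import math

def online_update(w, x, y, lr=0.1):
    """One-record logistic-regression update: a deliberately simple
    gradient sketch of incremental learning. EXPLORER itself updates
    a posterior distribution over coefficients, not a point estimate."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))          # current predicted probability
    return [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0]
w = online_update(w, [1.0, 1.0], 1)   # incorporate a single new observation
```

The appeal in both cases is the same: each new record adjusts the model in place, so nothing is retrained from scratch and raw data need not be pooled centrally.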
Predicting Student Success on the Texas Chemistry STAAR Test: A Logistic Regression Analysis
ERIC Educational Resources Information Center
Johnson, William L.; Johnson, Annabel M.; Johnson, Jared
2012-01-01
Background: The context is the new Texas STAAR end-of-course testing program. Purpose: The authors developed a logistic regression model to predict who would pass or fail the new Texas chemistry STAAR end-of-course exam. Setting: Robert E. Lee High School (5A) with an enrollment of 2700 students, Tyler, Texas. Date of the study was the 2011-2012…
Direkvand-Moghadam, Ashraf; Suhrabi, Zainab; Akbari, Malihe
2016-01-01
Background Female sexual dysfunction, which can occur during any stage of normal sexual activity, is a serious condition for individuals and couples. The present study aimed to determine the prevalence and predictive factors of female sexual dysfunction in women referred to health centers in Ilam, western Iran, in 2014. Methods In the present cross-sectional study, 444 women who attended health centers in Ilam were enrolled from May to September 2014. Participants were selected according to the simple random sampling method. Univariate and multivariate logistic regression analyses were used to predict the risk factors of female sexual dysfunction. Differences with an alpha error of 0.05 were regarded as statistically significant. Results Overall, 75.9% of the study population exhibited sexual dysfunction. Univariate logistic regression analysis demonstrated that there was a significant association between female sexual dysfunction and age, menarche age, gravidity, parity, and education (P<0.05). Multivariate logistic regression analysis indicated that menarche age (odds ratio, 1.26), education level (odds ratio, 1.71), and gravidity (odds ratio, 1.59) were independent predictive variables for female sexual dysfunction. Conclusion The majority of Iranian women suffer from sexual dysfunction. A lack of awareness of Iranian women's sexual pleasure and of formal training on sexual function and its influencing factors, such as menarche age, gravidity, and level of education, may lead to a high prevalence of female sexual dysfunction. PMID:27688863
Lee, Saro
2004-08-01
For landslide susceptibility mapping, this study applied and verified a Bayesian probability model, a likelihood ratio statistical model, and logistic regression for Janghung, Korea, using a Geographic Information System (GIS). Landslide locations were identified in the study area from interpretation of IRS satellite imagery and field surveys, and a spatial database was constructed from topographic maps, soil type, forest cover, geology and land cover. The factors that influence landslide occurrence, such as slope gradient, slope aspect, and curvature of topography, were calculated from the topographic database. Soil texture, material, drainage, and effective depth were extracted from the soil database, while forest type, diameter, and density were extracted from the forest database. Land cover was classified from Landsat TM satellite imagery using unsupervised classification. The likelihood ratio and logistic regression coefficient were overlaid to determine each factor's rating for landslide susceptibility mapping. Then the landslide susceptibility map was verified and compared with known landslide locations. The logistic regression model had higher prediction accuracy than the likelihood ratio model. The method can be used to reduce hazards associated with landslides and to inform land-cover planning.
A comparative study on entrepreneurial attitudes modeled with logistic regression and Bayes nets.
López Puga, Jorge; García García, Juan
2012-11-01
Entrepreneurship research is receiving increasing attention in our context, as entrepreneurs are key social agents involved in economic development. We compare the success of the dichotomic logistic regression model and the Bayes simple classifier to predict entrepreneurship, after manipulating the percentage of missing data and the level of categorization in predictors. A sample of undergraduate university students (N = 1230) completed five scales (motivation, attitude towards business creation, obstacles, deficiencies, and training needs) and we found that each of them predicted different aspects of the tendency to business creation. Additionally, our results show that the receiver operating characteristic (ROC) curve is affected by the rate of missing data in both techniques, but logistic regression seems to be more vulnerable when faced with missing data, whereas Bayes nets underperform slightly when categorization has been manipulated. Our study sheds light on the potential entrepreneur profile and we propose to use Bayesian networks as an additional alternative to overcome the weaknesses of logistic regression when missing data are present in applied research. PMID:23156922
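The ROC curve comparison at the heart of this study rests on the AUC, which has a direct probabilistic reading: the chance that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative one. A self-contained sketch of that rank-based (Mann-Whitney) computation, with illustrative scores:

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as P(score of a random positive > score of a random negative),
    i.e. the normalized Mann-Whitney U statistic; ties count one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))
```

This pairwise definition makes the AUC insensitive to any monotone rescaling of the classifier's outputs, which is why it is a fair yardstick for comparing logistic regression scores with Bayes net posterior probabilities.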
Guo, Chao-Yu; Chen, Yu-Jing; Chen, Yi-Hau
2014-07-01
One of the greatest challenges in genetic studies is the determination of gene-environment interactions due to underlying complications and inadequate statistical power. With the increased sample size gained by using case-parent trios and unrelated cases and controls, the performance may be much improved. Focusing on a dichotomous trait, a two-stage approach was previously proposed to deal with gene-environment interaction when utilizing mixed study samples. Theoretically, the two-stage association analysis uses likelihood functions such that the computational algorithms may not converge in the maximum likelihood estimation with small study samples. In an effort to avoid such convergence issues, we propose a logistic regression framework model, based on the combined haplotype relative risk (CHRR) method, which intuitively pools the case-parent trios and unrelated subjects in a two by two table. A positive feature of the logistic regression model is the effortless adjustment for either discrete or continuous covariates. According to computer simulations, under the circumstances in which the two-stage test converges in larger sample sizes, we discovered that the performances of the two tests were quite similar; the two-stage test is more powerful under the dominant and additive disease models, but the extended CHRR is more powerful under the recessive disease model. PMID:24766627
Austin, Peter C; Steyerberg, Ewout W
2014-02-10
Predicting the probability of the occurrence of a binary outcome or condition is important in biomedical research. While assessing discrimination is an essential issue in developing and validating binary prediction models, less attention has been paid to methods for assessing model calibration. Calibration refers to the degree of agreement between observed and predicted probabilities and is often assessed by testing for lack-of-fit. The objective of our study was to examine the ability of graphical methods to assess the calibration of logistic regression models. We examined lack of internal calibration, which was related to misspecification of the logistic regression model, and external calibration, which was related to an overfit model or to shrinkage of the linear predictor. We conducted an extensive set of Monte Carlo simulations with a locally weighted least squares regression smoother (i.e., the loess algorithm) to examine the ability of graphical methods to assess model calibration. We found that loess-based methods were able to provide evidence of moderate departures from linearity and indicate omission of a moderately strong interaction. Misspecification of the link function was harder to detect. Visual patterns were clearer with higher sample sizes, higher incidence of the outcome, or higher discrimination. Loess-based methods were also able to identify the lack of calibration in external validation samples when an overfit regression model had been used. In conclusion, loess-based smoothing methods are adequate tools to graphically assess calibration and merit wider application.
NASA Astrophysics Data System (ADS)
Ko, K.; Cheong, B.; Koh, D.
2010-12-01
Groundwater has been used as a main source of drinking water in rural areas of Korea with no regional potable water supply system. More than 50 percent of rural residents depend on groundwater as drinking water. Thus, research on predicting groundwater pollution is needed for sustainable groundwater usage and protection from potential pollutants. This study was carried out to assess the vulnerability of groundwater to nitrate contamination, reflecting the effect of land use, in Nonsan city, a representative rural area of South Korea. About 47% of the study area is occupied by cultivated land that is highly vulnerable to groundwater nitrate contamination because its nitrogen fertilizer input of 62.3 tons/km2 is higher than the country's average of 44.0 tons/km2. Two vulnerability assessment methods, logistic regression and the DRASTIC model, were tested and compared to determine the more suitable technique for assessing groundwater nitrate contamination in the Nonsan area. The groundwater quality data were acquired from analyses of 111 samples from small potable supply systems in the study area. The analyzed nitrate values were classified by land use: residential, upland, paddy, and field area. One dependent and two independent variables were defined for the logistic regression analysis. The dependent variable was binary categorical data, 0 or 1, according to whether or not nitrate exceeded thresholds of 1 through 10 mg/L. The independent variables were continuous data on slope, indicating topography, and multiple categorical data on land use, classified as residential, upland, paddy, and field area. The results of Levene's test and the T-test for slope and land use showed significant differences in mean values among groups at the 95% confidence level. From the logistic regression, we found a negative correlation between slope and nitrate, caused by the decrease of contaminant inputs into groundwater with
Assessing Longitudinal Change: Adjustment for Regression to the Mean Effects
ERIC Educational Resources Information Center
Rocconi, Louis M.; Ethington, Corinna A.
2009-01-01
Pascarella (J Coll Stud Dev 47:508-520, 2006) has called for an increase in use of longitudinal data with pretest-posttest design when studying effects on college students. However, such designs that use multiple measures to document change are vulnerable to an important threat to internal validity, regression to the mean. Herein, we discuss a…
Hu, Jingwen; Chou, Clifford C.; Yang, King H.; King, Albert I.
2007-01-01
A weighted logistic regression, with careful selection of crash, vehicle, occupant and injury data and sequential adjustment of the covariates, was used to investigate the predictors of the odds of head/face and neck (HFN) injuries during rollovers. The results show that unbelted occupants have statistically significantly higher HFN injury risks than belted occupants. Age, number of quarter-turns, rollover initiation type, maximum lateral deformation adjacent to the occupant, and A-pillar and B-pillar deformation are significant predictors of HFN injury odds for belted occupants. Age, rollover leading side and windshield header deformation are significant predictors of HFN injury odds for unbelted occupants. The results also show that the significant predictors differ between head/face (HF) and neck injury odds, indicating that the injury mechanisms of HF and neck injuries are different. PMID:18184502
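In a weighted logistic regression each record's log-likelihood contribution is scaled by its sampling weight, so over- or under-sampled crash records count in proportion to the population they represent. A minimal sketch of that weighted objective (the data and weights below are illustrative):

```python
import math

def weighted_loglik(y, p, w):
    """Weighted logistic log-likelihood: each record's Bernoulli term
    is multiplied by its sampling weight. Maximizing this over the
    coefficients that generate p gives the weighted fit."""
    return sum(wi * (yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi))
               for yi, pi, wi in zip(y, p, w))
```

A record with weight 2 contributes exactly twice the log-likelihood of the same record with weight 1, which is how survey-weighted analyses recover population-level odds ratios from a non-representative sample.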
Kupper, Lawrence L.
2012-01-01
A common goal in environmental epidemiologic studies is to undertake logistic regression modeling to associate a continuous measure of exposure with binary disease status, adjusting for covariates. A frequent complication is that exposure may only be measurable indirectly, through a collection of subject-specific variables assumed associated with it. Motivated by a specific study to investigate the association between lung function and exposure to metal working fluids, we focus on a multiplicative-lognormal structural measurement error scenario and approaches to address it when external validation data are available. Conceptually, we emphasize the case in which true untransformed exposure is of interest in modeling disease status, but measurement error is additive on the log scale and thus multiplicative on the raw scale. Methodologically, we favor a pseudo-likelihood (PL) approach that exhibits fewer computational problems than direct full maximum likelihood (ML) yet maintains consistency under the assumed models without necessitating small exposure effects and/or small measurement error assumptions. Such assumptions are required by computationally convenient alternative methods like regression calibration (RC) and ML based on probit approximations. We summarize simulations demonstrating considerable potential for bias in the latter two approaches, while supporting the use of PL across a variety of scenarios. We also provide accessible strategies for obtaining adjusted standard errors to accompany RC and PL estimates. PMID:24027381
Procedures for adjusting regional regression models of urban-runoff quality using local data
Hoos, Anne B.; Lizarraga, Joy S.
1996-01-01
Statistical operations termed model-adjustment procedures can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each procedure is a form of regression analysis in which the local data base is used as a calibration data set; the resulting adjusted regression models can then be used to predict storm-runoff quality at unmonitored sites. Statistical tests of the calibration data set guide selection among proposed procedures.
Stiglic, Gregor; Povalej Brzan, Petra; Fijacko, Nino; Wang, Fei; Delibasic, Boris; Kalousis, Alexandros; Obradovic, Zoran
2015-01-01
Different studies have demonstrated the importance of comorbidities to better understand the origin and evolution of medical complications. This study focuses on improvement of the predictive model interpretability based on simple logical features representing comorbidities. We use group lasso based feature interaction discovery followed by a post-processing step, where simple logic terms are added. In the final step, we reduce the feature set by applying lasso logistic regression to obtain a compact set of non-zero coefficients that represent a more comprehensible predictive model. The effectiveness of the proposed approach was demonstrated on a pediatric hospital discharge dataset that was used to build a readmission risk estimation model. The evaluation of the proposed method demonstrates a reduction of the initial set of features in a regression model by 72%, with a slight improvement in the Area Under the ROC Curve metric from 0.763 (95% CI: 0.755-0.771) to 0.769 (95% CI: 0.761-0.777). Additionally, our results show improvement in comprehensibility of the final predictive model using simple comorbidity based terms for logistic regression.
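The 72% feature reduction comes from the lasso penalty's ability to set coefficients exactly to zero. The mechanism is the soft-thresholding (proximal) operator applied during fitting, sketched here on illustrative coefficient values:

```python
def soft_threshold(w, lam):
    """Proximal operator of the L1 penalty: shrinks coefficients toward
    zero and zeroes out those smaller than the threshold lam. This is
    what gives lasso logistic regression its sparse, compact models."""
    return w - lam if w > lam else w + lam if w < -lam else 0.0

# Hypothetical unpenalized coefficients and penalty strength:
coefs = [0.9, -0.05, 0.3, 0.02, -0.6]
sparse = [soft_threshold(c, 0.1) for c in coefs]
kept = sum(1 for c in sparse if c != 0.0)   # features surviving the penalty
```

Larger penalty values zero out more comorbidity terms, trading a little discrimination (here, AUC) for a far more comprehensible model, which matches the trade-off reported above.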
Comparing the importance of prognostic factors in Cox and logistic regression using SAS.
Heinze, Georg; Schemper, Michael
2003-06-01
Two SAS macro programs are presented that evaluate the relative importance of prognostic factors in the proportional hazards regression model and in the logistic regression model. The importance of a prognostic factor is quantified by the proportion of variation in the outcome attributable to this factor. For proportional hazards regression, the program %RELIMPCR uses the recently proposed measure V to calculate the proportion of explained variation (PEV). For the logistic model, the R(2) measure based on squared raw residuals is used by the program %RELIMPLR. Both programs are able to compute marginal and partial PEV, to compare PEVs of factors and of groups of factors, and even to compare PEVs of different models. The programs use a bootstrap resampling scheme to test differences of the PEVs of different factors. Confidence limits for P-values are provided. The programs also allow the computation of PEV to be based on models with shrunken or bias-corrected parameter estimates. The SAS macros are freely available at www.akh-wien.ac.at/imc/biometrie/relimp
Olson, Scott A.; Brouillette, Michael C.
2006-01-01
A logistic regression equation was developed for estimating the probability of a stream flowing intermittently at unregulated, rural stream sites in Vermont. These determinations can be used for a wide variety of regulatory and planning efforts at the Federal, State, regional, county and town levels, including such applications as assessing fish and wildlife habitats, wetlands classifications, recreational opportunities, water-supply potential, waste-assimilation capacities, and sediment transport. The equation will be used to create a derived product for the Vermont Hydrography Dataset having the streamflow characteristic of 'intermittent' or 'perennial.' The Vermont Hydrography Dataset is Vermont's implementation of the National Hydrography Dataset and was created at a scale of 1:5,000 based on statewide digital orthophotos. The equation was developed by relating field-verified perennial or intermittent status of a stream site during normal summer low-streamflow conditions in the summer of 2005 to selected basin characteristics of naturally flowing streams in Vermont. The database used to develop the equation included 682 stream sites with drainage areas ranging from 0.05 to 5.0 square miles. When the 682 sites were observed, 126 were intermittent (had no flow at the time of the observation) and 556 were perennial (had flowing water at the time of the observation). The results of the logistic regression analysis indicate that the probability of a stream having intermittent flow in Vermont is a function of drainage area, elevation of the site, the ratio of basin relief to basin perimeter, and the areal percentage of well- and moderately well-drained soils in the basin. Using a probability cutpoint (a lower probability indicates the site has perennial flow and a higher probability indicates the site has intermittent flow) of 0.5, the logistic regression equation correctly predicted the perennial or intermittent status of 116 test sites 85 percent of the time.
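The 0.5 probability cutpoint described above turns the fitted probability into a binary stream classification. A minimal sketch of that final step (the probability value passed in would come from the report's fitted equation, which is not reproduced here):

```python
def classify_site(p_intermittent, cutpoint=0.5):
    """Apply the report's probability cutpoint: a predicted probability
    above the cutpoint classifies the site as intermittent, otherwise
    perennial."""
    return "intermittent" if p_intermittent > cutpoint else "perennial"

label = classify_site(0.72)   # a hypothetical predicted probability
```

Shifting the cutpoint trades one error type for the other: lowering it flags more sites as intermittent (fewer missed intermittent streams, more perennial streams misclassified), which matters when the two errors carry different regulatory costs.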
A secure distributed logistic regression protocol for the detection of rare adverse drug events
El Emam, Khaled; Samet, Saeed; Arbuckle, Luk; Tamblyn, Robyn; Earle, Craig; Kantarcioglu, Murat
2013-01-01
Background There is limited capacity to assess the comparative risks of medications after they enter the market. For rare adverse events, the pooling of data from multiple sources is necessary to have the power and sufficient population heterogeneity to detect differences in safety and effectiveness in genetic, ethnic and clinically defined subpopulations. However, combining datasets from different data custodians or jurisdictions to perform an analysis on the pooled data creates significant privacy concerns that would need to be addressed. Existing protocols for addressing these concerns can result in reduced analysis accuracy and can allow sensitive information to leak. Objective To develop a secure distributed multi-party computation protocol for logistic regression that provides strong privacy guarantees. Methods We developed a secure distributed logistic regression protocol using a single analysis center with multiple sites providing data. A theoretical security analysis demonstrates that the protocol is robust to plausible collusion attacks and does not allow the parties to gain new information from the data that are exchanged among them. The computational performance and accuracy of the protocol were evaluated on simulated datasets. Results The computational performance scales linearly as the dataset sizes increase. The addition of sites results in an exponential growth in computation time. However, for up to five sites, the time is still short and would not affect practical applications. The model parameters are the same as the results on pooled raw data analyzed in SAS, demonstrating high model accuracy. Conclusion The proposed protocol and prototype system would allow the development of logistic regression models in a secure manner without requiring the sharing of personal health information. This can alleviate one of the key barriers to the establishment of large-scale post-marketing surveillance programs. We extended the secure protocol to account for
Analyzing Beijing's in-use vehicle emissions test results using logistic regression.
Chang, Cheng; Ortolano, Leonard
2008-10-01
A logistic regression model was built using vehicle emissions test data collected in 2003 for 129 604 motor vehicles in Beijing. The regression model uses vehicle model, model year, inspection station, ownership, and vehicle registration area as covariates to predict the probability that a vehicle fails an annual emissions test on the first try. Vehicle model is the most influential predictor variable: some vehicle models are much more likely to fail in emissions tests than an "average" vehicle. Five out of 14 vehicle models that performed the worst (out of a total of 52 models) were manufactured by foreign companies or by their joint ventures with Chinese enterprises. These 14 vehicle model types may have failed at relatively high rates because of design and manufacturing deficiencies, and such deficiencies cannot be easily detected and corrected without further efforts, such as programs for in-use surveillance and vehicle recall. PMID:18939563
Logistic Regression-Based Trichotomous Classification Tree and Its Application in Medical Diagnosis.
Zhu, Yanke; Fang, Jiqian
2016-11-01
The classification tree is a valuable methodology for predictive modeling and data mining. However, existing classification trees ignore the fact that there might be a subset of individuals who cannot be well classified based on the information in the given set of predictor variables and who might be classified with a higher error rate; moreover, most existing classification trees do not use a combination of variables in each step. An algorithm for a logistic regression-based trichotomous classification tree (LRTCT) is proposed that employs the trichotomous tree structure and a linear combination of predictor variables in the recursive partitioning process. Compared with the widely used classification and regression tree in applications on a series of simulated data and 2 real data sets, the LRTCT performed better in several aspects and does not require excessively complicated calculations.
Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M
2014-12-01
Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out by 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, which also showed a better predictability of the functional inner regression tree model. The model built in the present study exhibits a reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed.
Ordinal logistic regression analysis on the nutritional status of children in KarangKitri village
NASA Astrophysics Data System (ADS)
Ohyver, Margaretha; Yongharto, Kimmy Octavian
2015-09-01
Ordinal logistic regression is a statistical technique that can be used to describe the relationship between an ordinal response variable and one or more independent variables. This method has been used in various fields, including the health field. In this research, ordinal logistic regression is used to describe the relationship between the nutritional status of children and age, gender, height, and family status. Nutritional status of children in this research is divided into over nutrition, well nutrition, less nutrition, and malnutrition. The purpose of this research is to describe the characteristics of children in KarangKitri Village and to determine the factors that influence their nutritional status. Three findings were obtained from this research. First, there are still children whose nutritional status is not categorized as well nourished. Second, there are children from families of sufficient economic level whose nutritional status is nevertheless not normal. Third, the factors that affect the nutritional level of children are age, family status, and height.
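The proportional-odds model behind ordinal logistic regression can be sketched in a few lines. The cutpoints and coefficient below are invented for illustration (the abstract does not report fitted values); the four categories stand in for the over/well/less/malnutrition levels:

```python
import math

def cum_logit_probs(x, cutpoints, beta):
    """Category probabilities under a proportional-odds (ordinal logistic) model:
    P(Y <= j | x) = 1 / (1 + exp(-(c_j - beta * x))) for ordered cutpoints c_j."""
    cdf = [1.0 / (1.0 + math.exp(-(c - beta * x))) for c in cutpoints]
    cdf.append(1.0)  # P(Y <= last category) = 1
    # Category probabilities are successive differences of the cumulative probabilities.
    return [cdf[0]] + [cdf[j] - cdf[j - 1] for j in range(1, len(cdf))]

# Hypothetical model: 4 ordered nutrition categories, one predictor (age in years).
probs = cum_logit_probs(x=5.0, cutpoints=[-1.0, 0.5, 2.0], beta=0.3)
```

The probabilities always sum to one and shift across categories as the predictor changes, which is what makes the model attractive for ordered outcomes.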
Gayou, Olivier; Das, Shiva K; Zhou, Su-Min; Marks, Lawrence B; Parda, David S; Miften, Moyed
2008-12-01
A given outcome of radiotherapy treatment can be modeled by analyzing its correlation with a combination of dosimetric, physiological, biological, and clinical factors, through a logistic regression fit of a large patient population. The quality of the fit is measured by the combination of the predictive power of this particular set of factors and the statistical significance of the individual factors in the model. We developed a genetic algorithm (GA), in which a small sample of all the possible combinations of variables are fitted to the patient data. New models are derived from the best models, through crossover and mutation operations, and are in turn fitted. The process is repeated until the sample converges to the combination of factors that best predicts the outcome. The GA was tested on a data set that investigated the incidence of lung injury in NSCLC patients treated with 3DCRT. The GA identified a model with two variables as the best predictor of radiation pneumonitis: the V30 (p=0.048) and the ongoing use of tobacco at the time of referral (p=0.074). This two-variable model was confirmed as the best model by analyzing all possible combinations of factors. In conclusion, genetic algorithms provide a reliable and fast way to select significant factors in logistic regression analysis of large clinical studies.
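The selection loop the abstract describes, fit a small sample of factor combinations, breed new combinations from the best ones by crossover and mutation, and repeat until the sample converges, can be sketched as follows. The fitness function here is a toy stand-in for the logistic-regression fit quality used in the paper, and the "true" predictive variables are invented:

```python
import random

random.seed(0)

N_VARS = 10
TRUE = {1, 4}  # toy assumption: the variables that actually predict the outcome

def fitness(mask):
    # Stand-in for model quality: in the paper this would combine the logistic
    # fit's predictive power with the significance of the individual factors.
    chosen = {i for i, b in enumerate(mask) if b}
    return len(TRUE & chosen) - 0.3 * len(chosen - TRUE)

def crossover(a, b):
    cut = random.randrange(1, N_VARS)
    return a[:cut] + b[cut:]

def mutate(mask, rate=0.05):
    return [b ^ (random.random() < rate) for b in mask]

# Each individual is a bitmask over the candidate variables.
pop = [[random.randint(0, 1) for _ in range(N_VARS)] for _ in range(30)]
for _ in range(60):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]  # elitism: the best models survive unchanged
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(20)]
    pop = parents + children

best = {i for i, b in enumerate(max(pop, key=fitness)) if b}
```

Because only a small fraction of the 2^p variable subsets is ever fitted, the approach scales to factor counts where exhaustive search is infeasible.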
Regularization Paths for Conditional Logistic Regression: The clogitL1 Package
Reid, Stephen; Tibshirani, Rob
2014-01-01
We apply the cyclic coordinate descent algorithm of Friedman, Hastie, and Tibshirani (2010) to the fitting of a conditional logistic regression model with lasso (ℓ1) and elastic net penalties. The sequential strong rules of Tibshirani, Bien, Hastie, Friedman, Taylor, Simon, and Tibshirani (2012) are also used in the algorithm and it is shown that these offer a considerable speed up over the standard coordinate descent algorithm with warm starts. Once implemented, the algorithm is used in simulation studies to compare the variable selection and prediction performance of the conditional logistic regression model against that of its unconditional (standard) counterpart. We find that the conditional model performs admirably on datasets drawn from a suitable conditional distribution, outperforming its unconditional counterpart at variable selection. The conditional model is also fit to a small real world dataset, demonstrating how we obtain regularization paths for the parameters of the model and how we apply cross validation for this method where natural unconditional prediction rules are hard to come by. PMID:26257587
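A minimal sketch of the l1 soft-thresholding machinery underlying such fits, using proximal gradient steps rather than the cyclic coordinate descent of clogitL1, and an unconditional logistic likelihood for brevity; the toy data are invented:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lasso_logistic_ista(X, y, lam, step=0.1, iters=500):
    """Proximal-gradient fit of an l1-penalized logistic regression:
    a gradient step on the negative log-likelihood followed by soft-thresholding."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(sum(b * v for b, v in zip(beta, xi))) - yi
            for j in range(p):
                grad[j] += err * xi[j]
        for j in range(p):
            z = beta[j] - step * grad[j] / len(X)
            # Soft-threshold: coefficients whose update stays below step*lam are zeroed.
            beta[j] = math.copysign(max(abs(z) - step * lam, 0.0), z)
    return beta

# Toy data: only the first feature carries signal, so the penalty should
# keep the second coefficient at (or near) zero.
X = [[1, 0.1], [2, -0.2], [-1, 0.05], [-2, 0.0], [1.5, -0.1], [-1.5, 0.2]]
y = [1, 1, 0, 0, 1, 0]
beta = lasso_logistic_ista(X, y, lam=0.5)
```

Sweeping `lam` from large to small, with each fit warm-started from the previous one, traces the regularization path that the package computes.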
NASA Astrophysics Data System (ADS)
Jokar Arsanjani, Jamal; Helbich, Marco; Kainz, Wolfgang; Darvishi Boloorani, Ali
2013-04-01
This research analyses the suburban expansion in the metropolitan area of Tehran, Iran. A hybrid model consisting of a logistic regression model, Markov chain (MC), and cellular automata (CA) was designed to improve the performance of the standard logistic regression model. Environmental and socio-economic variables dealing with urban sprawl were operationalised to create a probability surface of spatiotemporal states of built-up land use for the years 2006, 2016, and 2026. For validation, the model was evaluated by means of relative operating characteristic values for different sets of variables. The approach was calibrated for 2006 by cross-comparing actual and simulated land use maps. The achieved outcomes represent a match of 89% between the simulated and actual maps of 2006, which was satisfactory to validate the calibration. Thereafter, the calibrated hybrid approach was implemented for forthcoming years. Finally, future land use maps for 2016 and 2026 were predicted by means of this hybrid approach. The simulated maps illustrate a new wave of suburban development in the vicinity of Tehran at the western border of the metropolis during the next decades.
Predicting students' success at pre-university studies using linear and logistic regressions
NASA Astrophysics Data System (ADS)
Suliman, Noor Azizah; Abidin, Basir; Manan, Norhafizah Abdul; Razali, Ahmad Mahir
2014-09-01
The study is aimed at finding the most suitable model to predict students' success at the medical pre-university studies, Centre for Foundation in Science, Languages and General Studies of Cyberjaya University College of Medical Sciences (CUCMS). The predictors under investigation were achievements in the national high school exit examination, Sijil Pelajaran Malaysia (SPM), such as the Biology, Chemistry, Physics, Additional Mathematics, Mathematics, English and Bahasa Malaysia results, as well as gender and high school background factors. The outcomes showed that there is a significant difference by gender in the final CGPA and in the Biology and Mathematics subjects at pre-university, and by high school background in the Mathematics subject. In general, the correlation between academic achievement at high school and at medical pre-university is moderately significant at an α-level of 0.05, except for the language subjects. It was also found that logistic regression techniques gave better prediction models than the multiple linear regression technique for this data set. The developed logistic models gave probabilities that closely matched the actual outcomes. Hence, they could be used to identify successful students who are qualified to enter the CUCMS medical faculty before accepting any students to its foundation program.
Hill, Benjamin David; Womble, Melissa N; Rohling, Martin L
2015-01-01
This study utilized logistic regression to determine whether performance patterns on Concussion Vital Signs (CVS) could differentiate known groups with either genuine or feigned performance. For the embedded measure development group (n = 174), clinical patients and undergraduate students categorized as feigning obtained significantly lower scores on the overall test battery mean for the CVS, Shipley-2 composite score, and California Verbal Learning Test-Second Edition subtests than did genuinely performing individuals. The final full model of 3 predictor variables (Verbal Memory immediate hits, Verbal Memory immediate correct passes, and Stroop Test complex reaction time correct) was significant and correctly classified individuals in their known group 83% of the time (sensitivity = .65; specificity = .97) in a mixed sample of young-adult clinical cases and simulators. The CVS logistic regression function was applied to a separate undergraduate college group (n = 378) that was asked to perform genuinely and identified 5% as having possibly feigned performance indicating a low false-positive rate. The failure rate was 11% and 16% at baseline cognitive testing in samples of high school and college athletes, respectively. These findings have particular relevance given the increasing use of computerized test batteries for baseline cognitive testing and return-to-play decisions after concussion.
Heinze, Georg
2006-12-30
In logistic regression analysis of small or sparse data sets, results obtained by classical maximum likelihood methods cannot be generally trusted. In such analyses it may even happen that the likelihood meets the convergence criteria while at least one parameter estimate diverges to +/-infinity. This situation has been termed 'separation', and it typically occurs whenever no events are observed in one of the two groups defined by a dichotomous covariate. More generally, separation is caused by a linear combination of continuous or dichotomous covariates that perfectly separates events from non-events. Separation implies infinite or zero maximum likelihood estimates of odds ratios, which are usually considered unrealistic. I provide some examples of separation and near-separation in clinical data sets and discuss some options to analyse such data, including exact logistic regression analysis and a penalized likelihood approach. Both methods supply finite point estimates in case of separation. Profile penalized likelihood confidence intervals for parameters show excellent behaviour in terms of coverage probability and provide higher power than exact confidence intervals. General advantages of the penalized likelihood approach are discussed.
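Separation is easy to reproduce numerically: on perfectly separated data the log-likelihood keeps improving as the slope grows, so an unpenalized fit never settles on a finite estimate. A minimal sketch with invented data (intercept omitted for brevity):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mle_slope(x, y, iters):
    """Plain gradient ascent on the logistic log-likelihood."""
    beta = 0.0
    for _ in range(iters):
        grad = sum((yi - sigmoid(beta * xi)) * xi for xi, yi in zip(x, y))
        beta += 0.5 * grad
    return beta

# Perfectly separated data: every positive x is an event, every negative x is not.
x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
y = [0, 0, 0, 1, 1, 1]

# Running longer always yields a larger slope: the "estimate" diverges.
b_short, b_long = mle_slope(x, y, 50), mle_slope(x, y, 500)
```

A Firth-type penalized likelihood adds a Jeffreys-prior term to the log-likelihood, which is what pulls the estimate back to a finite value in the situations the abstract describes.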
Adjustment of regional regression equations for urban storm-runoff quality using at-site data
Barks, C.S.
1996-01-01
Regional regression equations have been developed to estimate urban storm-runoff loads and mean concentrations using a national data base. Four statistical methods using at-site data to adjust the regional equation predictions were developed to provide better local estimates. The four adjustment procedures are a single-factor adjustment, a regression of the observed data against the predicted values, a regression of the observed values against the predicted values and additional local independent variables, and a weighted combination of a local regression with the regional prediction. Data collected at five representative storm-runoff sites during 22 storms in Little Rock, Arkansas, were used to verify, and, when appropriate, adjust the regional regression equation predictions. Comparison of observed values of storm-runoff loads and mean concentrations to the predicted values from the regional regression equations for nine constituents (chemical oxygen demand, suspended solids, total nitrogen as N, total ammonia plus organic nitrogen as N, total phosphorus as P, dissolved phosphorus as P, total recoverable copper, total recoverable lead, and total recoverable zinc) showed large prediction errors ranging from 63 percent to more than several thousand percent. Prediction errors for 6 of the 18 regional regression equations were less than 100 percent and could be considered reasonable for water-quality prediction equations. The regression adjustment procedure was used to adjust five of the regional equation predictions to improve the predictive accuracy. For seven of the regional equations the observed and the predicted values are not significantly correlated. Thus, neither the unadjusted regional equations nor any of the adjustments were appropriate. The mean of the observed values was used as a simple estimator when the regional equation predictions and adjusted predictions were not appropriate.
Venugopal, G; Ramakrishnan, S
2014-01-01
In this work, an attempt has been made to differentiate surface electromyography signals under fatigue and non-fatigue conditions. Signals are recorded from the biceps brachii muscles of 50 healthy volunteers. A well-established experimental protocol is followed for this purpose. Signals are subjected to further processing, and features, namely amplitude of first burst, myopulse percentage rate, Willison amplitude, power spectrum ratio and variance of central frequency, are extracted. Three types of logistic regression classifiers, linear logistic, polykernel logistic regression and multinomial regression with ridge estimator, are used for automated analysis. Classifier parameters are tuned to enhance the accuracy and performance indices of the algorithms, and the classifiers are compared. The results show distinct values for the extracted features in fatigue conditions, which are statistically significant (0.0027 ≤ P ≤ 0.03). All classifiers are found to be effective in demarcating the signals. The linear logistic regression algorithm provides 79% accuracy with 40 iterations. However, in the case of multinomial regression with ridge estimator, only 7 iterations are required to achieve 80% accuracy. The polykernel logistic regression algorithm (0.06 ≤ ? ≤ 0.1) also provides 80% accuracy but with a marginal increment (1% to 4%) in precision, recall and specificity compared to the other two classifiers.
Battaglin, W.A.
1996-01-01
Agricultural chemicals (herbicides, insecticides, other pesticides and fertilizers) in surface water may constitute a human health risk. Recent research on unregulated rivers in the midwestern USA documents that elevated concentrations of herbicides occur for 1-4 months following application in spring and early summer. In contrast, nitrate concentrations in unregulated rivers are elevated during the fall, winter and spring. Natural and anthropogenic variables of river drainage basins, such as soil permeability, the amount of agricultural chemicals applied or percentage of land planted in corn, affect agricultural chemical concentrations in rivers. Logistic regression (LGR) models are used to investigate relations between various drainage basin variables and the concentration of selected agricultural chemicals in rivers. The method is successful in contributing to the understanding of agricultural chemical concentration in rivers. Overall accuracies of the best LGR models, defined as the number of correct classifications divided by the number of attempted classifications, averaged about 66%.
Statistical modelling for thoracic surgery using a nomogram based on logistic regression
Liu, Run-Zhong; Zhao, Ze-Rui
2016-01-01
A well-developed clinical nomogram is a popular decision-tool, which can be used to predict the outcome of an individual, bringing benefits to both clinicians and patients. With just a few steps on a user-friendly interface, the approximate clinical outcome of patients can easily be estimated based on their clinical and laboratory characteristics. Therefore, nomograms have recently been developed to predict the different outcomes or even the survival rate at a specific time point for patients with different diseases. However, on the establishment and application of nomograms, there is still a lot of confusion that may mislead researchers. The objective of this paper is to provide a brief introduction on the history, definition, and application of nomograms and then to illustrate simple procedures to develop a nomogram with an example based on a multivariate logistic regression model in thoracic surgery. In addition, validation strategies and common pitfalls have been highlighted. PMID:27621910
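The mechanics of turning a fitted logistic model into nomogram points, scaling each variable's effect range so the largest one spans 100 points, can be sketched as follows. The model, its coefficients, and the variable ranges are all invented for illustration and are not from any cited study:

```python
import math

# Hypothetical fitted logistic model for a binary surgical outcome:
# logit(p) = -2.0 + 0.04*age + 0.8*smoker + 0.5*tumour_size (cm)
coefs = {"age": 0.04, "smoker": 0.8, "tumour_size": 0.5}
ranges = {"age": (20, 90), "smoker": (0, 1), "tumour_size": (0, 6)}
intercept = -2.0

# Effect span of each variable = |coefficient| * (range width);
# the widest span is assigned 100 points, the rest scaled proportionally.
span = {v: abs(coefs[v]) * (hi - lo) for v, (lo, hi) in ranges.items()}
max_span = max(span.values())

def points(var, value):
    lo, _ = ranges[var]
    return abs(coefs[var]) * (value - lo) / max_span * 100.0

def risk(age, smoker, tumour_size):
    z = intercept + coefs["age"] * age + coefs["smoker"] * smoker \
        + coefs["tumour_size"] * tumour_size
    return 1.0 / (1.0 + math.exp(-z))

# A patient is scored by summing points; the total maps back to a probability.
total = points("age", 70) + points("smoker", 1) + points("tumour_size", 3)
p = risk(70, 1, 3)
```

On a printed nomogram the same arithmetic is done graphically: the reader draws a line from each covariate value to a points axis, sums the points, and reads the probability off a total-points scale.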
A Bayesian approach to a logistic regression model with incomplete information.
Choi, Taeryon; Schervish, Mark J; Schmitt, Ketra A; Small, Mitchell J
2008-06-01
We consider a set of independent Bernoulli trials with possibly different success probabilities that depend on covariate values. However, the available data consist only of aggregate numbers of successes among subsets of the trials along with all of the covariate values. We still wish to estimate the parameters of a modeled relationship between the covariates and the success probabilities, e.g., a logistic regression model. In this article, estimation of the parameters is made from a Bayesian perspective by using a Markov chain Monte Carlo algorithm based only on the available data. The proposed methodology is applied to both simulation studies and real data from a dose-response study of a toxic chemical, perchlorate.
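The kind of MCMC fit the abstract describes can be sketched with a random-walk Metropolis sampler on aggregated binomial counts. All data, prior choices, and tuning constants below are invented; the paper's actual model and sampler may differ:

```python
import math, random

random.seed(1)

# Aggregate data: at each dose (the covariate), only the number of successes
# out of n trials is recorded; individual outcomes are unavailable.
doses = [0.0, 1.0, 2.0, 3.0]
n_trials = [20, 20, 20, 20]
successes = [2, 6, 13, 18]

def log_post(a, b):
    """Binomial log-likelihood + vague N(0, 10^2) priors (constants dropped)."""
    lp = -(a * a + b * b) / 200.0
    for d, n, s in zip(doses, n_trials, successes):
        p = 1.0 / (1.0 + math.exp(-(a + b * d)))
        lp += s * math.log(p) + (n - s) * math.log(1.0 - p)
    return lp

# Random-walk Metropolis over (intercept, slope).
a, b = 0.0, 0.0
draws = []
for i in range(5000):
    a_new, b_new = a + random.gauss(0, 0.3), b + random.gauss(0, 0.3)
    if math.log(random.random()) < log_post(a_new, b_new) - log_post(a, b):
        a, b = a_new, b_new
    if i >= 1000:  # discard burn-in
        draws.append((a, b))

b_mean = sum(d[1] for d in draws) / len(draws)
```

The posterior mean of the slope summarizes the dose-response relationship even though no individual-level outcomes were ever observed.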
Binary logistic regression modelling: Measuring the probability of relapse cases among drug addict
NASA Astrophysics Data System (ADS)
Ismail, Mohd Tahir; Alias, Siti Nor Shadila
2014-07-01
For many years, Malaysia has faced drug addiction issues. The most serious case is the relapse phenomenon among treated drug addicts (addicts who have undergone the rehabilitation programme at the Narcotic Addiction Rehabilitation Centre, PUSPEN). Thus, the main objective of this study is to find the most significant factors that contribute to relapse. Binary logistic regression analysis was employed to model the relationship between the independent variables (predictors) and the dependent variable. The dependent variable is the status of the drug addict: either relapse (Yes, coded as 1) or not (No, coded as 0). The predictors involved are age, age at first taking drug, family history, education level, family crisis, community support and self motivation. The total sample is 200; the data were provided by AADK (National Antidrug Agency). The findings of the study revealed that age and self motivation are statistically significant towards the relapse cases.
Accounting for Informatively Missing Data in Logistic Regression by Means of Reassessment Sampling
Lin, Ji; Lyles, Robert H.
2015-01-01
We explore the “reassessment” design in a logistic regression setting, where a second wave of sampling is applied to recover a portion of the missing data on a binary exposure and/or outcome variable. We construct a joint likelihood function based on the original model of interest and a model for the missing data mechanism, with emphasis on non-ignorable missingness. The estimation is carried out by numerical maximization of the joint likelihood function with close approximation of the accompanying Hessian matrix, using sharable programs that take advantage of general optimization routines in standard software. We show how likelihood ratio tests can be used for model selection, and how they facilitate direct hypothesis testing for whether missingness is at random. Examples and simulations are presented to demonstrate the performance of the proposed method. PMID:25707010
Screening for ketosis using multiple logistic regression based on milk yield and composition.
Kayano, Mitsunori; Kataoka, Tomoko
2015-11-01
Multiple logistic regression was applied to milk yield and composition data for 632 records of healthy cows and 61 records of ketotic cows in Hokkaido, Japan. The purpose was to diagnose ketosis based on milk yield and composition, simultaneously. The cows were divided into two groups: (1) multiparous, including 314 healthy cows and 45 ketotic cows and (2) primiparous, including 318 healthy cows and 16 ketotic cows, since nutritional status, milk yield and composition are affected by parity. Multiple logistic regression was applied to these groups separately. For multiparous cows, milk yield (kg/day/cow) and protein-to-fat (P/F) ratio in milk were significant factors (P<0.05) for the diagnosis of ketosis. For primiparous cows, lactose content (%), solid not fat (SNF) content (%) and milk urea nitrogen (MUN) content (mg/dl) were significantly associated with ketosis (P<0.01). A diagnostic rule was constructed for each group of cows: (1) 9.978 × P/F ratio + 0.085 × milk yield <10 and (2) 2.327 × SNF - 2.703 × lactose + 0.225 × MUN <10. The sensitivity, specificity and the area under the curve (AUC) of the diagnostic rules were (1) 0.800, 0.729 and 0.811; (2) 0.813, 0.730 and 0.787, respectively. The P/F ratio, which is a widely used measure of ketosis, provided the sensitivity, specificity and AUC values of (1) 0.711, 0.726 and 0.781; and (2) 0.678, 0.767 and 0.738, respectively.
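The two reported diagnostic rules can be encoded directly; the example measurements below are invented, not taken from the study:

```python
def multiparous_flag(pf_ratio, milk_yield):
    """Rule (1) from the abstract: flag ketosis when the score falls below 10."""
    return 9.978 * pf_ratio + 0.085 * milk_yield < 10

def primiparous_flag(snf, lactose, mun):
    """Rule (2): 2.327*SNF - 2.703*lactose + 0.225*MUN below 10 flags ketosis."""
    return 2.327 * snf - 2.703 * lactose + 0.225 * mun < 10

# Illustrative (not real) measurements:
flag1 = multiparous_flag(pf_ratio=0.75, milk_yield=25.0)   # score ~9.61, below 10
flag2 = primiparous_flag(snf=8.6, lactose=4.5, mun=10.0)   # score ~10.10, not below 10
```

Because both rules are linear scores with a single cutoff, the reported sensitivity/specificity trade-off could be retuned simply by moving the threshold away from 10.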
Modeling Group Size and Scalar Stress by Logistic Regression from an Archaeological Perspective
Alberti, Gianmarco
2014-01-01
Johnson’s scalar stress theory, describing the mechanics of (and the remedies to) the increase in in-group conflictuality that parallels the increase in groups’ size, provides scholars with a useful theoretical framework for the understanding of different aspects of the material culture of past communities (i.e., social organization, communal food consumption, ceramic style, architecture and settlement layout). Due to its relevance in archaeology and anthropology, the article aims at proposing a predictive model of the critical level of scalar stress on the basis of community size. Drawing upon Johnson’s theory and on Dunbar’s findings on the cognitive constraints to human group size, a model is built by means of logistic regression on the basis of data on colony fissioning among the Hutterites of North America. On the grounds of the theoretical framework sketched in the first part of the article, the absence or presence of colony fissioning is considered an expression of a not-critical vs. critical level of scalar stress for the sake of the model building. The model, which is also tested against a sample of archaeological and ethnographic cases: a) confirms the existence of a significant relationship between critical scalar stress and group size, setting the issue on firmer statistical grounds; b) allows calculating the intercept and slope of the logistic regression model, which can be used at any time to estimate the probability that a community experienced a critical level of scalar stress; c) allows locating a critical scalar stress threshold at community size 127 (95% CI: 122–132), while the maximum probability of critical scalar stress is predicted at size 158 (95% CI: 147–170). The model ultimately provides grounds to assess, for the sake of any further archaeological/anthropological interpretation, the probability that a group reached a hot spot of size development critical for its internal cohesion. PMID:24626241
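A size threshold of this kind arises from inverting a fitted logistic model: the community size at which the predicted probability of critical scalar stress crosses 0.5 is -b0/b1. The coefficients below are not the article's fitted values; they are chosen only so the threshold lands at the reported 127:

```python
import math

# Hypothetical coefficients for logit(p) = b0 + b1 * size,
# constructed so that p = 0.5 exactly at size 127.
b1 = 0.05
b0 = -0.05 * 127

def p_critical(size):
    """Predicted probability of critical scalar stress at a given community size."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * size)))

# The 50% crossover point of any logistic model is -intercept/slope.
threshold = -b0 / b1
```

With the real intercept and slope from the article, the same one-liner reproduces the published threshold, and the confidence interval on the threshold follows from the uncertainty in the two coefficients.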
NASA Astrophysics Data System (ADS)
Melo, Raquel; Vieira, Gonçalo; Caselli, Alberto; Ramos, Miguel
2010-05-01
Field surveying during the austral summer of 2007/08 and the analysis of a QuickBird satellite image resulted in the production of a detailed geomorphological map of the Irizar and Crater Lake area in Deception Island (South Shetlands, Maritime Antarctic - 1:10 000) and allowed its analysis and the spatial modelling of the geomorphological phenomena. The present study focuses on the analysis of the spatial distribution and characteristics of hummocky terrains, lag surfaces and nivation hollows, complemented by GIS spatial modelling intended to identify relevant controlling geographical factors. Models of the susceptibility of occurrence of these phenomena were created using two statistical methods: logistic regression, as a multivariate method, and the informative value, as a bivariate method. Success and prediction rate curves were used for model validation. The Area Under the Curve (AUC) was used to quantify the level of performance and prediction of the models and to allow the comparison between the two methods. Regarding the logistic regression method, the AUC showed a success rate of 71% for the lag surfaces, 81% for the hummocky terrains and 78% for the nivation hollows. The prediction rate was 72%, 68% and 71%, respectively. Concerning the informative value method, the success rate was 69% for the lag surfaces, 84% for the hummocky terrains and 78% for the nivation hollows, with corresponding prediction rates of 71%, 66% and 69%. The results were of very good quality and demonstrate the potential of the models to predict the influence of independent variables on the occurrence of the geomorphological phenomena, and also the reliability of the data. Key-words: present-day geomorphological dynamics, detailed geomorphological mapping, GIS, spatial modelling, Deception Island, Antarctic.
Genomic-Enabled Prediction of Ordinal Data with Bayesian Logistic Ordinal Regression.
Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Burgueño, Juan; Eskridge, Kent
2015-08-18
Most genomic-enabled prediction models developed so far assume that the response variable is continuous and normally distributed. The exception is the probit model, developed for ordered categorical phenotypes. In statistical applications, because of the easy implementation of the Bayesian probit ordinal regression (BPOR) model, Bayesian logistic ordinal regression (BLOR) is implemented rarely in the context of genomic-enabled prediction [sample size (n) is much smaller than the number of parameters (p)]. For this reason, in this paper we propose a BLOR model using the Pólya-Gamma data augmentation approach that produces a Gibbs sampler with similar full conditional distributions of the BPOR model and with the advantage that the BPOR model is a particular case of the BLOR model. We evaluated the proposed model by using simulation and two real data sets. Results indicate that our BLOR model is a good alternative for analyzing ordinal data in the context of genomic-enabled prediction with the probit or logit link.
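For the binary (two-category) special case, the Pólya-Gamma augmentation yields a Gibbs sampler whose conditional for the coefficients is Gaussian. A sketch under stated assumptions: PG(1, c) is drawn via its truncated gamma-series representation rather than an exact sampler, the prior is a simple N(0, τ²I), and the ordinal cut-point machinery of the BLOR model is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pg(c, trunc=200):
    """Approximate draw from PG(1, c) using the truncated infinite sum of
    gammas: PG(1, c) = (1 / 2*pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / 4*pi^2)."""
    c = np.atleast_1d(c)
    k = np.arange(1, trunc + 1)
    g = rng.gamma(shape=1.0, scale=1.0, size=(len(c), trunc))
    denom = (k - 0.5) ** 2 + (c[:, None] / (2 * np.pi)) ** 2
    return (g / denom).sum(axis=1) / (2 * np.pi ** 2)

def gibbs_logistic(X, y, n_iter=200, tau2=100.0):
    """Pólya-Gamma Gibbs sampler for binary logistic regression with a
    N(0, tau2 * I) prior: omega_i | beta ~ PG(1, x_i'beta), beta | omega Gaussian."""
    n, p = X.shape
    beta = np.zeros(p)
    kappa = y - 0.5
    draws = []
    for _ in range(n_iter):
        omega = sample_pg(X @ beta)
        V = np.linalg.inv(X.T * omega @ X + np.eye(p) / tau2)
        m = V @ (X.T @ kappa)
        beta = rng.multivariate_normal(m, V)
        draws.append(beta)
    return np.array(draws)
```

Every full conditional is available in closed form, which is the practical appeal of the augmentation the abstract describes.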
Ren, Y Y; Zhou, L C; Yang, L; Liu, P Y; Zhao, B W; Liu, H X
2016-09-01
The paper highlights the use of the logistic regression (LR) method in the construction of statistically significant, robust, and predictive models for classifying chemicals according to their aquatic toxic modes of action. The essentials of a reliable model were all considered carefully. The model predictors were selected by stepwise forward linear discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated by the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The obtained models are simple and easy to interpret. In general, LR performs much better than LDA and seems more attractive for predicting the more toxic compounds, i.e., compounds that exhibit excess toxicity versus non-polar narcotic compounds and more reactive compounds versus less reactive compounds. In addition, model fit and regression diagnostics were assessed through the influence plot, which reflects the hat values, studentized residuals, and Cook's distance statistic of each sample. Overdispersion was also checked for the LR model. The relationships between the descriptors and the aquatic toxic behaviour of the compounds are also discussed. PMID:27653817
Prognosis of conservatively treated patients with Pott's paraplegia: logistic regression analysis
Kalita, J; Misra, U; Mandal, S; Srivastava, M
2005-01-01
Methods: The study included 43 patients with Pott's paraplegia, managed conservatively. The diagnosis of Pott's spine was based on clinical, magnetic resonance imaging, and computed tomography or ultrasound guided aspiration biopsy. All patients were examined clinically, and motor evoked potentials (MEPs) to lower limbs and tibial somatosensory evoked potentials (SEP) were recorded. Outcome at six months was defined as good or poor. For evaluating predictors of outcome, 15 clinical, investigative, and evoked potential variables were analysed, using multiple logistic regression analysis. Results: The age range of the patients was 16–70 years, and 22 were female. Mild spasticity with hyperreflexia only was seen in 13 patients. In the remaining, weakness was severe in eight, and moderate and mild in 11 patients each. Twenty patients had loss of joint position sensation. MEP and SEP were abnormal in 19 and 18 patients, respectively. On multiple regression analysis, the best model predicting six month outcome included power, paraplegia score, SEP, and MEP. Conclusion: Patients with Pott's paraplegia are likely to recover completely by six months if they have mild weakness, lower paraplegia score and normal SEPs and MEPs. PMID:15897514
Neck-focused panic attacks among Cambodian refugees; a logistic and linear regression analysis.
Hinton, Devon E; Chhean, Dara; Pich, Vuth; Um, Khin; Fama, Jeanne M; Pollack, Mark H
2006-01-01
Consecutive Cambodian refugees attending a psychiatric clinic were assessed for the presence and severity of current--i.e., at least one episode in the last month--neck-focused panic. Among the whole sample (N=130), in a logistic regression analysis, the Anxiety Sensitivity Index (ASI; odds ratio=3.70) and the Clinician-Administered PTSD Scale (CAPS; odds ratio=2.61) significantly predicted the presence of current neck panic (NP). Among the neck panic patients (N=60), in the linear regression analysis, NP severity was significantly predicted by NP-associated flashbacks (beta=.42), NP-associated catastrophic cognitions (beta=.22), and CAPS score (beta=.28). Further analysis revealed the effect of the CAPS score to be significantly mediated (Sobel test [Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182]) by both NP-associated flashbacks and catastrophic cognitions. In the care of traumatized Cambodian refugees, NP severity, as well as NP-associated flashbacks and catastrophic cognitions, should be specifically assessed and treated.
NASA Astrophysics Data System (ADS)
Heckmann, Tobias; Gegg, Katharina; Becht, Michael
2013-04-01
Statistical approaches to landslide susceptibility modelling on the catchment and regional scale are used very frequently compared to heuristic and physically based approaches. In the present study, we deal with the problem of the optimal sample size for a logistic regression model. More specifically, a stepwise approach has been chosen in order to select those independent variables (from a number of derivatives of a digital elevation model and landcover data) that best explain the spatial distribution of debris flow initiation zones in two neighbouring central alpine catchments in Austria (used mutually for model calculation and validation). In order to minimise problems arising from spatial autocorrelation, we sample a single raster cell from each debris flow initiation zone within an inventory. In addition, as suggested by previous work using the "rare events logistic regression" approach, we take a sample of the remaining "non-event" raster cells. The recommendations given in the literature on the size of this sample appear to be motivated by practical considerations, e.g. the time and cost of acquiring data for non-event cases, which do not apply to the case of spatial data. In our study, we aim to find empirically an "optimal" sample size that avoids two problems: First, a sample that is too large will violate the independent-sample assumption, as the independent variables are spatially autocorrelated; hence, a variogram analysis leads to a sample size threshold above which the average distance between sampled cells falls below the autocorrelation range of the independent variables. Second, if the sample is too small, repeated sampling will lead to very different results, i.e. the independent variables and hence the result of a single model calculation will depend strongly on the choice of non-event cells. Using a Monte-Carlo analysis with stepwise logistic regression, 1000 models are calculated for a wide range of sample sizes. For each sample size
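The second problem, sensitivity of the fit to the non-event sample, can be illustrated on synthetic data: keep all event cells, repeatedly draw non-event samples of a given size, and compare the spread of the slope estimate. This is a sketch with an invented one-predictor landscape, not the study's DEM-derived variables, and a plain Newton/IRLS fit rather than the stepwise rare-events procedure:

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_logistic(X, y, n_iter=25):
    """Plain Newton/IRLS fit of an unregularized logistic regression."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p) + 1e-9
        beta = beta + np.linalg.solve(X.T * W @ X, X.T @ (y - p))
    return beta

# Synthetic "landscape": one predictor drives debris-flow initiation.
n_cells = 20000
x = rng.normal(size=n_cells)
p_true = 1 / (1 + np.exp(-(-3.0 + 1.5 * x)))
event = rng.random(n_cells) < p_true
events = np.flatnonzero(event)
non_events = np.flatnonzero(~event)

def coef_spread(n_nonevent, reps=40):
    """Std. dev. of the slope estimate over repeated non-event samples."""
    slopes = []
    for _ in range(reps):
        idx = np.concatenate([events, rng.choice(non_events, n_nonevent, replace=False)])
        X = np.column_stack([np.ones(len(idx)), x[idx]])
        slopes.append(fit_logistic(X, event[idx].astype(float))[1])
    return float(np.std(slopes))
```

Comparing `coef_spread` at small versus large non-event sample sizes reproduces the instability the abstract describes for samples that are too small.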
NASA Astrophysics Data System (ADS)
Pradhan, Biswajeet
Recently, in 2006 and 2007, heavy monsoon rainfall triggered floods along Malaysia's east coast as well as in the southern state of Johor. The hardest-hit areas were along the east coast of peninsular Malaysia in the states of Kelantan, Terengganu and Pahang; the city of Johor, in the south, was particularly hard hit. The floods cost nearly a billion ringgit in property and many lives. The extent of the damage could have been reduced or minimized had an early warning system been in place. This paper deals with flood susceptibility analysis using a logistic regression model. We evaluated flood susceptibility and the effect of flood-related factors along the Kelantan river basin using Geographic Information System (GIS) and remote sensing data. Previously flooded areas were extracted from archived RADARSAT images using image processing tools. Flood susceptibility mapping was conducted in the study area along the Kelantan River using RADARSAT imagery and then enlarged to 1:25,000 scale. Topographical, hydrological and geological data and satellite images were collected, processed, and assembled into a spatial database using GIS and image processing. The factors chosen as influencing flood occurrence were: topographic slope, aspect, curvature, DEM, and distance from river drainage, all from the topographic database; flow direction and flow accumulation, extracted from the hydrological database; geology and distance from lineament, taken from the geologic database; land use from SPOT satellite images; soil texture from the soil database; and the vegetation index value from SPOT satellite images. Flood-susceptible areas were analyzed and mapped using the probability-logistic regression model. Results indicate that flood-prone area mapping can be performed at 1:25,000, which is comparable to some conventional flood hazard map scales. The flood-prone areas delineated on these maps correspond to areas that would be inundated by significant flooding
A logistic regression based approach for the prediction of flood warning threshold exceedance
NASA Astrophysics Data System (ADS)
Diomede, Tommaso; Trotter, Luca; Stefania Tesini, Maria; Marsigli, Chiara
2016-04-01
A method based on logistic regression is proposed for the prediction of river level threshold exceedance at short (+0-18h) and medium (+18-42h) lead times. The aim of the study is to provide a valuable tool for the issuing of warnings by the authority responsible for public safety in case of flood. The role of different precipitation periods as predictors for the exceedance of a fixed river level has been investigated, in order to derive significant information for flood forecasting. Based on catchment-averaged values, a separation of "antecedent" and "peak-triggering" rainfall amounts as independent variables is attempted. In particular, the following flood-related precipitation periods have been considered: (i) the period from 1 to n days before the forecast issue time, which may be relevant for soil saturation; (ii) the last 24 hours, which may be relevant for the current water level in the river; and (iii) the period from 0 to x hours ahead of the forecast issue time, when the flood-triggering precipitation generally occurs. Several combinations and values of these predictors have been tested to optimise the implementation of the method. In particular, the period for the antecedent precipitation predictor ranges between 5 and 45 days; the state of the river can be represented by the last 24-h precipitation or, as an alternative, by the current river level. The flood-triggering precipitation has been accumulated over the next 18 hours (for the short lead time) and 36-42 hours (for the medium lead time). The proposed approach requires a specific implementation of logistic regression for each river section and warning threshold. The method's performance has been evaluated over the Santerno river catchment (about 450 km2) in the Emilia-Romagna Region, northern Italy. A statistical analysis in terms of false alarms, misses, and related scores was carried out using an 8-year database. The results are quite satisfactory, with slightly better performances
Heinze, Georg; Ploner, Meinhard; Beyea, Jan
2013-12-20
In the logistic regression analysis of a small-sized, case-control study on Alzheimer's disease, some of the risk factors exhibited missing values, motivating the use of multiple imputation. Usually, Rubin's rules (RR) for combining point estimates and variances would then be used to estimate (symmetric) confidence intervals (CIs), on the assumption that the regression coefficients were distributed normally. Yet, rarely is this assumption tested, with or without transformation. In analyses of small, sparse, or nearly separated data sets, such symmetric CI may not be reliable. Thus, RR alternatives have been considered, for example, Bayesian sampling methods, but not yet those that combine profile likelihoods, particularly penalized profile likelihoods, which can remove first order biases and guarantee convergence of parameter estimation. To fill the gap, we consider the combination of penalized likelihood profiles (CLIP) by expressing them as posterior cumulative distribution functions (CDFs) obtained via a chi-squared approximation to the penalized likelihood ratio statistic. CDFs from multiple imputations can then easily be averaged into a combined CDF_c, allowing confidence limits for a parameter β at level 1 − α to be identified as those β* and β** that satisfy CDF_c(β*) = α/2 and CDF_c(β**) = 1 − α/2. We demonstrate that the CLIP method outperforms RR in analyzing both simulated data and data from our motivating example. CLIP can also be useful as a confirmatory tool, should it show that the simpler RR are adequate for extended analysis. We also compare the performance of CLIP to Bayesian sampling methods using Markov chain Monte Carlo. CLIP is available in the R package logistf. PMID:23873477
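The CDF-averaging and quantile-reading step of CLIP can be sketched numerically. Here toy normal CDFs stand in for the chi-squared-approximated penalized-likelihood CDFs, and `combine_profile_cdfs` is an invented helper for illustration, not part of the logistf package:

```python
import numpy as np
from math import erf, sqrt

def combine_profile_cdfs(beta_grid, cdfs, alpha=0.05):
    """Average per-imputation posterior CDFs on a common beta grid, then read
    off the alpha/2 and 1-alpha/2 crossing points by linear interpolation."""
    cdf_c = np.mean(cdfs, axis=0)          # combined CDF_c, monotone in beta
    lo = np.interp(alpha / 2, cdf_c, beta_grid)
    hi = np.interp(1 - alpha / 2, cdf_c, beta_grid)
    return lo, hi

# Toy illustration: three imputations yielding normal CDFs with shifted means.
grid = np.linspace(-4.0, 6.0, 2001)
norm_cdf = lambda x: 0.5 * (1 + np.vectorize(erf)(x / sqrt(2)))
cdfs = [norm_cdf(grid - m) for m in (0.8, 1.0, 1.3)]
lo, hi = combine_profile_cdfs(grid, cdfs)
```

Because the combined CDF_c averages over imputations, the resulting interval need not be symmetric about any single imputation's estimate, which is the point of the method for sparse or nearly separated data.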
Hwang, Kyu-Baek; Lee, In-Hee; Park, Jin-Ho; Hambuch, Tina; Choi, Yongjoon; Kim, MinHyeok; Lee, Kyungjoon; Song, Taemin; Neu, Matthew B.; Gupta, Neha; Kohane, Isaac S.; Green, Robert C.; Kong, Sek Won
2014-01-01
As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but this is costly. Here we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous SNVs; 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false positive DNM candidates. PMID:24829188
Cost-Sensitive Boosting: Fitting an Additive Asymmetric Logistic Regression Model
NASA Astrophysics Data System (ADS)
Li, Qiu-Jie; Mao, Yao-Bin; Wang, Zhi-Quan; Xiang, Wen-Bo
Conventional machine learning algorithms like boosting tend to treat misclassification errors equally, which is not adequate for certain cost-sensitive classification problems such as object detection. Although many cost-sensitive extensions of boosting that directly modify the weighting strategy of the corresponding original algorithms have been proposed and reported, they are heuristic in nature, proved effective only by empirical results, and lack sound theoretical analysis. This paper develops a framework from a statistical insight that can embody almost all existing cost-sensitive boosting algorithms: fitting an additive asymmetric logistic regression model by stage-wise optimization of certain criteria. Four cost-sensitive versions of boosting algorithms are derived, namely CSDA, CSRA, CSGA and CSLB, which respectively correspond to Discrete AdaBoost, Real AdaBoost, Gentle AdaBoost and LogitBoost. Experimental results on the application of face detection have shown the effectiveness of the proposed learning framework in reducing the cumulative misclassification cost.
Binary Logistic Regression Analysis of Foramen Magnum Dimensions for Sex Determination.
Kamath, Venkatesh Gokuldas; Asif, Muhammed; Shetty, Radhakrishna; Avadhani, Ramakrishna
2015-01-01
Purpose. The structural integrity of foramen magnum is usually preserved in fire accidents and explosions due to its resistant nature and secluded anatomical position, and this study attempts to determine its sexing potential. Methods. The sagittal and transverse diameters and area of foramen magnum of seventy-two skulls (41 male and 31 female) from a south Indian population were measured. The analysis was done using Student's t-test, linear correlation, histogram, Q-Q plot, and Binary Logistic Regression (BLR) to obtain a model for sex determination. The predicted probabilities of BLR were analysed using a Receiver Operating Characteristic (ROC) curve. Result. BLR analysis and the ROC curve revealed that the predictability of the dimensions in sexing the crania was 69.6% for sagittal diameter, 66.4% for transverse diameter, and 70.3% for area of foramen. Conclusion. The sexual dimorphism of foramen magnum dimensions is established. However, due to considerable overlapping of male and female values, it is unwise to rely singularly on the foramen measurements. Nevertheless, considering the high sex predictability percentage of its dimensions in the present study and the studies preceding it, the foramen measurements can be used to supplement other sexing evidence available so as to precisely ascertain the sex of the skeleton. PMID:26346917
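The BLR-plus-ROC workflow can be sketched on invented measurements. The means, standard deviations, and sample composition below are illustrative only, not the study's south Indian data; the fit is a plain Newton/IRLS logistic regression and the AUC uses the rank-sum identity:

```python
import numpy as np

rng = np.random.default_rng(7)

def fit_logistic(X, y, n_iter=25):
    """Newton/IRLS fit of a binary logistic regression."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        beta = beta + np.linalg.solve(X.T * (p * (1 - p) + 1e-9) @ X, X.T @ (y - p))
    return beta

def auc(scores, y):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n1 = y.sum()
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n0 * n1)

# Illustrative foramen-magnum-like data: two overlapping sex distributions.
n = 72
sex = (rng.random(n) < 0.55).astype(float)        # 1 = male (invented mix)
sagittal = rng.normal(34.0 + 1.5 * sex, 2.5)      # mm, assumed means/SDs
transverse = rng.normal(29.0 + 1.2 * sex, 2.2)    # mm, assumed means/SDs
X = np.column_stack([np.ones(n), sagittal, transverse])
beta = fit_logistic(X, sex)
prob_male = 1 / (1 + np.exp(-X @ beta))
```

The heavy overlap built into the two distributions keeps the AUC well below 1, mirroring the abstract's caution against relying singularly on foramen measurements.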
Zeng, Fangfang; Li, Zhongtao; Yu, Xiaoling; Zhou, Linuo
2013-01-01
Background This study aimed to develop artificial neural network (ANN) and multivariable logistic regression (LR) analyses for prediction modeling of cardiovascular autonomic (CA) dysfunction in the general population, and to compare the prediction models from the two approaches. Methods and Materials We analyzed a previous dataset based on a Chinese population sample consisting of 2,092 individuals aged 30–80 years. The prediction models were derived from an exploratory set using ANN and LR analysis, and were tested in the validation set. Performances of these prediction models were then compared. Results Univariate analysis indicated that 14 risk factors showed statistically significant association with the prevalence of CA dysfunction (P<0.05). The mean area under the receiver-operating curve was 0.758 (95% CI 0.724–0.793) for LR and 0.762 (95% CI 0.732–0.793) for ANN analysis, but a noninferiority result was found (P<0.001). Similar results were found in comparisons of sensitivity, specificity, and predictive values in the prediction models between the LR and ANN analyses. Conclusion The prediction models for CA dysfunction were developed using ANN and LR. ANN and LR are two effective tools for developing prediction models based on our dataset. PMID:23940593
Kernel-based logistic regression model for protein sequence without vectorialization.
Fong, Youyi; Datta, Saheli; Georgiev, Ivelin S; Kwong, Peter D; Tomaras, Georgia D
2015-07-01
Protein sequence data arise more and more often in vaccine and infectious disease research. These types of data are discrete, high-dimensional, and complex. We propose to study the impact of protein sequences on binary outcomes using a kernel-based logistic regression model, which models the effect of protein through a random effect whose variance-covariance matrix is mostly determined by a kernel function. We propose a novel, biologically motivated, profile hidden Markov model (HMM)-based mutual information (MI) kernel. Hypothesis testing can be carried out using the maximum of the score statistics and a parametric bootstrap procedure. To improve the power of testing, we propose intuitive modifications to the test statistic. We show through simulation studies that the profile HMM-based MI kernel can be substantially more powerful than competing kernels, and that the modified test statistics bring incremental gains in power. We use these proposed methods to investigate two problems from HIV-1 vaccine research: (1) identifying segments of HIV-1 envelope (Env) protein that confer resistance to neutralizing antibody and (2) identifying segments of Env that are associated with attenuation of protective vaccine effect by antibodies of isotype A in the RV144 vaccine trial.
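The random-effect formulation can be sketched in its representer form f = Kα with a quadratic penalty α'Kα. A toy Hamming-similarity kernel between short integer "sequences" stands in for the profile-HMM mutual-information kernel, which is not reproduced here, and plain gradient descent replaces the score-statistic testing machinery:

```python
import numpy as np

rng = np.random.default_rng(5)

def kernel_logistic(K, y, lam=1.0, lr=0.02, n_iter=4000):
    """Kernel logistic regression via the representer form f = K @ alpha,
    minimizing mean logistic loss + (lam / 2n) * alpha' K alpha."""
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-(K @ alpha)))
        grad = K @ (p - y) / n + lam * (K @ alpha) / n
        alpha -= lr * grad
    return alpha

# Toy "protein" data: 60 sequences of 25 sites over a 4-letter alphabet.
seqs = rng.integers(0, 4, size=(60, 25))
match = (seqs[:, None, :] == seqs[None, :, :]).mean(axis=2)  # Hamming similarity
K = np.exp(match)          # elementwise exp of a kernel is still a kernel
y = (seqs[:, 0] < 2).astype(float)   # binary outcome tied to one site
alpha = kernel_logistic(K, y)
p_hat = 1 / (1 + np.exp(-(K @ alpha)))
```

The choice of kernel is the whole game here: a kernel that up-weights informative sites (as the profile-HMM MI kernel does) concentrates the signal the model can exploit.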
Garcia-Magariños, Manuel; Antoniadis, Anestis; Cao, Ricardo; Gonzãlez-Manteiga, Wenceslao
2010-01-01
Statistical methods generating sparse models are of great value in the gene expression field, where the number of covariates (genes) under study runs into the thousands while the sample sizes seldom reach a hundred individuals. For phenotype classification, we propose different lasso logistic regression approaches with specific penalizations for each gene. These methods are based on a generalized soft-threshold (GSoft) estimator. We also show that a recent algorithm for convex optimization, namely the cyclic coordinate descent (CCD) algorithm, provides a way to solve the optimization problem significantly faster than with other competing methods. Viewing GSoft as an iterative thresholding procedure allows us to get the asymptotic properties of the resulting estimates in a straightforward manner. Results are obtained for simulated and real data. The leukemia and colon datasets are commonly used to evaluate new statistical approaches, so they are useful for establishing comparisons with similar methods. Furthermore, biological meaning is extracted from the leukemia results and compared with previous studies. In summary, the approaches presented here give rise to sparse, interpretable models that are competitive with similar methods developed in the field.
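A generic CCD loop for L1-penalized logistic regression illustrates the soft-threshold updates. This is a sketch with a single global penalty λ and the standard 0.25·Σx² curvature bound, not the paper's gene-specific penalties or the exact GSoft estimator:

```python
import numpy as np

def soft(z, t):
    """Soft-threshold operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ccd_lasso_logistic(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for L1-penalized logistic regression.
    Each coordinate takes a soft-thresholded Newton-like step using the
    curvature upper bound h_j = 0.25 * sum_i x_ij^2."""
    n, p = X.shape
    beta = np.zeros(p)
    h = 0.25 * (X ** 2).sum(axis=0)
    eta = X @ beta
    for _ in range(n_sweeps):
        for j in range(p):
            prob = 1.0 / (1.0 + np.exp(-eta))
            g = X[:, j] @ (prob - y)          # coordinate gradient
            new = soft(h[j] * beta[j] - g, lam) / h[j]
            if new != beta[j]:
                eta += X[:, j] * (new - beta[j])
                beta[j] = new
    return beta
```

The update keeps the linear predictor `eta` in sync after every coordinate change, which is what makes CCD cheap per sweep; exact zeros fall out of the soft-threshold, producing the sparse models the abstract emphasizes.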
Use of binary logistic regression technique with MODIS data to estimate wild fire risk
NASA Astrophysics Data System (ADS)
Fan, Hong; Di, Liping; Yang, Wenli; Bonnlander, Brian; Li, Xiaoyan
2007-11-01
Many forest fires occur across the globe each year, destroying life and property and strongly impacting ecosystems. In recent years, wildland fires and altered fire disturbance regimes have become a significant management and science problem affecting ecosystems and the wildland/urban interface across the United States and globally. In this paper, we discuss the estimation of 504 probability models for forecasting fire risk for 14 fuel types, 12 months, and one day/week/month in advance, using 19 years of historical fire data in addition to meteorological and vegetation variables. MODIS land products are utilized as a major data source, and binary logistic regression was adopted to estimate fire occurrence probability. To better model the change of fire risk with the transition of seasons, several spatial and temporal stratification strategies were applied. To explore the possibility of real-time prediction, the Matlab distributed computing toolbox was used to accelerate the predictions. Finally, this study evaluates and validates the predictions against collected ground truth. The validation results indicate that these fire risk models achieved nearly 70% prediction accuracy and that MODIS data are a viable data source for implementing near-real-time fire risk prediction.
NASA Astrophysics Data System (ADS)
Kumarasamy, K.; Kaluarachchi, J.
2006-12-01
Groundwater is an important natural resource for numerous human activities, accounting for more than 50% of the total water used in the United States. It is vulnerable to contamination by several organic and inorganic pollutants, such as nitrates, heavy metals, and pesticides. Assessment of groundwater vulnerability aids in the management of limited groundwater resources. The focus of this thesis is to (1) statistically compare two groundwater vulnerability assessment models, DRASTIC (an acronym for Depth to water, net Recharge, Aquifer media, Soil media, Topography, Impact of vadose zone, and hydraulic Conductivity of the aquifer) and ordinal logistic regression, on a national scale, using nitrate concentration as the performance indicator; (2) analyze any discrepancies in the predictability of each of these models; and (3) assess the advantages of each model with respect to the knowledge, expertise, time, and data required and its ability to predict vulnerability. The breadth of nitrate concentration data in groundwater allows for a reliable comparison of the two models. This work will contribute towards the ultimate goal of developing a universal method to predict groundwater vulnerability.
Greenland, Sander; Mansournia, Mohammad Ali
2015-10-15
Penalization is a very general method of stabilizing or regularizing estimates, which has both frequentist and Bayesian rationales. We consider some questions that arise when considering alternative penalties for logistic regression and related models. The most widely programmed penalty appears to be the Firth small-sample bias-reduction method (albeit with small differences among implementations and the results they provide), which corresponds to using the log density of the Jeffreys invariant prior distribution as a penalty function. The latter representation raises some serious contextual objections to the Firth reduction, which also apply to alternative penalties based on t-distributions (including Cauchy priors). Taking simplicity of implementation and interpretation as our chief criteria, we propose that the log-F(1,1) prior provides a better default penalty than other proposals. Penalization based on more general log-F priors is trivial to implement and facilitates mean-squared error reduction and sensitivity analyses of penalty strength by varying the number of prior degrees of freedom. We caution however against penalization of intercepts, which are unduly sensitive to covariate coding and design idiosyncrasies.
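The log-F(m,m) penalty the authors recommend can be imposed with the data-augmentation trick they describe: one pseudo-record per non-intercept coefficient with that covariate set to 1, all others (including the intercept) set to 0, outcome 1/2, and prior weight m. A sketch using plain weighted IRLS, with the intercept left unpenalized as the abstract cautions:

```python
import numpy as np

def fit_logistic_weighted(X, y, w, n_iter=50):
    """Weighted Newton/IRLS for logistic regression; y may be fractional."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        W = w * p * (1 - p) + 1e-10
        beta = beta + np.linalg.solve(X.T * W @ X, X.T @ (w * (y - p)))
    return beta

def logF_penalized(X, y, m=1.0):
    """log-F(m, m) penalization of all non-intercept coefficients via
    pseudo-data: covariate j set to 1, outcome 1/2, prior weight m.
    Column 0 of X is assumed to be the (unpenalized) intercept."""
    n, p = X.shape
    pseudo = np.zeros((p - 1, p))
    pseudo[:, 1:] = np.eye(p - 1)
    Xa = np.vstack([X, pseudo])
    ya = np.concatenate([y, np.full(p - 1, 0.5)])
    wa = np.concatenate([np.ones(n), np.full(p - 1, m)])
    return fit_logistic_weighted(Xa, ya, wa)
```

Varying m directly implements the abstract's sensitivity analysis of penalty strength: larger m (more prior degrees of freedom) shrinks the coefficients harder toward zero, and the fit stays finite even under the quasi-separation that would make the unpenalized MLE diverge.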
Kadilar, Gamze Ozel
2016-01-01
The aim of the study is to examine the factors that appear to have a higher potential for serious injury or death of drivers in traffic accidents in Turkey, such as collision type, roadway surface, vehicle speed, alcohol/drug use, and restraint use. Driver crash severity is the dependent variable of this study with two categories, fatal and non-fatal. Due to the binary nature of the dependent variable, a conditional logistic regression analysis was found suitable. Of the 16 independent variables obtained from Turkish police accident reports, 11 variables were found most significantly associated with driver crash severity. They are age, education level, restraint use, roadway condition, roadway type, time of day, collision location, collision type, number and direction of vehicles, vehicle speed, and alcohol/drug use. This study found that belted drivers aged 18-25 years involving two vehicles travelling in the same direction, in an urban area, during the daytime, and on an avenue or a street have better chances of survival in traffic accidents. PMID:25087577
Snedden, Gregg A.; Steyer, Gregory D.
2013-01-01
Understanding plant community zonation along estuarine stress gradients is critical for effective conservation and restoration of coastal wetland ecosystems. We related the presence of plant community types to estuarine hydrology at 173 sites across coastal Louisiana. Percent relative cover by species was assessed at each site near the end of the growing season in 2008, and hourly water level and salinity were recorded at each site Oct 2007–Sep 2008. Nine plant community types were delineated with k-means clustering, and indicator species were identified for each of the community types with indicator species analysis. An inverse relation between salinity and species diversity was observed. Canonical correspondence analysis (CCA) effectively segregated the sites across ordination space by community type, and indicated that salinity and tidal amplitude were both important drivers of vegetation composition. Multinomial logistic regression (MLR) and Akaike's Information Criterion (AIC) were used to predict the probability of occurrence of the nine vegetation communities as a function of salinity and tidal amplitude, and probability surfaces obtained from the MLR model corroborated the CCA results. The weighted kappa statistic, calculated from the confusion matrix of predicted versus actual community types, was 0.7 and indicated good agreement between observed community types and model predictions. Our results suggest that models based on a few key hydrologic variables can be valuable tools for predicting vegetation community development when restoring and managing coastal wetlands.
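The weighted kappa statistic reported above can be computed directly from a confusion matrix of predicted versus actual community types. A minimal sketch follows, with invented data (not the study's 173 sites) and linear disagreement weights assumed:

```python
import numpy as np

def weighted_kappa(conf, weights="linear"):
    """Weighted Cohen's kappa from a square confusion matrix
    (rows = observed community type, columns = predicted type)."""
    conf = np.asarray(conf, dtype=float)
    k = conf.shape[0]
    i, j = np.indices((k, k))
    w = np.abs(i - j) if weights == "linear" else (i - j) ** 2
    obs = conf / conf.sum()
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))  # chance agreement
    return 1.0 - (w * obs).sum() / (w * exp).sum()

# Perfect agreement on three community types gives kappa = 1.
print(weighted_kappa(np.diag([10, 20, 30])))   # 1.0
```

A kappa near 0.7, as in the study, means the model's disagreements are far smaller than expected by chance alone.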
Use of multilevel logistic regression to identify the causes of differential item functioning.
Balluerka, Nekane; Gorostiaga, Arantxa; Gómez-Benito, Juana; Hidalgo, María Dolores
2010-11-01
Given that a key function of tests is to serve as evaluation instruments and for decision making in the fields of psychology and education, the possibility that some of their items may show differential behaviour is a major concern for psychometricians. In recent decades, important progress has been made as regards the efficacy of techniques designed to detect this differential item functioning (DIF). However, the findings are scant when it comes to explaining its causes. The present study addresses this problem from the perspective of multilevel analysis. Starting from a case study in the area of transcultural comparisons, multilevel logistic regression is used: 1) to identify the item characteristics associated with the presence of DIF; 2) to estimate the proportion of variation in the DIF coefficients that is explained by these characteristics; and 3) to evaluate alternative explanations of the DIF by comparing the explanatory power or fit of different sequential models. The comparison of these models confirmed one of the two alternatives (familiarity with the stimulus) and rejected the other (the topic area) as being a cause of differential functioning with respect to the compared groups.
Lewis, Kristin Nicole; Heckman, Bernadette Davantes; Himawan, Lina
2011-08-01
Growth mixture modeling (GMM) identified latent groups based on treatment outcome trajectories of headache disability measures in patients in headache subspecialty treatment clinics. Using a longitudinal design, 219 patients in headache subspecialty clinics in 4 large cities throughout Ohio provided data on their headache disability at pretreatment and 3 follow-up assessments. GMM identified 3 treatment outcome trajectory groups: (1) patients who initiated treatment with elevated disability levels and who reported statistically significant reductions in headache disability (high-disability improvers; 11%); (2) patients who initiated treatment with elevated disability but who reported no reductions in disability (high-disability nonimprovers; 34%); and (3) patients who initiated treatment with moderate disability and who reported statistically significant reductions in headache disability (moderate-disability improvers; 55%). Based on the final multinomial logistic regression model, a dichotomized treatment appointment attendance variable was a statistically significant predictor for differentiating high-disability improvers from high-disability nonimprovers. Three-fourths of patients who initiated treatment with elevated disability levels did not report reductions in disability after 5 months of treatment with new preventive pharmacotherapies. Preventive headache agents may be most efficacious for patients with moderate levels of disability and for patients with high disability levels who attend all treatment appointments.
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.
2003-01-01
Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity
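The core of step 3 (modeling the probability of a binary debris-flow outcome from basin attributes) can be sketched as follows. The data, predictor names, and the Newton-Raphson fitter are illustrative stand-ins, not the study's actual variables or software:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical basin data (names and numbers invented): percent of the
# basin steeper than 30 percent, and percent burned at medium/high severity.
n = 200
slope = rng.uniform(0, 100, n)
burn = rng.uniform(0, 100, n)
p_true = 1 / (1 + np.exp(-(-4.0 + 0.04 * slope + 0.03 * burn)))
flow = (rng.uniform(size=n) < p_true).astype(float)   # 1 = debris flow

# Standardize predictors, then fit the binary logistic model by
# Newton-Raphson (standing in for the statistics software package).
zs = (slope - slope.mean()) / slope.std()
zb = (burn - burn.mean()) / burn.std()
X = np.column_stack([np.ones(n), zs, zb])
beta = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += np.linalg.solve(X.T @ (X * (p * (1 - p))[:, None]),
                            X.T @ (flow - p))

def flow_prob(z_slope, z_burn):
    """Modeled probability that a basin produces a debris flow."""
    return 1 / (1 + np.exp(-(beta @ np.array([1.0, z_slope, z_burn]))))
```

Evaluating `flow_prob` over a grid of basin attributes is what turns the fitted model into the probability map described in step 4.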
ERIC Educational Resources Information Center
Taylor, Aaron B.; West, Stephen G.; Aiken, Leona S.
2006-01-01
Variables that have been coarsely categorized into a small number of ordered categories are often modeled as outcome variables in psychological research. The authors employ a Monte Carlo study to investigate the effects of this coarse categorization of dependent variables on power to detect true effects using three classes of regression models:…
NASA Astrophysics Data System (ADS)
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam
2015-10-01
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variable is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significance test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratios. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles, diet and food intake. The results indicated that obesity and overweight among students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also the interaction between breakfast intake in a week and sleep duration, and the interaction between gender and protein intake.
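The q-1 simultaneous logit equations described above can be sketched with a reference-category parameterization: the reference category's coefficient vector is pinned at zero and the remaining q-1 rows are estimated jointly. The toy data and gradient-ascent fitter below are illustrative, not the study's:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy polytomous outcome with q = 3 categories (0 = reference) and two
# illustrative predictors; none of this is the study's data.
n, q = 300, 3
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
B_true = np.array([[0.0, 0.0, 0.0],      # reference category: logit = 0
                   [-1.0, 0.8, -0.5],    # logit equation, category 1 vs 0
                   [-1.5, 1.2, 0.7]])    # logit equation, category 2 vs 0
P = np.exp(X @ B_true.T)
P /= P.sum(axis=1, keepdims=True)
ycat = np.array([rng.choice(q, p=pi) for pi in P])
Y = np.eye(q)[ycat]                      # one-hot outcome matrix

# Fit the q-1 = 2 logit equations simultaneously by gradient ascent,
# keeping the reference category's coefficients pinned at zero.
B = np.zeros((q, 3))
for _ in range(2000):
    Pi = np.exp(X @ B.T)
    Pi /= Pi.sum(axis=1, keepdims=True)
    grad = (Y - Pi).T @ X / n
    B[1:] += 0.5 * grad[1:]

Pi = np.exp(X @ B.T)
Pi /= Pi.sum(axis=1, keepdims=True)      # fitted category probabilities
```

Each non-reference row of `B` is one logit equation comparing that category with the reference, which is exactly the "specific comparisons of the response categories" the abstract describes.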
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam
2015-10-22
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variable is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significance test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratios. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles, diet and food intake. The results indicated that obesity and overweight among students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also the interaction between breakfast intake in a week and sleep duration, and the interaction between gender and protein intake.
ERIC Educational Resources Information Center
Guler, Nese; Penfield, Randall D.
2009-01-01
In this study, we investigate the logistic regression (LR), Mantel-Haenszel (MH), and Breslow-Day (BD) procedures for the simultaneous detection of both uniform and nonuniform differential item functioning (DIF). A simulation study was used to assess and compare the Type I error rate and power of a combined decision rule (CDR), which assesses DIF…
Asghari, Mehdi Poursheikhali; Hayatshahi, Sayyed Hamed Sadat; Abdolmaleki, Parviz
2012-01-01
From both the structural and functional points of view, β-turns play important biological roles in proteins. In the present study, a novel two-stage hybrid procedure has been developed to identify β-turns in proteins. Binary logistic regression was used, for the first time, to select significant sequence parameters for the identification of β-turns via a re-substitution test procedure. The sequence parameters consisted of 80 amino acid positional occurrences and 20 amino acid percentages in the sequence. Among these parameters, the most significant ones selected by the binary logistic regression model were the percentages of Gly and Ser and the occurrence of Asn in position i+2, respectively. These significant parameters have the highest effect on the constitution of a β-turn sequence. A neural network model was then constructed and fed with the parameters selected by binary logistic regression to build a hybrid predictor. The networks were trained and tested on a non-homologous dataset of 565 protein chains. Applying a ninefold cross-validation test on the dataset, the network reached an overall accuracy (Qtotal) of 74%, which is comparable with the results of other β-turn prediction methods. In conclusion, this study shows that the parameter selection ability of binary logistic regression together with the prediction capability of neural networks leads to the development of more precise models for identifying β-turns in proteins.
ERIC Educational Resources Information Center
Le, Huy; Marcus, Justin
2012-01-01
This study used Monte Carlo simulation to examine the properties of the overall odds ratio (OOR), which was recently introduced as an index for overall effect size in multiple logistic regression. It was found that the OOR was relatively independent of study base rate and performed better than most commonly used R-square analogs in indexing model…
ERIC Educational Resources Information Center
Gordovil-Merino, Amalia; Guardia-Olmos, Joan; Pero-Cebollero, Maribel
2012-01-01
In this paper, we used simulations to compare the performance of classical and Bayesian estimations in logistic regression models using small samples. In the performed simulations, conditions were varied, including the type of relationship between independent and dependent variable values (i.e., unrelated and related values), the type of variable…
ERIC Educational Resources Information Center
Olejnik, Stephen; Mills, Jamie; Keselman, Harvey
2000-01-01
Evaluated the use of Mallows' C(p) and Wherry's adjusted R squared (R. Wherry, 1931) statistics to select a final model from a pool of model solutions using computer generated data. Neither statistic identified the underlying regression model any better than, and usually less well than, the stepwise selection method, which itself was poor for…
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.; Michael, John A.; Helsel, Dennis R.
2008-01-01
Logistic regression was used to develop statistical models that can be used to predict the probability of debris flows in areas recently burned by wildfires by using data from 14 wildfires that burned in southern California during 2003-2006. Twenty-eight independent variables describing the basin morphology, burn severity, rainfall, and soil properties of 306 drainage basins located within those burned areas were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows soon after the 2003 to 2006 fires were delineated from data in the National Elevation Dataset using a geographic information system; (2) Data describing the basin morphology, burn severity, rainfall, and soil properties were compiled for each basin. These data were then input to a statistics software package for analysis using logistic regression; and (3) Relations between the occurrence or absence of debris flows and the basin morphology, burn severity, rainfall, and soil properties were evaluated, and five multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combinations produced the most effective models, and the multivariate models that best predicted the occurrence of debris flows were identified. Percentage of high burn severity and 3-hour peak rainfall intensity were significant variables in all models. Soil organic matter content and soil clay content were significant variables in all models except Model 5. Soil slope was a significant variable in all models except Model 4. The most suitable model can be selected from these five models on the basis of the availability of independent variables in the particular area of interest and field checking of probability maps. The multivariate logistic regression models can be entered into a geographic information system, and maps showing the probability of debris flows can be constructed in recently burned areas of
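The "all possible combinations of independent variables" search can be sketched as a loop over predictor subsets, each scored by a fitted logistic model. The study does not state its scoring criterion, so AIC is an assumption here, and the data and names are invented:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

# Four candidate predictors: columns 0-1 carry signal (think burn severity
# and peak rainfall intensity), columns 2-3 are pure noise. All invented.
n = 250
Z = rng.normal(size=(n, 4))
eta = -0.5 + 1.2 * Z[:, 0] + 1.0 * Z[:, 1]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-eta))).astype(float)

def logistic_aic(cols):
    """Fit a logistic model on the chosen columns; return its AIC."""
    X = np.column_stack([np.ones(n), Z[:, cols]])
    b = np.zeros(X.shape[1])
    for _ in range(30):                       # Newton-Raphson
        p = 1 / (1 + np.exp(-X @ b))
        b += np.linalg.solve(X.T @ (X * (p * (1 - p))[:, None]),
                             X.T @ (y - p))
    p = np.clip(1 / (1 + np.exp(-X @ b)), 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return 2 * X.shape[1] - 2 * loglik

# Score every non-empty combination of predictors; smallest AIC wins.
subsets = [c for r in range(1, 5)
           for c in itertools.combinations(range(4), r)]
best = min(subsets, key=lambda c: logistic_aic(list(c)))
```

With 28 candidate variables, as in the study, exhaustive enumeration explodes combinatorially, which is why such searches are usually restricted to small candidate pools or replaced by stepwise procedures.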
Bent, Gardner C.; Archfield, Stacey A.
2002-01-01
A logistic regression equation was developed for estimating the probability of a stream flowing perennially at a specific site in Massachusetts. The equation provides city and town conservation commissions and the Massachusetts Department of Environmental Protection with an additional method for assessing whether streams are perennial or intermittent at a specific site in Massachusetts. This information is needed to assist these environmental agencies, who administer the Commonwealth of Massachusetts Rivers Protection Act of 1996, which establishes a 200-foot-wide protected riverfront area extending along the length of each side of the stream from the mean annual high-water line along each side of perennial streams, with exceptions in some urban areas. The equation was developed by relating the verified perennial or intermittent status of a stream site to selected basin characteristics of naturally flowing streams (no regulation by dams, surface-water withdrawals, ground-water withdrawals, diversion, waste-water discharge, and so forth) in Massachusetts. Stream sites used in the analysis were identified as perennial or intermittent on the basis of review of measured streamflow at sites throughout Massachusetts and on visual observation at sites in the South Coastal Basin, southeastern Massachusetts. Measured or observed zero flow(s) during months of extended drought as defined by the 310 Code of Massachusetts Regulations (CMR) 10.58(2)(a) were not considered when designating the perennial or intermittent status of a stream site. The database used to develop the equation included a total of 305 stream sites (84 intermittent- and 89 perennial-stream sites in the State, and 50 intermittent- and 82 perennial-stream sites in the South Coastal Basin). Stream sites included in the database had drainage areas that ranged from 0.14 to 8.94 square miles in the State and from 0.02 to 7.00 square miles in the South Coastal Basin. Results of the logistic regression analysis
NASA Astrophysics Data System (ADS)
Nowicki, M. A.; Hearne, M.; Thompson, E.; Wald, D. J.
2012-12-01
Seismically induced landslides present a costly and often fatal threat in many mountainous regions. Substantial effort has been invested to understand where seismically induced landslides may occur in the future. Both slope-stability methods and, more recently, statistical approaches to the problem are described throughout the literature. Though some regional efforts have succeeded, no uniformly agreed-upon method is available for predicting the likelihood and spatial extent of seismically induced landslides. For use in the U.S. Geological Survey (USGS) Prompt Assessment of Global Earthquakes for Response (PAGER) system, we would like to routinely make such estimates, in near-real time, around the globe. Here we use the recently produced USGS ShakeMap Atlas of historic earthquakes to develop an empirical landslide probability model. We focus on recent events, yet include any digitally mapped landslide inventories for which well-constrained ShakeMaps are also available. We combine these uniform estimates of the input shaking (e.g., peak acceleration and velocity) with broadly available susceptibility proxies, such as topographic slope and surface geology. The resulting database is used to build a predictive model of the probability of landslide occurrence with logistic regression. The landslide database includes observations from the Northridge, California (1994); Wenchuan, China (2008); Chi-Chi, Taiwan (1999); and Chuetsu, Japan (2004) earthquakes; we also provide ShakeMaps for moderate-sized events without landslides for proper model testing and training. The performance of the regression model is assessed with both statistical goodness-of-fit metrics and a qualitative review of whether or not the model is able to capture the spatial extent of landslides for each event. Part of our goal is to determine which variables can be employed based on globally available data or proxies, and whether or not modeling results from one region are transferable to
Binary logistic regression analysis of hard palate dimensions for sexing human crania
Asif, Muhammed; Shetty, Radhakrishna; Avadhani, Ramakrishna
2016-01-01
Sex determination is the preliminary step in every forensic investigation and the hard palate assumes significance in cranial sexing in cases involving burns and explosions due to its resistant nature and secluded location. This study analyzes the sexing potential of incisive foramen to posterior nasal spine length, palatine process of maxilla length, horizontal plate of palatine bone length and transverse length between the greater palatine foramina. The study deviates from the conventional method of measuring the maxillo-alveolar length and breadth as the dimensions considered in this study are more heat resistant and useful in situations with damaged alveolar margins. The study involves 50 male and 50 female adult dry skulls of Indian ethnic group. The dimensions measured were statistically analyzed using Student's t test, binary logistic regression and receiver operating characteristic curve. It was observed that the incisive foramen to posterior nasal spine length is a definite sex marker with sex predictability of 87.2%. The palatine process of maxilla length with 66.8% sex predictability and the horizontal plate of palatine bone length with 71.9% sex predictability cannot be relied upon as definite sex markers. The transverse length between the greater palatine foramina is statistically insignificant in sexing crania (P=0.318). Considering a significant overlap of values in both the sexes the palatal dimensions singularly cannot be relied upon for sexing. Nevertheless, considering the high sex predictability of incisive foramen to posterior nasal spine length this dimension can definitely be used to supplement other sexing evidence available to precisely conclude the cranial sex. PMID:27382518
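For a single measurement such as the incisive foramen to posterior nasal spine length, the receiver operating characteristic analysis used above reduces to asking how often a male value exceeds a female value: the area under the ROC curve equals that pairwise probability (the Mann-Whitney statistic). The lengths below are invented for illustration, not the study's data:

```python
import numpy as np

def roc_auc(pos, neg):
    """AUC as the Mann-Whitney probability that a value from the
    'positive' group exceeds a value from the 'negative' group,
    counting ties as half."""
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Invented incisive-foramen-to-posterior-nasal-spine lengths (mm),
# standing in for the study's male and female measurements.
male = [46.1, 44.8, 47.3, 45.6, 43.9, 46.8]
female = [42.0, 43.5, 41.2, 44.1, 40.8, 42.9]
auc = roc_auc(male, female)
```

An AUC near 1 means the dimension separates the sexes almost perfectly; an AUC near 0.5, as with heavily overlapping palatal dimensions, means it carries little sexing information on its own.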
Multitask Coupled Logistic Regression and its Fast Implementation for Large Multitask Datasets.
Gu, Xin; Chung, Fu-Lai; Ishibuchi, Hisao; Wang, Shitong
2015-09-01
When facing multitask-learning problems, it is desirable that the learning method can find the correct input-output features, share the commonality among multiple domains, and scale up for large multitask datasets. We introduce the multitask coupled logistic regression (LR) framework, called the LR-based multitask classification learning algorithm (MTC-LR), which is a new method for generating a classifier for each task that is capable of sharing the commonality among multitask domains. The basic idea of MTC-LR is to use individual LR-based classifiers, each one appropriate for its task domain, but, in contrast to other support vector machine (SVM)-based proposals, to learn all the parameter vectors of all individual classifiers with the conjugate gradient method, in a global way and without the use of the kernel trick, so that the method is easily extended into a scaled version. We theoretically show that the addition of a new term in the cost function of the set of LRs (one that penalizes the diversity among multiple tasks) produces a coupling of multiple tasks that allows MTC-LR to improve the learning performance in an LR way. This finding lets us easily integrate it with a state-of-the-art fast LR algorithm, the dual coordinate descent method (CDdual), to develop its fast version, MTC-LR-CDdual, for large multitask datasets. The proposed algorithm MTC-LR-CDdual is also theoretically analyzed. Our experimental results on artificial and real datasets indicate the effectiveness of the proposed algorithm MTC-LR-CDdual in classification accuracy, speed, and robustness. PMID:25423663
Nakasone, Yutaka; Ikeda, Osamu; Yamashita, Yasuyuki; Kudoh, Kouichi; Shigematsu, Yoshinori; Harada, Kazunori
2007-09-15
We applied multivariate analysis to the clinical findings in patients with acute gastrointestinal (GI) hemorrhage and compared the relationship between these findings and angiographic evidence of extravasation. Our study population consisted of 46 patients with acute GI bleeding. They were divided into two groups. In group 1 we retrospectively analyzed 41 angiograms obtained in 29 patients (age range, 25-91 years; average, 71 years). Their clinical findings, including the shock index (SI), diastolic blood pressure, hemoglobin, platelet counts, and age, were quantitatively analyzed. In group 2, consisting of 17 patients (age range, 21-78 years; average, 60 years), we prospectively applied statistical analysis with a logistic regression model to the clinical findings and then assessed 21 angiograms obtained in these patients to determine whether our model was useful for predicting the presence of angiographic evidence of extravasation. On 18 of 41 (43.9%) angiograms in group 1 there was evidence of extravasation; in 3 patients it was demonstrated only by selective angiography. Factors significantly associated with angiographic visualization of extravasation were the SI and patient age. For differentiation between cases with and cases without angiographic evidence of extravasation, the maximum cutoff point was between 0.51 and 0.53. Of the 21 angiograms obtained in group 2, 13 (61.9%) showed evidence of extravasation; in 1 patient it was demonstrated only on selective angiograms. We found that in 90% of the cases, the prospective application of our model correctly predicted the angiographically confirmed presence or absence of extravasation. We conclude that in patients with GI hemorrhage, angiographic visualization of extravasation is associated with the pre-embolization SI. Patients with a high SI value should undergo study to facilitate optimal treatment planning.
LOGISTIC NETWORK REGRESSION FOR SCALABLE ANALYSIS OF NETWORKS WITH JOINT EDGE/VERTEX DYNAMICS
Almquist, Zack W.; Butts, Carter T.
2015-01-01
Change in group size and composition has long been an important area of research in the social sciences. Similarly, interest in interaction dynamics has a long history in sociology and social psychology. However, the effects of endogenous group change on interaction dynamics are a surprisingly understudied area. One way to explore these relationships is through social network models. Network dynamics may be viewed as a process of change in the edge structure of a network, in the vertex set on which edges are defined, or in both simultaneously. Although early studies of such processes were primarily descriptive, recent work on this topic has increasingly turned to formal statistical models. Although showing great promise, many of these modern dynamic models are computationally intensive and scale very poorly in the size of the network under study and/or the number of time points considered. Likewise, currently used models focus on edge dynamics, with little support for endogenously changing vertex sets. Here, the authors show how an existing approach based on logistic network regression can be extended to serve as a highly scalable framework for modeling large networks with dynamic vertex sets. The authors place this approach within a general dynamic exponential family (exponential-family random graph modeling) context, clarifying the assumptions underlying the framework (and providing a clear path for extensions), and they show how model assessment methods for cross-sectional networks can be extended to the dynamic case. Finally, the authors illustrate this approach on a classic data set involving interactions among windsurfers on a California beach. PMID:26120218
Logistic regression modeling to assess groundwater vulnerability to contamination in Hawaii, USA.
Mair, Alan; El-Kadi, Aly I
2013-10-01
Capture zone analysis combined with a subjective susceptibility index is currently used in Hawaii to assess vulnerability to contamination of drinking water sources derived from groundwater. In this study, we developed an alternative objective approach that combines well capture zones with multiple-variable logistic regression (LR) modeling and applied it to the highly-utilized Pearl Harbor and Honolulu aquifers on the island of Oahu, Hawaii. Input for the LR models utilized explanatory variables based on hydrogeology, land use, and well geometry/location. A suite of 11 target contaminants detected in the region, including elevated nitrate (>1 mg/L), four chlorinated solvents, four agricultural fumigants, and two pesticides, was used to develop the models. We then tested the ability of the new approach to accurately separate groups of wells with low and high vulnerability, and the suitability of nitrate as an indicator of other types of contamination. Our results produced contaminant-specific LR models that accurately identified groups of wells with the lowest/highest reported detections and the lowest/highest nitrate concentrations. Current and former agricultural land uses were identified as significant explanatory variables for eight of the 11 target contaminants, while elevated nitrate was a significant variable for five contaminants. The utility of the combined approach is contingent on the availability of hydrologic and chemical monitoring data for calibrating groundwater and LR models. Application of the approach using a reference site with sufficient data could help identify key variables in areas with similar hydrogeology and land use but limited data. In addition, elevated nitrate may also be a suitable indicator of groundwater contamination in areas with limited data. The objective LR modeling approach developed in this study is flexible enough to address a wide range of contaminants and represents a suitable addition to the current subjective approach.
Logistic regression modeling to assess groundwater vulnerability to contamination in Hawaii, USA
NASA Astrophysics Data System (ADS)
Mair, Alan; El-Kadi, Aly I.
2013-10-01
Capture zone analysis combined with a subjective susceptibility index is currently used in Hawaii to assess vulnerability to contamination of drinking water sources derived from groundwater. In this study, we developed an alternative objective approach that combines well capture zones with multiple-variable logistic regression (LR) modeling and applied it to the highly-utilized Pearl Harbor and Honolulu aquifers on the island of Oahu, Hawaii. Input for the LR models utilized explanatory variables based on hydrogeology, land use, and well geometry/location. A suite of 11 target contaminants detected in the region, including elevated nitrate (> 1 mg/L), four chlorinated solvents, four agricultural fumigants, and two pesticides, was used to develop the models. We then tested the ability of the new approach to accurately separate groups of wells with low and high vulnerability, and the suitability of nitrate as an indicator of other types of contamination. Our results produced contaminant-specific LR models that accurately identified groups of wells with the lowest/highest reported detections and the lowest/highest nitrate concentrations. Current and former agricultural land uses were identified as significant explanatory variables for eight of the 11 target contaminants, while elevated nitrate was a significant variable for five contaminants. The utility of the combined approach is contingent on the availability of hydrologic and chemical monitoring data for calibrating groundwater and LR models. Application of the approach using a reference site with sufficient data could help identify key variables in areas with similar hydrogeology and land use but limited data. In addition, elevated nitrate may also be a suitable indicator of groundwater contamination in areas with limited data. The objective LR modeling approach developed in this study is flexible enough to address a wide range of contaminants and represents a suitable addition to the current subjective approach.
Spalj, Stjepan; Spalj, Vedrana Tudor; Ivanković, Luida; Plancak, Darije
2014-03-01
The aim of this study was to explore the patterns of oral health-related risk behaviours in relation to dental status, attitudes, motivation and knowledge among Croatian adolescents. The assessment was conducted in a sample of 750 male subjects--military recruits aged 18-28 in Croatia--using a questionnaire and clinical examination. The mean number of decayed, missing and filled teeth (DMFT) and the Significant Caries Index (SiC) were calculated. Multiple logistic regression models were created for analysis. Although the models of risk behaviours were statistically significant, their explanatory values were quite low. Five of them--rare toothbrushing, not using hygiene auxiliaries, rarely visiting the dentist, toothache as a primary reason to visit the dentist, and demand for tooth extraction due to toothache--had the highest explanatory values, ranging from 21-29%, and correctly classified 73-89% of subjects. Toothache as a primary reason to visit the dentist, extraction as the preferred therapy when toothache occurs, not having brushing education in school, and frequent gingival bleeding were significantly related to the population with high caries experience (DMFT ≥ 14 according to SiC), producing odds ratios of 1.6 (95% CI 1.07-2.46), 2.1 (95% CI 1.29-3.25), 1.8 (95% CI 1.21-2.74) and 2.4 (95% CI 1.21-2.74), respectively. The DMFT ≥ 14 model had a low explanatory value of 6.5% and correctly classified 83% of subjects. It can be concluded that oral health-related risk behaviours are interrelated. Poor association was seen between attitudes concerning oral health and oral health-related risk behaviours, indicating insufficient motivation to change lifestyle and habits. Self-reported oral hygiene habits were not strongly related to dental status.
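Odds ratios and 95% confidence intervals like those quoted above come straight from the fitted logistic coefficients: OR = exp(b), with Wald interval exp(b ± 1.96·SE). A minimal sketch (the coefficient and standard error are invented, not the study's output):

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% CI from a logistic coefficient and its
    standard error."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

# Invented coefficient/SE pair, chosen to give an OR of 2.1 with a CI
# of the same general shape as those reported above.
or_, lo, hi = odds_ratio_ci(math.log(2.1), 0.23)
```

Because the interval is symmetric on the log-odds scale, the reported CIs are asymmetric around the odds ratio itself, which is why bounds such as (1.29, 3.25) do not straddle 2.1 evenly.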
Mallinis, Georgios; Koutsias, Nikos
2008-01-01
Improvement of satellite sensor characteristics motivates the development of new techniques for satellite image classification. Spatial information seems to be critical in classification processes, especially for heterogeneous and complex landscapes such as those observed in the Mediterranean basin. In our study, a spectral classification method for LANDSAT-5 TM imagery that uses several binomial logistic regression models was developed, evaluated and compared to the familiar parametric maximum likelihood algorithm. The classification approach based on logistic regression modelling was extended to a contextual one by using autocovariates to account for the spatial dependence of every pixel on its neighbours. Finally, the maximum likelihood algorithm was also extended to a contextual version by considering typicality, a measure which indicates the strength of class membership. The use of logistic regression for broad-scale land cover classification gave higher overall accuracy (75.61%) than the maximum likelihood algorithm (64.23%), although the difference was not statistically significant, even when the latter was refined following a spatial approach based on Mahalanobis distance (66.67%). However, including the spatial autocovariate in the logistic models significantly improved the fit of the models and increased the overall accuracy from 75.61% to 80.49%.
Pisano, Etta D.; Acharyya, Suddhasatta; Cole, Elodia B.; Marques, Helga S.; Yaffe, Martin J.; Blevins, Meredith; Conant, Emily F.; Hendrick, R. Edward; Baum, Janet K.; Fajardo, Laurie L.; Jong, Roberta A.; Koomen, Marcia A.; Kuzmiak, Cherie M.; Lee, Yeonhee; Pavic, Dag; Yoon, Sora C.; Padungchaichote, Wittaya; Gatsonis, Constantine
2009-01-01
Purpose: To determine which factors contributed to the Digital Mammographic Imaging Screening Trial (DMIST) cancer detection results. Materials and Methods: This project was HIPAA compliant and institutional review board approved. Seven radiologist readers reviewed the film hard-copy (screen-film) and digital mammograms in DMIST cancer cases and assessed the factors that contributed to lesion visibility on both types of images. Two multinomial logistic regression models were used to analyze the combined and condensed visibility ratings assigned by the readers to the paired digital and screen-film images. Results: Readers most frequently attributed differences in DMIST cancer visibility to variations in image contrast—not differences in positioning or compression—between digital and screen-film mammography. The odds of a cancer being more visible on a digital mammogram—rather than being equally visible on digital and screen-film mammograms—were significantly greater for women with dense breasts than for women with nondense breasts, even with the data adjusted for patient age, lesion type, and mammography system (odds ratio, 2.28; P < .0001). The odds of a cancer being more visible at digital mammography—rather than being equally visible at digital and screen-film mammography—were significantly greater for lesions imaged with the General Electric digital mammography system than for lesions imaged with the Fischer (P = .0070) and Fuji (P = .0070) devices. Conclusion: The significantly better diagnostic accuracy of digital mammography, as compared with screen-film mammography, in women with dense breasts demonstrated in the DMIST was most likely attributable to differences in image contrast, which were most likely due to the inherent system performance improvements that are available with digital mammography. The authors conclude that the DMIST results were attributable primarily to differences in the display and acquisition characteristics of the
NASA Astrophysics Data System (ADS)
Al-Mudhafar, W. J.
2013-12-01
Precise prediction of rock facies leads to adequate reservoir characterization by improving the porosity-permeability relationships used to estimate properties in non-cored intervals. It also helps to accurately identify the spatial facies distribution so that an accurate reservoir model can be built for optimal future reservoir performance. In this paper, facies estimation was carried out through multinomial logistic regression (MLR) with respect to the well logs and core data in a well in the upper sandstone formation of the South Rumaila oil field. The independent variables are gamma ray, formation density, water saturation, shale volume, log porosity, core porosity, and core permeability. Firstly, a robust sequential imputation algorithm was used to impute the missing data. This algorithm starts from a complete subset of the dataset and sequentially estimates the missing values in an incomplete observation by minimizing the determinant of the covariance of the augmented data matrix; the completed observation is then added to the complete data matrix and the algorithm continues with the next observation with missing values. MLR was chosen because, via maximum likelihood estimation, it accommodates the nonlinear relationships between facies and the core and log data while minimizing the standard error. The MLR predicts the probabilities of the different possible facies given the independent variables by constructing a linear predictor function with a set of weights that are linearly combined with the independent variables using a dot product. A beta distribution of facies was taken as prior knowledge, and the resulting predicted (posterior) probability was estimated from the MLR based on Bayes' theorem, which relates the posterior probability to the conditional probability and the prior knowledge. To assess the statistical accuracy of the model, the bootstrap should be carried out to estimate extra-sample prediction error by randomly
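A minimal sketch of the multinomial logistic regression step, using synthetic stand-ins for the log and core inputs; the variable names and the three facies classes below are illustrative inventions, not the South Rumaila data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 600
# Hypothetical log responses (stand-ins for the paper's gamma ray,
# formation density and porosity inputs).
gamma = rng.normal(60, 20, n)
density = rng.normal(2.4, 0.15, n)
porosity = rng.uniform(0.05, 0.30, n)
# Synthetic facies labels (0 = sand, 1 = shaly sand, 2 = shale), generated
# so that higher gamma-ray readings imply more shale.
facies = np.digitize(gamma + rng.normal(0, 10, n), [55, 75])

X = np.column_stack([gamma, density, porosity])
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize for the solver
mlr = LogisticRegression(max_iter=1000).fit(X, facies)

proba = mlr.predict_proba(X)               # one probability column per facies class
print(proba.shape, np.allclose(proba.sum(axis=1), 1.0))
```

Each row of `proba` is a softmax over the linear predictors, i.e. the per-facies probabilities the abstract describes.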
Wagner, Philippe; Ghith, Nermin; Leckie, George
2016-01-01
Background and Aim Many multilevel logistic regression analyses of “neighbourhood and health” focus on interpreting measures of association (e.g., the odds ratio, OR). In contrast, multilevel analysis of variance is rarely considered. We propose an original stepwise analytical approach that distinguishes between “specific” (measures of association) and “general” (measures of variance) contextual effects. Performing two empirical examples, we illustrate the methodology, interpret the results and discuss the implications of this kind of analysis in public health. Methods We analyse 43,291 individuals residing in 218 neighbourhoods in the city of Malmö, Sweden in 2006. We study two individual outcomes (psychotropic drug use and choice of private vs. public general practitioner, GP) for which the relative importance of neighbourhood as a source of individual variation differs substantially. In Step 1 of the analysis, we evaluate the OR and the area under the receiver operating characteristic curve (AUC) for individual-level covariates (i.e., age, sex and individual low income). In Step 2, we assess general contextual effects using the AUC. Finally, in Step 3 the OR for a specific neighbourhood characteristic (i.e., neighbourhood income) is interpreted jointly with the proportional change in variance (PCV) and the proportion of ORs in the opposite direction (POOR) statistics. Results For both outcomes, information on individual characteristics (Step 1) provided low discriminatory accuracy (AUC = 0.616 for psychotropic drug use; AUC = 0.600 for choosing a private GP). Accounting for neighbourhood of residence (Step 2) only improved the AUC for choosing a private GP (+0.295 units). High neighbourhood income (Step 3) was strongly associated with choosing a private GP (OR = 3.50), but the PCV was only 11% and the POOR 33%. Conclusion Applying an innovative stepwise multilevel analysis, we observed that, in Malmö, the neighbourhood context per se had a negligible
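The Step 3 quantity can be illustrated in a few lines: the proportional change in variance (PCV) compares the between-neighbourhood variance before and after adding the neighbourhood-level covariate. The variance values below are illustrative only, chosen to reproduce a PCV near the reported 11%:

```python
# Between-neighbourhood variances from two hypothetical multilevel logistic
# models: a model with individual covariates only, and one that adds the
# neighbourhood-income covariate. Values are illustrative, not from the paper.
var_initial = 0.90
var_adjusted = 0.80

# Proportional change in variance (PCV), as interpreted in Step 3.
pcv = (var_initial - var_adjusted) / var_initial
print(round(100 * pcv, 1))  # 11.1 (percent)
```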
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring
ERIC Educational Resources Information Center
Haberman, Shelby J.; Sinharay, Sandip
2010-01-01
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
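A cumulative (proportional-odds) logit model of the kind applied here can be written out directly: the cumulative probabilities are logistic in the thresholds minus a common slope times the essay feature, and category probabilities are their successive differences. The thresholds, slope and four-level score below are invented for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted parameters: increasing cut-points between the four
# score levels, and a common slope for one standardized essay feature.
thresholds = np.array([-1.0, 0.5, 2.0])
beta = 1.2

def score_probabilities(x):
    """P(score = k | x) for k = 0..3 under the cumulative logit model."""
    cum = sigmoid(thresholds - beta * x)   # P(score <= k | x) for k = 0, 1, 2
    cum = np.append(cum, 1.0)              # P(score <= 3) = 1
    return np.diff(np.append(0.0, cum))    # successive differences give category probs

p = score_probabilities(0.8)
print(np.round(p, 3), p.sum())  # four category probabilities summing to 1
```

Increasing the feature value shifts probability mass toward the higher score categories, which is the monotonicity the cumulative model enforces.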
Staley, Dennis M.; Negri, Jacquelyn A.; Kean, Jason W.; Laber, Jayme L.; Tillery, Anne C.; Youberg, Ann M.
2016-01-01
Wildfire can significantly alter the hydrologic response of a watershed to the extent that even modest rainstorms can generate dangerous flash floods and debris flows. To reduce public exposure to hazard, the U.S. Geological Survey produces post-fire debris-flow hazard assessments for select fires in the western United States. We use publicly available geospatial data describing basin morphology, burn severity, soil properties, and rainfall characteristics to estimate the statistical likelihood that debris flows will occur in response to a storm of a given rainfall intensity. Using an empirical database and refined geospatial analysis methods, we defined new equations for the prediction of debris-flow likelihood using logistic regression methods. We showed that the new logistic regression model outperformed previous models used to predict debris-flow likelihood.
Li, J.; Gray, B.R.; Bates, D.M.
2008-01-01
Partitioning the variance of a response by design levels is challenging for binomial and other discrete outcomes. Goldstein (2003) proposed four definitions of variance partitioning coefficients (VPCs) under a two-level logistic regression model. In this study, we explicitly derived formulae for the multilevel logistic regression model and subsequently studied the distributional properties of the calculated VPCs. Using simulations and a vegetation dataset, we demonstrated associations between different VPC definitions, the importance of the estimation method (by comparing VPCs obtained using Laplace and penalized quasi-likelihood methods), and bivariate dependence between VPCs calculated at different levels. Such an empirical study lends immediate support to wider application of VPCs in scientific data analysis.
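On the latent-variable scale, the most commonly used of Goldstein's VPC definitions treats the level-1 Bernoulli variation as a fixed logistic variance of π²/3. A minimal sketch follows; the random-effect variances are illustrative, not from the vegetation dataset:

```python
import numpy as np

def vpc_latent(level_variances):
    """VPCs on the latent scale for a multilevel logistic model: each
    random-effect variance over the total, where the lowest (Bernoulli)
    level contributes the fixed logistic variance pi**2 / 3."""
    v = np.asarray(level_variances, dtype=float)
    return v / (v.sum() + np.pi**2 / 3)

# Hypothetical random-effect variances for a three-level design
# (e.g. plots within sites); illustrative values only.
vpcs = vpc_latent([0.5, 1.2])
print(np.round(vpcs, 3))
```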
Chen, Chau-Kuang; Bruce, Michelle; Tyler, Lauren; Brown, Claudine; Garrett, Angelica; Goggins, Susan; Lewis-Polite, Brandy; Weriwoh, Mirabel L; Juarez, Paul D.; Hood, Darryl B.; Skelton, Tyler
2014-01-01
The goal of this study was to analyze a 54-item instrument for assessment of perception of exposure to environmental contaminants within the context of the built environment, or exposome. This exposome was defined in five domains: 1) home and hobby, 2) school, 3) community, 4) occupation, and 5) exposure history. Interviews were conducted with child-bearing-age minority women at Metro Nashville General Hospital at Meharry Medical College. Data were analyzed utilizing DTReg software for Support Vector Machine (SVM) modeling, followed by an SPSS package for a logistic regression model. The target (outcome) variable of interest was the respondent's residence by ZIP code. The results demonstrate that the rank order of important variables under SVM modeling versus traditional logistic regression models is almost identical. This is the first study documenting that SVM analysis has discriminant power for the determination of higher-order spatial relationships in an environmental exposure history questionnaire. PMID:23395953
NASA Astrophysics Data System (ADS)
Colkesen, Ismail; Sahin, Emrehan Kutlug; Kavzoglu, Taskin
2016-06-01
Identification of landslide-prone areas and production of accurate landslide susceptibility zonation maps are crucial topics for hazard management studies. Since prediction of susceptibility is one of the main processing steps in landslide susceptibility analysis, selection of a suitable prediction method plays an important role in the success of the susceptibility zonation process. Although simple statistical algorithms (e.g. logistic regression) have been widely used in the literature, the use of advanced non-parametric algorithms in landslide susceptibility zonation has recently become an active research topic. The main purpose of this study is to investigate the possible application of kernel-based Gaussian process regression (GPR) and support vector regression (SVR) to producing a landslide susceptibility map of the Tonya district of Trabzon, Turkey. The results of these two regression methods were compared with the logistic regression (LR) method, regarded here as a benchmark. Results showed that while the kernel-based GPR and SVR methods produced similar accuracies (90.46% and 90.37%, respectively), they outperformed the conventional LR method by about 18%. While confirming the superiority of the GPR method, statistical tests based on ROC statistics and success-rate and prediction-rate curves revealed a significant improvement in susceptibility map accuracy from applying the kernel-based GPR and SVR methods.
NASA Astrophysics Data System (ADS)
Saro, Lee; Woo, Jeon Seong; Kwan-Young, Oh; Moung-Jin, Lee
2016-02-01
The aim of this study is to predict landslide susceptibility using spatial analysis and a GIS-based statistical methodology. Logistic regression and artificial neural network models were applied and validated to analyze landslide susceptibility in Inje, Korea. Landslide occurrence areas were identified from interpretations of optical remote sensing data (aerial photographs) followed by field surveys. A spatial database comprising forest, geophysical, soil and topographic data was built for the study area using a Geographical Information System (GIS). These factors were analysed using artificial neural network (ANN) and logistic regression models to generate a landslide susceptibility map. The study validates the landslide susceptibility maps by comparing them with landslide occurrence areas. The locations of landslide occurrence were divided randomly into a training set (50%) and a test set (50%). The training set was used to build the landslide susceptibility maps with the ANN and logistic regression models, and the test set was retained to validate the prediction maps. The validation results revealed that the artificial neural network model (with an accuracy of 80.10%) was better at predicting landslides than the logistic regression model (with an accuracy of 77.05%). Of the weights used in the artificial neural network model, 'slope' yielded the highest weight value (1.330), and 'aspect' yielded the lowest value (1.000). This research applied two statistical analysis methods in a GIS and compared their results; based on the findings, we derived a more effective method for analyzing landslide susceptibility.
Ren, Yilong; Wang, Yunpeng; Wu, Xinkai; Yu, Guizhen; Ding, Chuan
2016-10-01
Red light running (RLR) has become a major safety concern at signalized intersections. To prevent RLR-related crashes, it is critical to identify the factors that significantly affect drivers' RLR behavior and to predict potential RLR in real time. In this research, nine months of RLR events extracted from high-resolution traffic data collected by loop detectors at three signalized intersections were used to identify the factors that significantly affect RLR behavior. The data analysis indicated that occupancy time, time gap, used yellow time, time left to yellow start, whether the preceding vehicle ran through the intersection during yellow, and whether a vehicle passed through the intersection in the adjacent lane were significant factors for RLR behavior. Furthermore, because RLR events are rare, a modified rare-events logistic regression model was developed for RLR prediction. The rare-events logistic regression method has been applied in many fields and shows impressive performance, but no previous research has applied this method to RLR. The results showed that the rare-events logistic regression model performed significantly better than the standard logistic regression model. More importantly, the proposed RLR prediction method is based purely on loop detector data collected from a single advance loop detector located 400 feet upstream of the stop bar. This offers great potential for future field applications of the proposed method, since loops have been widely implemented at many intersections and can collect data in real time. This research is expected to contribute significantly to the improvement of intersection safety.
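One standard rare-events adjustment (the abstract does not specify which modification was used) is a King-and-Zeng-style prior correction: under case-control-like sampling of rare events, the slope estimates remain consistent but the intercept must be shifted back to the population event rate. The detector feature names and all coefficients below are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 50_000
# Hypothetical loop-detector features (names are illustrative, not the paper's).
occupancy = rng.exponential(0.5, n)
time_gap = rng.exponential(2.0, n)
logit = -6.0 + 2.5 * occupancy - 0.5 * time_gap
rlr = rng.binomial(1, 1 / (1 + np.exp(-logit)))  # rare event
tau = rlr.mean()                                  # population event rate (assumed known)

# Case-control style subsample: keep every RLR event plus 2% of non-events.
keep = (rlr == 1) | (rng.random(n) < 0.02)
Xs = np.column_stack([occupancy, time_gap])[keep]
ys = rlr[keep]

model = LogisticRegression(C=1e6, max_iter=1000).fit(Xs, ys)

# Prior correction: shift the intercept from the sample event rate ybar
# back to the population rate tau; the slopes need no adjustment.
ybar = ys.mean()
b0 = model.intercept_[0] - np.log(((1 - tau) / tau) * (ybar / (1 - ybar)))
print(round(float(b0), 1))  # close to the true intercept of -6.0
```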
Engoren, Milo; Habib, Robert H; Dooner, John J; Schwann, Thomas A
2013-08-01
As many as 14% of patients undergoing coronary artery bypass surgery are readmitted within 30 days. Readmission is usually the result of morbidity and may lead to death. The purpose of this study is to develop and compare statistical and genetic programming models to predict readmission. Patients were divided into separate Construction and Validation populations. Using 88 variables, logistic regression, genetic programs, and artificial neural nets were used to develop predictive models. Models were first constructed and tested on the Construction population, then validated on the Validation population. Areas under the receiver operating characteristic curve (AU ROC) were used to compare the models. Two hundred and two patients (7.6%) in the 2,644-patient Construction group and 216 (8.0%) of the 2,711-patient Validation group were readmitted within 30 days of CABG surgery. Logistic regression predicted readmission with AU ROC = .675 ± .021 in the Construction group. Genetic programs significantly improved the accuracy (AU ROC = .767 ± .001, p < .001). Artificial neural nets were less accurate, with AU ROC = .597 ± .001 in the Construction group. The predictive accuracy of all three techniques fell in the Validation group. However, the accuracy of genetic programming (AU ROC = .654 ± .001) was still slightly, though not statistically significantly, better than that of logistic regression (AU ROC = .644 ± .020, p = .61). Genetic programming and logistic regression provide alternative methods to predict readmission that are similarly accurate.
Teshnizi, Saeed Hosseini; Ayatollahi, Sayyed Mohhamad Taghi
2015-01-01
Background and objective: Artificial Neural Networks (ANNs) have recently been applied in situations where an analysis based on logistic regression (LR) is the standard statistical approach; direct comparisons of the results, however, are seldom attempted. In this study, we compared logistic regression models and feed-forward neural networks on an academic failure data set. Methods: The data for this study comprised 18 questions about the study situation of 275 undergraduate students selected randomly from the nursing and midwifery and paramedic schools of Hormozgan University of Medical Sciences in 2013. Logistic regression with the forward method and a feed-forward Artificial Neural Network with 15 neurons in the hidden layer were fitted to the dataset. The accuracy of the models in predicting academic failure was compared using ROC (Receiver Operating Characteristic) analysis and classification accuracy. Results: Among nine ANNs, the ANN with 15 neurons in the hidden layer performed best and was compared with LR. The Areas Under the Receiver Operating Characteristic curve (AUROC) of the LR model and the ANN with 15 neurons in the hidden layer were estimated as 0.55 and 0.89, respectively; the ANN's AUROC was significantly greater than the LR's. The LR and ANN models correctly classified 77.5% and 84.3% of the students, respectively. Conclusion: Based on this dataset, it seems that classification of the students into groups with and without academic failure using an ANN with 15 neurons in the hidden layer is better than with the LR model. PMID:26635438
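The comparison can be sketched with scikit-learn: a feed-forward network with 15 hidden neurons against plain LR on a synthetic outcome driven by interaction and squared terms, which a linear LR cannot represent. Features and coefficients are invented, and training-set AUROC is used purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
n = 400
X = rng.normal(size=(n, 4))   # hypothetical standardized questionnaire scores
# Non-linear ground truth (interaction plus squared term).
logit = 2.0 * X[:, 0] * X[:, 1] + X[:, 2] ** 2 - 1.0
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

lr = LogisticRegression().fit(X, y)
ann = MLPClassifier(hidden_layer_sizes=(15,), max_iter=3000,
                    random_state=0).fit(X, y)

auc_lr = roc_auc_score(y, lr.predict_proba(X)[:, 1])
auc_ann = roc_auc_score(y, ann.predict_proba(X)[:, 1])
print(round(auc_lr, 2), round(auc_ann, 2))  # the network captures the non-linearity
```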
Hanson, Erin K; Mirza, Mohid; Rekab, Kamel; Ballantyne, Jack
2014-11-01
We report the identification of sensitive and specific miRNA biomarkers for menstrual blood, a tissue that might provide probative information in certain specialized instances. We incorporated these biomarkers into qPCR assays and developed a quantitative statistical model using logistic regression that permits the prediction of menstrual blood in a forensic sample with a high, and measurable, degree of accuracy. Using the developed model, we achieved 100% accuracy in determining the body fluid of interest for a set of test samples (i.e., samples not used in model development). The development and details of the logistic regression model are described. Testing and evaluation of the finalized logistic-regression-modeled assay using a small number of samples were carried out to preliminarily estimate the limit of detection (LOD), specificity in admixed samples, and expression of the menstrual blood miRNA biomarkers throughout the menstrual cycle (25-28 days). The LOD was <1 ng of total RNA, the assay performed as expected with admixed samples, and menstrual blood was identified only during the menses phase of the female reproductive cycle in two donors.
Zhao, Rui-Na; Zhang, Bo; Yang, Xiao; Jiang, Yu-Xin; Lai, Xing-Jian; Zhang, Xiao-Yan
2015-12-01
The purpose of the study described here was to determine specific characteristics of thyroid microcarcinoma (TMC) and explore the value of contrast-enhanced ultrasound (CEUS) combined with conventional ultrasound (US) in the diagnosis of TMC. Characteristics of 63 patients with TMC and 39 with benign sub-centimeter thyroid nodules were retrospectively analyzed. Multivariate logistic regression analysis was performed to determine independent risk factors. Four variables were included in the logistic regression models: age, shape, blood flow distribution and enhancement pattern. The area under the receiver operating characteristic curve was 0.919. With 0.113 selected as the cutoff value, sensitivity, specificity, positive predictive value, negative predictive value and accuracy were 90.5%, 82.1%, 89.1%, 84.2% and 87.3%, respectively. Independent risk factors for TMC determined with the combination of CEUS and conventional US were age, shape, blood flow distribution and enhancement pattern. Age was negatively correlated with malignancy, whereas shape, blood flow distribution and enhancement pattern were positively correlated. The logistic regression model involving CEUS and conventional US was found to be effective in the diagnosis of sub-centimeter thyroid nodules.
NASA Astrophysics Data System (ADS)
Schaeben, Helmut; Semmler, Georg
2016-09-01
The objective of prospectivity modeling is prediction of the conditional probability of the presence (T = 1) or absence (T = 0) of a target T given favorable or prohibitive predictors B, or construction of a two-class {0, 1} classification of T. A special case of logistic regression called weights-of-evidence (WofE) is geologists' favorite method of prospectivity modeling due to its apparent simplicity. However, the numerical simplicity is deceiving, as it rests on the severe mathematical modeling assumption of joint conditional independence of all predictors given the target. General weights of evidence are explicitly introduced which are as simple to estimate as conventional weights, i.e., by counting, but do not require conditional independence. Complementary to the regression view is the classification view of prospectivity modeling. Boosting is the construction of a strong classifier from a set of weak classifiers; from the regression point of view it is closely related to logistic regression. Boost weights-of-evidence (BoostWofE) was introduced into prospectivity modeling to counterbalance violations of the assumption of conditional independence, even though relaxation of modeling assumptions with respect to weak classifiers was not the (initial) purpose of boosting. In the original publication of BoostWofE a fabricated dataset was used to "validate" this approach. Using the same fabricated dataset, it is shown here that BoostWofE cannot generally compensate for lacking conditional independence, whatever the processing order of the predictors. Thus the alleged features of BoostWofE are disproved by way of counterexamples, while the theoretical finding is confirmed that logistic regression including interaction terms can exactly compensate for violations of joint conditional independence if the predictors are indicators.
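The conventional weights the abstract refers to really are estimated by counting: W+ and W- are log ratios of the predictor's conditional probabilities given the target's presence or absence. A minimal sketch on a fabricated 2x2 table:

```python
import numpy as np

def weights_of_evidence(b, t):
    """Conventional WofE weights for a binary predictor b and binary target t,
    estimated by counting: W+ = ln[P(B=1|T=1)/P(B=1|T=0)] and
    W- = ln[P(B=0|T=1)/P(B=0|T=0)]."""
    b, t = np.asarray(b), np.asarray(t)
    p_b1_t1 = b[t == 1].mean()   # P(B=1 | T=1)
    p_b1_t0 = b[t == 0].mean()   # P(B=1 | T=0)
    w_plus = np.log(p_b1_t1 / p_b1_t0)
    w_minus = np.log((1 - p_b1_t1) / (1 - p_b1_t0))
    return w_plus, w_minus

# Tiny fabricated example: the predictor fires twice as often when T = 1.
t = np.array([1] * 100 + [0] * 100)
b = np.array([1] * 60 + [0] * 40 + [1] * 30 + [0] * 70)
w_plus, w_minus = weights_of_evidence(b, t)
print(round(w_plus, 3), round(w_minus, 3))  # positive W+, negative W-
```

Summing such weights over predictors to get a posterior logit is exactly where the joint conditional independence assumption enters.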
NASA Astrophysics Data System (ADS)
Bai, Shibiao; Glade, Thomas; Bell, Rainer; Wang, Jian
2010-05-01
Earthquake-triggered landslides are very common throughout the world. In particular, recent events, e.g. in Pakistan and in China in 2008, have demonstrated that this trigger should not be underestimated. In order to determine the areas most fragile to landsliding in a future similar earthquake, it is important to calculate landslide susceptibility maps for these areas. In this paper, firstly, the earthquake-triggered landslide distribution inventory at Longnan, a case study in China, was built up by field investigation and interpretation of remote sensing imagery (SPOT 5 and ALOS). We then present the approach for the analysis and modeling of the landslide data using rare-events logistic regression. Data include digital orthophoto maps (DOM), digital elevation models (DEM), topographical parameters (e.g. altitude, slope, aspect, profile curvature, plan curvature, sediment transport capacity index, stream power index, topographic wetness index), geological information and further GIS layers including settlements, the road network and rivers. Landslides were identified by monoscopic manual interpretation and validated during the field investigation. The quality of the susceptibility mapping was validated by splitting the study area into a training and a validation set. The prediction capability analysis showed that the landslide susceptibility map could be used for land planning in this region as well as for emergency planning by local authorities. The study area of Longnan is located in southern Gansu province, bordering Shanxi in the east and Sichuan in the south. The major geographic features in Longnan are the Qinba Mountains in the east, the Loess Plateau in the north, and the Tibetan Plateau in the west. It is part of the Central Han basin in the east and the Sichuan basin in the south. The geological environment is in particular determined by regional fault zones. Neotectonic movements are active, and seismic activities are frequent. The length from east to west is
NASA Astrophysics Data System (ADS)
Mǎgut, F. L.; Zaharia, S.; Glade, T.; Irimuş, I. A.
2012-04-01
Various methods exist for analyzing spatial landslide susceptibility and classifying the results into susceptibility classes. The prediction of the spatial landslide distribution can be performed using a variety of methods based on GIS techniques. Two very common methods, a heuristic assessment and a logistic regression model, are employed in this study in order to compare their performance in predicting the spatial distribution of previously mapped landslides for a study area located in Maramureš County, in Northwestern Romania. The first model determines a susceptibility index by combining the heuristic approach with GIS techniques of spatial data analysis. The criteria used for quantifying each susceptibility factor and the expression used to determine the susceptibility index are taken from Romanian legislation (Governmental Decision 447/2003). This procedure is followed in any Romanian state-ordered study which relies on financial support. The logistic regression model predicts the spatial distribution of landslides by statistically estimating regression coefficients which describe the dependency of previously mapped landslides on different factors. The identified shallow landslides correspond generally to Pannonian marl and Quaternary contractile clay deposits. The study region is located in the Northwestern part of Romania, including the Baia Mare municipality, the capital of Maramureš County. The study focuses on the former piedmont region situated to the south of the volcanic Gutâi Mountains, in the Baia Mare Depression, where most of the landslide activity has been recorded. In addition, a narrow sector of the volcanic mountains which borders the city of Baia Mare to the north has also been included, to test the accuracy of the models in different lithologic units. The results of both models indicate a generally medium landslide susceptibility of the study area. The more detailed differences will be discussed with respect to the advantages and
NASA Astrophysics Data System (ADS)
García-Rodríguez, M. J.; Malpica, J. A.; Benito, B.
2009-04-01
In recent years, interest in landslide hazard assessment studies has increased substantially. Such studies are appropriate for evaluation and mitigation plan development in landslide-prone areas. Several techniques are available for landslide hazard research at a regional scale. Generally, they can be classified into two groups: qualitative and quantitative methods. Most qualitative methods tend to be subjective, since they depend on expert opinion and represent hazard levels in descriptive terms. Quantitative methods, on the other hand, are objective and are commonly used because of the correlation between instability factors and the location of landslides. Within this group, statistical approaches and newer heuristic techniques based on artificial intelligence (artificial neural networks (ANN), fuzzy logic, etc.) provide rigorous analysis for assessing landslide hazard over large regions. However, they depend on the qualitative and quantitative data, scale, types of movement and characteristic factors used. We analysed and compared an approach to assessing earthquake-triggered landslide hazard using logistic regression (LR) and artificial neural networks (ANN) with a back-propagation learning algorithm. An application was developed for El Salvador, a country in Central America where earthquake-triggered landslides are a common phenomenon. In a first phase, we analysed the susceptibility and hazard associated with the seismic scenario of the 13 January 2001 earthquake. We calibrated the models using data from the landslide inventory for this scenario. These analyses require input variables representing the physical parameters that contribute to the initiation of slope instability, for example slope gradient, elevation, aspect, mean annual precipitation, lithology, land use, and terrain roughness, while the occurrence or non-occurrence of landslides is the dependent variable. The results of the landslide susceptibility analysis are checked using landslide
Smith, Matthew I.; de Lusignan, Simon; Mullett, David; Correa, Ana; Tickner, Jermaine; Jones, Simon
2016-01-01
Introduction Falls are the leading cause of injury in older people. Reducing falls could reduce financial pressures on health services. We carried out this research to develop a falls risk model, using routine primary care and hospital data to identify those at risk of falls, and applied a cost analysis to enable commissioners of health services to identify those in whom savings can be made through referral to a falls prevention service. Methods Multilevel logistic regression was performed on routinely collected general practice and hospital data from 74,751 people aged over 65, to produce a risk model for falls. Validation measures were carried out. A cost analysis was performed to identify the level of risk at which it would be cost-effective to refer patients to a falls prevention service. 95% confidence intervals were calculated using a Monte Carlo model, allowing us to adjust for uncertainty in the estimates of these variables. Results A risk model for falls was produced with an area under the receiver operating characteristic curve of 0.87. The risk cut-off with the highest combination of sensitivity and specificity was at p = 0.07 (sensitivity of 81% and specificity of 78%). The risk cut-off at which savings outweigh costs was p = 0.27, and the risk cut-off with the maximum savings was p = 0.53, which would result in referral of 1.8% and 0.45% of the over-65 population respectively. Above a risk cut-off of p = 0.27, costs do not exceed savings. Conclusions This model is the best-performing falls prediction tool developed to date; it has been developed on a large UK city population; can be readily run from routine data; and can be implemented in a way that optimises the use of health service resources. Commissioners of health services should use this model to flag and refer at-risk patients to their falls service and save resources. PMID:27448280
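The cut-off selection described above can be sketched numerically. Below is a minimal pure-NumPy illustration (not the paper's code) of computing the ROC AUC and the probability cut-off that maximises sensitivity plus specificity (Youden's J):

```python
import numpy as np

def roc_auc_and_best_cutoff(y, p):
    """ROC AUC plus the probability cut-off maximising
    sensitivity + specificity - 1 (Youden's J)."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    pos, neg = y.sum(), (1.0 - y).sum()
    best_j, best_t = -1.0, None
    for t in np.unique(p)[::-1]:          # candidate thresholds, descending
        pred = (p >= t).astype(float)
        sens = (pred * y).sum() / pos                 # true positive rate
        spec = ((1.0 - pred) * (1.0 - y)).sum() / neg # true negative rate
        if sens + spec - 1.0 > best_j:
            best_j, best_t = sens + spec - 1.0, t
    # AUC via the rank (Mann-Whitney) formulation
    order = p.argsort()
    ranks = np.empty(len(p))
    ranks[order] = np.arange(1, len(p) + 1)
    auc = (ranks[y == 1].sum() - pos * (pos + 1) / 2) / (pos * neg)
    return auc, best_t
```

On the classic four-point example `y = [0, 0, 1, 1]`, `p = [0.1, 0.4, 0.35, 0.8]`, this returns an AUC of 0.75.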
Enticott, Joanne C; Cheng, I-Hao; Russell, Grant; Szwarc, Josef; Braitberg, George; Peek, Anne; Meadows, Graham
2015-01-01
This study investigated whether people born in refugee source countries are disproportionately represented among those receiving a diagnosis of mental illness within emergency departments (EDs). The setting was the Cities of Greater Dandenong and Casey, the resettlement region for one-twelfth of Australia's refugees. An epidemiological, secondary data analysis compared mental illness diagnoses received in EDs by refugee and non-refugee populations. The data were from the Victorian Emergency Minimum Dataset for the 2008-09 financial year. Univariate and multivariate logistic regression created predictive models for mental illness using five variables: age, sex, refugee background, interpreter use and preferred language. Collinearity, model fit and model stability were examined. Multivariate analysis showed age and sex to be the only significant risk factors for mental illness diagnosis in EDs. 'Refugee status', 'interpreter use' and 'preferred language' were not associated with a mental health diagnosis following risk adjustment for the effects of age and sex. The disappearance of the univariate association after adjustment for age and sex is a salutary lesson for Medicare Locals and other health planners regarding the importance of adjusting analyses of health service data for demographic characteristics. PMID:24922047
NASA Technical Reports Server (NTRS)
Smith, Kelly M.; Gay, Robert S.; Stachowiak, Susan J.
2013-01-01
In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.
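The workflow the abstract describes, fit a logistic classifier on the ground pre-flight, then evaluate it in real time as a deploy trigger, can be sketched as follows. The altitude/velocity data and the "deploy OK" rule are entirely hypothetical stand-ins (none of the numbers come from EFT-1), and the fit is a hand-rolled gradient ascent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-flight Monte Carlo samples: navigated altitude [m]
# and planet-relative velocity [m/s].  The "deploy OK" label below is a
# made-up linear criterion, not an actual EFT-1 requirement.
n = 2000
alt = rng.uniform(4000.0, 12000.0, n)
vel = rng.uniform(100.0, 250.0, n)
y = (alt + 30.0 * vel < 13000.0).astype(float)

# Ground-processor step: fit a logistic classifier by gradient ascent
# on standardised features.
X = np.column_stack([np.ones(n),
                     (alt - alt.mean()) / alt.std(),
                     (vel - vel.mean()) / vel.std()])
w = np.zeros(3)
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w += 0.1 * X.T @ (y - p) / n

def deploy_trigger(alt_est, vel_est, threshold=0.5):
    """Flight-software step: fire when P(deploy conditions met) > threshold."""
    x = np.array([1.0,
                  (alt_est - alt.mean()) / alt.std(),
                  (vel_est - vel.mean()) / vel.std()])
    return 1.0 / (1.0 + np.exp(-x @ w)) > threshold
```

The classifier is trained once offline; at run time the trigger is a single dot product and a sigmoid, which is why this style of statistically derived trigger is cheap to execute in flight software.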
NASA Astrophysics Data System (ADS)
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
2015-11-01
The aim of this work is to define reliable susceptibility models for shallow landslides using Logistic Regression and Random Forests multivariate statistical techniques. The study area, located in North-East Sicily, was hit on 1 October 2009 by a severe rainstorm (225 mm of cumulative rainfall in 7 h) which caused flash floods and more than 1000 landslides. Several small villages, such as Giampilieri, were hit, with 31 fatalities, 6 missing persons and damage to buildings and transportation infrastructure. Landslides, mainly earth and debris translational slides evolving into debris flows, were triggered on steep slopes and involved colluvium and regolith materials covering the underlying metamorphic bedrock. The work was carried out in the following steps: i) realization of a detailed event landslide inventory map through field surveys coupled with observation of high-resolution aerial colour orthophotos; ii) identification of landslide source areas; iii) preparation of data on landslide controlling factors and descriptive statistics based on a bivariate method (Frequency Ratio) to get an initial overview of the relationships between causative factors and shallow landslide source areas; iv) choice of criteria for the selection and sizing of the mapping unit; v) implementation of 5 multivariate statistical susceptibility models based on Logistic Regression and Random Forests techniques and focused on landslide source areas; vi) evaluation of the influence of sample size and type of sampling on the results and performance of the models; vii) evaluation of the predictive capabilities of the models using ROC curves, AUC and contingency tables; viii) comparison of model results and the obtained susceptibility maps; and ix) analysis of the temporal variation of landslide susceptibility related to input parameter changes. Models based on Logistic Regression and Random Forests have demonstrated excellent predictive capabilities. Land use and wildfire
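The bivariate Frequency Ratio used in step iii) is a simple class-wise statistic: the share of landslide cells in a factor class divided by the share of all cells in that class. A minimal sketch on a toy raster (the arrays are illustrative, not the Sicilian data):

```python
import numpy as np

def frequency_ratio(factor_classes, landslide_mask):
    """Bivariate Frequency Ratio per class of a causative factor:
    FR = (share of landslide cells in the class) /
         (share of all cells in the class).
    FR > 1 marks classes over-represented among landslide cells."""
    total = factor_classes.size
    slides = landslide_mask.sum()
    fr = {}
    for c in np.unique(factor_classes):
        in_c = factor_classes == c
        fr[c] = (landslide_mask[in_c].sum() / slides) / (in_c.sum() / total)
    return fr
```

For example, if class 1 holds half the cells but three-quarters of the landslide cells, its FR is 1.5, flagging it as a landslide-prone class.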
On causal interpretation of race in regressions adjusting for confounding and mediating variables
VanderWeele, Tyler J.; Robinson, Whitney R.
2014-01-01
We consider several possible interpretations of the “effect of race” when regressions are run with race as an exposure variable, controlling also for various confounding and mediating variables. When adjustment is made for socioeconomic status early in a person’s life, we discuss under what contexts the regression coefficients for race can be interpreted as corresponding to the extent to which a racial inequality would remain if various socioeconomic distributions early in life across racial groups could be equalized. When adjustment is also made for adult socioeconomic status, we note how the overall racial inequality can be decomposed into the portion that would be eliminated by equalizing adult socioeconomic status across racial groups and the portion of the inequality that would remain even if adult socioeconomic status across racial groups were equalized. We also discuss a stronger interpretation of the “effect of race” (stronger in terms of assumptions) involving the joint effects of race-associated physical phenotype (e.g. skin color), parental physical phenotype, genetic background and cultural context when such variables are thought to be hypothetically manipulable and if adequate control for confounding were possible. We discuss some of the challenges with such an interpretation. Further discussion is given as to how the use of selected populations in examining racial disparities can additionally complicate the interpretation of the effects. PMID:24887159
NASA Astrophysics Data System (ADS)
Margarint, M. C.; Grozavu, A.; Patriche, C. V.
2012-04-01
Landslides represent a significant natural hazard in the hilly areas of Romania, causing important damage. Scientific interest in landslide susceptibility mapping is quite recent and is now standardized through legislation. However, the methodology needs improvement in order for susceptibility maps to constitute a sound basis for territorial planning. Logistic regression is one of the main statistical methods used for assessing terrain susceptibility to landsliding. The scientific literature reports different weightings of the landslide causal factors, but with large variations. This study aims to identify the range of variation of landslide causal factors for different regions in Romania. The following factors were taken into consideration: slope angle, terrain altitude, terrain curvature (mean, plan and profile), soil type, lithologic class, land use, distance from the drainage network and roads, and mean annual precipitation. Four square perimeters of 15x15 km were chosen from regions representative in terms of the spatial extent of landslides: two situated in the central-northern part of the Moldavian Plateau, one in the Transylvania Depression and one in the Moldavian Subcarpathians. The logistic regression was applied separately for the four sectors. In order to monitor differences in the final results, numerous attempts were made, starting from landslide polygons acquired from both the topographic maps at scale 1:25.000 (1984-1985 edition) and orthophotoimages (2005-2006). The other elements were acquired from cartographic materials at appropriate scales, according to international methodology. The data integration was accomplished in the georeferenced environment provided by the TNTMips 6.9, ArcGIS 9.3 and SAGA 2.0.8 software packages, while the statistical analysis was performed using Excel 2003 and the XLSTAT 2010 trial version. Maps of all landslide causal factors were produced for each perimeter. The logistic
NASA Astrophysics Data System (ADS)
Pradhan, Biswajeet
2010-05-01
This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis in the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map land cover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, land cover, rainfall precipitation, and normalized difference vegetation index (NDVI), were extracted from the spatial database, and the logistic regression coefficient of each factor was computed. The landslide hazard was then analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also from the coefficients calculated for each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were compared with the field-verified landslide locations. Among the three cases applying the logistic regression coefficients to the same study area, Selangor based on the Selangor coefficients showed the highest accuracy (94%), whereas Penang based on the Penang coefficients showed the lowest (86%). Similarly, among the six cases of cross-application of logistic regression coefficients to the other two areas, Selangor based on the Cameron coefficients showed the highest prediction accuracy (90%), whereas Penang based on the Selangor coefficients showed the lowest (79%). Qualitatively, the cross
Ulfenborg, Benjamin; Klinga-Levan, Karin; Olsson, Björn
2015-01-01
In silico prediction of novel miRNAs from genomic sequences remains a challenging problem. This study presents a genome-wide miRNA discovery software package called GenoScan and evaluates two hairpin classification methods. These methods, one ensemble-based and one using logistic regression, were benchmarked along with 15 published methods. In addition, the sequence-folding step is addressed by investigating the impact of secondary structure prediction methods and of the choice of input sequence length on prediction performance. Both the accuracy of the secondary structure predictions and that of the miRNA predictions are evaluated. In the benchmark of hairpin classification methods, the regression model achieved the highest classification accuracy. Of the structure prediction methods evaluated, ContextFold achieved the highest agreement between predicted and experimentally determined structures. However, both the choice of secondary structure prediction method and the input sequence length had limited impact on hairpin classification performance.
Perez, Ivan; Chavez, Allison K.; Ponce, Dario
2016-01-01
Background: The Ricketts' posteroanterior (PA) cephalometry seems to be the most widely used, and it has not been tested by multivariate statistics for sex determination. Objective: The objective was to determine the applicability of Ricketts' PA cephalometry for sex determination using logistic regression analysis. Materials and Methods: Logistic models were estimated at distinct age cut-offs (all ages, 11 years, 13 years, and 15 years) in a database of 1,296 Hispano-American Peruvians between 5 and 44 years of age. Results: The logistic models were composed of six cephalometric measurements; the accuracy achieved by resubstitution varied between 60% and 70%, and all the variables, with one exception, exhibited a direct relationship with the probability of being classified as male; the nasal width exhibited an indirect relationship. Conclusion: The maxillary and facial widths were present in all models and may represent an indicator of sexual dimorphism. The accuracy found was lower than that reported in the literature, and Ricketts' PA cephalometry may not be adequate for sex determination. The indirect relationship of the nasal width in models with data from patients 12 years of age or younger may be a trait related to age or a characteristic of the studied population, which should be further studied and confirmed. PMID:27555732
2013-01-01
Background Genome-wide association studies have become very popular for identifying genetic contributions to phenotypes. Millions of SNPs are being tested for their association with diseases and traits using linear or logistic regression models. This conceptually simple strategy encounters the following computational issues: a large number of tests and very large genotype files (many gigabytes) which cannot be loaded directly into the software's memory. One of the solutions applied on a grand scale is cluster computing involving large-scale resources. We show how to speed up the computations using matrix operations in pure R code. Results We improve speed: computation time is reduced from 6 hours to 10-15 minutes. Our approach can efficiently handle an essentially unlimited number of covariates, using projections. Data files in GWAS are vast, and reading them into computer memory becomes an important issue. However, much improvement can be made if the data are structured beforehand in a way that allows easy access to blocks of SNPs. We propose several solutions based on the R packages ff and ncdf. We adapted the semi-parallel computations for logistic regression. We show that in a typical GWAS setting, where SNP effects are very small, we do not lose any precision and our computations are a few hundred times faster than standard procedures. Conclusions We provide very fast algorithms for GWAS written in pure R code. We also show how to rearrange SNP data for fast access. PMID:23711206
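The "semi-parallel" idea, fit the null model once and then test every SNP with matrix operations instead of refitting a regression per SNP, can be sketched with a vectorised score test. This is a NumPy illustration on simulated null data, not the authors' R code:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 500, 1000                                             # samples, SNPs
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # covariates
G = rng.integers(0, 3, size=(n, m)).astype(float)            # genotype dosages
y = rng.integers(0, 2, size=n).astype(float)                 # binary phenotype

# Fit the null logistic model y ~ X once, by Newton/IRLS steps.
b = np.zeros(X.shape[1])
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ b))
    W = p * (1.0 - p)
    b += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
p = 1.0 / (1.0 + np.exp(-X @ b))
W = p * (1.0 - p)

# Score test for all m SNPs at once -- no per-SNP model fitting.
U = G.T @ (y - p)                                  # score per SNP
GtWX = G.T @ (W[:, None] * X)
XtWX_inv = np.linalg.inv(X.T @ (W[:, None] * X))
V = (W[:, None] * G * G).sum(axis=0) \
    - np.einsum('ij,jk,ik->i', GtWX, XtWX_inv, GtWX)
chi2 = U * U / V                                   # 1-df chi-square statistics
```

Each statistic costs only a few matrix products shared across all SNPs, which is the source of the large speed-ups the abstract reports; with null data the statistics should average close to 1, as expected for a 1-df chi-square.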
Battaglin, William A.; Ulery, Randy L.; Winterstein, Thomas; Welborn, Toby
2003-01-01
In the State of Texas, surface water (streams, canals, and reservoirs) and ground water are used as sources of public water supply. Surface-water sources of public water supply are susceptible to contamination from point and nonpoint sources. To help protect sources of drinking water and to aid water managers in designing protective yet cost-effective and risk-mitigated monitoring strategies, the Texas Commission on Environmental Quality and the U.S. Geological Survey developed procedures to assess the susceptibility of public water-supply source waters in Texas to the occurrence of 227 contaminants. One component of the assessments is the determination of susceptibility of surface-water sources to nonpoint-source contamination. To accomplish this, water-quality data at 323 monitoring sites were matched with geographic information system-derived watershed-characteristic data for the watersheds upstream from the sites. Logistic regression models then were developed to estimate the probability that a particular contaminant will exceed a threshold concentration specified by the Texas Commission on Environmental Quality. Logistic regression models were developed for 63 of the 227 contaminants. Of the remaining contaminants, 106 were not modeled because monitoring data were available at less than 10 percent of the monitoring sites; 29 were not modeled because there were less than 15 percent detections of the contaminant in the monitoring data; 27 were not modeled because of the lack of any monitoring data; and 2 were not modeled because threshold values were not specified.
A general equation to obtain multiple cut-off scores on a test from multinomial logistic regression.
Bersabé, Rosa; Rivas, Teresa
2010-05-01
The authors derive a general equation to compute multiple cut-offs on a total test score in order to classify individuals into more than two ordinal categories. The equation is derived from the multinomial logistic regression (MLR) model, which is an extension of the binary logistic regression (BLR) model to accommodate polytomous outcome variables. From this analytical procedure, cut-off scores are established at the test score (the predictor variable) at which an individual is as likely to be in category j as in category j+1 of an ordinal outcome variable. The application of the complete procedure is illustrated by an example with data from an actual study on eating disorders. In this example, two cut-off scores on the Eating Attitudes Test (EAT-26) scores are obtained in order to classify individuals into three ordinal categories: asymptomatic, symptomatic and eating disorder. Diagnoses were made from the responses to a self-report (Q-EDD) that operationalises DSM-IV criteria for eating disorders. Alternatives to the MLR model to set multiple cut-off scores are discussed.
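Under the MLR model with a single predictor, the cut-off between two categories is the score at which their linear predictors are equal, so it has a closed form. A small sketch with made-up coefficients (the intercepts and slopes below are illustrative, not the EAT-26 estimates):

```python
import numpy as np

# Illustrative MLR coefficients for three ordered categories
# (category 0, e.g. asymptomatic, is the reference, so its
# parameters are zero).  The numbers are invented for the sketch.
b0 = np.array([0.0, -4.0, -11.0])    # intercepts
b1 = np.array([0.0, 0.25, 0.55])     # slopes on the total test score

def cutoff(j, k):
    """Score x at which P(Y=j | x) = P(Y=k | x): solve
    b0[j] + b1[j]*x = b0[k] + b1[k]*x for x."""
    return (b0[j] - b0[k]) / (b1[k] - b1[j])

def probs(x):
    """Softmax class probabilities at score x."""
    z = b0 + b1 * x
    e = np.exp(z - z.max())
    return e / e.sum()

c01 = cutoff(0, 1)    # boundary between categories 0 and 1 (here 16.0)
c12 = cutoff(1, 2)    # boundary between categories 1 and 2
```

At each cut-off the two adjacent categories are exactly equally likely, which is the defining property the authors use to place multiple cut-off scores on one test.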
Lopez, Michael J; Gutman, Roee
2014-11-28
Propensity score methods are common for estimating a binary treatment effect when treatment assignment is not randomized. When exposure is measured on an ordinal scale (i.e. low-medium-high), however, propensity score inference requires extensions which have received limited attention. Estimands of possible interest with an ordinal exposure are the average treatment effects between each pair of exposure levels. Using these estimands, it is possible to determine an optimal exposure level. Traditional methods, including dichotomization of the exposure or a series of binary propensity score comparisons across exposure pairs, are generally inadequate for identification of optimal levels. We combine subclassification with regression adjustment to estimate transitive, unbiased average causal effects across an ordered exposure, and apply our method on the 2005-2006 National Health and Nutrition Examination Survey to estimate the effects of nutritional label use on body mass index.
Lidauer, M H; Emmerling, R; Mäntysaari, E A
2008-06-01
A multiplicative random regression (M-RRM) test-day (TD) model was used to analyse daily milk yields from all available parities of German and Austrian Simmental dairy cattle. The method to account for heterogeneous variance (HV) was based on the multiplicative mixed model approach of Meuwissen. The variance model for the heterogeneity parameters included a fixed region x year x month x parity effect and a random herd x test-month effect with a within-herd first-order autocorrelation between test-months. Accelerating the variance model solutions after each multiplicative model cycle enabled fast convergence of the adjustment factors and significantly reduced total computing time. Maximum likelihood estimation of within-strata residual variances was enhanced by including approximate information on the loss in degrees of freedom due to estimation of location parameters. This improved heterogeneity estimates for very small herds. The multiplicative model was compared with a model that assumed homogeneous variance. Re-estimated genetic variances, based on Mendelian sampling deviations, were homogeneous for the M-RRM TD model but heterogeneous for the homogeneous random regression TD model. Accounting for HV had a large effect on cow ranking but only a moderate effect on bull ranking.
NASA Astrophysics Data System (ADS)
Ariffin, Syaiba Balqish; Midi, Habshah; Arasan, Jayanthi; Rana, Md Sohel
2015-02-01
This article is concerned with the performance of the maximum estimated likelihood estimator in the presence of separation in the space of the independent variables and of high leverage points. The maximum likelihood estimator suffers from the problem of non-overlapping cases in the covariates, where the regression coefficients are not identifiable and the maximum likelihood estimate does not exist. Consequently, the iteration scheme fails to converge and gives faulty results. To remedy this problem, the maximum estimated likelihood estimator has been put forward; it is resistant to separation, and the estimates always exist. The effect of high leverage points on the performance of the maximum estimated likelihood estimator is then investigated through real data sets and a Monte Carlo simulation study. The findings signify that the maximum estimated likelihood estimator fails to provide better parameter estimates in the presence of both separation and high leverage points.
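The non-existence of the MLE under complete separation is easy to demonstrate numerically: the unpenalised slope drifts without bound, while a penalised fit (used here as a simple stand-in for an estimator that "always exists", not the authors' specific method) settles at a finite value. A toy sketch:

```python
import numpy as np

# Perfectly separated one-predictor data: x < 0 -> y = 0, x > 0 -> y = 1.
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

def fit_slope(ridge, steps=20000, lr=0.1):
    """One-parameter logistic fit by gradient ascent; `ridge` adds an
    L2 penalty, standing in for a penalised estimator that always exists."""
    w = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-w * x))
        w += lr * ((y - p) @ x - ridge * w)
    return w

w_mle = fit_slope(ridge=0.0)   # keeps drifting upward: no finite MLE
w_pen = fit_slope(ridge=1.0)   # converges to a finite, stable estimate
```

With separation the log-likelihood increases monotonically in the slope, so any iterative scheme either diverges or stops at an arbitrary large value, which is exactly the convergence failure the abstract describes.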
ERIC Educational Resources Information Center
Osborne, Jason W.
2012-01-01
Logistic regression is slowly gaining acceptance in the social sciences, and fills an important niche in the researcher's toolkit: being able to predict important outcomes that are not continuous in nature. While OLS regression is a valuable tool, it cannot routinely be used to predict outcomes that are binary or categorical in nature. These…
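The niche described above can be shown in a few lines: an OLS fit to a binary outcome (a linear probability model) produces predictions outside [0, 1], while logistic regression does not. Synthetic data and hand-rolled fits, purely as a sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-4.0, 4.0, 200)
# Binary outcome generated from a true logistic relationship.
y = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-2.0 * x))).astype(float)
X = np.column_stack([np.ones_like(x), x])

# OLS on the binary outcome (linear probability model), closed form.
beta = np.linalg.lstsq(X, y, rcond=None)[0]
ols_pred = X @ beta                    # can stray outside [0, 1]

# Logistic regression by gradient ascent: predictions stay in (0, 1).
w = np.zeros(2)
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w += 0.05 * X.T @ (y - p) / len(y)
logit_pred = 1.0 / (1.0 + np.exp(-X @ w))
```

The out-of-range OLS predictions are not just cosmetic: they cannot be interpreted as probabilities, which is the basic reason logistic regression fills this niche.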
2013-01-01
Methods for analysis of network dynamics have seen great progress in the past decade. This article shows how Dynamic Network Logistic Regression techniques (a special case of the Temporal Exponential Random Graph Models) can be used to implement decision theoretic models for network dynamics in a panel data context. We also provide practical heuristics for model building and assessment. We illustrate the power of these techniques by applying them to a dynamic blog network sampled during the 2004 US presidential election cycle. This is a particularly interesting case because it marks the debut of Internet-based media such as blogs and social networking web sites as institutionally recognized features of the American political landscape. Using a longitudinal sample of all Democratic National Convention/Republican National Convention–designated blog citation networks, we are able to test the influence of various strategic, institutional, and balance-theoretic mechanisms as well as exogenous factors such as seasonality and political events on the propensity of blogs to cite one another over time. Using a combination of deviance-based model selection criteria and simulation-based model adequacy tests, we identify the combination of processes that best characterizes the choice behavior of the contending blogs. PMID:24143060
Raevsky, O A; Polianczyk, D E; Mukhametov, A; Grigorev, V Y
2016-08-01
Assessment of the "CNS drugs/CNS candidates" classification ability of the multi-parametric optimization (CNS MPO) approach was performed by logistic regression. It was found that five of the six separately used physical-chemical properties (topological polar surface area, number of hydrogen-bonded donor atoms, basicity, lipophilicity of the compound in neutral form and at pH = 7.4) provided recognition accuracy below 60%. Only the molecular weight (MW) descriptor could correctly classify two-thirds of the studied compounds. Aggregation of all six properties into the MPOscore did not improve the classification, which was worse than the classification using only MW. The results of our study demonstrate the imperfection of the CNS MPO approach; in its current form it is not very useful for the computer design of new, effective CNS drugs. PMID:27477321
NASA Astrophysics Data System (ADS)
Notario del Pino, Jesús S.; Ruiz-Gallardo, José-Reyes
2015-03-01
Treatments that minimize soil erosion after large wildfires depend, among other factors, on fire severity and landscape configuration so that, in practice, most of them are applied according to emergency criteria. Therefore, simple tools to predict soil erosion risk help to decide where the available resources should be used first. In this study, a predictive model for soil erosion degree, based on ordinal logistic regression, has been developed and evaluated using data from three large forest fires in South-eastern Spain. The field data were successfully fit to the model in 60% of cases after 50 runs (i.e., agreement between observed and predicted soil erosion degrees), using slope steepness, slope aspect, and fire severity as predictors. North-facing slopes were shown to be less prone to soil erosion than the rest.
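The ordinal logistic (cumulative-logit) model used above can be sketched as follows. The thresholds and coefficients here are hypothetical placeholders, not the fitted values from the study; the point is only how a proportional-odds model turns a linear predictor into probabilities over ordered erosion classes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ordinal_probs(eta, cuts):
    """Category probabilities under a cumulative-logit (proportional-odds) model:
    P(Y <= k) = sigmoid(cut_k - eta) for the K-1 interior thresholds."""
    cum = sigmoid(np.asarray(cuts, float) - eta)
    cum = np.concatenate([[0.0], cum, [1.0]])
    return np.diff(cum)          # P(Y = k) for each of the K ordered classes

# Hypothetical thresholds (4 erosion classes) and coefficients, for illustration only.
cuts = np.array([-1.0, 0.5, 2.0])

def erosion_class_probs(slope_pct, north_facing, severity):
    eta = 0.04 * slope_pct - 0.6 * north_facing + 0.9 * severity
    return ordinal_probs(eta, cuts)
```

Raising the linear predictor (steeper, non-north-facing, higher severity) shifts probability mass toward the higher erosion classes, which matches the qualitative finding about north-facing slopes.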
Duchateau, L; Kruska, R L; Perry, B D
1997-10-01
Large databases with multiple variables, selected because they are available and might provide an insight into establishing causal relationships, are often difficult to analyse and interpret because of multicollinearity. The objective of this study was to reduce the dimensionality of a multivariable spatial database of Zimbabwe, containing many environmental variables that were collected to predict the distribution of outbreaks of theileriosis (the tick-borne infection of cattle caused by Theileria parva and transmitted by the brown ear tick). Principal-component analysis and varimax rotation of the principal components were first used to select a reduced number of variables. The logistic-regression model was evaluated by appropriate goodness-of-fit tests.
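The two-stage strategy above (principal components to defeat multicollinearity, then logistic regression) can be sketched in a few lines. The data here are simulated, and the plain gradient-ascent fit is a stand-in for whatever software the authors used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated multicollinear design: ten predictors driven by two latent factors.
n = 300
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(n, 10))
y = (latent[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(float)

# Step 1: principal components of the standardized design, via SVD.
Xs = (X - X.mean(0)) / X.std(0)
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)   # s**2 gives variance explained
scores = Xs @ Vt[:2].T          # keep the first two components
scores /= scores.std(0)         # rescale for a stable fit

# Step 2: logistic regression on the retained component scores.
Z = np.column_stack([np.ones(n), scores])
beta = np.zeros(3)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-Z @ beta))
    beta += 0.005 * Z.T @ (y - p)   # gradient ascent on the log-likelihood

accuracy = np.mean(((Z @ beta) > 0) == y)
```

Because the ten raw predictors are nearly collinear, regressing on two component scores gives stable coefficients where the full model would not.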
Jostins, Luke; McVean, Gilean
2016-01-01
Motivation: For many classes of disease the same genetic risk variants underlie many related phenotypes or disease subtypes. Multinomial logistic regression provides an attractive framework to analyze multi-category phenotypes, and explore the genetic relationships between these phenotype categories. We introduce Trinculo, a program that implements a wide range of multinomial analyses in a single fast package that is designed to be easy to use by users of standard genome-wide association study software. Availability and implementation: An open source C implementation, with code and binaries for Linux and Mac OSX, is available for download at http://sourceforge.net/projects/trinculo Supplementary information: Supplementary data are available at Bioinformatics online. Contact: lj4@well.ox.ac.uk PMID:26873930
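Trinculo itself is a C program; purely to illustrate the underlying model, here is a minimal numpy sketch of multinomial (softmax) logistic regression with one reference category, fitted by gradient ascent on simulated data. Nothing here is Trinculo's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 300, 3

# Toy design (intercept + one covariate) and a 3-category outcome.
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_B = np.array([[0.0, 0.0], [0.5, 1.5], [-0.5, -1.5]]).T  # class 0 = reference
P = np.exp(X @ true_B)
P /= P.sum(1, keepdims=True)
y = np.array([rng.choice(k, p=p) for p in P])

# Fit by gradient ascent on the multinomial log-likelihood.
B = np.zeros((2, k))
Y = np.eye(k)[y]                      # one-hot outcomes
for _ in range(1000):
    Pr = np.exp(X @ B)
    Pr /= Pr.sum(1, keepdims=True)    # softmax probabilities
    G = X.T @ (Y - Pr)                # score (gradient) for all classes
    B[:, 1:] += 0.01 * G[:, 1:]       # reference class stays fixed at zero
```

The recovered slope signs match the generating model: positive for class 1, negative for class 2, with class 0 as the baseline.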
Bent, Gardner C.; Steeves, Peter A.
2006-01-01
A revised logistic regression equation and an automated procedure were developed for mapping the probability of a stream flowing perennially in Massachusetts. The equation provides city and town conservation commissions and the Massachusetts Department of Environmental Protection a method for assessing whether streams are intermittent or perennial at a specific site in Massachusetts by estimating the probability of a stream flowing perennially at that site. This information could assist the environmental agencies that administer the Commonwealth of Massachusetts Rivers Protection Act of 1996, which establishes a 200-foot-wide protected riverfront area extending from the mean annual high-water line along each side of a perennial stream, with exceptions for some urban areas. The equation was developed by relating the observed intermittent or perennial status of a stream site to selected basin characteristics of naturally flowing streams (defined as having no regulation by dams, surface-water withdrawals, ground-water withdrawals, diversion, wastewater discharge, and so forth) in Massachusetts. This revised equation differs from the equation developed in a previous U.S. Geological Survey study in that it is solely based on visual observations of the intermittent or perennial status of stream sites across Massachusetts and on the evaluation of several additional basin and land-use characteristics as potential explanatory variables in the logistic regression analysis. The revised equation estimated more accurately the intermittent or perennial status of the observed stream sites than the equation from the previous study. Stream sites used in the analysis were identified as intermittent or perennial based on visual observation during low-flow periods from late July through early September 2001. The database of intermittent and perennial streams included a total of 351 naturally flowing (no regulation) sites, of which 85 were observed to be intermittent and 266 perennial.
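An equation of this kind converts a linear predictor of basin characteristics into a probability through the logistic function. The coefficients and predictor names below are hypothetical placeholders for illustration, not the published USGS equation.

```python
import math

# Hypothetical coefficients, for illustration only -- not the published equation.
b0, b_area, b_slope = -2.0, 1.2, 0.03

def prob_perennial(log_drainage_area_km2, basin_slope_pct):
    """Probability that a stream site flows perennially, from a logistic equation."""
    eta = b0 + b_area * log_drainage_area_km2 + b_slope * basin_slope_pct
    return 1.0 / (1.0 + math.exp(-eta))

p = prob_perennial(log_drainage_area_km2=1.5, basin_slope_pct=10.0)
```

Larger drainage areas raise the linear predictor and hence the estimated probability of perennial flow, which is how such an equation supports site-by-site screening.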
Papritz, A; Reichard, P U
2009-07-01
Soils of allotments are often contaminated by heavy metals and persistent organic pollutants. In particular, lead (Pb) and polycyclic aromatic hydrocarbons (PAHs) frequently exceed legal intervention values (IVs). Allotments are popular in European countries; cities may own and let several thousand allotment plots. Assessing soil contamination for all the plots would be very costly. Soil contamination in allotments is often linked to gardening practice and historic land use. Hence, we predict the risk of IV exceedance from attributes that characterize the history and management of allotment areas (age, nearby presence of pollutant sources, prior land use). Robust logistic regression analyses of data of Swiss allotments demonstrate that the risk of IV exceedance can be predicted quite precisely without costly soil analyses. Thus, the new method allows screening many allotments at small costs, and it helps to deploy the resources available for soil contamination surveying more efficiently.
2015-01-01
This paper considers the problem of estimation and variable selection for large high-dimensional data (high number of predictors p and large sample size N, without excluding the possibility that N < p) resulting from an individually matched case-control study. We develop a simple algorithm for the adaptation of the Lasso and related methods to the conditional logistic regression model. Our proposal relies on the simplification of the calculations involved in the likelihood function. Then, the proposed algorithm iteratively solves reweighted Lasso problems using cyclical coordinate descent, computed along a regularization path. This method can handle large problems and deal with sparse features efficiently. We discuss benefits and drawbacks with respect to the existing available implementations. We also illustrate the interest and use of these techniques on a pharmacoepidemiological study of medication use and traffic safety. PMID:25916593
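The inner step of the algorithm described above, a Lasso problem solved by cyclical coordinate descent with soft-thresholding, can be sketched for the plain least-squares case. The conditional-logistic reweighting is omitted here, so this is a simplified stand-in for the authors' implementation, not a reproduction of it.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator, the closed-form solution of the 1-D Lasso step."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclical coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = y - X @ b
    col_ss = (X ** 2).sum(0) / n
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]               # add coordinate back: partial residual
            zj = X[:, j] @ r / n
            b[j] = soft_threshold(zj, lam) / col_ss[j]
            r -= X[:, j] * b[j]               # update the residual in place
    return b
```

Keeping the residual updated in place is what lets each coordinate pass cost O(n) and makes the method scale to large, sparse problems along a regularization path.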
Bejaei, M; Wiseman, K; Cheng, K M
2015-01-01
Consumers' interest in specialty eggs appears to be growing in Europe and North America. The objective of this research was to develop logistic regression models that utilise purchaser attributes and demographics to predict the probability of a consumer purchasing a specific type of table egg including regular (white and brown), non-caged (free-run, free-range and organic) or nutrient-enhanced eggs. These purchase prediction models, together with the purchasers' attributes, can be used to assess market opportunities of different egg types specifically in British Columbia (BC). An online survey was used to gather data for the models. A total of 702 completed questionnaires were submitted by BC residents. Selected independent variables were included in the logistic regression models developed for each egg type to predict the probability of a consumer purchasing that type of table egg. The variables used in the model accounted for 54% and 49% of variances in the purchase of regular and non-caged eggs, respectively. Research results indicate that consumers of different egg types exhibit a set of unique and statistically significant characteristics and/or demographics. For example, consumers of regular eggs were less educated, older, price sensitive, major chain store buyers, and store flyer users, and had lower awareness about different types of eggs and less concern regarding animal welfare issues. However, most of the non-caged egg consumers were less concerned about price, had higher awareness about different types of table eggs, purchased their eggs from local/organic grocery stores, farm gates or farmers' markets, and they were more concerned about care and feeding of hens compared to consumers of other egg types. PMID:26103791
NASA Astrophysics Data System (ADS)
Espino, Natalia V.
Foreign Object Debris/Damage (FOD) is a costly, high-risk problem that aeronautics manufacturers such as Boeing and Lockheed Martin face on their production lines every day, spending an average of $350,000 per year fixing FOD problems. FOD can put the lives of pilots, passengers and crews at risk. FOD refers to any type of foreign object, particle, debris or agent in the manufacturing environment which could contaminate or damage the product or otherwise undermine quality control standards. FOD can take the form of any of the following categories: panstock, manufacturing debris, tools/shop aids, consumables and trash. Although aeronautics manufacturers have put many prevention plans in place, such as housekeeping and "clean as you go" philosophies, training, and the use of RFID for tooling control, none of them has completely eradicated the problem. This research presents a logistic regression statistical model to predict the probability of each FOD type under specific circumstances such as workstation, month and aircraft/jet being built. FOD Quality Assurance Reports from the last three years were provided by an aeronautical manufacturer for this study. By predicting the type of FOD, customized reduction/elimination plans can be put in place, thereby diminishing the problem. Different aircraft were analyzed, and a separate model was developed for each using the same methodology. The results of the study are predictions of FOD type for each aircraft and workstation throughout the year, obtained by applying the proposed logistic regression models. This research should help aeronautics manufacturers address the FOD problem correctly, identify root causes and establish effective reduction/elimination plans.
NASA Astrophysics Data System (ADS)
Demir, Gökhan; Aytekin, Mustafa; Banu Ikizler, Sabriye; Angın, Zekai
2013-04-01
The North Anatolian Fault is known as one of the most active and destructive fault zones, having produced many earthquakes of high magnitude. Along this fault zone, the morphology and the lithological features are prone to landsliding. Many earthquake-induced landslides have been recorded by several studies along this fault zone, and these landslides caused both injuries and loss of life. Therefore, a detailed landslide susceptibility assessment for this area is indispensable. In this context, this study presents a landslide susceptibility assessment for a 1445 km2 area in the Kelkit River valley, part of the North Anatolian Fault zone (Eastern Black Sea region of Turkey), and its results are summarized here. For this purpose, a geographical information system (GIS) and a bivariate statistical model were used. Initially, landslide inventory maps were prepared using landslide data determined by field surveys and landslide data taken from the General Directorate of Mineral Research and Exploration. The landslide conditioning factors considered are lithology, slope gradient, slope aspect, topographical elevation, distance to streams, distance to roads, distance to faults, drainage density and fault density. The ArcGIS package was used to manipulate and analyze all the collected data. The logistic regression method was applied to create a landslide susceptibility map. The landslide susceptibility map was divided into five susceptibility classes: very low, low, moderate, high and very high. The result of the analysis was verified using the inventoried landslide locations and compared with the produced probability model. For this purpose, the area under the curve (AUC) approach was applied and an AUC value was obtained. Based on this AUC value, the obtained landslide susceptibility map was judged satisfactory. Keywords: North Anatolian Fault Zone, Landslide susceptibility map, Geographical Information Systems, Logistic Regression Analysis.
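The AUC validation step used above can be computed directly from predicted probabilities and observed landslide locations via the rank-sum (Mann-Whitney) identity, as in this minimal sketch (ties in the scores are not handled here).

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    scores: predicted susceptibility values; labels: 1 = landslide, 0 = stable.
    Ties in scores are not averaged in this minimal version.
    """
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, int)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

An AUC of 0.5 means the susceptibility map ranks landslide cells no better than chance; values approaching 1 indicate the map concentrates observed landslides in the high-susceptibility classes.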
Calef, M.P.; McGuire, A.D.; Epstein, H.E.; Rupp, T.S.; Shugart, H.H.
2005-01-01
Aim: To understand drivers of vegetation type distribution and sensitivity to climate change. Location: Interior Alaska. Methods: A logistic regression model was developed that predicts the potential equilibrium distribution of four major vegetation types: tundra, deciduous forest, black spruce forest and white spruce forest based on elevation, aspect, slope, drainage type, fire interval, average growing season temperature and total growing season precipitation. The model was run in three consecutive steps. The hierarchical logistic regression model was used to evaluate how scenarios of changes in temperature, precipitation and fire interval may influence the distribution of the four major vegetation types found in this region. Results: At the first step, tundra was distinguished from forest, which was mostly driven by elevation, precipitation and south to north aspect. At the second step, forest was separated into deciduous and spruce forest, a distinction that was primarily driven by fire interval and elevation. At the third step, the identification of black vs. white spruce was driven mainly by fire interval and elevation. The model was verified for Interior Alaska, the region used to develop the model, where it predicted vegetation distribution among the steps with an accuracy of 60-83%. When the model was independently validated for north-west Canada, it predicted vegetation distribution among the steps with an accuracy of 53-85%. Black spruce remains the dominant vegetation type under all scenarios, potentially expanding most under warming coupled with increasing fire interval. White spruce is clearly limited by moisture once average growing season temperatures exceed a critical limit (+2 °C). Deciduous forests expand their range the most when any two of the following scenarios are combined: decreasing fire interval, warming and increasing precipitation. Tundra can be replaced by forest under warming but expands under precipitation increase.
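A three-step hierarchical model like the one above chains conditional binary logistic models into class probabilities by multiplication. The linear-predictor values below are hypothetical placeholders, not the fitted model; the sketch only shows how the step probabilities combine.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical linear-predictor values for one grid cell, for illustration only.
p_forest = sigmoid(1.2)   # step 1: forest vs. tundra
p_spruce = sigmoid(0.8)   # step 2: spruce vs. deciduous, given forest
p_black  = sigmoid(1.5)   # step 3: black vs. white spruce, given spruce

# Class probabilities are products along each branch of the hierarchy.
probs = {
    "tundra": 1 - p_forest,
    "deciduous": p_forest * (1 - p_spruce),
    "white spruce": p_forest * p_spruce * (1 - p_black),
    "black spruce": p_forest * p_spruce * p_black,
}
```

Because each split conditions on the previous one, the four branch products always sum to one, so the hierarchy yields a proper probability distribution over vegetation types.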
Alishiri, Gholam Hossein; Bayat, Noushin; Fathi Ashtiani, Ali; Tavallaii, Seyed Abbas; Assari, Shervin; Moharamzad, Yashar
2008-01-01
The aim of this work was to develop two logistic regression models capable of predicting physical and mental health related quality of life (HRQOL) among rheumatoid arthritis (RA) patients. In this cross-sectional study which was conducted during 2006 in the outpatient rheumatology clinic of our university hospital, Short Form 36 (SF-36) was used for HRQOL measurements in 411 RA patients. A cutoff point to define poor versus good HRQOL was calculated using the first quartiles of SF-36 physical and mental component scores (33.4 and 36.8, respectively). Two distinct logistic regression models were used to derive predictive variables including demographic, clinical, and psychological factors. The sensitivity, specificity, and accuracy of each model were calculated. Poor physical HRQOL was positively associated with pain score, disease duration, monthly family income below 300 US$, comorbidity, patient global assessment of disease activity or PGA, and depression (odds ratios: 1.1; 1.004; 15.5; 1.1; 1.02; 2.08, respectively). The variables that entered into the poor mental HRQOL prediction model were monthly family income below 300 US$, comorbidity, PGA, and bodily pain (odds ratios: 6.7; 1.1; 1.01; 1.01, respectively). Optimal sensitivity and specificity were achieved at a cutoff point of 0.39 for the estimated probability of poor physical HRQOL and 0.18 for mental HRQOL. Sensitivity, specificity, and accuracy of the physical and mental models were 73.8, 87, 83.7% and 90.38, 70.36, 75.43%, respectively. The results show that the suggested models can be used to predict poor physical and mental HRQOL separately among RA patients using simple variables with acceptable accuracy. These models can be of use in the clinical decision-making of RA patients and to recognize patients with poor physical or mental HRQOL in advance, for better management.
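The cutoff-based evaluation reported above (sensitivity, specificity, accuracy at a chosen probability threshold) reduces to counting the four cells of the confusion matrix, as in this short sketch.

```python
import numpy as np

def classify_metrics(p_hat, y, cutoff):
    """Sensitivity, specificity and accuracy of a probability cutoff."""
    pred = (np.asarray(p_hat) >= cutoff).astype(int)
    y = np.asarray(y)
    tp = np.sum((pred == 1) & (y == 1))
    tn = np.sum((pred == 0) & (y == 0))
    fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(y)
    return sensitivity, specificity, accuracy
```

Sweeping the cutoff trades sensitivity against specificity, which is how an "optimal" threshold such as the 0.39 and 0.18 values above is chosen.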
Lyles, Robert H; Mitchell, Emily M; Weinberg, Clarice R; Umbach, David M; Schisterman, Enrique F
2016-09-01
Potential reductions in laboratory assay costs afforded by pooling equal aliquots of biospecimens have long been recognized in disease surveillance and epidemiological research and, more recently, have motivated design and analytic developments in regression settings. For example, Weinberg and Umbach (1999, Biometrics 55, 718-726) provided methods for fitting set-based logistic regression models to case-control data when a continuous exposure variable (e.g., a biomarker) is assayed on pooled specimens. We focus on improving estimation efficiency by utilizing available subject-specific information at the pool allocation stage. We find that a strategy that we call "(y,c)-pooling," which forms pooling sets of individuals within strata defined jointly by the outcome and other covariates, provides more precise estimation of the risk parameters associated with those covariates than does pooling within strata defined only by the outcome. We review the approach to set-based analysis through offsets developed by Weinberg and Umbach in a recent correction to their original paper. We propose a method for variance estimation under this design and use simulations and a real-data example to illustrate the precision benefits of (y,c)-pooling relative to y-pooling. We also note and illustrate that set-based models permit estimation of covariate interactions with exposure. PMID:26964741
Veazey, Lindsay M; Franklin, Erik C; Kelley, Christopher; Rooney, John; Frazer, L Neil; Toonen, Robert J
2016-01-01
Predictive habitat suitability models are powerful tools for cost-effective, statistically robust assessment of the environmental drivers of species distributions. The aim of this study was to develop predictive habitat suitability models for two genera of scleractinian corals (Leptoseris and Montipora) found within the mesophotic zone across the main Hawaiian Islands. The mesophotic zone (30-180 m) is challenging to reach, and therefore historically understudied, because it falls between the maximum limit of SCUBA divers and the minimum typical working depth of submersible vehicles. Here, we implement a logistic regression with rare events corrections to account for the scarcity of presence observations within the dataset. These corrections reduced the coefficient error and improved overall prediction success (73.6% and 74.3%) for both original regression models. The final models included depth, rugosity, slope, mean current velocity, and wave height as the best environmental covariates for predicting the occurrence of the two genera in the mesophotic zone. Using an objectively selected theta ("presence") threshold, the predicted presence probability values (average of 0.051 for Leptoseris and 0.040 for Montipora) were translated to spatially-explicit habitat suitability maps of the main Hawaiian Islands at 25 m grid cell resolution. Our maps are the first of their kind to use extant presence and absence data to examine the habitat preferences of these two dominant mesophotic coral genera across Hawai'i.
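One widely used rare-events correction for logistic regression is the King-Zeng style prior correction of the intercept, which adjusts for a sample whose presence rate differs from the true prevalence. Whether this is the exact correction used in the study is an assumption; the sketch below only illustrates the idea.

```python
import math

def intercept_prior_correction(b0_hat, sample_rate, population_rate):
    """Prior correction of a fitted logistic intercept (King-Zeng style):
    b0 - ln[((1 - tau)/tau) * (ybar/(1 - ybar))], where tau is the true
    prevalence and ybar the sample presence rate."""
    return b0_hat - math.log(
        ((1 - population_rate) / population_rate)
        * (sample_rate / (1 - sample_rate))
    )
```

When presences are over-represented in the sample relative to the true prevalence, the correction lowers the intercept so that predicted probabilities are no longer inflated; if the rates match, the intercept is unchanged.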
NASA Astrophysics Data System (ADS)
Madhu, B.; Ashok, N. C.; Balasubramanian, S.
2014-11-01
Multinomial logistic regression analysis was used to develop a statistical model that can predict the probability of breast cancer in Southern Karnataka using breast cancer occurrence data from 2007-2011. Independent socio-economic variables describing breast cancer occurrence, such as age, education, occupation, parity, type of family, health insurance coverage, residential locality and socio-economic status, were obtained for each case. The models were developed as follows: i) spatial visualization of the urban-rural distribution of breast cancer cases obtained from the Bharat Hospital and Institute of Oncology; ii) socio-economic risk factors describing the breast cancer occurrences were compiled for each case, these data were then analysed using multinomial logistic regression in SPSS statistical software, relations between the occurrence of breast cancer across socio-economic status and the influence of other socio-economic variables were evaluated, and multinomial logistic regression models were constructed; iii) the model that best predicted the occurrence of breast cancer was identified. This multivariate logistic regression model was then entered into a geographic information system, and maps showing the predicted probability of breast cancer occurrence in Southern Karnataka were created. This study demonstrates that multinomial logistic regression is a valuable tool for developing models that predict the probability of breast cancer occurrence in Southern Karnataka.
NASA Astrophysics Data System (ADS)
Cary, Theodore W.; Cwanger, Alyssa; Venkatesh, Santosh S.; Conant, Emily F.; Sehgal, Chandra M.
2012-03-01
This study compares the performance of two proven but very different machine learners, Naïve Bayes and logistic regression, for differentiating malignant and benign breast masses using ultrasound imaging. Ultrasound images of 266 masses were analyzed quantitatively for shape, echogenicity, margin characteristics, and texture features. These features along with patient age, race, and mammographic BI-RADS category were used to train Naïve Bayes and logistic regression classifiers to diagnose lesions as malignant or benign. ROC analysis was performed using all of the features and using only a subset that maximized information gain. Performance was determined by the area under the ROC curve, Az, obtained from leave-one-out cross validation. Naïve Bayes showed significant variation (Az 0.733 +/- 0.035 to 0.840 +/- 0.029, P < 0.002) with the choice of features, but the performance of logistic regression was relatively unchanged under feature selection (Az 0.839 +/- 0.029 to 0.859 +/- 0.028, P = 0.605). Out of 34 features, a subset of 6 gave the highest information gain: brightness difference, margin sharpness, depth-to-width, mammographic BI-RADs, age, and race. The probabilities of malignancy determined by Naïve Bayes and logistic regression after feature selection showed significant correlation (R2= 0.87, P < 0.0001). The diagnostic performance of Naïve Bayes and logistic regression can be comparable, but logistic regression is more robust. Since probability of malignancy cannot be measured directly, high correlation between the probabilities derived from two basic but dissimilar models increases confidence in the predictive power of machine learning models for characterizing solid breast masses on ultrasound.
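The information-gain feature selection used above can be sketched for discrete features: the gain of a feature is the reduction in class entropy achieved by splitting on it. This is a generic sketch, not the authors' pipeline.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    """Reduction in class entropy from splitting on a discrete feature."""
    gain = entropy(labels)
    for v in np.unique(feature):
        mask = feature == v
        gain -= mask.mean() * entropy(labels[mask])
    return gain
```

Ranking features by this quantity and keeping the top few is one way to arrive at a compact subset like the six features reported above (continuous features would first need discretization).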
Wyss, Richard; Ellis, Alan R; Brookhart, M Alan; Girman, Cynthia J; Jonsson Funk, Michele; LoCasale, Robert; Stürmer, Til
2014-09-15
The covariate-balancing propensity score (CBPS) extends logistic regression to simultaneously optimize covariate balance and treatment prediction. Although the CBPS has been shown to perform well in certain settings, its performance has not been evaluated in settings specific to pharmacoepidemiology and large database research. In this study, we use both simulations and empirical data to compare the performance of the CBPS with logistic regression and boosted classification and regression trees. We simulated various degrees of model misspecification to evaluate the robustness of each propensity score (PS) estimation method. We then applied these methods to compare the effect of initiating glucagonlike peptide-1 agonists versus sulfonylureas on cardiovascular events and all-cause mortality in the US Medicare population in 2007-2009. In simulations, the CBPS was generally more robust in terms of balancing covariates and reducing bias compared with misspecified logistic PS models and boosted classification and regression trees. All PS estimation methods performed similarly in the empirical example. For settings common to pharmacoepidemiology, logistic regression with balance checks to assess model specification is a valid method for PS estimation, but it can require refitting multiple models until covariate balance is achieved. The CBPS is a promising method to improve the robustness of PS models.
Liu, Hongjie; Li, Tianhao; Zhan, Sha; Pan, Meilan; Ma, Zhiguo; Li, Chenghua
2016-01-01
Aims. To establish a logistic regression (LR) prediction model for hepatotoxicity of Chinese herbal medicines (HMs) based on traditional Chinese medicine (TCM) theory and to provide a statistical basis for predicting hepatotoxicity of HMs. Methods. The correlations of hepatotoxic and nonhepatotoxic Chinese HMs with four properties, five flavors, and channel tropism were analyzed with chi-square test for two-way unordered categorical data. LR prediction model was established and the accuracy of the prediction by this model was evaluated. Results. The hepatotoxic and nonhepatotoxic Chinese HMs were related with four properties (p < 0.05), and the coefficient was 0.178 (p < 0.05); also they were related with five flavors (p < 0.05), and the coefficient was 0.145 (p < 0.05); they were not related with channel tropism (p > 0.05). There were totally 12 variables from four properties and five flavors for the LR. Four variables, warm and neutral of the four properties and pungent and salty of five flavors, were selected to establish the LR prediction model, with the cutoff value being 0.204. Conclusions. Warm and neutral of the four properties and pungent and salty of five flavors were the variables to affect the hepatotoxicity. Based on such results, the established LR prediction model had some predictive power for hepatotoxicity of Chinese HMs.
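The chi-square test of independence used in the Methods reduces to comparing observed cell counts with the counts expected under independence, as in this minimal sketch of the Pearson statistic (degrees of freedom and the p-value lookup are omitted).

```python
import numpy as np

def chi2_statistic(table):
    """Pearson chi-square statistic for a two-way contingency table."""
    table = np.asarray(table, float)
    row = table.sum(1, keepdims=True)
    col = table.sum(0, keepdims=True)
    expected = row * col / table.sum()   # counts expected under independence
    return np.sum((table - expected) ** 2 / expected)
```

A statistic near zero (observed equals expected) supports independence, which is how the abstract concludes that hepatotoxicity is related to the four properties and five flavors but not to channel tropism.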
NASA Astrophysics Data System (ADS)
Xiao, Rui; Su, Shiliang; Mai, Gengchen; Zhang, Zhonghao; Yang, Chenxue
2015-02-01
Cash crop expansion has been a major land use change in tropical and subtropical regions worldwide. Quantifying the determinants of cash crop expansion should provide deeper spatial insights into the dynamics and ecological consequences of cash crop expansion. This paper investigated the process of cash crop expansion in Hangzhou region (China) from 1985 to 2009 using remotely sensed data. The corresponding determinants (neighborhood, physical, and proximity) and their relative effects during three periods (1985-1994, 1994-2003, and 2003-2009) were quantified by logistic regression modeling and variance partitioning. Results showed that the total area of cash crops increased from 58,874.1 ha in 1985 to 90,375.1 ha in 2009, a net growth of 53.5%. Cash crops were more likely to grow in loam soils. Steep areas at higher elevations were less likely to experience cash crop expansion. A consistently higher probability of cash crop expansion was found in places with abundant farmland and forest cover in all three periods. In addition, distance to river and lake, distance to county center, and distance to provincial road were decisive determinants of farmers' choice of cash crop plantation. Different categories of determinants and their combinations exerted different influences on cash crop expansion. The joint effects of neighborhood and proximity determinants were the strongest, and the unique effect of physical determinants decreased with time. Our study contributes to the understanding of the proximate drivers of cash crop expansion in subtropical regions.
Bent, Gardner C.; Steeves, Peter A.
2006-01-01
A revised logistic regression equation and an automated procedure were developed for mapping the probability of a stream flowing perennially in Massachusetts. The equation provides city and town conservation commissions and the Massachusetts Department of Environmental Protection a method for assessing whether streams are intermittent or perennial at a specific site in Massachusetts by estimating the probability of a stream flowing perennially at that site. This information could assist the environmental agencies that administer the Commonwealth of Massachusetts Rivers Protection Act of 1996, which establishes a 200-foot-wide protected riverfront area extending from the mean annual high-water line along each side of a perennial stream, with exceptions for some urban areas. The equation was developed by relating the observed intermittent or perennial status of a stream site to selected basin characteristics of naturally flowing streams (defined as having no regulation by dams, surface-water withdrawals, ground-water withdrawals, diversion, wastewater discharge, and so forth) in Massachusetts. This revised equation differs from the equation developed in a previous U.S. Geological Survey study in that it is based solely on visual observations of the intermittent or perennial status of stream sites across Massachusetts and on the evaluation of several additional basin and land-use characteristics as potential explanatory variables in the logistic regression analysis. The revised equation estimated the intermittent or perennial status of the observed stream sites more accurately than the equation from the previous study. Stream sites used in the analysis were identified as intermittent or perennial based on visual observation during low-flow periods from late July through early September 2001. The database of intermittent and perennial streams included a total of 351 naturally flowing (no regulation) sites, of which 85 were observed to be intermittent and 266 perennial.
NASA Astrophysics Data System (ADS)
Mousavi, S. Mostafa; Horton, Stephen P.; Langston, Charles A.; Samei, Borhan
2016-10-01
We develop an automated strategy for discriminating deep microseismic events from shallow ones on the basis of the waveforms recorded on a limited number of surface receivers. Machine-learning techniques are employed to explore the relationship between event hypocentres and seismic features of the recorded signals in time, frequency and time-frequency domains. We applied the technique to 440 microearthquakes (-1.7 < Mw < 1.29) induced by an underground cavern collapse in the Napoleonville Salt Dome in Bayou Corne, Louisiana. Forty different seismic attributes of whole seismograms, including degree of polarization and spectral attributes, were measured. A selected set of features was then used to train the system to discriminate between deep and shallow events based on the knowledge gained from existing patterns. The cross-validation test showed that events with depths shallower than 250 m can be discriminated from events with hypocentral depths between 1000 and 2000 m with 88 per cent and 90.7 per cent accuracy using logistic regression and artificial neural network models, respectively. Similar results were obtained using single-station seismograms. The results show that the spectral features have the highest correlation to source depth. Spectral centroids and 2-D cross-correlations in the time-frequency domain are two new seismic features used in this study that proved to be promising measures for seismic event classification. The machine-learning techniques used here have application in the efficient automatic classification of low-energy signals recorded at one or more seismic stations.
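The cross-validation test mentioned above can be sketched generically. The snippet below uses a toy midpoint-threshold classifier as a stand-in for the study's logistic regression and neural network models; the data, feature, and fold scheme are all hypothetical:

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds (a simple sketch;
    real studies would shuffle the data first)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_accuracy(x, y, k=3):
    """Cross-validated accuracy of a toy midpoint-threshold classifier:
    train on k-1 folds, test on the held-out fold, pool the results."""
    correct = 0
    for fold in kfold_indices(len(x), k):
        train = [i for i in range(len(x)) if i not in fold]
        m0 = sum(x[i] for i in train if y[i] == 0) / len([i for i in train if y[i] == 0])
        m1 = sum(x[i] for i in train if y[i] == 1) / len([i for i in train if y[i] == 1])
        threshold = (m0 + m1) / 2.0
        correct += sum(1 for i in fold if (x[i] > threshold) == (y[i] == 1))
    return correct / len(x)
```
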
NASA Astrophysics Data System (ADS)
Dong, Longjun; Wesseloo, Johan; Potvin, Yves; Li, Xibing
2016-01-01
Seismic events and blasts generate seismic waveforms that have different characteristics. The challenge to confidently differentiate these two signatures is complex and requires the integration of physical and statistical techniques. In this paper, the different characteristics of blasts and seismic events were investigated by comparing probability density distributions of different parameters. Five typical parameters of blasts and events and the probability density functions of blast time, as well as probability density functions of origin time difference for neighbouring blasts were extracted as discriminant indicators. The Fisher classifier, naive Bayesian classifier and logistic regression were used to establish discriminators. Databases from three Australian and Canadian mines were established for training, calibrating and testing the discriminant models. The classification performances and discriminant precision of the three statistical techniques were discussed and compared. The proposed discriminators have explicit and simple functions which can be easily used by workers in mines or researchers. Back-test, applied results, cross-validated results and analysis of receiver operating characteristic curves in different mines have shown that the discriminator for one of the mines has a reasonably good discriminating performance.
Korovessis, Panagiotis; Repantis, Thomas; Zacharatos, Spyros; Baikousis, Andreas
2012-12-01
The objective of this retrospective cross-sectional study was to estimate the 6-month prevalence and severity of low back pain and sciatica in a representative sample of an adult Mediterranean population. The study group comprised a sample of 674 adults aged 20 years or older from a mainly (74.8%) urban population. Information regarding low back pain and sciatica prevalence and severity and its related aspects, as well as socioeconomic and demographic characteristics, was collected by personal interviews with a validated questionnaire. The association between the intensity of low back pain and sciatica with several sociodemographic parameters was tested using ordered univariate and multivariate logistic regression analysis. A total of 266 (39.5%) patients reported low back pain and 166 (24.6%) reported sciatica during the previous 6-month period. A woman living in a Mediterranean country reported low back pain of increased severity if she was a married housewife aged older than 65 years who was a smoker and suffered from depression. More severe sciatic pain was reported by working married women older than 65 years who were smokers.
NASA Astrophysics Data System (ADS)
Wang, Ying; Song, Chongzhen; Lin, Qigen; Li, Juan
2016-04-01
The Newmark displacement model has been used to predict earthquake-triggered landslides. Logistic regression (LR) is also a common landslide hazard assessment method. We combined the Newmark displacement model and LR and applied them to Wenchuan County and Beichuan County in China, which were affected by the Ms 8.0 Wenchuan earthquake on May 12th, 2008, to develop a mechanism-based landslide occurrence probability model and improve the predictive accuracy. A total of 1904 landslide sites in Wenchuan County and 3800 random non-landslide sites were selected as the training dataset. We applied the Newmark model and obtained the distribution of permanent displacement (Dn) for a 30 × 30 m grid. Four factors (Dn, topographic relief, and distances to drainages and roads) were used as independent variables for LR. Then, a combined model was obtained, with an AUC (area under the curve) value of 0.797 for Wenchuan County. A total of 617 landslide sites and non-landslide sites in Beichuan County were used as a validation dataset, yielding AUC = 0.753. The proposed method may also be applied to earthquake-induced landslides in other regions.
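The AUC values reported above can be computed from predicted probabilities via the rank (Mann-Whitney) formulation, without tracing the full ROC curve. A minimal sketch, with invented scores and labels:

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive case is scored above a randomly chosen
    negative case (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking, 1.0 to perfect separation of landslide and non-landslide sites.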
Dai, Huanping; Micheyl, Christophe
2012-11-01
Psychophysical "reverse-correlation" methods allow researchers to gain insight into the perceptual representations and decision weighting strategies of individual subjects in perceptual tasks. Although these methods have gained momentum, until recently their development was limited to experiments involving only two response categories. Recently, two approaches for estimating decision weights in m-alternative experiments have been put forward. One approach extends the two-category correlation method to m > 2 alternatives; the second uses multinomial logistic regression (MLR). In this article, the relative merits of the two methods are discussed, and the issues of convergence and statistical efficiency of the methods are evaluated quantitatively using Monte Carlo simulations. The results indicate that, for a range of values of the number of trials, the estimated weighting patterns are closer to their asymptotic values for the correlation method than for the MLR method. Moreover, for the MLR method, weight estimates for different stimulus components can exhibit strong correlations, making the analysis and interpretation of measured weighting patterns less straightforward than for the correlation method. These and other advantages of the correlation method, which include computational simplicity and a close relationship to other well-established psychophysical reverse-correlation methods, make it an attractive tool to uncover decision strategies in m-alternative experiments.
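Multinomial logistic regression, as used in the m-alternative weight-estimation approach above, assigns each alternative a linear score in the stimulus components and normalizes the scores with a softmax. A minimal sketch, with hypothetical weights:

```python
import math

def softmax_probs(weights, biases, x):
    """Class probabilities from a multinomial logistic regression model:
    one linear score per response alternative, normalized via softmax.
    `weights` is one coefficient vector per alternative."""
    scores = [b + sum(w * xi for w, xi in zip(ws, x))
              for ws, b in zip(weights, biases)]
    mx = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

In the MLR approach, the fitted coefficient vectors themselves are read off as the decision weights for each response alternative.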
NASA Astrophysics Data System (ADS)
Vozinaki, Anthi Eirini K.; Karatzas, George P.; Sibetheros, Ioannis A.; Varouchakis, Emmanouil A.
2014-05-01
Damage curves are the most significant component of the flood loss estimation models. Their development is quite complex. Two types of damage curves exist, historical and synthetic curves. Historical curves are developed from historical loss data from actual flood events. However, due to the scarcity of historical data, synthetic damage curves can be alternatively developed. Synthetic curves rely on the analysis of expected damage under certain hypothetical flooding conditions. A synthetic approach was developed and presented in this work for the development of damage curves, which are subsequently used as the basic input to a flood loss estimation model. A questionnaire-based survey took place among practicing and research agronomists, in order to generate rural loss data based on the responders' loss estimates, for several flood condition scenarios. In addition, a similar questionnaire-based survey took place among building experts, i.e. civil engineers and architects, in order to generate loss data for the urban sector. By answering the questionnaire, the experts were in essence expressing their opinion on how damage to various crop types or building types is related to a range of values of flood inundation parameters, such as floodwater depth and velocity. However, the loss data compiled from the completed questionnaires were not sufficient for the construction of workable damage curves; to overcome this problem, a Weighted Monte Carlo method was implemented, in order to generate extra synthetic datasets with statistical properties identical to those of the questionnaire-based data. The data generated by the Weighted Monte Carlo method were processed via Logistic Regression techniques in order to develop accurate logistic damage curves for the rural and the urban sectors. A Python-based code was developed, which combines the Weighted Monte Carlo method and the Logistic Regression analysis into a single code (WMCLR Python code). Each WMCLR code execution
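The Weighted Monte Carlo generation step can be approximated by weighted resampling of the expert-supplied loss values. The sketch below is a simplified stand-in under stated assumptions: the real method matches the statistical properties of the questionnaire data, not merely its weighted marginal distribution, and the function name and inputs are invented:

```python
import random

def weighted_monte_carlo(values, weights, n, seed=0):
    """Draw n synthetic observations from expert-supplied loss values in
    proportion to their weights (seeded for reproducibility)."""
    rng = random.Random(seed)
    return rng.choices(values, weights=weights, k=n)
```

The enlarged synthetic dataset would then be fed to a logistic regression of damage fraction against floodwater depth and velocity to trace out the damage curves.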
Zhang, Peng; Parenteau, Chantal; Wang, Lu; Holcombe, Sven; Kohoyda-Inglis, Carla; Sullivan, June; Wang, Stewart
2013-11-01
This study resulted in a model-averaging methodology that predicts crash injury risk using vehicle, demographic, and morphomic variables and assesses the importance of individual predictors. The effectiveness of this methodology was illustrated through analysis of occupant chest injuries in frontal vehicle crashes. The crash data were obtained from the International Center for Automotive Medicine (ICAM) database for calendar years 1996 to 2012. The morphomic data are quantitative measurements of variations in human body 3-dimensional anatomy. Morphomics are obtained from imaging records. In this study, morphomics were obtained from chest, abdomen, and spine CT using novel patented algorithms. A NASS-trained crash investigator with over thirty years of experience collected the in-depth crash data. There were 226 cases available with occupants involved in frontal crashes and morphomic measurements. Only cases with complete recorded data were retained for statistical analysis. Logistic regression models were fitted using all possible configurations of vehicle, demographic, and morphomic variables. The models were ranked by the Akaike Information Criterion (AIC). An averaged logistic regression model approach was used due to the limited sample size relative to the number of variables. This approach is helpful when addressing variable selection, building prediction models, and assessing the importance of individual variables. The final predictive results were developed using this approach, based on the top 100 models in the AIC ranking. Model-averaging minimized model uncertainty, decreased the overall prediction variance, and provided an approach to evaluating the importance of individual variables. There were 17 variables investigated: four vehicle, four demographic, and nine morphomic. More than 130,000 logistic models were investigated in total. The models were characterized into four scenarios to assess individual variable contribution to injury risk. Scenario
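AIC-based model averaging of this kind weights each candidate model by its relative likelihood (its Akaike weight) and averages the models' predictions. A minimal sketch with hypothetical AIC values and predicted risks:

```python
import math

def akaike_weights(aics):
    """Akaike weights: exp(-0.5 * deltaAIC) for each candidate model,
    normalized over the candidate set."""
    best = min(aics)
    rel = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]

def model_averaged_prediction(aics, predictions):
    """Average the models' predicted injury risks by their Akaike weights."""
    return sum(w * p for w, p in zip(akaike_weights(aics), predictions))
```

Summing the Akaike weights of all models containing a given variable is one common way to score that variable's importance.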
ERIC Educational Resources Information Center
Hidalgo, Mª Dolores; Gómez-Benito, Juana; Zumbo, Bruno D.
2014-01-01
The authors analyze the effectiveness of the R[superscript 2] and delta log odds ratio effect size measures when using logistic regression analysis to detect differential item functioning (DIF) in dichotomous items. A simulation study was carried out, and the Type I error rate and power estimates under conditions in which only statistical testing…
ERIC Educational Resources Information Center
So, Tak-Shing Harry; Peng, Chao-Ying Joanne
This study compared the accuracy of predicting two-group membership obtained from K-means clustering with those derived from linear probability modeling, linear discriminant function, and logistic regression under various data properties. Multivariate normally distributed populations were simulated based on combinations of population proportions,…
ERIC Educational Resources Information Center
Woods, Carol M.; Oltmanns, Thomas F.; Turkheimer, Eric
2008-01-01
Person-fit assessment is used to identify persons who respond aberrantly to a test or questionnaire. In this study, S. P. Reise's (2000) method for evaluating person fit using 2-level logistic regression was applied to 13 personality scales of the Schedule for Nonadaptive and Adaptive Personality (SNAP; L. Clark, 1996) that had been administered…
ERIC Educational Resources Information Center
Land, Kenneth C.; And Others
1994-01-01
Advantages of using logistic and hazards regression techniques in assessing the overall impact of a treatment program and the differential impact on client subgroups are examined and compared using data from a juvenile court program for status offenders. Implications are drawn for management and effectiveness of intensive supervision programs.…
NASA Astrophysics Data System (ADS)
Ngadisih; Bhandary, Netra P.; Yatabe, Ryuichi; Dahal, Ranjan K.
2016-05-01
West Java Province is the most landslide-prone area in Indonesia owing to extreme geomorphological conditions, climatic conditions and densely populated settlements with immense completed and ongoing development activities. A landslide susceptibility map at regional scale in this province is therefore a fundamental tool for risk management and land-use planning. Logistic regression and Artificial Neural Network (ANN) models are the most frequently used tools for landslide susceptibility assessment, mainly because they are capable of handling the nature of landslide data. The main objective of this study is to apply logistic regression and ANN models and compare their performance for landslide susceptibility mapping in the volcanic mountains of West Java Province. In addition, the model application is intended to identify the factors contributing most to landslide events in the study area. The spatial database built in a GIS platform consists of a landslide inventory, four topographical parameters (slope, aspect, relief, distance to river), three geological parameters (distance to volcano crater, distance to thrust and fault, geological formation), and two anthropogenic parameters (distance to road, land use). The logistic regression model in this study revealed that slope, geological formation, distance to road and distance to volcano are the most influential factors of landslide events, while the ANN model revealed that distance to volcano crater, geological formation, distance to road, and land use are the most important causal factors of landslides in the study area. Moreover, an evaluation of the models showed that the ANN model has a higher accuracy than the logistic regression model.
NASA Astrophysics Data System (ADS)
Althuwaynee, Omar F.; Pradhan, Biswajeet; Ahmad, Noordin
2014-06-01
This article uses a methodology based on chi-squared automatic interaction detection (CHAID), a multivariate method with an automatic classification capacity, to analyse large numbers of landslide conditioning factors. This new algorithm was developed to overcome the subjectivity of the manual categorization of scale data of landslide conditioning factors, and to predict a rainfall-induced susceptibility map of Kuala Lumpur city and surrounding areas using a geographic information system (GIS). The main objective of this article is to use the CHAID method to perform the best classification fit for each conditioning factor and then to combine it with logistic regression (LR). The LR model was used to find the corresponding coefficients of the best-fitting function that assess the optimal terminal nodes. A cluster pattern of landslide locations was extracted in a previous study using the nearest neighbor index (NNI), which was then used to identify the clustered landslide location range. Clustered locations were used as model training data with 14 landslide conditioning factors such as topographically derived parameters, lithology, NDVI, and land use and land cover maps. The Pearson chi-squared value was used to find the best classification fit between the dependent variable and the conditioning factors. Finally, the relationships between the conditioning factors were assessed and the landslide susceptibility map (LSM) was produced. The area under the curve (AUC) was used to test the model reliability and prediction capability with the training and validation landslide locations, respectively. This study proved the efficiency and reliability of the decision tree (DT) model in landslide susceptibility mapping. It also provided a valuable scientific basis for spatial decision making in planning and urban management studies.
Burkness, Eric C; Galvan, Tederson L; Hutchison, W D
2009-04-01
Late-season plantings of sweet corn in Minnesota result in an abundant supply of silking corn, Zea mays L., throughout August to early September that is highly attractive to the corn earworm, Helicoverpa zea (Boddie) (Lepidoptera: Noctuidae). During a 10-yr period, 1997-2006, insecticide efficacy trials were conducted in late-planted sweet corn in Minnesota for management of H. zea. These data were used to develop a logistic regression model to identify the variables and interactions that most influenced efficacy (proportion control) against late-instar H. zea. The pyrethroid lambda-cyhalothrin (0.028 kg [AI]/ha) is a commonly used insecticide in sweet corn and was therefore chosen for use in parameter evaluation. Three variables were found to be significant (alpha = 0.05): the percentage of plants silking at the time of the first insecticide application, the interval between the first and second insecticide applications, and the interval between the last insecticide application and harvest. Odds ratio estimates indicated that as the percentage of plants silking at the time of first application increased, control of H. zea increased. As the interval between the first and second insecticide applications increased, control of H. zea decreased. Finally, as the interval between the last insecticide application and harvest increased, control of H. zea increased. An additional timing trial was conducted in 2007 using lambda-cyhalothrin to evaluate the impact of the percentage of plants silking at the first application. The results indicated no significant differences in efficacy against late-instar H. zea at 0, 50, 90, and 100% of plants silking at the first application (regimes of five or more sprays). The implications of these effects are discussed within the context of current integrated pest management programs for late-planted sweet corn in the upper midwestern United States. PMID:19449649
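The odds ratio estimates discussed above follow directly from the fitted logistic coefficients: exponentiating a coefficient gives the multiplicative change in the odds of control per unit increase in the predictor. A minimal sketch (the coefficient values passed in are invented, not the study's estimates):

```python
import math

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def odds_ratio(beta, delta=1.0):
    """Odds ratio implied by a logistic regression coefficient `beta`
    for a `delta`-unit increase in the predictor: exp(beta * delta)."""
    return math.exp(beta * delta)
```

An odds ratio above 1 (positive coefficient) means control improves as the predictor increases, as reported for the percentage of plants silking at first application.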
Boutilier, J; Chan, T; Lee, T; Craig, T; Sharpe, M
2014-06-15
Purpose: To develop a statistical model that predicts optimization objective function weights from patient geometry for intensity-modulated radiotherapy (IMRT) of prostate cancer. Methods: A previously developed inverse optimization method (IOM) is applied retrospectively to determine optimal weights for 51 treated patients. We use an overlap volume ratio (OVR) of bladder and rectum for different PTV expansions in order to quantify patient geometry in explanatory variables. Using the optimal weights as ground truth, we develop and train a logistic regression (LR) model to predict the rectum weight and thus the bladder weight. Post hoc, we fix the weights of the left femoral head, right femoral head, and an artificial structure that encourages conformity to the population average while normalizing the bladder and rectum weights accordingly. The population average of objective function weights is used for comparison. Results: The OVR at 0.7 cm was found to be the most predictive of the rectum weights. The LR model performance is statistically significant when compared to the population average over a range of clinical metrics including bladder/rectum V53Gy, bladder/rectum V70Gy, and mean voxel dose to the bladder, rectum, CTV, and PTV. On average, the LR model predicted bladder and rectum weights that are both 63% closer to the optimal weights compared to the population average. The treatment plans resulting from the LR weights have, on average, a rectum V70Gy that is 35% closer to the clinical plan and a bladder V70Gy that is 43% closer. Similar results are seen for bladder V54Gy and rectum V54Gy. Conclusion: Statistical modelling from patient anatomy can be used to determine objective function weights in IMRT for prostate cancer. Our method allows the treatment planners to begin the personalization process from an informed starting point, which may lead to more consistent clinical plans and reduce overall planning time.
NASA Astrophysics Data System (ADS)
Yao, W.; Poleswki, P.; Krzystek, P.
2016-06-01
The recent success of deep convolutional neural networks (CNN) on a large number of applications can be attributed to large amounts of available training data and increasing computing power. In this paper, a semantic pixel labelling scheme for urban areas using multi-resolution CNN and hand-crafted spatial-spectral features of airborne remotely sensed data is presented. Both CNN and hand-crafted features are applied to image/DSM patches to produce per-pixel class probabilities with an L1-norm regularized logistic regression classifier. Evidence theory infers a degree of belief for pixel labelling from the different sources, smoothing regions by handling the conflicts present in both classifiers while reducing the uncertainty. The aerial data used in this study were provided by ISPRS as benchmark datasets for 2D semantic labelling tasks in urban areas, and consist of two data sources from LiDAR and a color infrared camera. The test sites are parts of a city in Germany assumed to consist of typical object classes including impervious surfaces, trees, buildings, low vegetation, vehicles and clutter. The evaluation is based on the computation of pixel-based confusion matrices by random sampling. The performance of the strategy with respect to scene characteristics and method combination strategies is analyzed and discussed. The competitive classification accuracy can be explained not only by the nature of the input data sources (e.g. the above-ground height of the nDSM highlights the vertical dimension of houses, trees and even cars, and the near-infrared spectrum indicates vegetation) but also by the decision-level fusion of the CNN's texture-based approach with multichannel spatial-spectral hand-crafted features based on evidence combination theory.
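Evidence-theory fusion of the two classifiers' per-pixel outputs can be sketched with Dempster's rule of combination, here simplified to singleton-only mass functions (the full framework also assigns mass to compound hypotheses to represent uncertainty; the masses below are invented):

```python
def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions restricted to singleton
    class hypotheses: multiply matching masses, discard the conflicting
    mass, and renormalize."""
    joint = {c: m1[c] * m2[c] for c in m1}
    agreement = sum(joint.values())  # equals 1 - conflict
    return {c: v / agreement for c, v in joint.items()}
```

With singleton-only masses this reduces to a normalized product of the two classifiers' class probabilities, which is why it damps out labels on which the classifiers disagree.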
Kononen, Douglas W; Flannagan, Carol A C; Wang, Stewart C
2011-01-01
A multivariate logistic regression model, based upon National Automotive Sampling System Crashworthiness Data System (NASS-CDS) data for calendar years 1999-2008, was developed to predict the probability that a crash-involved vehicle will contain one or more occupants with serious or incapacitating injuries. These vehicles were defined as containing at least one occupant coded with an Injury Severity Score (ISS) of greater than or equal to 15, in planar, non-rollover crash events involving Model Year 2000 and newer cars, light trucks, and vans. The target injury outcome measure was developed by the Centers for Disease Control and Prevention (CDC)-led National Expert Panel on Field Triage in their recent revision of the Field Triage Decision Scheme (American College of Surgeons, 2006). The parameters to be used for crash injury prediction were subsequently specified by the National Expert Panel. Model input parameters included: crash direction (front, left, right, and rear), change in velocity (delta-V), multiple vs. single impacts, belt use, presence of at least one older occupant (≥ 55 years old), presence of at least one female in the vehicle, and vehicle type (car, pickup truck, van, and sport utility). The model was developed using predictor variables that may be readily available, post-crash, from OnStar-like telematics systems. Model sensitivity and specificity were 40% and 98%, respectively, using a probability cutpoint of 0.20. The area under the receiver operating characteristic (ROC) curve for the final model was 0.84. Delta-V (mph), seat belt use and crash direction were the most important predictors of serious injury. Due to the complexity of factors associated with rollover-related injuries, a separate screening algorithm is needed to model injuries associated with this crash mode. PMID:21094304
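The evaluation quantities used here (sensitivity and specificity at a probability cutpoint, and the ROC AUC via the Mann-Whitney statistic) can be computed as below. The toy scores and labels are hypothetical and do not reproduce the paper's reported 40%/98%/0.84 figures.

```python
def confusion_at_cutpoint(probs, labels, cut=0.20):
    """Sensitivity and specificity when predicting positive for p >= cut."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cut and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p <  cut and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p <  cut and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cut and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

def auc(probs, labels):
    """ROC AUC via the Mann-Whitney statistic: the probability that a random
    positive scores higher than a random negative (ties count half)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities and serious-injury indicators.
probs  = [0.9, 0.8, 0.3, 0.6, 0.1, 0.05]
labels = [1,   1,   1,   0,   0,   0]
sens, spec = confusion_at_cutpoint(probs, labels, cut=0.20)
```

Lowering the cutpoint trades specificity for sensitivity, which is why the panel's choice of 0.20 matters for a triage application.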
Mao, Hui-Fen; Chang, Ling-Hui; Tsai, Athena Yi-Jung; Huang, Wen-Ni; Wang, Jye
2016-01-01
Because resources for long-term care services are limited, timely and appropriate referral for rehabilitation services is critical for optimizing clients' functions and successfully integrating them into the community. We investigated which client characteristics are most relevant in predicting Taiwan's community-based occupational therapy (OT) service referral based on experts' beliefs. Data were collected in face-to-face interviews using the Multidimensional Assessment Instrument (MDAI). Community-dwelling participants (n = 221) ≥ 18 years old who reported disabilities in the previous National Survey of Long-term Care Needs in Taiwan were enrolled. The standard for referral was the judgment and agreement of two experienced occupational therapists who reviewed the results of the MDAI. Logistic regressions and Generalized Additive Models were used for analysis. Two predictive models were proposed, one using basic activities of daily living (BADLs) and one using instrumental ADLs (IADLs). Dementia, psychiatric disorders, cognitive impairment, joint range-of-motion limitations, fear of falling, behavioral or emotional problems, expressive deficits (in the BADL-based model), and limitations in IADLs or BADLs were significantly correlated with the need for referral. Both models showed high area under the curve (AUC) values on receiver operating curve testing (AUC = 0.977 and 0.972, respectively). The probability of being referred for community OT services was calculated using the referral algorithm. The referral protocol facilitated communication between healthcare professionals to make appropriate decisions for OT referrals. The methods and findings should be useful for developing referral protocols for other long-term care services.
JACOB, BENJAMIN G.; NOVAK, ROBERT J.; TOE, LAURENT; SANFO, MOUSSA S.; AFRIYIE, ABENA N.; IBRAHIM, MOHAMMED A.; GRIFFITH, DANIEL A.; UNNASCH, THOMAS R.
2013-01-01
The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz, etc.). In this research, the geographical locations of multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data were aggregated in PROC GENMOD. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data were then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed, also in ArcGIS, using the georeferenced ground coordinates of high- and low-density clusters stratified by Annual Biting Rates (ABR). These data were overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61 m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter
Barks, C.S.
1995-01-01
Storm-runoff water-quality data were used to verify and, when appropriate, adjust regional regression models previously developed to estimate urban storm-runoff loads and mean concentrations in Little Rock, Arkansas. Data collected at 5 representative sites during 22 storms from June 1992 through January 1994 compose the Little Rock data base. Comparison of observed values (O) of storm-runoff loads and mean concentrations to the predicted values (Pu) from the regional regression models for nine constituents (chemical oxygen demand, suspended solids, total nitrogen, total ammonia plus organic nitrogen as nitrogen, total phosphorus, dissolved phosphorus, total recoverable copper, total recoverable lead, and total recoverable zinc) shows large prediction errors ranging from 63 to several thousand percent. Prediction errors for six of the regional regression models are less than 100 percent and can be considered reasonable for water-quality models. Differences between O and Pu are due to variability in the Little Rock data base and error in the regional models. Where applicable, a model adjustment procedure (termed MAP-R-P) based upon regression of O against Pu was applied to improve predictive accuracy. For 11 of the 18 regional water-quality models, O and Pu are significantly correlated; that is, much of the variation in O is explained by the regional models. Five of these 11 regional models consistently overestimate O; therefore, MAP-R-P can be used to provide a better estimate. For the remaining seven regional models, O and Pu are not significantly correlated, thus neither the unadjusted regional models nor MAP-R-P is appropriate. A simple estimator, such as the mean of the observed values, may be used if the regression models are not appropriate. Standard error of estimate of the adjusted models ranges from 48 to 130 percent. Calibration results may be biased due to the limited data set sizes in the Little Rock data base. The relatively large values of
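The core of a MAP-R-P-style adjustment is an ordinary least-squares fit of observed values O against the regional-model predictions Pu; the fitted line then corrects new predictions toward local conditions. The sketch below uses made-up numbers in which the regional model consistently overpredicts by a factor of two; it is not the USGS procedure verbatim.

```python
def map_r_p(observed, predicted):
    """Fit O = a + b*Pu by ordinary least squares and return a function
    that adjusts a new regional-model prediction to local conditions."""
    n = len(observed)
    mx = sum(predicted) / n
    my = sum(observed) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, observed))
    sxx = sum((x - mx) ** 2 for x in predicted)
    b = sxy / sxx
    a = my - b * mx
    return lambda pu: a + b * pu

# Hypothetical local data: the regional model overpredicts loads twofold.
predicted = [2.0, 4.0, 6.0, 8.0]
observed  = [1.0, 2.0, 3.0, 4.0]
adjust = map_r_p(observed, predicted)
```

As the abstract notes, this adjustment is only appropriate when O and Pu are significantly correlated; otherwise a simple estimator such as the local mean is preferable.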
NASA Astrophysics Data System (ADS)
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
2015-04-01
The first phase of the work aimed to identify the spatial relationships between landslide locations and 13 related factors using the Frequency Ratio bivariate statistical method. The analysis was then carried out with a multivariate statistical approach, using the Logistic Regression technique and the Random Forests technique, which gave the best results in terms of AUC. The models were performed and evaluated with different sample sizes, also taking into account the temporal variation of input variables such as areas burned by wildfire. The most significant outcomes of this work are the strong influence of sample size on the model results and the marked importance of some environmental factors (e.g., land use and wildfires) for identifying the depletion zones of extremely rapid shallow landslides.
Inokuchi, Shota; Kitayama, Tetsushi; Fujii, Koji; Nakahara, Hiroaki; Nakanishi, Hiroaki; Saito, Kazuyuki; Mizuno, Natsuko; Sekiguchi, Kazumasa
2016-03-01
Phenomena called allele dropouts are often observed in crime stain profiles. Allele dropouts occur when one of a pair of heterozygous alleles is underrepresented owing to stochastic influences and falls below the peak detection threshold. Therefore, it is important that such risks are statistically evaluated. In recent years, attempts to estimate allele dropout probabilities by logistic regression using peak height information have been reported. However, these previous studies are limited to the use of a single human identification kit and fragment analyzer. In the present study, we calculated allele dropout probabilities by logistic regression using contemporary capillary electrophoresis instruments, the 3500xL Genetic Analyzer and the 3130xl Genetic Analyzer, with various commercially available human identification kits such as the AmpFℓSTR® Identifiler® Plus PCR Amplification Kit. Furthermore, the differences in logistic curves between peak detection thresholds using the analytical threshold (AT) and the values recommended by the manufacturer were compared. The standard logistic curves for calculating allele dropout probabilities from the peak height of sister alleles were characterized. The present study confirmed that ATs were lower than the values recommended by the manufacturer for the human identification kits; therefore, it is possible to reduce allele dropout probabilities and obtain more information by using the AT as the peak detection threshold.
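The logistic curve described here maps the surviving sister allele's peak height to a dropout probability. The coefficients below are placeholders chosen only so that the curve decreases with peak height (b1 < 0); they are not values estimated in the study.

```python
import math

def dropout_probability(sister_peak_height_rfu, b0=6.0, b1=-1.2):
    """P(dropout) of one heterozygous allele given the peak height (RFU)
    of its sister allele. b0 and b1 are illustrative placeholders; in the
    study they would be fitted by logistic regression per instrument/kit."""
    z = b0 + b1 * math.log(sister_peak_height_rfu)
    return 1.0 / (1.0 + math.exp(-z))
```

With a fitted curve of this form, lowering the peak detection threshold (using the AT rather than the manufacturer's value) shifts more low sister peaks into the observable range, which is why the study reports reduced dropout probabilities at the AT.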
ERIC Educational Resources Information Center
Kasapoglu, Koray
2014-01-01
This study aims to investigate which factors are associated with Turkey's 15-year-olds' scoring above the OECD average (493) on the PISA'09 reading assessment. Collected from a total of 4,996 15-year-old students from Turkey, data were analyzed by logistic regression analysis in order to model the data of students who were split…
Failing, K; Massing, S; Bülte, M
2004-05-01
Logistic regression is a powerful tool for analysing data sets with a dichotomous response variable. However, in most situations it is used as a model without interactions between the factor variables. This is done either by presumption or to avoid difficulties in the interpretation of the statistical results. In this article the model of simple logistic regression without interactions is introduced first, followed by the expanded model with pairwise interactions between the factors. The application of both models is demonstrated on the present data set concerning the detection of E. coli O157:H7 in artificially contaminated minced beef. The influencing variables are the factors enrichment time, inoculation density, enrichment broth, subculturing medium, and state of samples (fresh vs. deep frozen). The statistical reanalysis revealed strongly differing results, emphasizing the importance of interactions in logistic regression models. In particular, the odds ratio for E. coli detection dependent on the enrichment time (24 h vs. 6 h) (OR = 0.41) was strongly overestimated without simultaneously accounting for the E. coli inoculation density (OR approximately equal to 0.2 to 0.02). In this context the possible interpretation of the interaction is discussed.
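With a pairwise interaction in the model, the odds ratio for one factor is no longer a single number: it depends on the level of the interacting factor, OR = exp(beta_main + beta_interaction * covariate). The coefficients below are illustrative only, chosen so the outputs fall in the same range as the abstract's ORs; they are not the fitted values from the study.

```python
import math

def odds_ratio(beta_main, beta_interaction=0.0, covariate=0.0):
    """Odds ratio for a factor in a logistic model with a pairwise
    interaction term; with beta_interaction = 0 this reduces to the
    no-interaction OR = exp(beta_main)."""
    return math.exp(beta_main + beta_interaction * covariate)

# Hypothetical coefficients: ignoring the interaction yields one fixed
# enrichment-time OR; including it makes the OR shift with inoculation density.
or_ignoring     = odds_ratio(-0.9)             # exp(-0.9) ≈ 0.41
or_low_density  = odds_ratio(-0.9, -0.8, 1.0)  # lower at low density
or_high_density = odds_ratio(-0.9, -0.8, 3.0)  # lower still at high density
```

This is exactly the pattern the reanalysis found: a single OR of 0.41 overstated the enrichment-time effect relative to the density-dependent range of roughly 0.2 to 0.02.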
Hoos, Anne B.; Patel, Anant R.
1996-01-01
Model-adjustment procedures were applied to the combined data bases of storm-runoff quality for Chattanooga, Knoxville, and Nashville, Tennessee, to improve predictive accuracy for storm-runoff quality for urban watersheds in these three cities and throughout Middle and East Tennessee. Data for 45 storms at 15 different sites (five sites in each city) constitute the data base. Comparison of observed values of storm-runoff load and event-mean concentration to the predicted values from the regional regression models for 10 constituents shows prediction errors as large as 806,000 percent. Model-adjustment procedures, which combine the regional model predictions with local data, are applied to improve predictive accuracy. Standard error of estimate after model adjustment ranges from 67 to 322 percent. Calibration results may be biased due to sampling error in the Tennessee data base. The relatively large values of standard error of estimate for some of the constituent models, although representing significant reduction (at least 50 percent) in prediction error compared to estimation with unadjusted regional models, may be unacceptable for some applications. The user may wish to collect additional local data for these constituents and repeat the analysis, or calibrate an independent local regression model.
Stratton, Kelly G; Cook, Andrea J; Jackson, Lisa A; Nelson, Jennifer C
2015-03-30
Sequential methods are well established for randomized clinical trials (RCTs), and their use in observational settings has increased with the development of national vaccine and drug safety surveillance systems that monitor large healthcare databases. Observational safety monitoring requires that sequential testing methods be better equipped to incorporate confounder adjustment and accommodate rare adverse events. New methods designed specifically for observational surveillance include a group sequential likelihood ratio test that uses exposure matching and a generalized estimating equations approach that involves regression adjustment. However, little is known about the statistical performance of these methods or how they compare to RCT methods in both observational and rare outcome settings. We conducted a simulation study to determine the type I error, power, and time-to-surveillance-end of the group sequential likelihood ratio test, the generalized estimating equations approach, and RCT methods that construct group sequential Lan-DeMets boundaries using data from a matched (group sequential Lan-DeMets-matching) or unmatched regression (group sequential Lan-DeMets-regression) setting. We also compared the methods using data from a multisite vaccine safety study. All methods had acceptable type I error, but regression methods were more powerful, faster at detecting true safety signals, and less prone to implementation difficulties with rare events than exposure matching methods. Method performance also depended on the distribution of information and extent of confounding by site. Our results suggest that the choice of sequential method, especially the confounder control strategy, is critical in rare event observational settings. These findings provide guidance for choosing methods in this context and, in particular, suggest caution when conducting exposure matching.
Methods for Adjusting U.S. Geological Survey Rural Regression Peak Discharges in an Urban Setting
Moglen, Glenn E.; Shivers, Dorianne E.
2006-01-01
A study was conducted of 78 U.S. Geological Survey gaged streams that have been subjected to varying degrees of urbanization over the last three decades. Flood-frequency analysis coupled with nonlinear regression techniques was used to generate a set of equations for converting peak discharge estimates determined from rural regression equations to a set of peak discharge estimates that represent known urbanization. Specifically, urban regression equations for the 2-, 5-, 10-, 25-, 50-, 100-, and 500-year return periods were calibrated as a function of the corresponding rural peak discharge and the percentage of impervious area in a watershed. The results of this study indicate that two sets of equations, one set based on imperviousness and one set based on population density, performed well. Both sets of equations are dependent on rural peak discharges, a measure of development (average percentage of imperviousness or average population density), and a measure of homogeneity of development within a watershed. Average imperviousness was readily determined by using geographic information system methods and commonly available land-cover data. Similarly, average population density was easily determined from census data. Thus, a key advantage to the equations developed in this study is that they do not require field measurements of watershed characteristics as did the U.S. Geological Survey urban equations developed in an earlier investigation. During this study, the U.S. Geological Survey PeakFQ program was used as an integral tool in the calibration of all equations. The scarcity of historical land-use data, however, made exclusive use of flow records necessary for the 30-year period from 1970 to 2000. Such relatively short-duration streamflow time series required a nonstandard treatment of the historical data function of the PeakFQ program in comparison to published guidelines. Thus, the approach used during this investigation does not fully comply with the
Chen, Chen-Hsin; Tsay, Yuh-Chyuan; Wu, Ya-Chi; Horng, Cheng-Fang
2013-10-30
In conventional survival analysis there is an underlying assumption that all study subjects are susceptible to the event. In general, this assumption does not adequately hold when investigating the time to an event other than death. Owing to genetic and/or environmental etiology, study subjects may not be susceptible to the disease. Analyzing nonsusceptibility has become an important topic in biomedical, epidemiological, and sociological research, with recent statistical studies proposing several mixture models for right-censored data in regression analysis. In longitudinal studies, we often encounter left, interval, and right-censored data because of incomplete observations of the time endpoint, as well as possibly left-truncated data arising from the dissimilar entry ages of recruited healthy subjects. To analyze these kinds of incomplete data while accounting for nonsusceptibility and possible crossing hazards in the framework of mixture regression models, we utilize a logistic regression model to specify the probability of susceptibility, and a generalized gamma distribution, or a log-logistic distribution, in the accelerated failure time location-scale regression model to formulate the time to the event. Relative times of the conditional event time distribution for susceptible subjects are extended in the accelerated failure time location-scale submodel. We also construct graphical goodness-of-fit procedures on the basis of the Turnbull-Frydman estimator and newly proposed residuals. Simulation studies were conducted to demonstrate the validity of the proposed estimation procedure. The mixture regression models are illustrated with alcohol abuse data from the Taiwan Aboriginal Study Project and hypertriglyceridemia data from the Cardiovascular Disease Risk Factor Two-township Study in Taiwan. PMID:23661280
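The mixture structure described here combines a logistic submodel for the probability of being susceptible with a parametric survival submodel for susceptible subjects, so the population survival curve plateaus at the non-susceptible fraction instead of dropping to zero. All coefficients below are placeholders for illustration, not estimates from the Taiwanese data.

```python
import math

def susceptible_prob(x, g0=-0.5, g1=1.0):
    """Logistic submodel: probability that a subject with covariate x is
    susceptible to the event. g0, g1 are illustrative placeholders."""
    return 1.0 / (1.0 + math.exp(-(g0 + g1 * x)))

def loglogistic_surv(t, alpha=2.0, lam=0.1):
    """Log-logistic survival function for susceptible subjects, standing in
    for the accelerated failure time location-scale submodel."""
    return 1.0 / (1.0 + (lam * t) ** alpha)

def mixture_surv(t, x):
    """Population survival: non-susceptibles (probability 1 - p) never
    experience the event, so S(t) -> 1 - p as t grows."""
    p = susceptible_prob(x)
    return (1.0 - p) + p * loglogistic_surv(t)
```

The plateau at 1 - p is what distinguishes a cure/nonsusceptibility model from conventional survival analysis, where S(t) is assumed to approach zero.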
NASA Astrophysics Data System (ADS)
Ozdemir, Adnan
2011-12-01
In this study, groundwater spring potential maps produced by three different methods, frequency ratio, weights of evidence, and logistic regression, were evaluated using validation data sets and compared to each other. Groundwater spring occurrence potential maps in the Sultan Mountains (Konya, Turkey) were constructed using the relationship between groundwater spring locations and their causative factors. Groundwater spring locations were identified in the study area from a topographic map. Different thematic maps of the study area, such as geology, topography, geomorphology, hydrology, and land use/cover, have been used to identify groundwater potential zones. Seventeen spring-related parameter layers of the entire study area were used to generate groundwater spring potential maps. These are geology (lithology), fault density, distance to fault, relative permeability of lithologies, elevation, slope aspect, slope steepness, curvature, plan curvature, profile curvature, topographic wetness index, stream power index, sediment transport capacity index, drainage density, distance to drainage, land use/cover, and precipitation. The predictive capability of each model was determined by the area under the relative operating characteristic curve. The areas under the curve for the frequency ratio, weights of evidence, and logistic regression methods were calculated as 0.903, 0.880, and 0.840, respectively. These results indicate that the frequency ratio and weights of evidence models are relatively good estimators, whereas logistic regression is a relatively poor estimator of groundwater spring potential mapping in the study area. The frequency ratio model is simple; the process of input, calculation, and output can be readily understood. The produced groundwater spring potential maps can serve planners and engineers in groundwater development plans and land-use planning.
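The frequency ratio method underlying the best-performing model reduces to a simple quotient per factor class: the share of springs falling in the class divided by the share of study area it occupies. The tallies below are hypothetical, not values from the Sultan Mountains analysis.

```python
def frequency_ratio(springs_in_class, springs_total, area_in_class, area_total):
    """FR = (share of springs in a factor class) / (share of area in it).
    FR > 1 flags classes where springs are over-represented, i.e. higher
    groundwater spring potential."""
    return (springs_in_class / springs_total) / (area_in_class / area_total)

# Hypothetical tally for one lithology class (counts and km^2 are made up).
fr = frequency_ratio(springs_in_class=120, springs_total=440,
                     area_in_class=40.0, area_total=364.0)
```

Summing the class FR values of all seventeen layers at each map cell gives the spring potential index whose AUC the study reports as 0.903.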
NASA Astrophysics Data System (ADS)
Ozdemir, Adnan
2011-07-01
The purpose of this study is to produce a groundwater spring potential map of the Sultan Mountains in central Turkey, based on a logistic regression method within a Geographic Information System (GIS) environment. Using field surveys, the locations of the springs (440 springs) were determined in the study area. In this study, 17 spring-related factors were used in the analysis: geology, relative permeability, land use/land cover, precipitation, elevation, slope, aspect, total curvature, plan curvature, profile curvature, wetness index, stream power index, sediment transport capacity index, distance to drainage, distance to fault, drainage density, and fault density map. The coefficients of the predictor variables were estimated using binary logistic regression analysis and were used to calculate the groundwater spring potential for the entire study area. The accuracy of the final spring potential map was evaluated based on the observed springs. The accuracy of the model was evaluated by calculating the relative operating characteristics. The area value of the relative operating characteristic curve model was found to be 0.82. These results indicate that the model is a good estimator of the spring potential in the study area. The spring potential map shows that the areas of very low, low, moderate and high groundwater spring potential classes are 105.586 km² (28.99%), 74.271 km² (19.906%), 101.203 km² (27.14%), and 90.05 km² (24.671%), respectively. The interpretations of the potential map showed that stream power index, relative permeability of lithologies, geology, elevation, aspect, wetness index, plan curvature, and drainage density play major roles in spring occurrence and distribution in the Sultan Mountains. The logistic regression approach has not yet been used to delineate groundwater potential zones. In this study, the logistic regression method was used to locate potential zones for groundwater springs in the Sultan Mountains. The evolved model
Ho Hoang, Khai-Long; Mombaur, Katja
2015-10-15
Dynamic modeling of the human body is an important tool to investigate the fundamentals of the biomechanics of human movement. To model the human body in terms of a multi-body system, it is necessary to know the anthropometric parameters of the body segments. For young healthy subjects, several data sets exist that are widely used in the research community, e.g. the tables provided by de Leva. No such comprehensive anthropometric parameter set exists for elderly people. It is, however, well known that body proportions change significantly during aging, e.g. due to degenerative effects in the spine, such that parameters for young people cannot be used for realistically simulating the dynamics of elderly people. In this study, regression equations are derived from the inertial parameters, center of mass positions, and body segment lengths provided by de Leva to be adjustable to the changes in proportion of the body parts of male and female humans due to aging. Additional adjustments are made to the reference points of the parameters for the upper body segments as they are chosen in a more practicable way in the context of creating a multi-body model in a chain structure with the pelvis representing the most proximal segment.
Zhang, B; Liang, X L; Gao, H Y; Ye, L S; Wang, Y G
2016-05-13
We evaluated the application of three machine learning algorithms, logistic regression, support vector machine, and back-propagation neural network, for diagnosing congenital heart disease and colorectal cancer. By inspecting related serum tumor marker levels in colorectal cancer patients and healthy subjects, early diagnosis models for colorectal cancer were built using the three machine learning algorithms to assess their corresponding diagnostic values. Except for serum alpha-fetoprotein, the levels of 11 other serum markers of patients in the colorectal cancer group were higher than those in the benign colorectal disease group (P < 0.05). The results of logistic regression analysis indicated that individual detection of serum carcinoembryonic antigen, CA199, CA242, CA125, and CA153 and their combined detection were effective for diagnosing colorectal cancer. Combined detection had a better diagnostic effect, with a sensitivity of 94.2% and specificity of 97.7%. Combining serum carcinoembryonic antigen, CA199, CA242, CA125, and CA153, support vector machine and back-propagation neural network diagnosis models were built with diagnostic accuracies of 82 and 75%, sensitivities of 85 and 80%, and specificities of 80 and 70%, respectively. Colorectal cancer diagnosis models based on the three machine learning algorithms showed high diagnostic value and can help obtain evidence for the early diagnosis of colorectal cancer.
NASA Astrophysics Data System (ADS)
Dixon, Barnali
2009-09-01
Accurate and inexpensive identification of potentially contaminated wells is critical for water resources protection and management. The objectives of this study are to 1) assess the suitability of approximation tools such as neural networks (NN) and support vector machines (SVM) integrated in a geographic information system (GIS) for identifying contaminated wells and 2) use logistic regression and feature selection methods to identify significant variables for transporting contaminants in and through the soil profile to the groundwater. Fourteen GIS-derived soil hydrogeologic and land-use parameters were used as initial inputs in this study. Well water quality data (nitrate-N) from 6,917 wells provided by the Florida Department of Environmental Protection (USA) were used as the output target class. The use of the logistic regression and feature selection methods reduced the number of input variables to nine. Receiver operating characteristic (ROC) curves were used for evaluation of these approximation tools. Results showed superior performance with the NN as compared to the SVM, especially on training data, while testing results were comparable. Feature selection did not improve accuracy; however, it helped increase the sensitivity or true positive rate (TPR). Thus, a higher TPR was obtainable with fewer variables.
de Vet, Emely; Chinapaw, Mai JM; de Boer, Michiel; Seidell, Jacob C; Brug, Johannes
2014-01-01
Background: Playing video games contributes substantially to sedentary behavior in youth. A new generation of video games, active games, seems to be a promising alternative to sedentary games to promote physical activity and reduce sedentary behavior. At this time, little is known about correlates of active and non-active gaming among adolescents. Objective: The objective of this study was to examine potential personal, social, and game-related correlates of both active and non-active gaming in adolescents. Methods: A survey assessing game behavior and potential personal, social, and game-related correlates was conducted among adolescents (12-16 years, N=353) recruited via schools. Multivariable, multilevel logistic regression analyses, adjusted for demographics (age, sex, and educational level of adolescents), were conducted to examine personal, social, and game-related correlates of active gaming ≥1 hour per week (h/wk) and non-active gaming >7 h/wk. Results: Active gaming ≥1 h/wk was significantly associated with a more positive attitude toward active gaming (OR 5.3, CI 2.4-11.8; P<.001), a less positive attitude toward non-active games (OR 0.30, CI 0.1-0.6; P=.002), a higher score on habit strength regarding gaming (OR 1.9, CI 1.2-3.2; P=.008), having brothers/sisters (OR 6.7, CI 2.6-17.1; P<.001) and friends (OR 3.4, CI 1.4-8.4; P=.009) who spend more time on active gaming, and a slightly lower score on game engagement (OR 0.95, CI 0.91-0.997; P=.04). Non-active gaming >7 h/wk was significantly associated with a more positive attitude toward non-active gaming (OR 2.6, CI 1.1-6.3; P=.035), a stronger habit regarding gaming (OR 3.0, CI 1.7-5.3; P<.001), having friends who spend more time on non-active gaming (OR 3.3, CI 1.46-7.53; P=.004), and a more positive image of a non-active gamer (OR 2.0, CI 1.07-3.75; P=.03). Conclusions: Various factors were significantly associated with active gaming ≥1 h/wk and non-active gaming >7 h/wk. Active gaming is most
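Odds ratios and confidence intervals like those reported above are the standard transformation of a logistic regression coefficient and its standard error: exponentiate the coefficient and its Wald interval endpoints. A sketch (the `beta` and `se` values below are hypothetical, not taken from the study):

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and 95% Wald confidence interval from a logistic
    regression coefficient `beta` with standard error `se`."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# hypothetical coefficient for an attitude predictor
or_, lo, hi = odds_ratio_ci(beta=1.667, se=0.41)
```

An OR above 1 with a CI excluding 1 marks a positive association, exactly the pattern reported for attitude and habit strength above.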
NASA Astrophysics Data System (ADS)
WU, Chunhung
2015-04-01
The research built an original logistic regression landslide susceptibility model (abbreviated as or-LRLSM) and a landslide ratio-based logistic regression landslide susceptibility model (abbreviated as lr-LRLSM), compared the performance of the two models, and explained their error sources. The research assumes that the performance of the logistic regression model can be better if the distribution of the landslide ratio and the weighted value of each variable are similar. The landslide ratio is the ratio of landslide area to total area in a specific area and a useful index to evaluate the seriousness of landslide disasters in Taiwan. The research adopted the landslide inventory induced by the 2009 Typhoon Morakot in the Chishan watershed, which was the most serious disaster event of the last decade in Taiwan. The research adopted the 20 m grid as the basic unit in building the LRLSM, and six variables, including elevation, slope, aspect, geological formation, accumulated rainfall, and bank erosion, were included in the two models. In building the or-LRLSM, the six variables were divided into continuous variables (elevation, slope, and accumulated rainfall) and categorical variables (aspect, geological formation, and bank erosion), while in building the lr-LRLSM all variables, classified based on landslide ratio, were categorical. Because the number of basic units in the whole Chishan watershed was too large to process with commercial software, the research used random sampling instead of the full set of basic units. The research adopted equal proportions of landslide units and non-landslide units in the logistic regression analysis. The research performed random sampling 10 times and selected the group with the best Cox & Snell R2 value and Nagelkerke R2 value as the database for the following analysis. Based on the best result from the 10 random sampling groups, the or-LRLSM (lr-LRLSM) is significant at the 1% level with Cox & Snell R2 = 0.190 (0.196) and Nagelkerke R2
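The Cox & Snell and Nagelkerke R² values used above to pick the best sampling group are simple functions of the null and fitted log-likelihoods of the logistic model. A sketch of the two standard formulas (the log-likelihood values in the example are hypothetical):

```python
import math

def cox_snell_r2(ll_null, ll_model, n):
    """Cox & Snell pseudo-R2 from null and fitted log-likelihoods on n units."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke rescales Cox & Snell so a perfect model can reach 1."""
    return cox_snell_r2(ll_null, ll_model, n) / (1.0 - math.exp(2.0 * ll_null / n))

# hypothetical fit on a sample of 200 units
cs = cox_snell_r2(ll_null=-100.0, ll_model=-50.0, n=200)
nk = nagelkerke_r2(ll_null=-100.0, ll_model=-50.0, n=200)
```

Because Nagelkerke divides by the Cox & Snell maximum, it is always at least as large; either statistic can rank the 10 random-sampling groups as done above.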
NASA Astrophysics Data System (ADS)
Ozdemir, Adnan; Altural, Tolga
2013-03-01
This study evaluated and compared landslide susceptibility maps produced with three different methods, frequency ratio, weights of evidence, and logistic regression, by using validation datasets. The field surveys performed as part of this investigation mapped the locations of 90 landslides that had been identified in the Sultan Mountains of south-western Turkey. The landslide influence parameters used for this study are geology, relative permeability, land use/land cover, precipitation, elevation, slope, aspect, total curvature, plan curvature, profile curvature, wetness index, stream power index, sediment transportation capacity index, distance to drainage, distance to fault, drainage density, fault density, and spring density maps. The relationships between landslide distributions and these parameters were analysed using the three methods, and the results of these methods were then used to calculate the landslide susceptibility of the entire study area. The accuracy of the final landslide susceptibility maps was evaluated based on the landslides observed during the fieldwork, and the accuracy of the models was evaluated by calculating each model's relative operating characteristic curve. The predictive capability of each model was determined from the area under the relative operating characteristic curve and the areas under the curves obtained using the frequency ratio, logistic regression, and weights of evidence methods are 0.976, 0.952, and 0.937, respectively. These results indicate that the frequency ratio and weights of evidence models are relatively good estimators of landslide susceptibility in the study area. Specifically, the results of the correlation analysis show a high correlation between the frequency ratio and weights of evidence results, and the frequency ratio and logistic regression methods exhibit correlation coefficients of 0.771 and 0.727, respectively. The frequency ratio model is simple, and its input, calculation and output processes are
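Of the three methods compared above, the frequency ratio is the simplest: for each class of an influence parameter, the share of landslide cells in that class divided by the class's share of the whole study area. A minimal sketch with made-up class counts (not the Sultan Mountains data):

```python
def frequency_ratio(landslide_counts, total_counts):
    """FR per class: (% of landslide cells in class) / (% of all cells in class).
    FR > 1 marks a class over-represented among landslides."""
    ls = sum(landslide_counts)
    tot = sum(total_counts)
    return [(l / ls) / (t / tot) for l, t in zip(landslide_counts, total_counts)]

# hypothetical cell counts for three slope classes
fr = frequency_ratio([30, 50, 20], [500, 300, 200])
```

Summing the FR values of a cell's classes across all parameters gives its susceptibility index, which the ROC analysis above then validates against observed landslides.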
2013-01-01
Background In Brazil, 99% of the cases of malaria are concentrated in the Amazon region, with high level of transmission. The objectives of the study were to use geographic information systems (GIS) analysis and logistic regression as a tool to identify and analyse the relative likelihood and its socio-environmental determinants of malaria infection in the Vale do Amanhecer rural settlement, Brazil. Methods A GIS database of georeferenced malaria cases, recorded in 2005, and multiple explanatory data layers was built, based on a multispectral Landsat 5 TM image, digital map of the settlement blocks and a SRTM digital elevation model. Satellite imagery was used to map the spatial patterns of land use and cover (LUC) and to derive spectral indices of vegetation density (NDVI) and soil/vegetation humidity (VSHI). An Euclidian distance operator was applied to measure proximity of domiciles to potential mosquito breeding habitats and gold mining areas. The malaria risk model was generated by multiple logistic regression, in which environmental factors were considered as independent variables and the number of cases, binarized by a threshold value was the dependent variable. Results Out of a total of 336 cases of malaria, 133 positive slides were from inhabitants at Road 08, which corresponds to 37.60% of the notifications. The southern region of the settlement presented 276 cases and a greater number of domiciles in which more than ten cases/home were notified. From these, 102 (30.36%) cases were caused by Plasmodium falciparum and 174 (51.79%) cases by Plasmodium vivax. Malaria risk is the highest in the south of the settlement, associated with proximity to gold mining sites, intense land use, high levels of soil/vegetation humidity and low vegetation density. Conclusions Mid-resolution, remote sensing data and GIS-derived distance measures can be successfully combined with digital maps of the housing location of (non-) infected inhabitants to predict relative
Jafari, Peyman; Sharafi, Zahra; Bagheri, Zahra; Shalileh, Sara
2014-06-01
Measurement equivalence is a necessary assumption for meaningful comparison of pediatric quality of life rated by children and parents. In this study, differential item functioning (DIF) analysis is used to examine whether children and their parents respond consistently to the items in the KINDer Lebensqualitätsfragebogen (KINDL; in German, Children Quality of Life Questionnaire). Two DIF detection methods, graded response model (GRM) and ordinal logistic regression (OLR), were applied for comparability. The KINDL was completed by 1,086 school children and 1,061 of their parents. While the GRM revealed that 12 out of the 24 items were flagged with DIF, the OLR identified 14 out of the 24 items with DIF. Seven items with DIF and five items without DIF were common across the two methods, yielding a total agreement rate of 50 %. This study revealed that parent proxy-reports cannot be used as a substitute for a child's ratings in the KINDL.
Woods, Carol M; Oltmanns, Thomas F; Turkheimer, Eric
2008-06-01
Person-fit assessment is used to identify persons who respond aberrantly to a test or questionnaire. In this study, S. P. Reise's (2000) method for evaluating person fit using 2-level logistic regression was applied to 13 personality scales of the Schedule for Nonadaptive and Adaptive Personality (SNAP; L. Clark, 1996) that had been administered to military recruits (N = 2,026). Results revealed significant person-fit heterogeneity and indicated that for 5 SNAP scales (Disinhibition, Entitlement, Exhibitionism, Negative Temperament, and Workaholism), the scale was more discriminating for some people than for others. Possible causes of aberrant responding were explored with several covariates. On all 5 scales, severe pathology emerged as a key influence on responses, and there was evidence of differential test functioning with respect to gender, ethnicity, or both. Other potential sources of aberrancy were carelessness, haphazard responding, or uncooperativeness. Social desirability was not as influential as expected.
Chan, Kung-Sik; Jiao, Feiran; Mikulski, Marek A.; Gerke, Alicia; Guo, Junfeng; Newell, John D; Hoffman, Eric A.; Thompson, Brad; Lee, Chang Hyun; Fuortes, Laurence J.
2015-01-01
Rationale and Objectives We evaluated the role of an automated quantitative computed tomography (CT) scan interpretation algorithm in detecting interstitial lung disease (ILD) and/or emphysema in a sample of elderly subjects with mild lung disease. We hypothesized that the quantification and distributions of CT attenuation values on lung CT, over a subset of the Hounsfield unit (HU) range [−1000 HU, 0 HU], can differentiate early or mild disease from normal lung. Materials and Methods We compared results of quantitative spiral rapid end-exhalation (functional residual capacity; FRC) and end-inhalation (total lung capacity; TLC) CT scan analyses in 52 subjects with radiographic evidence of mild fibrotic lung disease to 17 normal subjects. Several CT value distributions were explored, including (i) that from the peripheral lung taken at TLC (with peels at 15 or 65 mm), (ii) the ratio of (i) to that from the core of the lung, and (iii) the ratio of (ii) to its FRC counterpart. We developed a fused-lasso logistic regression model that can automatically identify sub-intervals of [−1000 HU, 0 HU] over which a CT value distribution provides optimal discrimination between abnormal and normal scans. Results The fused-lasso logistic regression model based on (ii) with a 15 mm peel identified the relative frequency of CT values over [−1000, −900] HU and that over [−450, −200] HU as a means of discriminating abnormal versus normal scans, resulting in a zero out-of-sample false positive rate and a 15% false negative rate that was lowered to 12% by pooling information. Conclusions We demonstrated the potential usefulness of this novel quantitative imaging analysis method in discriminating ILD and/or emphysema from normal lungs. PMID:26776294
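The fused lasso behind the model above adds two terms to the logistic loss: a sparsity penalty on the HU-bin coefficients and a fusion penalty on differences between adjacent bins, which drives neighbouring coefficients to be equal so that selected bins merge into contiguous sub-intervals such as [−1000, −900] HU. A sketch of the penalty alone (coefficients and weights are hypothetical; the optimizer itself is not shown):

```python
def fused_lasso_penalty(beta, lam_sparse, lam_fuse):
    """lam_sparse * sum|b_j|  +  lam_fuse * sum|b_{j+1} - b_j| over ordered
    HU-bin coefficients. The second term rewards piecewise-constant patterns."""
    sparsity = sum(abs(b) for b in beta)
    fusion = sum(abs(beta[j + 1] - beta[j]) for j in range(len(beta) - 1))
    return lam_sparse * sparsity + lam_fuse * fusion

# hypothetical coefficients over four adjacent HU bins: one contiguous block
penalty = fused_lasso_penalty([0.0, 1.0, 1.0, 0.0], lam_sparse=1.0, lam_fuse=2.0)
```

Note that a constant run of coefficients contributes nothing to the fusion term, which is why the fitted model ends up selecting whole intervals rather than scattered bins.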
Robertson, John M.; Soehn, Matthias; Yan Di
2010-05-01
Purpose: Understanding the dose-volume relationship of small bowel irradiation and severe acute diarrhea may help reduce the incidence of this side effect during adjuvant treatment for rectal cancer. Methods and Materials: Consecutive patients treated curatively for rectal cancer were reviewed, and the maximum grade of acute diarrhea was determined. The small bowel was outlined on the treatment planning CT scan, and a dose-volume histogram was calculated for the initial pelvic treatment (45 Gy). Logistic regression models were fitted for varying cutoff-dose levels from 5 to 45 Gy in 5-Gy increments. The model with the highest LogLikelihood was used to develop a cutoff-dose normal tissue complication probability (NTCP) model. Results: There were a total of 152 patients (48% preoperative, 47% postoperative, 5% other), predominantly treated prone (95%) with a three-field technique (94%) and a protracted venous infusion of 5-fluorouracil (78%). Acute Grade 3 diarrhea occurred in 21%. The largest LogLikelihood was found for the cutoff-dose logistic regression model with 15 Gy as the cutoff-dose, although the models for 20 Gy and 25 Gy had similar significance. According to this model, highly significant correlations (p <0.001) between small bowel volumes receiving at least 15 Gy and toxicity exist in the considered patient population. Similar findings applied to both the preoperatively (p = 0.001) and postoperatively irradiated groups (p = 0.001). Conclusion: The incidence of Grade 3 diarrhea was significantly correlated with the volume of small bowel receiving at least 15 Gy using a cutoff-dose NTCP model.
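The cutoff-dose selection described above amounts to fitting one univariate logistic model per candidate cutoff (toxicity vs. small-bowel volume receiving at least that dose) and keeping the cutoff with the largest log-likelihood. A self-contained sketch with a hand-rolled one-predictor fit; the volume and toxicity values are hypothetical, not the study's data:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_1d(x, y, lr=0.05, iters=5000):
    """Fit P(y=1) = sigmoid(a + b*x) by gradient ascent on the log-likelihood."""
    a = b = 0.0
    n = len(x)
    for _ in range(iters):
        ga = gb = 0.0
        for xi, yi in zip(x, y):
            r = yi - sigmoid(a + b * xi)
            ga += r
            gb += r * xi
        a += lr * ga / n
        b += lr * gb / n
    return a, b

def log_likelihood(a, b, x, y):
    return sum(yi * math.log(sigmoid(a + b * xi))
               + (1 - yi) * math.log(1.0 - sigmoid(a + b * xi))
               for xi, yi in zip(x, y))

# hypothetical: toxicity vs. volume (arbitrary units) above two candidate cutoffs
tox = [0, 0, 0, 0, 1, 1, 1, 1]
v_cut_a = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]   # informative metric
v_cut_b = [2.5, 0.5, 3.5, 1.5, 3.0, 1.0, 4.0, 2.0]   # nearly uninformative
ll_a = log_likelihood(*fit_logistic_1d(v_cut_a, tox), v_cut_a, tox)
ll_b = log_likelihood(*fit_logistic_1d(v_cut_b, tox), v_cut_b, tox)
# the cutoff whose volume metric yields the larger log-likelihood is selected
```

In the study this comparison runs over cutoffs from 5 to 45 Gy in 5-Gy steps, with 15 Gy winning.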
Snyder, Marcia; Freeman, Mary C.; Purucker, S. Thomas; Pringle, Catherine M.
2015-01-01
Freshwater shrimps are an important biotic component of tropical ecosystems. However, they can have a low probability of detection when abundances are low. We sampled 3 of the most common freshwater shrimp species, Macrobrachium olfersii, Macrobrachium carcinus, and Macrobrachium heterochirus, and used occupancy modeling and logistic regression models to improve our limited knowledge of the distribution of these cryptic species by investigating both local- and landscape-scale effects at La Selva Biological Station in Costa Rica. Local-scale factors included substrate type and stream size, and landscape-scale factors included presence or absence of regional groundwater inputs. Capture rates for 2 of the sampled species (M. olfersii and M. carcinus) were sufficient to compare the fit of occupancy models. Occupancy models did not converge for M. heterochirus, but M. heterochirus had high enough occupancy rates that logistic regression could be used to model the relationship between occupancy rates and predictors. The best-supported models for M. olfersii and M. carcinus included conductivity, discharge, and substrate parameters. Stream size was positively correlated with occupancy rates of all 3 species. High stream conductivity, which reflects the quantity of regional groundwater input into the stream, was positively correlated with M. olfersii occupancy rates. Boulder substrates increased the occupancy rate of M. carcinus and decreased the detection probability of M. olfersii. Our models suggest that shrimp distribution is driven by factors that function at local (substrate and discharge) and landscape (conductivity) scales.
O'Dwyer, Jean; Morris Downes, Margaret; Adley, Catherine C
2016-02-01
This study analyses the relationship between meteorological phenomena and outbreaks of waterborne-transmitted vero cytotoxin-producing Escherichia coli (VTEC) in the Republic of Ireland over an 8-year period (2005-2012). Data pertaining to the notification of waterborne VTEC outbreaks were extracted from the Computerised Infectious Disease Reporting system, which is administered through the national Health Protection Surveillance Centre as part of the Health Service Executive. Rainfall and temperature data were obtained from the national meteorological office and categorised as cumulative rainfall, heavy rainfall events in the previous 7 days, and mean temperature. Regression analysis was performed using logistic regression (LR). The LR model was significant (p < 0.001), with all independent variables (cumulative rainfall, heavy rainfall, and mean temperature) making a statistically significant contribution to the model. The study found that rainfall, particularly heavy rainfall in the 7 days preceding an outbreak, is a strong statistical indicator of a waterborne outbreak and that temperature also affects waterborne VTEC outbreak occurrence.
NASA Astrophysics Data System (ADS)
Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.
2014-12-01
This study aims at comparing the performances of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams, both stretching for a few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian sea. This area was struck on 1 October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly of the debris flow and debris avalanche types, involving the weathered layer of a low- to high-grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; in addition, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-fold tests so as to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection, and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performances, with a general congruence between BLR and BRT in predictor importance. In particular, the research highlighted that BRT models reached a higher prediction performance with respect to BLR models for RP-based modelling, whilst for the SP-based models the difference in predictive skill between the two methods dropped drastically, converging to an analogous excellent performance. However, when looking at the precision of the probability estimates, BLR demonstrated to produce more robust
Mumcu Kucuker, Derya; Baskent, Emin Zeki
2015-01-01
Integration of non-wood forest products (NWFPs) into forest management planning has become an increasingly important issue in forestry over the last decade. Among NWFPs, mushrooms are valued due to their medicinal, commercial, high nutritional and recreational importance. Commercial mushroom harvesting also provides important income to local dwellers and contributes to the economic value of regional forests. Sustainable management of these products at the regional scale requires information on their locations in diverse forest settings and the ability to predict and map their spatial distributions over the landscape. This study focuses on modeling the spatial distribution of commercially harvested Lactarius deliciosus and L. salmonicolor mushrooms in the Kızılcasu Forest Planning Unit, Turkey. The best models were developed based on topographic, climatic and stand characteristics, separately through logistic regression analysis using SPSS™. The best topographic model provided better classification success (69.3 %) than the best climatic (65.4 %) and stand (65 %) models. However, the overall best model, with 73 % overall classification success, used a mix of several variables. The best models were integrated into an Arc/Info GIS program to create spatial distribution maps of L. deliciosus and L. salmonicolor in the planning area. Our approach may be useful to predict the occurrence and distribution of other NWFPs and provide a valuable tool for designing silvicultural prescriptions and preparing multiple-use forest management plans. PMID:24821473
NASA Astrophysics Data System (ADS)
Venkataraman, Kartik; Uddameri, Venkatesh
2012-08-01
Summary: The occurrence of elevated levels of arsenic and nitrate in aquifers impacted by agricultural activities is common and can result in adverse health effects in rural areas. Numerous wells located in the Ogallala aquifer in the Southern High Plains of Texas have tested positive for exceedance of the MCL for both arsenic and nitrate. To model the simultaneous exceedance of both chemicals, two types of logistic regression (LR) models were developed by (a) treating arsenic and nitrate independently and combining the marginal probabilities of their exceedance, and (b) treating the two exceedances together using a multinomial model. Influencing variables that represent both soil and aquifer properties, and for which data were readily available, were identified. The predictive capacities of the two models were evaluated using receiver operating characteristics (ROCs), and spatial trends in predictions were studied. The LR model constructed from the marginal probabilities had lower overall accuracy (59% correct classifications) and was extremely conservative, over-predicting outcomes. In contrast, the multinomial model showed good overall accuracy (79% correct classifications), made the correct prediction 90% of the time when both arsenic and nitrate MCL exceedances were observed, and was a good fit for wells located in agricultural areas. The results of the multinomial model also confirm previous studies that attributed shallow subsurface arsenic to anthropogenic activities. Based on the insights provided by the model, it is recommended that, where agricultural areas are concerned, the occurrence of arsenic and nitrate be evaluated together.
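The two model types above differ in how they produce the probability of joint exceedance: the first multiplies the marginal probabilities as if the two exceedances were independent, while the multinomial model assigns probabilities to the four outcome categories (neither, arsenic only, nitrate only, both) jointly. A sketch of that contrast; the linear scores below are hypothetical, standing in for fitted multinomial-logit predictors:

```python
import math

def joint_if_independent(p_arsenic, p_nitrate):
    """Joint exceedance probability under an (often wrong) independence assumption."""
    return p_arsenic * p_nitrate

def multinomial_probs(scores):
    """Softmax over linear scores for the four outcome categories of a
    multinomial logit: (neither, As only, NO3 only, both)."""
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    tot = sum(exps)
    return [e / tot for e in exps]

probs = multinomial_probs([1.2, -0.3, 0.1, 0.8])  # hypothetical scores
p_both = probs[3]                                  # joint exceedance, modelled directly
```

Because the multinomial model estimates `p_both` directly, it can capture positive dependence between the two exceedances that the product of marginals misses.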
Hung, J.; Chaitman, B.R.; Lam, J.; Lesperance, J.; Dupras, G.; Fines, P.; Cherkaoui, O.; Robert, P.; Bourassa, M.G.
1985-08-01
The incremental diagnostic yield of clinical data, exercise ECG, stress thallium scintigraphy, and cardiac fluoroscopy to predict coronary and multivessel disease was assessed in 171 symptomatic men by means of multiple logistic regression analyses. When clinical variables alone were analyzed, chest pain type and age were predictive of coronary disease, whereas chest pain type, age, a family history of premature coronary disease before age 55 years, and abnormal ST-T wave changes on the rest ECG were predictive of multivessel disease. The percentage of patients correctly classified by cardiac fluoroscopy (presence or absence of coronary artery calcification), exercise ECG, and thallium scintigraphy was 9%, 25%, and 50%, respectively, greater than for clinical variables, when the presence or absence of coronary disease was the outcome, and 13%, 25%, and 29%, respectively, when multivessel disease was studied; 5% of patients were misclassified. When the 37 clinical and noninvasive test variables were analyzed jointly, the most significant variable predictive of coronary disease was an abnormal thallium scan and for multivessel disease, the amount of exercise performed. The data from this study provide a quantitative model and confirm previous reports that optimal diagnostic efficacy is obtained when noninvasive tests are ordered sequentially. In symptomatic men, cardiac fluoroscopy is a relatively ineffective test when compared to exercise ECG and thallium scintigraphy.
Botha, J; de Ridder, J H; Potgieter, J C; Steyn, H S; Malan, L
2013-10-01
A recently proposed model for waist circumference cut points (RPWC), driven by increased blood pressure, was demonstrated in an African population. We therefore aimed to validate the RPWC by comparing the RPWC and the Joint Statement Consensus (JSC) models via logistic regression (LR) and neural network (NN) analyses. Urban African gender groups (N=171) were stratified according to the JSC and RPWC cut point models. Ultrasound carotid intima-media thickness (CIMT), blood pressure (BP) and fasting bloods (glucose, high density lipoprotein (HDL) and triglycerides) were obtained in a well-controlled setting. The RPWC male model (LR ROC AUC: 0.71, NN ROC AUC: 0.71) was practically equal to the JSC model (LR ROC AUC: 0.71, NN ROC AUC: 0.69) in predicting structural vascular disease. Similarly, the female RPWC model (LR ROC AUC: 0.84, NN ROC AUC: 0.82) and JSC model (LR ROC AUC: 0.82, NN ROC AUC: 0.81) equally predicted CIMT as a surrogate marker for structural vascular disease. Odds ratios supported validity, where prediction of CIMT revealed clinical significance, well over 1, for both the JSC and RPWC models in African males and females (OR 3.75-13.98). In conclusion, the proposed RPWC model was substantially validated utilizing linear and non-linear analyses. We therefore propose ethnic-specific WC cut points (African males, ≥90 cm; females, ≥98 cm) to predict a surrogate marker for structural vascular disease.
Allahyari, Elahe; Jafari, Peyman; Bagheri, Zahra
2016-01-01
Objective. The present study uses simulated data to determine the optimal number of response categories needed to achieve adequate power in the ordinal logistic regression (OLR) model for differential item functioning (DIF) analysis in psychometric research. Methods. A hypothetical ten-item quality of life scale with three, four, and five response categories was simulated. The power and type I error rates of the OLR model for detecting uniform DIF were investigated under different combinations of ability distribution (θ), sample size, sample size ratio, and the magnitude of uniform DIF across reference and focal groups. Results. When θ was distributed identically in the reference and focal groups, increasing the number of response categories from 3 to 5 resulted in an increase of approximately 8% in the power of the OLR model for detecting uniform DIF. The power of OLR was less than 0.36 when the ability distribution in the reference and focal groups was highly skewed to the left and right, respectively. Conclusions. The clearest conclusion from this research is that the minimum number of response categories for DIF analysis using OLR is five. However, the impact of the number of response categories in detecting DIF was lower than might be expected. PMID:27403207
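In a simulation study like this, power and type I error are both estimated the same way: the fraction of replications in which the DIF test rejects at the chosen alpha, computed under a true-DIF or no-DIF data-generating condition respectively. A sketch (the p-values are hypothetical replication results, not the study's):

```python
def rejection_rate(p_values, alpha=0.05):
    """Fraction of simulation replications in which the test rejects.
    Interpreted as power when DIF is truly present, type I error when absent."""
    return sum(1 for p in p_values if p < alpha) / len(p_values)

# hypothetical p-values from five replications under a true-DIF condition
power = rejection_rate([0.01, 0.20, 0.03, 0.04, 0.30])  # 3 of 5 reject
```

Real simulation studies use hundreds or thousands of replications per condition so these rates are estimated precisely.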
NASA Astrophysics Data System (ADS)
Bornaetxea, Txomin; Antigüedad, Iñaki; Ormaetxea, Orbange
2016-04-01
In the Oria river basin (885 km2), shallow landslides are very frequent; they cause roadblocks and damage to infrastructure and property, resulting in large economic losses every year. Considering that zoning the territory into different landslide susceptibility levels provides a useful tool for territorial planning and natural risk management, this study has the objective of identifying the places most prone to landslides by applying an objective and reproducible methodology. To do so, a quantitative multivariate methodology, logistic regression, has been used. Fieldwork landslide points and randomly selected stable points were used along with the independent variables Lithology, Land Use, Distance to the transport infrastructure, Altitude, Senoidal Slope, and Normalized Difference Vegetation Index (NDVI) to produce a landslide susceptibility map. The model has been validated by the prediction and success rate curves and their corresponding area under the curve (AUC). In addition, the result has been compared to those of two landslide susceptibility models previously applied to the study area at different scales, ELSUS1000 version 1 (2013) and the Landslide Susceptibility Map of Gipuzkoa (2007). Validation results show an excellent prediction capacity of the proposed model (AUC 0.962), and the comparisons highlight substantial differences from the previous studies.
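The success- and prediction-rate curves used for validation plot the cumulative fraction of landslides captured against the cumulative fraction of the study area, ranked from most to least susceptible; the AUC is then a trapezoidal integral over those points. A sketch with a toy curve (the points are hypothetical, not the Oria basin results):

```python
def auc_trapezoid(xs, ys):
    """Area under a rate curve given points sorted by x, both axes in [0, 1]."""
    return sum((xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
               for i in range(1, len(xs)))

# toy success-rate curve: the top 20% most susceptible area captures 70%
# of the mapped landslides, the top 50% captures 90%, and so on
auc = auc_trapezoid([0.0, 0.2, 0.5, 1.0], [0.0, 0.7, 0.9, 1.0])
```

A random ranking gives the diagonal (AUC 0.5), so values approaching 1, like the 0.962 above, indicate that susceptible cells concentrate the observed landslides.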
Yamashita, Takashi; Kart, Cary S; Noe, Douglas A
2012-12-01
Type 2 diabetes is known to contribute to health disparities in the U.S. and failure to adhere to recommended self-care behaviors is a contributing factor. Intervention programs face difficulties as a result of patient diversity and limited resources. With data from the 2005 Behavioral Risk Factor Surveillance System, this study employs a logistic regression tree algorithm to identify characteristics of sub-populations with type 2 diabetes according to their reported frequency of adherence to four recommended diabetes self-care behaviors including blood glucose monitoring, foot examination, eye examination and HbA1c testing. Using Andersen's health behavior model, need factors appear to dominate the definition of which sub-groups were at greatest risk for low as well as high adherence. Findings demonstrate the utility of easily interpreted tree diagrams to design specific culturally appropriate intervention programs targeting sub-populations of diabetes patients who need to improve their self-care behaviors. Limitations and contributions of the study are discussed.
Dubey, Anil K
2003-01-01
Coronary artery disease is a leading cause of death and disability in the United States and throughout the developed world. Results from large randomized, blinded, placebo-controlled trials have demonstrated clearly the benefit of lowering LDL cholesterol in lowering the risk for coronary artery disease. Unfortunately, despite the quantity of evidence, and the availability of medications that can efficiently lower LDL cholesterol with few side effects, not everyone who could benefit from cholesterol lowering interventions actually receives them. Despite the dissemination of national care guidelines for the evaluation and treatment of cholesterol levels (NCEP - National Cholesterol Education Program), compliance with such guidelines is suboptimal. There clearly is room for improvement in narrowing the gap between evidence based guidelines and actual clinical practice. The ability to classify those patients who are, or are likely to be, noncompliant on the basis of patient data routinely collected during patient care could be potentially useful by enabling the focusing of limited health care resources on those who are or will be at high risk of being undertreated. In order to explore this possibility further, we attempted to create such classifiers of cholesterol guideline compliance. To do this, we obtained data from an ambulatory electronic medical record system in use at the MGH adult primary care practices for over 20 years. We obtained the data from this hierarchically-structured EMR using its own native query language, called MQL (Medical Query Language). Next, we applied to the collected data the machine learning techniques of rough set theory, neural networks (feed-forward backpropagation nets), and logistic regression. We did this by using commonly available software that for the most part is freely available via the internet. We then compared the accuracy of the classifier models using the receiver operating characteristic (ROC) area and C-index summary
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
ERIC Educational Resources Information Center
Thatcher, Greg W.; Henson, Robin K.
This study examined research in training and development to determine effect size reporting practices. It focused on the reporting of corrected effect sizes in research articles using multiple regression analyses. When possible, the researchers calculated corrected effect sizes and determined whether the associated shrinkage could have impacted researcher…
ERIC Educational Resources Information Center
Tipton, Elizabeth; Pustejovsky, James E.
2015-01-01
Randomized experiments are commonly used to evaluate the effectiveness of educational interventions. The goal of the present investigation is to develop small-sample corrections for multiple contrast hypothesis tests (i.e., F-tests) such as the omnibus test of meta-regression fit or a test for equality of three or more levels of a categorical…
Zhang, Shi-Yan; Lin, Bi-Ding; Li, Bu-Ren
2015-01-01
The purpose of this study was to evaluate the diagnostic efficiency for hepatocellular carcinoma (HCC) of the combined analysis of alpha-l-fucosidase (AFU), alpha-fetoprotein (AFP) and thymidine kinase 1 (TK1). Serum levels of AFU, AFP and TK1 were measured in 116 patients with HCC, 109 patients with benign hepatic diseases, and 104 normal subjects. The diagnostic value was analyzed using the logistic regression equation and receiver operating characteristic (ROC) curves. The statistical distribution of each of the three tested tumor markers in every group was non-normal (Kolmogorov–Smirnov test, Z = 0.156–0.517, P < 0.001). The serum levels of AFP and TK1 in patients with HCC were significantly higher than those in patients with benign hepatic diseases (Mann–Whitney U test, Z = −8.570 to −5.943, all P < 0.001). However, there was no statistically significant difference in AFU between these two groups (Mann–Whitney U test, Z = −1.820, P = 0.069). The levels of AFU were significantly higher in patients with benign hepatic diseases than in normal subjects (Mann–Whitney U test, Z = −7.984, P < 0.001). ROC curves for patients with HCC versus those without HCC indicated optimal cut-off values of 40.80 U/L for AFU, 10.86 μg/L for AFP and 1.92 pmol/L for TK1. The area under the ROC curve (AUC) was 0.718 for AFU, 0.832 for AFP, 0.773 for TK1 and 0.900 for the combination of the three tumor markers. The combination resulted in a higher Youden index and a sensitivity of 85.3%. The combined detection of serum AFU, AFP and TK1 could play a complementary role in the diagnosis of HCC and could significantly improve its sensitivity. PMID:25870783
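The cut-off selection described above can be sketched in a few lines: the Youden index J = sensitivity + specificity − 1 is computed at every candidate threshold, and the maximising cut-off is chosen. This is a minimal illustration with made-up marker values, not the study's data; the function names are hypothetical.

```python
def roc_points(scores, labels):
    """(cut-off, sensitivity, specificity) at each candidate cut-off.

    scores: marker values; labels: 1 = diseased, 0 = non-diseased.
    A sample is called positive when its score is >= the cut-off."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    points = []
    for cut in sorted(set(scores)):
        sens = sum(s >= cut for s in pos) / len(pos)
        spec = sum(s < cut for s in neg) / len(neg)
        points.append((cut, sens, spec))
    return points

def optimal_cutoff(scores, labels):
    """Cut-off maximising the Youden index J = sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)

# Toy data (hypothetical marker values, not the study's measurements):
scores = [5, 8, 12, 20, 41, 3, 6, 7, 9, 11]
labels = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0]
cut, sens, spec = optimal_cutoff(scores, labels)
```

The same scan over candidate thresholds, applied to the fitted probabilities of a logistic model combining several markers, yields the combined-marker cut-off reported in such studies.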
NASA Astrophysics Data System (ADS)
Cama, Mariaelena; Cristi Nicu, Ionut; Conoscenti, Christian; Quénéhervé, Geraldine; Maerker, Michael
2016-04-01
Landslide susceptibility can be defined as the likelihood of a landslide occurring in a given area on the basis of local terrain conditions. In recent decades much research has focused on its evaluation by means of stochastic approaches, under the assumption that 'the past is the key to the future': if a model is able to reproduce a known landslide spatial distribution, it will be able to predict the future locations of new (i.e. unknown) slope failures. Among the various stochastic approaches, Binary Logistic Regression (BLR) is one of the most widely used because it calculates susceptibility in probabilistic terms and its results are easily interpretable from a geomorphological point of view. However, very often little importance is given to assessing multicollinearity, which makes the coefficient estimates unstable, possibly of opposite sign, and therefore difficult to interpret. It should therefore be evaluated every time in order to build a model whose results are geomorphologically correct. In this study the effects of multicollinearity on the predictive performance and robustness of landslide susceptibility models are analyzed. In particular, multicollinearity is estimated by means of the Variance Inflation Factor (VIF), which is also used as a selection criterion for the independent variables (VIF Stepwise Selection) and compared to the more commonly used AIC Stepwise Selection. The robustness of the results is evaluated through 100 replicates of the dataset. The study area selected for this analysis is the Moldavian Plateau, where landslides are among the most frequent geomorphological processes. This area has an increasing trend of urbanization and very high cultural-heritage potential, being the place of discovery of the largest settlement belonging to the Cucuteni Culture in Eastern Europe (which led to the development of the great Cucuteni–Trypillia complex). Therefore, identifying the areas susceptible to
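The VIF screening used above has a simple definition: VIF_j = 1 / (1 − R_j²), where R_j² comes from an ordinary least-squares regression of predictor j on all remaining predictors. A pure-Python sketch under that definition (helper names are hypothetical; a real analysis would use a statistics library):

```python
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def r_squared(y, X):
    """R^2 of an OLS regression of y on the rows of X (intercept added)."""
    n = len(y)
    Z = [[1.0] + row for row in X]          # prepend intercept column
    k = len(Z[0])
    XtX = [[sum(Z[i][a] * Z[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    Xty = [sum(Z[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(XtX, Xty)                  # normal equations
    yhat = [sum(b * z for b, z in zip(beta, Z[i])) for i in range(n)]
    ybar = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(predictors):
    """VIF_j = 1 / (1 - R_j^2), regressing each predictor on the others."""
    out = []
    for j in range(len(predictors)):
        y = predictors[j]
        X = [[predictors[m][i] for m in range(len(predictors)) if m != j]
             for i in range(len(y))]
        out.append(1.0 / (1.0 - r_squared(y, X)))
    return out
```

In a VIF stepwise selection, predictors whose VIF exceeds a chosen threshold (often 5 or 10) are dropped one at a time and the VIFs recomputed.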
NASA Astrophysics Data System (ADS)
El Kadiri, R.; Sultan, M.; Elbayoumi, T.; Sefry, S.
2013-12-01
Efforts to map the distribution of debris flows, to assess the factors controlling their development, and to identify the areas prone to their development are often hampered by the absence or paucity of appropriate monitoring systems and historical databases, and by the inaccessibility of these areas in many parts of the world. We developed methodologies that rely heavily on readily available observations extracted from remote sensing datasets and successfully applied these techniques over the Jazan province, in the Red Sea hills of Saudi Arabia. We first identified debris flows (10,334 locations) from high-spatial-resolution satellite datasets (e.g., GeoEye, OrbView) and verified a subset of these occurrences in the field. We then constructed a GIS to host the identified debris flow locations together with co-registered relevant data (e.g., lithology, elevation) and derived products (e.g., slope, normalized difference vegetation index). Spatial analysis of the datasets in the GIS indicated various degrees of correspondence between the distribution of debris flows and various variables (e.g., stream power index, topographic position index, normalized difference vegetation index, distance to stream, flow accumulation, slope and soil weathering index, aspect, elevation), suggesting a causal effect. For example, debris flows were found in areas with steep slopes, short distances to low-order streams, and low vegetation index. To evaluate the extent to which these factors control landslide distribution, we constructed and applied: (1) a stepwise input selection that tests all input combinations to make the final model more compact and effective, (2) a statistics-based binary logistic regression (BLR) model, and (3) a mathematics-based artificial neural network (ANN) model. Only 80% (8,267 locations) of the data were used for the construction of each of the models; the remaining samples (2,067 locations) were used for accuracy assessment. Results
Homwong, Nitipong; Diaz, Andres; Rossow, Stephanie; Ciarlet, Max; Marthaler, Douglas
2016-01-01
Rotaviruses (RV) are important causes of diarrhea in animals, especially in domestic animals. Of the 9 RV species, rotavirus A, B, and C (RVA, RVB, and RVC, respectively) had been established as important causes of diarrhea in pigs. The Minnesota Veterinary Diagnostic Laboratory receives swine stool samples from North America to determine the etiologic agents of disease. Between November 2009 and October 2011, 7,508 samples from pigs with diarrhea were submitted to determine if enteric pathogens, including RV, were present in the samples. All samples were tested for RVA, RVB, and RVC by real time RT-PCR. The majority of the samples (82%) were positive for RVA, RVB, and/or RVC. To better understand the risk factors associated with RV infections in swine diagnostic samples, three-level mixed-effects logistic regression models (3L-MLMs) were used to estimate associations among RV species, age, and geographical variability within the major swine production regions in North America. The conditional odds ratios (cORs) for RVA and RVB detection were lower for 1-3 day old pigs when compared to any other age group. However, the cOR of RVC detection in 1-3 day old pigs was significantly higher (p < 0.001) than pigs in the 4-20 days old and >55 day old age groups. Furthermore, pigs in the 21-55 day old age group had statistically higher cORs of RV co-detection compared to 1-3 day old pigs (p < 0.001). The 3L-MLMs indicated that RV status was more similar within states than among states or within each region. Our results indicated that 3L-MLMs are a powerful and adaptable tool to handle and analyze large-hierarchical datasets. In addition, our results indicated that, overall, swine RV epidemiology is complex, and RV species are associated with different age groups and vary by regions in North America. PMID:27145176
NASA Astrophysics Data System (ADS)
Ciprian Margarint, Mihai; Niculita, Mihai
2014-05-01
The regions with monoclinic geological structure are large portions of the earth's surface where the repetition of similar landform patterns is very distinct, the scarps of cuestas being characterized by similar values of morphometric variables. Landslides are associated with these cuesta scarps, and consequently very high landslide susceptibility values can be reported on their surfaces. In these regions, landslide susceptibility mapping can be carried out for the entire region, or for test areas with accurate, reliable, and available datasets concerning multi-temporal inventories and landslide predictors. Because of the similar geomorphology and landslide distribution, we think that if test areas have any relevance for extrapolating susceptibility models, these regions should be targeted first. This case study tries to establish how far the landslide predictor influences obtained for a 90 km2 sample located in the northern part of the Moldavian Plateau (NE Romania) can be used in other areas of the same physio-geographic region. In the first phase, a landslide susceptibility assessment was carried out and validated using a logistic regression (LR) approach and a multiple landslide inventory. This inventory was created using orthorectified aerial images from 1978 and 2005, with both old and active landslides considered for each period. The modeling strategy was based on a distinct inventory of the depletion areas of all landslides for the 1978 phase, and on 30 covariates extracted from topographical maps and aerial images (from both the 1978 and 2005 periods). The geomorphometric variables were computed from a Digital Elevation Model (DEM) obtained by interpolation from 1:5000 contour data (2.5 m equidistance) at 10x10 m resolution. Distance from the river network, distance from roads, and land use were extracted from topographic maps and aerial images. By applying the Akaike Information Criterion (AIC), the covariates with significance under the 0.001 level
Kjelstrom, L.C.
1995-01-01
Previously developed U.S. Geological Survey regional regression models of runoff and 11 chemical constituents were evaluated to assess their suitability for use in urban areas in Boise and Garden City. Data collected in the study area were used to develop adjusted regional models of storm-runoff volumes and mean concentrations and loads of chemical oxygen demand, dissolved and suspended solids, total nitrogen and total ammonia plus organic nitrogen as nitrogen, total and dissolved phosphorus, and total recoverable cadmium, copper, lead, and zinc. Explanatory variables used in these models were drainage area, impervious area, land-use information, and precipitation data. Mean annual runoff volume and loads at the five outfalls were estimated from 904 individual storms during 1976 through 1993. Two methods were used to compute individual storm loads. The first method used adjusted regional models of storm loads and the second used adjusted regional models for mean concentration and runoff volume. For large storms, the first method seemed to produce excessively high loads for some constituents and the second method provided more reliable results for all constituents except suspended solids. The first method provided more reliable results for large storms for suspended solids.
ERIC Educational Resources Information Center
Gómez-Benito, Juana; Hidalgo, Maria Dolores; Zumbo, Bruno D.
2013-01-01
The objective of this article was to find an optimal decision rule for identifying polytomous items with large or moderate amounts of differential functioning. The effectiveness of combining statistical tests with effect size measures was assessed using logistic discriminant function analysis and two effect size measures: R[superscript 2] and…
NASA Technical Reports Server (NTRS)
Duda, David P.; Minnis, Patrick
2009-01-01
Straightforward application of the Schmidt-Appleman contrail formation criteria to diagnose persistent contrail occurrence from numerical weather prediction data is hindered by significant bias errors in the upper tropospheric humidity. Logistic models of contrail occurrence have been proposed to overcome this problem, but basic questions remain about how random measurement error may affect their accuracy. A set of 5000 synthetic contrail observations is created to study the effects of random error in these probabilistic models. The simulated observations are based on distributions of temperature, humidity, and vertical velocity derived from Advanced Regional Prediction System (ARPS) weather analyses. The logistic models created from the simulated observations were evaluated using two common statistical measures of model accuracy, the percent correct (PC) and the Hanssen-Kuipers discriminant (HKD). To convert the probabilistic results of the logistic models into a dichotomous yes/no choice suitable for the statistical measures, two critical probability thresholds are considered. The HKD scores are higher when the climatological frequency of contrail occurrence is used as the critical threshold, while the PC scores are higher when the critical probability threshold is 0.5. For both thresholds, typical random errors in temperature, relative humidity, and vertical velocity are found to be small enough to allow for accurate logistic models of contrail occurrence. The accuracy of the models developed from synthetic data is over 85 percent for both the prediction of contrail occurrence and non-occurrence, although in practice, larger errors would be anticipated.
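The two skill scores used above come directly from a 2x2 contingency table of forecasts against observations: percent correct is the fraction of correct yes/no calls, and the Hanssen-Kuipers discriminant is the probability of detection minus the probability of false detection. A small sketch (the counts are invented for illustration):

```python
def pc_and_hkd(hits, misses, false_alarms, correct_negs):
    """Percent correct (PC) and Hanssen-Kuipers discriminant (HKD)
    from the four cells of a dichotomous forecast contingency table.

    HKD = POD - POFD, i.e. hit rate minus false-alarm rate; it ranges
    from -1 to 1, with 0 meaning no skill over random forecasts."""
    n = hits + misses + false_alarms + correct_negs
    pc = 100.0 * (hits + correct_negs) / n
    pod = hits / (hits + misses)                      # probability of detection
    pofd = false_alarms / (false_alarms + correct_negs)
    return pc, pod - pofd

# Hypothetical verification of 100 contrail-occurrence forecasts:
pc, hkd = pc_and_hkd(40, 10, 5, 45)   # PC = 85.0, HKD = 0.8 - 0.1 = 0.7
```

Sweeping the critical probability threshold of a logistic model changes the four cell counts, which is why the PC-maximising and HKD-maximising thresholds generally differ, as the abstract notes.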
2013-01-01
Background Adherence is one of the most important determinants of viral suppression and drug resistance in HIV-infected people receiving antiretroviral therapy (ART). Methods We examined the association between long-term mortality and poor adherence to ART in DART trial participants in Uganda and Zimbabwe randomly assigned to receive laboratory and clinical monitoring (LCM) or clinically driven monitoring (CDM). Since over 50% of all deaths in the DART trial occurred during the first year on ART, we focussed on participants continuing ART for 12 months to investigate the implications of longer-term adherence to treatment on mortality. Participants' ART adherence was assessed by pill counts and structured questionnaires at 4-weekly clinic visits. We studied the effect of recent adherence history on the risk of death at the individual level (odds ratios from a dynamic logistic regression model) and on mortality at the population level (population attributable fraction based on this model). Analyses were conducted separately for both randomization groups, adjusted for relevant confounding factors. Adherence behaviour was also confounded by a partial factorial randomization comparing structured treatment interruptions (STI) with continuous ART (CT). Results In the CDM arm a significant association was found between poor adherence to ART in the previous 3-9 months and increased mortality risk. In the LCM arm the association was not significant. The odds ratios for mortality in participants with poor adherence versus those with optimal adherence were 1.30 (95% CI 0.78, 2.10) in the LCM arm and 2.18 (1.47, 3.22) in the CDM arm. The estimated proportions of deaths that could have been avoided with optimal adherence (population attributable fraction) in the LCM and CDM groups during the 5-year follow-up period were 16.0% (95% CI 0.7%, 31.6%) and 33.1% (20.5%, 44.8%), respectively. Conclusions Recurrent poor adherence determined even through simple measures is associated
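A population attributable fraction like the one reported above can be approximated, for a rare outcome, by Miettinen's case-based formula PAF = p_c(OR − 1)/OR, where p_c is the proportion of cases (here, deaths) with the exposure (poor adherence). A hedged sketch: only the OR of 2.18 comes from the abstract; the 60% exposure proportion among deaths is hypothetical, and the paper's model-based estimate accounts for confounders that this one-line formula ignores.

```python
def paf(prop_cases_exposed, odds_ratio):
    """Miettinen's case-based formula: PAF = p_c * (OR - 1) / OR,
    where p_c is the proportion of cases with the exposure and OR
    approximates the relative risk when the outcome is rare."""
    return prop_cases_exposed * (odds_ratio - 1.0) / odds_ratio

# CDM-arm OR from the abstract; the 0.60 exposure proportion is invented:
example = paf(0.60, 2.18)
```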
Asquith, William H.; Roussel, Meghan C.
2009-01-01
Annual peak-streamflow frequency estimates are needed for flood-plain management; for objective assessment of flood risk; for cost-effective design of dams, levees, and other flood-control structures; and for design of roads, bridges, and culverts. Annual peak-streamflow frequency represents the peak streamflow for nine recurrence intervals of 2, 5, 10, 25, 50, 100, 200, 250, and 500 years. Common methods for estimation of peak-streamflow frequency for ungaged or unmonitored watersheds are regression equations for each recurrence interval developed for one or more regions; such regional equations are the subject of this report. The method is based on analysis of annual peak-streamflow data from U.S. Geological Survey streamflow-gaging stations (stations). Beginning in 2007, the U.S. Geological Survey, in cooperation with the Texas Department of Transportation and in partnership with Texas Tech University, began a 3-year investigation concerning the development of regional equations to estimate annual peak-streamflow frequency for undeveloped watersheds in Texas. The investigation focuses primarily on 638 stations with 8 or more years of data from undeveloped watersheds and other criteria. The general approach is explicitly limited to the use of L-moment statistics, which are used in conjunction with a technique of multi-linear regression referred to as PRESS minimization. The approach used to develop the regional equations, which was refined during the investigation, is referred to as the 'L-moment-based, PRESS-minimized, residual-adjusted approach'. For the approach, seven unique distributions are fit to the sample L-moments of the data for each of 638 stations and trimmed means of the seven results of the distributions for each recurrence interval are used to define the station specific, peak-streamflow frequency. As a first iteration of regression, nine weighted-least-squares, PRESS-minimized, multi-linear regression equations are computed using the watershed
NASA Technical Reports Server (NTRS)
Duda, David P.; Minnis, Patrick
2009-01-01
Previous studies have shown that probabilistic forecasting may be a useful method for predicting persistent contrail formation. A probabilistic forecast to accurately predict contrail formation over the contiguous United States (CONUS) is created by using meteorological data based on hourly meteorological analyses from the Advanced Regional Prediction System (ARPS) and from the Rapid Update Cycle (RUC) as well as GOES water vapor channel measurements, combined with surface and satellite observations of contrails. Two groups of logistic models were created. The first group of models (SURFACE models) is based on surface-based contrail observations supplemented with satellite observations of contrail occurrence. The second group of models (OUTBREAK models) is derived from a selected subgroup of satellite-based observations of widespread persistent contrails. The mean accuracies for both the SURFACE and OUTBREAK models typically exceeded 75 percent when based on the RUC or ARPS analysis data, but decreased when the logistic models were derived from ARPS forecast data.
NASA Astrophysics Data System (ADS)
Roşca, S.; Bilaşco, Ş.; Petrea, D.; Fodorean, I.; Vescan, I.; Filip, S.; Măguţ, F.-L.
2015-11-01
The existence of a large number of GIS models for the identification of landslide occurrence probability makes the selection of a specific one difficult. The present study focuses on the application of two quantitative models, the logistic and the BSA models, and the comparative analysis of their results aims at identifying the most suitable one. The territory corresponding to the Niraj Mic Basin (87 km2) is characterised by a wide variety of landforms, with their morphometric, morphographic and geological characteristics, as well as by a high complexity of land use types where active landslides exist. For this reason it serves as the test area for applying the two models and comparing their results. The large complexity of input variables is illustrated by 16 factors, represented as 72 dummy variables and analysed on the basis of their importance within the model structures. Testing the statistical significance of each variable reduced the number of dummy variables considered significant for the test area within the logistic model to 12, whereas for the BSA model all the variables were employed. The predictive ability of the models was tested by computing the area under the ROC curve, which indicated good accuracy (AUROC = 0.86 for the testing area) and predictability of the logistic model (AUROC = 0.63 for the validation area).
ERIC Educational Resources Information Center
Davidson, J. Cody
2016-01-01
Mathematics is the most common subject area of remedial need, and the majority of remedial math students never pass a college-level credit-bearing math class. The majority of studies that investigate this phenomenon are conducted at community colleges and use some type of regression model; however, none have used a continuation ratio model. The…
Robertson, D.M.; Saad, D.A.; Heisey, D.M.
2006-01-01
Various approaches are used to subdivide large areas into regions containing streams that have similar reference or background water quality and that respond similarly to different factors. For many applications, such as establishing reference conditions, it is preferable to use physical characteristics that are not affected by human activities to delineate these regions. However, most approaches, such as ecoregion classifications, rely on land use to delineate regions or have difficulties compensating for the effects of land use. Land use not only directly affects water quality, but it is often correlated with the factors used to define the regions. In this article, we describe modifications to SPARTA (spatial regression-tree analysis), a relatively new approach applied to water-quality and environmental characteristic data to delineate zones with similar factors affecting water quality. In this modified approach, land-use-adjusted (residualized) water quality and environmental characteristics are computed for each site. Regression-tree analysis is applied to the residualized data to determine the most statistically important environmental characteristics describing the distribution of a specific water-quality constituent. Geographic information for small basins throughout the study area is then used to subdivide the area into relatively homogeneous environmental water-quality zones. For each zone, commonly used approaches are subsequently used to define its reference water quality and how its water quality responds to changes in land use. SPARTA is used to delineate zones of similar reference concentrations of total phosphorus and suspended sediment throughout the upper Midwestern part of the United States. © 2006 Springer Science+Business Media, Inc.
Lee, J H; Landrum, P F; Field, L J; Koh, C H
2001-09-01
Two models, a Σpolycyclic aromatic hydrocarbon (ΣPAH) model based on equilibrium partitioning theory and a logistic-regression model, were developed and evaluated to predict sediment-associated PAH toxicity to Hyalella azteca. The ΣPAH model was applied to freshwater sediments. This study is the first attempt to use a ΣPAH model based on water-only, median lethal concentration (LC50) toxic unit (TU) values for sediment-associated PAH mixtures and to apply it to freshwater sediments. To predict the toxicity (i.e., mortality) from contaminated sediments to H. azteca, an interstitial water TU, calculated as the ambient interstitial water concentration divided by the water-only LC50, in which the interstitial water concentrations were predicted by equilibrium partitioning theory, was used. Assuming additive toxicity for PAH, the sum of TUs was calculated to predict the total toxicity of PAH mixtures in sediments. The ΣPAH model was developed from 10- and 14-d H. azteca water-only LC50 values. To obtain estimates of LC50 values for a wide range of PAHs, a quantitative structure-activity relationship (QSAR) model (log LC50 - log Kow) with a constant slope was derived using the time-variable LC50 values for four PAH congeners. The logistic-regression model was derived to assess the concentration-response relationship for field sediments, which showed that 1.3 (0.6-3.9) TU were required for a 50% probability that a sediment was toxic. The logistic-regression model reflects both the effects of co-occurring contaminants (i.e., nonmeasured PAH and unknown pollutants) and the overestimation of exposure to sediment-associated PAH. An apparent site-specific bioavailability limitation of sediment-associated PAH was found for a site contaminated by creosote. At this site, no toxic samples had less than 3.9 TU. Finally, the predictability of the ΣPAH model can be affected by species-specific responses (Hyalella vs. Rhepoxynius); chemical specific (PAH vs DDT in
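The toxic-unit arithmetic underlying this kind of ΣPAH model is straightforward: each congener contributes TU_i = C_iw / LC50_i, with the LC50 supplied by a QSAR of the form log10(LC50) = a + b·log10(Kow), and additivity means summing the TUs across the mixture. A sketch under those assumptions; the slope and intercept below are hypothetical placeholders, not the study's fitted constants.

```python
def lc50_from_qsar(log_kow, intercept, slope=-0.94):
    """QSAR: log10(LC50) = intercept + slope * log10(Kow).
    The default slope of -0.94 is a hypothetical illustration;
    hydrophobicity (higher Kow) lowers the LC50, i.e. raises toxicity."""
    return 10.0 ** (intercept + slope * log_kow)

def sum_toxic_units(congeners, intercept):
    """Sum of interstitial-water toxic units, TU_i = C_iw / LC50_i,
    assuming additive toxicity across the PAH mixture.

    congeners: iterable of (interstitial-water concentration, log10 Kow)."""
    return sum(c_iw / lc50_from_qsar(log_kow, intercept)
               for c_iw, log_kow in congeners)
```

Under this scheme, a sediment with a summed TU near 1 sits at the predicted water-only LC50, which is why the logistic-regression finding of roughly 1.3 TU for a 50% probability of toxicity is interpreted as broadly consistent with additivity.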
Detecting Heterogeneity in Logistic Regression Models
ERIC Educational Resources Information Center
Balazs, Katalin; Hidegkuti, Istvan; De Boeck, Paul
2006-01-01
In the context of item response theory, it is not uncommon that person-by-item data are correlated beyond the correlation that is captured by the model--in other words, there is extra binomial variation. Heterogeneity of the parameters can explain this variation. There is a need for proper statistical methods to indicate possible extra…
Trashball: A Logistic Regression Classroom Activity
ERIC Educational Resources Information Center
Morrell, Christopher H.; Auer, Richard E.
2007-01-01
In the early 1990s, the National Science Foundation funded many research projects for improving statistical education. Many of these stressed the need for classroom activities that illustrate important issues of designing experiments, generating quality data, fitting models, and performing statistical tests. Our paper describes such an activity on…
Foster, Guy M.; Graham, Jennifer L.
2016-04-06
The Kansas River is a primary source of drinking water for about 800,000 people in northeastern Kansas. Source-water supplies are treated by a combination of chemical and physical processes to remove contaminants before distribution. Advanced notification of changing water-quality conditions and of cyanobacteria and associated toxin and taste-and-odor compounds gives drinking-water treatment facilities time to develop and implement adequate treatment strategies. The U.S. Geological Survey (USGS), in cooperation with the Kansas Water Office (funded in part through the Kansas State Water Plan Fund), and the City of Lawrence, the City of Topeka, the City of Olathe, and Johnson County Water One, began a study in July 2012 to develop statistical models at two Kansas River sites located upstream from drinking-water intakes. Continuous water-quality monitors have been operated and discrete water-quality samples have been collected on the Kansas River at Wamego (USGS site number 06887500) and De Soto (USGS site number 06892350) since July 2012. Continuous and discrete water-quality data collected during July 2012 through June 2015 were used to develop statistical models for constituents of interest at the Wamego and De Soto sites. Logistic models to continuously estimate the probability of occurrence above selected thresholds were developed for cyanobacteria, microcystin, and geosmin. Linear regression models to continuously estimate constituent concentrations were developed for major ions, dissolved solids, alkalinity, nutrients (nitrogen and phosphorus species), suspended sediment, indicator bacteria (Escherichia coli, fecal coliform, and enterococci), and actinomycetes bacteria. These models will be used to provide real-time estimates of the probability that cyanobacteria and associated compounds exceed thresholds and of the concentrations of other water-quality constituents in the Kansas River. The models documented in this report are useful for characterizing changes
Classifying hospitals as mortality outliers: logistic versus hierarchical logistic models.
Alexandrescu, Roxana; Bottle, Alex; Jarman, Brian; Aylin, Paul
2014-05-01
The use of hierarchical logistic regression for provider profiling has been recommended because of the clustering of patients within hospitals, but it has some associated difficulties. We assess changes in hospital outlier status based on standard logistic versus hierarchical logistic modelling of mortality. The study population consisted of all patients admitted to acute, non-specialist hospitals in England between 2007 and 2011 with a primary diagnosis of acute myocardial infarction, acute cerebrovascular disease or fracture of neck of femur, or a primary procedure of coronary artery bypass graft or repair of abdominal aortic aneurysm. We compared standardised mortality ratios (SMRs) from non-hierarchical models with SMRs from hierarchical models, without and with shrinkage estimates of the predicted probabilities (Model 1 and Model 2). The SMRs from standard logistic and hierarchical models were highly significantly correlated (r > 0.91, p = 0.01). More outliers were recorded with standard logistic regression than with hierarchical modelling only when shrinkage estimates were used (Model 2): out of a cumulative 565 pairs of hospitals under study, 21 hospitals changed from low outlier and 8 hospitals from high outlier under logistic regression to not-an-outlier under the shrinkage estimates. Both standard logistic and hierarchical modelling identified nearly the same hospitals as mortality outliers. The choice of methodological approach should, however, also consider whether the modelling aim is judgment or improvement, as shrinkage may be more appropriate for the former than the latter. PMID:24711175
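The contrast between the two models comes down to shrinkage: a hospital's SMR (observed/expected deaths) is pulled toward the overall average, and more strongly for hospitals with few expected deaths. The sketch below is deliberately simplified and hypothetical; real hierarchical models estimate the shrinkage weight from the between-hospital variance rather than using a fixed tuning constant.

```python
def smr(observed, expected):
    """Standardised mortality ratio: observed deaths / expected deaths."""
    return observed / expected

def shrunken_smr(observed, expected, prior_strength=10.0):
    """Empirical-Bayes-style shrinkage sketch: pull the raw SMR toward
    the reference value 1, with a weight that grows with the hospital's
    expected count. `prior_strength` is a hypothetical tuning constant."""
    w = expected / (expected + prior_strength)
    return w * smr(observed, expected) + (1.0 - w) * 1.0
```

Because small hospitals are shrunk hardest, a small hospital flagged as an outlier by the raw SMR can fall back inside the control limits after shrinkage, which is the mechanism behind the 29 status changes reported above.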
Leffondré, Karen; Jager, Kitty J; Boucquemont, Julie; Stel, Vianda S; Heinze, Georg
2014-10-01
Regression models are used to quantify the effect of an exposure on an outcome while adjusting for potential confounders. Although the type of regression model is determined by the nature of the outcome variable, e.g. linear regression has to be applied for continuous outcome variables, all regression models can handle any kind of exposure variable. However, some fundamentals of representing the exposure in a regression model, and also some potential pitfalls, have to be kept in mind in order to obtain a meaningful interpretation of the results. The objective of this educational paper was to illustrate these fundamentals and pitfalls, using various multiple regression models applied to data from a hypothetical cohort of 3000 patients with chronic kidney disease. In particular, we illustrate how to represent different types of exposure variables (binary, categorical with two or more categories, and continuous), and how to interpret the regression coefficients in linear, logistic and Cox models. We also discuss the linearity assumption in these models, and show how wrongly assuming linearity may produce biased results and how flexible modelling using spline functions may provide better estimates.
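The coefficient interpretation discussed above can be made concrete: in a logistic model, exponentiating a coefficient gives an odds ratio. The coefficient value below is hypothetical, chosen only to illustrate the arithmetic:

```python
import math

# Interpreting a fitted logistic coefficient: for a binary exposure with
# coefficient b1 on the log-odds scale (hypothetical value below),
# exp(b1) is the odds ratio comparing exposed to unexposed.
b1 = 0.693
odds_ratio = math.exp(b1)
print(round(odds_ratio, 2))  # 2.0: exposure roughly doubles the odds
```

The same exponentiation applies to Cox coefficients (hazard ratios), whereas linear-model coefficients are read directly as differences in the outcome's mean.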
NASA Astrophysics Data System (ADS)
Liberman, Neomi; Ben-David Kolikant, Yifat; Beeri, Catriel
2012-09-01
Due to a program reform in Israel, experienced CS high-school teachers faced the need to master and teach a new programming paradigm. This situation served as an opportunity to explore the relationship between teachers' content knowledge (CK) and their pedagogical content knowledge (PCK). This article focuses on three case studies, with emphasis on one of them. Using observations and interviews, we examine how the teachers we observed taught and what development, if any, occurred in their teaching as a result of their teaching experience. Our findings suggest that this situation creates a new hybrid state of teachers, which we term "regressed experts." These teachers incorporate in their professional practice some elements typical of novices and some typical of experts. We also found that these teachers' experience, although established when teaching a different CK, serves as leverage for improving their knowledge and understanding of aspects of the new content.
Chai, Zhiguo; Chen, Jihua; Zhang, Shaofeng
2014-01-01
Background Removable dentures are subject to plaque and/or staining problems. Denture hygiene habits and risk factors differ among countries and regions. The aims of this study were to assess hygiene habits and denture plaque and staining risk factors in Chinese removable denture wearers aged >40 years in Xi’an through multiple logistic regression analysis (MLRA). Methods Questionnaires were administered to 222 patients whose removable dentures were examined clinically to assess wear status and levels of plaque and staining. Univariate analyses were performed to identify potential risk factors for denture plaque/staining. MLRA was performed to identify significant risk factors. Results Brushing (77.93%) was the most prevalent cleaning method in the present study. Only 16.4% of patients regularly used commercial cleansers. Most (81.08%) patients removed their dentures overnight. MLRA indicated that potential risk factors for denture plaque were the duration of denture use (reference, ≤0.5 years; 2.1–5 years: OR = 4.155, P = 0.001; >5 years: OR = 7.238, P<0.001) and cleaning method (reference, chemical cleanser; running water: OR = 7.081, P = 0.010; brushing: OR = 3.567, P = 0.005). Potential risk factors for denture staining were female gender (OR = 0.377, P = 0.013), smoking (OR = 5.471, P = 0.031), tea consumption (OR = 3.957, P = 0.002), denture scratching (OR = 4.557, P = 0.036), duration of denture use (reference, ≤0.5 years; 2.1–5 years: OR = 7.899, P = 0.001; >5 years: OR = 27.226, P<0.001), and cleaning method (reference, chemical cleanser; running water: OR = 29.184, P<0.001; brushing: OR = 4.236, P = 0.007). Conclusion Denture hygiene habits need further improvement. An understanding of the risk factors for denture plaque and staining may provide the basis for preventive efforts. PMID:24498369
2011-01-01
Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but at present has limited value in predicting progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, like Neural Networks, Support Vector Machines and Random Forests, can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (Median (Me) = 0.76) and area under the ROC curve (Me = 0.90). However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forests ranked second in overall accuracy (Me = 0.73), with high area under the ROC curve (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC curve (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed
Jiang, Honghua; Kulkarni, Pandurang M; Mallinckrodt, Craig H; Shurzinske, Linda; Molenberghs, Geert; Lipkovich, Ilya
2015-01-01
The benefits of adjusting for baseline covariates are not as straightforward with repeated binary responses as with continuous response variables. Therefore, in this study, we compared different methods for analyzing repeated binary data through simulations when the outcome at the study endpoint is of interest. Methods compared included chi-square, Fisher's exact test, covariate adjusted/unadjusted logistic regression (Adj.logit/Unadj.logit), covariate adjusted/unadjusted generalized estimating equations (Adj.GEE/Unadj.GEE), covariate adjusted/unadjusted generalized linear mixed model (Adj.GLMM/Unadj.GLMM). All these methods preserved the type I error close to the nominal level. Covariate adjusted methods improved power compared with the unadjusted methods because of the increased treatment effect estimates, especially when the correlation between the baseline and outcome was strong, even though there was an apparent increase in standard errors. Results of the Chi-squared test were identical to those for the unadjusted logistic regression. Fisher's exact test was the most conservative test regarding the type I error rate and also with the lowest power. Without missing data, there was no gain in using a repeated measures approach over a simple logistic regression at the final time point. Analysis of results from five phase III diabetes trials of the same compound was consistent with the simulation findings. Therefore, covariate adjusted analysis is recommended for repeated binary data when the study endpoint is of interest. PMID:25866149
NASA Astrophysics Data System (ADS)
Wu, Chunhung
2016-04-01
Few studies have discussed the applicability of statistical landslide susceptibility (LS) models to extreme rainfall-induced landslide events. This research focuses on the comparison and applicability of LS models based on four methods, including landslide ratio-based logistic regression (LRBLR), frequency ratio (FR), weight of evidence (WOE), and instability index (II) methods, in an extreme rainfall-induced landslide case. The landslide inventory in the Chishan river watershed, Southwestern Taiwan, after 2009 Typhoon Morakot is the main material in this research. The Chishan river watershed is a tributary of the Kaoping river watershed, a landslide- and erosion-prone watershed with an annual average suspended load of 3.6×10^7 MT/yr (ranking 11th in the world). Typhoon Morakot struck Southern Taiwan from Aug. 6-10 in 2009 and dumped nearly 2,000 mm of rainfall on the Chishan river watershed. The 24-hour, 48-hour, and 72-hour accumulated rainfall in the Chishan river watershed exceeded the 200-year return period accumulated rainfall. 2,389 landslide polygons in the Chishan river watershed were extracted from SPOT 5 images after 2009 Typhoon Morakot. The total landslide area is around 33.5 km2, equal to a landslide ratio of 4.1%. The main landslide types based on Varnes' (1978) classification are rotational and translational slides. The two characteristics of this extreme rainfall-induced landslide event are dense landslide distribution and a large share of downslope landslide areas owing to headward erosion and bank erosion during the flooding processes. The area of downslope landslides in the Chishan river watershed after 2009 Typhoon Morakot is 3.2 times that of upslope landslide areas. The prediction accuracy of the LS models based on the LRBLR, FR, WOE, and II methods has been proven to exceed 70%. The model performance and applicability of the four models in a landslide-prone watershed with dense distribution of rainfall
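Of the four methods compared above, the frequency ratio is the simplest to state: the landslide density within a factor class divided by the density over the whole watershed. A sketch using the reported 33.5 km2 landslide area and 4.1% ratio (implying roughly an 817 km2 watershed); the slope-class figures are hypothetical:

```python
def frequency_ratio(ls_area_in_class, class_area, ls_area_total, total_area):
    """FR method: landslide density within a factor class divided by the
    watershed-wide landslide density; FR > 1 marks a landslide-prone class."""
    return (ls_area_in_class / class_area) / (ls_area_total / total_area)

# hypothetical steep-slope class: 2.0 of its 10.0 km2 slid
print(round(frequency_ratio(2.0, 10.0, 33.5, 817.0), 2))  # ~4.88
```

Summing the FR values of a cell's factor classes gives its susceptibility index, which is then thresholded to map susceptible zones.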
Killeen, Peter R
2015-07-01
The generalized matching law (GML) is reconstructed as a logistic regression equation that privileges no particular value of the sensitivity parameter, a. That value will often approach 1 due to the feedback that drives switching that is intrinsic to most concurrent schedules. A model of that feedback reproduced some features of concurrent data. The GML is a law only in the strained sense that any equation that maps data is a law. The machine under the hood of matching is in all likelihood the very law that was displaced by the Matching Law. It is now time to return the Law of Effect to centrality in our science.
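In log-ratio form the GML reads log(B1/B2) = a·log(R1/R2) + log b, which is what licenses the logistic-regression reconstruction above. A sketch of the power-function form, with a as the sensitivity parameter; the reinforcement rates are illustrative:

```python
import math

def gml_behavior_ratio(r1, r2, a=1.0, b=1.0):
    """Generalized matching law: B1/B2 = b * (R1/R2)**a, equivalently
    log(B1/B2) = a*log(R1/R2) + log(b); a is sensitivity, b is bias."""
    return b * (r1 / r2) ** a

# strict matching (a = 1, b = 1): behavior ratio equals reinforcement ratio
print(gml_behavior_ratio(60, 20, a=1.0))  # 3.0
print(round(gml_behavior_ratio(60, 20, a=0.8), 2))  # undermatching, < 3
```

The abstract's point is that the switching feedback intrinsic to concurrent schedules tends to push the fitted a toward 1, so near-unit sensitivity is a product of the procedure rather than evidence for matching as a fundamental law.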
ERIC Educational Resources Information Center
Zumbo, Bruno D.; Ochieng, Charles O.
Many measures found in educational research are ordered categorical response variables that are empirical realizations of an underlying normally distributed variate. These ordered categorical variables are commonly referred to as Likert or rating scale data. Regression models are commonly fit using these ordered categorical variables as the…
J. Richard Hess; Kevin L. Kenney; William A. Smith; Ian Bonner; David J. Muth
2015-04-01
Equipment manufacturers have made rapid improvements in biomass harvesting and handling equipment. These improvements have increased transportation and handling efficiencies due to higher biomass densities and reduced losses. Improvements in grinder efficiencies and capacity have reduced biomass grinding costs. Biomass collection efficiencies (the ratio of biomass collected to the amount available in the field) as high as 75% for crop residues and greater than 90% for perennial energy crops have also been demonstrated. However, as collection rates increase, the fraction of entrained soil in the biomass increases, and high biomass residue removal rates can violate agronomic sustainability limits. Advancements in quantifying multi-factor sustainability limits to increase removal rates, as guided by sustainable residue removal plans, and in mitigating soil contamination through targeted removal rates based on soil type and residue type/fraction, are allowing the use of new high-efficiency harvesting equipment and methods. As another consideration, single-pass harvesting and other technologies that improve harvesting costs create biomass storage moisture-management challenges, which are further compounded by annual variability in biomass moisture content. Monitoring, sampling, simulation, and analysis provide the basis for moisture, time, and quality relationships in storage, which has allowed the development of moisture-tolerant storage systems and best management processes that combine moisture content and time to accommodate baled storage of wet material based upon “shelf-life.” The key to improving biomass supply logistics costs has been developing the associated agronomic sustainability and biomass quality technologies and processes that allow the implementation of equipment engineering solutions.
Quality Reporting of Multivariable Regression Models in Observational Studies
Real, Jordi; Forné, Carles; Roso-Llorach, Albert; Martínez-Sánchez, Jose M.
2016-01-01
Abstract Controlling for confounders is a crucial step in analytical observational studies, and multivariable models are widely used as statistical adjustment techniques. However, the validation of the assumptions of the multivariable regression models (MRMs) should be made clear in scientific reporting. The objective of this study is to review the quality of statistical reporting of the most commonly used MRMs (logistic, linear, and Cox regression) that were applied in analytical observational studies published between 2003 and 2014 by journals indexed in MEDLINE. We reviewed a representative sample of articles indexed in MEDLINE (n = 428) with an observational design and use of MRMs (logistic, linear, and Cox regression). We assessed the quality of reporting of: model assumptions and goodness-of-fit, interactions, sensitivity analysis, crude and adjusted effect estimates, and specification of more than 1 adjusted model. The tests of underlying assumptions or goodness-of-fit of the MRMs used were described in 26.2% (95% CI: 22.0–30.3) of the articles, and 18.5% (95% CI: 14.8–22.1) reported the interaction analysis. Reporting of all items assessed was higher in articles published in journals with a higher impact factor. A low percentage of articles indexed in MEDLINE that used multivariable techniques provided information demonstrating rigorous application of the model selected as an adjustment method. Given the importance of these methods to the final results and conclusions of observational studies, greater rigor is required in reporting the use of MRMs in the scientific literature. PMID:27196467
Country logistics performance and disaster impact.
Vaillancourt, Alain; Haavisto, Ira
2016-04-01
The aim of this paper is to deepen the understanding of the relationship between country logistics performance and disaster impact. The relationship is analysed through correlation analysis and regression models for 117 countries for the years 2007 to 2012, with disaster impact variables from the International Disaster Database (EM-DAT) and logistics performance indicators from the World Bank. The results show a significant relationship between country logistics performance and disaster impact overall and for five out of six specific logistics performance indicators. These specific indicators were further used to explore the relationship between country logistics performance and disaster impact for three specific disaster types (epidemic, flood and storm). The findings enhance the understanding of the role of logistics in a humanitarian context with empirical evidence of the importance of country logistics performance in disaster response operations.
ERIC Educational Resources Information Center
Matson, Johnny L.; Kozlowski, Alison M.
2010-01-01
Autistic regression is one of the many mysteries in the developmental course of autism and pervasive developmental disorders not otherwise specified (PDD-NOS). Various definitions of this phenomenon have been used, further clouding the study of the topic. Despite this problem, some efforts at establishing prevalence have been made. The purpose of…
Logistic Stick-Breaking Process
Ren, Lu; Du, Lan; Carin, Lawrence; Dunson, David B.
2013-01-01
A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. PMID:25258593
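The stick-breaking construction at the heart of the LSBP can be sketched directly: each logistic regression yields a break probability, and the mixture weight for segment k is that probability times the stick mass left over from earlier breaks. A minimal sketch with constant logits standing in for the covariate-dependent logistic regressions:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def stick_breaking_weights(logits):
    """Logistic stick-breaking: w_k = sigma(l_k) * prod_{j<k} (1 - sigma(l_j)).
    In the LSBP each l_k would come from a logistic regression on the
    datum's spatial/temporal covariates; here they are fixed numbers."""
    weights, remaining = [], 1.0
    for l in logits:
        p = sigmoid(l)
        weights.append(p * remaining)
        remaining *= (1.0 - p)
    weights.append(remaining)  # leftover stick mass goes to the last segment
    return weights

w = stick_breaking_weights([0.0, 0.0])
print([round(x, 2) for x in w])  # [0.5, 0.25, 0.25]
```

Because the logits vary smoothly with location, nearby data receive similar weight vectors, which is what produces the contiguous, sharply bounded segments reported above.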
Turner, Elizabeth L.; Perel, Pablo; Clayton, Tim; Edwards, Phil; Hernández, Adrian V.; Roberts, Ian; Shakur, Haleema; Steyerberg, Ewout W.
2013-01-01
Objective We aimed to determine to what extent covariate adjustment could affect power in a randomized controlled trial (RCT) of a heterogeneous population with traumatic brain injury (TBI). Study Design and Setting We analyzed 14-day mortality in 9497 participants in the Corticosteroid Randomisation After Significant Head Injury (CRASH) RCT of corticosteroid vs. placebo. Adjustment was made using logistic regression for baseline covariates of two validated risk models derived from external data (IMPACT) and from the CRASH data. The relative sample size (RESS) measure, defined as the ratio of the sample size required by an adjusted analysis to attain the same power as the unadjusted reference analysis, was used to assess the impact of adjustment. Results Corticosteroid was associated with higher mortality compared to placebo (OR=1.25, 95% CI: 1.13, 1.39). RESS of 0.79 and 0.73 were obtained by adjustment using the IMPACT and CRASH models, respectively, which for example implies an increase from 80% to 88% and 91% power, respectively. Conclusion Moderate gains in power may be obtained using covariate adjustment from logistic regression in heterogeneous conditions such as TBI. Although analyses of RCTs might consider covariate adjustment to improve power, we caution against this approach in the planning of RCTs. PMID:22169080
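The RESS measure above is a simple ratio; the sample sizes below are hypothetical values chosen to reproduce the reported 0.79, not figures from the CRASH analysis:

```python
def ress(n_adjusted, n_unadjusted):
    """Relative sample size: the n an adjusted analysis needs to reach the
    same power as the unadjusted analysis, as a fraction of the unadjusted n.
    Values below 1 mean covariate adjustment buys power."""
    return n_adjusted / n_unadjusted

print(round(ress(7503, 9497), 2))  # 0.79, as for the IMPACT-adjusted model
```

Reading it the other way, a RESS of 0.79 means the adjusted analysis of the full 9497 participants behaves like an unadjusted trial about 27% larger, hence the rise from 80% to 88% power.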
Huang, Dong; Cabral, Ricardo; De la Torre, Fernando
2016-02-01
Discriminative methods (e.g., kernel regression, SVM) have been extensively used to solve problems such as object recognition, image alignment and pose estimation from images. These methods typically map image features ( X) to continuous (e.g., pose) or discrete (e.g., object category) values. A major drawback of existing discriminative methods is that samples are directly projected onto a subspace and hence fail to account for outliers common in realistic training sets due to occlusion, specular reflections or noise. It is important to notice that existing discriminative approaches assume the input variables X to be noise free. Thus, discriminative methods experience significant performance degradation when gross outliers are present. Despite its obvious importance, the problem of robust discriminative learning has been relatively unexplored in computer vision. This paper develops the theory of robust regression (RR) and presents an effective convex approach that uses recent advances on rank minimization. The framework applies to a variety of problems in computer vision including robust linear discriminant analysis, regression with missing data, and multi-label classification. Several synthetic and real examples with applications to head pose estimation from images, image and video classification and facial attribute classification with missing data are used to illustrate the benefits of RR. PMID:26761740
Nose, Hirohisa; Saito, Mineki; Usuku, Koichiro; Sabouri, Amir H; Matsuzaki, Toshio; Kubota, Ryuji; Eiraku, Nobutaka; Furukawa, Yoshitaka; Izumo, Shuji; Arimura, Kimiyoshi; Osame, Mitsuhiro
2006-06-01
The authors have previously developed a logistic regression equation to predict the odds that a human T-cell lymphotropic virus type 1 (HTLV-1)-infected individual of specified genotype, age, and provirus load has HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP) in southern Japan. This study evaluated whether this equation is a useful predictor for monitoring asymptomatic HTLV-1-seropositive carriers (HCs) in the same population. The authors genotyped 181 HCs for each HAM/TSP-associated gene (tumor necrosis factor [TNF]-alpha-863A/C, stromal cell-derived factor 1 (SDF-1) +801G/A, human leukocyte antigen [HLA]-A*02, HLA-Cw*08, HTLV-1 tax subgroup) and measured HTLV-1 provirus load in peripheral blood mononuclear cells using real-time polymerase chain reaction (PCR). Finally, the odds of HAM/TSP for each subject were calculated using the equation, and the results were compared with clinical symptoms and laboratory findings. Although no clear association was seen between the odds of HAM/TSP and sex, family history of HAM/TSP or adult T-cell leukemia (ATL), or history of blood transfusion, it was found that brisk patellar deep tendon reflexes, which suggest latent central nervous system compromise, and flower-cell-like abnormal lymphocytes, which are the morphological characteristic of ATL cells, were associated with higher odds of HAM/TSP. The best-fit logistic regression equation may be useful for detecting subclinical abnormalities in HCs in southern Japan.
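A prediction equation of this kind combines the risk factors linearly on the log-odds scale and exponentiates. Every coefficient below is a made-up placeholder, not the published Japanese-cohort estimate; the sketch only shows the shape of the calculation:

```python
import math

def ham_tsp_odds(log_provirus_load, age, n_risk_alleles,
                 b0=-5.0, b_load=1.2, b_age=0.03, b_gene=0.5):
    """Sketch of a logistic prediction equation for HAM/TSP odds.
    All coefficients are hypothetical illustrations."""
    logit = (b0 + b_load * log_provirus_load
             + b_age * age + b_gene * n_risk_alleles)
    return math.exp(logit)  # odds; odds/(1+odds) gives the probability

odds = ham_tsp_odds(log_provirus_load=3.0, age=50, n_risk_alleles=1)
prob = odds / (1 + odds)
```

Applied to an asymptomatic carrier, a high computed odds value would flag the subject for closer monitoring, which is the use the study evaluates.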
Adolescent suicide attempts and adult adjustment
Brière, Frédéric N.; Rohde, Paul; Seeley, John R.; Klein, Daniel; Lewinsohn, Peter M.
2014-01-01
Background Adolescent suicide attempts are disproportionately prevalent and frequently of low severity, raising questions regarding their long-term prognostic implications. In this study, we examined whether adolescent attempts were associated with impairments related to suicidality, psychopathology, and psychosocial functioning in adulthood (objective 1) and whether these impairments were better accounted for by concurrent adolescent confounders (objective 2). Method 816 adolescents were assessed using interviews and questionnaires at four time points from adolescence to adulthood. We examined whether lifetime suicide attempts in adolescence (by T2, mean age 17) predicted adult outcomes (by T4, mean age 30) using linear and logistic regressions in unadjusted models (objective 1) and adjusting for sociodemographic background, adolescent psychopathology, and family risk factors (objective 2). Results In unadjusted analyses, adolescent suicide attempts predicted poorer adjustment on all outcomes, except those related to social role status. After adjustment, adolescent attempts remained predictive of axis I and II psychopathology (anxiety disorder, antisocial and borderline personality disorder symptoms), global and social adjustment, risky sex, and psychiatric treatment utilization. However, adolescent attempts no longer predicted most adult outcomes, notably suicide attempts and major depressive disorder. Secondary analyses indicated that associations did not differ by sex or attempt characteristics (intent, lethality, recurrence). Conclusions Adolescent suicide attempters are at high risk of protracted and wide-ranging impairments, regardless of the characteristics of their attempt. Although attempts specifically predict (and possibly influence) several outcomes, results suggest that most impairments reflect the confounding contributions of other individual and family problems or vulnerabilities in adolescent attempters. PMID:25421360
NASA Technical Reports Server (NTRS)
Tellado, Joseph
2014-01-01
The presentation contains a status of KSC ISS Logistics Operations. It presents current top-level ISS logistics tasks being conducted at KSC, current International Partner activities, the hardware processing flow focusing on late stow operations, a list of KSC Logistics POCs, and a backup list of logistics launch site services. This presentation is being given at the annual International Space Station (ISS) Multi-lateral Logistics Maintenance Control Panel meeting to be held in Turin, Italy during the week of May 13-16. The presentation content doesn't contain any potential lessons learned.
Assessing Lake Trophic Status: A Proportional Odds Logistic Regression Model
Lake trophic state classifications are good predictors of ecosystem condition and are indicative of both ecosystem services (e.g., recreation and aesthetics), and disservices (e.g., harmful algal blooms). Methods for classifying trophic state are based off the foundational work o...
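A proportional odds model of the kind named in the title turns one linear predictor and a set of ordered cutpoints into probabilities for each trophic class. A minimal sketch; the predictor, cutpoints, and coefficient are hypothetical, not the fitted model:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ordinal_probs(x, cutpoints, beta):
    """Proportional-odds logistic model: P(Y <= k) = sigma(c_k - beta*x);
    class probabilities are differences of successive cumulative probs."""
    cum = [sigmoid(c - beta * x) for c in cutpoints] + [1.0]
    probs, prev = [], 0.0
    for c in cum:
        probs.append(c - prev)
        prev = c
    return probs

# hypothetical: 3 ordered trophic classes from one predictor
# (e.g. log chlorophyll-a), cutpoints -1 and 1, slope 0.8
p = ordinal_probs(1.0, cutpoints=[-1.0, 1.0], beta=0.8)
print(round(sum(p), 6))  # class probabilities sum to 1.0
```

The "proportional odds" assumption is visible in the code: the same beta shifts every cumulative logit, so the class boundaries move together as the predictor changes.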
Background or Experience? Using Logistic Regression to Predict College Retention
ERIC Educational Resources Information Center
Synco, Tracee M.
2012-01-01
Tinto, Astin and countless others have researched the retention and attrition of students from college for more than thirty years. However, the six year graduation rate for all first-time full-time freshmen for the 2002 cohort was 57%. This study sought to determine the retention variables that predicted continued enrollment of entering freshmen…
Electrocardiogram classification using reservoir computing with logistic regression.
Escalona-Morán, Miguel Angel; Soriano, Miguel C; Fischer, Ingo; Mirasso, Claudio R
2015-05-01
An adapted state-of-the-art information-processing method known as Reservoir Computing is used to demonstrate its utility on the open and time-consuming problem of heartbeat classification. The MIT-BIH arrhythmia database is used following the guidelines of the Association for the Advancement of Medical Instrumentation. Our approach requires a computationally inexpensive preprocessing of the electrocardiographic signal, leading to a fast algorithm approaching a real-time classification solution. Our multiclass classification results indicate an average specificity of 97.75% with an average accuracy of 98.43%. Sensitivity and positive predictive value average 84.83% and 88.75%, respectively, which makes our approach significant for use in a clinical context.
Incremental logistic regression for customizing automatic diagnostic models.
Tortajada, Salvador; Robles, Montserrat; García-Gómez, Juan Miguel
2015-01-01
In the last decades, and following the new trends in medicine, statistical learning techniques have been used to develop automatic diagnostic models that aid clinical experts through the use of Clinical Decision Support Systems. The development of these models requires a large, representative amount of data, which is commonly obtained from one hospital or a group of hospitals after an expensive and time-consuming gathering, preprocessing, and validation of cases. After development, a model has to pass an external validation that is often carried out in a different hospital or health center. Experience shows that such models often fall short of expectations. Furthermore, sending and storing patient data requires ethical approval and patient consent. For these reasons, we introduce an incremental learning algorithm based on the Bayesian inference approach that may allow us to build an initial model with a smaller number of cases and update it incrementally when new data are collected, or even perform a new calibration of a model from a different center using a reduced number of cases. The performance of our algorithm is demonstrated on several benchmark datasets and a real brain tumor dataset; we compare its performance to a previous incremental algorithm and a non-incremental Bayesian model, showing that the algorithm is independent of the data model, iterative, and has good convergence. PMID:25417079
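The case-by-case update idea can be sketched with an online logistic learner. This is not the paper's Bayesian algorithm; plain stochastic gradient descent stands in for its posterior update, and the one-feature data stream is synthetic:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class IncrementalLogit:
    """Toy incremental learner: weights are revised one case at a time as
    new data arrive, so a model built on a small initial sample can keep
    improving (SGD here stands in for the paper's Bayesian update)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * (n_features + 1)  # bias + feature weights
        self.lr = lr

    def predict(self, x):
        z = self.w[0] + sum(wi * xi for wi, xi in zip(self.w[1:], x))
        return sigmoid(z)

    def update(self, x, y):
        err = y - self.predict(x)
        self.w[0] += self.lr * err
        for i, xi in enumerate(x):
            self.w[i + 1] += self.lr * err * xi

random.seed(0)
model = IncrementalLogit(n_features=1)
for _ in range(2000):                      # stream of cases: y = 1 iff x > 0
    x = random.uniform(-1.0, 1.0)
    model.update([x], 1.0 if x > 0 else 0.0)
```

Recalibrating for a different center would amount to continuing the same update loop on that center's smaller case stream, which is the paper's central claim about reduced data requirements.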
Logistic Regression Analysis of Freezing Tolerance in Winter Wheat
Technology Transfer Automated Retrieval System (TEKTRAN)
Four winter wheat cultivars, Eltan, Froid, Kestrel, and Tiber, were cold-acclimated for five weeks and then tested for freezing tolerance in a programmable freezer. The temperature of the soil was recorded every two minutes and the freezing episode was described as five parameters: the minimum temp...
[Structural adjustment, cultural adjustment?].
Dujardin, B; Dujardin, M; Hermans, I
2003-12-01
Over the last two decades, multiple studies have been conducted and many articles published about Structural Adjustment Programmes (SAPs). These studies mainly describe the characteristics of SAPs and analyse their economic consequences as well as their effects upon a variety of sectors: health, education, agriculture and environment. However, very few focus on the sociological and cultural effects of SAPs. Following a summary of SAPs' content and characteristics, the paper briefly discusses the historical course of SAPs and the different critiques which have been made. The cultural consequences of SAPs are introduced and are described on four different levels: political, community, familial, and individual. These levels are analysed through examples from the literature and individual testimonies from people in the Southern Hemisphere. The paper concludes that SAPs, alongside economic globalisation processes, are responsible for an acute breakdown of social and cultural structures in societies in the South. It should be a priority not only to better understand the situation and its determining factors, but also to intervene and act with strategies that support and reinvest in the social and cultural sectors, which is vital in order to allow individuals and communities in the South to strengthen their autonomy and identity.
Survival Data and Regression Models
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right-censored data and develop two types of regression models. The first concerns the so-called accelerated failure time (AFT) models, which are parametric models in which a function of a parameter depends linearly on the covariables. The second is a semiparametric model, in which the covariables enter multiplicatively in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood (ML) methodology; although we recall some essential results of ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
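The two model families can be contrasted in a few lines. The exponential baseline and all parameter values below are illustrative assumptions, chosen only to show where the covariable enters each model:

```python
import math

def aft_survival(t, x, beta, lam=1.0):
    """Accelerated failure time model with an exponential baseline:
    the covariable rescales time, S(t | x) = S0(t * exp(-beta * x))
    with S0(t) = exp(-lam * t)."""
    return math.exp(-lam * t * math.exp(-beta * x))

def cox_hazard(h0_t, x, beta):
    """Semiparametric proportional-hazards form: the covariable acts
    multiplicatively on the (unspecified) baseline hazard h0(t)."""
    return h0_t * math.exp(beta * x)

print(aft_survival(0.0, 1.0, 0.5))  # S(0) = 1.0 for any covariable value
```

In the AFT model the covariable stretches or compresses the time axis; in the hazard model it scales the instantaneous risk, which is why only the latter needs no parametric baseline.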
Covariate-adjusted response-adaptive designs for binary response.
Rosenberger, W F; Vidyashankar, A N; Agarwal, D K
2001-11-01
An adaptive allocation design for phase III clinical trials that incorporates covariates is described. The allocation scheme maps the covariate-adjusted odds ratio from a logistic regression model onto [0, 1]. Simulations assume that both staggered entry and time to response are random and follow a known probability distribution that can depend on the treatment assigned, the patient's response, a covariate, or a time trend. Confidence intervals on the covariate-adjusted odds ratio are slightly anticonservative for the adaptive design under the null hypothesis, but power is similar to equal allocation under various alternatives for n = 200. For similar power, the net savings in terms of the expected number of treatment failures is modest, but enough to make this design attractive for certain studies where known covariates are expected to be important, stratification is not desired, and treatment failures have a high ethical cost.
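The mapping onto [0, 1] can be sketched with a logistic transform of the log odds ratio; this is an illustrative choice of mapping, not necessarily the one used in the paper:

```python
import math

def allocation_prob(log_odds_ratio):
    """Map the covariate-adjusted log odds ratio onto [0, 1]; the result is
    used as the probability of allocating the next patient to treatment A.
    (A simple logistic mapping; the paper's exact mapping may differ.)"""
    return 1.0 / (1.0 + math.exp(-log_odds_ratio))

print(allocation_prob(0.0))  # no apparent treatment difference -> 0.5
```

As accumulating responses shift the estimated odds ratio in favor of one arm, the allocation probability drifts toward that arm, which is the mechanism behind the expected reduction in treatment failures.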
Applications of statistics to medical science, III. Correlation and regression.
Watanabe, Hiroshi
2012-01-01
In this third part of a series surveying medical statistics, the concepts of correlation and regression are reviewed. In particular, methods of linear regression and logistic regression are discussed. Arguments related to survival analysis will be made in a subsequent paper.
ERIC Educational Resources Information Center
Pedrini, D. T.; Pedrini, Bonnie C.
Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…
Application and Interpretation of Hierarchical Multiple Regression.
Jeong, Younhee; Jung, Mi Jung
2016-01-01
The authors reported the association between motivation and self-management behavior of individuals with chronic low back pain after adjusting control variables using hierarchical multiple regression (). This article describes details of the hierarchical regression applying the actual data used in the article by , including how to test assumptions, run the statistical tests, and report the results. PMID:27648796
NASA Astrophysics Data System (ADS)
Chang, Yoon S.; Oh, Chang H.
Nowadays, environmental management has become a critical business consideration for companies seeking to survive under many regulations and tough business requirements. Most world-leading companies are now aware that environmentally friendly technology and management are critical to the sustainable growth of the company. The environment market has seen continuous growth, reaching 532B in 2000 and 590B in 2004, and was expected to reach 700B in 2010. It is not hard to see environment-friendly efforts in almost all aspects of business operations. Such trends can easily be found in the logistics area. Green logistics aims to make environmentally friendly decisions throughout a product lifecycle. Therefore, for the success of green logistics, it is critical to have real-time tracking capability on the product throughout the product lifecycle and a smart solution service architecture. In this chapter, we introduce an RFID-based green logistics solution and service.
Lunar Commercial Mining Logistics
NASA Astrophysics Data System (ADS)
Kistler, Walter P.; Citron, Bob; Taylor, Thomas C.
2008-01-01
Innovative commercial logistics is required for supporting lunar resource recovery operations and assisting larger consortiums in lunar mining, base operations, camp consumables, and the future commercial sales of propellant over the next 50 years. To assist in lowering overall development costs, "reuse" innovation is suggested: reusing modified LTS in-space hardware for use on the moon's surface, and developing product lines for recovered gases, regolith construction materials, surface logistics services, and other services as they evolve (Kistler, Citron and Taylor, 2005). Surface logistics architecture is designed to have sustainable growth over 50 years, financed by private sector partners and capable of cargo transportation in both directions in support of lunar development and resource recovery development. The authors' perspective on the importance of logistics is based on five years of experience at remote sites on Earth, where remote base supply chain logistics didn't always work (Taylor, 1975a). The planning and control of the flow of goods and materials to and from the moon's surface may be the most complicated logistics challenge yet attempted. Affordability is tied to the innovation and ingenuity used to keep the transportation and surface operations costs as low as practical. Eleven innovations are proposed and discussed by an entrepreneurial commercial space startup team that has had success in introducing commercial space innovation and reducing the cost of space operations in the past. This logistics architecture offers NASA and other exploring nations a commercial alternative for non-essential cargo. Five transportation technologies and eleven surface innovations create the logistics transportation system discussed.
Space Station fluid management logistics
NASA Technical Reports Server (NTRS)
Dominick, Sam M.
1990-01-01
Viewgraphs and discussion on space station fluid management logistics are presented. Topics covered include: fluid management logistics - issues for Space Station Freedom evolution; current fluid logistics approach; evolution of Space Station Freedom fluid resupply; launch vehicle evolution; ELV logistics system approach; logistics carrier configuration; expendable fluid/propellant carrier description; fluid carrier design concept; logistics carrier orbital operations; carrier operations at space station; summary/status of orbital fluid transfer techniques; Soviet progress tanker system; and Soviet propellant resupply system observations.
Logistics planning for phased programs.
NASA Technical Reports Server (NTRS)
Cook, W. H.
1973-01-01
It is pointed out that the proper and early integration of logistics planning into the phased program planning process will drastically reduce logistics costs. Phased project planning is a phased approach to the planning, approval, and conduct of major research and development activity. A progressive build-up of knowledge of all aspects of the program is provided. Elements of logistics are discussed together with aspects of integrated logistics support, logistics program planning, and logistics activities for phased programs. Continuing logistics support can only be assured if there is a comprehensive sequential listing of all logistics activities tied to the program schedule and a real-time inventory of assets.
Regression Calibration with Heteroscedastic Error Variance
Spiegelman, Donna; Logan, Roger; Grove, Douglas
2011-01-01
The problem of covariate measurement error with heteroscedastic measurement error variance is considered. Standard regression calibration assumes that the measurement error has a homoscedastic measurement error variance. An estimator is proposed to correct regression coefficients for covariate measurement error with heteroscedastic variance. Point and interval estimates are derived. Validation data containing the gold standard must be available. This estimator is a closed-form correction of the uncorrected primary regression coefficients, which may be of logistic or Cox proportional hazards model form, and is closely related to the version of regression calibration developed by Rosner et al. (1990). The primary regression model can include multiple covariates measured without error. The use of these estimators is illustrated in two data sets, one taken from occupational epidemiology (the ACE study) and one taken from nutritional epidemiology (the Nurses’ Health Study). In both cases, although there was evidence of moderate heteroscedasticity, there was little difference in estimation or inference using this new procedure compared to standard regression calibration. It is shown theoretically that unless the relative risk is large or measurement error severe, standard regression calibration approximations will typically be adequate, even with moderate heteroscedasticity in the measurement error model variance. In a detailed simulation study, standard regression calibration performed either as well as or better than the new estimator. When the disease is rare and the errors normally distributed, or when measurement error is moderate, standard regression calibration remains the method of choice. PMID:22848187
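The closed-form correction described above can be illustrated in its simplest form (single covariate, linear calibration, homoscedastic error): regress the gold-standard exposure on the error-prone surrogate in the validation data, then divide the naive coefficient by the estimated calibration slope. The numpy sketch below uses simulated data; the study design, effect sizes, and error scales are all illustrative assumptions, and the heteroscedastic extension the paper develops is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical validation study: gold-standard exposure x and error-prone
# surrogate w, both observed.
n_val = 500
x_val = rng.normal(0.0, 1.0, n_val)
w_val = x_val + rng.normal(0.0, 0.5, n_val)   # homoscedastic measurement error

# Main study: only the surrogate w is observed; outcome y depends on true x.
n = 2000
x = rng.normal(0.0, 1.0, n)
w = x + rng.normal(0.0, 0.5, n)
y = 1.5 * x + rng.normal(0.0, 1.0, n)         # true slope is 1.5

def ols_slope(a, b):
    """Slope of b regressed on a (intercept included)."""
    a = a - a.mean()
    return (a @ (b - b.mean())) / (a @ a)

# Calibration model fitted in the validation data: E[x | w] = alpha + lam * w.
lam = ols_slope(w_val, x_val)

beta_naive = ols_slope(w, y)        # attenuated by the measurement error
beta_corrected = beta_naive / lam   # regression-calibration correction
```

With these variances the attenuation factor is 0.8, so the naive slope is biased toward roughly 1.2 while the corrected slope recovers approximately 1.5.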
Tharrington, Arnold N.
2015-09-09
The NCCS Regression Test Harness is a software package that provides a framework to perform regression and acceptance testing on NCCS High Performance Computers. The package is written in Python and has only the dependency of a Subversion repository to store the regression tests.
Harry, H.H.
1988-03-11
Apparatus and method for the adjustment and alignment of shafts in high power devices. A plurality of adjacent rotatable angled cylinders are positioned between a base and the shaft to be aligned which when rotated introduce an axial offset. The apparatus is electrically conductive and constructed of a structurally rigid material. The angled cylinders allow the shaft such as the center conductor in a pulse line machine to be offset in any desired alignment position within the range of the apparatus. 3 figs.
Harry, Herbert H.
1989-01-01
Apparatus and method for the adjustment and alignment of shafts in high power devices. A plurality of adjacent rotatable angled cylinders are positioned between a base and the shaft to be aligned which when rotated introduce an axial offset. The apparatus is electrically conductive and constructed of a structurally rigid material. The angled cylinders allow the shaft such as the center conductor in a pulse line machine to be offset in any desired alignment position within the range of the apparatus.
Space Station logistics system evolution
NASA Technical Reports Server (NTRS)
Tucker, Michael W.
1990-01-01
This task investigates logistics requirements and logistics system concepts for the evolutionary Space Station. Requirements are defined for the basic station, crew, user equipment, and free-flying platforms, as well as for manned exploration initiative elements and crews while at the Space Station. Data is provided which assesses the ability of the Space Station Freedom logistics carriers to accommodate the logistics loads per year. Also, advanced carrier concepts are defined and assessed against the logistics requirements. The implications on Earth-to-orbit vehicles of accommodating the logistics requirements, using various types of carriers, are assessed on a year-by-year basis.
Ehrsam, Eric; Kallini, Joseph R.; Lebas, Damien; Modiano, Philippe; Cotten, Hervé
2016-01-01
Fully regressive melanoma is a phenomenon in which the primary cutaneous melanoma becomes completely replaced by fibrotic components as a result of host immune response. Although 10 to 35 percent of cases of cutaneous melanomas may partially regress, fully regressive melanoma is very rare; only 47 cases have been reported in the literature to date. All of the cases of fully regressive melanoma reported in the literature were diagnosed in conjunction with metastasis. The authors describe a case of fully regressive melanoma without any metastases at the time of its diagnosis. Characteristic findings on dermoscopy, as well as the absence of melanoma on final biopsy, confirmed the diagnosis. PMID:27672418
NASA Astrophysics Data System (ADS)
Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.
2013-02-01
Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R2 = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19. The results from GWR indicated a significant spatial variation in the local
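Geographically weighted regression fits one weighted least-squares model per location, with spatial weights that decay with distance, so the coefficients themselves vary across the map. The sketch below is a minimal fixed-bandwidth Gaussian-kernel version on simulated data; the study used an adaptive bandwidth, and all names, the toy covariate, and the spatially drifting slope are illustrative assumptions.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Geographically weighted regression: one WLS fit per location.

    coords: (n, 2) site coordinates; X: (n, p) design matrix (include a
    column of ones for the intercept); y: (n,) response; bandwidth: scale
    of the Gaussian spatial kernel. Returns an (n, p) array of locally
    fitted coefficients. A fixed-bandwidth sketch, not the adaptive
    bandwidth variant used in the study.
    """
    n, p = X.shape
    betas = np.empty((n, p))
    for i in range(n):
        d2 = np.sum((coords - coords[i]) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))   # Gaussian kernel weights
        XtW = X.T * w                              # (p, n) weighted design
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas

# Toy data: the effect of a covariate (say, an index of land abandonment)
# on fire density drifts east-west across the study area.
rng = np.random.default_rng(1)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
z = rng.normal(size=n)
slope = 1.0 + 0.3 * coords[:, 0]                   # true local effect
y = 2.0 + slope * z + rng.normal(0, 0.1, n)
X = np.column_stack([np.ones(n), z])
betas = gwr_coefficients(coords, X, y, bandwidth=1.5)
```

The local slopes `betas[:, 1]` should then track the true east-west gradient, which is exactly the kind of spatial variation in explanatory power the GWR models above are used to expose.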
Gradus, Jaimie L; Qin, Ping; Lincoln, Alisa K; Miller, Matthew; Lawler, Elizabeth; Lash, Timothy L
2010-01-01
Adjustment disorder is a diagnosis given following a significant psychosocial stressor from which an individual has difficulty recovering. The individual's reaction to this event must exceed what would be observed among similar people experiencing the same stressor. Adjustment disorder is associated with suicidal ideation and suicide attempt. However, the association between adjustment disorder and completed suicide has yet to be examined. The current study is a population-based case-control study examining this association in the population of Denmark aged 15 to 90 years. All suicides in Denmark from 1994 to 2006 were included, resulting in 9,612 cases. For each case, up to 30 controls were matched on gender, exact date of birth, and calendar time, yielding 199,306 controls. An adjustment disorder diagnosis was found in 7.6% of suicide cases and 0.52% of controls. Conditional logistic regression analyses revealed that those diagnosed with adjustment disorder had 12 times the rate of suicide as those without an adjustment disorder diagnosis, after controlling for history of depression diagnosis, marital status, income, and the matched factors. PMID:20865099
Weine, Stevan Merrill; Ware, Norma; Tugenberg, Toni; Hakizimana, Leonce; Dahnweih, Gonwo; Currie, Madeleine; Wagner, Maureen; Levin, Elise
2013-01-01
Objectives The purpose of this mixed method study was to characterize the patterns of psychosocial adjustment among adolescent African refugees in U.S. resettlement. Methods A purposive sample of 73 recently resettled refugee adolescents from Burundi and Liberia were followed for two years, and the qualitative and quantitative data were analyzed using a mixed methods exploratory design. Results Protective resources identified were the family and community capacities that can promote youth psychosocial adjustment through: 1) Finances for necessities; 2) English proficiency; 3) Social support networks; 4) Engaged parenting; 5) Family cohesion; 6) Cultural adherence and guidance; 7) Educational support; and, 8) Faith and religious involvement. The researchers first inductively identified 19 thriving, 29 managing, and 25 struggling youths based on review of cases. Univariate analyses then indicated significant associations with country of origin, parental education, and parental employment. Multiple regressions indicated that better psychosocial adjustment was associated with Liberians and living with both parents. Logistic regressions showed that thriving was associated with Liberians and higher parental education, managing with more parental education, and struggling with Burundians and living parents. Qualitative analysis identified how these factors were proxy indicators for protective resources in families and communities. Conclusion These three trajectories of psychosocial adjustment and six domains of protective resources could assist in developing targeted prevention programs and policies for refugee youth. Further rigorous longitudinal mixed-methods studies of adolescent refugees in U.S. resettlement are needed. PMID:24205467
Qadir, Farah; Khalid, Amna; Medhin, Girmay
2015-01-01
This study aimed to identify prevalence rates of psychological distress among Pakistani women seeking help for primary infertility. The associations of social support, marital adjustment, and sociodemographic factors with psychological distress were also examined. A total of 177 women with primary infertility were interviewed from one hospital in Islamabad using a Self-Reporting Questionnaire, the Multidimensional Scale of Perceived Social Support, and the Locke-Wallace Marital Adjustment Test. The data were collected between November 2012 and March 2013. The prevalence of psychological distress was 37.3 percent. The results of the logistic regression suggested that marital adjustment and social support were significantly negatively associated with psychological distress in this sample. These associations were not confounded by any of the demographic variables controlled in the multivariable regression models. The role of perceived social support and adjustment in marriage among women experiencing primary infertility are important factors in understanding their psychological distress. The results of this small-scale effort highlight the need for social and familial awareness to help tackle the psychological distress related to infertility. Future research needs to focus on the way the experience of infertility is conditioned by social structural realities. New ways need to be developed to better take into account the process and nature of the infertility experience.
Nodule Regression in Adults With Nodular Gastritis
Kim, Ji Wan; Lee, Sun-Young; Kim, Jeong Hwan; Sung, In-Kyung; Park, Hyung Seok; Shim, Chan-Sup; Han, Hye Seung
2015-01-01
Background Nodular gastritis (NG) is associated with the presence of Helicobacter pylori infection, but there are controversies on nodule regression in adults. The aim of this study was to analyze the factors that are related to nodule regression in adults diagnosed with NG. Methods Adults who were diagnosed with NG and H. pylori infection during esophagogastroduodenoscopy (EGD) at our center were included. Changes in the size and location of the nodules, status of H. pylori infection, upper gastrointestinal (UGI) symptoms, and EGD and pathology findings were analyzed between the initial and follow-up tests. Results Of the 117 NG patients, 66.7% (12/18) of the eradicated NG patients showed nodule regression after H. pylori eradication, whereas 9.9% (9/99) of the non-eradicated NG patients showed spontaneous nodule regression without H. pylori eradication (P < 0.001). Nodule regression was more frequent in NG patients with antral nodule location (P = 0.010), small-sized nodules (P = 0.029), H. pylori eradication (P < 0.001), UGI symptoms (P = 0.007), and a long-term follow-up period (P = 0.030). On the logistic regression analysis, nodule regression was inversely correlated with persistent H. pylori infection on the follow-up test (odds ratio (OR): 0.020, 95% confidence interval (CI): 0.003 - 0.137, P < 0.001) and a short-term follow-up period < 30.5 months (OR: 0.140, 95% CI: 0.028 - 0.700, P = 0.017). Conclusions In adults with NG, H. pylori eradication is the most significant factor associated with nodule regression. A long-term follow-up period is also correlated with nodule regression, but is less significant than H. pylori eradication. Our findings suggest that H. pylori eradication should be considered to promote nodule regression in NG patients with H. pylori infection.
Assessing risk factors for periodontitis using regression
NASA Astrophysics Data System (ADS)
Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa
2013-10-01
Multivariate statistical analysis is indispensable to assess the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using linear and logistic regression models, we assess the relevance, as risk factors for periodontitis disease, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking Status and Plaque Index. The multiple linear regression model was built to evaluate the influence of the IVs on mean Attachment Loss (AL); the regression coefficients are obtained along with the p-values from the respective significance tests. The classification of a case (individual) adopted in the logistic model was the extent of the destruction of periodontal tissues, defined by an Attachment Loss greater than or equal to 4 mm in at least 25% (AL≥4mm/≥25%) of sites surveyed. The association measures include the odds ratios together with the corresponding 95% confidence intervals.
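The odds ratios and 95% confidence intervals mentioned above come straight from a fitted logistic model: exponentiate a coefficient and its Wald interval. Below is a self-contained numpy sketch (Newton-Raphson maximum likelihood); the simulated covariates, sample size, and effect sizes are invented for illustration and are not the study's data.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.
    Returns coefficients and their variance-covariance matrix
    (inverse Fisher information at the near-converged estimate)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    return beta, np.linalg.inv(hess)

# Simulated analogue of a periodontitis risk model: binary case status
# (e.g. AL >= 4 mm at >= 25% of sites) on centered age and smoking status.
rng = np.random.default_rng(2)
n = 1000
age = rng.normal(50, 10, n)
smoker = rng.integers(0, 2, n)
lin = -0.5 + 0.04 * (age - 50) + 0.9 * smoker        # true log-odds
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-lin))).astype(float)

X = np.column_stack([np.ones(n), age - 50, smoker])
beta, cov = fit_logistic(X, y)
se = np.sqrt(np.diag(cov))
or_smoking = np.exp(beta[2])                               # odds ratio
ci = np.exp(beta[2] + np.array([-1.96, 1.96]) * se[2])     # Wald 95% CI
```

An odds ratio of, say, 2.5 for smoking would then be read as smokers having 2.5 times the odds of being a periodontitis case, with the CI quantifying the uncertainty of that estimate.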
Improved Regression Calibration
ERIC Educational Resources Information Center
Skrondal, Anders; Kuha, Jouni
2012-01-01
The likelihood for generalized linear models with covariate measurement error cannot in general be expressed in closed form, which makes maximum likelihood estimation taxing. A popular alternative is regression calibration which is computationally efficient at the cost of inconsistent estimation. We propose an improved regression calibration…
Prediction in Multiple Regression.
ERIC Educational Resources Information Center
Osborne, Jason W.
2000-01-01
Presents the concept of prediction via multiple regression (MR) and discusses the assumptions underlying multiple regression analyses. Also discusses shrinkage, cross-validation, and double cross-validation of prediction equations and describes how to calculate confidence intervals around individual predictions. (SLD)
Gerber, Samuel; Rubel, Oliver; Bremer, Peer -Timo; Pascucci, Valerio; Whitaker, Ross T.
2012-01-19
This paper introduces a novel partition-based regression approach that incorporates topological information. Partition-based regression typically introduces a quality-of-fit-driven decomposition of the domain. The emphasis in this work is on a topologically meaningful segmentation. Thus, the proposed regression approach is based on a segmentation induced by a discrete approximation of the Morse–Smale complex. This yields a segmentation with partitions corresponding to regions of the function with a single minimum and maximum that are often well approximated by a linear model. This approach yields regression models that are amenable to interpretation and have good predictive capacity. Typically, regression estimates are quantified by their geometrical accuracy. For the proposed regression, an important aspect is the quality of the segmentation itself. Thus, this article introduces a new criterion that measures the topological accuracy of the estimate. The topological accuracy provides a complementary measure to the classical geometrical error measures and is very sensitive to overfitting. The Morse–Smale regression is compared to state-of-the-art approaches in terms of geometry and topology and yields comparable or improved fits in many cases. Finally, a detailed study on climate-simulation data demonstrates the application of the Morse–Smale regression. Supplementary Materials are available online and contain an implementation of the proposed approach in the R package msr, an analysis and simulations on the stability of the Morse–Smale complex approximation, and additional tables for the climate-simulation study.
Regression problems for magnitudes
NASA Astrophysics Data System (ADS)
Castellaro, S.; Mulargia, F.; Kagan, Y. Y.
2006-06-01
Least-squares linear regression is so popular that it is sometimes applied without checking whether its basic requirements are satisfied. In particular, in studying earthquake phenomena, the conditions (a) that the uncertainty on the independent variable is at least one order of magnitude smaller than the one on the dependent variable, (b) that both data and uncertainties are normally distributed and (c) that residual variances are constant are at times disregarded. This may easily lead to wrong results. As an alternative to least squares, when the ratio between errors on the independent and the dependent variable can be estimated, orthogonal regression can be applied. We test the performance of orthogonal regression in its general form against Gaussian and non-Gaussian data and error distributions and compare it with standard least-squares regression. General orthogonal regression is found to be superior or equal to the standard least squares in all the cases investigated and its use is recommended. We also compare the performance of orthogonal regression versus standard regression when, as often happens in the literature, the ratio between errors on the independent and the dependent variables cannot be estimated and is arbitrarily set to 1. We apply these results to magnitude scale conversion, which is a common problem in seismology, with important implications in seismic hazard evaluation, and analyse it through specific tests. Our analysis concludes that the commonly used standard regression may induce systematic errors in magnitude conversion as high as 0.3-0.4, and, even more importantly, this can introduce apparent catalogue incompleteness, as well as a heavy bias in estimates of the slope of the frequency-magnitude distributions. All this can be avoided by using the general orthogonal regression in magnitude conversions.
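When the ratio of error variances is known, general orthogonal regression has a closed-form slope (this is Deming regression; the ratio equal to 1 gives standard orthogonal regression, and letting it grow large recovers OLS of y on x). A minimal numpy sketch follows; the magnitude-conversion data are simulated, and the 0.2 error scales and the 0.1 offset between scales are assumptions for illustration.

```python
import numpy as np

def orthogonal_regression(x, y, error_ratio=1.0):
    """General orthogonal (Deming) regression of y on x.

    error_ratio is delta = var(errors in y) / var(errors in x):
    delta = 1 gives standard orthogonal regression; as delta -> infinity
    the slope tends to the OLS slope of y on x. Closed-form ML estimates.
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    sxx = np.var(x)
    syy = np.var(y)
    sxy = np.cov(x, y, bias=True)[0, 1]
    d = error_ratio
    slope = (syy - d * sxx
             + np.sqrt((syy - d * sxx) ** 2 + 4.0 * d * sxy ** 2)) / (2.0 * sxy)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

# Toy magnitude-conversion example: both scales carry comparable errors,
# so OLS would attenuate the slope while orthogonal regression does not.
rng = np.random.default_rng(3)
m_true = rng.uniform(3, 7, 500)
m1 = m_true + rng.normal(0, 0.2, 500)          # e.g. local magnitude
m2 = m_true + 0.1 + rng.normal(0, 0.2, 500)    # e.g. moment magnitude
slope, intercept = orthogonal_regression(m1, m2, error_ratio=1.0)
```

Here the true conversion has slope 1; the OLS slope of m2 on m1 would be biased below 1 because m1 itself is noisy, which is precisely the systematic conversion error the abstract warns about.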
Adjustment for local ancestry in genetic association analysis of admixed populations
Wang, Xuexia; Zhu, Xiaofeng; Qin, Huaizhen; Cooper, Richard S.; Ewens, Warren J.; Li, Chun; Li, Mingyao
2011-01-01
Motivation: Admixed populations offer a unique opportunity for mapping diseases that have large disease allele frequency differences between ancestral populations. However, association analysis in such populations is challenging because population stratification may lead to association with loci unlinked to the disease locus. Methods and results: We show that local ancestry at a test single nucleotide polymorphism (SNP) may confound with the association signal and ignoring it can lead to spurious association. We demonstrate theoretically that adjustment for local ancestry at the test SNP is sufficient to remove the spurious association regardless of the mechanism of population stratification, whether due to local or global ancestry differences among study subjects; however, global ancestry adjustment procedures may not be effective. We further develop two novel association tests that adjust for local ancestry. Our first test is based on a conditional likelihood framework which models the distribution of the test SNP given disease status and flanking marker genotypes. A key advantage of this test lies in its ability to incorporate different directions of association in the ancestral populations. Our second test, which is computationally simpler, is based on logistic regression, with adjustment for local ancestry proportion. We conducted extensive simulations and found that the Type I error rates of our tests are under control; however, the global adjustment procedures yielded inflated Type I error rates when stratification is due to local ancestry difference. Contact: mingyao@upenn.edu; chun.li@vanderbilt.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21169375
Space Shuttle operational logistics plan
NASA Technical Reports Server (NTRS)
Botts, J. W.
1983-01-01
The Kennedy Space Center plan for logistics to support Space Shuttle operations, and the related policies, requirements, and responsibilities, is described. The logistics responsibilities of the Directorate of Shuttle Management and Operations required by the Kennedy Organizational Manual, and the self-sufficiency contracting concept, are implemented. The Space Shuttle Program Level 1 and Level 2 logistics policies and requirements applicable to KSC that are presented in HQ NASA and Johnson Space Center directives are also implemented.
Multivariate Regression with Calibration*
Liu, Han; Wang, Lie; Zhao, Tuo
2014-01-01
We propose a new method named calibrated multivariate regression (CMR) for fitting high dimensional multivariate regression models. Compared to existing methods, CMR calibrates the regularization for each regression task with respect to its noise level so that it is simultaneously tuning insensitive and achieves an improved finite-sample performance. Computationally, we develop an efficient smoothed proximal gradient algorithm which has a worst-case iteration complexity O(1/ε), where ε is a pre-specified numerical accuracy. Theoretically, we prove that CMR achieves the optimal rate of convergence in parameter estimation. We illustrate the usefulness of CMR by thorough numerical simulations and show that CMR consistently outperforms other high dimensional multivariate regression methods. We also apply CMR on a brain activity prediction problem and find that CMR is as competitive as the handcrafted model created by human experts. PMID:25620861
Metamorphic geodesic regression.
Hong, Yi; Joshi, Sarang; Sanchez, Mar; Styner, Martin; Niethammer, Marc
2012-01-01
We propose a metamorphic geodesic regression approach approximating spatial transformations for image time-series while simultaneously accounting for intensity changes. Such changes occur for example in magnetic resonance imaging (MRI) studies of the developing brain due to myelination. To simplify computations we propose an approximate metamorphic geodesic regression formulation that only requires pairwise computations of image metamorphoses. The approximated solution is an appropriately weighted average of initial momenta. To obtain initial momenta reliably, we develop a shooting method for image metamorphosis.
Miki, Takako; Kochi, Takeshi; Kuwahara, Keisuke; Eguchi, Masafumi; Kurotani, Kayo; Tsuruoka, Hiroko; Ito, Rie; Kabe, Isamu; Kawakami, Norito; Mizoue, Tetsuya; Nanri, Akiko
2015-09-30
Depression has been linked to the overall diet using both exploratory and pre-defined methods. However, neither of these methods incorporates specific knowledge on nutrient-disease associations. The aim of the present study was to empirically identify dietary patterns using reduced rank regression and to examine their relations to depressive symptoms. Participants were 2006 Japanese employees aged 19-69 years. Depressive symptoms were assessed using the Center for Epidemiologic Studies Depression Scale. Diet was assessed using a validated, self-administered diet history questionnaire. Dietary patterns were extracted by reduced rank regression with 6 depression-related nutrients as response variables. Logistic regression was used to estimate odds ratios of depressive symptoms adjusted for potential confounders. A dietary pattern characterized by a high intake of vegetables, mushrooms, seaweeds, soybean products, green tea, potatoes, fruits, and small fish with bones and a low intake of rice was associated with fewer depressive symptoms. The multivariable-adjusted odds ratios of having depressive symptoms were 0.62 (95% confidence interval, 0.48-0.81) in the highest versus lowest tertiles of dietary score. Results suggest that adherence to a diet rich in vegetables, fruits, and typical Japanese foods, including mushrooms, seaweeds, soybean products, and green tea, is associated with a lower probability of having depressive symptoms.
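Reduced rank regression extracts linear combinations of the predictors (the "dietary patterns" above) that best explain variation in a chosen set of response variables (the depression-related nutrients). The classical solution under identity error weighting is: fit OLS, then project the coefficient matrix onto the top singular directions of the fitted values. A minimal numpy sketch on simulated data; the "food groups", "nutrients", and effect sizes are stand-ins, not the study's variables.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Rank-constrained least squares: min ||Y - X @ B|| s.t. rank(B) <= rank.

    Classical solution with identity error weighting: compute the OLS
    coefficient matrix, then project it onto the top right singular
    vectors of the fitted values.
    """
    b_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    _, _, vt = np.linalg.svd(X @ b_ols, full_matrices=False)
    v = vt[:rank].T                  # top right singular vectors of the fit
    b_rrr = b_ols @ v @ v.T          # reduced-rank coefficient matrix
    scores = X @ b_ols @ v           # "pattern" scores, one column per rank
    return b_rrr, scores

# Toy analogue: 8 "food groups" predicting 3 "nutrients" through a single
# latent pattern, so a rank-1 fit should capture nearly all the signal.
rng = np.random.default_rng(4)
n, p, q = 400, 8, 3
X = rng.normal(size=(n, p))
w = rng.normal(size=p)                   # latent pattern loadings
c = np.array([1.0, -0.5, 0.8])           # nutrient responses to the pattern
Y = np.outer(X @ w, c) + 0.1 * rng.normal(size=(n, q))
b_rrr, scores = reduced_rank_regression(X, Y, rank=1)
```

In the study's setup, each participant's column of `scores` would be the dietary-pattern score that is then carried into the logistic model for depressive symptoms.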
Logistic Discriminant Function Analysis for DIF Identification of Polytomously Scored Items.
ERIC Educational Resources Information Center
Miller, Timothy R.; Spray, Judith A.
1993-01-01
Presents logistic discriminant analysis as a means of detecting differential item functioning (DIF) in items that are polytomously scored. Provides examples of DIF detection using a 27-item mathematics test with 1,977 examinees. The proposed method is simpler and more practical than polytomous extensions of the logistic regression DIF procedure.…
NASA Technical Reports Server (NTRS)
1963-01-01
This document has been prepared to incorporate all presentation aid material, together with some explanatory text, used during an oral briefing on the Nuclear Lunar Logistics System given at the George C. Marshall Space Flight Center, National Aeronautics and Space Administration, on 18 July 1963. The briefing and this document are intended to present the general status of the NERVA (Nuclear Engine for Rocket Vehicle Application) nuclear rocket development, the characteristics of certain operational NERVA-class engines, and appropriate technical and schedule information. Some of the information presented herein is preliminary in nature and will be subject to further verification, checking and analysis during the remainder of the study program. In addition, more detailed information will be prepared in many areas for inclusion in a final summary report. This work has been performed by REON, a division of Aerojet-General Corporation under Subcontract 74-10039 from the Lockheed Missiles and Space Company. The presentation and this document have been prepared in partial fulfillment of the provisions of the subcontract. From the inception of the NERVA program in July 1961, the stated emphasis has centered around the demonstration of the ability of a nuclear rocket to perform safely and reliably in the space environment, with the understanding that the assignment of a mission (or missions) would place undue emphasis on performance and operational flexibility. However, all were aware that the ultimate justification for the development program must lie in the application of the nuclear propulsion system to the national space objectives.
Technical issues: logistics. AAMC.
Stillman, P L
1993-06-01
The author states that she became interested in standardized patients (SPs) around 20 years ago as a means of developing a more uniform and effective way to provide instruction and evaluation of basic clinical skills. She reflects in detail upon: (1) the logistics of using SPs in teaching; (2) how SPs are used in assessment; (3) what aspects of performance SPs can be trained to record and evaluate; (4) issues concerning checklists; (5) evaluation of interviewing skills; (6) evaluation of written communication skills; (7) the importance of defining what is being tested; (8) various kinds and uses of inter-station exercises and problems of scoring them; (9) case development and the various sources for case material; (10) ways to generate scores; (11) selecting and training SPs; (12) the role of the faculty and the primary importance of bedside training with real patients; and (13) pros and cons of national versus single-school efforts to use SPs. She concludes by cautioning that further research must be done before SPs can be used for high-stakes certifying and licensing examinations. PMID:8507311
Tarpey, Thaddeus; Petkova, Eva
2010-07-01
Finite mixture models have come to play a very prominent role in modelling data. The finite mixture model is predicated on the assumption that distinct latent groups exist in the population. The finite mixture model therefore is based on a categorical latent variable that distinguishes the different groups. Often in practice distinct sub-populations do not actually exist. For example, disease severity (e.g. depression) may vary continuously and therefore, a distinction of diseased and not-diseased may not be based on the existence of distinct sub-populations. Thus, what is needed is a generalization of the finite mixture's discrete latent predictor to a continuous latent predictor. We cast the finite mixture model as a regression model with a latent Bernoulli predictor. A latent regression model is proposed by replacing the discrete Bernoulli predictor by a continuous latent predictor with a beta distribution. Motivation for the latent regression model arises from applications where distinct latent classes do not exist, but instead individuals vary according to a continuous latent variable. The shapes of the beta density are very flexible and can approximate the discrete Bernoulli distribution. Examples and a simulation are provided to illustrate the latent regression model. In particular, the latent regression model is used to model placebo effect among drug treated subjects in a depression study. PMID:20625443
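The move from a discrete to a continuous latent predictor can be stated compactly (the notation below is assumed for illustration, not taken from the paper):

```latex
% Finite mixture of two regressions, written as regression on a latent
% Bernoulli predictor:
y_i = \beta_0 + \beta_1 z_i + \varepsilon_i, \qquad z_i \sim \mathrm{Bernoulli}(\pi)
% Latent regression model: keep the linear form but let the latent
% predictor vary continuously,
z_i \sim \mathrm{Beta}(a, b)
% For small a and b the beta density concentrates mass near 0 and 1,
% so it can approximate the Bernoulli as a limiting case.
```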
Semiparametric Regression Pursuit.
Huang, Jian; Wei, Fengrong; Ma, Shuangge
2012-10-01
The semiparametric partially linear model allows flexible modeling of covariate effects on the response variable in regression. It combines the flexibility of nonparametric regression and the parsimony of linear regression. The most important assumption in the existing methods for estimation in this model is that it is known a priori which covariates have a linear effect and which do not. However, in applied work, this is rarely known in advance. We consider the problem of estimation in the partially linear model without assuming a priori which covariates have linear effects. We propose a semiparametric regression pursuit method for identifying the covariates with a linear effect. Our proposed method is a penalized regression approach using a group minimax concave penalty. Under suitable conditions we show that the proposed approach is model-pursuit consistent, meaning that it can correctly determine which covariates have a linear effect and which do not with high probability. The performance of the proposed method is evaluated using simulation studies, which support our theoretical results. A real data example is used to illustrate the application of the proposed method. PMID:23559831
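In symbols, the partially linear model under discussion is (notation assumed, not taken from the paper):

```latex
% Partially linear model: covariates x enter linearly, covariate z
% enters through an unknown smooth function f:
y_i = x_i^{\top}\beta + f(z_i) + \varepsilon_i
% Regression pursuit idea: give each covariate a spline expansion and
% let a group penalty (group MCP) decide which expansions collapse to a
% linear term and which remain nonparametric.
```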
Guo, Qian; Unger, Jennifer B; Palmer, Paula H; Chou, Chih-Ping; Johnson, C Anderson
2013-01-01
Previous studies have documented that cognitive attributions are correlated with adolescent smoking. The present study further explored whether cognitive attributions for smoking influenced adolescents' future smoking behaviors, especially transitions to more advanced stages of smoking. Participants were 12,382 middle and high school students (48.5% males and 51.5% females) in seven large cities in China. They completed two waves of surveys one year apart. Cognitive attributions for smoking and three smoking behavior outcomes (lifetime smoking, past 30-day smoking, and daily smoking) were assessed. Changes in smoking, including progression from lower stages to higher stages and regression from higher stages to lower stages, over a one-year period, were defined longitudinally. Polychotomous logistic regression was used to examine associations between cognitive attributions for smoking and changes in smoking status over one year, adjusting for demographic characteristics and other plausible confounders. Seven out of eight cognitive attributions for smoking were associated with subsequent smoking behaviors (p<0.05). Curiosity, autonomy, social image, social belonging, and coping influenced earlier stages of smoking, whereas mental enhancement and engagement influenced later stages of smoking. Curiosity, autonomy, social image, and mental enhancement preceded smoking progression; social belonging prevented smoking regression; and coping and engagement both preceded smoking progression and prevented smoking regression. This study demonstrates that different cognitive attributions influence smoking at different stages in different ways. These findings could inform smoking prevention and cessation programs targeting Chinese adolescents.
Explorations in Statistics: Regression
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2011-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This seventh installment of "Explorations in Statistics" explores regression, a technique that estimates the nature of the relationship between two things for which we may only surmise a mechanistic or predictive connection.…
Modern Regression Discontinuity Analysis
ERIC Educational Resources Information Center
Bloom, Howard S.
2012-01-01
This article provides a detailed discussion of the theory and practice of modern regression discontinuity (RD) analysis for estimating the effects of interventions or treatments. Part 1 briefly chronicles the history of RD analysis and summarizes its past applications. Part 2 explains how in theory an RD analysis can identify an average effect of…
Multiple linear regression analysis
NASA Technical Reports Server (NTRS)
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
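The stepwise procedure such a program implements can be sketched in a few lines. The sketch below is in Python rather than the original FORTRAN IV, with hypothetical names, and greedily adds the regressor that most reduces the residual sum of squares until the improvement becomes negligible:

```python
import numpy as np

def forward_stepwise(X, y, max_vars=None, tol=1e-6):
    """Greedy forward selection: repeatedly add the column that most
    reduces the residual sum of squares, stopping when no candidate
    improves the fit by more than tol."""
    n, p = X.shape
    selected, remaining = [], list(range(p))
    # Start from the intercept-only model.
    resid_ss = np.sum((y - y.mean()) ** 2)
    while remaining and (max_vars is None or len(selected) < max_vars):
        best_j, best_ss = None, resid_ss
        for j in remaining:
            cols = selected + [j]
            A = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            ss = np.sum((y - A @ beta) ** 2)
            if ss < best_ss:
                best_j, best_ss = j, ss
        if best_j is None or resid_ss - best_ss < tol:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        resid_ss = best_ss
    return selected
```

A full stepwise routine would also drop variables that lose significance after later additions; the forward pass above captures the core idea of retaining only the most useful coefficients.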
Mechanisms of neuroblastoma regression
Brodeur, Garrett M.; Bagatell, Rochelle
2014-01-01
Recent genomic and biological studies of neuroblastoma have shed light on the dramatic heterogeneity in the clinical behaviour of this disease, which spans from spontaneous regression or differentiation in some patients, to relentless disease progression in others, despite intensive multimodality therapy. This evidence also suggests several possible mechanisms to explain the phenomena of spontaneous regression in neuroblastomas, including neurotrophin deprivation, humoral or cellular immunity, loss of telomerase activity and alterations in epigenetic regulation. A better understanding of the mechanisms of spontaneous regression might help to identify optimal therapeutic approaches for patients with these tumours. Currently, the most druggable mechanism is the delayed activation of developmentally programmed cell death regulated by the tropomyosin receptor kinase A pathway. Indeed, targeted therapy aimed at inhibiting neurotrophin receptors might be used in lieu of conventional chemotherapy or radiation in infants with biologically favourable tumours that require treatment. Alternative approaches consist of breaking immune tolerance to tumour antigens or activating neurotrophin receptor pathways to induce neuronal differentiation. These approaches are likely to be most effective against biologically favourable tumours, but they might also provide insights into treatment of biologically unfavourable tumours. We describe the different mechanisms of spontaneous neuroblastoma regression and the consequent therapeutic approaches. PMID:25331179
Logistics in smallpox: the legacy.
Wickett, John; Carrasco, Peter
2011-12-30
Logistics, defined as "the time-related positioning of resources" was critical to the implementation of the smallpox eradication strategy of surveillance and containment. Logistical challenges in the smallpox programme included vaccine delivery, supplies, staffing, vehicle maintenance, and financing. Ensuring mobility was essential as health workers had to travel to outbreaks to contain them. Three examples illustrate a range of logistic challenges which required imagination and innovation. Standard price lists were developed to expedite vehicle maintenance and repair in Bihar, India. Innovative staffing ensured an adequate infrastructure for vehicle maintenance in Bangladesh. The use of disaster relief mechanisms in Somalia provided airlifts, vehicles and funding within 27 days of their initiation. In contrast the Expanded Programme on Immunization (EPI) faces more complex logistical challenges.
Ridge Regression Signal Processing
NASA Technical Reports Server (NTRS)
Kuhl, Mark R.
1990-01-01
The introduction of the Global Positioning System (GPS) into the National Airspace System (NAS) necessitates the development of Receiver Autonomous Integrity Monitoring (RAIM) techniques. In order to guarantee a certain level of integrity, a thorough understanding of modern estimation techniques applied to navigational problems is required. The extended Kalman filter (EKF) is derived and analyzed under poor geometry conditions. It was found that the performance of the EKF is difficult to predict, since the EKF is designed for a Gaussian environment. A novel approach is implemented which incorporates ridge regression to explain the behavior of an EKF in the presence of dynamics under poor geometry conditions. The basic principles of ridge regression theory are presented, followed by the derivation of a linearized recursive ridge estimator. Computer simulations are performed to confirm the underlying theory and to provide a comparative analysis of the EKF and the recursive ridge estimator.
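The ridge estimator at the heart of this approach has a simple closed form. A minimal sketch, assuming a design matrix `X` and observation vector `y` (the linearized recursive ridge estimator derived in the report is not reproduced here):

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y.
    A positive lam stabilizes the inversion when X'X is
    ill-conditioned, e.g. under poor geometry conditions,
    at the cost of shrinking the coefficients toward zero."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Increasing `lam` trades variance for bias: the solution norm shrinks monotonically, which is exactly the behavior exploited to tame an estimator in a near-singular geometry.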
Tailored logistics: the next advantage.
Fuller, J B; O'Conor, J; Rawlinson, R
1993-01-01
How many top executives have ever visited with managers who move materials from the factory to the store? How many still reduce the costs of logistics to the rent of warehouses and the fees charged by common carriers? To judge by hours of senior management attention, logistics problems do not rank high. But logistics have the potential to become the next governing element of strategy. Whether they know it or not, senior managers of every retail store and diversified manufacturing company compete in logistically distinct businesses. Customer needs vary, and companies can tailor their logistics systems to serve their customers better and more profitably. Companies do not create value for customers and sustainable advantage for themselves merely by offering varieties of goods. Rather, they offer goods in distinct ways. A particular can of Coca-Cola, for example, might be a can of Coca-Cola going to a vending machine, or a can of Coca-Cola that comes with billing services. There is a fortune buried in this distinction. The goal of logistics strategy is building distinct approaches to distinct groups of customers. The first step is organizing a cross-functional team to proceed through the following steps: segmenting customers according to purchase criteria, establishing different standards of service for different customer segments, tailoring logistics pipelines to support each segment, and creating economies of scale to determine which assets can be shared among various pipelines. The goal of establishing logistically distinct businesses is familiar: improved knowledge of customers and improved means of satisfying them. PMID:10126157
NASA Space Rocket Logistics Challenges
NASA Technical Reports Server (NTRS)
Neeley, James R.; Jones, James V.; Watson, Michael D.; Bramon, Christopher J.; Inman, Sharon K.; Tuttle, Loraine
2014-01-01
The Space Launch System (SLS) is the new NASA heavy lift launch vehicle and is scheduled for its first mission in 2017. The goal of the first mission, which will be uncrewed, is to demonstrate the integrated system performance of the SLS rocket and spacecraft before a crewed flight in 2021. SLS has many of the same logistics challenges as any other large-scale program. Common logistics concerns for SLS include integration of discrete, geographically separated programs, multiple prime contractors with distinct and different goals, schedule pressures and funding constraints. However, SLS also faces unique challenges. The new program is a confluence of new hardware and heritage, with heritage hardware constituting seventy-five percent of the program. This unique approach to design makes logistics concerns such as commonality especially problematic. Additionally, a very low manifest rate of one flight every four years makes logistics comparatively expensive. That, along with the SLS architecture being developed using a block upgrade evolutionary approach, exacerbates long-range planning for supportability considerations. These common and unique logistics challenges must be clearly identified and tackled to allow SLS to have a successful program. This paper will address the common and unique challenges facing the SLS program, along with the analysis and decisions the NASA logistics engineers are making to mitigate the threats posed by each.
You, Hua; Gu, Hai; Ning, Weiqing; Zhou, Hua; Dong, Hengjin
2016-01-01
Background: The New Rural Cooperative Medical Scheme (NCMS) includes a maternal care benefits package that is associated with increasing maternal health services. The local compensation policies have been frequently adjusted in recent years. This study examined the association between the NCMS maternal-services policy adjustment and expense reimbursement in Yuyao, China. Methods: Two household surveys were conducted in Yuyao in 2008 and 2011 (before and after the NCMS policy adjustment, respectively). Local women (N = 154) who had delivery history in the past three years were recruited. A questionnaire was used to collect information about delivery history, maternal health services utilization (prenatal care, postnatal care, and the grade of delivery institutions), NCMS participation, and reimbursement status. Logistic regression analyses were used to estimate the association between policy adjustment and maternal health utilization and the association between policy adjustment and out-of-pocket proportion. Next, t-tests and covariance analyses adjusting for household income were used to compare the out-of-pocket proportion between 2008 and 2011. Results: Compensation policy adjustment was associated with an increase in postnatal visits (adjusted OR = 3.32, p = 0.009) and the use of second level or above institutions for delivery (adjusted OR = 2.32, p = 0.03) among participants. In 2008, only 9.1% of pregnant women received reimbursement from the NCMS; however, this rate increased to 36.8% in 2011. After policy adjustment, there were no significant changes in the proportion of out-of-pocket expenses shared in the delivery fee (F = 0.24, p = 0.63) or in household income (F = 0.46, p = 0.50). Conclusions: The financial compensation increase improved maternal health services utilization; however, this effect was limited. Although the reimbursement rate was raised, the out-of-pocket proportion was not significantly changed; therefore, the compensation design
Orthogonal Regression: A Teaching Perspective
ERIC Educational Resources Information Center
Carr, James R.
2012-01-01
A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
A development of logistics management models for the Space Transportation System
NASA Technical Reports Server (NTRS)
Carrillo, M. J.; Jacobsen, S. E.; Abell, J. B.; Lippiatt, T. F.
1983-01-01
A new analytic queueing approach was described which relates stockage levels, repair level decisions, and the project network schedule of prelaunch operations directly to the probability distribution of the space transportation system launch delay. Finite source population and limited repair capability were additional factors included in this logistics management model developed specifically for STS maintenance requirements. Data presently available to support logistics decisions were based on a comparability study of heavy aircraft components. A two-phase program is recommended by which NASA would implement an integrated data collection system, assemble logistics data from previous STS flights, revise extant logistics planning and resource requirement parameters using Bayes-Lin techniques, and adjust for uncertainty surrounding logistics systems performance parameters. The implementation of these recommendations can be expected to deliver more cost-effective logistics support.
Logistics Management: New trends in the Reverse Logistics
NASA Astrophysics Data System (ADS)
Antonyová, A.; Antony, P.; Soewito, B.
2016-04-01
The present level and quality of the environment depend directly on our access to natural resources, as well as on their sustainability; production activities in particular, and the phenomena associated with them, have a direct impact on the future of our planet. The recycling process, which in large enterprises often becomes an important and integral part of the production program, is usually problematic in small and medium-sized enterprises. We can specify a few factors that have a direct impact on the development and successful application of an effective reverse logistics system. Finding the way to an economically acceptable model of reverse logistics, focused on converting waste materials into renewable energy, is a task in progress.
Kramer, S.
1996-12-31
In many real-world domains the task of machine learning algorithms is to learn a theory for predicting numerical values. In particular several standard test domains used in Inductive Logic Programming (ILP) are concerned with predicting numerical values from examples and relational and mostly non-determinate background knowledge. However, so far no ILP algorithm except one can predict numbers and cope with nondeterminate background knowledge. (The only exception is a covering algorithm called FORS.) In this paper we present Structural Regression Trees (SRT), a new algorithm which can be applied to the above class of problems. SRT integrates the statistical method of regression trees into ILP. It constructs a tree containing a literal (an atomic formula or its negation) or a conjunction of literals in each node, and assigns a numerical value to each leaf. SRT provides more comprehensible results than purely statistical methods, and can be applied to a class of problems most other ILP systems cannot handle. Experiments in several real-world domains demonstrate that the approach is competitive with existing methods, indicating that the advantages are not at the expense of predictive accuracy.
Logistics background study: underground mining
Hanslovan, J. J.; Visovsky, R. G.
1982-02-01
Logistical functions that are normally associated with US underground coal mining are investigated and analyzed. These functions imply all activities and services that support the producing sections of the mine. The report provides a better understanding of how these functions impact coal production in terms of time, cost, and safety. Major underground logistics activities are analyzed and include: transportation and personnel, supplies and equipment; transportation of coal and rock; electrical distribution and communications systems; water handling; hydraulics; and ventilation systems. Recommended areas for future research are identified and prioritized.
Continual Improvement in Shuttle Logistics
NASA Technical Reports Server (NTRS)
Flowers, Jean; Schafer, Loraine
1995-01-01
It has been said that Continual Improvement (CI) is difficult to apply to service oriented functions, especially in a government agency such as NASA. However, a constrained budget and increasing requirements are a way of life at NASA Kennedy Space Center (KSC), making it a natural environment for the application of CI tools and techniques. This paper describes how KSC, and specifically the Space Shuttle Logistics Project, a key contributor to KSC's mission, has embraced the CI management approach as a means of achieving its strategic goals and objectives. An overview of how the KSC Space Shuttle Logistics Project has structured its CI effort and examples of some of the initiatives are provided.
CSWS-related autistic regression versus autistic regression without CSWS.
Tuchman, Roberto
2009-08-01
Continuous spike-waves during slow-wave sleep (CSWS) and Landau-Kleffner syndrome (LKS) are two clinical epileptic syndromes that are associated with the electroencephalography (EEG) pattern of electrical status epilepticus during slow wave sleep (ESES). Autistic regression occurs in approximately 30% of children with autism and is associated with an epileptiform EEG in approximately 20%. The behavioral phenotypes of CSWS, LKS, and autistic regression overlap. However, the differences in age of regression, degree and type of regression, and frequency of epilepsy and EEG abnormalities suggest that these are distinct phenotypes. CSWS with autistic regression is rare, as is autistic regression associated with ESES. The pathophysiology and as such the treatment implications for children with CSWS and autistic regression are distinct from those with autistic regression without CSWS.
An age-adjusted seroprevalence study of Toxoplasma antibody in a Malaysian ophthalmology unit.
Singh, Sujaya; Khang, Tsung Fei; Andiappan, Hemah; Nissapatorn, Veeranoot; Subrayan, Visvaraja
2012-05-01
Toxoplasma gondii is a public health risk in developing countries, especially those located in the tropics. Widespread infection may inflict a substantial burden on state resources, as patients can develop severe neurological defects and ocular diseases that result in lifelong loss of economic independence. We tested sera for IgG antibody from 493 eye patients in Malaysia. Overall age-adjusted seroprevalence was estimated to be 25% (95% CI: [21%, 29%]). We found approximately equal age-adjusted seroprevalence in Chinese (31%; 95% CI: [25%, 38%]) and Malays (29%; 95% CI: [21%, 36%]), followed by Indians (19%; 95% CI: [13%, 25%]). A logistic regression of the odds for T. gondii seroprevalence against age, gender, ethnicity and the occurrence of six types of ocular diseases showed that only age and ethnicity were significant predictors. The odds for T. gondii seroprevalence were 2.7 (95% CI for OR: [1.9, 4.0]) times higher for a patient twice as old as the other, with ethnicity held constant. In Malays, we estimated the odds for T. gondii seroprevalence to be 2.9 (95% CI for OR: [1.8, 4.5]) times higher compared to non-Malays, with age held constant. Previous studies of T. gondii seroprevalence in Malaysia did not explicitly adjust for age, rendering comparisons difficult. Our study highlights the need to adopt a more rigorous epidemiological approach in monitoring T. gondii seroprevalence in Malaysia.
Logistics support of space facilities
NASA Technical Reports Server (NTRS)
Lewis, William C.
1988-01-01
The logistic support of space facilities is described, with special attention given to the problem of sizing the inventory of ready spares kept at the space facility. Where possible, data from the Space Shuttle Orbiter is extrapolated to provide numerical estimates for space facilities. Attention is also given to repair effort estimation and long duration missions.
NASA Space Rocket Logistics Challenges
NASA Technical Reports Server (NTRS)
Bramon, Chris; Neeley, James R.; Jones, James V.; Watson, Michael D.; Inman, Sharon K.; Tuttle, Loraine
2014-01-01
The Space Launch System (SLS) is the new NASA heavy lift launch vehicle in development and is scheduled for its first mission in 2017. SLS has many of the same logistics challenges as any other large scale program. However, SLS also faces unique challenges. This presentation will address the SLS challenges, along with the analysis and decisions to mitigate the threats posed by each.
Regression analysis for solving diagnosis problem of children's health
NASA Astrophysics Data System (ADS)
Cherkashina, Yu A.; Gerget, O. M.
2016-04-01
This paper presents the results of research devoted to the application of statistical techniques, namely regression analysis, to assess the health status of children in the neonatal period based on medical data (hemostatic parameters, parameters of blood tests, gestational age, vascular-endothelial growth factor) measured at 3-5 days of life. A detailed description of the studied medical data is given, and a binary logistic regression procedure is discussed. The basic results of the research are presented: a classification table of predicted versus observed values is shown, and the overall percentage of correct recognition is determined. Regression equation coefficients are calculated, and the general regression equation is written from them. Based on the logistic regression results, ROC analysis was performed: the sensitivity and specificity of the model were calculated and ROC curves were constructed. These mathematical techniques allow the diagnosis of children's health with a high quality of recognition. The results make a significant contribution to the development of evidence-based medicine and have high practical importance in the professional activity of the author.
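As a rough illustration of the binary logistic regression and sensitivity/specificity calculations described above (hypothetical function names; not the authors' implementation):

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Binary logistic regression fitted by gradient ascent on the
    log-likelihood (illustrative; real work would use IRLS or a
    statistics package)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])   # prepend intercept column
    beta = np.zeros(p + 1)
    for _ in range(steps):
        prob = 1.0 / (1.0 + np.exp(-A @ beta))
        beta += lr * A.T @ (y - prob) / n  # mean log-likelihood gradient
    return beta

def sens_spec(y_true, prob, threshold=0.5):
    """Sensitivity and specificity at one probability cutoff; sweeping
    the cutoff traces out the ROC curve."""
    pred = prob >= threshold
    tp = np.sum(pred & (y_true == 1)); fn = np.sum(~pred & (y_true == 1))
    tn = np.sum(~pred & (y_true == 0)); fp = np.sum(pred & (y_true == 0))
    return tp / (tp + fn), tn / (tn + fp)
```

The classification table in the paper corresponds to the four counts (tp, fn, tn, fp) at a fixed cutoff; the overall percentage of correct recognition is (tp + tn) / n.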
Wild bootstrap for quantile regression.
Feng, Xingdong; He, Xuming; Hu, Jianhua
2011-12-01
The existing theory of the wild bootstrap has focused on linear estimators. In this note, we broaden its validity by providing a class of weight distributions that is asymptotically valid for quantile regression estimators. As most weight distributions in the literature lead to biased variance estimates for nonlinear estimators of linear regression, we propose a modification of the wild bootstrap that admits a broader class of weight distributions for quantile regression. A simulation study on median regression is carried out to compare various bootstrap methods. With a simple finite-sample correction, the wild bootstrap is shown to account for general forms of heteroscedasticity in a regression model with fixed design points.
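The wild bootstrap resampling scheme itself is easy to state. The sketch below shows the classical linear-estimator (OLS) case with Rademacher weights, which is the setting the note broadens to quantile regression; the function name and defaults are illustrative:

```python
import numpy as np

def wild_bootstrap_se(X, y, B=500, rng=None):
    """Wild bootstrap standard errors for an OLS fit: each residual is
    perturbed by an i.i.d. random weight while the design points stay
    fixed, preserving each observation's own residual scale and hence
    accommodating heteroscedasticity."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    draws = np.empty((B, beta.size))
    for b in range(B):
        w = rng.choice([-1.0, 1.0], size=n)   # Rademacher weights
        y_star = A @ beta + w * resid          # perturb, don't resample rows
        draws[b], *_ = np.linalg.lstsq(A, y_star, rcond=None)
    return draws.std(axis=0)
```

For quantile regression the point of the paper is that not every weight distribution that works here remains valid; the proposed modification admits a broader class of weights for nonlinear estimators such as the median regression fit.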
Do subfertile women adjust their habits when trying to conceive?
Joelsson, Lana Salih; Berglund, Anna; Wånggren, Kjell; Lood, Mikael; Rosenblad, Andreas; Tydén, Tanja
2016-01-01
Aim: The aim of this study was to investigate lifestyle habits and lifestyle adjustments among subfertile women trying to conceive. Materials and methods: Women (n = 747) were recruited consecutively at their first visit to fertility clinics in mid-Sweden. Participants completed a questionnaire. Data were analyzed using logistic regression, t tests, and chi-square tests. Results: The response rate was 62% (n = 466). Mean duration of infertility was 1.9 years. During this time 13.2% used tobacco daily, 13.6% drank more than three cups of coffee per day, and 11.6% consumed more than two glasses of alcohol weekly. In this sample, 23.9% of the women were overweight (body mass index, BMI 25–29.9 kg/m²), and 12.5% were obese (BMI ≥30 kg/m²). Obese women exercised more and changed to healthy diets more frequently than normal-weight women (odds ratio 7.43; 95% confidence interval 3.7–14.9). Six out of ten women (n = 266) took folic acid when they started trying to conceive, but 11% stopped taking folic acid after some time. Taking folic acid was associated with a higher level of education (p < 0.001). Conclusions: Among subfertile women, one-third were overweight or obese, and some had other lifestyle factors with known adverse effects on fertility such as use of tobacco. Overweight and obese women adjusted their habits but did not reduce their body mass index. Women of fertile age would benefit from preconception counseling, and the treatment of infertility should routinely offer interventions for lifestyle changes. PMID:27216564
ADJUSTABLE DOUBLE PULSE GENERATOR
Gratian, J.W.; Gratian, A.C.
1961-08-01
A modulator pulse source having adjustable pulse width and adjustable pulse spacing is described. The generator consists of a cross-coupled multivibrator having adjustable time-constant circuitry in each leg, an adjustable differentiating circuit in the output of each leg, a mixing and rectifying circuit for combining the differentiated pulses and generating in its output a resultant sequence of negative pulses, and a final amplifying circuit for inverting and square-topping the pulses. (AEC)
Logistics, electronic commerce, and the environment
NASA Astrophysics Data System (ADS)
Sarkis, Joseph; Meade, Laura; Talluri, Srinivas
2002-02-01
Organizations realize that a strong supporting logistics or electronic logistics (e-logistics) function is important from both commercial and consumer perspectives. The implications of e-logistics models and practices cover the forward and reverse logistics functions of organizations. They also have direct and profound impact on the natural environment. This paper will focus on a discussion of forward and reverse e-logistics and their relationship to the natural environment. After discussion of the many pertinent issues in these areas, directions of practice and implications for study and research are then described.
Teaching and hospital production: the use of regression estimates.
Lehner, L A; Burgess, J F
1995-01-01
Medicare's Prospective Payment System pays U.S. teaching hospitals for the indirect costs of medical education based on a regression coefficient in a cost function. In regression studies using health care data, it is common for explanatory variables to be measured imperfectly, yet the potential for measurement error is often ignored. In this paper, U.S. Department of Veterans Affairs data is used to examine issues of health care production estimation and the use of regression estimates like the teaching adjustment factor. The findings show that measurement error and persistent multicollinearity confound attempts to have a large degree of confidence in the precise magnitude of parameter estimates.
Logistic systems with linear feedback
NASA Astrophysics Data System (ADS)
Son, Leonid; Shulgin, Dmitry; Ogluzdina, Olga
2016-08-01
A wide variety of systems may be described by specific dependence, which is known as logistic curve, or S-curve, between the internal characteristic and the external parameter. Linear feedback between these two values may be suggested for a wide set of systems also. In present paper, we suggest a bifurcation behavior for systems with both features, and discuss it for two cases, which are the Ising magnet in external field, and the development of manufacturing enterprise.
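The bistable behavior can be illustrated with a minimal self-consistency sketch. Assumptions here: a standard logistic sigmoid stands in for the S-curve, and the feedback parameters `h` and `k` are illustrative values, not taken from the paper. With sufficiently strong linear feedback, the fixed-point equation x = S(h + kx) admits two stable solutions, analogous to the two magnetization branches of an Ising magnet in an external field.

```python
import math

# Logistic (S-curve) response.
def S(u):
    return 1.0 / (1.0 + math.exp(-u))

# Iterate the self-consistency map x -> S(h + k*x) to a stable fixed point.
def fixed_point(h, k, x0, iters=200):
    x = x0
    for _ in range(iters):
        x = S(h + k * x)
    return x

h, k = -3.0, 6.0   # illustrative parameters (assumed, not from the paper)
lo = fixed_point(h, k, 0.0)   # starting low converges to the low branch
hi = fixed_point(h, k, 1.0)   # starting high converges to the high branch
print(round(lo, 3), round(hi, 3))  # two distinct stable states
```

Sweeping `h` while tracking both branches traces out the hysteresis loop that signals the bifurcation.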
Multisource information fusion for logistics
NASA Astrophysics Data System (ADS)
Woodley, Robert; Petrov, Plamen; Noll, Warren
2011-05-01
Current Army logistical systems and databases contain massive amounts of data that need an effective method to extract actionable information. The databases do not contain the root cause and case-based analysis needed to diagnose or predict breakdowns. A system is needed to find data from as many sources as possible, process it in an integrated fashion, and disseminate information products on the readiness of the fleet vehicles. 21st Century Systems, Inc. introduces the Agent-Enabled Logistics Enterprise Intelligence System (AELEIS) tool, designed to assist logistics analysts with assessing the availability and prognostics of assets in the logistics pipeline. AELEIS extracts data from multiple, heterogeneous data sets. This data is then aggregated and mined for data trends. Finally, data reasoning tools and prognostics tools evaluate the data for relevance and potential issues. Multiple types of data mining tools may be employed to extract the data, and an information reasoning capability determines which tools are needed to extract information. This can be visualized as a push-pull system where data trends fire a reasoning engine to search for corroborating evidence and then integrate the data into actionable information. The architecture decides on what reasoning engine to use (i.e., it may start with a rule-based method, but, if needed, go to condition-based reasoning, and even a model-based reasoning engine for certain types of equipment). Initial results show that AELEIS is able to indicate to the user potential fault conditions and root-cause information mined from a database.
Evaluating differential effects using regression interactions and regression mixture models
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, relatively new statistical methods for assessing differential effects, by comparing results to using an interactive term in linear regression. The research questions which each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903
Comparing the Discrete and Continuous Logistic Models
ERIC Educational Resources Information Center
Gordon, Sheldon P.
2008-01-01
The solutions of the discrete logistic growth model based on a difference equation and the continuous logistic growth model based on a differential equation are compared and contrasted. The investigation is conducted using a dynamic interactive spreadsheet. (Contains 5 figures.)
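The contrast between the two models can be reproduced outside a spreadsheet. The following is a minimal sketch (parameter values are illustrative, not from the article) comparing the iterated difference equation with the closed-form solution of the differential equation; for small growth rates the two agree closely, and both approach the carrying capacity K.

```python
import math

# Continuous logistic growth dP/dt = r*P*(1 - P/K), closed-form solution.
def logistic_continuous(t, P0, r, K):
    return K / (1.0 + (K / P0 - 1.0) * math.exp(-r * t))

# Discrete logistic growth P_{n+1} = P_n + r*P_n*(1 - P_n/K),
# iterated once per unit of time.
def logistic_discrete(n, P0, r, K):
    P = P0
    for _ in range(n):
        P += r * P * (1.0 - P / K)
    return P

P0, r, K = 10.0, 0.1, 1000.0   # illustrative values
for t in (0, 10, 50, 100):
    print(t, round(logistic_continuous(t, P0, r, K), 1),
          round(logistic_discrete(t, P0, r, K), 1))
```

For larger r the discrete model departs from the smooth S-curve, which is precisely the kind of behavior a dynamic interactive spreadsheet makes easy to explore.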
Linear regression in astronomy. II
NASA Technical Reports Server (NTRS)
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
Quantile regression for climate data
NASA Astrophysics Data System (ADS)
Marasinghe, Dilhani Shalika
Quantile regression is a developing statistical tool which is used to explain the relationship between response and predictor variables. This thesis describes two examples from climatology using quantile regression. Our main goal is to estimate derivatives of a conditional mean and/or conditional quantile function. We introduce a method to handle autocorrelation in the framework of quantile regression and apply it to the temperature data. We also explain some properties of the tornado data, which are non-normally distributed. Even though quantile regression provides a more comprehensive view, when the normality and constant variance assumptions hold for the residuals, we would prefer least squares regression for our temperature analysis. When the non-normality and non-constant variance assumptions apply, quantile regression is a better candidate for the estimation of the derivative.
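As a concrete anchor for the machinery above, here is a minimal sketch (the data values are made up) of the check, or pinball, loss whose minimizer is the τ-th sample quantile; quantile regression generalizes this by letting the minimizer depend linearly on predictor variables.

```python
# Pinball (check) loss for candidate quantile theta at level tau:
# tau*(y - theta) when y >= theta, (1 - tau)*(theta - y) otherwise.
def pinball_loss(theta, y, tau):
    return sum(tau * (v - theta) if v >= theta else (1 - tau) * (theta - v)
               for v in y)

y = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0]   # made-up sample
tau = 0.5

# The tau-th sample quantile minimizes the pinball loss; for a finite
# sample a minimizer can always be found among the data points.
best = min(y, key=lambda c: pinball_loss(c, y, tau))
print(best)  # the sample median, 3.0
```

Setting tau to 0.9 instead would pull the minimizer toward the upper tail, which is why quantile regression can describe the whole conditional distribution rather than just its mean.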
Risk-adjusted monitoring of survival times
Sego, Landon H.; Reynolds, Marion R.; Woodall, William H.
2009-02-26
We consider the monitoring of clinical outcomes, where each patient has a different risk of death prior to undergoing a health care procedure. We propose a risk-adjusted survival time CUSUM chart (RAST CUSUM) for monitoring clinical outcomes where the primary endpoint is a continuous, time-to-event variable that may be right censored. Risk adjustment is accomplished using accelerated failure time regression models. We compare the average run length performance of the RAST CUSUM chart to the risk-adjusted Bernoulli CUSUM chart, using data from cardiac surgeries to motivate the details of the comparison. The comparisons show that the RAST CUSUM chart is more efficient at detecting a sudden decrease in the odds of death than the risk-adjusted Bernoulli CUSUM chart, especially when the fraction of censored observations is not too high. We also discuss the implementation of a prospective monitoring scheme using the RAST CUSUM chart.
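While the RAST CUSUM derives its per-patient scores from an accelerated failure time model, the chart's recursion itself is the standard one-sided CUSUM. The sketch below uses made-up scores and an arbitrary control limit `h` (both illustrative, not from the paper) to show that recursion.

```python
# One-sided CUSUM: accumulate per-observation scores w_t into the running
# statistic C_t = max(0, C_{t-1} + w_t) and signal when C_t crosses the
# control limit h.
def cusum(scores, h):
    c, path = 0.0, []
    for w in scores:
        c = max(0.0, c + w)
        path.append(c)
    alarm = next((i for i, v in enumerate(path) if v >= h), None)
    return path, alarm

scores = [-0.2, 0.1, -0.3, 0.4, 0.5, 0.6, 0.7]   # illustrative scores only
path, alarm = cusum(scores, h=1.4)
print(path, alarm)  # the statistic resets at 0 and alarms once it reaches h
```

In a risk-adjusted chart the scores would be log-likelihood-ratio increments computed from each patient's fitted survival model, so higher-risk patients contribute differently to the statistic than lower-risk ones.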
Research on 6R Military Logistics Network
NASA Astrophysics Data System (ADS)
Jie, Wan; Wen, Wang
The building of a military logistics network is an important issue in the construction of new forces. This paper proposes a concept model of a 6R military logistics network based on JIT. We then conceive of an axis-spoke logistics centers network, a flexible 6R organizational network, and a lean, grid-based 6R military information network. Finally, strategies and proposals for the construction of these three subnetworks of the 6R military logistics network are given.
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
Retro-regression--another important multivariate regression improvement.
Randić, M
2001-01-01
We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested in stepwise regression when a descriptor is included in or excluded from the regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes, at different steps of the stepwise regression, a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of a greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA. PMID:11410035
Nowcasting sunshine number using logistic modeling
NASA Astrophysics Data System (ADS)
Brabec, Marek; Badescu, Viorel; Paulescu, Marius
2013-04-01
In this paper, we present a formalized approach to statistical modeling of the sunshine number, a binary indicator of whether the Sun is covered by clouds, introduced previously by Badescu (Theor Appl Climatol 72:127-136, 2002). Our statistical approach is based on Markov chains and logistic regression and yields fully specified probability models that are relatively easily identified (and their unknown parameters estimated) from a set of empirical data (observed sunshine number and sunshine stability number series). We discuss the general structure of the model and its advantages, demonstrate its performance on real data, and compare its results to a classical ARIMA approach as a competitor. Since the model parameters have clear interpretation, we also illustrate how, e.g., their inter-seasonal stability can be tested. We conclude with an outlook to future developments oriented to construction of models allowing for practically desirable smooth transition between data observed with different frequencies and with a short discussion of technical problems that such a goal brings.
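The paper's full model couples a Markov chain with logistic regression on covariates; as a minimal sketch of the Markov backbone alone (the toy series below is made up), one can estimate the transition probabilities of the binary sunshine number directly from an observed series and use them for a one-step-ahead nowcast.

```python
# Estimate first-order Markov transition probabilities for a binary
# sunshine-number series (1 = Sun not covered, 0 = covered by clouds).
def fit_transitions(series):
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for prev, cur in zip(series, series[1:]):
        counts[(prev, cur)] += 1
    probs = {}
    for a in (0, 1):
        total = counts[(a, 0)] + counts[(a, 1)]
        probs[a] = counts[(a, 1)] / total if total else 0.5
    return probs  # probs[a] = P(next state is 1 | current state is a)

series = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1]  # toy data
p = fit_transitions(series)
print(p)  # nowcast: P(sunny next) given the current state
```

Logistic regression enters when these transition probabilities are allowed to depend on covariates (time of day, season), replacing the constant `probs[a]` with a fitted logistic function per originating state.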
Logistics Handbook, 1976. Colorado Outward Bound School.
ERIC Educational Resources Information Center
Colorado Outward Bound School, Denver.
Logistics, a support mission, is vital to the successful operation of the Colorado Outward Bound School (COBS) courses. Logistics is responsible for purchasing, maintaining, transporting, and replenishing a wide variety of items, i.e., food, mountaineering and camping equipment, medical and other supplies, and vehicles. The Logistics coordinator…
ERIC Educational Resources Information Center
Liu, Xing
2008-01-01
The proportional odds (PO) model, which is also called the cumulative odds model (Agresti, 1996, 2002; Armstrong & Sloan, 1989; Long, 1997; Long & Freese, 2006; McCullagh, 1980; McCullagh & Nelder, 1989; Powers & Xie, 2000; O'Connell, 2006), is one of the most commonly used models for the analysis of ordinal categorical data and comes from the class…
ERIC Educational Resources Information Center
Kellermeyer, Steven Bruce
2011-01-01
In the last few decades high-stakes testing has become more political than educational. The Districts within Arizona are bound by the mandates of both AZ LEARNS and the No Child Left Behind Act of 2001. At the time of this writing, both legislative mandates relied on the Arizona Instrument for Measuring Standards (AIMS) as State Tests for gauging…
Narumalani, S.; Jensen, J.R.; Althausen, J.D.; Burkhalter, S.; Mackey, H.E. Jr.
1994-06-01
Since aquatic macrophytes have an important influence on the physical and chemical processes of an ecosystem while simultaneously affecting human activity, it is imperative that they be inventoried and managed wisely. However, mapping wetlands can be a major challenge because they are found in diverse geographic areas ranging from small tributary streams, to shrub or scrub and marsh communities, to open water lacustrian environments. In addition, the type and spatial distribution of wetlands can change dramatically from season to season, especially when nonpersistent species are present. This research focuses on developing a model for predicting the future growth and distribution of aquatic macrophytes. This model will use a geographic information system (GIS) to analyze some of the biophysical variables that affect aquatic macrophyte growth and distribution. The data will provide scientists information on the future spatial growth and distribution of aquatic macrophytes. This study focuses on the Savannah River Site's Par Pond (1,000 ha) and L Lake (400 ha), two cooling ponds that have received thermal effluent from nuclear reactor operations. Par Pond was constructed in 1958, and natural invasion of wetland vegetation has occurred over its 35-year history, with much of the shoreline having developed extensive beds of persistent and non-persistent aquatic macrophytes.
ERIC Educational Resources Information Center
Miller, Thomas E.; Herreid, Charlene H.
2009-01-01
This is the fifth in a series of articles describing an attrition prediction and intervention project at the University of South Florida (USF) in Tampa. The project was originally presented in the 83(2) issue (Miller 2007). The statistical model for predicting attrition was described in the 83(3) issue (Miller and Herreid 2008). The methods and…
ERIC Educational Resources Information Center
Del Prette, Zilda Aparecida Pereira; Prette, Almir Del; De Oliveira, Lael Almeida; Gresham, Frank M.; Vance, Michael J.
2012-01-01
Social skills are specific behaviors that individuals exhibit in order to successfully complete social tasks whereas social competence represents judgments by significant others that these social tasks have been successfully accomplished. The present investigation identified the best sociobehavioral predictors obtained from different raters…
ERIC Educational Resources Information Center
Schreiber, James B.
The student variables associated with scoring above the international mean on the Third International Mathematics and Science Study (TIMSS) were studied in a group of U.S. students who took advanced mathematics or advanced mathematics and physics. The total sample was 2,349, with 1,158 females and 1,191 males. Formal parent education level and…
Sufficient Sample Size and Power in Multilevel Ordinal Logistic Regression Models
Ali, Amjad; Khan, Sajjad Ahmad; Hussain, Sundas
2016-01-01
Biomedical researchers often deal with ordinal outcome variables in multilevel models where patients are nested within doctors. We can justifiably apply a multilevel cumulative logit model, where the outcome variable represents the mild, severe, and extremely severe intensity of diseases like malaria and typhoid in the form of ordered categories. Based on our simulation conditions, the Maximum Likelihood (ML) method is better than the Penalized Quasilikelihood (PQL) method for a three-category ordinal outcome variable. The PQL method, however, performs equally well as the ML method where a five-category ordinal outcome variable is used. Further, to achieve power of more than 0.80, at least 50 groups are required for both ML and PQL methods of estimation. It may be pointed out that, for the five-category ordinal response variable model, the power of the PQL method is slightly higher than that of the ML method. PMID:27746826
ERIC Educational Resources Information Center
Tang, Hui; Kirk, John; Pienta, Norbert J.
2014-01-01
This paper includes two experiments, one investigating complexity factors in stoichiometry word problems, and the other identifying students' problem-solving protocols by using eye-tracking technology. The word problems used in this study had five different complexity factors, which were randomly assigned by a Web-based tool that we…
A Logistic Regression Analysis of Score Sending and College Matching among High School Students
ERIC Educational Resources Information Center
Oates, Krystle S.
2015-01-01
College decisions are often the result of a variety of influences related to student background characteristics, academic characteristics, college preferences and college aspirations. College counselors recommend that students choose a variety of schools, especially schools where the general student body matches the academic achievement of…
Technology Transfer Automated Retrieval System (TEKTRAN)
Improvement of cold tolerance of winter wheat (Triticum aestivum L.) through breeding methods has been problematic. A better understanding of how individual wheat cultivars respond to components of the freezing process may provide new information that can be used to develop more cold tolerance culti...
Logistic regression analysis to predict weaning-to-estrous interval in first-litter gilts
Technology Transfer Automated Retrieval System (TEKTRAN)
Delayed return to estrus after weaning is a significant problem for swine producers. In this study, we investigated the relationships between weaning-to-estrous interval (WEI) and body weight (BW), back fat (BF), plasma leptin (L), glucose (G), albumin (A), urea nitrogen (PUN) concentrations and lit...
ERIC Educational Resources Information Center
Breidenbach, Daniel H.; French, Brian F.
2011-01-01
Many factors can influence a student's decision to withdraw from college. Intervention programs aimed at retention can benefit from understanding the factors related to such decisions, especially in underrepresented groups. The Institutional Integration Scale (IIS) has been suggested as a predictor of student persistence. Accurate prediction of…
NASA Astrophysics Data System (ADS)
Maas, A.; Rottensteiner, F.; Heipke, C.
2016-06-01
Supervised classification of remotely sensed images is a classical method to update topographic geospatial databases. The task requires training data in the form of image data with known class labels, whose generation is time-consuming. To avoid this problem one can use the labels from the outdated database for training. As some of these labels may be wrong due to changes in land cover, one has to use training techniques that can cope with wrong class labels in the training data. In this paper we adapt a label noise tolerant training technique to the problem of database updating. No labelled data other than the existing database are necessary. The resulting label image and transition matrix between the labels can help to update the database and to detect changes between the two time epochs. Our experiments are based on different test areas, using real images with simulated existing databases. Our results show that this method can indeed detect changes that would remain undetected if label noise were not considered in training.
AP® Potential Predicted by PSAT/NMSQT® Scores Using Logistic Regression. Statistical Report 2014-1
ERIC Educational Resources Information Center
Zhang, Xiuyuan; Patel, Priyank; Ewing, Maureen
2014-01-01
AP Potential™ is an educational guidance tool that uses PSAT/NMSQT® scores to identify students who have the potential to do well on one or more Advanced Placement® (AP®) Exams. Students identified as having AP potential, perhaps students who would not have been otherwise identified, should consider enrolling in the corresponding AP course if they…
Exploring Person Fit with an Approach Based on Multilevel Logistic Regression
ERIC Educational Resources Information Center
Walker, A. Adrienne; Engelhard, George, Jr.
2015-01-01
The idea that test scores may not be valid representations of what students know, can do, and should learn next is well known. Person fit provides an important aspect of validity evidence. Person fit analyses at the individual student level are not typically conducted and person fit information is not communicated to educational stakeholders. In…
Mini pressurized logistics module (MPLM)
NASA Astrophysics Data System (ADS)
Vallerani, E.; Brondolo, D.; Basile, L.
1996-06-01
The MPLM Program was initiated through a Memorandum of Understanding (MOU) between the United States' National Aeronautics and Space Administration (NASA) and Italy's ASI, the Italian Space Agency, that was signed on 6 December 1991. The MPLM is a pressurized logistics module that will be used to transport supplies and materials (up to 20,000 lb), including user experiments, between Earth and International Space Station Alpha (ISSA) using the Shuttle, to support active and passive storage, and to provide a habitable environment for two people when docked to the Station. The Italian Space Agency has selected Alenia Spazio to develop the MPLM modules, which have always been considered a key element of the new International Space Station, benefiting from their design flexibility and the consequent possible cost savings based on maximum utilization of the Shuttle launch capability for any mission. In the framework of the very recent agreement between the U.S. and Russia for cooperation in space, which foresees the utilization of MIR 1 hardware, the Italian MPLM will remain an important element of the logistics system, being the only pressurized module designed for re-entry. Within the new scenario of anticipated Shuttle flights to MIR 1 during Space Station phase 1, MPLM remains a candidate for one or more missions to provide MIR 1 resupply capabilities and advanced ISSA hardware/procedures verification. Based on the concept of Flexible Carriers, Alenia Spazio is providing NASA with three MPLM flight units that can be configured according to the requirements of the Human-Tended Capability (HTC) and Permanent Human Capability (PHC) of the Space Station. Configurability will allow transportation of passive cargo only, or a combination of passive and cold cargo accommodated in R/F racks. Having developed and qualified the baseline configuration with respect to the worst enveloping condition, each unit could be easily configured to the passive or active version depending upon the
Precision Efficacy Analysis for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.
When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross- validity approach to select sample sizes…
Can luteal regression be reversed?
Telleria, Carlos M
2006-01-01
The corpus luteum is an endocrine gland whose limited lifespan is hormonally programmed. This debate article summarizes findings of our research group that challenge the principle that the end of function of the corpus luteum or luteal regression, once triggered, cannot be reversed. Overturning luteal regression by pharmacological manipulations may be of critical significance in designing strategies to improve fertility efficacy. PMID:17074090
2015-01-01
Background: Over the past 50,000 years, shifts in human-environmental or human-human interactions shaped genetic differences within and among human populations, including variants under positive selection. Shaped by environmental factors, such variants influence the genetics of modern health, disease, and treatment outcome. Because evolutionary processes tend to act on gene regulation, we test whether regulatory variants are under positive selection. We introduce a new approach to enhance detection of genetic markers undergoing positive selection, using conditional entropy to capture recent local selection signals. Results: We use conditional logistic regression to compare our Adjusted Haplotype Conditional Entropy (H|H) measure of positive selection to existing positive selection measures. H|H and existing measures were applied to published regulatory variants acting in cis (cis-eQTLs), with conditional logistic regression testing whether regulatory variants undergo stronger positive selection than the surrounding gene. These cis-eQTLs were drawn from six independent studies of genotype and RNA expression. The conditional logistic regression shows that, overall, H|H is substantially more powerful than existing positive-selection methods in identifying cis-eQTLs against other Single Nucleotide Polymorphisms (SNPs) in the same genes. When broken down by Gene Ontology, H|H predictions are particularly strong in some biological process categories, where regulatory variants are under strong positive selection compared to the bulk of the gene, distinct from those GO categories under overall positive selection. However, cis-eQTLs in a second group of genes lack positive selection signatures detectable by H|H, consistent with ancient short haplotypes compared to the surrounding gene (for example, in innate immunity GO:0042742); under such other modes of selection, H|H would not be expected to be a strong predictor. These conditional logistic regression models are
Wild bootstrap for quantile regression.
Feng, Xingdong; He, Xuming; Hu, Jianhua
2011-12-01
The existing theory of the wild bootstrap has focused on linear estimators. In this note, we broaden its validity by providing a class of weight distributions that is asymptotically valid for quantile regression estimators. As most weight distributions in the literature lead to biased variance estimates for nonlinear estimators of linear regression, we propose a modification of the wild bootstrap that admits a broader class of weight distributions for quantile regression. A simulation study on median regression is carried out to compare various bootstrap methods. With a simple finite-sample correction, the wild bootstrap is shown to account for general forms of heteroscedasticity in a regression model with fixed design points. PMID:23049133
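For orientation, here is a minimal sketch of the classical wild bootstrap for a simple linear regression with Rademacher weights; the data are synthetic and the setup is the standard linear case, not the paper's quantile-regression modification.

```python
import random

# Wild bootstrap for y = a + b*x + e with heteroscedastic errors:
# resample y_i* = yhat_i + w_i * ehat_i with i.i.d. Rademacher weights
# w_i in {-1, +1}, then refit on each replicate. Keeping each residual
# attached to its own design point preserves heteroscedasticity.
def ols(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def wild_bootstrap_slopes(x, y, B=1000, seed=0):
    rng = random.Random(seed)
    a, b = ols(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    slopes = []
    for _ in range(B):
        ystar = [a + b * xi + rng.choice((-1.0, 1.0)) * ei
                 for xi, ei in zip(x, resid)]
        slopes.append(ols(x, ystar)[1])
    return slopes

# Synthetic data whose error magnitude grows with x (heteroscedastic).
x = [float(i) for i in range(10)]
y = [2.0 + 0.5 * xi + ((-1) ** i) * 0.1 * (1 + xi) for i, xi in enumerate(x)]
slopes = wild_bootstrap_slopes(x, y)
mean_slope = sum(slopes) / len(slopes)
print(round(mean_slope, 3))
```

The spread of the bootstrap slopes estimates the sampling variability of the slope under heteroscedasticity; the paper's contribution is showing which weight distributions keep this valid for nonlinear estimators such as quantile regression.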
[Regression grading in gastrointestinal tumors].
Tischoff, I; Tannapfel, A
2012-02-01
Preoperative neoadjuvant chemoradiation therapy is a well-established and essential part of the interdisciplinary treatment of gastrointestinal tumors. Neoadjuvant treatment leads to regressive changes in tumors. To evaluate the histological tumor response, different scoring systems describing regressive changes are used, known as tumor regression grading. Tumor regression grading is usually based on the presence of residual vital tumor cells in proportion to the total tumor size. Currently, no nationally or internationally accepted grading systems exist. In general, common guidelines should be used in the pathohistological diagnostics of tumors after neoadjuvant therapy. In particular, the standard tumor grading is replaced by tumor regression grading. Furthermore, tumors after neoadjuvant treatment are marked with the prefix "y" in the TNM classification. PMID:22293790
Pazvakawambwa, Lillian; Indongo, Nelago; Kazembe, Lawrence N.
2013-01-01
Background: Marriage is a significant event in the life-course of individuals, and creates a system that characterizes societal and economic structures. Marital patterns and dynamics over the years have changed considerably, with decreasing proportions of marriage and increased levels of divorce and cohabitation in developing countries. Although such changes have been reported in African societies including Namibia, they have largely remained unexplained. Objectives and Methods: In this paper, we examined trends and patterns of marital status of women of marriageable age, 15 to 49 years, in Namibia using the 1992, 2000 and 2006 Demographic and Health Survey (DHS) data. Trends were established for selected demographic variables. Two binary logistic regression models, for ever-married versus never married and cohabitation versus married, were fitted to establish factors associated with such nuptial systems. Further, multinomial logistic regression models, adjusted for bio-demographic and socio-economic variables, were fitted separately for each year to establish determinants of type of union (never married, married, and cohabitation). Results and Conclusions: Findings indicate a general shift away from marriage, with a change in the singulate mean age at marriage. Cohabitation was prevalent among those less than 30 years of age; the odds were higher in urban areas and have increased since 1992. Be that as it may, marriage remained a persistent nuptiality pattern, common among the less educated and the employed, but with lower odds in urban areas. Results from the multinomial model suggest that marital status was associated with age at marriage, total children born, region, place of residence, education level and religion. We conclude that marital patterns have undergone significant transformation over the past two decades in Namibia, with a coexistence of the traditional marriage framework with cohabitation, and a sizeable proportion remaining unmarried into their late 30s. A shift in the singulate mean age is
Transport logistics in pollen tubes.
Chebli, Youssef; Kroeger, Jens; Geitmann, Anja
2013-07-01
Cellular organelles move within the cellular volume and the effect of the resulting drag forces on the liquid causes bulk movement in the cytosol. The movement of both organelles and cytosol leads to an overall motion pattern called cytoplasmic streaming or cyclosis. This streaming enables the active and passive transport of molecules and organelles between cellular compartments. Furthermore, the fusion and budding of vesicles with and from the plasma membrane (exo/endocytosis) allow for transport of material between the inside and the outside of the cell. In the pollen tube, cytoplasmic streaming and exo/endocytosis are very active and fulfill several different functions. In this review, we focus on the logistics of intracellular motion and transport processes as well as their biophysical underpinnings. We discuss various modeling attempts that have been performed to understand both long-distance shuttling and short-distance targeting of organelles. We show how the combination of mechanical and mathematical modeling with cell biological approaches has contributed to our understanding of intracellular transport logistics.
Frank, Laura K.; Jannasch, Franziska; Kröger, Janine; Bedu-Addo, George; Mockenhaupt, Frank P.; Schulze, Matthias B.; Danquah, Ina
2015-01-01
Reduced rank regression (RRR) is an innovative technique to establish dietary patterns related to biochemical risk factors for type 2 diabetes, but has not been applied in sub-Saharan Africa. In a hospital-based case-control study for type 2 diabetes in Kumasi (diabetes cases, 538; controls, 668) dietary intake was assessed by a specific food frequency questionnaire. After random split of our study population, we derived a dietary pattern in the training set using RRR with adiponectin, HDL-cholesterol and triglycerides as responses and 35 food items as predictors. This pattern score was applied to the validation set, and its association with type 2 diabetes was examined by logistic regression. The dietary pattern was characterized by a high consumption of plantain, cassava, and garden egg, and a low intake of rice, juice, vegetable oil, eggs, chocolate drink, sweets, and red meat; the score correlated positively with serum triglycerides and negatively with adiponectin. The multivariate-adjusted odds ratio of type 2 diabetes for the highest quintile compared to the lowest was 4.43 (95% confidence interval: 1.87–10.50, p for trend < 0.001). The identified dietary pattern increases the odds of type 2 diabetes in urban Ghanaians, which is mainly attributed to increased serum triglycerides. PMID:26198248
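The rank-1 RRR pattern score can be built from an OLS fit followed by an SVD of the fitted responses. This is the generic textbook construction on synthetic data, not the authors' code; the dimensions merely mirror the study (35 food predictors, 3 biomarker responses).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 500, 35, 3           # subjects, food items, biomarker responses
X = rng.normal(size=(n, p))    # standardized food intakes (synthetic)
Y = X[:, :5] @ rng.normal(size=(5, q)) + rng.normal(size=(n, q))  # biomarkers

# Center columns before fitting
X -= X.mean(0); Y -= Y.mean(0)

# OLS fit, then project fitted responses onto their leading principal axis:
# the classical construction of rank-1 reduced rank regression.
B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
Yhat = X @ B_ols
_, _, Vt = np.linalg.svd(Yhat, full_matrices=False)
v1 = Vt[0]                     # leading response (biomarker) weights
score = X @ (B_ols @ v1)       # dietary pattern score, one value per subject
print(score[:5])
```

The score is the linear combination of food intakes that explains the most variation in the biomarker responses; in the study it would then enter the logistic model as exposure.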
Risk factors for autistic regression: results of an ambispective cohort study.
Zhang, Ying; Xu, Qiong; Liu, Jing; Li, She-chang; Xu, Xiu
2012-08-01
A subgroup of children diagnosed with autism experience developmental regression featured by a loss of previously acquired abilities. The pathogeny of autistic regression is unknown, although many risk factors likely exist. To better characterize autistic regression and investigate the association between autistic regression and potential influencing factors in Chinese autistic children, we conducted an ambispective study with a cohort of 170 autistic subjects. Analyses by multiple logistic regression showed significant correlations between autistic regression and febrile seizures (OR = 3.53, 95% CI = 1.17-10.65, P = .025), as well as with a family history of neuropsychiatric disorders (OR = 3.62, 95% CI = 1.35-9.71, P = .011). This study suggests that febrile seizures and family history of neuropsychiatric disorders are correlated with autistic regression.
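Odds ratios and Wald confidence intervals like those reported can be obtained from a plain Newton-Raphson logistic fit. A numpy sketch on invented data follows; the exposure and effect size here are assumptions for illustration, not the study's estimates.

```python
import numpy as np

def logit_fit(X, y, iters=25):
    """Plain Newton-Raphson logistic regression; returns coefficients and SEs."""
    X = np.column_stack([np.ones(len(X)), X])   # add intercept
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ b))
        H = X.T @ (X * (p * (1 - p))[:, None])  # observed information
        b += np.linalg.solve(H, X.T @ (y - p))  # Newton step
    se = np.sqrt(np.diag(np.linalg.inv(H)))
    return b, se

rng = np.random.default_rng(2)
n = 400
febrile = rng.integers(0, 2, n)                 # hypothetical binary exposure
logit_p = -1.0 + 1.2 * febrile                  # assumed true log-odds ratio
y = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(float)

b, se = logit_fit(febrile[:, None], y)
or_ = np.exp(b[1])
lo, hi = np.exp(b[1] - 1.96 * se[1]), np.exp(b[1] + 1.96 * se[1])
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

Exponentiating a coefficient and its Wald limits is exactly how the OR = 3.53 (95% CI 1.17-10.65) style of result is produced.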
Practical Session: Simple Linear Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
Two exercises are proposed to illustrate simple linear regression. The first is based on Galton's famous data set on heredity. We use the lm R command and obtain coefficient estimates, the residual standard error, R², residuals, etc. In the second example, devoted to data on the vapor tension of mercury, we fit a simple linear regression, predict values, and look ahead to multiple linear regression. This practical session is an excerpt from practical exercises proposed by A. Dalalyan at ENPC (see Exercises 1 and 2 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_4.pdf).
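A Python analogue of the lm fit described above, on synthetic heights loosely inspired by Galton's setting (the coefficients and noise level are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
parent = rng.normal(68, 1.8, 200)                      # parent heights (inches)
child = 24 + 0.65 * parent + rng.normal(0, 2, 200)     # child heights

# lm(child ~ parent) equivalent: least-squares intercept and slope
X = np.column_stack([np.ones_like(parent), parent])
(beta0, beta1), *_ = np.linalg.lstsq(X, child, rcond=None)

resid = child - (beta0 + beta1 * parent)
r2 = 1 - resid.var() / child.var()                     # coefficient of determination
print(f"intercept={beta0:.2f} slope={beta1:.2f} R2={r2:.2f}")
```

A slope below 1 is the "regression toward the mean" that gave the method its name.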
Accounting for Slipping and Other False Negatives in Logistic Models of Student Learning
ERIC Educational Resources Information Center
MacLellan, Christopher J.; Liu, Ran; Koedinger, Kenneth R.
2015-01-01
Additive Factors Model (AFM) and Performance Factors Analysis (PFA) are two popular models of student learning that employ logistic regression to estimate parameters and predict performance. This is in contrast to Bayesian Knowledge Tracing (BKT) which uses a Hidden Markov Model formalism. While all three models tend to make similar predictions,…
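One simple way to let a logistic model express slipping is to cap the success probability at 1 - slip. This is a generic bounded-logistic sketch, not necessarily the paper's exact formulation:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def p_correct(skill, slip):
    """Success probability with a slip (false-negative) parameter:
    even a fully mastered skill is answered wrong with probability `slip`."""
    return (1 - slip) * sigmoid(skill)

# Without slipping a strong student approaches p = 1; with slipping the curve
# is capped at 1 - slip, which a standard AFM/PFA logistic cannot express.
print(p_correct(4.0, 0.0), p_correct(4.0, 0.1))
```

The cap mirrors the slip parameter in BKT, grafted onto a logistic-regression-style skill term.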
Regional regression of flood characteristics employing historical information
Tasker, Gary D.; Stedinger, J.R.
1987-01-01
Streamflow gauging networks provide hydrologic information for use in estimating the parameters of regional regression models. The regional regression models can be used to estimate flood statistics, such as the 100 yr peak, at ungauged sites as functions of drainage basin characteristics. A recent innovation in regional regression is the use of a generalized least squares (GLS) estimator that accounts for unequal station record lengths and sample cross correlation among the flows. However, this technique does not account for historical flood information. A method is proposed here to adjust this generalized least squares estimator to account for possible information about historical floods available at some stations in a region. The historical information is assumed to be in the form of observations of all peaks above a threshold during a long period outside the systematic record period. A Monte Carlo simulation experiment was performed to compare the GLS estimator adjusted for historical floods with the unadjusted GLS estimator and the ordinary least squares estimator. Results indicate that using the GLS estimator adjusted for historical information significantly improves the regression model. © 1987.
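The GLS estimator at the heart of regional regression is a one-liner in numpy once the error covariance is specified. The covariance structure below (unequal variances plus uniform cross correlation) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
m = 30                                                  # gauging stations
X = np.column_stack([np.ones(m), rng.normal(size=m)])   # basin characteristics
beta_true = np.array([2.0, 0.8])

# Error covariance: unequal variances (unequal record lengths) plus
# cross correlation of 0.3 among station flows
sd = rng.uniform(0.5, 1.5, m)
Omega = 0.3 * np.outer(sd, sd) + np.diag(0.7 * sd**2)
y = X @ beta_true + np.linalg.cholesky(Omega) @ rng.normal(size=m)

# GLS: beta = (X' Omega^-1 X)^-1 X' Omega^-1 y
Oi = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Oi @ X, X.T @ Oi @ y)
print(beta_gls)
```

Setting Omega to the identity recovers ordinary least squares, which is the comparison made in the Monte Carlo experiment.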
Multiple Regression and Its Discontents
ERIC Educational Resources Information Center
Snell, Joel C.; Marsh, Mitchell
2012-01-01
Multiple regression is part of a larger statistical strategy originated by Gauss. The authors raise questions about the theory and suggest some changes that would make room for Mandelbrot and Serendipity.
Regression methods for spatial data
NASA Technical Reports Server (NTRS)
Yakowitz, S. J.; Szidarovszky, F.
1982-01-01
The kriging approach, a parametric regression method used by hydrologists and mining engineers, among others, also provides an error estimate for the integral of the regression function. The kriging method is explored and some of its statistical characteristics are described. The Watson method and theory are extended so that the kriging features are displayed. Theoretical and computational comparisons of the kriging and Watson approaches are offered.
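A minimal kriging-style predictor with pointwise error estimates can be written directly in numpy. The squared-exponential covariance and the small nugget below are illustrative choices, not taken from the paper:

```python
import numpy as np

def kernel(a, b, ell=1.0):
    """Squared-exponential covariance, one common kriging choice."""
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 10, 25))          # observation sites
y = np.sin(x) + 0.1 * rng.normal(size=25)    # noisy field values

K = kernel(x, x) + 1e-2 * np.eye(25)         # nugget term for observation noise
alpha = np.linalg.solve(K, y)

xs = np.linspace(0, 10, 5)                   # prediction sites
mu = kernel(xs, x) @ alpha                   # kriging predictor
# Pointwise prediction variance: prior variance minus explained variance
var = 1.0 - np.einsum('ij,ji->i', kernel(xs, x), np.linalg.solve(K, kernel(x, xs)))
print(mu, var)
```

The built-in variance estimate is the feature that distinguishes kriging from a plain nonparametric smoother such as the Watson estimator.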
Wrong Signs in Regression Coefficients
NASA Technical Reports Server (NTRS)
McGee, Holly
1999-01-01
When using parametric cost estimation, it is important to note the possibility of the regression coefficients having the wrong sign. A wrong sign is defined as a sign on the regression coefficient opposite to the researcher's intuition and experience. Some possible causes for the wrong sign discussed in this paper are a small range of x's, leverage points, missing variables, multicollinearity, and computational error. Additionally, techniques for determining the cause of the wrong sign are given.
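A quick demonstration of the multicollinearity cause: with two nearly collinear predictors, individual coefficients become unstable and can take the wrong sign even when both true effects are positive. The data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)            # nearly collinear with x1
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)   # both true effects positive

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Variance inflation factor for x1 from its correlation with x2
r2 = np.corrcoef(x1, x2)[0, 1] ** 2
vif = 1 / (1 - r2)
print(beta[1:], vif)  # individual coefficients can stray far from (1, 1)
```

Note that the sum of the two coefficients stays well determined; only the split between them is unstable, which is exactly how a wrong sign sneaks into a cost model.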
Basis Selection for Wavelet Regression
NASA Technical Reports Server (NTRS)
Wheeler, Kevin R.; Lau, Sonie (Technical Monitor)
1998-01-01
A wavelet basis selection procedure is presented for wavelet regression. Both the basis and the threshold are selected using cross-validation. The method includes the capability of incorporating prior knowledge on the smoothness (or shape of the basis functions) into the basis selection procedure. The results of the method are demonstrated on sampled functions widely used in the wavelet regression literature. The results of the method are contrasted with other published methods.
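Wavelet regression by coefficient thresholding can be sketched with a hand-rolled Haar transform. The universal threshold used here is one standard choice (with known noise level), not the paper's cross-validated selection:

```python
import numpy as np

def haar(x):
    """Multilevel orthonormal Haar transform; length must be a power of 2."""
    coeffs, a = [], x.astype(float)
    while len(a) > 1:
        coeffs.append((a[0::2] - a[1::2]) / np.sqrt(2))  # detail
        a = (a[0::2] + a[1::2]) / np.sqrt(2)             # approximation
    coeffs.append(a)
    return coeffs

def ihaar(coeffs):
    a = coeffs[-1]
    for d in reversed(coeffs[:-1]):
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a

rng = np.random.default_rng(7)
n = 256
t = np.linspace(0, 1, n)
signal = np.where(t < 0.5, 1.0, -1.0)            # piecewise-constant test function
y = signal + 0.3 * rng.normal(size=n)

coeffs = haar(y)
thresh = 0.3 * np.sqrt(2 * np.log(n))            # universal threshold
den = [np.sign(d) * np.maximum(np.abs(d) - thresh, 0) for d in coeffs[:-1]]
den.append(coeffs[-1])                           # keep coarse level unthresholded
yhat = ihaar(den)

mse_den = np.mean((yhat - signal) ** 2)
mse_raw = np.mean((y - signal) ** 2)
print(mse_den, mse_raw)
```

Swapping the Haar pair for a smoother basis, and the fixed threshold for a cross-validated one, gives the kind of basis/threshold selection the paper studies.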
Regression Discontinuity Designs in Epidemiology
Moscoe, Ellen; Mutevedzi, Portia; Newell, Marie-Louise; Bärnighausen, Till
2014-01-01
When patients receive an intervention based on whether they score below or above some threshold value on a continuously measured random variable, the intervention will be randomly assigned for patients close to the threshold. The regression discontinuity design exploits this fact to estimate causal treatment effects. In spite of its recent proliferation in economics, the regression discontinuity design has not been widely adopted in epidemiology. We describe regression discontinuity, its implementation, and the assumptions required for causal inference. We show that regression discontinuity is generalizable to the survival and nonlinear models that are mainstays of epidemiologic analysis. We then present an application of regression discontinuity to the much-debated epidemiologic question of when to start HIV patients on antiretroviral therapy. Using data from a large South African cohort (2007–2011), we estimate the causal effect of early versus deferred treatment eligibility on mortality. Patients whose first CD4 count was just below the 200 cells/μL CD4 count threshold had a 35% lower hazard of death (hazard ratio = 0.65 [95% confidence interval = 0.45–0.94]) than patients presenting with CD4 counts just above the threshold. We close by discussing the strengths and limitations of regression discontinuity designs for epidemiology. PMID:25061922
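The local-linear regression discontinuity estimate can be sketched in a few lines. The data, bandwidth, and effect size below are invented for illustration and are not the South African cohort:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 2000
cd4 = rng.uniform(50, 350, n)                    # running variable
treated = (cd4 < 200).astype(float)              # eligibility below the cutoff
# Outcome with an assumed discontinuous treatment effect of -0.5 at the cutoff
y = 0.002 * cd4 - 0.5 * treated + rng.normal(0, 0.3, n)

# Local linear fit on each side within a bandwidth around the cutoff
h = 50.0
below = (cd4 >= 200 - h) & (cd4 < 200)
above = (cd4 >= 200) & (cd4 < 200 + h)

def intercept_at_cutoff(x, yy):
    X = np.column_stack([np.ones(len(x)), x - 200.0])
    b, *_ = np.linalg.lstsq(X, yy, rcond=None)
    return b[0]

# RD estimate: jump in the conditional mean at the threshold
effect = intercept_at_cutoff(cd4[below], y[below]) - intercept_at_cutoff(cd4[above], y[above])
print(f"estimated treatment effect at cutoff: {effect:.2f}")
```

Patients just either side of the cutoff are comparable, so the jump in intercepts identifies the causal effect; survival-model versions follow the same logic with a hazard regression on each side.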
McKenzie, K.R.
1959-07-01
An electrode support which permits accurate alignment and adjustment of the electrode in a plurality of planes and about a plurality of axes in a calutron is described. The support will align the slits in the electrode with the slits of an ionizing chamber so as to provide for the egress of ions. The support comprises an insulator, a leveling plate carried by the insulator and having diametrically opposed attaching screws screwed to the plate and the insulator and diametrically opposed adjusting screws for bearing against the insulator, and an electrode associated with the plate for adjustment therewith.
Vivilaki, Victoria G; Dafermos, Vassilis; Gevorgian, Liana; Dimopoulou, Athanasia; Patelarou, Evridiki; Bick, Debra; Tsopelas, Nicholas D; Lionis, Christos
2012-01-01
The Maternal Adjustment and Maternal Attitudes Scale is a self-administered scale, designed for use in primary care settings to identify postpartum maternal adjustment problems regarding body image, sex, somatic symptoms, and marital relationships. Women were recruited within four weeks of giving birth. Responses to the Maternal Adjustment and Maternal Attitudes Scale were compared for agreement with responses to the Edinburgh Postnatal Depression Scale as a gold standard. Psychometric measurements included reliability coefficients, exploratory factor analysis, and confirmatory analysis by linear structural relations. A receiver operating characteristic analysis was carried out to evaluate the global functioning of the scale. Of 300 mothers screened, 121 (40.7%) were experiencing difficulties in maternal adjustment and maternal attitudes. Scores on the Maternal Adjustment and Maternal Attitudes Scale correlated well with those on the Edinburgh Postnatal Depression Scale. The internal consistency of the Greek version of the Maternal Adjustment and Maternal Attitudes Scale, tested using Cronbach's alpha coefficient, was 0.859, and the Guttman split-half coefficient was 0.820. Findings confirmed the multidimensionality of the Maternal Adjustment and Maternal Attitudes Scale, demonstrating a six-factor structure. The area under the receiver operating characteristic curve was 0.610, and the logistic estimate for the threshold score of 57/58 fitted the model sensitivity at 68% and model specificity at 64.6%. Data confirmed that the Greek version of the Maternal Adjustment and Maternal Attitudes Scale is a reliable and valid screening tool for both clinical practice and research purposes to detect postpartum adjustment difficulties.
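Sensitivity and specificity at a fixed cut-off, as in the 57/58 analysis above, reduce to two conditional means against the gold standard. The scores below are simulated, not the Greek validation data:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 300
depressed = rng.integers(0, 2, n).astype(bool)         # gold standard (synthetic)
# Hypothetical questionnaire scores: cases tend to score higher
score = np.where(depressed, rng.normal(60, 6, n), rng.normal(54, 6, n))

threshold = 57.5                                       # e.g. a 57/58 cut-off
flagged = score > threshold
sensitivity = np.mean(flagged[depressed])              # true-positive rate
specificity = np.mean(~flagged[~depressed])            # true-negative rate
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Sweeping the threshold and plotting sensitivity against 1 - specificity traces the ROC curve whose area is reported in the abstract.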
Regression analysis of correlated ordinal data using orthogonalized residuals.
Perin, J; Preisser, J S; Phillips, C; Qaqish, B
2014-12-01
Semi-parametric regression models for the joint estimation of marginal mean and within-cluster pairwise association parameters are used in a variety of settings for population-averaged modeling of multivariate categorical outcomes. Recently, a formulation of alternating logistic regressions based on orthogonalized, marginal residuals has been introduced for correlated binary data. Unlike the original procedure based on conditional residuals, its covariance estimator is invariant to the ordering of observations within clusters. In this article, the orthogonalized residuals method is extended to model correlated ordinal data with a global odds ratio, and shown in a simulation study to be more efficient and less biased with regards to estimating within-cluster association parameters than an existing extension to ordinal data of alternating logistic regressions based on conditional residuals. Orthogonalized residuals are used to estimate a model for three correlated ordinal outcomes measured repeatedly in a longitudinal clinical trial of an intervention to improve recovery of patients' perception of altered sensation following jaw surgery.
Predicting Fusarium head blight epidemics with boosted regression trees.
Shah, D A; De Wolf, E D; Paul, P A; Madden, L V
2014-07-01
Predicting major Fusarium head blight (FHB) epidemics allows for the judicious use of fungicides in suppressing disease development. Our objectives were to investigate the utility of boosted regression trees (BRTs) for predictive modeling of FHB epidemics in the United States, and to compare the predictive performances of the BRT models with those of logistic regression models we had developed previously. The data included 527 FHB observations from 15 states over 26 years. BRTs were fit to a training data set of 369 FHB observations, in which FHB epidemics were classified as either major (severity ≥ 10%) or non-major (severity < 10%), linked to a predictor matrix consisting of 350 weather-based variables and categorical variables for wheat type (spring or winter), presence or absence of corn residue, and cultivar resistance. Predictive performance was estimated on a test (holdout) data set consisting of the remaining 158 observations. BRTs had a misclassification rate of 0.23 on the test data, which was 31% lower than the average misclassification rate over 15 logistic regression models we had presented earlier. The strongest predictors were generally one of mean daily relative humidity, mean daily temperature, and the number of hours in which the temperature was between 9 and 30°C and relative humidity ≥ 90% simultaneously. Moreover, the predicted risk of major epidemics increased substantially when mean daily relative humidity rose above 70%, which is a lower threshold than previously modeled for most plant pathosystems. BRTs led to novel insights into the weather-epidemic relationship.
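A boosted-regression-tree classifier and its misclassification rate can be sketched with scikit-learn's GradientBoostingClassifier. The three predictors are synthetic stand-ins for the study's 350 weather variables, and the split sizes only echo the paper's 369/158 design:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(9)
n = 500
# Stand-ins for mean RH, mean temperature, and warm-humid hours
X = rng.normal(size=(n, 3))
# Assumed data-generating process: risk rises nonlinearly with humidity
# and with hours in the warm-humid window
y = ((X[:, 0] + np.maximum(X[:, 2], 0) + 0.5 * rng.normal(size=n)) > 0.5).astype(int)

train, test = slice(0, 350), slice(350, None)
brt = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05,
                                 max_depth=2, random_state=0)
brt.fit(X[train], y[train])

misclass = np.mean(brt.predict(X[test]) != y[test])    # holdout error rate
print(f"test misclassification rate: {misclass:.2f}")
print("relative influence:", brt.feature_importances_.round(2))
```

The `feature_importances_` ranking is the BRT analogue of the "strongest predictors" the authors report.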
Optimal distributions for multiplex logistic networks
NASA Astrophysics Data System (ADS)
Solá Conde, Luis E.; Used, Javier; Romance, Miguel
2016-06-01
This paper presents some mathematical models for distribution of goods in logistic networks based on spectral analysis of complex networks. Given a steady distribution of a finished product, some numerical algorithms are presented for computing the weights in a multiplex logistic network that reach the equilibrium dynamics with high convergence rate. As an application, the logistic networks of Germany and Spain are analyzed in terms of their convergence rates. PMID:27368801
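The steady distribution, and the role of the spectral gap in governing the convergence rate, can be illustrated with power iteration on a small random weighted network (purely illustrative, one layer rather than a true multiplex):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 6
W = rng.random((n, n)); np.fill_diagonal(W, 0)   # weighted network of goods flow
P = W / W.sum(axis=1, keepdims=True)             # row-stochastic transition matrix

# Power iteration toward the steady distribution of goods
pi = np.full(n, 1 / n)
for _ in range(500):
    pi = pi @ P
pi /= pi.sum()

# The spectral gap (1 minus the second-largest eigenvalue modulus)
# governs how fast the dynamics reach equilibrium
evals = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
print("steady distribution:", pi.round(3))
print("spectral gap:", 1 - evals[1])
```

Choosing the weights W to maximize this gap is, loosely, the optimization the paper pursues for the German and Spanish networks.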
Integrated Computer System of Management in Logistics
NASA Astrophysics Data System (ADS)
Chwesiuk, Krzysztof
2011-06-01
This paper aims at presenting a concept of an integrated computer system of management in logistics, particularly in supply and distribution chains. Consequently, the paper includes the basic idea of the concept of computer-based management in logistics and components of the system, such as CAM and CIM systems in production processes, and management systems for storage, materials flow, and for managing transport, forwarding and logistics companies. The platform which integrates computer-aided management systems is that of electronic data interchange.
Attachment style and adjustment to divorce.
Yárnoz-Yaben, Sagrario
2010-05-01
Divorce is becoming increasingly widespread in Europe. In this study, I present an analysis of the role played by attachment style (secure, dismissing, preoccupied and fearful, plus the dimensions of anxiety and avoidance) in the adaptation to divorce. Participants comprised divorced parents (N = 40) from a medium-sized city in the Basque Country. The results reveal a lower proportion of people with secure attachment in the sample group of divorcees. Attachment style and dependence (emotional and instrumental) are closely related. I have also found associations between measures that showed a poor adjustment to divorce and the preoccupied and fearful attachment styles. Adjustment is related to a dismissing attachment style and to the avoidance dimension. Multiple regression analysis confirmed that secure attachment and the avoidance dimension predict adjustment to divorce and positive affectivity while preoccupied attachment and the anxiety dimension predicted negative affectivity. Implications for research and interventions with divorcees are discussed.