NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
The logistic regression originally is intended to explain the relationship between the probability of an event and a set of covariables. The model's coefficients can be interpreted via the odds and odds ratio, which are presented in introduction of the chapter. The observations are possibly got individually, then we speak of binary logistic regression. When they are grouped, the logistic regression is said binomial. In our presentation we mainly focus on the binary case. For statistical inference the main tool is the maximum likelihood methodology: we present the Wald, Rao and likelihoods ratio results and their use to compare nested models. The problems we intend to deal with are essentially the same as in multiple linear regression: testing global effect, individual effect, selection of variables to build a model, measure of the fitness of the model, prediction of new values… . The methods are demonstrated on data sets using R. Finally we briefly consider the binomial case and the situation where we are interested in several events, that is the polytomous (multinomial) logistic regression and the particular case of ordinal logistic regression.
Algamal, Zakariya Yahya; Lee, Muhammad Hisyam
2015-12-01
Cancer classification and gene selection in high-dimensional data have been popular research topics in genetics and molecular biology. Recently, adaptive regularized logistic regression using the elastic net regularization, which is called the adaptive elastic net, has been successfully applied in high-dimensional cancer classification to tackle both estimating the gene coefficients and performing gene selection simultaneously. The adaptive elastic net originally used elastic net estimates as the initial weight, however, using this weight may not be preferable for certain reasons: First, the elastic net estimator is biased in selecting genes. Second, it does not perform well when the pairwise correlations between variables are not high. Adjusted adaptive regularized logistic regression (AAElastic) is proposed to address these issues and encourage grouping effects simultaneously. The real data results indicate that AAElastic is significantly consistent in selecting genes compared to the other three competitor regularization methods. Additionally, the classification performance of AAElastic is comparable to the adaptive elastic net and better than other regularization methods. Thus, we can conclude that AAElastic is a reliable adaptive regularized logistic regression method in the field of high-dimensional cancer classification.
Lyles, Robert H; Tang, Li; Superak, Hillary M; King, Caroline C; Celentano, David D; Lo, Yungtai; Sobel, Jack D
2011-07-01
Misclassification of binary outcome variables is a known source of potentially serious bias when estimating adjusted odds ratios. Although researchers have described frequentist and Bayesian methods for dealing with the problem, these methods have seldom fully bridged the gap between statistical research and epidemiologic practice. In particular, there have been few real-world applications of readily grasped and computationally accessible methods that make direct use of internal validation data to adjust for differential outcome misclassification in logistic regression. In this paper, we illustrate likelihood-based methods for this purpose that can be implemented using standard statistical software. Using main study and internal validation data from the HIV Epidemiology Research Study, we demonstrate how misclassification rates can depend on the values of subject-specific covariates, and we illustrate the importance of accounting for this dependence. Simulation studies confirm the effectiveness of the maximum likelihood approach. We emphasize clear exposition of the likelihood function itself, to permit the reader to easily assimilate appended computer code that facilitates sensitivity analyses as well as the efficient handling of main/external and main/internal validation-study data. These methods are readily applicable under random cross-sectional sampling, and we discuss the extent to which the main/internal analysis remains appropriate under outcome-dependent (case-control) sampling.
Kleinman, Lawrence C; Norton, Edward C
2009-01-01
Objective To develop and validate a general method (called regression risk analysis) to estimate adjusted risk measures from logistic and other nonlinear multiple regression models. We show how to estimate standard errors for these estimates. These measures could supplant various approximations (e.g., adjusted odds ratio [AOR]) that may diverge, especially when outcomes are common. Study Design Regression risk analysis estimates were compared with internal standards as well as with Mantel–Haenszel estimates, Poisson and log-binomial regressions, and a widely used (but flawed) equation to calculate adjusted risk ratios (ARR) from AOR. Data Collection Data sets produced using Monte Carlo simulations. Principal Findings Regression risk analysis accurately estimates ARR and differences directly from multiple regression models, even when confounders are continuous, distributions are skewed, outcomes are common, and effect size is large. It is statistically sound and intuitive, and has properties favoring it over other methods in many cases. Conclusions Regression risk analysis should be the new standard for presenting findings from multiple regression analysis of dichotomous outcomes for cross-sectional, cohort, and population-based case–control studies, particularly when outcomes are common or effect size is large. PMID:18793213
Li, Li; Brumback, Babette A; Weppelmann, Thomas A; Morris, J Glenn; Ali, Afsar
2016-08-15
Motivated by an investigation of the effect of surface water temperature on the presence of Vibrio cholerae in water samples collected from different fixed surface water monitoring sites in Haiti in different months, we investigated methods to adjust for unmeasured confounding due to either of the two crossed factors site and month. In the process, we extended previous methods that adjust for unmeasured confounding due to one nesting factor (such as site, which nests the water samples from different months) to the case of two crossed factors. First, we developed a conditional pseudolikelihood estimator that eliminates fixed effects for the levels of each of the crossed factors from the estimating equation. Using the theory of U-Statistics for independent but non-identically distributed vectors, we show that our estimator is consistent and asymptotically normal, but that its variance depends on the nuisance parameters and thus cannot be easily estimated. Consequently, we apply our estimator in conjunction with a permutation test, and we investigate use of the pigeonhole bootstrap and the jackknife for constructing confidence intervals. We also incorporate our estimator into a diagnostic test for a logistic mixed model with crossed random effects and no unmeasured confounding. For comparison, we investigate between-within models extended to two crossed factors. These generalized linear mixed models include covariate means for each level of each factor in order to adjust for the unmeasured confounding. We conduct simulation studies, and we apply the methods to the Haitian data. Copyright © 2016 John Wiley & Sons, Ltd.
Thomas, Laine; Stefanski, Leonard A.; Davidian, Marie
2013-01-01
In clinical studies, covariates are often measured with error due to biological fluctuations, device error and other sources. Summary statistics and regression models that are based on mismeasured data will differ from the corresponding analysis based on the “true” covariate. Statistical analysis can be adjusted for measurement error, however various methods exhibit a tradeo between convenience and performance. Moment Adjusted Imputation (MAI) is method for measurement error in a scalar latent variable that is easy to implement and performs well in a variety of settings. In practice, multiple covariates may be similarly influenced by biological fluctuastions, inducing correlated multivariate measurement error. The extension of MAI to the setting of multivariate latent variables involves unique challenges. Alternative strategies are described, including a computationally feasible option that is shown to perform well. PMID:24072947
Practical Session: Logistic Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
An exercise is proposed to illustrate the logistic regression. One investigates the different risk factors in the apparition of coronary heart disease. It has been proposed in Chapter 5 of the book of D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science Business Media, LLC (2010) and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr341.pdf). This example is based on data given in the file evans.txt coming from http://www.sph.emory.edu/dkleinb/logreg3.htm#data.
[Understanding logistic regression].
El Sanharawi, M; Naudet, F
2013-10-01
Logistic regression is one of the most common multivariate analysis models utilized in epidemiology. It allows the measurement of the association between the occurrence of an event (qualitative dependent variable) and factors susceptible to influence it (explicative variables). The choice of explicative variables that should be included in the logistic regression model is based on prior knowledge of the disease physiopathology and the statistical association between the variable and the event, as measured by the odds ratio. The main steps for the procedure, the conditions of application, and the essential tools for its interpretation are discussed concisely. We also discuss the importance of the choice of variables that must be included and retained in the regression model in order to avoid the omission of important confounding factors. Finally, by way of illustration, we provide an example from the literature, which should help the reader test his or her knowledge.
Steganalysis using logistic regression
NASA Astrophysics Data System (ADS)
Lubenko, Ivans; Ker, Andrew D.
2011-02-01
We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-art 686-dimensional SPAM feature set, in three image sets.
Logistic Regression: Concept and Application
ERIC Educational Resources Information Center
Cokluk, Omay
2010-01-01
The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
Transfer Learning Based on Logistic Regression
NASA Astrophysics Data System (ADS)
Paul, A.; Rottensteiner, F.; Heipke, C.
2015-08-01
In this paper we address the problem of classification of remote sensing images in the framework of transfer learning with a focus on domain adaptation. The main novel contribution is a method for transductive transfer learning in remote sensing on the basis of logistic regression. Logistic regression is a discriminative probabilistic classifier of low computational complexity, which can deal with multiclass problems. This research area deals with methods that solve problems in which labelled training data sets are assumed to be available only for a source domain, while classification is needed in the target domain with different, yet related characteristics. Classification takes place with a model of weight coefficients for hyperplanes which separate features in the transformed feature space. In term of logistic regression, our domain adaptation method adjusts the model parameters by iterative labelling of the target test data set. These labelled data features are iteratively added to the current training set which, at the beginning, only contains source features and, simultaneously, a number of source features are deleted from the current training set. Experimental results based on a test series with synthetic and real data constitutes a first proof-of-concept of the proposed method.
Standards for Standardized Logistic Regression Coefficients
ERIC Educational Resources Information Center
Menard, Scott
2011-01-01
Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…
Should metacognition be measured by logistic regression?
Rausch, Manuel; Zehetleitner, Michael
2017-03-01
Are logistic regression slopes suitable to quantify metacognitive sensitivity, i.e. the efficiency with which subjective reports differentiate between correct and incorrect task responses? We analytically show that logistic regression slopes are independent from rating criteria in one specific model of metacognition, which assumes (i) that rating decisions are based on sensory evidence generated independently of the sensory evidence used for primary task responses and (ii) that the distributions of evidence are logistic. Given a hierarchical model of metacognition, logistic regression slopes depend on rating criteria. According to all considered models, regression slopes depend on the primary task criterion. A reanalysis of previous data revealed that massive numbers of trials are required to distinguish between hierarchical and independent models with tolerable accuracy. It is argued that researchers who wish to use logistic regression as measure of metacognitive sensitivity need to control the primary task criterion and rating criteria.
Satellite rainfall retrieval by logistic regression
NASA Technical Reports Server (NTRS)
Chiu, Long S.
1986-01-01
The potential use of logistic regression in rainfall estimation from satellite measurements is investigated. Satellite measurements provide covariate information in terms of radiances from different remote sensors.The logistic regression technique can effectively accommodate many covariates and test their significance in the estimation. The outcome from the logistical model is the probability that the rainrate of a satellite pixel is above a certain threshold. By varying the thresholds, a rainrate histogram can be obtained, from which the mean and the variant can be estimated. A logistical model is developed and applied to rainfall data collected during GATE, using as covariates the fractional rain area and a radiance measurement which is deduced from a microwave temperature-rainrate relation. It is demonstrated that the fractional rain area is an important covariate in the model, consistent with the use of the so-called Area Time Integral in estimating total rain volume in other studies. To calibrate the logistical model, simulated rain fields generated by rainfield models with prescribed parameters are needed. A stringent test of the logistical model is its ability to recover the prescribed parameters of simulated rain fields. A rain field simulation model which preserves the fractional rain area and lognormality of rainrates as found in GATE is developed. A stochastic regression model of branching and immigration whose solutions are lognormally distributed in some asymptotic limits has also been developed.
Predicting Social Trust with Binary Logistic Regression
ERIC Educational Resources Information Center
Adwere-Boamah, Joseph; Hufstedler, Shirley
2015-01-01
This study used binary logistic regression to predict social trust with five demographic variables from a national sample of adult individuals who participated in The General Social Survey (GSS) in 2012. The five predictor variables were respondents' highest degree earned, race, sex, general happiness and the importance of personally assisting…
Logistic models--an odd(s) kind of regression.
Jupiter, Daniel C
2013-01-01
The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression.
Model selection for logistic regression models
NASA Astrophysics Data System (ADS)
Duller, Christine
2012-09-01
Model selection for logistic regression models decides which of some given potential regressors have an effect and hence should be included in the final model. The second interesting question is whether a certain factor is heterogeneous among some subsets, i.e. whether the model should include a random intercept or not. In this paper these questions will be answered with classical as well as with Bayesian methods. The application show some results of recent research projects in medicine and business administration.
Logistic Regression Applied to Seismic Discrimination
BG Amindan; DN Hagedorn
1998-10-08
The usefulness of logistic discrimination was examined in an effort to learn how it performs in a regional seismic setting. Logistic discrimination provides an easily understood method, works with user-defined models and few assumptions about the population distributions, and handles both continuous and discrete data. Seismic event measurements from a data set compiled by Los Alamos National Laboratory (LANL) of Chinese events recorded at station WMQ were used in this demonstration study. PNNL applied logistic regression techniques to the data. All possible combinations of the Lg and Pg measurements were tried, and a best-fit logistic model was created. The best combination of Lg and Pg frequencies for predicting the source of a seismic event (earthquake or explosion) used Lg{sub 3.0-6.0} and Pg{sub 3.0-6.0} as the predictor variables. A cross-validation test was run, which showed that this model was able to correctly predict 99.7% earthquakes and 98.0% explosions for this given data set. Two other models were identified that used Pg and Lg measurements from the 1.5 to 3.0 Hz frequency range. Although these other models did a good job of correctly predicting the earthquakes, they were not as effective at predicting the explosions. Two possible biases were discovered which affect the predicted probabilities for each outcome. The first bias was due to this being a case-controlled study. The sampling fractions caused a bias in the probabilities that were calculated using the models. The second bias is caused by a change in the proportions for each event. If at a later date the proportions (a priori probabilities) of explosions versus earthquakes change, this would cause a bias in the predicted probability for an event. When using logistic regression, the user needs to be aware of the possible biases and what affect they will have on the predicted probabilities.
Supporting Regularized Logistic Regression Privately and Efficiently.
Li, Wenfa; Liu, Hongzhe; Yang, Peng; Xie, Wei
2016-01-01
As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data of human subjects that are contingent upon strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a widely-used statistical model while at the same time has not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as in the form of research consortia or networks as widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributing computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc.
Supporting Regularized Logistic Regression Privately and Efficiently
Li, Wenfa; Liu, Hongzhe; Yang, Peng; Xie, Wei
2016-01-01
As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data of human subjects that are contingent upon strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a widely-used statistical model while at the same time has not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as in the form of research consortia or networks as widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributing computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc. PMID:27271738
Using Dominance Analysis to Determine Predictor Importance in Logistic Regression
ERIC Educational Resources Information Center
Azen, Razia; Traxel, Nicole
2009-01-01
This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R[superscript 2] analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…
Model building strategy for logistic regression: purposeful selection.
Zhang, Zhongheng
2016-03-01
Logistic regression is one of the most commonly used models to account for confounders in medical literature. The article introduces how to perform purposeful selection model building strategy with R. I stress on the use of likelihood ratio test to see whether deleting a variable will have significant impact on model fit. A deleted variable should also be checked for whether it is an important adjustment of remaining covariates. Interaction should be checked to disentangle complex relationship between covariates and their synergistic effect on response variable. Model should be checked for the goodness-of-fit (GOF). In other words, how the fitted model reflects the real data. Hosmer-Lemeshow GOF test is the most widely used for logistic regression model.
[Logistic regression against a divergent Bayesian network].
Sánchez Trujillo, Noel Antonio
2015-02-03
This article is a discussion about two statistical tools used for prediction and causality assessment: logistic regression and Bayesian networks. Using data of a simulated example from a study assessing factors that might predict pulmonary emphysema (where fingertip pigmentation and smoking are considered); we posed the following questions. Is pigmentation a confounding, causal or predictive factor? Is there perhaps another factor, like smoking, that confounds? Is there a synergy between pigmentation and smoking? The results, in terms of prediction, are similar with the two techniques; regarding causation, differences arise. We conclude that, in decision-making, the sum of both: a statistical tool, used with common sense, and previous evidence, taking years or even centuries to develop; is better than the automatic and exclusive use of statistical resources.
On regression adjustment for the propensity score.
Vansteelandt, S; Daniel, R M
2014-10-15
Propensity scores are widely adopted in observational research because they enable adjustment for high-dimensional confounders without requiring models for their association with the outcome of interest. The results of statistical analyses based on stratification, matching or inverse weighting by the propensity score are therefore less susceptible to model extrapolation than those based solely on outcome regression models. This is attractive because extrapolation in outcome regression models may be alarming, yet difficult to diagnose, when the exposed and unexposed individuals have very different covariate distributions. Standard regression adjustment for the propensity score forms an alternative to the aforementioned propensity score methods, but the benefits of this are less clear because it still involves modelling the outcome in addition to the propensity score. In this article, we develop novel insights into the properties of this adjustment method. We demonstrate that standard tests of the null hypothesis of no exposure effect (based on robust variance estimators), as well as particular standardised effects obtained from such adjusted regression models, are robust against misspecification of the outcome model when a propensity score model is correctly specified; they are thus not vulnerable to the aforementioned problem of extrapolation. We moreover propose efficient estimators for these standardised effects, which retain a useful causal interpretation even when the propensity score model is misspecified, provided the outcome regression model is correctly specified.
A Methodology for Generating Placement Rules that Utilizes Logistic Regression
ERIC Educational Resources Information Center
Wurtz, Keith
2008-01-01
The purpose of this article is to provide the necessary tools for institutional researchers to conduct a logistic regression analysis and interpret the results. Aspects of the logistic regression procedure that are necessary to evaluate models are presented and discussed with an emphasis on cutoff values and choosing the appropriate number of…
Preserving Institutional Privacy in Distributed binary Logistic Regression.
Wu, Yuan; Jiang, Xiaoqian; Ohno-Machado, Lucila
2012-01-01
Privacy is becoming a major concern when sharing biomedical data across institutions. Although methods for protecting privacy of individual patients have been proposed, it is not clear how to protect the institutional privacy, which is many times a critical concern of data custodians. Built upon our previous work, Grid Binary LOgistic REgression (GLORE)1, we developed an Institutional Privacy-preserving Distributed binary Logistic Regression model (IPDLR) that considers both individual and institutional privacy for building a logistic regression model in a distributed manner. We tested our method using both simulated and clinical data, showing how it is possible to protect the privacy of individuals and of institutions using a distributed strategy.
Large unbalanced credit scoring using Lasso-logistic regression ensemble.
Wang, Hong; Xu, Qingsong; Zhou, Lifeng
2015-01-01
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.
MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION
Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...
Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble
Wang, Hong; Xu, Qingsong; Zhou, Lifeng
2015-01-01
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988
[Unconditioned logistic regression and sample size: a bibliographic review].
Ortega Calvo, Manuel; Cayuela Domínguez, Aurelio
2002-01-01
Unconditioned logistic regression is a highly useful risk prediction method in epidemiology. This article reviews the different solutions provided by different authors concerning the interface between the calculation of the sample size and the use of logistics regression. Based on the knowledge of the information initially provided, a review is made of the customized regression and predictive constriction phenomenon, the design of an ordinal exposition with a binary output, the event of interest per variable concept, the indicator variables, the classic Freeman equation, etc. Some skeptical ideas regarding this subject are also included.
Estimating the exceedance probability of rain rate by logistic regression
NASA Technical Reports Server (NTRS)
Chiu, Long S.; Kedem, Benjamin
1990-01-01
Recent studies have shown that the fraction of an area with rain intensity above a fixed threshold is highly correlated with the area-averaged rain rate. To estimate the fractional rainy area, a logistic regression model, which estimates the conditional probability that rain rate over an area exceeds a fixed threshold given the values of related covariates, is developed. The problem of dependency in the data in the estimation procedure is bypassed by the method of partial likelihood. Analyses of simulated scanning multichannel microwave radiometer and observed electrically scanning microwave radiometer data during the Global Atlantic Tropical Experiment period show that the use of logistic regression in pixel classification is superior to multiple regression in predicting whether rain rate at each pixel exceeds a given threshold, even in the presence of noisy data. The potential of the logistic regression technique in satellite rain rate estimation is discussed.
Differentially private distributed logistic regression using private and public data
2014-01-01
Background Privacy protecting is an important issue in medical informatics and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise and make outputs less useful. Given available public data in medical research (e.g. from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. Methodology In this paper, we modify the update step in Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. Experiments and results We try our algorithm on three different data sets, and show its advantage over: (1) a logistic regression model based solely on public data, and (2) a differentially private distributed logistic regression model based on private data under various scenarios. Conclusion Logistic regression models built with our new algorithm based on both private and public datasets demonstrate better utility than models that trained on private or public datasets alone without sacrificing the rigorous privacy guarantee. PMID:25079786
Identification of high leverage points in binary logistic regression
NASA Astrophysics Data System (ADS)
Fitrianto, Anwar; Wendy, Tham
2016-10-01
Leverage points are those which measures uncommon observations in x space of regression diagnostics. Detection of high leverage points plays a vital role because it is responsible in masking outlier. In regression, high observations which made at extreme in the space of explanatory variables and they are far apart from the average of the data are called as leverage points. In this project, a method for identification of high leverage point in logistic regression was shown using numerical example. We investigate the effects of high leverage point in the logistic regression model. The comparison of the result in the model with and without leverage model is being discussed. Some graphical analyses based on the result of the analysis are presented. We found that the presence of HLP have effect on the hii, estimated probability, estimated coefficients, p-value of variable, odds ratio and regression equation.
Improving Predictions in Imbalanced Data Using Pairwise Expanded Logistic Regression
Jiang, Xiaoqian; El-Kareh, Robert; Ohno-Machado, Lucila
2011-01-01
Building classifiers for medical problems often involves dealing with rare, but important events. Imbalanced datasets pose challenges to ordinary classification algorithms such as Logistic Regression (LR) and Support Vector Machines (SVM). The lack of effective strategies for dealing with imbalanced training data often results in models that exhibit poor discrimination. We propose a novel approach to estimate class memberships based on the evaluation of pairwise relationships in the training data. The method we propose, Pairwise Expanded Logistic Regression, improved discrimination and had higher accuracy when compared to existing methods in two imbalanced datasets, thus showing promise as a potential remedy for this problem. PMID:22195118
Sugarcane Land Classification with Satellite Imagery using Logistic Regression Model
NASA Astrophysics Data System (ADS)
Henry, F.; Herwindiati, D. E.; Mulyono, S.; Hendryli, J.
2017-03-01
This paper discusses the classification of sugarcane plantation area from Landsat-8 satellite imagery. The classification process uses binary logistic regression method with time series data of normalized difference vegetation index as input. The process is divided into two steps: training and classification. The purpose of training step is to identify the best parameter of the regression model using gradient descent algorithm. The best fit of the model can be utilized to classify sugarcane and non-sugarcane area. The experiment shows high accuracy and successfully maps the sugarcane plantation area which obtained best result of Cohen’s Kappa value 0.7833 (strong) with 89.167% accuracy.
Saddlepoint approximations for small sample logistic regression problems.
Platt, R W
2000-02-15
Double saddlepoint approximations provide quick and accurate approximations to exact conditional tail probabilities in a variety of situations. This paper describes the use of these approximations in two logistic regression problems. An investigation of regression analysis of the log-odds ratio in a sequence or set of 2x2 tables via simulation studies shows that in practical settings the saddlepoint methods closely approximate exact conditional inference. The double saddlepoint approximation in the test for trend in a sequence of binomial random variates is also shown, via simulation studies, to be an effective approximation to exact conditional inference.
Computing group cardinality constraint solutions for logistic regression problems.
Zhang, Yong; Kwon, Dongjin; Pohl, Kilian M
2017-01-01
We derive an algorithm to directly solve logistic regression based on cardinality constraint, group sparsity and use it to classify intra-subject MRI sequences (e.g. cine MRIs) of healthy from diseased subjects. Group cardinality constraint models are often applied to medical images in order to avoid overfitting of the classifier to the training data. Solutions within these models are generally determined by relaxing the cardinality constraint to a weighted feature selection scheme. However, these solutions relate to the original sparse problem only under specific assumptions, which generally do not hold for medical image applications. In addition, inferring clinical meaning from features weighted by a classifier is an ongoing topic of discussion. Avoiding weighing features, we propose to directly solve the group cardinality constraint logistic regression problem by generalizing the Penalty Decomposition method. To do so, we assume that an intra-subject series of images represents repeated samples of the same disease patterns. We model this assumption by combining series of measurements created by a feature across time into a single group. Our algorithm then derives a solution within that model by decoupling the minimization of the logistic regression function from enforcing the group sparsity constraint. The minimum to the smooth and convex logistic regression problem is determined via gradient descent while we derive a closed form solution for finding a sparse approximation of that minimum. We apply our method to cine MRI of 38 healthy controls and 44 adult patients that received reconstructive surgery of Tetralogy of Fallot (TOF) during infancy. Our method correctly identifies regions impacted by TOF and generally obtains statistically significant higher classification accuracy than alternative solutions to this model, i.e., ones relaxing group cardinality constraints.
Logistic Regression Model on Antenna Control Unit Autotracking Mode
2015-10-20
412TW-PA-15240 Logistic Regression Model on Antenna Control Unit Autotracking Mode DANIEL T. LAIRD AIR FORCE TEST CENTER EDWARDS AFB, CA...OCTOBER 20 2015 4 1 2 T W Approved for public release; distribution is unlimited. 412TW-PA-15240 AIR FORCE TEST ...unlimited. 13. SUPPLEMENTARY NOTES CA: Air Force Test Center Edwards AFB CA CC: 012100 14. ABSTRACT Over the past several years
Classifying machinery condition using oil samples and binary logistic regression
NASA Astrophysics Data System (ADS)
Phillips, J.; Cripps, E.; Lau, John W.; Hodkiewicz, M. R.
2015-08-01
The era of big data has resulted in an explosion of condition monitoring information. The result is an increasing motivation to automate the costly and time consuming human elements involved in the classification of machine health. When working with industry it is important to build an understanding and hence some trust in the classification scheme for those who use the analysis to initiate maintenance tasks. Typically "black box" approaches such as artificial neural networks (ANN) and support vector machines (SVM) can be difficult to provide ease of interpretability. In contrast, this paper argues that logistic regression offers easy interpretability to industry experts, providing insight to the drivers of the human classification process and to the ramifications of potential misclassification. Of course, accuracy is of foremost importance in any automated classification scheme, so we also provide a comparative study based on predictive performance of logistic regression, ANN and SVM. A real world oil analysis data set from engines on mining trucks is presented and using cross-validation we demonstrate that logistic regression out-performs the ANN and SVM approaches in terms of prediction for healthy/not healthy engines.
Using logistic regression to estimate delay-discounting functions.
Wileyto, E Paul; Audrain-McGovern, Janet; Epstein, Leonard H; Lerman, Caryn
2004-02-01
The monetary choice questionnaire (MCQ) and similar computer tasks ask preference questions in order to ascertain indifference, the perceived equivalence of immediate versus larger delayed rewards. Indifference data are then fitted with a hyperbolic function, summarizing the decline in perceived value with delay time. We present a fitting method that estimates the hyperbolic parameter k directly from survey responses. Binary preferences are modeled as a function of time (X2) and a transformed reward ratio (X1), yielding logistic regression coefficients beta 2 and beta 1. The hyperbolic parameter emerges as k = beta 2/beta 1, where the logistic predicted p = .5 (the definition of indifference). The MCQ was administered to 1,073 adolescents and was scored using both standard and logistic methods. The means for In(k) were similar (standard, -4.53; logistic, -4.51), and the results were highly correlated (rho = .973). Simulated MCQ data showed that k was unbiased, except where beta 1 > or = -1, indicating a vague survey response. Jackknife standard errors provided excellent coverage.
Comparison of a Bayesian Network with a Logistic Regression Model to Forecast IgA Nephropathy
Ducher, Michel; Kalbacher, Emilie; Combarnous, François; Finaz de Vilaine, Jérome; McGregor, Brigitte; Fouque, Denis; Fauvel, Jean Pierre
2013-01-01
Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN) from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n = 155) performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristics (ROC) curves. IgAN was found (on pathology) in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached (100% versus 67%) and specificity (73% versus 95%) using the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation. PMID:24328031
Comparison of a Bayesian network with a logistic regression model to forecast IgA nephropathy.
Ducher, Michel; Kalbacher, Emilie; Combarnous, François; Finaz de Vilaine, Jérome; McGregor, Brigitte; Fouque, Denis; Fauvel, Jean Pierre
2013-01-01
Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN) from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n = 155) performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristics (ROC) curves. IgAN was found (on pathology) in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached (100% versus 67%) and specificity (73% versus 95%) using the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation.
Detecting nonsense for Chinese comments based on logistic regression
NASA Astrophysics Data System (ADS)
Zhuolin, Ren; Guang, Chen; Shu, Chen
2016-07-01
To understand cyber citizens' opinion accurately from Chinese news comments, the clear definition on nonsense is present, and a detection model based on logistic regression (LR) is proposed. The detection of nonsense can be treated as a binary-classification problem. Besides of traditional lexical features, we propose three kinds of features in terms of emotion, structure and relevance. By these features, we train an LR model and demonstrate its effect in understanding Chinese news comments. We find that each of proposed features can significantly promote the result. In our experiments, we achieve a prediction accuracy of 84.3% which improves the baseline 77.3% by 7%.
Landslide Hazard Mapping in Rwanda Using Logistic Regression
NASA Astrophysics Data System (ADS)
Piller, A.; Anderson, E.; Ballard, H.
2015-12-01
Landslides in the United States cause more than $1 billion in damages and 50 deaths per year (USGS 2014). Globally, figures are much more grave, yet monitoring, mapping and forecasting of these hazards are less than adequate. Seventy-five percent of the population of Rwanda earns a living from farming, mostly subsistence. Loss of farmland, housing, or life, to landslides is a very real hazard. Landslides in Rwanda have an impact at the economic, social, and environmental level. In a developing nation that faces challenges in tracking, cataloging, and predicting the numerous landslides that occur each year, satellite imagery and spatial analysis allow for remote study. We have focused on the development of a landslide inventory and a statistical methodology for assessing landslide hazards. Using logistic regression on approximately 30 test variables (i.e. slope, soil type, land cover, etc.) and a sample of over 200 landslides, we determine which variables are statistically most relevant to landslide occurrence in Rwanda. A preliminary predictive hazard map for Rwanda has been produced, using the variables selected from the logistic regression analysis.
Parental Vaccine Acceptance: A Logistic Regression Model Using Previsit Decisions.
Lee, Sara; Riley-Behringer, Maureen; Rose, Jeanmarie C; Meropol, Sharon B; Lazebnik, Rina
2016-10-26
This study explores how parents' intentions regarding vaccination prior to their children's visit were associated with actual vaccine acceptance. A convenience sample of parents accompanying 6-week-old to 17-year-old children completed a written survey at 2 pediatric practices. Using hierarchical logistic regression, for hospital-based participants (n = 216), vaccine refusal history (P < .01) and vaccine decision made before the visit (P < .05) explained 87% of vaccine refusals. In community-based participants (n = 100), vaccine refusal history (P < .01) explained 81% of refusals. Over 1 in 5 parents changed their minds about vaccination during the visit. Thirty parents who were previous vaccine refusers accepted current vaccines, and 37 who had intended not to vaccinate choose vaccination. Twenty-nine parents without a refusal history declined vaccines, and 32 who did not intend to refuse before the visit declined vaccination. Future research should identify key factors to nudge parent decision making in favor of vaccination.
Confidence interval of difference of proportions in logistic regression in presence of covariates.
Reeve, Russell
2016-03-16
Comparison of treatment differences in incidence rates is an important objective of many clinical trials. However, often the proportion is affected by covariates, and the adjustment of the predicted proportion is made using logistic regression. It is desirable to estimate the treatment differences in proportions adjusting for the covariates, similarly to the comparison of adjusted means in analysis of variance. Because of the correlation between the point estimates in the different treatment groups, the standard methods for constructing confidence intervals are inadequate. The problem is more difficult in the binary case, as the comparison is not uniquely defined, and the sampling distribution more difficult to analyze. Four procedures for analyzing the data are presented, which expand upon existing methods and generalize the link function. It is shown that, among the four methods studied, the resampling method based on the exact distribution function yields a coverage rate closest to the nominal.
Simultaneous confidence bands for log-logistic regression with applications in risk assessment.
Kerns, Lucy X
2017-01-27
In risk assessment, it is often desired to make inferences on the low dose levels at which a specific benchmark risk is attained. Applications of simultaneous hyperbolic confidence bands for low-dose risk estimation with quantal data under different dose-response models (multistage, Abbott-adjusted Weibull, and Abbott-adjusted log-logistic models) have appeared in the literature. The use of simultaneous three-segment bands under the multistage model has also been proposed recently. In this article, we present explicit formulas for constructing asymptotic one-sided simultaneous hyperbolic and three-segment bands for the simple log-logistic regression model. We use the simultaneous construction to estimate upper hyperbolic and three-segment confidence bands on extra risk and to obtain lower limits on the benchmark dose by inverting the upper bands on risk under the Abbott-adjusted log-logistic model. Monte Carlo simulations evaluate the characteristics of the simultaneous limits. An example is given to illustrate the use of the proposed methods and to compare the two types of simultaneous limits at very low dose levels.
Metadata for ReVA logistic regression dataset
LaMotte, Andrew E.
2004-01-01
The U.S. Geological Survey in cooperation with the U.S. Environmental Protection Agency's Regional Vulnerability Assessment Program, has developed a set of statistical tools to support regional-scale, ground-water quality and vulnerability assessments. The Regional Vulnerability Assessment Program goals are to develop and demonstrate approaches to comprehensive, regional-scale assessments that effectively inform water-resources managers and decision-makers as to the magnitude, extent, distribution, and uncertainty of current and anticipated environmental risks. The U.S. Geological Survey is developing and exploring the use of statistical probability models to characterize the relation between ground-water quality and geographic factors in the Mid-Atlantic Region. Available water-quality data obtained from U.S. Geological Survey National Water-Quality Assessment Program studies conducted in the Mid-Atlantic Region were used in association with geographic data (land cover, geology, soils, and others) to develop logistic-regression equations that use explanatory variables to predict the presence of a selected water-quality parameter exceeding specified management concentration thresholds. The resulting logistic-regression equations were transformed to determine the probability, P(X), of a water-quality parameter exceeding a specified management threshold. Additional statistical procedures modified by the U.S. Geological Survey were used to compare the observed values to model-predicted values at each sample point. In addition, procedures to evaluate the confidence of the model predictions and estimate the uncertainty of the probability value were developed and applied. The resulting logistic-regression models were applied to the Mid-Atlantic Region to predict the spatial probability of nitrate concentrations exceeding specified management thresholds. These thresholds are usually set or established by regulators or managers at national or local levels. At management
On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis
ERIC Educational Resources Information Center
Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas
2011-01-01
The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…
The crux of the method: assumptions in ordinary least squares and logistic regression.
Long, Rebecca G
2008-10-01
Logistic regression has increasingly become the tool of choice when analyzing data with a binary dependent variable. While resources relating to the technique are widely available, clear discussions of why logistic regression should be used in place of ordinary least squares regression are difficult to find. The current paper compares and contrasts the assumptions of ordinary least squares with those of logistic regression and explains why logistic regression's looser assumptions make it adept at handling violations of the more important assumptions in ordinary least squares.
Risk factors for temporomandibular disorder: Binary logistic regression analysis
Magalhães, Bruno G.; de-Sousa, Stéphanie T.; de Mello, Victor V C.; da-Silva-Barbosa, André C.; de-Assis-Morais, Mariana P L.; Barbosa-Vasconcelos, Márcia M V.
2014-01-01
Objectives: To analyze the influence of socioeconomic and demographic factors (gender, economic class, age and marital status) on the occurrence of temporomandibular disorder. Study Design: One hundred individuals from urban areas in the city of Recife (Brazil) registered at Family Health Units was examined using Axis I of the Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD) which addresses myofascial pain and joint problems (disc displacement, arthralgia, osteoarthritis and oesteoarthrosis). The Brazilian Economic Classification Criteria (CCEB) was used for the collection of socioeconomic and demographic data. Then, it was categorized as Class A (high social class), Classes B/C (middle class) and Classes D/E (very poor social class). The results were analyzed using Pearson’s chi-square test for proportions, Fisher’s exact test, nonparametric Mann-Whitney test and Binary logistic regression analysis. Results: None of the participants belonged to Class A, 72% belonged to Classes B/C and 28% belonged to Classes D/E. The multivariate analysis revealed that participants from Classes D/E had a 4.35-fold greater chance of exhibiting myofascial pain and 11.3-fold greater chance of exhibiting joint problems. Conclusions: Poverty is a important condition to exhibit myofascial pain and joint problems. Key words:Temporomandibular joint disorders, risk factors, prevalence. PMID:24316706
Application of Bayesian logistic regression to mining biomedical data.
Avali, Viji R; Cooper, Gregory F; Gopalakrishnan, Vanathi
2014-01-01
Mining high dimensional biomedical data with existing classifiers is challenging and the predictions are often inaccurate. We investigated the use of Bayesian Logistic Regression (B-LR) for mining such data to predict and classify various disease conditions. The analysis was done on twelve biomedical datasets with binary class variables and the performance of B-LR was compared to those from other popular classifiers on these datasets with 10-fold cross validation using the WEKA data mining toolkit. The statistical significance of the results was analyzed by paired two tailed t-tests and non-parametric Wilcoxon signed-rank tests. We observed overall that B-LR with non-informative Gaussian priors performed on par with other classifiers in terms of accuracy, balanced accuracy and AUC. These results suggest that it is worthwhile to explore the application of B-LR to predictive modeling tasks in bioinformatics using informative biological prior probabilities. With informative prior probabilities, we conjecture that the performance of B-LR will improve.
Autoregressive logistic regression applied to atmospheric circulation patterns
NASA Astrophysics Data System (ADS)
Guanche, Y.; Mínguez, R.; Méndez, F. J.
2014-01-01
Autoregressive logistic regression models have been successfully applied in medical and pharmacology research fields, and in simple models to analyze weather types. The main purpose of this paper is to introduce a general framework to study atmospheric circulation patterns capable of dealing simultaneously with: seasonality, interannual variability, long-term trends, and autocorrelation of different orders. To show its effectiveness on modeling performance, daily atmospheric circulation patterns identified from observed sea level pressure fields over the Northeastern Atlantic, have been analyzed using this framework. Model predictions are compared with probabilities from the historical database, showing very good fitting diagnostics. In addition, the fitted model is used to simulate the evolution over time of atmospheric circulation patterns using Monte Carlo method. Simulation results are statistically consistent with respect to the historical sequence in terms of (1) probability of occurrence of the different weather types, (2) transition probabilities and (3) persistence. The proposed model constitutes an easy-to-use and powerful tool for a better understanding of the climate system.
Predicting β-Turns in Protein Using Kernel Logistic Regression
Elbashir, Murtada Khalafallah; Sheng, Yu; Wang, Jianxin; Wu, FangXiang; Li, Min
2013-01-01
A β-turn is a secondary protein structure type that plays a significant role in protein configuration and function. On average 25% of amino acids in protein structures are located in β-turns. It is very important to develope an accurate and efficient method for β-turns prediction. Most of the current successful β-turns prediction methods use support vector machines (SVMs) or neural networks (NNs). The kernel logistic regression (KLR) is a powerful classification technique that has been applied successfully in many classification problems. However, it is often not found in β-turns classification, mainly because it is computationally expensive. In this paper, we used KLR to obtain sparse β-turns prediction in short evolution time. Secondary structure information and position-specific scoring matrices (PSSMs) are utilized as input features. We achieved Qtotal of 80.7% and MCC of 50% on BT426 dataset. These results show that KLR method with the right algorithm can yield performance equivalent to or even better than NNs and SVMs in β-turns prediction. In addition, KLR yields probabilistic outcome and has a well-defined extension to multiclass case. PMID:23509793
ERIC Educational Resources Information Center
Chen, Chau-Kuang
2005-01-01
Logistic and Cox regression methods are practical tools used to model the relationships between certain student learning outcomes and their relevant explanatory variables. The logistic regression model fits an S-shaped curve into a binary outcome with data points of zero and one. The Cox regression model allows investigators to study the duration…
Barros, Aluísio JD; Hirakata, Vânia N
2003-01-01
Background Cross-sectional studies with binary outcomes analyzed by logistic regression are frequent in the epidemiological literature. However, the odds ratio can importantly overestimate the prevalence ratio, the measure of choice in these studies. Also, controlling for confounding is not equivalent for the two measures. In this paper we explore alternatives for modeling data of such studies with techniques that directly estimate the prevalence ratio. Methods We compared Cox regression with constant time at risk, Poisson regression and log-binomial regression against the standard Mantel-Haenszel estimators. Models with robust variance estimators in Cox and Poisson regressions and variance corrected by the scale parameter in Poisson regression were also evaluated. Results Three outcomes, from a cross-sectional study carried out in Pelotas, Brazil, with different levels of prevalence were explored: weight-for-age deficit (4%), asthma (31%) and mother in a paid job (52%). Unadjusted Cox/Poisson regression and Poisson regression with scale parameter adjusted by deviance performed worst in terms of interval estimates. Poisson regression with scale parameter adjusted by χ2 showed variable performance depending on the outcome prevalence. Cox/Poisson regression with robust variance, and log-binomial regression performed equally well when the model was correctly specified. Conclusions Cox or Poisson regression with robust variance and log-binomial regression provide correct estimates and are a better alternative for the analysis of cross-sectional studies with binary outcomes than logistic regression, since the prevalence ratio is more interpretable and easier to communicate to non-specialists than the odds ratio. However, precautions are needed to avoid estimation problems in specific situations. PMID:14567763
What Are the Odds of that? A Primer on Understanding Logistic Regression
ERIC Educational Resources Information Center
Huang, Francis L.; Moon, Tonya R.
2013-01-01
The purpose of this Methodological Brief is to present a brief primer on logistic regression, a commonly used technique when modeling dichotomous outcomes. Using data from the National Education Longitudinal Study of 1988 (NELS:88), logistic regression techniques were used to investigate student-level variables in eighth grade (i.e., enrolled in a…
Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning
ERIC Educational Resources Information Center
Li, Zhushan
2014-01-01
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
ERIC Educational Resources Information Center
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
Sub-resolution assist feature (SRAF) printing prediction using logistic regression
NASA Astrophysics Data System (ADS)
Tan, Chin Boon; Koh, Kar Kit; Zhang, Dongqing; Foong, Yee Mei
2015-03-01
In optical proximity correction (OPC), the sub-resolution assist feature (SRAF) has been used to enhance the process window of main structures. However, the printing of SRAF on wafer is undesirable as this may adversely degrade the overall process yield if it is transferred into the final pattern. A reasonably accurate prediction model is needed during OPC to ensure that the SRAF placement and size have no risk of SRAF printing. Current common practice in OPC is either using the main OPC model or model threshold adjustment (MTA) solution to predict the SRAF printing. This paper studies the feasibility of SRAF printing prediction using logistic regression (LR). Logistic regression is a probabilistic classification model that gives discrete binary outputs after receiving sufficient input variables from SRAF printing conditions. In the application of SRAF printing prediction, the binary outputs can be treated as 1 for SRAFPrinting and 0 for No-SRAF-Printing. The experimental work was performed using a 20nm line/space process layer. The results demonstrate that the accuracy of SRAF printing prediction using LR approach outperforms MTA solution. Overall error rate of as low as calibration 2% and verification 5% was achieved by LR approach compared to calibration 6% and verification 15% for MTA solution. In addition, the performance of LR approach was found to be relatively independent and consistent across different resist image planes compared to MTA solution.
Modeling data for pancreatitis in presence of a duodenal diverticula using logistic regression
NASA Astrophysics Data System (ADS)
Dineva, S.; Prodanova, K.; Mlachkova, D.
2013-12-01
The presence of a periampullary duodenal diverticulum (PDD) is often observed during upper digestive tract barium meal studies and endoscopic retrograde cholangiopancreatography (ERCP). A few papers reported that the diverticulum had something to do with the incidence of pancreatitis. The aim of this study is to investigate if the presence of duodenal diverticula predisposes to the development of a pancreatic disease. A total 3966 patients who had undergone ERCP were studied retrospectively. They were divided into 2 groups-with and without PDD. Patients with a duodenal diverticula had a higher rate of acute pancreatitis. The duodenal diverticula is a risk factor for acute idiopathic pancreatitis. A multiple logistic regression to obtain adjusted estimate of odds and to identify if a PDD is a predictor of acute or chronic pancreatitis was performed. The software package STATISTICA 10.0 was used for analyzing the real data.
Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E
2013-06-01
Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework.
Rastegari, Azam; Haghdoost, Ali Akbar; Baneshi, Mohammad Reza
2013-01-01
Background Due to the importance of medical studies, researchers of this field should be familiar with various types of statistical analyses to select the most appropriate method based on the characteristics of their data sets. Classification and regression trees (CARTs) can be as complementary to regression models. We compared the performance of a logistic regression model and a CART in predicting drug injection among prisoners. Methods Data of 2720 Iranian prisoners was studied to determine the factors influencing drug injection. The collected data was divided into two groups of training and testing. A logistic regression model and a CART were applied on training data. The performance of the two models was then evaluated on testing data. Findings The regression model and the CART had 8 and 4 significant variables, respectively. Overall, heroin use, history of imprisonment, age at first drug use, and marital status were important factors in determining the history of drug injection. Subjects without the history of heroin use or heroin users with short-term imprisonment were at lower risk of drug injection. Among heroin addicts with long-term imprisonment, individuals with higher age at first drug use and married subjects were at lower risk of drug injection. Although the logistic regression model was more sensitive than the CART, the two models had the same levels of specificity and classification accuracy. Conclusion In this study, both sensitivity and specificity were important. While the logistic regression model had better performance, the graphical presentation of the CART simplifies the interpretation of the results. In general, a combination of different analytical methods is recommended to explore the effects of variables. PMID:24494152
Geographical variation of unmet medical needs in Italy: a multivariate logistic regression analysis
2013-01-01
Background Unmet health needs should be, in theory, a minor issue in Italy where a publicly funded and universally accessible health system exists. This, however, does not seem to be the case. Moreover, in the last two decades responsibilities for health care have been progressively decentralized to regional governments, which have differently organized health service delivery within their territories. Regional decision-making has affected the use of health care services, further increasing the existing geographical disparities in the access to care across the country. This study aims at comparing self-perceived unmet needs across Italian regions and assessing how the reported reasons - grouped into the categories of availability, accessibility and acceptability – vary geographically. Methods Data from the 2006 Italian component of the European Union Statistics on Income and Living Conditions are employed to explore reasons and predictors of self-reported unmet medical needs among 45,175 Italian respondents aged 18 and over. Multivariate logistic regression models are used to determine adjusted rates for overall unmet medical needs and for each of the three categories of reasons. Results Results show that, overall, 6.9% of the Italian population stated having experienced at least one unmet medical need during the last 12 months. The unadjusted rates vary markedly across regions, thus resulting in a clear-cut north–south divide (4.6% in the North-East vs. 10.6% in the South). Among those reporting unmet medical needs, the leading reason was problems of accessibility related to cost or transportation (45.5%), followed by acceptability (26.4%) and availability due to the presence of too long waiting lists (21.4%). In the South, more than one out of two individuals with an unmet need refrained from seeing a physician due to economic reasons. In the northern regions, working and family responsibilities contribute relatively more to the underutilization of medical
Comparison of artificial neural networks with logistic regression for detection of obesity.
Heydari, Seyed Taghi; Ayatollahi, Seyed Mohammad Taghi; Zare, Najaf
2012-08-01
Obesity is a common problem in nutrition, both in the developed and developing countries. The aim of this study was to classify obesity by artificial neural networks and logistic regression. This cross-sectional study comprised of 414 healthy military personnel in southern Iran. All subjects completed questionnaires on their socio-economic status and their anthropometric measures were measured by a trained nurse. Classification of obesity was done by artificial neural networks and logistic regression. The mean age±SD of participants was 34.4 ± 7.5 years. A total of 187 (45.2%) were obese. In regard to logistic regression and neural networks the respective values were 80.2% and 81.2% when correctly classified, 80.2 and 79.7 for sensitivity and 81.9 and 83.7 for specificity; while the area under Receiver-Operating Characteristic (ROC) curve were 0.888 and 0.884 and the Kappa statistic were 0.600 and 0.629 for logistic regression and neural networks model respectively. We conclude that the neural networks and logistic regression both were good classifier for obesity detection but they were not significantly different in classification.
Binary logistic regression-Instrument for assessing museum indoor air impact on exhibits.
Bucur, Elena; Danet, Andrei Florin; Lehr, Carol Blaziu; Lehr, Elena; Nita-Lazar, Mihai
2017-04-01
This paper presents a new way to assess the environmental impact on historical artifacts using binary logistic regression. The prediction of the impact on the exhibits during certain pollution scenarios (environmental impact) was calculated by a mathematical model based on the binary logistic regression; it allows the identification of those environmental parameters from a multitude of possible parameters with a significant impact on exhibitions and ranks them according to their severity effect. Air quality (NO2, SO2, O3 and PM2.5) and microclimate parameters (temperature, humidity) monitoring data from a case study conducted within exhibition and storage spaces of the Romanian National Aviation Museum Bucharest have been used for developing and validating the binary logistic regression method and the mathematical model. The logistic regression analysis was used on 794 data combinations (715 to develop of the model and 79 to validate it) by a Statistical Package for Social Sciences (SPSS 20.0). The results from the binary logistic regression analysis demonstrated that from six parameters taken into consideration, four of them present a significant effect upon exhibits in the following order: O3>PM2.5>NO2>humidity followed at a significant distance by the effects of SO2 and temperature. The mathematical model, developed in this study, correctly predicted 95.1 % of the cumulated effect of the environmental parameters upon the exhibits. Moreover, this model could also be used in the decisional process regarding the preventive preservation measures that should be implemented within the exhibition space.
NASA Astrophysics Data System (ADS)
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
[Clinical research XX. From clinical judgment to multiple logistic regression model].
Berea-Baltierra, Ricardo; Rivas-Ruiz, Rodolfo; Pérez-Rodríguez, Marcela; Palacios-Cruz, Lino; Moreno, Jorge; Talavera, Juan O
2014-01-01
The complexity of the causality phenomenon in clinical practice implies that the result of a maneuver is not solely caused by the maneuver, but by the interaction among the maneuver and other baseline factors or variables occurring during the maneuver. This requires methodological designs that allow the evaluation of these variables. When the outcome is a binary variable, we use the multiple logistic regression model (MLRM). This multivariate model is useful when we want to predict or explain, adjusting due to the effect of several risk factors, the effect of a maneuver or exposition over the outcome. In order to perform an MLRM, the outcome or dependent variable must be a binary variable and both categories must mutually exclude each other (i.e. live/death, healthy/ill); on the other hand, independent variables or risk factors may be either qualitative or quantitative. The effect measure obtained from this model is the odds ratio (OR) with 95 % confidence intervals (CI), from which we can estimate the proportion of the outcome's variability explained through the risk factors. For these reasons, the MLRM is used in clinical research, since one of the main objectives in clinical practice comprises the ability to predict or explain an event where different risk or prognostic factors are taken into account.
Bakhtiyari, Mahmood; Mehmandar, Mohammad Reza; Mirbagheri, Babak; Hariri, Gholam Reza; Delpisheh, Ali; Soori, Hamid
2014-01-01
Risk factors of human-related traffic crashes are the most important and preventable challenges for community health due to their noteworthy burden in developing countries in particular. The present study aims to investigate the role of human risk factors of road traffic crashes in Iran. Through a cross-sectional study using the COM 114 data collection forms, the police records of almost 600,000 crashes occurred in 2010 are investigated. The binary logistic regression and proportional odds regression models are used. The odds ratio for each risk factor is calculated. These models are adjusted for known confounding factors including age, sex and driving time. The traffic crash reports of 537,688 men (90.8%) and 54,480 women (9.2%) are analysed. The mean age is 34.1 ± 14 years. Not maintaining eyes on the road (53.7%) and losing control of the vehicle (21.4%) are the main causes of drivers' deaths in traffic crashes within cities. Not maintaining eyes on the road is also the most frequent human risk factor for road traffic crashes out of cities. Sudden lane excursion (OR = 9.9, 95% CI: 8.2-11.9) and seat belt non-compliance (OR = 8.7, CI: 6.7-10.1), exceeding authorised speed (OR = 17.9, CI: 12.7-25.1) and exceeding safe speed (OR = 9.7, CI: 7.2-13.2) are the most significant human risk factors for traffic crashes in Iran. The high mortality rate of 39 people for every 100,000 population emphasises on the importance of traffic crashes in Iran. Considering the important role of human risk factors in traffic crashes, struggling efforts are required to control dangerous driving behaviours such as exceeding speed, illegal overtaking and not maintaining eyes on the road.
Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping
2013-01-01
Background Complex binary traits are influenced by many factors including the main effects of many quantitative trait loci (QTLs), the epistatic effects involving more than one QTLs, environmental effects and the effects of gene-environment interactions. Although a number of QTL mapping methods for binary traits have been developed, there still lacks an efficient and powerful method that can handle both main and epistatic effects of a relatively large number of possible QTLs. Results In this paper, we use a Bayesian logistic regression model as the QTL model for binary traits that includes both main and epistatic effects. Our logistic regression model employs hierarchical priors for regression coefficients similar to the ones used in the Bayesian LASSO linear model for multiple QTL mapping for continuous traits. We develop efficient empirical Bayesian algorithms to infer the logistic regression model. Our simulation study shows that our algorithms can easily handle a QTL model with a large number of main and epistatic effects on a personal computer, and outperform five other methods examined including the LASSO, HyperLasso, BhGLM, RVM and the single-QTL mapping method based on logistic regression in terms of power of detection and false positive rate. The utility of our algorithms is also demonstrated through analysis of a real data set. A software package implementing the empirical Bayesian algorithms in this paper is freely available upon request. Conclusions The EBLASSO logistic regression method can handle a large number of effects possibly including the main and epistatic QTL effects, environmental effects and the effects of gene-environment interactions. It will be a very useful tool for multiple QTLs mapping for complex binary traits. PMID:23410082
The purpose of this report is to provide a reference manual that could be used by investigators for making informed use of logistic regression using two methods (standard logistic regression and MARS). The details for analyses of relationships between a dependent binary response ...
Use and interpretation of logistic regression in habitat-selection studies
Keating, Kim A.; Cherry, Steve
2004-01-01
Logistic regression is an important tool for wildlife habitat-selection studies, but the method frequently has been misapplied due to an inadequate understanding of the logistic model, its interpretation, and the influence of sampling design. To promote better use of this method, we review its application and interpretation under 3 sampling designs: random, case-control, and use-availability. Logistic regression is appropriate for habitat use-nonuse studies employing random sampling and can be used to directly model the conditional probability of use in such cases. Logistic regression also is appropriate for studies employing case-control sampling designs, but careful attention is required to interpret results correctly. Unless bias can be estimated or probability of use is small for all habitats, results of case-control studies should be interpreted as odds ratios, rather than probability of use or relative probability of use. When data are gathered under a use-availability design, logistic regression can be used to estimate approximate odds ratios if probability of use is small, at least on average. More generally, however, logistic regression is inappropriate for modeling habitat selection in use-availability studies. In particular, using logistic regression to fit the exponential model of Manly et al. (2002:100) does not guarantee maximum-likelihood estimates, valid probabilities, or valid likelihoods. We show that the resource selection function (RSF) commonly used for the exponential model is proportional to a logistic discriminant function. Thus, it may be used to rank habitats with respect to probability of use and to identify important habitat characteristics or their surrogates, but it is not guaranteed to be proportional to probability of use. Other problems associated with the exponential model also are discussed. We describe an alternative model based on Lancaster and Imbens (1996) that offers a method for estimating conditional probability of use in
ERIC Educational Resources Information Center
Ozechowski, Timothy J.; Turner, Charles W.; Hops, Hyman
2007-01-01
This article demonstrates the use of mixed-effects logistic regression (MLR) for conducting sequential analyses of binary observational data. MLR is a special case of the mixed-effects logit modeling framework, which may be applied to multicategorical observational data. The MLR approach is motivated in part by G. A. Dagne, G. W. Howe, C. H.…
Detecting DIF in Polytomous Items Using MACS, IRT and Ordinal Logistic Regression
ERIC Educational Resources Information Center
Elosua, Paula; Wells, Craig
2013-01-01
The purpose of the present study was to compare the Type I error rate and power of two model-based procedures, the mean and covariance structure model (MACS) and the item response theory (IRT), and an observed-score based procedure, ordinal logistic regression, for detecting differential item functioning (DIF) in polytomous items. A simulation…
ERIC Educational Resources Information Center
Rudner, Lawrence
2016-01-01
In the machine learning literature, it is commonly accepted as fact that as calibration sample sizes increase, Naïve Bayes classifiers initially outperform Logistic Regression classifiers in terms of classification accuracy. Applied to subtests from an on-line final examination and from a highly regarded certification examination, this study shows…
A Note on Three Statistical Tests in the Logistic Regression DIF Procedure
ERIC Educational Resources Information Center
Paek, Insu
2012-01-01
Although logistic regression became one of the well-known methods in detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under the maximum likelihood, do not seem to be consistently distinguished in DIF literature. This paper provides a clarifying…
Comparison of IRT Likelihood Ratio Test and Logistic Regression DIF Detection Procedures
ERIC Educational Resources Information Center
Atar, Burcu; Kamata, Akihito
2011-01-01
The Type I error rates and the power of IRT likelihood ratio test and cumulative logit ordinal logistic regression procedures in detecting differential item functioning (DIF) for polytomously scored items were investigated in this Monte Carlo simulation study. For this purpose, 54 simulation conditions (combinations of 3 sample sizes, 2 sample…
ERIC Educational Resources Information Center
Anderson, Carolyn J.; Verkuilen, Jay; Peyton, Buddy L.
2010-01-01
Survey items with multiple response categories and multiple-choice test questions are ubiquitous in psychological and educational research. We illustrate the use of log-multiplicative association (LMA) models that are extensions of the well-known multinomial logistic regression model for multiple dependent outcome variables to reanalyze a set of…
Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis
ERIC Educational Resources Information Center
Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John
2012-01-01
Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…
ERIC Educational Resources Information Center
Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard
2010-01-01
The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…
Propensity Score Estimation with Data Mining Techniques: Alternatives to Logistic Regression
ERIC Educational Resources Information Center
Keller, Bryan S. B.; Kim, Jee-Seon; Steiner, Peter M.
2013-01-01
Propensity score analysis (PSA) is a methodological technique which may correct for selection bias in a quasi-experiment by modeling the selection process using observed covariates. Because logistic regression is well understood by researchers in a variety of fields and easy to implement in a number of popular software packages, it has…
Multiple Logistic Regression Analysis of Cigarette Use among High School Students
ERIC Educational Resources Information Center
Adwere-Boamah, Joseph
2011-01-01
A binary logistic regression analysis was performed to predict high school students' cigarette smoking behavior from selected predictors from 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target student behavior of interest was frequent cigarette use. Five predictor variables included in the model were: a) race, b) frequency of…
Monte Carlo Evaluation of Two-Level Logistic Regression for Assessing Person Fit
ERIC Educational Resources Information Center
Woods, Carol M.
2008-01-01
Person fit is the degree to which an item response model fits for individual examinees. Reise (2000) described how two-level logistic regression can be used to detect heterogeneity in person fit, evaluate potential predictors of person fit heterogeneity, and identify potentially aberrant individuals. The method has apparently never been applied to…
ERIC Educational Resources Information Center
Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza
2014-01-01
This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
ERIC Educational Resources Information Center
Ferrer, Alvaro J. Arce; Wang, Lin
This study compared the classification performance among parametric discriminant analysis, nonparametric discriminant analysis, and logistic regression in a two-group classification application. Field data from an organizational survey were analyzed and bootstrapped for additional exploration. The data were observed to depart from multivariate…
An Additional Measure of Overall Effect Size for Logistic Regression Models
ERIC Educational Resources Information Center
Allen, Jeff; Le, Huy
2008-01-01
Users of logistic regression models often need to describe the overall predictive strength, or effect size, of the model's predictors. Analogs of R[superscript 2] have been developed, but none of these measures are interpretable on the same scale as effects of individual predictors. Furthermore, R[superscript 2] analogs are not invariant to the…
Comparison of Two Approaches for Handling Missing Covariates in Logistic Regression
ERIC Educational Resources Information Center
Peng, Chao-Ying Joanne; Zhu, Jin
2008-01-01
For the past 25 years, methodological advances have been made in missing data treatment. Most published work has focused on missing data in dependent variables under various conditions. The present study seeks to fill the void by comparing two approaches for handling missing data in categorical covariates in logistic regression: the…
ERIC Educational Resources Information Center
French, Brian F.; Maller, Susan J.
2007-01-01
Two unresolved implementation issues with logistic regression (LR) for differential item functioning (DIF) detection include ability purification and effect size use. Purification is suggested to control inaccuracies in DIF detection as a result of DIF items in the ability estimate. Additionally, effect size use may be beneficial in controlling…
ERIC Educational Resources Information Center
Courtney, Jon R.; Prophet, Retta
2011-01-01
Placement instability is often associated with a number of negative outcomes for children. To gain state level contextual knowledge of factors associated with placement stability/instability, logistic regression was applied to selected variables from the New Mexico Adoption and Foster Care Administrative Reporting System dataset. Predictors…
Predictive Discriminant Analysis Versus Logistic Regression in Two-Group Classification Problems.
ERIC Educational Resources Information Center
Meshbane, Alice; Morris, John D.
A method for comparing the cross-validated classification accuracies of predictive discriminant analysis and logistic regression classification models is presented under varying data conditions for the two-group classification problem. With this method, separate-group, as well as total-sample proportions of the correct classifications, can be…
Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%.
Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%. PMID:25302338
Demenais, F M
1991-01-01
Statistical models have been developed to delineate the major-gene and non-major-gene factors accounting for the familial aggregation of complex diseases. The mixed model assumes an underlying liability to the disease, to which a major gene, a multifactorial component, and random environment contribute independently. Affection is defined by a threshold on the liability scale. The regressive logistic models assume that the logarithm of the odds of being affected is a linear function of major genotype, phenotypes of antecedents and other covariates. An equivalence between these two approaches cannot be derived analytically. I propose a formulation of the regressive logistic models on the supposition of an underlying liability model of disease. Relatives are assumed to have correlated liabilities to the disease; affected persons have liabilities exceeding an estimable threshold. Under the assumption that the correlation structure of the relatives' liabilities follows a regressive model, the regression coefficients on antecedents are expressed in terms of the relevant familial correlations. A parsimonious parameterization is a consequence of the assumed liability model, and a one-to-one correspondence with the parameters of the mixed model can be established. The logits, derived under the class A regressive model and under the class D regressive model, can be extended to include a large variety of patterns of family dependence, as well as gene-environment interactions. PMID:1897524
Assessing Longitudinal Change: Adjustment for Regression to the Mean Effects
ERIC Educational Resources Information Center
Rocconi, Louis M.; Ethington, Corinna A.
2009-01-01
Pascarella (J Coll Stud Dev 47:508-520, 2006) has called for an increase in use of longitudinal data with pretest-posttest design when studying effects on college students. However, such designs that use multiple measures to document change are vulnerable to an important threat to internal validity, regression to the mean. Herein, we discuss a…
Coercively Adjusted Auto Regression Model for Forecasting in Epilepsy EEG
Kim, Sun-Hee; Faloutsos, Christos; Yang, Hyung-Jeong
2013-01-01
Recently, data with complex characteristics such as epilepsy electroencephalography (EEG) time series has emerged. Epilepsy EEG data has special characteristics including nonlinearity, nonnormality, and nonperiodicity. Therefore, it is important to find a suitable forecasting method that covers these special characteristics. In this paper, we propose a coercively adjusted autoregression (CA-AR) method that forecasts future values from a multivariable epilepsy EEG time series. We use the technique of random coefficients, which forcefully adjusts the coefficients with −1 and 1. The fractal dimension is used to determine the order of the CA-AR model. We applied the CA-AR method reflecting special characteristics of data to forecast the future value of epilepsy EEG data. Experimental results show that when compared to previous methods, the proposed method can forecast faster and accurately. PMID:23710252
Phillips, Kirk T.; Street, W. Nick
2005-01-01
The purpose of this study is to determine the best prediction of heart failure outcomes, resulting from two methods -- standard epidemiologic analysis with logistic regression and knowledge discovery with supervised learning/data mining. Heart failure was chosen for this study as it exhibits higher prevalence and cost of treatment than most other hospitalized diseases. The prevalence of heart failure has exceeded 4 million cases in the U.S.. Findings of this study should be useful for the design of quality improvement initiatives, as particular aspects of patient comorbidity and treatment are found to be associated with mortality. This is also a proof of concept study, considering the feasibility of emerging health informatics methods of data mining in conjunction with or in lieu of traditional logistic regression methods of prediction. Findings may also support the design of decision support systems and quality improvement programming for other diseases. PMID:16779367
Hierarchical Bayesian Logistic Regression to forecast metabolic control in type 2 DM patients.
Dagliati, Arianna; Malovini, Alberto; Decata, Pasquale; Cogni, Giulia; Teliti, Marsida; Sacchi, Lucia; Cerra, Carlo; Chiovato, Luca; Bellazzi, Riccardo
2016-01-01
In this work we present our efforts in building a model able to forecast patients' changes in clinical conditions when repeated measurements are available. In this case the available risk calculators are typically not applicable. We propose a Hierarchical Bayesian Logistic Regression model, which allows taking into account individual and population variability in model parameters estimate. The model is used to predict metabolic control and its variation in type 2 diabetes mellitus. In particular we have analyzed a population of more than 1000 Italian type 2 diabetic patients, collected within the European project Mosaic. The results obtained in terms of Matthews Correlation Coefficient are significantly better than the ones gathered with standard logistic regression model, based on data pooling.
Hierarchical Bayesian Logistic Regression to forecast metabolic control in type 2 DM patients
Dagliati, Arianna; Malovini, Alberto; Decata, Pasquale; Cogni, Giulia; Teliti, Marsida; Sacchi, Lucia; Cerra, Carlo; Chiovato, Luca; Bellazzi, Riccardo
2016-01-01
In this work we present our efforts in building a model able to forecast patients’ changes in clinical conditions when repeated measurements are available. In this case the available risk calculators are typically not applicable. We propose a Hierarchical Bayesian Logistic Regression model, which allows taking into account individual and population variability in model parameters estimate. The model is used to predict metabolic control and its variation in type 2 diabetes mellitus. In particular we have analyzed a population of more than 1000 Italian type 2 diabetic patients, collected within the European project Mosaic. The results obtained in terms of Matthews Correlation Coefficient are significantly better than the ones gathered with standard logistic regression model, based on data pooling. PMID:28269842
Wang, Shuang; Jiang, Xiaoqian; Wu, Yuan; Cui, Lijuan; Cheng, Samuel; Ohno-Machado, Lucila
2013-01-01
We developed an EXpectation Propagation LOgistic REgRession (EXPLORER) model for distributed privacy-preserving online learning. The proposed framework provides a high level guarantee for protecting sensitive information, since the information exchanged between the server and the client is the encrypted posterior distribution of coefficients. Through experimental results, EXPLORER shows the same performance (e.g., discrimination, calibration, feature selection etc.) as the traditional frequentist Logistic Regression model, but provides more flexibility in model updating. That is, EXPLORER can be updated one point at a time rather than having to retrain the entire data set when new observations are recorded. The proposed EXPLORER supports asynchronized communication, which relieves the participants from coordinating with one another, and prevents service breakdown from the absence of participants or interrupted communications. PMID:23562651
Assessing the effects of different types of covariates for binary logistic regression
NASA Astrophysics Data System (ADS)
Hamid, Hamzah Abdul; Wah, Yap Bee; Xie, Xian-Jin; Rahman, Hezlin Aryani Abd
2015-02-01
It is well known that the type of data distribution in the independent variable(s) may affect many statistical procedures. This paper investigates and illustrates the effect of different types of covariates on the parameter estimation of a binary logistic regression model. A simulation study with different sample sizes and different types of covariates (uniform, normal, skewed) was carried out. Results showed that parameter estimation of binary logistic regression model is severely overestimated when sample size is less than 150 for covariate which have normal and uniform distribution while the parameter is underestimated when the distribution of covariate is skewed. Parameter estimation improves for all types of covariates when sample size is large, that is at least 500.
Predicting Student Success on the Texas Chemistry STAAR Test: A Logistic Regression Analysis
ERIC Educational Resources Information Center
Johnson, William L.; Johnson, Annabel M.; Johnson, Jared
2012-01-01
Background: The context is the new Texas STAAR end-of-course testing program. Purpose: The authors developed a logistic regression model to predict who would pass-or-fail the new Texas chemistry STAAR end-of-course exam. Setting: Robert E. Lee High School (5A) with an enrollment of 2700 students, Tyler, Texas. Date of the study was the 2011-2012…
Xie, Fei; Yang, Houpu; Wang, Shu; Zhou, Bo; Tong, Fuzhong; Yang, Deqi; Zhang, Jiaqing
2012-01-01
Nodal staging in breast cancer is a key predictor of prognosis. This paper presents the results of potential clinicopathological predictors of axillary lymph node involvement and develops an efficient prediction model to assist in predicting axillary lymph node metastases. Seventy patients with primary early breast cancer who underwent axillary dissection were evaluated. Univariate and multivariate logistic regression were performed to evaluate the association between clinicopathological factors and lymph node metastatic status. A logistic regression predictive model was built from 50 randomly selected patients; the model was also applied to the remaining 20 patients to assess its validity. Univariate analysis showed a significant relationship between lymph node involvement and absence of nm-23 (p = 0.010) and Kiss-1 (p = 0.001) expression. Absence of Kiss-1 remained significantly associated with positive axillary node status in the multivariate analysis (p = 0.018). Seven clinicopathological factors were involved in the multivariate logistic regression model: menopausal status, tumor size, ER, PR, HER2, nm-23 and Kiss-1. The model was accurate and discriminating, with an area under the receiver operating characteristic curve of 0.702 when applied to the validation group. Moreover, there is a need discover more specific candidate proteins and molecular biology tools to select more variables which should improve predictive accuracy.
Xie, Fei; Yang, Houpu; Wang, Shu; Zhou, Bo; Tong, Fuzhong; Yang, Deqi; Zhang, Jiaqing
2012-01-01
Nodal staging in breast cancer is a key predictor of prognosis. This paper presents the results of potential clinicopathological predictors of axillary lymph node involvement and develops an efficient prediction model to assist in predicting axillary lymph node metastases. Seventy patients with primary early breast cancer who underwent axillary dissection were evaluated. Univariate and multivariate logistic regression were performed to evaluate the association between clinicopathological factors and lymph node metastatic status. A logistic regression predictive model was built from 50 randomly selected patients; the model was also applied to the remaining 20 patients to assess its validity. Univariate analysis showed a significant relationship between lymph node involvement and absence of nm-23 (p = 0.010) and Kiss-1 (p = 0.001) expression. Absence of Kiss-1 remained significantly associated with positive axillary node status in the multivariate analysis (p = 0.018). Seven clinicopathological factors were involved in the multivariate logistic regression model: menopausal status, tumor size, ER, PR, HER2, nm-23 and Kiss-1. The model was accurate and discriminating, with an area under the receiver operating characteristic curve of 0.702 when applied to the validation group. Moreover, there is a need discover more specific candidate proteins and molecular biology tools to select more variables which should improve predictive accuracy. PMID:23012578
Lin, H. Y.; Desmond, R.; Liu, Y. H.; Bridges, S. L.; Soong, S. J.
2013-01-01
Summary Many complex disease traits are observed to be associated with single nucleotide polymorphism (SNP) interactions. In testing small-scale SNP-SNP interactions, variable selection procedures in logistic regressions are commonly used. The empirical evidence of variable selection for testing interactions in logistic regressions is limited. This simulation study was designed to compare nine variable selection procedures in logistic regressions for testing SNP-SNP interactions. Data on 10 SNPs were simulated for 400 and 1000 subjects (case/control ratio=1). The simulated model included one main effect and two 2-way interactions. The variable selection procedures included automatic selection (stepwise, forward and backward), common 2-step selection, AIC- and BIC-based selection. The hierarchical rule effect, in which all main effects and lower order terms of the highest-order interaction term are included in the model regardless of their statistical significance, was also examined. We found that the stepwise variable selection without the hierarchical rule which had reasonably high authentic (true positive) proportion and low noise (false positive) proportion, is a better method compared to other variable selection procedures. The procedure without the hierarchical rule requires fewer terms in testing interactions, so it can accommodate more SNPs than the procedure with the hierarchical rule. For testing interactions, the procedures without the hierarchical rule had higher authentic proportion and lower noise proportion compared with ones with the hierarchical rule. These variable selection procedures were also applied and compared in a rheumatoid arthritis study. PMID:18231122
A comparative study on entrepreneurial attitudes modeled with logistic regression and Bayes nets.
López Puga, Jorge; García García, Juan
2012-11-01
Entrepreneurship research is receiving increasing attention in our context, as entrepreneurs are key social agents involved in economic development. We compare the success of the dichotomic logistic regression model and the Bayes simple classifier to predict entrepreneurship, after manipulating the percentage of missing data and the level of categorization in predictors. A sample of undergraduate university students (N = 1230) completed five scales (motivation, attitude towards business creation, obstacles, deficiencies, and training needs) and we found that each of them predicted different aspects of the tendency to business creation. Additionally, our results show that the receiver operating characteristic (ROC) curve is affected by the rate of missing data in both techniques, but logistic regression seems to be more vulnerable when faced with missing data, whereas Bayes nets underperform slightly when categorization has been manipulated. Our study sheds light on the potential entrepreneur profile and we propose to use Bayesian networks as an additional alternative to overcome the weaknesses of logistic regression when missing data are present in applied research.
Direkvand-Moghadam, Ashraf; Suhrabi, Zainab; Akbari, Malihe
2016-01-01
Background Female sexual dysfunction, which can occur during any stage of a normal sexual activity, is a serious condition for individuals and couples. The present study aimed to determine the prevalence and predictive factors of female sexual dysfunction in women referred to health centers in Ilam, the Western Iran, in 2014. Methods In the present cross-sectional study, 444 women who attended health centers in Ilam were enrolled from May to September 2014. Participants were selected according to the simple random sampling method. Univariate and multivariate logistic regression analyses were used to predict the risk factors of female sexual dysfunction. Diffe rences with an alpha error of 0.05 were regarded as statistically significant. Results Overall, 75.9% of the study population exhibited sexual dysfunction. Univariate logistic regression analysis demonstrated that there was a significant association between female sexual dysfunction and age, menarche age, gravidity, parity, and education (P<0.05). Multivariate logistic regression analysis indicated that, menarche age (odds ratio, 1.26), education level (odds ratio, 1.71), and gravida (odds ratio, 1.59) were independent predictive variables for female sexual dysfunction. Conclusion The majority of Iranian women suffer from sexual dysfunction. A lack of awareness of Iranian women's sexual pleasure and formal training on sexual function and its influencing factors, such as menarche age, gravida, and level of education, may lead to a high prevalence of female sexual dysfunction. PMID:27688863
Lee, Saro
2004-08-01
For landslide susceptibility mapping, this study applied and verified a Bayesian probability model, a likelihood ratio and statistical model, and logistic regression to Janghung, Korea, using a Geographic Information System (GIS). Landslide locations were identified in the study area from interpretation of IRS satellite imagery and field surveys; and a spatial database was constructed from topographic maps, soil type, forest cover, geology and land cover. The factors that influence landslide occurrence, such as slope gradient, slope aspect, and curvature of topography, were calculated from the topographic database. Soil texture, material, drainage, and effective depth were extracted from the soil database, while forest type, diameter, and density were extracted from the forest database. Land cover was classified from Landsat TM satellite imagery using unsupervised classification. The likelihood ratio and logistic regression coefficient were overlaid to determine each factor's rating for landslide susceptibility mapping. Then the landslide susceptibility map was verified and compared with known landslide locations. The logistic regression model had higher prediction accuracy than the likelihood ratio model. The method can be used to reduce hazards associated with landslides and to land cover planning.
Austin, Peter C; Steyerberg, Ewout W
2014-02-10
Predicting the probability of the occurrence of a binary outcome or condition is important in biomedical research. While assessing discrimination is an essential issue in developing and validating binary prediction models, less attention has been paid to methods for assessing model calibration. Calibration refers to the degree of agreement between observed and predicted probabilities and is often assessed by testing for lack-of-fit. The objective of our study was to examine the ability of graphical methods to assess the calibration of logistic regression models. We examined lack of internal calibration, which was related to misspecification of the logistic regression model, and external calibration, which was related to an overfit model or to shrinkage of the linear predictor. We conducted an extensive set of Monte Carlo simulations with a locally weighted least squares regression smoother (i.e., the loess algorithm) to examine the ability of graphical methods to assess model calibration. We found that loess-based methods were able to provide evidence of moderate departures from linearity and indicate omission of a moderately strong interaction. Misspecification of the link function was harder to detect. Visual patterns were clearer with higher sample sizes, higher incidence of the outcome, or higher discrimination. Loess-based methods were also able to identify the lack of calibration in external validation samples when an overfit regression model had been used. In conclusion, loess-based smoothing methods are adequate tools to graphically assess calibration and merit wider application.
Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.
Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A
2016-01-01
Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.
A general framework for the use of logistic regression models in meta-analysis.
Simmonds, Mark C; Higgins, Julian Pt
2016-12-01
Where individual participant data are available for every randomised trial in a meta-analysis of dichotomous event outcomes, "one-stage" random-effects logistic regression models have been proposed as a way to analyse these data. Such models can also be used even when individual participant data are not available and we have only summary contingency table data. One benefit of this one-stage regression model over conventional meta-analysis methods is that it maximises the correct binomial likelihood for the data and so does not require the common assumption that effect estimates are normally distributed. A second benefit of using this model is that it may be applied, with only minor modification, in a range of meta-analytic scenarios, including meta-regression, network meta-analyses and meta-analyses of diagnostic test accuracy. This single model can potentially replace the variety of often complex methods used in these areas. This paper considers, with a range of meta-analysis examples, how random-effects logistic regression models may be used in a number of different types of meta-analyses. This one-stage approach is compared with widely used meta-analysis methods including Bayesian network meta-analysis and the bivariate and hierarchical summary receiver operating characteristic (ROC) models for meta-analyses of diagnostic test accuracy.
NASA Astrophysics Data System (ADS)
Ko, K.; Cheong, B.; Koh, D.
2010-12-01
Groundwater has been used a main source to provide a drinking water in a rural area with no regional potable water supply system in Korea. More than 50 percent of rural area residents depend on groundwater as drinking water. Thus, research on predicting groundwater pollution for the sustainable groundwater usage and protection from potential pollutants was demanded. This study was carried out to know the vulnerability of groundwater nitrate contamination reflecting the effect of land use in Nonsan city of a representative rural area of South Korea. About 47% of the study area is occupied by cultivated land with high vulnerable area to groundwater nitrate contamination because it has higher nitrogen fertilizer input of 62.3 tons/km2 than that of country’s average of 44.0 tons/km2. The two vulnerability assessment methods, logistic regression and DRASTIC model, were tested and compared to know more suitable techniques for the assessment of groundwater nitrate contamination in Nonsan area. The groundwater quality data were acquired from the collection of analyses of 111 samples of small potable supply system in the study area. The analyzed values of nitrate were classified by land use such as resident, upland, paddy, and field area. One dependent and two independent variables were addressed for logistic regression analysis. One dependent variable was a binary categorical data with 0 or 1 whether or not nitrate exceeding thresholds of 1 through 10 mg/L. The independent variables were one continuous data of slope indicating topography and multiple categorical data of land use which are classified by resident, upland, paddy, and field area. The results of the Levene’s test and T-test for slope and land use were showed the significant difference of mean values among groups in 95% confidence level. From the logistic regression, we could know the negative correlation between slope and nitrate which was caused by the decrease of contaminants inputs into groundwater with
Stiglic, Gregor; Povalej Brzan, Petra; Fijacko, Nino; Wang, Fei; Delibasic, Boris; Kalousis, Alexandros; Obradovic, Zoran
2015-01-01
Different studies have demonstrated the importance of comorbidities to better understand the origin and evolution of medical complications. This study focuses on improvement of the predictive model interpretability based on simple logical features representing comorbidities. We use group lasso based feature interaction discovery followed by a post-processing step, where simple logic terms are added. In the final step, we reduce the feature set by applying lasso logistic regression to obtain a compact set of non-zero coefficients that represent a more comprehensible predictive model. The effectiveness of the proposed approach was demonstrated on a pediatric hospital discharge dataset that was used to build a readmission risk estimation model. The evaluation of the proposed method demonstrates a reduction of the initial set of features in a regression model by 72%, with a slight improvement in the Area Under the ROC Curve metric from 0.763 (95% CI: 0.755–0.771) to 0.769 (95% CI: 0.761–0.777). Additionally, our results show improvement in comprehensibility of the final predictive model using simple comorbidity based terms for logistic regression. PMID:26645087
Montgomery, M E; White, M E; Martin, S W
1987-01-01
Results from discriminant analysis and logistic regression were compared using two data sets from a study on predictors of coliform mastitis in dairy cows. Both techniques selected the same set of variables as important predictors and were of nearly equal value in classifying cows as having, or not having mastitis. The logistic regression model made fewer classification errors. The magnitudes of the effects were considerably different for some variables. Given the failure to meet the underlying assumptions of discriminant analysis, the coefficients from logistic regression are preferable. PMID:3453271
Adjustment of regional regression equations for urban storm-runoff quality using at-site data
Barks, C.S.
1996-01-01
Regional regression equations have been developed to estimate urban storm-runoff loads and mean concentrations using a national data base. Four statistical methods using at-site data to adjust the regional equation predictions were developed to provide better local estimates. The four adjustment procedures are a single-factor adjustment, a regression of the observed data against the predicted values, a regression of the observed values against the predicted values and additional local independent variables, and a weighted combination of a local regression with the regional prediction. Data collected at five representative storm-runoff sites during 22 storms in Little Rock, Arkansas, were used to verify, and, when appropriate, adjust the regional regression equation predictions. Comparison of observed values of stormrunoff loads and mean concentrations to the predicted values from the regional regression equations for nine constituents (chemical oxygen demand, suspended solids, total nitrogen as N, total ammonia plus organic nitrogen as N, total phosphorus as P, dissolved phosphorus as P, total recoverable copper, total recoverable lead, and total recoverable zinc) showed large prediction errors ranging from 63 percent to more than several thousand percent. Prediction errors for 6 of the 18 regional regression equations were less than 100 percent and could be considered reasonable for water-quality prediction equations. The regression adjustment procedure was used to adjust five of the regional equation predictions to improve the predictive accuracy. For seven of the regional equations the observed and the predicted values are not significantly correlated. Thus neither the unadjusted regional equations nor any of the adjustments were appropriate. The mean of the observed values was used as a simple estimator when the regional equation predictions and adjusted predictions were not appropriate.
Olson, Scott A.; Brouillette, Michael C.
2006-01-01
A logistic regression equation was developed for estimating the probability of a stream flowing intermittently at unregulated, rural stream sites in Vermont. These determinations can be used for a wide variety of regulatory and planning efforts at the Federal, State, regional, county and town levels, including such applications as assessing fish and wildlife habitats, wetlands classifications, recreational opportunities, water-supply potential, waste-assimilation capacities, and sediment transport. The equation will be used to create a derived product for the Vermont Hydrography Dataset having the streamflow characteristic of 'intermittent' or 'perennial.' The Vermont Hydrography Dataset is Vermont's implementation of the National Hydrography Dataset and was created at a scale of 1:5,000 based on statewide digital orthophotos. The equation was developed by relating field-verified perennial or intermittent status of a stream site during normal summer low-streamflow conditions in the summer of 2005 to selected basin characteristics of naturally flowing streams in Vermont. The database used to develop the equation included 682 stream sites with drainage areas ranging from 0.05 to 5.0 square miles. When the 682 sites were observed, 126 were intermittent (had no flow at the time of the observation) and 556 were perennial (had flowing water at the time of the observation). The results of the logistic regression analysis indicate that the probability of a stream having intermittent flow in Vermont is a function of drainage area, elevation of the site, the ratio of basin relief to basin perimeter, and the areal percentage of well- and moderately well-drained soils in the basin. Using a probability cutpoint (a lower probability indicates the site has perennial flow and a higher probability indicates the site has intermittent flow) of 0.5, the logistic regression equation correctly predicted the perennial or intermittent status of 116 test sites 85 percent of the time.
A secure distributed logistic regression protocol for the detection of rare adverse drug events
El Emam, Khaled; Samet, Saeed; Arbuckle, Luk; Tamblyn, Robyn; Earle, Craig; Kantarcioglu, Murat
2013-01-01
Background There is limited capacity to assess the comparative risks of medications after they enter the market. For rare adverse events, the pooling of data from multiple sources is necessary to have the power and sufficient population heterogeneity to detect differences in safety and effectiveness in genetic, ethnic and clinically defined subpopulations. However, combining datasets from different data custodians or jurisdictions to perform an analysis on the pooled data creates significant privacy concerns that would need to be addressed. Existing protocols for addressing these concerns can result in reduced analysis accuracy and can allow sensitive information to leak. Objective To develop a secure distributed multi-party computation protocol for logistic regression that provides strong privacy guarantees. Methods We developed a secure distributed logistic regression protocol using a single analysis center with multiple sites providing data. A theoretical security analysis demonstrates that the protocol is robust to plausible collusion attacks and does not allow the parties to gain new information from the data that are exchanged among them. The computational performance and accuracy of the protocol were evaluated on simulated datasets. Results The computational performance scales linearly as the dataset sizes increase. The addition of sites results in an exponential growth in computation time. However, for up to five sites, the time is still short and would not affect practical applications. The model parameters are the same as the results on pooled raw data analyzed in SAS, demonstrating high model accuracy. Conclusion The proposed protocol and prototype system would allow the development of logistic regression models in a secure manner without requiring the sharing of personal health information. This can alleviate one of the key barriers to the establishment of large-scale post-marketing surveillance programs. We extended the secure protocol to account for
An, Lihua; Fung, Karen Y; Krewski, Daniel
2010-09-01
Spontaneous adverse event reporting systems are widely used to identify adverse reactions to drugs following their introduction into the marketplace. In this article, a James-Stein type shrinkage estimation strategy was developed in a Bayesian logistic regression model to analyze pharmacovigilance data. This method is effective in detecting signals as it combines information and borrows strength across medically related adverse events. Computer simulation demonstrated that the shrinkage estimator is uniformly better than the maximum likelihood estimator in terms of mean squared error. This method was used to investigate the possible association of a series of diabetic drugs and the risk of cardiovascular events using data from the Canada Vigilance Online Database.
Petersen, Jørgen Holm
2016-01-15
This paper describes a new approach to the estimation in a logistic regression model with two crossed random effects where special interest is in estimating the variance of one of the effects while not making distributional assumptions about the other effect. A composite likelihood is studied. For each term in the composite likelihood, a conditional likelihood is used that eliminates the influence of the random effects, which results in a composite conditional likelihood consisting of only one-dimensional integrals that may be solved numerically. Good properties of the resulting estimator are described in a small simulation study.
Analyzing Beijing's in-use vehicle emissions test results using logistic regression.
Chang, Cheng; Ortolano, Leonard
2008-10-01
A logistic regression model was built using vehicle emissions test data collected in 2003 for 129 604 motor vehicles in Beijing. The regression model uses vehicle model, model year, inspection station, ownership, and vehicle registration area as covariates to predict the probability that a vehicle fails an annual emissions test on the first try. Vehicle model is the most influential predictor variable: some vehicle models are much more likely to fail in emissions tests than an "average" vehicle. Five out of 14 vehicle models that performed the worst (out of a total of 52 models) were manufactured by foreign companies or by their joint ventures with Chinese enterprises. These 14 vehicle model types may have failed at relatively high rates because of design and manufacturing deficiencies, and such deficiencies cannot be easily detected and corrected without further efforts, such as programs for in-use surveillance and vehicle recall.
Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M
2014-12-01
Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out by 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, which also showed a better predictability of the functional inner regression tree model. The model built in the present study exhibits a reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed.
Predicting students' success at pre-university studies using linear and logistic regressions
NASA Astrophysics Data System (ADS)
Suliman, Noor Azizah; Abidin, Basir; Manan, Norhafizah Abdul; Razali, Ahmad Mahir
2014-09-01
The study is aimed to find the most suitable model that could predict the students' success at the medical pre-university studies, Centre for Foundation in Science, Languages and General Studies of Cyberjaya University College of Medical Sciences (CUCMS). The predictors under investigation were the national high school exit examination-Sijil Pelajaran Malaysia (SPM) achievements such as Biology, Chemistry, Physics, Additional Mathematics, Mathematics, English and Bahasa Malaysia results as well as gender and high school background factors. The outcomes showed that there is a significant difference in the final CGPA, Biology and Mathematics subjects at pre-university by gender factor, while by high school background also for Mathematics subject. In general, the correlation between the academic achievements at the high school and medical pre-university is moderately significant at α-level of 0.05, except for languages subjects. It was found also that logistic regression techniques gave better prediction models than the multiple linear regression technique for this data set. The developed logistic models were able to give the probability that is almost accurate with the real case. Hence, it could be used to identify successful students who are qualified to enter the CUCMS medical faculty before accepting any students to its foundation program.
2011-01-01
Background The study attempts to develop an ordinal logistic regression (OLR) model to identify the determinants of child malnutrition instead of developing traditional binary logistic regression (BLR) model using the data of Bangladesh Demographic and Health Survey 2004. Methods Based on weight-for-age anthropometric index (Z-score) child nutrition status is categorized into three groups-severely undernourished (< -3.0), moderately undernourished (-3.0 to -2.01) and nourished (≥-2.0). Since nutrition status is ordinal, an OLR model-proportional odds model (POM) can be developed instead of two separate BLR models to find predictors of both malnutrition and severe malnutrition if the proportional odds assumption satisfies. The assumption is satisfied with low p-value (0.144) due to violation of the assumption for one co-variate. So partial proportional odds model (PPOM) and two BLR models have also been developed to check the applicability of the OLR model. Graphical test has also been adopted for checking the proportional odds assumption. Results All the models determine that age of child, birth interval, mothers' education, maternal nutrition, household wealth status, child feeding index, and incidence of fever, ARI & diarrhoea were the significant predictors of child malnutrition; however, results of PPOM were more precise than those of other models. Conclusion These findings clearly justify that OLR models (POM and PPOM) are appropriate to find predictors of malnutrition instead of BLR models. PMID:22082256
An application of a microcomputer compiler program to multiple logistic regression analysis.
Sakai, R
1988-01-01
Microcomputer programs for multiple logistic regression analysis were written in BASIC language to determine the usefulness of microcomputers for multivariate analysis, which is an important method in epidemiological studies. The program, carried out by an interpreter system, required a comparatively long computing time for a small amount of data. For example, it took approximately thirty minutes to compute the data of 6 independent variables and 63 matched sets of case and controls (1:4). The majority of the calculation time was spent computing a matrix. The matrix computation time increased cumulatively in proportion to additions in the number of subjects, and increased exponentially with the number of variables. A BASIC compiler was utilized for the program of multiple logistic regression analysis. The compiled program carried out the same computations as above, but within 4 minutes. Therefore, it is evident that a compiler can be an extremely convenient tool for computing multivariate analysis. The two programs produced here were also easily linked with spreadsheet packages to enter data.
Ordinal logistic regression analysis on the nutritional status of children in KarangKitri village
NASA Astrophysics Data System (ADS)
Ohyver, Margaretha; Yongharto, Kimmy Octavian
2015-09-01
Ordinal logistic regression is a statistical technique that can be used to describe the relationship between ordinal response variable with one or more independent variables. This method has been used in various fields including in the health field. In this research, ordinal logistic regression is used to describe the relationship between nutritional status of children with age, gender, height, and family status. Nutritional status of children in this research is divided into over nutrition, well nutrition, less nutrition, and malnutrition. The purpose for this research is to describe the characteristics of children in the KarangKitri Village and to determine the factors that influence the nutritional status of children in the KarangKitri village. There are three things that obtained from this research. First, there are still children who are not categorized as well nutritional status. Second, there are children who come from sufficient economic level which include in not normal status. Third, the factors that affect the nutritional level of children are age, family status, and height.
NASA Astrophysics Data System (ADS)
Junek, W. N.; Jones, W. L.; Woods, M. T.
2011-12-01
An automated event tree analysis system for estimating the probability of short term volcanic activity is presented. The algorithm is driven by a suite of empirical statistical models that are derived through logistic regression. Each model is constructed from a multidisciplinary dataset that was assembled from a collection of historic volcanic unrest episodes. The dataset consists of monitoring measurements (e.g. InSAR, seismic), source modeling results, and historic eruption activity. This provides a simple mechanism for simultaneously accounting for the geophysical changes occurring within the volcano and the historic behavior of analog volcanoes. The algorithm is extensible and can be easily recalibrated to include new or additional monitoring, modeling, or historic information. Standard cross validation techniques are employed to optimize its forecasting capabilities. Analysis results from several recent volcanic unrest episodes are presented.
Binary logistic regression modelling: Measuring the probability of relapse cases among drug addict
NASA Astrophysics Data System (ADS)
Ismail, Mohd Tahir; Alias, Siti Nor Shadila
2014-07-01
For many years Malaysia faced the drug addiction issues. The most serious case is relapse phenomenon among treated drug addict (drug addict who have under gone the rehabilitation programme at Narcotic Addiction Rehabilitation Centre, PUSPEN). Thus, the main objective of this study is to find the most significant factor that contributes to relapse to happen. The binary logistic regression analysis was employed to model the relationship between independent variables (predictors) and dependent variable. The dependent variable is the status of the drug addict either relapse, (Yes coded as 1) or not, (No coded as 0). Meanwhile the predictors involved are age, age at first taking drug, family history, education level, family crisis, community support and self motivation. The total of the sample is 200 which the data are provided by AADK (National Antidrug Agency). The finding of the study revealed that age and self motivation are statistically significant towards the relapse cases..
Battaglin, W.A.
1996-01-01
Agricultural chemicals (herbicides, insecticides, other pesticides and fertilizers) in surface water may constitute a human health risk. Recent research on unregulated rivers in the midwestern USA documents that elevated concentrations of herbicides occur for 1-4 months following application in spring and early summer. In contrast, nitrate concentrations in unregulated rivers are elevated during the fall, winter and spring. Natural and anthropogenic variables of river drainage basins, such as soil permeability, the amount of agricultural chemicals applied or percentage of land planted in corn, affect agricultural chemical concentrations in rivers. Logistic regression (LGR) models are used to investigate relations between various drainage basin variables and the concentration of selected agricultural chemicals in rivers. The method is successful in contributing to the understanding of agricultural chemical concentration in rivers. Overall accuracies of the best LGR models, defined as the number of correct classifications divided by the number of attempted classifications, averaged about 66%.
Novikov, I; Fund, N; Freedman, L S
2010-01-15
Different methods for the calculation of sample size for simple logistic regression (LR) with one normally distributed continuous covariate give different results. Sometimes the difference can be large. Furthermore, some methods require the user to specify the prevalence of cases when the covariate equals its population mean, rather than the more natural population prevalence. We focus on two commonly used methods and show through simulations that the power for a given sample size may differ substantially from the nominal value for one method, especially when the covariate effect is large, while the other method performs poorly if the user provides the population prevalence instead of the required parameter. We propose a modification of the method of Hsieh et al. that requires specification of the population prevalence and that employs Schouten's sample size formula for a t-test with unequal variances and group sizes. This approach appears to increase the accuracy of the sample size estimates for LR with one continuous covariate.
NASA Astrophysics Data System (ADS)
Chan, H. C.; Chang, C. C.; Laio, P. Y.
2012-04-01
Landslide susceptibility analysis usually combines several factors, including the terrain, geology, and hydrology. The analysis tries to find a suitable combination of these factors in order to establish a landslide susceptibility model and calculate the susceptibility value. A potential landslide map can be established by using the calculated the susceptibility value of landslide. This study took Alishan area as an example and aimed to assess landslide susceptibility analysis by Logistic regression, a multivariate analysis method. In order to select the factors efficiently, the calibration and selection procedure were performed. The results were verified by a previous typhoon event. The classification error matrix was used to evaluate the accuracy of landslide predicted by the present model. Finally, this study applied 10-, 25-, 50-, and 100-year return periods precipitation to estimate the susceptibility values for the study area. The landslide susceptibilities were separated into four levels, including high, medium-high, medium, and low, to delineate the map of potential landslide.
Spackman, K. A.
1991-01-01
This paper presents maximum likelihood back-propagation (ML-BP), an approach to training neural networks. The widely reported original approach uses least squares back-propagation (LS-BP), minimizing the sum of squared errors (SSE). Unfortunately, least squares estimation does not give a maximum likelihood (ML) estimate of the weights in the network. Logistic regression, on the other hand, gives ML estimates for single layer linear models only. This report describes how to obtain ML estimates of the weights in a multi-layer model, and compares LS-BP to ML-BP using several examples. It shows that in many neural networks, least squares estimation gives inferior results and should be abandoned in favor of maximum likelihood estimation. Questions remain about the potential uses of multi-level connectionist models in such areas as diagnostic systems and risk-stratification in outcomes research. PMID:1807606
Logistic regression analysis of pedestrian casualty risk in passenger vehicle collisions in China.
Kong, Chunyu; Yang, Jikuang
2010-07-01
A large number of pedestrian fatalities were reported in China since the 1990s, however the exposure of pedestrians in public traffic has never been measured quantitatively using in-depth accident data. This study aimed to investigate the association between the impact speed and risk of pedestrian casualties in passenger vehicle collisions based on real-world accident cases in China. The cases were selected from a database of in-depth investigation of vehicle accidents in Changsha-IVAC. The sampling criteria were defined as (1) the accident was a frontal impact that occurred between 2003 and 2009; (2) the pedestrian age was above 14; (3) the injury according to the Abbreviated Injury Scale (AIS) was 1+; (4) the accident involved passenger cars, SUVs, or MPVs; and (5) the vehicle impact speed can be determined. The selected IVAC data set, which included 104 pedestrian accident cases, was weighted based on the national traffic accident data. The logistical regression models of the risks for pedestrian fatalities and AIS 3+ injuries were developed in terms of vehicle impact speed using the unweighted and weighted data sets. A multiple logistic regression model on the risk of pedestrian AIS 3+ injury was developed considering the age and impact speed as two variables. It was found that the risk of pedestrian fatality is 26% at 50 km/h, 50% at 58 km/h, and 82% at 70 km/h. At an impact speed of 80 km/h, the pedestrian rarely survives. The weighted risk curves indicated that the risks of pedestrian fatality and injury in China were higher than that in other high-income countries, whereas the risks of pedestrian casualty was lower than in these countries 30 years ago. The findings could have a contribution to better understanding of the exposures of pedestrians in urban traffic in China, and provide background knowledge for the development of strategies for pedestrian protection.
Modeling Group Size and Scalar Stress by Logistic Regression from an Archaeological Perspective
Alberti, Gianmarco
2014-01-01
Johnson’s scalar stress theory, describing the mechanics of (and the remedies to) the increase in in-group conflictuality that parallels the increase in groups’ size, provides scholars with a useful theoretical framework for the understanding of different aspects of the material culture of past communities (i.e., social organization, communal food consumption, ceramic style, architecture and settlement layout). Due to its relevance in archaeology and anthropology, the article aims at proposing a predictive model of critical level of scalar stress on the basis of community size. Drawing upon Johnson’s theory and on Dunbar’s findings on the cognitive constrains to human group size, a model is built by means of Logistic Regression on the basis of the data on colony fissioning among the Hutterites of North America. On the grounds of the theoretical framework sketched in the first part of the article, the absence or presence of colony fissioning is considered expression of not critical vs. critical level of scalar stress for the sake of the model building. The model, which is also tested against a sample of archaeological and ethnographic cases: a) confirms the existence of a significant relationship between critical scalar stress and group size, setting the issue on firmer statistical grounds; b) allows calculating the intercept and slope of the logistic regression model, which can be used in any time to estimate the probability that a community experienced a critical level of scalar stress; c) allows locating a critical scalar stress threshold at community size 127 (95% CI: 122–132), while the maximum probability of critical scale stress is predicted at size 158 (95% CI: 147–170). The model ultimately provides grounds to assess, for the sake of any further archaeological/anthropological interpretation, the probability that a group reached a hot spot of size development critical for its internal cohesion. PMID:24626241
Optimization of Game Formats in U-10 Soccer Using Logistic Regression Analysis
Amatria, Mario; Arana, Javier; Anguera, M. Teresa; Garzón, Belén
2016-01-01
Abstract Small-sided games provide young soccer players with better opportunities to develop their skills and progress as individual and team players. There is, however, little evidence on the effectiveness of different game formats in different age groups, and furthermore, these formats can vary between and even within countries. The Royal Spanish Soccer Association replaced the traditional grassroots 7-a-side format (F-7) with the 8-a-side format (F-8) in the 2011-12 season and the country’s regional federations gradually followed suit. The aim of this observational methodology study was to investigate which of these formats best suited the learning needs of U-10 players transitioning from 5-aside futsal. We built a multiple logistic regression model to predict the success of offensive moves depending on the game format and the area of the pitch in which the move was initiated. Success was defined as a shot at the goal. We also built two simple logistic regression models to evaluate how the game format influenced the acquisition of technicaltactical skills. It was found that the probability of a shot at the goal was higher in F-7 than in F-8 for moves initiated in the Creation Sector-Own Half (0.08 vs 0.07) and the Creation Sector-Opponent's Half (0.18 vs 0.16). The probability was the same (0.04) in the Safety Sector. Children also had more opportunities to control the ball and pass or take a shot in the F-7 format (0.24 vs 0.20), and these were also more likely to be successful in this format (0.28 vs 0.19). PMID:28031768
Modeling group size and scalar stress by logistic regression from an archaeological perspective.
Alberti, Gianmarco
2014-01-01
Johnson's scalar stress theory, describing the mechanics of (and the remedies to) the increase in in-group conflictuality that parallels the increase in groups' size, provides scholars with a useful theoretical framework for the understanding of different aspects of the material culture of past communities (i.e., social organization, communal food consumption, ceramic style, architecture and settlement layout). Due to its relevance in archaeology and anthropology, the article aims at proposing a predictive model of critical level of scalar stress on the basis of community size. Drawing upon Johnson's theory and on Dunbar's findings on the cognitive constrains to human group size, a model is built by means of Logistic Regression on the basis of the data on colony fissioning among the Hutterites of North America. On the grounds of the theoretical framework sketched in the first part of the article, the absence or presence of colony fissioning is considered expression of not critical vs. critical level of scalar stress for the sake of the model building. The model, which is also tested against a sample of archaeological and ethnographic cases: a) confirms the existence of a significant relationship between critical scalar stress and group size, setting the issue on firmer statistical grounds; b) allows calculating the intercept and slope of the logistic regression model, which can be used in any time to estimate the probability that a community experienced a critical level of scalar stress; c) allows locating a critical scalar stress threshold at community size 127 (95% CI: 122-132), while the maximum probability of critical scale stress is predicted at size 158 (95% CI: 147-170). The model ultimately provides grounds to assess, for the sake of any further archaeological/anthropological interpretation, the probability that a group reached a hot spot of size development critical for its internal cohesion.
1991-03-01
Adjusted Estimators for Variance 1Redutilol in Computer Simutlation by Riichiardl L. R’ r March, 1991 D~issertation Advisor: Peter A.W. Lewis Approved for...OF NONLINEAR CONTROLS AND REGRESSION-ADJUSTED ESTIMATORS FOR VARIANCE REDUCTION IN COMPUTER SIMULATION 12. Personal Author(s) Richard L. Ressler 13a...necessary and identify by block number) This dissertation develops new techniques for variance reduction in computer simulation. It demonstrates that
Genomic-Enabled Prediction of Ordinal Data with Bayesian Logistic Ordinal Regression
Montesinos-López, Osval A.; Montesinos-López, Abelardo; Crossa, José; Burgueño, Juan; Eskridge, Kent
2015-01-01
Most genomic-enabled prediction models developed so far assume that the response variable is continuous and normally distributed. The exception is the probit model, developed for ordered categorical phenotypes. In statistical applications, because of the easy implementation of the Bayesian probit ordinal regression (BPOR) model, Bayesian logistic ordinal regression (BLOR) is implemented rarely in the context of genomic-enabled prediction [sample size (n) is much smaller than the number of parameters (p)]. For this reason, in this paper we propose a BLOR model using the Pólya-Gamma data augmentation approach that produces a Gibbs sampler with similar full conditional distributions of the BPOR model and with the advantage that the BPOR model is a particular case of the BLOR model. We evaluated the proposed model by using simulation and two real data sets. Results indicate that our BLOR model is a good alternative for analyzing ordinal data in the context of genomic-enabled prediction with the probit or logit link. PMID:26290569
Lin, Wan-Yu; Schaid, Daniel J
2009-04-01
Recently, a genomic distance-based regression for multilocus associations was proposed (Wessel and Schork [2006] Am. J. Hum. Genet. 79:792-806) in which either locus or haplotype scoring can be used to measure genetic distance. Although it allows various measures of genomic similarity and simultaneous analyses of multiple phenotypes, its power relative to other methods for case-control analyses is not well known. We compare the power of traditional methods with this new distance-based approach, for both locus-scoring and haplotype-scoring strategies. We discuss the relative power of these association methods with respect to five properties: (1) the marker informativity; (2) the number of markers; (3) the causal allele frequency; (4) the preponderance of the most common high-risk haplotype; (5) the correlation between the causal single-nucleotide polymorphism (SNP) and its flanking markers. We found that locus-based logistic regression and the global score test for haplotypes suffered from power loss when many markers were included in the analyses, due to many degrees of freedom. In contrast, the distance-based approach was not as vulnerable to more markers or more haplotypes. A genotype counting measure was more sensitive to the marker informativity and the correlation between the causal SNP and its flanking markers. After examining the impact of the five properties on power, we found that on average, the genomic distance-based regression that uses a matching measure for diplotypes was the most powerful and robust method among the seven methods we compared.
Comparison of the Properties of Regression and Categorical Risk-Adjustment Models
Averill, Richard F.; Muldoon, John H.; Hughes, John S.
2016-01-01
Clinical risk-adjustment, the ability to standardize the comparison of individuals with different health needs, is based upon 2 main alternative approaches: regression models and clinical categorical models. In this article, we examine the impact of the differences in the way these models are constructed on end user applications. PMID:26945302
ERIC Educational Resources Information Center
Olejnik, Stephen; Mills, Jamie; Keselman, Harvey
2000-01-01
Evaluated the use of Mallow's C(p) and Wherry's adjusted R squared (R. Wherry, 1931) statistics to select a final model from a pool of model solutions using computer generated data. Neither statistic identified the underlying regression model any better than, and usually less well than, the stepwise selection method, which itself was poor for…
Franklin, Jessica M; Eddings, Wesley; Glynn, Robert J; Schneeweiss, Sebastian
2015-10-01
Selection and measurement of confounders is critical for successful adjustment in nonrandomized studies. Although the principles behind confounder selection are now well established, variable selection for confounder adjustment remains a difficult problem in practice, particularly in secondary analyses of databases. We present a simulation study that compares the high-dimensional propensity score algorithm for variable selection with approaches that utilize direct adjustment for all potential confounders via regularized regression, including ridge regression and lasso regression. Simulations were based on 2 previously published pharmacoepidemiologic cohorts and used the plasmode simulation framework to create realistic simulated data sets with thousands of potential confounders. Performance of methods was evaluated with respect to bias and mean squared error of the estimated effects of a binary treatment. Simulation scenarios varied the true underlying outcome model, treatment effect, prevalence of exposure and outcome, and presence of unmeasured confounding. Across scenarios, high-dimensional propensity score approaches generally performed better than regularized regression approaches. However, including the variables selected by lasso regression in a regular propensity score model also performed well and may provide a promising alternative variable selection method.
Huang, Hai-Hui; Liu, Xiao-Ying; Liang, Yong
2016-01-01
Cancer classification and feature (gene) selection plays an important role in knowledge discovery in genomic data. Although logistic regression is one of the most popular classification methods, it does not induce feature selection. In this paper, we presented a new hybrid L1/2 +2 regularization (HLR) function, a linear combination of L1/2 and L2 penalties, to select the relevant gene in the logistic regression. The HLR approach inherits some fascinating characteristics from L1/2 (sparsity) and L2 (grouping effect where highly correlated variables are in or out a model together) penalties. We also proposed a novel univariate HLR thresholding approach to update the estimated coefficients and developed the coordinate descent algorithm for the HLR penalized logistic regression model. The empirical results and simulations indicate that the proposed method is highly competitive amongst several state-of-the-art methods. PMID:27136190
Multiple logistic regression model of signalling practices of drivers on urban highways
NASA Astrophysics Data System (ADS)
Puan, Othman Che; Ibrahim, Muttaka Na'iya; Zakaria, Rozana
2015-05-01
Giving signal is a way of informing other road users, especially to the conflicting drivers, the intention of a driver to change his/her movement course. Other users are exposed to hazard situation and risks of accident if the driver who changes his/her course failed to give signal as required. This paper describes the application of logistic regression model for the analysis of driver's signalling practices on multilane highways based on possible factors affecting driver's decision such as driver's gender, vehicle's type, vehicle's speed and traffic flow intensity. Data pertaining to the analysis of such factors were collected manually. More than 2000 drivers who have performed a lane changing manoeuvre while driving on two sections of multilane highways were observed. Finding from the study shows that relatively a large proportion of drivers failed to give any signals when changing lane. The result of the analysis indicates that although the proportion of the drivers who failed to provide signal prior to lane changing manoeuvre is high, the degree of compliances of the female drivers is better than the male drivers. A binary logistic model was developed to represent the probability of a driver to provide signal indication prior to lane changing manoeuvre. The model indicates that driver's gender, type of vehicle's driven, speed of vehicle and traffic volume influence the driver's decision to provide a signal indication prior to a lane changing manoeuvre on a multilane urban highway. In terms of types of vehicles driven, about 97% of motorcyclists failed to comply with the signal indication requirement. The proportion of non-compliance drivers under stable traffic flow conditions is much higher than when the flow is relatively heavy. This is consistent with the data which indicates a high degree of non-compliances when the average speed of the traffic stream is relatively high.
NASA Astrophysics Data System (ADS)
Heckmann, Tobias; Gegg, Katharina; Becht, Michael
2013-04-01
Statistical approaches to landslide susceptibility modelling on the catchment and regional scale are used very frequently compared to heuristic and physically based approaches. In the present study, we deal with the problem of the optimal sample size for a logistic regression model. More specifically, a stepwise approach has been chosen in order to select those independent variables (from a number of derivatives of a digital elevation model and landcover data) that explain best the spatial distribution of debris flow initiation zones in two neighbouring central alpine catchments in Austria (used mutually for model calculation and validation). In order to minimise problems arising from spatial autocorrelation, we sample a single raster cell from each debris flow initiation zone within an inventory. In addition, as suggested by previous work using the "rare events logistic regression" approach, we take a sample of the remaining "non-event" raster cells. The recommendations given in the literature on the size of this sample appear to be motivated by practical considerations, e.g. the time and cost of acquiring data for non-event cases, which do not apply to the case of spatial data. In our study, we aim at finding empirically an "optimal" sample size in order to avoid two problems: First, a sample too large will violate the independent sample assumption as the independent variables are spatially autocorrelated; hence, a variogram analysis leads to a sample size threshold above which the average distance between sampled cells falls below the autocorrelation range of the independent variables. Second, if the sample is too small, repeated sampling will lead to very different results, i.e. the independent variables and hence the result of a single model calculation will be extremely dependent on the choice of non-event cells. Using a Monte-Carlo analysis with stepwise logistic regression, 1000 models are calculated for a wide range of sample sizes. For each sample size
A logistic regression based approach for the prediction of flood warning threshold exceedance
NASA Astrophysics Data System (ADS)
Diomede, Tommaso; Trotter, Luca; Stefania Tesini, Maria; Marsigli, Chiara
2016-04-01
A method based on logistic regression is proposed for the prediction of river level threshold exceedance at short (+0-18h) and medium (+18-42h) lead times. The aim of the study is to provide a valuable tool for the issue of warnings by the authority responsible of public safety in case of flood. The role of different precipitation periods as predictors for the exceedance of a fixed river level has been investigated, in order to derive significant information for flood forecasting. Based on catchment-averaged values, a separation of "antecedent" and "peak-triggering" rainfall amounts as independent variables is attempted. In particular, the following flood-related precipitation periods have been considered: (i) the period from 1 to n days before the forecast issue time, which may be relevant for the soil saturation, (ii) the last 24 hours, which may be relevant for the current water level in the river, and (iii) the period from 0 to x hours in advance with respect to the forecast issue time, when the flood-triggering precipitation generally occurs. Several combinations and values of these predictors have been tested to optimise the method implementation. In particular, the period for the precursor antecedent precipitation ranges between 5 and 45 days; the state of the river can be represented by the last 24-h precipitation or, as alternative, by the current river level. The flood-triggering precipitation has been cumulated over the next 18 hours (for the short lead time) and 36-42 hours (for the medium lead time). The proposed approach requires a specific implementation of logistic regression for each river section and warning threshold. The method performance has been evaluated over the Santerno river catchment (about 450 km2) in the Emilia-Romagna Region, northern Italy. A statistical analysis in terms of false alarms, misses and related scores was carried out by using a 8-year long database. The results are quite satisfactory, with slightly better performances
Adjusting for Cell Type Composition in DNA Methylation Data Using a Regression-Based Approach.
Jones, Meaghan J; Islam, Sumaiya A; Edgar, Rachel D; Kobor, Michael S
2017-01-01
Analysis of DNA methylation in a population context has the potential to uncover novel gene and environment interactions as well as markers of health and disease. In order to find such associations it is important to control for factors which may mask or alter DNA methylation signatures. Since tissue of origin and coinciding cell type composition are major contributors to DNA methylation patterns, and can easily confound important findings, it is vital to adjust DNA methylation data for such differences across individuals. Here we describe the use of a regression method to adjust for cell type composition in DNA methylation data. We specifically discuss what information is required to adjust for cell type composition and then provide detailed instructions on how to perform cell type adjustment on high dimensional DNA methylation data. This method has been applied mainly to Illumina 450K data, but can also be adapted to pyrosequencing or genome-wide bisulfite sequencing data.
Sze, N N; Wong, S C; Lee, C Y
2014-12-01
In past several decades, many countries have set quantified road safety targets to motivate transport authorities to develop systematic road safety strategies and measures and facilitate the achievement of continuous road safety improvement. Studies have been conducted to evaluate the association between the setting of quantified road safety targets and road fatality reduction, in both the short and long run, by comparing road fatalities before and after the implementation of a quantified road safety target. However, not much work has been done to evaluate whether the quantified road safety targets are actually achieved. In this study, we used a binary logistic regression model to examine the factors - including vehicle ownership, fatality rate, and national income, in addition to level of ambition and duration of target - that contribute to a target's success. We analyzed 55 quantified road safety targets set by 29 countries from 1981 to 2009, and the results indicate that targets that are in progress and with lower level of ambitions had a higher likelihood of eventually being achieved. Moreover, possible interaction effects on the association between level of ambition and the likelihood of success are also revealed.
Erickson, Wallace P.; Nick, Todd G.; Ward, David H.; Peck, Roxy; Haugh, Larry D.; Goodman, Arnold
1998-01-01
Izembek Lagoon, an estuary in Alaska, is a very important staging area for Pacific brant, a small migratory goose. Each fall, nearly the entire Pacific Flyway population of 130,000 brant flies to Izembek Lagoon and feeds on eelgrass to accumulate fat reserves for nonstop transoceanic migration to wintering areas as distant as Mexico. In the past 10 years, offshore drilling activities in this area have increased, and, as a result, the air traffic in and out of the nearby Cold Bay airport has also increased. There has been a concern that this increased air traffic could affect the brant by disturbing them from their feeding and resting activities, which in turn could result in reduced energy intake and buildup. This may increase the mortality rates during their migratory journey. Because of these concerns, a study was conducted to investigate the flight response of brant to overflights of large helicopters. Response was measured on flocks during experimental overflights of large helicopters flown at varying altitudes and lateral (perpendicular) distances from the flocks. Logistic regression models were developed for predicting probability of flight response as a function of these distance variables. Results of this study may be used in the development of new FAA guidelines for aircraft near Izembek Lagoon.
Greenland, Sander; Mansournia, Mohammad Ali
2015-10-15
Penalization is a very general method of stabilizing or regularizing estimates, which has both frequentist and Bayesian rationales. We consider some questions that arise when considering alternative penalties for logistic regression and related models. The most widely programmed penalty appears to be the Firth small-sample bias-reduction method (albeit with small differences among implementations and the results they provide), which corresponds to using the log density of the Jeffreys invariant prior distribution as a penalty function. The latter representation raises some serious contextual objections to the Firth reduction, which also apply to alternative penalties based on t-distributions (including Cauchy priors). Taking simplicity of implementation and interpretation as our chief criteria, we propose that the log-F(1,1) prior provides a better default penalty than other proposals. Penalization based on more general log-F priors is trivial to implement and facilitates mean-squared error reduction and sensitivity analyses of penalty strength by varying the number of prior degrees of freedom. We caution however against penalization of intercepts, which are unduly sensitive to covariate coding and design idiosyncrasies.
Snedden, Gregg A.; Steyer, Gregory D.
2013-01-01
Understanding plant community zonation along estuarine stress gradients is critical for effective conservation and restoration of coastal wetland ecosystems. We related the presence of plant community types to estuarine hydrology at 173 sites across coastal Louisiana. Percent relative cover by species was assessed at each site near the end of the growing season in 2008, and hourly water level and salinity were recorded at each site Oct 2007–Sep 2008. Nine plant community types were delineated with k-means clustering, and indicator species were identified for each of the community types with indicator species analysis. An inverse relation between salinity and species diversity was observed. Canonical correspondence analysis (CCA) effectively segregated the sites across ordination space by community type, and indicated that salinity and tidal amplitude were both important drivers of vegetation composition. Multinomial logistic regression (MLR) and Akaike's Information Criterion (AIC) were used to predict the probability of occurrence of the nine vegetation communities as a function of salinity and tidal amplitude, and probability surfaces obtained from the MLR model corroborated the CCA results. The weighted kappa statistic, calculated from the confusion matrix of predicted versus actual community types, was 0.7 and indicated good agreement between observed community types and model predictions. Our results suggest that models based on a few key hydrologic variables can be valuable tools for predicting vegetation community development when restoring and managing coastal wetlands.
Predicting the "graduate on time (GOT)" of PhD students using binary logistics regression model
NASA Astrophysics Data System (ADS)
Shariff, S. Sarifah Radiah; Rodzi, Nur Atiqah Mohd; Rahman, Kahartini Abdul; Zahari, Siti Meriam; Deni, Sayang Mohd
2016-10-01
Malaysian government has recently set a new goal to produce 60,000 Malaysian PhD holders by the year 2023. As a Malaysia's largest institution of higher learning in terms of size and population which offers more than 500 academic programmes in a conducive and vibrant environment, UiTM has taken several initiatives to fill up the gap. Strategies to increase the numbers of graduates with PhD are a process that is challenging. In many occasions, many have already identified that the struggle to get into the target set is even more daunting, and that implementation is far too ideal. This has further being progressing slowly as the attrition rate increases. This study aims to apply the proposed models that incorporates several factors in predicting the number PhD students that will complete their PhD studies on time. Binary Logistic Regression model is proposed and used on the set of data to determine the number. The results show that only 6.8% of the 2014 PhD students are predicted to graduate on time and the results are compared wih the actual number for validation purpose.
CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model
Wang, Liguo; Park, Hyun Jung; Dasari, Surendra; Wang, Shengqin; Kocher, Jean-Pierre; Li, Wei
2013-01-01
Thousands of novel transcripts have been identified using deep transcriptome sequencing. This discovery of large and ‘hidden’ transcriptome rejuvenates the demand for methods that can rapidly distinguish between coding and noncoding RNA. Here, we present a novel alignment-free method, Coding Potential Assessment Tool (CPAT), which rapidly recognizes coding and noncoding transcripts from a large pool of candidates. To this end, CPAT uses a logistic regression model built with four sequence features: open reading frame size, open reading frame coverage, Fickett TESTCODE statistic and hexamer usage bias. CPAT software outperformed (sensitivity: 0.96, specificity: 0.97) other state-of-the-art alignment-based software such as Coding-Potential Calculator (sensitivity: 0.99, specificity: 0.74) and Phylo Codon Substitution Frequencies (sensitivity: 0.90, specificity: 0.63). In addition to high accuracy, CPAT is approximately four orders of magnitude faster than Coding-Potential Calculator and Phylo Codon Substitution Frequencies, enabling its users to process thousands of transcripts within seconds. The software accepts input sequences in either FASTA- or BED-formatted data files. We also developed a web interface for CPAT that allows users to submit sequences and receive the prediction results almost instantly. PMID:23335781
Li, Wentian; Sun, Fengzhu; Grosse, Ivo
2004-01-01
One important issue commonly encountered in the analysis of microarray data is to decide which and how many genes should be selected for further studies. For discriminant microarray data analyses based on statistical models, such as the logistic regression models, gene selection can be accomplished by a comparison of the maximum likelihood of the model given the real data, L(D|M), and the expected maximum likelihood of the model given an ensemble of surrogate data with randomly permuted label, L(D(0)|M). Typically, the computational burden for obtaining L(D(0)M) is immense, often exceeding the limits of available computing resources by orders of magnitude. Here, we propose an approach that circumvents such heavy computations by mapping the simulation problem to an extreme-value problem. We present the derivation of an asymptotic distribution of the extreme-value as well as its mean, median, and variance. Using this distribution, we propose two gene selection criteria, and we apply them to two microarray datasets and three classification tasks for illustration.
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.
2003-01-01
Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity
Procedures for adjusting regional regression models of urban-runoff quality using local data
Hoos, A.B.; Sisolak, J.K.
1993-01-01
Statistical operations termed model-adjustment procedures (MAP?s) can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each MAP is a form of regression analysis in which the local data base is used as a calibration data set. Regression coefficients are determined from the local data base, and the resulting `adjusted? regression models can then be used to predict storm-runoff quality at unmonitored sites. The response variable in the regression analyses is the observed load or mean concentration of a constituent in storm runoff for a single storm. The set of explanatory variables used in the regression analyses is different for each MAP, but always includes the predicted value of load or mean concentration from a regional regression model. The four MAP?s examined in this study were: single-factor regression against the regional model prediction, P, (termed MAP-lF-P), regression against P,, (termed MAP-R-P), regression against P, and additional local variables (termed MAP-R-P+nV), and a weighted combination of P, and a local-regression prediction (termed MAP-W). The procedures were tested by means of split-sample analysis, using data from three cities included in the Nationwide Urban Runoff Program: Denver, Colorado; Bellevue, Washington; and Knoxville, Tennessee. The MAP that provided the greatest predictive accuracy for the verification data set differed among the three test data bases and among model types (MAP-W for Denver and Knoxville, MAP-lF-P and MAP-R-P for Bellevue load models, and MAP-R-P+nV for Bellevue concentration models) and, in many cases, was not clearly indicated by the values of standard error of estimate for the calibration data set. A scheme to guide MAP selection, based on exploratory data analysis of the calibration data set, is presented and tested. The MAP?s were tested for sensitivity to the size of a calibration data set. As expected, predictive accuracy of all MAP?s for
An evaluation of bias in propensity score-adjusted non-linear regression models.
Wan, Fei; Mitra, Nandita
2016-04-19
Propensity score methods are commonly used to adjust for observed confounding when estimating the conditional treatment effect in observational studies. One popular method, covariate adjustment of the propensity score in a regression model, has been empirically shown to be biased in non-linear models. However, no compelling underlying theoretical reason has been presented. We propose a new framework to investigate bias and consistency of propensity score-adjusted treatment effects in non-linear models that uses a simple geometric approach to forge a link between the consistency of the propensity score estimator and the collapsibility of non-linear models. Under this framework, we demonstrate that adjustment of the propensity score in an outcome model results in the decomposition of observed covariates into the propensity score and a remainder term. Omission of this remainder term from a non-collapsible regression model leads to biased estimates of the conditional odds ratio and conditional hazard ratio, but not for the conditional rate ratio. We further show, via simulation studies, that the bias in these propensity score-adjusted estimators increases with larger treatment effect size, larger covariate effects, and increasing dissimilarity between the coefficients of the covariates in the treatment model versus the outcome model.
ERIC Educational Resources Information Center
Gordovil-Merino, Amalia; Guardia-Olmos, Joan; Pero-Cebollero, Maribel
2012-01-01
In this paper, we used simulations to compare the performance of classical and Bayesian estimations in logistic regression models using small samples. In the performed simulations, conditions were varied, including the type of relationship between independent and dependent variable values (i.e., unrelated and related values), the type of variable…
ERIC Educational Resources Information Center
Le, Huy; Marcus, Justin
2012-01-01
This study used Monte Carlo simulation to examine the properties of the overall odds ratio (OOR), which was recently introduced as an index for overall effect size in multiple logistic regression. It was found that the OOR was relatively independent of study base rate and performed better than most commonly used R-square analogs in indexing model…
ERIC Educational Resources Information Center
Guler, Nese; Penfield, Randall D.
2009-01-01
In this study, we investigate the logistic regression (LR), Mantel-Haenszel (MH), and Breslow-Day (BD) procedures for the simultaneous detection of both uniform and nonuniform differential item functioning (DIF). A simulation study was used to assess and compare the Type I error rate and power of a combined decision rule (CDR), which assesses DIF…
NASA Astrophysics Data System (ADS)
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam
2015-10-01
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.
ERIC Educational Resources Information Center
Hidalgo, M. Dolores; Lopez-Pina, Jose Antonio
2004-01-01
This article compares several procedures in their efficacy for detecting differential item functioning (DIF): logistic regression analysis, the Mantel-Haenszel (MH) procedure, and the modified Mantel-Haenszel procedure by Mazor, Clauser, and Hambleton. It also compares the effect size measures that these procedures provide. In this study,…
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam
2015-10-22
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.
ERIC Educational Resources Information Center
Taylor, Aaron B.; West, Stephen G.; Aiken, Leona S.
2006-01-01
Variables that have been coarsely categorized into a small number of ordered categories are often modeled as outcome variables in psychological research. The authors employ a Monte Carlo study to investigate the effects of this coarse categorization of dependent variables on power to detect true effects using three classes of regression models:…
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.; Michael, John A.; Helsel, Dennis R.
2008-01-01
Logistic regression was used to develop statistical models that can be used to predict the probability of debris flows in areas recently burned by wildfires by using data from 14 wildfires that burned in southern California during 2003-2006. Twenty-eight independent variables describing the basin morphology, burn severity, rainfall, and soil properties of 306 drainage basins located within those burned areas were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows soon after the 2003 to 2006 fires were delineated from data in the National Elevation Dataset using a geographic information system; (2) Data describing the basin morphology, burn severity, rainfall, and soil properties were compiled for each basin. These data were then input to a statistics software package for analysis using logistic regression; and (3) Relations between the occurrence or absence of debris flows and the basin morphology, burn severity, rainfall, and soil properties were evaluated, and five multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combinations produced the most effective models, and the multivariate models that best predicted the occurrence of debris flows were identified. Percentage of high burn severity and 3-hour peak rainfall intensity were significant variables in all models. Soil organic matter content and soil clay content were significant variables in all models except Model 5. Soil slope was a significant variable in all models except Model 4. The most suitable model can be selected from these five models on the basis of the availability of independent variables in the particular area of interest and field checking of probability maps. The multivariate logistic regression models can be entered into a geographic information system, and maps showing the probability of debris flows can be constructed in recently burned areas of
Predictors of psychiatric disorders in liver transplantation candidates: logistic regression models.
Rocca, Paola; Cocuzza, Elena; Rasetti, Roberta; Rocca, Giuseppe; Zanalda, Enrico; Bogetto, Filippo
2003-07-01
This study has two goals. The first goal is to assess the prevalence of psychiatric disorders in orthotopic liver transplantation (OLT) candidates by means of standardized procedures because there has been little research concerning psychiatric problems of potential OLT candidates using standardized instruments. The second goal focuses on identifying predictors of these psychiatric disorders. One hundred sixty-five elective OLT candidates were assessed by our unit. Psychiatric diagnoses were based on the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition. Patients also were assessed using the Hamilton Depression Rating Scale (HDRS) and the Spielberger Anxiety Index, State and Trait forms (STAI-X1 and STAI-X2). Severity of cirrhosis was assessed by applying Child-Pugh score criteria. Chi-squared and general linear model analysis of variance were used to test the univariate association between patient characteristics and both clinical psychiatric diagnoses and severity of psychiatric diseases. Variables with P less than.10 in univariate analyses were included in multiple regression models. Forty-three percent of patients presented at least one psychiatric diagnosis. Child-Pugh score and previous psychiatric diagnoses were independent significant predictors of depressive disorders. Severity of psychiatric symptoms measured by psychometric scales (HDRS, STAI-X1, and STAI-X2) was associated with Child-Pugh score in the multiple regression model. Our data suggest a high rate of psychiatric disorders, particularly adjustment disorders, in our sample of OLT candidates. Severity of liver disease emerges as the most important variable in predicting severity of psychiatric disorders in these patients.
Bent, Gardner C.; Archfield, Stacey A.
2002-01-01
A logistic regression equation was developed for estimating the probability of a stream flowing perennially at a specific site in Massachusetts. The equation provides city and town conservation commissions and the Massachusetts Department of Environmental Protection with an additional method for assessing whether streams are perennial or intermittent at a specific site in Massachusetts. This information is needed to assist these environmental agencies, who administer the Commonwealth of Massachusetts Rivers Protection Act of 1996, which establishes a 200-foot-wide protected riverfront area extending along the length of each side of the stream from the mean annual high-water line along each side of perennial streams, with exceptions in some urban areas. The equation was developed by relating the verified perennial or intermittent status of a stream site to selected basin characteristics of naturally flowing streams (no regulation by dams, surface-water withdrawals, ground-water withdrawals, diversion, waste-water discharge, and so forth) in Massachusetts. Stream sites used in the analysis were identified as perennial or intermittent on the basis of review of measured streamflow at sites throughout Massachusetts and on visual observation at sites in the South Coastal Basin, southeastern Massachusetts. Measured or observed zero flow(s) during months of extended drought as defined by the 310 Code of Massachusetts Regulations (CMR) 10.58(2)(a) were not considered when designating the perennial or intermittent status of a stream site. The database used to develop the equation included a total of 305 stream sites (84 intermittent- and 89 perennial-stream sites in the State, and 50 intermittent- and 82 perennial-stream sites in the South Coastal Basin). Stream sites included in the database had drainage areas that ranged from 0.14 to 8.94 square miles in the State and from 0.02 to 7.00 square miles in the South Coastal Basin.Results of the logistic regression analysis
Buchner, Florian; Wasem, Jürgen; Schillo, Sonja
2017-01-01
Risk equalization formulas have been refined since their introduction about two decades ago. Because of the complexity and the abundance of possible interactions between the variables used, hardly any interactions are considered. A regression tree is used to systematically search for interactions, a methodologically new approach in risk equalization. Analyses are based on a data set of nearly 2.9 million individuals from a major German social health insurer. A two-step approach is applied: In the first step a regression tree is built on the basis of the learning data set. Terminal nodes characterized by more than one morbidity-group-split represent interaction effects of different morbidity groups. In the second step the 'traditional' weighted least squares regression equation is expanded by adding interaction terms for all interactions detected by the tree, and regression coefficients are recalculated. The resulting risk adjustment formula shows an improvement in the adjusted R(2) from 25.43% to 25.81% on the evaluation data set. Predictive ratios are calculated for subgroups affected by the interactions. The R(2) improvement detected is only marginal. According to the sample level performance measures used, not involving a considerable number of morbidity interactions forms no relevant loss in accuracy. Copyright © 2015 John Wiley & Sons, Ltd.
Using Quantile and Asymmetric Least Squares Regression for Optimal Risk Adjustment.
Lorenz, Normann
2016-06-13
In this paper, we analyze optimal risk adjustment for direct risk selection (DRS). Integrating insurers' activities for risk selection into a discrete choice model of individuals' health insurance choice shows that DRS has the structure of a contest. For the contest success function (csf) used in most of the contest literature (the Tullock-csf), optimal transfers for a risk adjustment scheme have to be determined by means of a restricted quantile regression, irrespective of whether insurers are primarily engaged in positive DRS (attracting low risks) or negative DRS (repelling high risks). This is at odds with the common practice of determining transfers by means of a least squares regression. However, this common practice can be rationalized for a new csf, but only if positive and negative DRSs are equally important; if they are not, optimal transfers have to be calculated by means of a restricted asymmetric least squares regression. Using data from German and Swiss health insurers, we find considerable differences between the three types of regressions. Optimal transfers therefore critically depend on which csf represents insurers' incentives for DRS and, if it is not the Tullock-csf, whether insurers are primarily engaged in positive or negative DRS. Copyright © 2016 John Wiley & Sons, Ltd.
LOGISTIC NETWORK REGRESSION FOR SCALABLE ANALYSIS OF NETWORKS WITH JOINT EDGE/VERTEX DYNAMICS.
Almquist, Zack W; Butts, Carter T
2014-08-01
Change in group size and composition has long been an important area of research in the social sciences. Similarly, interest in interaction dynamics has a long history in sociology and social psychology. However, the effects of endogenous group change on interaction dynamics are a surprisingly understudied area. One way to explore these relationships is through social network models. Network dynamics may be viewed as a process of change in the edge structure of a network, in the vertex set on which edges are defined, or in both simultaneously. Although early studies of such processes were primarily descriptive, recent work on this topic has increasingly turned to formal statistical models. Although showing great promise, many of these modern dynamic models are computationally intensive and scale very poorly in the size of the network under study and/or the number of time points considered. Likewise, currently used models focus on edge dynamics, with little support for endogenously changing vertex sets. Here, the authors show how an existing approach based on logistic network regression can be extended to serve as a highly scalable framework for modeling large networks with dynamic vertex sets. The authors place this approach within a general dynamic exponential family (exponential-family random graph modeling) context, clarifying the assumptions underlying the framework (and providing a clear path for extensions), and they show how model assessment methods for cross-sectional networks can be extended to the dynamic case. Finally, the authors illustrate this approach on a classic data set involving interactions among windsurfers on a California beach.
Logistic regression modeling to assess groundwater vulnerability to contamination in Hawaii, USA.
Mair, Alan; El-Kadi, Aly I
2013-10-01
Capture zone analysis combined with a subjective susceptibility index is currently used in Hawaii to assess vulnerability to contamination of drinking water sources derived from groundwater. In this study, we developed an alternative objective approach that combines well capture zones with multiple-variable logistic regression (LR) modeling and applied it to the highly-utilized Pearl Harbor and Honolulu aquifers on the island of Oahu, Hawaii. Input for the LR models utilized explanatory variables based on hydrogeology, land use, and well geometry/location. A suite of 11 target contaminants detected in the region, including elevated nitrate (>1 mg/L), four chlorinated solvents, four agricultural fumigants, and two pesticides, was used to develop the models. We then tested the ability of the new approach to accurately separate groups of wells with low and high vulnerability, and the suitability of nitrate as an indicator of other types of contamination. Our results produced contaminant-specific LR models that accurately identified groups of wells with the lowest/highest reported detections and the lowest/highest nitrate concentrations. Current and former agricultural land uses were identified as significant explanatory variables for eight of the 11 target contaminants, while elevated nitrate was a significant variable for five contaminants. The utility of the combined approach is contingent on the availability of hydrologic and chemical monitoring data for calibrating groundwater and LR models. Application of the approach using a reference site with sufficient data could help identify key variables in areas with similar hydrogeology and land use but limited data. In addition, elevated nitrate may also be a suitable indicator of groundwater contamination in areas with limited data. The objective LR modeling approach developed in this study is flexible enough to address a wide range of contaminants and represents a suitable addition to the current subjective approach.
Logistic regression modeling to assess groundwater vulnerability to contamination in Hawaii, USA
NASA Astrophysics Data System (ADS)
Mair, Alan; El-Kadi, Aly I.
2013-10-01
Capture zone analysis combined with a subjective susceptibility index is currently used in Hawaii to assess vulnerability to contamination of drinking water sources derived from groundwater. In this study, we developed an alternative objective approach that combines well capture zones with multiple-variable logistic regression (LR) modeling and applied it to the highly-utilized Pearl Harbor and Honolulu aquifers on the island of Oahu, Hawaii. Input for the LR models utilized explanatory variables based on hydrogeology, land use, and well geometry/location. A suite of 11 target contaminants detected in the region, including elevated nitrate (> 1 mg/L), four chlorinated solvents, four agricultural fumigants, and two pesticides, was used to develop the models. We then tested the ability of the new approach to accurately separate groups of wells with low and high vulnerability, and the suitability of nitrate as an indicator of other types of contamination. Our results produced contaminant-specific LR models that accurately identified groups of wells with the lowest/highest reported detections and the lowest/highest nitrate concentrations. Current and former agricultural land uses were identified as significant explanatory variables for eight of the 11 target contaminants, while elevated nitrate was a significant variable for five contaminants. The utility of the combined approach is contingent on the availability of hydrologic and chemical monitoring data for calibrating groundwater and LR models. Application of the approach using a reference site with sufficient data could help identify key variables in areas with similar hydrogeology and land use but limited data. In addition, elevated nitrate may also be a suitable indicator of groundwater contamination in areas with limited data. The objective LR modeling approach developed in this study is flexible enough to address a wide range of contaminants and represents a suitable addition to the current subjective approach.
Bias-reduced and separation-proof conditional logistic regression with small or sparse data sets.
Heinze, Georg; Puhr, Rainer
2010-03-30
Conditional logistic regression is used for the analysis of binary outcomes when subjects are stratified into several subsets, e.g. matched pairs or blocks. Log odds ratio estimates are usually found by maximizing the conditional likelihood. This approach eliminates all strata-specific parameters by conditioning on the number of events within each stratum. However, in the analyses of both an animal experiment and a lung cancer case-control study, conditional maximum likelihood (CML) resulted in infinite odds ratio estimates and monotone likelihood. Estimation can be improved by using Cytel Inc.'s well-known LogXact software, which provides a median unbiased estimate and exact or mid-p confidence intervals. Here, we suggest and outline point and interval estimation based on maximization of a penalized conditional likelihood in the spirit of Firth's (Biometrika 1993; 80:27-38) bias correction method (CFL). We present comparative analyses of both studies, demonstrating some advantages of CFL over competitors. We report on a small-sample simulation study where CFL log odds ratio estimates were almost unbiased, whereas LogXact estimates showed some bias and CML estimates exhibited serious bias. Confidence intervals and tests based on the penalized conditional likelihood had close-to-nominal coverage rates and yielded highest power among all methods compared, respectively. Therefore, we propose CFL as an attractive solution to the stratified analysis of binary data, irrespective of the occurrence of monotone likelihood. A SAS program implementing CFL is available at: http://www.muw.ac.at/msi/biometrie/programs.
LOGISTIC NETWORK REGRESSION FOR SCALABLE ANALYSIS OF NETWORKS WITH JOINT EDGE/VERTEX DYNAMICS
Almquist, Zack W.; Butts, Carter T.
2015-01-01
Change in group size and composition has long been an important area of research in the social sciences. Similarly, interest in interaction dynamics has a long history in sociology and social psychology. However, the effects of endogenous group change on interaction dynamics are a surprisingly understudied area. One way to explore these relationships is through social network models. Network dynamics may be viewed as a process of change in the edge structure of a network, in the vertex set on which edges are defined, or in both simultaneously. Although early studies of such processes were primarily descriptive, recent work on this topic has increasingly turned to formal statistical models. Although showing great promise, many of these modern dynamic models are computationally intensive and scale very poorly in the size of the network under study and/or the number of time points considered. Likewise, currently used models focus on edge dynamics, with little support for endogenously changing vertex sets. Here, the authors show how an existing approach based on logistic network regression can be extended to serve as a highly scalable framework for modeling large networks with dynamic vertex sets. The authors place this approach within a general dynamic exponential family (exponential-family random graph modeling) context, clarifying the assumptions underlying the framework (and providing a clear path for extensions), and they show how model assessment methods for cross-sectional networks can be extended to the dynamic case. Finally, the authors illustrate this approach on a classic data set involving interactions among windsurfers on a California beach. PMID:26120218
Nakasone, Yutaka Ikeda, Osamu; Yamashita, Yasuyuki; Kudoh, Kouichi; Shigematsu, Yoshinori; Harada, Kazunori
2007-09-15
We applied multivariate analysis to the clinical findings in patients with acute gastrointestinal (GI) hemorrhage and compared the relationship between these findings and angiographic evidence of extravasation. Our study population consisted of 46 patients with acute GI bleeding. They were divided into two groups. In group 1 we retrospectively analyzed 41 angiograms obtained in 29 patients (age range, 25-91 years; average, 71 years). Their clinical findings including the shock index (SI), diastolic blood pressure, hemoglobin, platelet counts, and age, which were quantitatively analyzed. In group 2, consisting of 17 patients (age range, 21-78 years; average, 60 years), we prospectively applied statistical analysis by a logistics regression model to their clinical findings and then assessed 21 angiograms obtained in these patients to determine whether our model was useful for predicting the presence of angiographic evidence of extravasation. On 18 of 41 (43.9%) angiograms in group 1 there was evidence of extravasation; in 3 patients it was demonstrated only by selective angiography. Factors significantly associated with angiographic visualization of extravasation were the SI and patient age. For differentiation between cases with and cases without angiographic evidence of extravasation, the maximum cutoff point was between 0.51 and 0.0.53. Of the 21 angiograms obtained in group 2, 13 (61.9%) showed evidence of extravasation; in 1 patient it was demonstrated only on selective angiograms. We found that in 90% of the cases, the prospective application of our model correctly predicted the angiographically confirmed presence or absence of extravasation. We conclude that in patients with GI hemorrhage, angiographic visualization of extravasation is associated with the pre-embolization SI. Patients with a high SI value should undergo study to facilitate optimal treatment planning.
Duchesne, Thierry; Fortin, Daniel
2017-01-01
Conditional logistic regression (CLR) is widely used to analyze habitat selection and movement of animals when resource availability changes over space and time. Observations used for these analyses are typically autocorrelated, which biases model-based variance estimation of CLR parameters. This bias can be corrected using generalized estimating equations (GEE), an approach that requires partitioning the data into independent clusters. Here we establish the link between clustering rules in GEE and their effectiveness to remove statistical biases in variance estimation of CLR parameters. The current lack of guidelines is such that broad variation in clustering rules can be found among studies (e.g., 14–450 clusters) with unknown consequences on the robustness of statistical inference. We simulated datasets reflecting conditions typical of field studies. Longitudinal data were generated based on several parameters of habitat selection with varying strength of autocorrelation and some individuals having more observations than others. We then evaluated how changing the number of clusters impacted the effectiveness of variance estimators. Simulations revealed that 30 clusters were sufficient to get unbiased and relatively precise estimates of variance of parameter estimates. The use of destructive sampling to increase the number of independent clusters was successful at removing statistical bias, but only when observations were temporally autocorrelated and the strength of inter-individual heterogeneity was weak. GEE also provided robust estimates of variance for different magnitudes of unbalanced datasets. Our simulations demonstrate that GEE should be estimated by assigning each individual to a cluster when at least 30 animals are followed, or by using destructive sampling for studies with fewer individuals having intermediate level of behavioural plasticity in selection and temporally autocorrelated observations. The simulations provide valuable information to
Binary logistic regression analysis of hard palate dimensions for sexing human crania
Asif, Muhammed; Shetty, Radhakrishna; Avadhani, Ramakrishna
2016-01-01
Sex determination is the preliminary step in every forensic investigation and the hard palate assumes significance in cranial sexing in cases involving burns and explosions due to its resistant nature and secluded location. This study analyzes the sexing potential of incisive foramen to posterior nasal spine length, palatine process of maxilla length, horizontal plate of palatine bone length and transverse length between the greater palatine foramina. The study deviates from the conventional method of measuring the maxillo-alveolar length and breadth as the dimensions considered in this study are more heat resistant and useful in situations with damaged alveolar margins. The study involves 50 male and 50 female adult dry skulls of Indian ethnic group. The dimensions measured were statistically analyzed using Student's t test, binary logistic regression and receiver operating characteristic curve. It was observed that the incisive foramen to posterior nasal spine length is a definite sex marker with sex predictability of 87.2%. The palatine process of maxilla length with 66.8% sex predictability and the horizontal plate of palatine bone length with 71.9% sex predictability cannot be relied upon as definite sex markers. The transverse length between the greater palatine foramina is statistically insignificant in sexing crania (P=0.318). Considering a significant overlap of values in both the sexes the palatal dimensions singularly cannot be relied upon for sexing. Nevertheless, considering the high sex predictability of incisive foramen to posterior nasal spine length this dimension can definitely be used to supplement other sexing evidence available to precisely conclude the cranial sex. PMID:27382518
Abad, Cesar C. C.; Barros, Ronaldo V.; Bertuzzi, Romulo; Gagliardi, João F. L.; Lima-Silva, Adriano E.; Lambert, Mike I.
2016-01-01
Abstract The aim of this study was to verify the power of VO2max, peak treadmill running velocity (PTV), and running economy (RE), unadjusted or allometrically adjusted, in predicting 10 km running performance. Eighteen male endurance runners performed: 1) an incremental test to exhaustion to determine VO2max and PTV; 2) a constant submaximal run at 12 km·h−1 on an outdoor track for RE determination; and 3) a 10 km running race. Unadjusted (VO2max, PTV and RE) and adjusted variables (VO2max0.72, PTV0.72 and RE0.60) were investigated through independent multiple regression models to predict 10 km running race time. There were no significant correlations between 10 km running time and either the adjusted or unadjusted VO2max. Significant correlations (p < 0.01) were found between 10 km running time and adjusted and unadjusted RE and PTV, providing models with effect size > 0.84 and power > 0.88. The allometrically adjusted predictive model was composed of PTV0.72 and RE0.60 and explained 83% of the variance in 10 km running time with a standard error of the estimate (SEE) of 1.5 min. The unadjusted model composed of a single PVT accounted for 72% of the variance in 10 km running time (SEE of 1.9 min). Both regression models provided powerful estimates of 10 km running time; however, the unadjusted PTV may provide an uncomplicated estimation. PMID:28149382
Education-Based Gaps in eHealth: A Weighted Logistic Regression Approach
2016-01-01
Background Persons with a college degree are more likely to engage in eHealth behaviors than persons without a college degree, compounding the health disadvantages of undereducated groups in the United States. However, the extent to which quality of recent eHealth experience reduces the education-based eHealth gap is unexplored. Objective The goal of this study was to examine how eHealth information search experience moderates the relationship between college education and eHealth behaviors. Methods Based on a nationally representative sample of adults who reported using the Internet to conduct the most recent health information search (n=1458), I evaluated eHealth search experience in relation to the likelihood of engaging in different eHealth behaviors. I examined whether Internet health information search experience reduces the eHealth behavior gaps among college-educated and noncollege-educated adults. Weighted logistic regression models were used to estimate the probability of different eHealth behaviors. Results College education was significantly positively related to the likelihood of 4 eHealth behaviors. In general, eHealth search experience was negatively associated with health care behaviors, health information-seeking behaviors, and user-generated or content sharing behaviors after accounting for other covariates. Whereas Internet health information search experience has narrowed the education gap in terms of likelihood of using email or Internet to communicate with a doctor or health care provider and likelihood of using a website to manage diet, weight, or health, it has widened the education gap in the instances of searching for health information for oneself, searching for health information for someone else, and downloading health information on a mobile device. Conclusion The relationship between college education and eHealth behaviors is moderated by Internet health information search experience in different ways depending on the type of e
Wagner, Philippe; Ghith, Nermin; Leckie, George
2016-01-01
Background and Aim Many multilevel logistic regression analyses of “neighbourhood and health” focus on interpreting measures of associations (e.g., odds ratio, OR). In contrast, multilevel analysis of variance is rarely considered. We propose an original stepwise analytical approach that distinguishes between “specific” (measures of association) and “general” (measures of variance) contextual effects. Performing two empirical examples we illustrate the methodology, interpret the results and discuss the implications of this kind of analysis in public health. Methods We analyse 43,291 individuals residing in 218 neighbourhoods in the city of Malmö, Sweden in 2006. We study two individual outcomes (psychotropic drug use and choice of private vs. public general practitioner, GP) for which the relative importance of neighbourhood as a source of individual variation differs substantially. In Step 1 of the analysis, we evaluate the OR and the area under the receiver operating characteristic (AUC) curve for individual-level covariates (i.e., age, sex and individual low income). In Step 2, we assess general contextual effects using the AUC. Finally, in Step 3 the OR for a specific neighbourhood characteristic (i.e., neighbourhood income) is interpreted jointly with the proportional change in variance (i.e., PCV) and the proportion of ORs in the opposite direction (POOR) statistics. Results For both outcomes, information on individual characteristics (Step 1) provide a low discriminatory accuracy (AUC = 0.616 for psychotropic drugs; = 0.600 for choosing a private GP). Accounting for neighbourhood of residence (Step 2) only improved the AUC for choosing a private GP (+0.295 units). High neighbourhood income (Step 3) was strongly associated to choosing a private GP (OR = 3.50) but the PCV was only 11% and the POOR 33%. Conclusion Applying an innovative stepwise multilevel analysis, we observed that, in Malmö, the neighbourhood context per se had a negligible
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring
ERIC Educational Resources Information Center
Haberman, Shelby J.; Sinharay, Sandip
2010-01-01
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
Staley, Dennis M.; Negri, Jacquelyn A.; Kean, Jason W.; Laber, Jayme L.; Tillery, Anne C.; Youberg, Ann M.
2016-06-30
Wildfire can significantly alter the hydrologic response of a watershed to the extent that even modest rainstorms can generate dangerous flash floods and debris flows. To reduce public exposure to hazard, the U.S. Geological Survey produces post-fire debris-flow hazard assessments for select fires in the western United States. We use publicly available geospatial data describing basin morphology, burn severity, soil properties, and rainfall characteristics to estimate the statistical likelihood that debris flows will occur in response to a storm of a given rainfall intensity. Using an empirical database and refined geospatial analysis methods, we defined new equations for the prediction of debris-flow likelihood using logistic regression methods. We showed that the new logistic regression model outperformed previous models used to predict debris-flow likelihood.
Chen, Chau-Kuang; Bruce, Michelle; Tyler, Lauren; Brown, Claudine; Garrett, Angelica; Goggins, Susan; Lewis-Polite, Brandy; Weriwoh, Mirabel L; Juarez, Paul D; Hood, Darryl B; Skelton, Tyler
2013-02-01
The goal of this study was to analyze a 54-item instrument for assessment of perception of exposure to environmental contaminants within the context of the built environment, or exposome. This exposome was defined in five domains to include 1) home and hobby, 2) school, 3) community, 4) occupation, and 5) exposure history. Interviews were conducted with child-bearing-age minority women at Metro Nashville General Hospital at Meharry Medical College. Data were analyzed utilizing DTReg software for Support Vector Machine (SVM) modeling followed by an SPSS package for a logistic regression model. The target (outcome) variable of interest was respondent's residence by ZIP code. The results demonstrate that the rank order of important variables with respect to SVM modeling versus traditional logistic regression models is almost identical. This is the first study documenting that SVM analysis has discriminate power for determination of higher-ordered spatial relationships on an environmental exposure history questionnaire.
Li, J.; Gray, B.R.; Bates, D.M.
2008-01-01
Partitioning the variance of a response by design levels is challenging for binomial and other discrete outcomes. Goldstein (2003) proposed four definitions for variance partitioning coefficients (VPC) under a two-level logistic regression model. In this study, we explicitly derived formulae for multi-level logistic regression model and subsequently studied the distributional properties of the calculated VPCs. Using simulations and a vegetation dataset, we demonstrated associations between different VPC definitions, the importance of methods for estimating VPCs (by comparing VPC obtained using Laplace and penalized quasilikehood methods), and bivariate dependence between VPCs calculated at different levels. Such an empirical study lends an immediate support to wider applications of VPC in scientific data analysis.
NASA Astrophysics Data System (ADS)
Colkesen, Ismail; Sahin, Emrehan Kutlug; Kavzoglu, Taskin
2016-06-01
Identification of landslide prone areas and production of accurate landslide susceptibility zonation maps have been crucial topics for hazard management studies. Since the prediction of susceptibility is one of the main processing steps in landslide susceptibility analysis, selection of a suitable prediction method plays an important role in the success of the susceptibility zonation process. Although simple statistical algorithms (e.g. logistic regression) have been widely used in the literature, the use of advanced non-parametric algorithms in landslide susceptibility zonation has recently become an active research topic. The main purpose of this study is to investigate the possible application of kernel-based Gaussian process regression (GPR) and support vector regression (SVR) for producing landslide susceptibility map of Tonya district of Trabzon, Turkey. Results of these two regression methods were compared with logistic regression (LR) method that is regarded as a benchmark method. Results showed that while kernel-based GPR and SVR methods generally produced similar results (90.46% and 90.37%, respectively), they outperformed the conventional LR method by about 18%. While confirming the superiority of the GPR method, statistical tests based on ROC statistics, success rate and prediction rate curves revealed the significant improvement in susceptibility map accuracy by applying kernel-based GPR and SVR methods.
Engoren, Milo; Habib, Robert H; Dooner, John J; Schwann, Thomas A
2013-08-01
As many as 14 % of patients undergoing coronary artery bypass surgery are readmitted within 30 days. Readmission is usually the result of morbidity and may lead to death. The purpose of this study is to develop and compare statistical and genetic programming models to predict readmission. Patients were divided into separate Construction and Validation populations. Using 88 variables, logistic regression, genetic programs, and artificial neural nets were used to develop predictive models. Models were first constructed and tested on the Construction populations, then validated on the Validation population. Areas under the receiver operator characteristic curves (AU ROC) were used to compare the models. Two hundred and two patients (7.6 %) in the 2,644 patient Construction group and 216 (8.0 %) of the 2,711 patient Validation group were re-admitted within 30 days of CABG surgery. Logistic regression predicted readmission with AU ROC = .675 ± .021 in the Construction group. Genetic programs significantly improved the accuracy, AU ROC = .767 ± .001, p < .001). Artificial neural nets were less accurate with AU ROC = 0.597 ± .001 in the Construction group. Predictive accuracy of all three techniques fell in the Validation group. However, the accuracy of genetic programming (AU ROC = .654 ± .001) was still trivially but statistically non-significantly better than that of the logistic regression (AU ROC = .644 ± .020, p = .61). Genetic programming and logistic regression provide alternative methods to predict readmission that are similarly accurate.
Hanson, Erin K; Mirza, Mohid; Rekab, Kamel; Ballantyne, Jack
2014-11-01
We report the identification of sensitive and specific miRNA biomarkers for menstrual blood, a tissue that might provide probative information in certain specialized instances. We incorporated these biomarkers into qPCR assays and developed a quantitative statistical model using logistic regression that permits the prediction of menstrual blood in a forensic sample with a high, and measurable, degree of accuracy. Using the developed model, we achieved 100% accuracy in determining the body fluid of interest for a set of test samples (i.e. samples not used in model development). The development, and details, of the logistic regression model are described. Testing and evaluation of the finalized logistic regression modeled assay using a small number of samples was carried out to preliminarily estimate the limit of detection (LOD), specificity in admixed samples and expression of the menstrual blood miRNA biomarkers throughout the menstrual cycle (25-28 days). The LOD was <1 ng of total RNA, the assay performed as expected with admixed samples and menstrual blood was identified only during the menses phase of the female reproductive cycle in two donors.
Zhao, Rui-Na; Zhang, Bo; Yang, Xiao; Jiang, Yu-Xin; Lai, Xing-Jian; Zhang, Xiao-Yan
2015-12-01
The purpose of the study described here was to determine specific characteristics of thyroid microcarcinoma (TMC) and explore the value of contrast-enhanced ultrasound (CEUS) combined with conventional ultrasound (US) in the diagnosis of TMC. Characteristics of 63 patients with TMC and 39 with benign sub-centimeter thyroid nodules were retrospectively analyzed. Multivariate logistic regression analysis was performed to determine independent risk factors. Four variables were included in the logistic regression models: age, shape, blood flow distribution and enhancement pattern. The area under the receiver operating characteristic curve was 0.919. With 0.113 selected as the cutoff value, sensitivity, specificity, positive predictive value, negative predictive value and accuracy were 90.5%, 82.1%, 89.1%, 84.2% and 87.3%, respectively. Independent risk factors for TMC determined with the combination of CEUS and conventional US were age, shape, blood flow distribution and enhancement pattern. Age was negatively correlated with malignancy, whereas shape, blood flow distribution and enhancement pattern were positively correlated. The logistic regression model involving CEUS and conventional US was found to be effective in the diagnosis of sub-centimeter thyroid nodules.
Fufaa, Gudeta; Cai, Bin
2016-01-01
Context Reduced insulin sensitivity is one of the traditional risk factors for chronic diseases such as type 2 diabetes. Reduced insulin sensitivity leads to insulin resistance, which in turn can lead to the development of type 2 diabetes. Few studies have examined factors such as blood pressure, tobacco and alcohol consumption that influence changes in insulin sensitivity over time especially among young adults. Purpose To examine temporal changes in insulin sensitivity in young adults (18-30 years of age at baseline) over a period of 20 years by taking into account the effects of tobacco and alcohol consumptions at baseline. In other words, the purpose of the present study is to examine if baseline tobacco and alcohol consumptions can be used in predicting lowered insulin sensitivity. Method This is a retrospective study using data collected by the Coronary Artery Risk Development in Young Adults (CARDIA) study from the National Heart, Lung, and Blood Institute. Participants were enrolled into the study in 1985 (baseline) and followed up to 2005. Insulin sensitivity, measured by the quantitative insulin sensitivity check index (QUICKI), was recorded at baseline and 20 years later, in 2005. The number of participants included in the study was 3,547. The original study included a total of 5,112 participants at baseline. Of these, 54.48% were female, and 45.52% were male; 45.31% were 18 to 24 years of age, and 54.69% were 25 to 30 years of age. Ordinal logistic regression was used to assess changes in insulin sensitivity. Changes in insulin sensitivity from baseline were calculated and grouped into three categories (more than 15%, more than 8.5% to at most 15%, and at most 8.5%), which provided the basis for employing ordinal logistic regression to assess changes in insulin sensitivity. The effects of alcohol and smoking consumption at baseline on the change in insulin sensitivity were accounted for by including these variables in the model. Results Daily alcohol
NASA Astrophysics Data System (ADS)
Mǎgut, F. L.; Zaharia, S.; Glade, T.; Irimuş, I. A.
2012-04-01
Various methods exist in analyzing spatial landslide susceptibility and classing the results in susceptibility classes. The prediction of spatial landslide distribution can be performed by using a variety of methods based on GIS techniques. The two very common methods of a heuristic assessment and a logistic regression model are employed in this study in order to compare their performance in predicting the spatial distribution of previously mapped landslides for a study area located in Maramureš County, in Northwestern Romania. The first model determines a susceptibility index by combining the heuristic approach with GIS techniques of spatial data analysis. The criteria used for quantifying each susceptibility factor and the expression used to determine the susceptibility index are taken from the Romanian legislation (Governmental Decision 447/2003). This procedure is followed in any Romanian state-ordered study which relies on financial support. The logistic regression model predicts the spatial distribution of landslides by statistically calculating regressive coefficients which describe the dependency of previously mapped landslides on different factors. The identified shallow landslides correspond generally to Pannonian marl and Quaternary contractile clay deposits. The study region is located in the Northwestern part of Romania, including the Baia Mare municipality, the capital of Maramureš County. The study focuses on the former piedmontal region situated to the south of the volcanic mountains Gutâi, in the Baia Mare Depression, where most of the landslide activity has been recorded. In addition, a narrow sector of the volcanic mountains which borders the city of Baia Mare to the north has also been included to test the accuracy of the models in different lithologic units. The results of both models indicate a general medium landslide susceptibility of the study area. The more detailed differences will be discussed with respect to the advantages and
Enticott, Joanne C; Cheng, I-Hao; Russell, Grant; Szwarc, Josef; Braitberg, George; Peek, Anne; Meadows, Graham
2015-01-01
This study investigated if people born in refugee source countries are disproportionately represented among those receiving a diagnosis of mental illness within emergency departments (EDs). The setting was the Cities of Greater Dandenong and Casey, the resettlement region for one-twelfth of Australia's refugees. An epidemiological, secondary data analysis compared mental illness diagnoses received in EDs by refugee and non-refugee populations. Data was the Victorian Emergency Minimum Dataset in the 2008-09 financial year. Univariate and multivariate logistic regression created predictive models for mental illness using five variables: age, sex, refugee background, interpreter use and preferred language. Collinearity, model fit and model stability were examined. Multivariate analysis showed age and sex to be the only significant risk factors for mental illness diagnosis in EDs. 'Refugee status', 'interpreter use' and 'preferred language' were not associatedwith a mental health diagnosis following risk adjustment forthe effects ofage and sex. The disappearance ofthe univariate association after adjustment for age and sex is a salutary lesson for Medicare Locals and other health planners regarding the importance of adjusting analyses of health service data for demographic characteristics.
NASA Astrophysics Data System (ADS)
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
2015-11-01
The aim of this work is to define reliable susceptibility models for shallow landslides using Logistic Regression and Random Forests multivariate statistical techniques. The study area, located in North-East Sicily, was hit on October 1st 2009 by a severe rainstorm (225 mm of cumulative rainfall in 7 h) which caused flash floods and more than 1000 landslides. Several small villages, such as Giampilieri, were hit with 31 fatalities, 6 missing persons and damage to buildings and transportation infrastructures. Landslides, mainly types such as earth and debris translational slides evolving into debris flows, were triggered on steep slopes and involved colluvium and regolith materials which cover the underlying metamorphic bedrock. The work has been carried out with the following steps: i) realization of a detailed event landslide inventory map through field surveys coupled with observation of high resolution aerial colour orthophoto; ii) identification of landslide source areas; iii) data preparation of landslide controlling factors and descriptive statistics based on a bivariate method (Frequency Ratio) to get an initial overview on existing relationships between causative factors and shallow landslide source areas; iv) choice of criteria for the selection and sizing of the mapping unit; v) implementation of 5 multivariate statistical susceptibility models based on Logistic Regression and Random Forests techniques and focused on landslide source areas; vi) evaluation of the influence of sample size and type of sampling on results and performance of the models; vii) evaluation of the predictive capabilities of the models using ROC curve, AUC and contingency tables; viii) comparison of model results and obtained susceptibility maps; and ix) analysis of temporal variation of landslide susceptibility related to input parameter changes. Models based on Logistic Regression and Random Forests have demonstrated excellent predictive capabilities. Land use and wildfire
Fitzpatrick, Cole D; Rakasi, Saritha; Knodler, Michael A
2017-01-01
Speed is one of the most important factors in traffic safety as higher speeds are linked to increased crash risk and higher injury severities. Nearly a third of fatal crashes in the United States are designated as "speeding-related", which is defined as either "the driver behavior of exceeding the posted speed limit or driving too fast for conditions." While many studies have utilized the speeding-related designation in safety analyses, no studies have examined the underlying accuracy of this designation. Herein, we investigate the speeding-related crash designation through the development of a series of logistic regression models that were derived from the established speeding-related crash typologies and validated using a blind review, by multiple researchers, of 604 crash narratives. The developed logistic regression model accurately identified crashes which were not originally designated as speeding-related but had crash narratives that suggested speeding as a causative factor. Only 53.4% of crashes designated as speeding-related contained narratives which described speeding as a causative factor. Further investigation of these crashes revealed that the driver contributing code (DCC) of "driving too fast for conditions" was being used in three separate situations. Additionally, this DCC was also incorrectly used when "exceeding the posted speed limit" would likely have been a more appropriate designation. Finally, it was determined that the responding officer only utilized one DCC in 82% of crashes not designated as speeding-related but contained a narrative indicating speed as a contributing causal factor. The use of logistic regression models based upon speeding-related crash typologies offers a promising method by which all possible speeding-related crashes could be identified.
NASA Technical Reports Server (NTRS)
Smith, Kelly M.; Gay, Robert S.; Stachowiak, Susan J.
2013-01-01
In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.
NASA Technical Reports Server (NTRS)
Smith, Kelly; Gay, Robert; Stachowiak, Susan
2013-01-01
In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles
NASA Technical Reports Server (NTRS)
Smith, Kelly M.; Gay, Robert S.; Stachowiak, Susan J.
2013-01-01
In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter. In order to increase overall robustness, the vehicle also has an alternate method of triggering the drogue parachute deployment based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this velocity-based trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers excellent performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.
NASA Astrophysics Data System (ADS)
Pradhan, Biswajeet
2010-05-01
This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis on the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (ndvi), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases of the application of logistic regression coefficient in the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), where as Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases from the cross application of logistic regression coefficient in other two areas, the case of Selangor based on logistic coefficient of Cameron showed highest (90%) prediction accuracy where as the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%). Qualitatively, the cross
Zhang, Y J; Xue, F X; Bai, Z P
2017-03-06
The impact of maternal air pollution exposure on offspring health has received much attention. Precise and feasible exposure estimation is particularly important for clarifying exposure-response relationships and reducing heterogeneity among studies. Temporally-adjusted land use regression (LUR) models are exposure assessment methods developed in recent years that have the advantage of having high spatial-temporal resolution. Studies on the health effects of outdoor air pollution exposure during pregnancy have been increasingly carried out using this model. In China, research applying LUR models was done mostly at the model construction stage, and findings from related epidemiological studies were rarely reported. In this paper, the sources of heterogeneity and research progress of meta-analysis research on the associations between air pollution and adverse pregnancy outcomes were analyzed. The methods of the characteristics of temporally-adjusted LUR models were introduced. The current epidemiological studies on adverse pregnancy outcomes that applied this model were systematically summarized. Recommendations for the development and application of LUR models in China are presented. This will encourage the implementation of more valid exposure predictions during pregnancy in large-scale epidemiological studies on the health effects of air pollution in China.
Perez, Ivan; Chavez, Allison K.; Ponce, Dario
2016-01-01
Background: The Ricketts' posteroanterior (PA) cephalometry seems to be the most widely used and it has not been tested by multivariate statistics for sex determination. Objective: The objective was to determine the applicability of Ricketts' PA cephalometry for sex determination using the logistic regression analysis. Materials and Methods: The logistic models were estimated at distinct age cutoffs (all ages, 11 years, 13 years, and 15 years) in a database from 1,296 Hispano American Peruvians between 5 years and 44 years of age. Results: The logistic models were composed by six cephalometric measurements; the accuracy achieved by resubstitution varied between 60% and 70% and all the variables, with one exception, exhibited a direct relationship with the probability of being classified as male; the nasal width exhibited an indirect relationship. Conclusion: The maxillary and facial widths were present in all models and may represent a sexual dimorphism indicator. The accuracy found was lower than the literature and the Ricketts' PA cephalometry may not be adequate for sex determination. The indirect relationship of the nasal width in models with data from patients of 12 years of age or less may be a trait related to age or a characteristic in the studied population, which could be better studied and confirmed. PMID:27555732
Battaglin, William A.; Ulery, Randy L.; Winterstein, Thomas; Welborn, Toby
2003-01-01
In the State of Texas, surface water (streams, canals, and reservoirs) and ground water are used as sources of public water supply. Surface-water sources of public water supply are susceptible to contamination from point and nonpoint sources. To help protect sources of drinking water and to aid water managers in designing protective yet cost-effective and risk-mitigated monitoring strategies, the Texas Commission on Environmental Quality and the U.S. Geological Survey developed procedures to assess the susceptibility of public water-supply source waters in Texas to the occurrence of 227 contaminants. One component of the assessments is the determination of susceptibility of surface-water sources to nonpoint-source contamination. To accomplish this, water-quality data at 323 monitoring sites were matched with geographic information system-derived watershed- characteristic data for the watersheds upstream from the sites. Logistic regression models then were developed to estimate the probability that a particular contaminant will exceed a threshold concentration specified by the Texas Commission on Environmental Quality. Logistic regression models were developed for 63 of the 227 contaminants. Of the remaining contaminants, 106 were not modeled because monitoring data were available at less than 10 percent of the monitoring sites; 29 were not modeled because there were less than 15 percent detections of the contaminant in the monitoring data; 27 were not modeled because of the lack of any monitoring data; and 2 were not modeled because threshold values were not specified.
Wang, Shuang; Zhang, Yuchen; Dai, Wenrui; Lauter, Kristin; Kim, Miran; Tang, Yuzhe; Xiong, Hongkai; Jiang, Xiaoqian
2016-01-01
Motivation: Genome-wide association studies (GWAS) have been widely used in discovering the association between genotypes and phenotypes. Human genome data contain valuable but highly sensitive information. Unprotected disclosure of such information might put individual’s privacy at risk. It is important to protect human genome data. Exact logistic regression is a bias-reduction method based on a penalized likelihood to discover rare variants that are associated with disease susceptibility. We propose the HEALER framework to facilitate secure rare variants analysis with a small sample size. Results: We target at the algorithm design aiming at reducing the computational and storage costs to learn a homomorphic exact logistic regression model (i.e. evaluate P-values of coefficients), where the circuit depth is proportional to the logarithmic scale of data size. We evaluate the algorithm performance using rare Kawasaki Disease datasets. Availability and implementation: Download HEALER at http://research.ucsd-dbmi.org/HEALER/ Contact: shw070@ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26446135
Jen, Min-Hua; Bottle, Alex; Kirkwood, Graham; Johnston, Ron; Aylin, Paul
2011-09-01
We have previously described a system for monitoring a number of healthcare outcomes using case-mix adjustment models. It is desirable to automate the model fitting process in such a system if monitoring covers a large number of outcome measures or subgroup analyses. Our aim was to compare the performance of three different variable selection strategies: "manual", "automated" backward elimination and re-categorisation, and including all variables at once, irrespective of their apparent importance, with automated re-categorisation. Logistic regression models for predicting in-hospital mortality and emergency readmission within 28 days were fitted to an administrative database for 78 diagnosis groups and 126 procedures from 1996 to 2006 for National Health Services hospital trusts in England. The performance of models was assessed with Receiver Operating Characteristic (ROC) c statistics, (measuring discrimination) and Brier score (assessing the average of the predictive accuracy). Overall, discrimination was similar for diagnoses and procedures and consistently better for mortality than for emergency readmission. Brier scores were generally low overall (showing higher accuracy) and were lower for procedures than diagnoses, with a few exceptions for emergency readmission within 28 days. Among the three variable selection strategies, the automated procedure had similar performance to the manual method in almost all cases except low-risk groups with few outcome events. For the rapid generation of multiple case-mix models we suggest applying automated modelling to reduce the time required, in particular when examining different outcomes of large numbers of procedures and diseases in routinely collected administrative health data.
Almquist, Zack W; Butts, Carter T
2013-10-01
Methods for analysis of network dynamics have seen great progress in the past decade. This article shows how Dynamic Network Logistic Regression techniques (a special case of the Temporal Exponential Random Graph Models) can be used to implement decision theoretic models for network dynamics in a panel data context. We also provide practical heuristics for model building and assessment. We illustrate the power of these techniques by applying them to a dynamic blog network sampled during the 2004 US presidential election cycle. This is a particularly interesting case because it marks the debut of Internet-based media such as blogs and social networking web sites as institutionally recognized features of the American political landscape. Using a longitudinal sample of all Democratic National Convention/Republican National Convention-designated blog citation networks, we are able to test the influence of various strategic, institutional, and balance-theoretic mechanisms as well as exogenous factors such as seasonality and political events on the propensity of blogs to cite one another over time. Using a combination of deviance-based model selection criteria and simulation-based model adequacy tests, we identify the combination of processes that best characterizes the choice behavior of the contending blogs.
2013-01-01
Methods for analysis of network dynamics have seen great progress in the past decade. This article shows how Dynamic Network Logistic Regression techniques (a special case of the Temporal Exponential Random Graph Models) can be used to implement decision theoretic models for network dynamics in a panel data context. We also provide practical heuristics for model building and assessment. We illustrate the power of these techniques by applying them to a dynamic blog network sampled during the 2004 US presidential election cycle. This is a particularly interesting case because it marks the debut of Internet-based media such as blogs and social networking web sites as institutionally recognized features of the American political landscape. Using a longitudinal sample of all Democratic National Convention/Republican National Convention–designated blog citation networks, we are able to test the influence of various strategic, institutional, and balance-theoretic mechanisms as well as exogenous factors such as seasonality and political events on the propensity of blogs to cite one another over time. Using a combination of deviance-based model selection criteria and simulation-based model adequacy tests, we identify the combination of processes that best characterizes the choice behavior of the contending blogs. PMID:24143060
Jostins, Luke; McVean, Gilean
2016-01-01
Motivation: For many classes of disease the same genetic risk variants underly many related phenotypes or disease subtypes. Multinomial logistic regression provides an attractive framework to analyze multi-category phenotypes, and explore the genetic relationships between these phenotype categories. We introduce Trinculo, a program that implements a wide range of multinomial analyses in a single fast package that is designed to be easy to use by users of standard genome-wide association study software. Availability and implementation: An open source C implementation, with code and binaries for Linux and Mac OSX, is available for download at http://sourceforge.net/projects/trinculo Supplementary information: Supplementary data are available at Bioinformatics online. Contact: lj4@well.ox.ac.uk PMID:26873930
Gysemans, K P M; Bernaerts, K; Vermeulen, A; Geeraerd, A H; Debevere, J; Devlieghere, F; Van Impe, J F
2007-03-20
Several model types have already been developed to describe the boundary between growth and no growth conditions. In this article two types were thoroughly studied and compared, namely (i) the ordinary (linear) logistic regression model, i.e., with a polynomial on the right-hand side of the model equation (type I) and (ii) the (nonlinear) logistic regression model derived from a square root-type kinetic model (type II). The examination was carried out on the basis of the data described in Vermeulen et al. [Vermeulen, A., Gysemans, K.P.M., Bernaerts, K., Geeraerd, A.H., Van Impe, J.F., Debevere, J., Devlieghere, F., 2006-this issue. Influence of pH, water activity and acetic acid concentration on Listeria monocytogenes at 7 degrees C: data collection for the development of a growth/no growth model. International Journal of Food Microbiology. .]. These data sets consist of growth/no growth data for Listeria monocytogenes as a function of water activity (0.960-0.990), pH (5.0-6.0) and acetic acid percentage (0-0.8% (w/w)), both for a monoculture and a mixed strain culture. Numerous replicates, namely twenty, were performed at closely spaced conditions. In this way detailed information was obtained about the position of the interface and the transition zone between growth and no growth. The main questions investigated were (i) which model type performs best on the monoculture and the mixed strain data, (ii) are there differences between the growth/no growth interfaces of monocultures and mixed strain cultures, (iii) which parameter estimation approach works best for the type II models, and (iv) how sensitive is the performance of these models to the values of their nonlinear-appearing parameters. The results showed that both type I and II models performed well on the monoculture data with respect to goodness-of-fit and predictive power. The type I models were, however, more sensitive to anomalous data points. The situation was different for the mixed strain culture. In
Raevsky, O A; Polianczyk, D E; Mukhametov, A; Grigorev, V Y
2016-08-01
Assessment of "CNS drugs/CNS candidates" classification abilities of the multi-parametric optimization (CNS MPO) approach was performed by logistic regression. It was found that the five out of the six separately used physical-chemical properties (topological polar surface area, number of hydrogen-bonded donor atoms, basicity, lipophilicity of compound in neutral form and at pH = 7.4) provided accuracy of recognition below 60%. Only the descriptor of molecular weight (MW) could correctly classify two-thirds of the studied compounds. Aggregation of all six properties in the MPOscore did not improve the classification, which was worse than the classification using only MW. The results of our study demonstrate the imperfection of the CNS MPO approach; in its current form it is not very useful for computer design of new, effective CNS drugs.
Calef, M.P.; McGuire, A.D.; Epstein, H.E.; Rupp, T.S.; Shugart, H.H.
2005-01-01
Aim: To understand drivers of vegetation type distribution and sensitivity to climate change. Location: Interior Alaska. Methods: A logistic regression model was developed that predicts the potential equilibrium distribution of four major vegetation types: tundra, deciduous forest, black spruce forest and white spruce forest based on elevation, aspect, slope, drainage type, fire interval, average growing season temperature and total growing season precipitation. The model was run in three consecutive steps. The hierarchical logistic regression model was used to evaluate how scenarios of changes in temperature, precipitation and fire interval may influence the distribution of the four major vegetation types found in this region. Results: At the first step, tundra was distinguished from forest, which was mostly driven by elevation, precipitation and south to north aspect. At the second step, forest was separated into deciduous and spruce forest, a distinction that was primarily driven by fire interval and elevation. At the third step, the identification of black vs. white spruce was driven mainly by fire interval and elevation. The model was verified for Interior Alaska, the region used to develop the model, where it predicted vegetation distribution among the steps with an accuracy of 60-83%. When the model was independently validated for north-west Canada, it predicted vegetation distribution among the steps with an accuracy of 53-85%. Black spruce remains the dominant vegetation type under all scenarios, potentially expanding most under warming coupled with increasing fire interval. White spruce is clearly limited by moisture once average growing season temperatures exceeded a critical limit (+2 ??C). Deciduous forests expand their range the most when any two of the following scenarios are combined: decreasing fire interval, warming and increasing precipitation. Tundra can be replaced by forest under warming but expands under precipitation increase. Main
NASA Astrophysics Data System (ADS)
Espino, Natalia V.
Foreign Object Debris/Damage (FOD) is a costly and high-risk problem that aeronautics industries such as Boeing, Lockheed Martin, among others are facing at their production lines every day. They spend an average of $350 thousand dollars per year fixing FOD problems. FOD can put pilots, passengers and other crews' lives into high-risk. FOD refers to any type of foreign object, particle, debris or agent in the manufacturing environment, which could contaminate/damage the product or otherwise undermine quality control standards. FOD can be in the form of any of the following categories: panstock, manufacturing debris, tools/shop aids, consumables and trash. Although aeronautics industries have put many prevention plans in place such as housekeeping and "clean as you go" philosophies, trainings, use of RFID for tooling control, etc. none of them has been able to completely eradicate the problem. This research presents a logistic regression statistical model approach to predict probability of FOD type under given specific circumstances such as workstation, month and aircraft/jet being built. FOD Quality Assurance Reports of the last three years were provided by an aeronautical industry for this study. By predicting type of FOD, custom reduction/elimination plans can be put in place and by such means being able to diminish the problem. Different aircrafts were analyzed and so different models developed through same methodology. Results of the study presented are predictions of FOD type for each aircraft and workstation throughout the year, which were obtained by applying proposed logistic regression models. This research would help aeronautic industries to address the FOD problem correctly, to be able to identify root causes and establish actual reduction/elimination plans.
NASA Astrophysics Data System (ADS)
Madhu, B.; Ashok, N. C.; Balasubramanian, S.
2014-11-01
Multinomial logistic regression analysis was used to develop statistical model that can predict the probability of breast cancer in Southern Karnataka using the breast cancer occurrence data during 2007-2011. Independent socio-economic variables describing the breast cancer occurrence like age, education, occupation, parity, type of family, health insurance coverage, residential locality and socioeconomic status of each case was obtained. The models were developed as follows: i) Spatial visualization of the Urban- rural distribution of breast cancer cases that were obtained from the Bharat Hospital and Institute of Oncology. ii) Socio-economic risk factors describing the breast cancer occurrences were complied for each case. These data were then analysed using multinomial logistic regression analysis in a SPSS statistical software and relations between the occurrence of breast cancer across the socio-economic status and the influence of other socio-economic variables were evaluated and multinomial logistic regression models were constructed. iii) the model that best predicted the occurrence of breast cancer were identified. This multivariate logistic regression model has been entered into a geographic information system and maps showing the predicted probability of breast cancer occurrence in Southern Karnataka was created. This study demonstrates that Multinomial logistic regression is a valuable tool for developing models that predict the probability of breast cancer Occurrence in Southern Karnataka.
Franklin, Erik C.; Kelley, Christopher; Frazer, L. Neil; Toonen, Robert J.
2016-01-01
Predictive habitat suitability models are powerful tools for cost-effective, statistically robust assessment of the environmental drivers of species distributions. The aim of this study was to develop predictive habitat suitability models for two genera of scleractinian corals (Leptoserisand Montipora) found within the mesophotic zone across the main Hawaiian Islands. The mesophotic zone (30–180 m) is challenging to reach, and therefore historically understudied, because it falls between the maximum limit of SCUBA divers and the minimum typical working depth of submersible vehicles. Here, we implement a logistic regression with rare events corrections to account for the scarcity of presence observations within the dataset. These corrections reduced the coefficient error and improved overall prediction success (73.6% and 74.3%) for both original regression models. The final models included depth, rugosity, slope, mean current velocity, and wave height as the best environmental covariates for predicting the occurrence of the two genera in the mesophotic zone. Using an objectively selected theta (“presence”) threshold, the predicted presence probability values (average of 0.051 for Leptoseris and 0.040 for Montipora) were translated to spatially-explicit habitat suitability maps of the main Hawaiian Islands at 25 m grid cell resolution. Our maps are the first of their kind to use extant presence and absence data to examine the habitat preferences of these two dominant mesophotic coral genera across Hawai‘i. PMID:27441122
Veazey, Lindsay M; Franklin, Erik C; Kelley, Christopher; Rooney, John; Frazer, L Neil; Toonen, Robert J
2016-01-01
Predictive habitat suitability models are powerful tools for cost-effective, statistically robust assessment of the environmental drivers of species distributions. The aim of this study was to develop predictive habitat suitability models for two genera of scleractinian corals (Leptoserisand Montipora) found within the mesophotic zone across the main Hawaiian Islands. The mesophotic zone (30-180 m) is challenging to reach, and therefore historically understudied, because it falls between the maximum limit of SCUBA divers and the minimum typical working depth of submersible vehicles. Here, we implement a logistic regression with rare events corrections to account for the scarcity of presence observations within the dataset. These corrections reduced the coefficient error and improved overall prediction success (73.6% and 74.3%) for both original regression models. The final models included depth, rugosity, slope, mean current velocity, and wave height as the best environmental covariates for predicting the occurrence of the two genera in the mesophotic zone. Using an objectively selected theta ("presence") threshold, the predicted presence probability values (average of 0.051 for Leptoseris and 0.040 for Montipora) were translated to spatially-explicit habitat suitability maps of the main Hawaiian Islands at 25 m grid cell resolution. Our maps are the first of their kind to use extant presence and absence data to examine the habitat preferences of these two dominant mesophotic coral genera across Hawai'i.
NASA Astrophysics Data System (ADS)
Conoscenti, Christian; Ciaccio, Marilena; Caraballo-Arias, Nathalie Almaru; Gómez-Gutiérrez, Álvaro; Rotigliano, Edoardo; Agnesi, Valerio
2015-08-01
In this paper, terrain susceptibility to earth-flow occurrence was evaluated by using geographic information systems (GIS) and two statistical methods: Logistic regression (LR) and multivariate adaptive regression splines (MARS). LR has been already demonstrated to provide reliable predictions of earth-flow occurrence, whereas MARS, as far as we know, has never been used to generate earth-flow susceptibility models. The experiment was carried out in a basin of western Sicily (Italy), which extends for 51 km2 and is severely affected by earth-flows. In total, we mapped 1376 earth-flows, covering an area of 4.59 km2. To explore the effect of pre-failure topography on earth-flow spatial distribution, we performed a reconstruction of topography before the landslide occurrence. This was achieved by preparing a digital terrain model (DTM) where altitude of areas hosting landslides was interpolated from the adjacent undisturbed land surface by using the algorithm topo-to-raster. This DTM was exploited to extract 15 morphological and hydrological variables that, in addition to outcropping lithology, were employed as explanatory variables of earth-flow spatial distribution. The predictive skill of the earth-flow susceptibility models and the robustness of the procedure were tested by preparing five datasets, each including a different subset of landslides and stable areas. The accuracy of the predictive models was evaluated by drawing receiver operating characteristic (ROC) curves and by calculating the area under the ROC curve (AUC). The results demonstrate that the overall accuracy of LR and MARS earth-flow susceptibility models is from excellent to outstanding. However, AUC values of the validation datasets attest to a higher predictive power of MARS-models (AUC between 0.881 and 0.912) with respect to LR-models (AUC between 0.823 and 0.870). The adopted procedure proved to be resistant to overfitting and stable when changes of the learning and validation samples are
Wyss, Richard; Ellis, Alan R; Brookhart, M Alan; Girman, Cynthia J; Jonsson Funk, Michele; LoCasale, Robert; Stürmer, Til
2014-09-15
The covariate-balancing propensity score (CBPS) extends logistic regression to simultaneously optimize covariate balance and treatment prediction. Although the CBPS has been shown to perform well in certain settings, its performance has not been evaluated in settings specific to pharmacoepidemiology and large database research. In this study, we use both simulations and empirical data to compare the performance of the CBPS with logistic regression and boosted classification and regression trees. We simulated various degrees of model misspecification to evaluate the robustness of each propensity score (PS) estimation method. We then applied these methods to compare the effect of initiating glucagonlike peptide-1 agonists versus sulfonylureas on cardiovascular events and all-cause mortality in the US Medicare population in 2007-2009. In simulations, the CBPS was generally more robust in terms of balancing covariates and reducing bias compared with misspecified logistic PS models and boosted classification and regression trees. All PS estimation methods performed similarly in the empirical example. For settings common to pharmacoepidemiology, logistic regression with balance checks to assess model specification is a valid method for PS estimation, but it can require refitting multiple models until covariate balance is achieved. The CBPS is a promising method to improve the robustness of PS models.
Li, Y.; Graubard, B. I.; Huang, P.; Gastwirth, J. L.
2015-01-01
Determining the extent of a disparity, if any, between groups of people, for example, race or gender, is of interest in many fields, including public health for medical treatment and prevention of disease. An observed difference in the mean outcome between an advantaged group (AG) and disadvantaged group (DG) can be due to differences in the distribution of relevant covariates. The Peters–Belson (PB) method fits a regression model with covariates to the AG to predict, for each DG member, their outcome measure as if they had been from the AG. The difference between the mean predicted and the mean observed outcomes of DG members is the (unexplained) disparity of interest. We focus on applying the PB method to estimate the disparity based on binary/multinomial/proportional odds logistic regression models using data collected from complex surveys with more than one DG. Estimators of the unexplained disparity, an analytic variance–covariance estimator that is based on the Taylor linearization variance–covariance estimation method, as well as a Wald test for testing a joint null hypothesis of zero for unexplained disparities between two or more minority groups and a majority group, are provided. Simulation studies with data selected from simple random sampling and cluster sampling, as well as the analyses of disparity in body mass index in the National Health and Nutrition Examination Survey 1999–2004, are conducted. Empirical results indicate that the Taylor linearization variance–covariance estimation is accurate and that the proposed Wald test maintains the nominal level. PMID:25382235
NASA Astrophysics Data System (ADS)
Xiao, Rui; Su, Shiliang; Mai, Gengchen; Zhang, Zhonghao; Yang, Chenxue
2015-02-01
Cash crop expansion has been a major land use change in tropical and subtropical regions worldwide. Quantifying the determinants of cash crop expansion should provide deeper spatial insights into the dynamics and ecological consequences of cash crop expansion. This paper investigated the process of cash crop expansion in Hangzhou region (China) from 1985 to 2009 using remotely sensed data. The corresponding determinants (neighborhood, physical, and proximity) and their relative effects during three periods (1985-1994, 1994-2003, and 2003-2009) were quantified by logistic regression modeling and variance partitioning. Results showed that the total area of cash crops increased from 58,874.1 ha in 1985 to 90,375.1 ha in 2009, with a net growth of 53.5%. Cash crops were more likely to grow in loam soils. Steep areas with higher elevation would experience less likelihood of cash crop expansion. A consistently higher probability of cash crop expansion was found on places with abundant farmland and forest cover in the three periods. Besides, distance to river and lake, distance to county center, and distance to provincial road were decisive determinants for farmers' choice of cash crop plantation. Different categories of determinants and their combinations exerted different influences on cash crop expansion. The joint effects of neighborhood and proximity determinants were the strongest, and the unique effect of physical determinants decreased with time. Our study contributed to understanding of the proximate drivers of cash crop expansion in subtropical regions.
Bent, Gardner C.; Steeves, Peter A.
2006-01-01
A revised logistic regression equation and an automated procedure were developed for mapping the probability of a stream flowing perennially in Massachusetts. The equation provides city and town conservation commissions and the Massachusetts Department of Environmental Protection a method for assessing whether streams are intermittent or perennial at a specific site in Massachusetts by estimating the probability of a stream flowing perennially at that site. This information could assist the environmental agencies who administer the Commonwealth of Massachusetts Rivers Protection Act of 1996, which establishes a 200-foot-wide protected riverfront area extending from the mean annual high-water line along each side of a perennial stream, with exceptions for some urban areas. The equation was developed by relating the observed intermittent or perennial status of a stream site to selected basin characteristics of naturally flowing streams (defined as having no regulation by dams, surface-water withdrawals, ground-water withdrawals, diversion, wastewater discharge, and so forth) in Massachusetts. This revised equation differs from the equation developed in a previous U.S. Geological Survey study in that it is solely based on visual observations of the intermittent or perennial status of stream sites across Massachusetts and on the evaluation of several additional basin and land-use characteristics as potential explanatory variables in the logistic regression analysis. The revised equation estimated more accurately the intermittent or perennial status of the observed stream sites than the equation from the previous study. Stream sites used in the analysis were identified as intermittent or perennial based on visual observation during low-flow periods from late July through early September 2001. The database of intermittent and perennial streams included a total of 351 naturally flowing (no regulation) sites, of which 85 were observed to be intermittent and 266 perennial
NASA Astrophysics Data System (ADS)
Mousavi, S. Mostafa; Horton, Stephen P.; Langston, Charles A.; Samei, Borhan
2016-10-01
We develop an automated strategy for discriminating deep microseismic events from shallow ones on the basis of the waveforms recorded on a limited number of surface receivers. Machine-learning techniques are employed to explore the relationship between event hypocentres and seismic features of the recorded signals in time, frequency and time-frequency domains. We applied the technique to 440 microearthquakes -1.7 < Mw < 1.29, induced by an underground cavern collapse in the Napoleonville Salt Dome in Bayou Corne, Louisiana. Forty different seismic attributes of whole seismograms including degree of polarization and spectral attributes were measured. A selected set of features was then used to train the system to discriminate between deep and shallow events based on the knowledge gained from existing patterns. The cross-validation test showed that events with depth shallower than 250 m can be discriminated from events with hypocentral depth between 1000 and 2000 m with 88 per cent and 90.7 per cent accuracy using logistic regression and artificial neural network models, respectively. Similar results were obtained using single station seismograms. The results show that the spectral features have the highest correlation to source depth. Spectral centroids and 2-D cross-correlations in the time-frequency domain are two new seismic features used in this study that showed to be promising measures for seismic event classification. The used machine-learning techniques have application for efficient automatic classification of low energy signals recorded at one or more seismic stations.
Wilson, Asa B; Kerr, Bernard J; Bastian, Nathaniel D; Fulton, Lawrence V
2012-01-01
From 1980 to 1999, rural designated hospitals closed at a disproportionally high rate. In response to this emergent threat to healthcare access in rural settings, the Balanced Budget Act of 1997 made provisions for the creation of a new rural hospital--the critical access hospital (CAH). The conversion to CAH and the associated cost-based reimbursement scheme significantly slowed the closure rate of rural hospitals. This work investigates which methods can ensure the long-term viability of small hospitals. This article uses a two-step design to focus on a hypothesized relationship between technical efficiency of CAHs and a recently developed set of financial monitors for these entities. The goal is to identify the financial performance measures associated with efficiency. The first step uses data envelopment analysis (DEA) to differentiate efficient from inefficient facilities within a data set of 183 CAHs. Determining DEA efficiency is an a priori categorization of hospitals in the data set as efficient or inefficient. In the second step, DEA efficiency is the categorical dependent variable (efficient = 0, inefficient = 1) in the subsequent binary logistic regression (LR) model. A set of six financial monitors selected from the array of 20 measures were the LR independent variables. We use a binary LR to test the null hypothesis that recently developed CAH financial indicators had no predictive value for categorizing a CAH as efficient or inefficient, (i.e., there is no relationship between DEA efficiency and fiscal performance).
Wu, Liang; Deng, Fei; Xie, Zhong; Hu, Sheng; Shen, Shu; Shi, Junming; Liu, Dan
2016-01-01
Severe fever with thrombocytopenia syndrome (SFTS) is caused by severe fever with thrombocytopenia syndrome virus (SFTSV), which has had a serious impact on public health in parts of Asia. There is no specific antiviral drug or vaccine for SFTSV and, therefore, it is important to determine the factors that influence the occurrence of SFTSV infections. This study aimed to explore the spatial associations between SFTSV infections and several potential determinants, and to predict the high-risk areas in mainland China. The analysis was carried out at the level of provinces in mainland China. The potential explanatory variables that were investigated consisted of meteorological factors (average temperature, average monthly precipitation and average relative humidity), the average proportion of rural population and the average proportion of primary industries over three years (2010–2012). We constructed a geographically weighted logistic regression (GWLR) model in order to explore the associations between the selected variables and confirmed cases of SFTSV. The study showed that: (1) meteorological factors have a strong influence on the SFTSV cover; (2) a GWLR model is suitable for exploring SFTSV cover in mainland China; (3) our findings can be used for predicting high-risk areas and highlighting when meteorological factors pose a risk in order to aid in the implementation of public health strategies. PMID:27845737
Herndon, Nic; Caragea, Doina
2016-01-01
Supervised classifiers are highly dependent on abundant labeled training data. Alternatives for addressing the lack of labeled data include: labeling data (but this is costly and time consuming); training classifiers with abundant data from another domain (however, the classification accuracy usually decreases as the distance between domains increases); or complementing the limited labeled data with abundant unlabeled data from the same domain and learning semi-supervised classifiers (but the unlabeled data can mislead the classifier). A better alternative is to use both the abundant labeled data from a source domain, the limited labeled data and optionally the unlabeled data from the target domain to train classifiers in a domain adaptation setting. We propose two such classifiers, based on logistic regression, and evaluate them for the task of splice site prediction – a difficult and essential step in gene prediction. Our classifiers achieved high accuracy, with highest areas under the precision-recall curve between 50.83% and 82.61%. PMID:26849871
NASA Astrophysics Data System (ADS)
Conoscenti, Christian; Angileri, Silvia; Cappadonia, Chiara; Rotigliano, Edoardo; Agnesi, Valerio; Märker, Michael
2014-01-01
This research aims at characterizing susceptibility conditions to gully erosion by means of GIS and multivariate statistical analysis. The study area is a 9.5 km2 river catchment in central-northern Sicily, where agriculture activities are limited by intense erosion. By means of field surveys and interpretation of aerial images, we prepared a digital map of the spatial distribution of 260 gullies in the study area. In addition, from available thematic maps, a 5 m cell size digital elevation model and field checks, we derived 27 environmental attributes that describe the variability of lithology, land use, topography and road position. These attributes were selected for their potential influence on erosion processes, while the dependent variable was given by presence or absence of gullies within two different types of mapping units: 5 m grid cells and slope units (average size = 2.66 ha). The functional relationships between gully occurrence and the controlling factors were obtained from forward stepwise logistic regression to calculate the probability to host a gully for each mapping unit. In order to train and test the predictive models, three calibration and three validation subsets, of both grid cells and slope units, were randomly selected. Results of validation, based on ROC (receiving operating characteristic) curves, attest for acceptable to excellent accuracies of the models, showing better predictive skill and more stable performance of the susceptibility model based on grid cells.
Dai, Huanping; Micheyl, Christophe
2012-11-01
Psychophysical "reverse-correlation" methods allow researchers to gain insight into the perceptual representations and decision weighting strategies of individual subjects in perceptual tasks. Although these methods have gained momentum, until recently their development was limited to experiments involving only two response categories. Recently, two approaches for estimating decision weights in m-alternative experiments have been put forward. One approach extends the two-category correlation method to m > 2 alternatives; the second uses multinomial logistic regression (MLR). In this article, the relative merits of the two methods are discussed, and the issues of convergence and statistical efficiency of the methods are evaluated quantitatively using Monte Carlo simulations. The results indicate that, for a range of values of the number of trials, the estimated weighting patterns are closer to their asymptotic values for the correlation method than for the MLR method. Moreover, for the MLR method, weight estimates for different stimulus components can exhibit strong correlations, making the analysis and interpretation of measured weighting patterns less straightforward than for the correlation method. These and other advantages of the correlation method, which include computational simplicity and a close relationship to other well-established psychophysical reverse-correlation methods, make it an attractive tool to uncover decision strategies in m-alternative experiments.
NASA Astrophysics Data System (ADS)
Thompson, E. David; Bowling, Bethany V.; Markle, Ross E.
2017-02-01
Studies over the last 30 years have considered various factors related to student success in introductory biology courses. While much of the available literature suggests that the best predictors of success in a college course are prior college grade point average (GPA) and class attendance, faculty often require a valuable predictor of success in those courses wherein the majority of students are in the first semester and have no previous record of college GPA or attendance. In this study, we evaluated the efficacy of the ACT Mathematics subject exam and Lawson's Classroom Test of Scientific Reasoning in predicting success in a major's introductory biology course. A logistic regression was utilized to determine the effectiveness of a combination of scientific reasoning (SR) scores and ACT math (ACT-M) scores to predict student success. In summary, we found that the model—with both SR and ACT-M as significant predictors—could be an effective predictor of student success and thus could potentially be useful in practical decision making for the course, such as directing students to support services at an early point in the semester.
Parodi, Stefano; Dosi, Corrado; Zambon, Antonella; Ferrari, Enrico; Muselli, Marco
2017-03-02
Identifying potential risk factors for problem gambling (PG) is of primary importance for planning preventive and therapeutic interventions. We illustrate a new approach based on the combination of standard logistic regression and an innovative method of supervised data mining (Logic Learning Machine or LLM). Data were taken from a pilot cross-sectional study to identify subjects with PG behaviour, assessed by two internationally validated scales (SOGS and Lie/Bet). Information was obtained from 251 gamblers recruited in six betting establishments. Data on socio-demographic characteristics, lifestyle and cognitive-related factors, and type, place and frequency of preferred gambling were obtained by a self-administered questionnaire. The following variables associated with PG were identified: instant gratification games, alcohol abuse, cognitive distortion, illegal behaviours and having started gambling with a relative or a friend. Furthermore, the combination of LLM and LR indicated the presence of two different types of PG, namely: (a) daily gamblers, more prone to illegal behaviour, with poor money management skills and who started gambling at an early age, and (b) non-daily gamblers, characterised by superstitious beliefs and a higher preference for immediate reward games. Finally, instant gratification games were strongly associated with the number of games usually played. Studies on gamblers habitually frequently betting shops are rare. The finding of different types of PG by habitual gamblers deserves further analysis in larger studies. Advanced data mining algorithms, like LLM, are powerful tools and potentially useful in identifying risk factors for PG.
Liu, Hongjie; Li, Tianhao; Zhan, Sha; Pan, Meilan; Ma, Zhiguo; Li, Chenghua
2016-01-01
Aims. To establish a logistic regression (LR) prediction model for hepatotoxicity of Chinese herbal medicines (HMs) based on traditional Chinese medicine (TCM) theory and to provide a statistical basis for predicting hepatotoxicity of HMs. Methods. The correlations of hepatotoxic and nonhepatotoxic Chinese HMs with four properties, five flavors, and channel tropism were analyzed with chi-square test for two-way unordered categorical data. LR prediction model was established and the accuracy of the prediction by this model was evaluated. Results. The hepatotoxic and nonhepatotoxic Chinese HMs were related with four properties (p < 0.05), and the coefficient was 0.178 (p < 0.05); also they were related with five flavors (p < 0.05), and the coefficient was 0.145 (p < 0.05); they were not related with channel tropism (p > 0.05). There were totally 12 variables from four properties and five flavors for the LR. Four variables, warm and neutral of the four properties and pungent and salty of five flavors, were selected to establish the LR prediction model, with the cutoff value being 0.204. Conclusions. Warm and neutral of the four properties and pungent and salty of five flavors were the variables to affect the hepatotoxicity. Based on such results, the established LR prediction model had some predictive power for hepatotoxicity of Chinese HMs. PMID:27656240
NASA Astrophysics Data System (ADS)
Song, Sutao; Chen, Gongxiang; Zhan, Yu; Zhang, Jiacai; Yao, Li
2014-03-01
Recently, sparse algorithms, such as Sparse Multinomial Logistic Regression (SMLR), have been successfully applied in decoding visual information from functional magnetic resonance imaging (fMRI) data, where the contrast of visual stimuli was predicted by a classifier. The contrast classifier combined brain activities of voxels with sparse weights. For sparse algorithms, the goal is to learn a classifier whose weights distributed as sparse as possible by introducing some prior belief about the weights. There are two ways to introduce a sparse prior constraints for weights: the Automatic Relevance Determination (ARD-SMLR) and Laplace prior (LAP-SMLR). In this paper, we presented comparison results between the ARD-SMLR and LAP-SMLR models in computational time, classification accuracy and voxel selection. Results showed that, for fMRI data, no significant difference was found in classification accuracy between these two methods when voxels in V1 were chosen as input features (totally 1017 voxels). As for computation time, LAP-SMLR was superior to ARD-SMLR; the survived voxels for ARD-SMLR was less than LAP-SMLR. Using simulation data, we confirmed the classification performance for the two SMLR models was sensitive to the sparsity of the initial features, when the ratio of relevant features to the initial features was larger than 0.01, ARD-SMLR outperformed LAP-SMLR; otherwise, LAP-SMLR outperformed LAP-SMLR. Simulation data showed ARD-SMLR was more efficient in selecting relevant features.
NASA Astrophysics Data System (ADS)
Vozinaki, Anthi Eirini K.; Karatzas, George P.; Sibetheros, Ioannis A.; Varouchakis, Emmanouil A.
2014-05-01
Damage curves are the most significant component of the flood loss estimation models. Their development is quite complex. Two types of damage curves exist, historical and synthetic curves. Historical curves are developed from historical loss data from actual flood events. However, due to the scarcity of historical data, synthetic damage curves can be alternatively developed. Synthetic curves rely on the analysis of expected damage under certain hypothetical flooding conditions. A synthetic approach was developed and presented in this work for the development of damage curves, which are subsequently used as the basic input to a flood loss estimation model. A questionnaire-based survey took place among practicing and research agronomists, in order to generate rural loss data based on the responders' loss estimates, for several flood condition scenarios. In addition, a similar questionnaire-based survey took place among building experts, i.e. civil engineers and architects, in order to generate loss data for the urban sector. By answering the questionnaire, the experts were in essence expressing their opinion on how damage to various crop types or building types is related to a range of values of flood inundation parameters, such as floodwater depth and velocity. However, the loss data compiled from the completed questionnaires were not sufficient for the construction of workable damage curves; to overcome this problem, a Weighted Monte Carlo method was implemented, in order to generate extra synthetic datasets with statistical properties identical to those of the questionnaire-based data. The data generated by the Weighted Monte Carlo method were processed via Logistic Regression techniques in order to develop accurate logistic damage curves for the rural and the urban sectors. A Python-based code was developed, which combines the Weighted Monte Carlo method and the Logistic Regression analysis into a single code (WMCLR Python code). Each WMCLR code execution
HE, PENG; ERIKSSON, FRANK; SCHEIKE, THOMAS H.; ZHANG, MEI-JIE
2015-01-01
With competing risks data, one often needs to assess the treatment and covariate effects on the cumulative incidence function. Fine and Gray proposed a proportional hazards regression model for the subdistribution of a competing risk with the assumption that the censoring distribution and the covariates are independent. Covariate-dependent censoring sometimes occurs in medical studies. In this paper, we study the proportional hazards regression model for the subdistribution of a competing risk with proper adjustments for covariate-dependent censoring. We consider a covariate-adjusted weight function by fitting the Cox model for the censoring distribution and using the predictive probability for each individual. Our simulation study shows that the covariate-adjusted weight estimator is basically unbiased when the censoring time depends on the covariates, and the covariate-adjusted weight approach works well for the variance estimator as well. We illustrate our methods with bone marrow transplant data from the Center for International Blood and Marrow Transplant Research (CIBMTR). Here cancer relapse and death in complete remission are two competing risks. PMID:27034534
ERIC Educational Resources Information Center
Hidalgo, Mª Dolores; Gómez-Benito, Juana; Zumbo, Bruno D.
2014-01-01
The authors analyze the effectiveness of the R[superscript 2] and delta log odds ratio effect size measures when using logistic regression analysis to detect differential item functioning (DIF) in dichotomous items. A simulation study was carried out, and the Type I error rate and power estimates under conditions in which only statistical testing…
NASA Astrophysics Data System (ADS)
Ngadisih, Bhandary, Netra P.; Yatabe, Ryuichi; Dahal, Ranjan K.
2016-05-01
West Java Province is the most landslide risky area in Indonesia owing to extreme geo-morphological conditions, climatic conditions and densely populated settlements with immense completed and ongoing development activities. So, a landslide susceptibility map at regional scale in this province is a fundamental tool for risk management and land-use planning. Logistic regression and Artificial Neural Network (ANN) models are the most frequently used tools for landslide susceptibility assessment, mainly because they are capable of handling the nature of landslide data. The main objective of this study is to apply logistic regression and ANN models and compare their performance for landslide susceptibility mapping in volcanic mountains of West Java Province. In addition, the model application is proposed to identify the most contributing factors to landslide events in the study area. The spatial database built in GIS platform consists of landslide inventory, four topographical parameters (slope, aspect, relief, distance to river), three geological parameters (distance to volcano crater, distance to thrust and fault, geological formation), and two anthropogenic parameters (distance to road, land use). The logistic regression model in this study revealed that slope, geological formations, distance to road and distance to volcano are the most influential factors of landslide events while, the ANN model revealed that distance to volcano crater, geological formation, distance to road, and land-use are the most important causal factors of landslides in the study area. Moreover, an evaluation of the model showed that the ANN model has a higher accuracy than the logistic regression model.
Barks, C.S.
1995-01-01
Storm-runoff water-quality data were used to verify and, when appropriate, adjust regional regression models previously developed to estimate urban storm- runoff loads and mean concentrations in Little Rock, Arkansas. Data collected at 5 representative sites during 22 storms from June 1992 through January 1994 compose the Little Rock data base. Comparison of observed values (0) of storm-runoff loads and mean concentrations to the predicted values (Pu) from the regional regression models for nine constituents (chemical oxygen demand, suspended solids, total nitrogen, total ammonia plus organic nitrogen as nitrogen, total phosphorus, dissolved phosphorus, total recoverable copper, total recoverable lead, and total recoverable zinc) shows large prediction errors ranging from 63 to several thousand percent. Prediction errors for six of the regional regression models are less than 100 percent, and can be considered reasonable for water-quality models. Differences between 0 and Pu are due to variability in the Little Rock data base and error in the regional models. Where applicable, a model adjustment procedure (termed MAP-R-P) based upon regression with 0 against Pu was applied to improve predictive accuracy. For 11 of the 18 regional water-quality models, 0 and Pu are significantly correlated, that is much of the variation in 0 is explained by the regional models. Five of these 11 regional models consistently overestimate O; therefore, MAP-R-P can be used to provide a better estimate. For the remaining seven regional models, 0 and Pu are not significanfly correlated, thus neither the unadjusted regional models nor the MAP-R-P is appropriate. A simple estimator, such as the mean of the observed values may be used if the regression models are not appropriate. Standard error of estimate of the adjusted models ranges from 48 to 130 percent. Calibration results may be biased due to the limited data set sizes in the Little Rock data base. The relatively large values of
Tang, Li-Na; Ye, Xiao-Zhou; Yan, Qiu-Ge; Chang, Hong-Juan; Ma, Yu-Qiao; Liu, De-Bin; Li, Zhi-Gen; Yu, Yi-Zhen
2017-02-01
The risk factors of high trait anger of juvenile offenders were explored through questionnaire study in a youth correctional facility of Hubei province, China. A total of 1090 juvenile offenders in Hubei province were investigated by self-compiled social-demographic questionnaire, Childhood Trauma Questionnaire (CTQ), and State-Trait Anger Expression Inventory-II (STAXI-II). The risk factors were analyzed by chi-square tests, correlation analysis, and binary logistic regression analysis with SPSS 19.0. A total of 1082 copies of valid questionnaires were collected. High trait anger group (n=316) was defined as those who scored in the upper 27th percentile of STAXI-II trait anger scale (TAS), and the rest were defined as low trait anger group (n=766). The risk factors associated with high level of trait anger included: childhood emotional abuse, childhood sexual abuse, step family, frequent drug abuse, and frequent internet using (P<0.05 or P<0.01). Birth sequence, number of sibling, ranking in the family, identity of the main care-taker, the education level of care-taker, educational style of care-taker, family income, relationship between parents, social atmosphere of local area, frequent drinking, and frequent smoking did not predict to high level of trait anger (P>0.05). It was suggested that traumatic experience in childhood and unhealthy life style may significantly increase the level of trait anger in adulthood. The risk factors of high trait anger and their effects should be taken into consideration seriously.
NASA Astrophysics Data System (ADS)
Yao, W.; Poleswki, P.; Krzystek, P.
2016-06-01
The recent success of deep convolutional neural networks (CNN) on a large number of applications can be attributed to large amounts of available training data and increasing computing power. In this paper, a semantic pixel labelling scheme for urban areas using multi-resolution CNN and hand-crafted spatial-spectral features of airborne remotely sensed data is presented. Both CNN and hand-crafted features are applied to image/DSM patches to produce per-pixel class probabilities with a L1-norm regularized logistical regression classifier. The evidence theory infers a degree of belief for pixel labelling from different sources to smooth regions by handling the conflicts present in the both classifiers while reducing the uncertainty. The aerial data used in this study were provided by ISPRS as benchmark datasets for 2D semantic labelling tasks in urban areas, which consists of two data sources from LiDAR and color infrared camera. The test sites are parts of a city in Germany which is assumed to consist of typical object classes including impervious surfaces, trees, buildings, low vegetation, vehicles and clutter. The evaluation is based on the computation of pixel-based confusion matrices by random sampling. The performance of the strategy with respect to scene characteristics and method combination strategies is analyzed and discussed. The competitive classification accuracy could be not only explained by the nature of input data sources: e.g. the above-ground height of nDSM highlight the vertical dimension of houses, trees even cars and the nearinfrared spectrum indicates vegetation, but also attributed to decision-level fusion of CNN's texture-based approach with multichannel spatial-spectral hand-crafted features based on the evidence combination theory.
Chang, Ling-Hui; Tsai, Athena Yi-Jung; Huang, Wen-Ni
2016-01-01
Because resources for long-term care services are limited, timely and appropriate referral for rehabilitation services is critical for optimizing clients’ functions and successfully integrating them into the community. We investigated which client characteristics are most relevant in predicting Taiwan’s community-based occupational therapy (OT) service referral based on experts’ beliefs. Data were collected in face-to-face interviews using the Multidimensional Assessment Instrument (MDAI). Community-dwelling participants (n = 221) ≥ 18 years old who reported disabilities in the previous National Survey of Long-term Care Needs in Taiwan were enrolled. The standard for referral was the judgment and agreement of two experienced occupational therapists who reviewed the results of the MDAI. Logistic regressions and Generalized Additive Models were used for analysis. Two predictive models were proposed, one using basic activities of daily living (BADLs) and one using instrumental ADLs (IADLs). Dementia, psychiatric disorders, cognitive impairment, joint range-of-motion limitations, fear of falling, behavioral or emotional problems, expressive deficits (in the BADL-based model), and limitations in IADLs or BADLs were significantly correlated with the need for referral. Both models showed high area under the curve (AUC) values on receiver operating curve testing (AUC = 0.977 and 0.972, respectively). The probability of being referred for community OT services was calculated using the referral algorithm. The referral protocol facilitated communication between healthcare professionals to make appropriate decisions for OT referrals. The methods and findings should be useful for developing referral protocols for other long-term care services. PMID:26863544
Alavi, Seyyed Salman; Mohammadi, Mohammad Reza; Souri, Hamid; Mohammadi Kalhori, Soroush; Jannatifard, Fereshteh; Sepahbodi, Ghazal
2017-01-01
Background: The aim of this study was to evaluate the effect of variables such as personality traits, driving behavior and mental illness on road traffic accidents among the drivers with accidents and those without road crash. Methods: In this cohort study, 800 bus and truck drivers were recruited. Participants were selected among drivers who referred to Imam Sajjad Hospital (Tehran, Iran) during 2013-2015. The Manchester driving behavior questionnaire (MDBQ), big five personality test (NEO personality inventory) and semi-structured interview (schizophrenia and affective disorders scale) were used. After two years, we surveyed all accidents due to human factors that involved the recruited drivers. The data were analyzed using the SPSS software by performing the descriptive statistics, t-test, and multiple logistic regression analysis methods. P values less than 0.05 were considered statistically significant. Results: In terms of controlling the effective and demographic variables, the findings revealed significant differences between the two groups of drivers that were and were not involved in road accidents. In addition, it was found that depression and anxiety could increase the odds ratio (OR) of road accidents by 2.4- and 2.7-folds, respectively (P=0.04, P=0.004). It is noteworthy to mention that neuroticism alone can increase the odds of road accidents by 1.1-fold (P=0.009), but other personality factors did not have a significant effect on the equation. Conclusion: The results revealed that some mental disorders affect the incidence of road collisions. Considering the importance and sensitivity of driving behavior, it is necessary to evaluate multiple psychological factors influencing drivers before and after receiving or renewing their driver’s license. PMID:28293047
Mao, Hui-Fen; Chang, Ling-Hui; Tsai, Athena Yi-Jung; Huang, Wen-Ni; Wang, Jye
2016-01-01
Because resources for long-term care services are limited, timely and appropriate referral for rehabilitation services is critical for optimizing clients' functions and successfully integrating them into the community. We investigated which client characteristics are most relevant in predicting Taiwan's community-based occupational therapy (OT) service referral based on experts' beliefs. Data were collected in face-to-face interviews using the Multidimensional Assessment Instrument (MDAI). Community-dwelling participants (n = 221) ≥ 18 years old who reported disabilities in the previous National Survey of Long-term Care Needs in Taiwan were enrolled. The standard for referral was the judgment and agreement of two experienced occupational therapists who reviewed the results of the MDAI. Logistic regressions and Generalized Additive Models were used for analysis. Two predictive models were proposed, one using basic activities of daily living (BADLs) and one using instrumental ADLs (IADLs). Dementia, psychiatric disorders, cognitive impairment, joint range-of-motion limitations, fear of falling, behavioral or emotional problems, expressive deficits (in the BADL-based model), and limitations in IADLs or BADLs were significantly correlated with the need for referral. Both models showed high area under the curve (AUC) values on receiver operating curve testing (AUC = 0.977 and 0.972, respectively). The probability of being referred for community OT services was calculated using the referral algorithm. The referral protocol facilitated communication between healthcare professionals to make appropriate decisions for OT referrals. The methods and findings should be useful for developing referral protocols for other long-term care services.
Boutilier, J; Chan, T; Lee, T; Craig, T; Sharpe, M
2014-06-15
Purpose: To develop a statistical model that predicts optimization objective function weights from patient geometry for intensity-modulation radiotherapy (IMRT) of prostate cancer. Methods: A previously developed inverse optimization method (IOM) is applied retrospectively to determine optimal weights for 51 treated patients. We use an overlap volume ratio (OVR) of bladder and rectum for different PTV expansions in order to quantify patient geometry in explanatory variables. Using the optimal weights as ground truth, we develop and train a logistic regression (LR) model to predict the rectum weight and thus the bladder weight. Post hoc, we fix the weights of the left femoral head, right femoral head, and an artificial structure that encourages conformity to the population average while normalizing the bladder and rectum weights accordingly. The population average of objective function weights is used for comparison. Results: The OVR at 0.7cm was found to be the most predictive of the rectum weights. The LR model performance is statistically significant when compared to the population average over a range of clinical metrics including bladder/rectum V53Gy, bladder/rectum V70Gy, and mean voxel dose to the bladder, rectum, CTV, and PTV. On average, the LR model predicted bladder and rectum weights that are both 63% closer to the optimal weights compared to the population average. The treatment plans resulting from the LR weights have, on average, a rectum V70Gy that is 35% closer to the clinical plan and a bladder V70Gy that is 43% closer. Similar results are seen for bladder V54Gy and rectum V54Gy. Conclusion: Statistical modelling from patient anatomy can be used to determine objective function weights in IMRT for prostate cancer. Our method allows the treatment planners to begin the personalization process from an informed starting point, which may lead to more consistent clinical plans and reduce overall planning time.
Wei, Peng; Tang, Hongwei; Li, Donghui
2014-01-01
Most complex human diseases are likely the consequence of the joint actions of genetic and environmental factors. Identification of gene-environment (GxE) interactions not only contributes to a better understanding of the disease mechanisms, but also improves disease risk prediction and targeted intervention. In contrast to the large number of genetic susceptibility loci discovered by genome-wide association studies, there have been very few successes in identifying GxE interactions which may be partly due to limited statistical power and inaccurately measured exposures. While existing statistical methods only consider interactions between genes and static environmental exposures, many environmental/lifestyle factors, such as air pollution and diet, change over time, and cannot be accurately captured at one measurement time point or by simply categorizing into static exposure categories. There is a dearth of statistical methods for detecting gene by time-varying environmental exposure interactions. Here we propose a powerful functional logistic regression (FLR) approach to model the time-varying effect of longitudinal environmental exposure and its interaction with genetic factors on disease risk. Capitalizing on the powerful functional data analysis framework, our proposed FLR model is capable of accommodating longitudinal exposures measured at irregular time points and contaminated by measurement errors, commonly encountered in observational studies. We use extensive simulations to show that the proposed method can control the Type I error and is more powerful than alternative ad hoc methods. We demonstrate the utility of this new method using data from a case-control study of pancreatic cancer to identify the windows of vulnerability of lifetime body mass index on the risk of pancreatic cancer as well as genes which may modify this association. PMID:25219575
JACOB, BENJAMIN G.; NOVAK, ROBERT J.; TOE, LAURENT; SANFO, MOUSSA S.; AFRIYIE, ABENA N.; IBRAHIM, MOHAMMED A.; GRIFFITH, DANIEL A.; UNNASCH, THOMAS R.
2013-01-01
The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l. a major black-fly vector of Onchoceriasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and, (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S.damnosum s.l. riverine larval habitat explanatory attributes regardless how they are treated (e.g., independent, autoregressive, Toeplitz, etc). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data was aggregated into proc genmod. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data was then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed also in ArcGIS using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). This data was overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61m wavbands ). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter
Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R
2012-01-01
The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l. a major black-fly vector of Onchoceriasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and, (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S.damnosum s.l. riverine larval habitat explanatory attributes regardless how they are treated (e.g., independent, autoregressive, Toeplitz, etc). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data was aggregated into proc genmod. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data was then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed also in ArcGIS using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). This data was overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61m wavbands ). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter
Stratton, Kelly G; Cook, Andrea J; Jackson, Lisa A; Nelson, Jennifer C
2015-03-30
Sequential methods are well established for randomized clinical trials (RCTs), and their use in observational settings has increased with the development of national vaccine and drug safety surveillance systems that monitor large healthcare databases. Observational safety monitoring requires that sequential testing methods be better equipped to incorporate confounder adjustment and accommodate rare adverse events. New methods designed specifically for observational surveillance include a group sequential likelihood ratio test that uses exposure matching and generalized estimating equations approach that involves regression adjustment. However, little is known about the statistical performance of these methods or how they compare to RCT methods in both observational and rare outcome settings. We conducted a simulation study to determine the type I error, power and time-to-surveillance-end of group sequential likelihood ratio test, generalized estimating equations and RCT methods that construct group sequential Lan-DeMets boundaries using data from a matched (group sequential Lan-DeMets-matching) or unmatched regression (group sequential Lan-DeMets-regression) setting. We also compared the methods using data from a multisite vaccine safety study. All methods had acceptable type I error, but regression methods were more powerful, faster at detecting true safety signals and less prone to implementation difficulties with rare events than exposure matching methods. Method performance also depended on the distribution of information and extent of confounding by site. Our results suggest that choice of sequential method, especially the confounder control strategy, is critical in rare event observational settings. These findings provide guidance for choosing methods in this context and, in particular, suggest caution when conducting exposure matching.
NASA Astrophysics Data System (ADS)
Ettinger, Susanne; Mounaud, Loïc; Magill, Christina; Yao-Lafourcade, Anne-Françoise; Thouret, Jean-Claude; Manville, Vern; Negulescu, Caterina; Zuccaro, Giulio; De Gregorio, Daniela; Nardone, Stefano; Uchuchoque, Juan Alexis Luque; Arguedas, Anita; Macedo, Luisa; Manrique Llerena, Nélida
2016-10-01
bivariate analyses were applied to better characterize each vulnerability parameter. Multiple corresponding analyses revealed strong relationships between the "Distance to channel or bridges", "Structural building type", "Building footprint" and the observed damage. Logistic regression enabled quantification of the contribution of each explanatory parameter to potential damage, and determination of the significant parameters that express the damage susceptibility of a building. The model was applied 200 times on different calibration and validation data sets in order to examine performance. Results show that 90% of these tests have a success rate of more than 67%. Probabilities (at building scale) of experiencing different damage levels during a future event similar to the 8 February 2013 flash flood are the major outcomes of this study.
NASA Astrophysics Data System (ADS)
Heckmann, T.; Gegg, K.; Gegg, A.; Becht, M.
2014-02-01
Predictive spatial modelling is an important task in natural hazard assessment and regionalisation of geomorphic processes or landforms. Logistic regression is a multivariate statistical approach frequently used in predictive modelling; it can be conducted stepwise in order to select from a number of candidate independent variables those that lead to the best model. In our case study on a debris flow susceptibility model, we investigate the sensitivity of model selection and quality to different sample sizes in light of the following problem: on the one hand, a sample has to be large enough to cover the variability of geofactors within the study area, and to yield stable and reproducible results; on the other hand, the sample must not be too large, because a large sample is likely to violate the assumption of independent observations due to spatial autocorrelation. Using stepwise model selection with 1000 random samples for a number of sample sizes between n = 50 and n = 5000, we investigate the inclusion and exclusion of geofactors and the diversity of the resulting models as a function of sample size; the multiplicity of different models is assessed using numerical indices borrowed from information theory and biodiversity research. Model diversity decreases with increasing sample size and reaches either a local minimum or a plateau; even larger sample sizes do not further reduce it, and they approach the upper limit of sample size given, in this study, by the autocorrelation range of the spatial data sets. In this way, an optimised sample size can be derived from an exploratory analysis. Model uncertainty due to sampling and model selection, and its predictive ability, are explored statistically and spatially through the example of 100 models estimated in one study area and validated in a neighbouring area: depending on the study area and on sample size, the predicted probabilities for debris flow release differed, on average, by 7 to 23 percentage points. In
NASA Astrophysics Data System (ADS)
Heckmann, T.; Gegg, K.; Gegg, A.; Becht, M.
2013-06-01
Predictive spatial modelling is an important task in natural hazard assessment and regionalisation of geomorphic processes or landforms. Logistic regression is a multivariate statistical approach frequently used in predictive modelling; it can be conducted stepwise in order to select from a number of candidate independent variables those that lead to the best model. In our case study on a debris flow susceptibility model, we investigate the sensitivity of model selection and quality to different sample sizes in light of the following problem: on the one hand, a sample has to be large enough to cover the variability of geofactors within the study area, and to yield stable results; on the other hand, the sample must not be too large, because a large sample is likely to violate the assumption of independent observations due to spatial autocorrelation. Using stepwise model selection with 1000 random samples for a number of sample sizes between n = 50 and n = 5000, we investigate the inclusion and exclusion of geofactors and the diversity of the resulting models as a function of sample size; the multiplicity of different models is assessed using numerical indices borrowed from information theory and biodiversity research. Model diversity decreases with increasing sample size and reaches either a local minimum or a plateau; even larger sample sizes do not further reduce it, and approach the upper limit of sample size given, in this study, by the autocorrelation range of the spatial datasets. In this way, an optimised sample size can be derived from an exploratory analysis. Model uncertainty due to sampling and model selection, and its predictive ability, are explored statistically and spatially through the example of 100 models estimated in one study area and validated in a neighbouring area: depending on the study area and on sample size, the predicted probabilities for debris flow release differed, on average, by 7 to 23 percentage points. In view of these results, we
NASA Astrophysics Data System (ADS)
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
2015-04-01
first phase of the work addressed to identify the spatial relationships between the landslides location and the 13 related factors by using the Frequency Ratio bivariate statistical method. The analysis was then carried out by adopting a multivariate statistical approach, according to the Logistic Regression technique and Random Forests technique that gave best results in terms of AUC. The models were performed and evaluated with different sample sizes and also taking into account the temporal variation of input variables such as burned areas by wildfire. The most significant outcome of this work are: the relevant influence of the sample size on the model results and the strong importance of some environmental factors (e.g. land use and wildfires) for the identification of the depletion zones of extremely rapid shallow landslides.
2009-01-01
Background Laser-Doppler imaging (LDI) of cutaneous blood flow is beginning to be used by burn surgeons to predict the healing time of burn wounds; predicted healing time is used to determine wound treatment as either dressings or surgery. In this paper, we do a statistical analysis of the performance of the technique. Methods We used data from a study carried out by five burn centers: LDI was done once between days 2 to 5 post burn, and healing was assessed at both 14 days and 21 days post burn. Random-effects ordinal logistic regression and other models such as the continuation ratio model were used to model healing-time as a function of the LDI data, and of demographic and wound history variables. Statistical methods were also used to study the false-color palette, which enables the laser-Doppler imager to be used by clinicians as a decision-support tool. Results Overall performance is that diagnoses are over 90% correct. Related questions addressed were what was the best blood flow summary statistic and whether, given the blood flow measurements, demographic and observational variables had any additional predictive power (age, sex, race, % total body surface area burned (%TBSA), site and cause of burn, day of LDI scan, burn center). It was found that mean laser-Doppler flux over a wound area was the best statistic, and that, given the same mean flux, women recover slightly more slowly than men. Further, the likely degradation in predictive performance on moving to a patient group with larger %TBSA than those in the data sample was studied, and shown to be small. Conclusion Modeling healing time is a complex statistical problem, with random effects due to multiple burn areas per individual, and censoring caused by patients missing hospital visits and undergoing surgery. This analysis applies state-of-the art statistical methods such as the bootstrap and permutation tests to a medical problem of topical interest. New medical findings are that age and %TBSA are
Inokuchi, Shota; Kitayama, Tetsushi; Fujii, Koji; Nakahara, Hiroaki; Nakanishi, Hiroaki; Saito, Kazuyuki; Mizuno, Natsuko; Sekiguchi, Kazumasa
2016-03-01
Phenomena called allele dropouts are often observed in crime stain profiles. Allele dropouts are generated because one of a pair of heterozygous alleles is underrepresented by stochastic influences and is indicated by a low peak detection threshold. Therefore, it is important that such risks are statistically evaluated. In recent years, attempts to interpret allele dropout probabilities by logistic regression using the information on peak heights have been reported. However, these previous studies are limited to the use of a human identification kit and fragment analyzer. In the present study, we calculated allele dropout probabilities by logistic regression using contemporary capillary electrophoresis instruments, 3500xL Genetic Analyzer and 3130xl Genetic Analyzer with various commercially available human identification kits such as AmpFℓSTR® Identifiler® Plus PCR Amplification Kit. Furthermore, the differences in logistic curves between peak detection thresholds using analytical threshold (AT) and values recommended by the manufacturer were compared. The standard logistic curves for calculating allele dropout probabilities from the peak height of sister alleles were characterized. The present study confirmed that ATs were lower than the values recommended by the manufacturer in human identification kits; therefore, it is possible to reduce allele dropout probabilities and obtain more information using AT as the peak detection threshold.
Held, Elizabeth; Cape, Joshua; Tintle, Nathan
2016-01-01
Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
Sargolzaie, Narjes; Miri-Moghaddam, Ebrahim
2014-01-01
The most common differential diagnosis of β-thalassemia (β-thal) trait is iron deficiency anemia. Several red blood cell equations were introduced during different studies for differential diagnosis between β-thal trait and iron deficiency anemia. Due to genetic variations in different regions, these equations cannot be useful in all population. The aim of this study was to determine a native equation with high accuracy for differential diagnosis of β-thal trait and iron deficiency anemia for the Sistan and Baluchestan population by logistic regression analysis. We selected 77 iron deficiency anemia and 100 β-thal trait cases. We used binary logistic regression analysis and determined best equations for probability prediction of β-thal trait against iron deficiency anemia in our population. We compared diagnostic values and receiver operative characteristic (ROC) curve related to this equation and another 10 published equations in discriminating β-thal trait and iron deficiency anemia. The binary logistic regression analysis determined the best equation for best probability prediction of β-thal trait against iron deficiency anemia with area under curve (AUC) 0.998. Based on ROC curves and AUC, Green & King, England & Frazer, and then Sirdah indices, respectively, had the most accuracy after our equation. We suggest that to get the best equation and cut-off in each region, one needs to evaluate specific information of each region, specifically in areas where populations are homogeneous, to provide a specific formula for differentiating between β-thal trait and iron deficiency anemia.
Yi, Honggang; Wo, Hongmei; Zhao, Yang; Zhang, Ruyang; Dai, Junchen; Jin, Guangfu; Ma, Hongxia; Wu, Tangchun; Hu, Zhibin; Lin, Dongxin; Shen, Hongbing; Chen, Feng
2015-01-01
Abstract With recent advances in biotechnology, genome-wide association study (GWAS) has been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, typical statistical strategy is traditional logistical regression (LR) based on single-locus analysis. However, such a single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR), partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high dimensional genomic data. However, the performance of these methods is still not clear, especially in GWAS. We conducted simulations and real data application to compare the type I error and power of PC-LR, PLS-LR and LR applicable to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS can reasonably control type I error under null hypothesis. On contrast, LR, which is corrected by Bonferroni method, was more conserved in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and they both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped ones and with a small effective size in simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data. PMID:26243516
Ho Hoang, Khai-Long; Mombaur, Katja
2015-10-15
Dynamic modeling of the human body is an important tool to investigate the fundamentals of the biomechanics of human movement. To model the human body in terms of a multi-body system, it is necessary to know the anthropometric parameters of the body segments. For young healthy subjects, several data sets exist that are widely used in the research community, e.g. the tables provided by de Leva. None such comprehensive anthropometric parameter sets exist for elderly people. It is, however, well known that body proportions change significantly during aging, e.g. due to degenerative effects in the spine, such that parameters for young people cannot be used for realistically simulating the dynamics of elderly people. In this study, regression equations are derived from the inertial parameters, center of mass positions, and body segment lengths provided by de Leva to be adjustable to the changes in proportion of the body parts of male and female humans due to aging. Additional adjustments are made to the reference points of the parameters for the upper body segments as they are chosen in a more practicable way in the context of creating a multi-body model in a chain structure with the pelvis representing the most proximal segment.
Heller, J; Innocent, G T; Denwood, M; Reid, S W J; Kelly, L; Mellor, D J
2011-05-01
Meticillin-resistant Staphylococcus aureus (MRSA) is an important nosocomial and community-acquired pathogen with zoonotic potential. The relationship between MRSA in humans and companion animals is poorly understood. This study presents a quantitative exposure assessment, based on expert opinion and published data, in the form of a second order stochastic simulation model with accompanying logistic regression sensitivity analysis that aims to define the most important factors for MRSA acquisition in dogs. The simulation model was parameterised using expert opinion estimates, along with published and unpublished data. The outcome of the model was biologically plausible and found to be dominated by uncertainty over variability. The sensitivity analysis, in the form of four separate logistic regression models, found that both veterinary and non-veterinary routes of acquisition of MRSA are likely to be relevant for dogs. The effects of exposure to, and probability of, transmission of MRSA from the home environment were ranked as the most influential predictors in all sensitivity analyses, although it is unlikely that this environmental source of MRSA is independent of alternative sources of MRSA (human and/or animal). Exposure to and transmission from MRSA positive family members were also found to be influential for acquisition of MRSA in pet dogs, along with veterinary clinic attendance and, while exposure to and transmission from the veterinary clinic environment was also found to be influential, it was difficult to differentiate between the importance of independent sources of MRSA within the veterinary clinic. The implementation of logistic regression analyses directly to the input/output relationship within the simulation model presented in this paper represents the application of a variance based sensitivity analysis technique in the area of veterinary medicine and is a useful means of ranking the relative importance of input variables.
Liu, Shujie; Kawamoto, Taisuke; Morita, Osamu; Yoshinari, Kouichi; Honda, Hiroshi
2017-03-01
Chemical exposure often results in liver hypertrophy in animal tests, characterized by increased liver weight, hepatocellular hypertrophy, and/or cell proliferation. While most of these changes are considered adaptive responses, there is concern that they may be associated with carcinogenesis. In this study, we have employed a toxicogenomic approach using a logistic ridge regression model to identify genes responsible for liver hypertrophy and hypertrophic hepatocarcinogenesis and to develop a predictive model for assessing hypertrophy-inducing compounds. Logistic regression models have previously been used in the quantification of epidemiological risk factors. DNA microarray data from the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System were used to identify hypertrophy-related genes that are expressed differently in hypertrophy induced by carcinogens and non-carcinogens. Data were collected for 134 chemicals (72 non-hypertrophy-inducing chemicals, 27 hypertrophy-inducing non-carcinogenic chemicals, and 15 hypertrophy-inducing carcinogenic compounds). After applying logistic ridge regression analysis, 35 genes for liver hypertrophy (e.g., Acot1 and Abcc3) and 13 genes for hypertrophic hepatocarcinogenesis (e.g., Asns and Gpx2) were selected. The predictive models built using these genes were 94.8% and 82.7% accurate, respectively. Pathway analysis of the genes indicates that, aside from a xenobiotic metabolism-related pathway as an adaptive response for liver hypertrophy, amino acid biosynthesis and oxidative responses appear to be involved in hypertrophic hepatocarcinogenesis. Early detection and toxicogenomic characterization of liver hypertrophy using our models may be useful for predicting carcinogenesis. In addition, the identified genes provide novel insight into discrimination between adverse hypertrophy associated with carcinogenesis and adaptive hypertrophy in risk assessment.
2016-01-01
Understanding the relationship between physiological measurements from human subjects and their demographic data is important within both the biometric and forensic domains. In this paper we explore the relationship between measurements of the human hand and a range of demographic features. We assess the ability of linear regression and machine learning classifiers to predict demographics from hand features, thereby providing evidence on both the strength of relationship and the key features underpinning this relationship. Our results show that we are able to predict sex, height, weight and foot size accurately within various data-range bin sizes, with machine learning classification algorithms out-performing linear regression in most situations. In addition, we identify the features used to provide these relationships applicable across multiple applications. PMID:27806075
Miguel-Hurtado, Oscar; Guest, Richard; Stevenage, Sarah V; Neil, Greg J; Black, Sue
2016-01-01
Understanding the relationship between physiological measurements from human subjects and their demographic data is important within both the biometric and forensic domains. In this paper we explore the relationship between measurements of the human hand and a range of demographic features. We assess the ability of linear regression and machine learning classifiers to predict demographics from hand features, thereby providing evidence on both the strength of relationship and the key features underpinning this relationship. Our results show that we are able to predict sex, height, weight and foot size accurately within various data-range bin sizes, with machine learning classification algorithms out-performing linear regression in most situations. In addition, we identify the features used to provide these relationships applicable across multiple applications.
Zhang, B; Liang, X L; Gao, H Y; Ye, L S; Wang, Y G
2016-05-13
We evaluated the application of three machine learning algorithms, including logistic regression, support vector machine and back-propagation neural network, for diagnosing congenital heart disease and colorectal cancer. By inspecting related serum tumor marker levels in colorectal cancer patients and healthy subjects, early diagnosis models for colorectal cancer were built using three machine learning algorithms to assess their corresponding diagnostic values. Except for serum alpha-fetoprotein, the levels of 11 other serum markers of patients in the colorectal cancer group were higher than those in the benign colorectal cancer group (P < 0.05). The results of logistic regression analysis indicted that individual detection of serum carcinoembryonic antigens, CA199, CA242, CA125, and CA153 and their combined detection was effective for diagnosing colorectal cancer. Combined detection had a better diagnostic effect with a sensitivity of 94.2% and specificity of 97.7%; combining serum carcinoembryonic antigens, CA199, CA242, CA125, and CA153, with the support vector machine diagnosis model and back-propagation, a neural network diagnosis model was built with diagnostic accuracies of 82 and 75%, sensitivities of 85 and 80%, and specificities of 80 and 70%, respectively. Colorectal cancer diagnosis models based on the three machine learning algorithms showed high diagnostic value and can help obtain evidence for the early diagnosis of colorectal cancer.
Mei, Suyu; Zhang, Kun
2016-01-01
Protein-protein interaction (PPI) networks are naturally viewed as infrastructure to infer signalling pathways. The descriptors of signal events between two interacting proteins such as upstream/downstream signal flow, activation/inhibition relationship and protein modification are indispensable for inferring signalling pathways from PPI networks. However, such descriptors are not available in most cases as most PPI networks are seldom semantically annotated. In this work, we extend ℓ2-regularized logistic regression to the scenario of multi-label learning for predicting the activation/inhibition relationships in human PPI networks. The phenomenon that both activation and inhibition relationships exist between two interacting proteins is computationally modelled by multi-label learning framework. The problem of GO (gene ontology) sparsity is tackled by introducing the homolog knowledge as independent homolog instances. ℓ2-regularized logistic regression is accordingly adopted here to penalize the homolog noise and to reduce the computational complexity of the double-sized training data. Computational results show that the proposed method achieves satisfactory multi-label learning performance and outperforms the existing phenotype correlation method on the experimental data of Drosophila melanogaster. Several predictions have been validated against recent literature. The predicted activation/inhibition relationships in human PPI networks are provided in the supplementary file for further biomedical research. PMID:27819359
NASA Astrophysics Data System (ADS)
Dixon, Barnali
2009-09-01
Accurate and inexpensive identification of potentially contaminated wells is critical for water resources protection and management. The objectives of this study are to 1) assess the suitability of approximation tools such as neural networks (NN) and support vector machines (SVM) integrated in a geographic information system (GIS) for identifying contaminated wells and 2) use logistic regression and feature selection methods to identify significant variables for transporting contaminants in and through the soil profile to the groundwater. Fourteen GIS derived soil hydrogeologic and landuse parameters were used as initial inputs in this study. Well water quality data (nitrate-N) from 6,917 wells provided by Florida Department of Environmental Protection (USA) were used as an output target class. The use of the logistic regression and feature selection methods reduced the number of input variables to nine. Receiver operating characteristics (ROC) curves were used for evaluation of these approximation tools. Results showed superior performance with the NN as compared to SVM especially on training data while testing results were comparable. Feature selection did not improve accuracy; however, it helped increase the sensitivity or true positive rate (TPR). Thus, a higher TPR was obtainable with fewer variables.
Belletti, Nicoletta; Kamdem, Sylvain Sado; Patrignani, Francesca; Lanciotti, Rosalba; Covelli, Alessandro; Gardini, Fausto
2007-09-01
The combined effects of a mild heat treatment (55 degrees C) and the presence of three aroma compounds [citron essential oil, citral, and (E)-2-hexenal] on the spoilage of noncarbonated beverages inoculated with different amounts of a Saccharomyces cerevisiae strain were evaluated. The results, expressed as growth/no growth, were elaborated using a logistic regression in order to assess the probability of beverage spoilage as a function of thermal treatment length, concentration of flavoring agents, and yeast inoculum. The logit models obtained for the three substances were extremely precise. The thermal treatment alone, even if prolonged for 20 min, was not able to prevent yeast growth. However, the presence of increasing concentrations of aroma compounds improved the stability of the products. The inhibiting effect of the compounds was enhanced by a prolonged thermal treatment. In fact, it influenced the vapor pressure of the molecules, which can easily interact within microbial membranes when they are in gaseous form. (E)-2-Hexenal showed a threshold level, related to initial inoculum and thermal treatment length, over which yeast growth was rapidly inhibited. Concentrations over 100 ppm of citral and thermal treatment longer than 16 min allowed a 90% probability of stability for bottles inoculated with 10(5) CFU/bottle. Citron gave the most interesting responses: beverages with 500 ppm of essential oil needed only 3 min of treatment to prevent yeast growth. In this framework, the logistic regression proved to be an important tool to study alternative hurdle strategies for the stabilization of noncarbonated beverages.
Huo, Yuankai; Aboud, Katherine; Kang, Hakmook; Cutting, Laurie E; Landman, Bennett A
2016-10-01
Understanding brain volumetry is essential to understand neurodevelopment and disease. Historically, age-related changes have been studied in detail for specific age ranges (e.g., early childhood, teen, young adults, elderly, etc.) or more sparsely sampled for wider considerations of lifetime aging. Recent advancements in data sharing and robust processing have made available considerable quantities of brain images from normal, healthy volunteers. However, existing analysis approaches have had difficulty addressing (1) complex volumetric developments on the large cohort across the life time (e.g., beyond cubic age trends), (2) accounting for confound effects, and (3) maintaining an analysis framework consistent with the general linear model (GLM) approach pervasive in neuroscience. To address these challenges, we propose to use covariate-adjusted restricted cubic spline (C-RCS) regression within a multi-site cross-sectional framework. This model allows for flexible consideration of non-linear age-associated patterns while accounting for traditional covariates and interaction effects. As a demonstration of this approach on lifetime brain aging, we derive normative volumetric trajectories and 95% confidence intervals from 5111 healthy patients from 64 sites while accounting for confounding sex, intracranial volume and field strength effects. The volumetric results are shown to be consistent with traditional studies that have explored more limited age ranges using single-site analyses. This work represents the first integration of C-RCS with neuroimaging and the derivation of structural covariance networks (SCNs) from a large study of multi-site, cross-sectional data.
Camargos, Vitor Passos; César, Cibele Comini; Caiaffa, Waleska Teixeira; Xavier, Cesar Coelho; Proietti, Fernando Augusto
2011-12-01
Researchers in the health field often deal with the problem of incomplete databases. Complete Case Analysis (CCA), which restricts the analysis to subjects with complete data, reduces the sample size and may result in biased estimates. Based on statistical grounds, Multiple Imputation (MI) uses all collected data and is recommended as an alternative to CCA. Data from the study Saúde em Beagá, attended by 4,048 adults from two of nine health districts in the city of Belo Horizonte, Minas Gerais State, Brazil, in 2008-2009, were used to evaluate CCA and different MI approaches in the context of logistic models with incomplete covariate data. Peculiarities in some variables in this study allowed analyzing a situation in which the missing covariate data are recovered and thus the results before and after recovery are compared. Based on the analysis, even the more simplistic MI approach performed better than CCA, since it was closer to the post-recovery results.
de Vet, Emely; Chinapaw, Mai JM; de Boer, Michiel; Seidell, Jacob C; Brug, Johannes
2014-01-01
Background Playing video games contributes substantially to sedentary behavior in youth. A new generation of video games—active games—seems to be a promising alternative to sedentary games to promote physical activity and reduce sedentary behavior. At this time, little is known about correlates of active and non-active gaming among adolescents. Objective The objective of this study was to examine potential personal, social, and game-related correlates of both active and non-active gaming in adolescents. Methods A survey assessing game behavior and potential personal, social, and game-related correlates was conducted among adolescents (12-16 years, N=353) recruited via schools. Multivariable, multilevel logistic regression analyses, adjusted for demographics (age, sex and educational level of adolescents), were conducted to examine personal, social, and game-related correlates of active gaming ≥1 hour per week (h/wk) and non-active gaming >7 h/wk. Results Active gaming ≥1 h/wk was significantly associated with a more positive attitude toward active gaming (OR 5.3, CI 2.4-11.8; P<.001), a less positive attitude toward non-active games (OR 0.30, CI 0.1-0.6; P=.002), a higher score on habit strength regarding gaming (OR 1.9, CI 1.2-3.2; P=.008) and having brothers/sisters (OR 6.7, CI 2.6-17.1; P<.001) and friends (OR 3.4, CI 1.4-8.4; P=.009) who spend more time on active gaming and a little bit lower score on game engagement (OR 0.95, CI 0.91-0.997; P=.04). Non-active gaming >7 h/wk was significantly associated with a more positive attitude toward non-active gaming (OR 2.6, CI 1.1-6.3; P=.035), a stronger habit regarding gaming (OR 3.0, CI 1.7-5.3; P<.001), having friends who spend more time on non-active gaming (OR 3.3, CI 1.46-7.53; P=.004), and a more positive image of a non-active gamer (OR 2, CI 1.07–3.75; P=.03). Conclusions Various factors were significantly associated with active gaming ≥1 h/wk and non-active gaming >7 h/wk. Active gaming is most
NASA Astrophysics Data System (ADS)
WU, Chunhung
2015-04-01
The research built the original logistic regression landslide susceptibility model (abbreviated as or-LRLSM) and landslide ratio-based ogistic regression landslide susceptibility model (abbreviated as lr-LRLSM), compared the performance and explained the error source of two models. The research assumes that the performance of the logistic regression model can be better if the distribution of landslide ratio and weighted value of each variable is similar. Landslide ratio is the ratio of landslide area to total area in the specific area and an useful index to evaluate the seriousness of landslide disaster in Taiwan. The research adopted the landside inventory induced by 2009 Typhoon Morakot in the Chishan watershed, which was the most serious disaster event in the last decade, in Taiwan. The research adopted the 20 m grid as the basic unit in building the LRLSM, and six variables, including elevation, slope, aspect, geological formation, accumulated rainfall, and bank erosion, were included in the two models. The six variables were divided as continuous variables, including elevation, slope, and accumulated rainfall, and categorical variables, including aspect, geological formation and bank erosion in building the or-LRLSM, while all variables, which were classified based on landslide ratio, were categorical variables in building the lr-LRLSM. Because the count of whole basic unit in the Chishan watershed was too much to calculate by using commercial software, the research took random sampling instead of the whole basic units. The research adopted equal proportions of landslide unit and not landslide unit in logistic regression analysis. The research took 10 times random sampling and selected the group with the best Cox & Snell R2 value and Nagelkerker R2 value as the database for the following analysis. Based on the best result from 10 random sampling groups, the or-LRLSM (lr-LRLSM) is significant at the 1% level with Cox & Snell R2 = 0.190 (0.196) and Nagelkerke R2
Chen, Cong; Zhang, Guohui; Liu, Xiaoyue Cathy; Ci, Yusheng; Huang, Helai; Ma, Jianming; Chen, Yanyan; Guan, Hongzhi
2016-12-01
There is a high potential of severe injury outcomes in traffic crashes on rural interstate highways due to the significant amount of high speed traffic on these corridors. Hierarchical Bayesian models are capable of incorporating between-crash variance and within-crash correlations into traffic crash data analysis and are increasingly utilized in traffic crash severity analysis. This paper applies a hierarchical Bayesian logistic model to examine the significant factors at crash and vehicle/driver levels and their heterogeneous impacts on driver injury severity in rural interstate highway crashes. Analysis results indicate that the majority of the total variance is induced by the between-crash variance, showing the appropriateness of the utilized hierarchical modeling approach. Three crash-level variables and six vehicle/driver-level variables are found significant in predicting driver injury severities: road curve, maximum vehicle damage in a crash, number of vehicles in a crash, wet road surface, vehicle type, driver age, driver gender, driver seatbelt use and driver alcohol or drug involvement. Among these variables, road curve, functional and disabled vehicle damage in crash, single-vehicle crashes, female drivers, senior drivers, motorcycles and driver alcohol or drug involvement tend to increase the odds of drivers being incapably injured or killed in rural interstate crashes, while wet road surface, male drivers and driver seatbelt use are more likely to decrease the probability of severe driver injuries. The developed methodology and estimation results provide insightful understanding of the internal mechanism of rural interstate crashes and beneficial references for developing effective countermeasures for rural interstate crash prevention.
Choi, Seung W.; Gibbons, Laura E.; Crane, Paul K.
2011-01-01
Logistic regression provides a flexible framework for detecting various types of differential item functioning (DIF). Previous efforts extended the framework by using item response theory (IRT) based trait scores, and by employing an iterative process using group–specific item parameters to account for DIF in the trait scores, analogous to purification approaches used in other DIF detection frameworks. The current investigation advances the technique by developing a computational platform integrating both statistical and IRT procedures into a single program. Furthermore, a Monte Carlo simulation approach was incorporated to derive empirical criteria for various DIF statistics and effect size measures. For purposes of illustration, the procedure was applied to data from a questionnaire of anxiety symptoms for detecting DIF associated with age from the Patient–Reported Outcomes Measurement Information System. PMID:21572908
Uchino, Makoto; Hirano, Teruyuki; Satoh, Hiroshi; Arimura, Kimiyoshi; Nakagawa, Masanori; Wakamiya, Jyunji
2005-01-01
Minamata disease (MD) was caused by ingestion of seafood from the methylmercury-contaminated areas. Although 50 years have passed since the discovery of MD, there have been only a few studies on the temporal profile of neurological findings in certified MD patients. Thus, we evaluated changes in neurological symptoms and signs of MD using discriminants by multiple logistic regression analysis. The severity of predictive index declined in 25 years in most of the patients. Only a few patients showed aggravation of neurological findings, which was due to complications such as spino-cerebellar degeneration. Patients with chronic MD aged over 45 years had several concomitant diseases so that their clinical pictures were complicated. It was difficult to differentiate chronic MD using statistically established discriminants based on sensory disturbance alone. In conclusion, the severity of MD declined in 25 years along with the modification by age-related concomitant disorders.
Dubovyk, Olena; Menz, Gunter; Conrad, Christopher; Kan, Elena; Machwitz, Miriam; Khamzina, Asia
2013-06-01
Advancing land degradation in the irrigated areas of Central Asia hinders sustainable development of this predominantly agricultural region. To support decisions on mitigating cropland degradation, this study combines linear trend analysis and spatial logistic regression modeling to expose a land degradation trend in the Khorezm region, Uzbekistan, and to analyze the causes. Time series of the 250-m MODIS NDVI, summed over the growing seasons of 2000-2010, were used to derive areas with an apparent negative vegetation trend; this was interpreted as an indicator of land degradation. About one third (161,000 ha) of the region's area experienced negative trends of different magnitude. The vegetation decline was particularly evident on the low-fertility lands bordering on the natural sandy desert, suggesting that these areas should be prioritized in mitigation planning. The results of logistic modeling indicate that the spatial pattern of the observed trend is mainly associated with the level of the groundwater table (odds = 330 %), land-use intensity (odds = 103 %), low soil quality (odds = 49 %), slope (odds = 29 %), and salinity of the groundwater (odds = 26 %). Areas, threatened by land degradation, were mapped by fitting the estimated model parameters to available data. The elaborated approach, combining remote-sensing and GIS, can form the basis for developing a common tool for monitoring land degradation trends in irrigated croplands of Central Asia.
Chan, Kung-Sik; Jiao, Feiran; Mikulski, Marek A.; Gerke, Alicia; Guo, Junfeng; Newell, John D; Hoffman, Eric A.; Thompson, Brad; Lee, Chang Hyun; Fuortes, Laurence J.
2015-01-01
Rationale and Objectives We evaluated the role of automated quantitative computed tomography (CT) scan interpretation algorithm in detecting Interstitial Lung Disease (ILD) and/or emphysema in a sample of elderly subjects with mild lung disease.ypothesized that the quantification and distributions of CT attenuation values on lung CT, over a subset of Hounsfield Units (HU) range [−1000 HU, 0 HU], can differentiate early or mild disease from normal lung. Materials and Methods We compared results of quantitative spiral rapid end-exhalation (functional residual capacity; FRC) and end-inhalation (total lung capacity; TLC) CT scan analyses in 52 subjects with radiographic evidence of mild fibrotic lung disease to 17 normal subjects. Several CT value distributions were explored, including (i) that from the peripheral lung taken at TLC (with peels at 15 or 65mm), (ii) the ratio of (i) to that from the core of lung, and (iii) the ratio of (ii) to its FRC counterpart. We developed a fused-lasso logistic regression model that can automatically identify sub-intervals of [−1000 HU, 0 HU] over which a CT value distribution provides optimal discrimination between abnormal and normal scans. Results The fused-lasso logistic regression model based on (ii) with 15 mm peel identified the relative frequency of CT values over [−1000, −900] and that over [−450,−200] HU as a means of discriminating abnormal versus normal, resulting in a zero out-sample false positive rate and 15%false negative rate of that was lowered to 12% by pooling information. Conclusions We demonstrated the potential usefulness of this novel quantitative imaging analysis method in discriminating ILD and/or emphysema from normal lungs. PMID:26776294
Robertson, John M.; Soehn, Matthias; Yan Di
2010-05-01
Purpose: Understanding the dose-volume relationship of small bowel irradiation and severe acute diarrhea may help reduce the incidence of this side effect during adjuvant treatment for rectal cancer. Methods and Materials: Consecutive patients treated curatively for rectal cancer were reviewed, and the maximum grade of acute diarrhea was determined. The small bowel was outlined on the treatment planning CT scan, and a dose-volume histogram was calculated for the initial pelvic treatment (45 Gy). Logistic regression models were fitted for varying cutoff-dose levels from 5 to 45 Gy in 5-Gy increments. The model with the highest LogLikelihood was used to develop a cutoff-dose normal tissue complication probability (NTCP) model. Results: There were a total of 152 patients (48% preoperative, 47% postoperative, 5% other), predominantly treated prone (95%) with a three-field technique (94%) and a protracted venous infusion of 5-fluorouracil (78%). Acute Grade 3 diarrhea occurred in 21%. The largest LogLikelihood was found for the cutoff-dose logistic regression model with 15 Gy as the cutoff-dose, although the models for 20 Gy and 25 Gy had similar significance. According to this model, highly significant correlations (p <0.001) between small bowel volumes receiving at least 15 Gy and toxicity exist in the considered patient population. Similar findings applied to both the preoperatively (p = 0.001) and postoperatively irradiated groups (p = 0.001). Conclusion: The incidence of Grade 3 diarrhea was significantly correlated with the volume of small bowel receiving at least 15 Gy using a cutoff-dose NTCP model.
Snyder, Marcia; Freeman, Mary C.; Purucker, S. Thomas; Pringle, Catherine M.
2016-01-01
Freshwater shrimps are an important biotic component of tropical ecosystems. However, they can have a low probability of detection when abundances are low. We sampled 3 of the most common freshwater shrimp species, Macrobrachium olfersii, Macrobrachium carcinus, and Macrobrachium heterochirus, and used occupancy modeling and logistic regression models to improve our limited knowledge of distribution of these cryptic species by investigating both local- and landscape-scale effects at La Selva Biological Station in Costa Rica. Local-scale factors included substrate type and stream size, and landscape-scale factors included presence or absence of regional groundwater inputs. Capture rates for 2 of the sampled species (M. olfersii and M. carcinus) were sufficient to compare the fit of occupancy models. Occupancy models did not converge for M. heterochirus, but M. heterochirus had high enough occupancy rates that logistic regression could be used to model the relationship between occupancy rates and predictors. The best-supported models for M. olfersii and M. carcinus included conductivity, discharge, and substrate parameters. Stream size was positively correlated with occupancy rates of all 3 species. High stream conductivity, which reflects the quantity of regional groundwater input into the stream, was positively correlated with M. olfersii occupancy rates. Boulder substrates increased occupancy rate of M. carcinus and decreased the detection probability of M. olfersii. Our models suggest that shrimp distribution is driven by factors that function at local (substrate and discharge) and landscape (conductivity) scales.
NASA Astrophysics Data System (ADS)
Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.
2014-12-01
This study aims at comparing the performances of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams both stretching for few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian sea. This area was struck on the 1st October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly of debris flows and debris avalanches types involving the weathered layer of a low to high grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; besides, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-folds tests so to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performances, with a general congruence between BLR and BRT in the predictor importance. In particular, the research highlighted that BRT-models reached a higher prediction performance with respect to BLR-models, for RP based modelling, whilst for the SP-based models the difference in predictive skills between the two methods dropped drastically, converging to an analogous excellent performance. However, when looking at the precision of the probability estimates, BLR demonstrated to produce more robust
O'Dwyer, Jean; Morris Downes, Margaret; Adley, Catherine C
2016-02-01
This study analyses the relationship between meteorological phenomena and outbreaks of waterborne-transmitted vero cytotoxin-producing Escherichia coli (VTEC) in the Republic of Ireland over an 8-year period (2005-2012). Data pertaining to the notification of waterborne VTEC outbreaks were extracted from the Computerised Infectious Disease Reporting system, which is administered through the national Health Protection Surveillance Centre as part of the Health Service Executive. Rainfall and temperature data were obtained from the national meteorological office and categorised as cumulative rainfall, heavy rainfall events in the previous 7 days, and mean temperature. Regression analysis was performed using logistic regression (LR) analysis. The LR model was significant (p < 0.001), with all independent variables: cumulative rainfall, heavy rainfall and mean temperature making a statistically significant contribution to the model. The study has found that rainfall, particularly heavy rainfall in the preceding 7 days of an outbreak, is a strong statistical indicator of a waterborne outbreak and that temperature also impacts waterborne VTEC outbreak occurrence.
2009-01-01
Background There is a growing awareness that interaction between multiple genes play an important role in the risk of common, complex multi-factorial diseases. Many common diseases are affected by certain genotype combinations (associated with some genes and their interactions). The identification and characterization of these susceptibility genes and gene-gene interaction have been limited by small sample size and large number of potential interactions between genes. Several methods have been proposed to detect gene-gene interaction in a case control study. The penalized logistic regression (PLR), a variant of logistic regression with L2 regularization, is a parametric approach to detect gene-gene interaction. On the other hand, the Multifactor Dimensionality Reduction (MDR) is a nonparametric and genetic model-free approach to detect genotype combinations associated with disease risk. Methods We compared the power of MDR and PLR for detecting two-way and three-way interactions in a case-control study through extensive simulations. We generated several interaction models with different magnitudes of interaction effect. For each model, we simulated 100 datasets, each with 200 cases and 200 controls and 20 SNPs. We considered a wide variety of models such as models with just main effects, models with only interaction effects or models with both main and interaction effects. We also compared the performance of MDR and PLR to detect gene-gene interaction associated with acute rejection(AR) in kidney transplant patients. Results In this paper, we have studied the power of MDR and PLR for detecting gene-gene interaction in a case-control study through extensive simulation. We have compared their performances for different two-way and three-way interaction models. We have studied the effect of different allele frequencies on these methods. We have also implemented their performance on a real dataset. As expected, none of these methods were consistently better for all
NASA Astrophysics Data System (ADS)
Nahib, Irmadi; Suryanta, Jaka
2017-01-01
Forest destruction, climate change and global warming could reduce an indirect forest benefit because forest is the largest carbon sink and it plays a very important role in global carbon cycle. To support Reducing Emissions from Deforestation and Forest Degradation (REDD +) program, people pay attention of forest cover changes as the basis for calculating carbon stock changes. This study try to explore the forest cover dynamics as well as the prediction model of forest cover in Indragiri Hulu Regency, Riau Province Indonesia. The study aims to analyse some various explanatory variables associated with forest conversion processes and predict forest cover change using logistic regression model (LRM). The main data used in this study is Land use/cover map (1990 – 2011). Performance of developed model was assessed through a comparison of the predicted model of forest cover change and the actual forest cover in 2011. The analysis result showed that forest cover has decreased continuously between 1990 and 2011, up to the loss of 165,284.82 ha (35.19 %) of forest area. The LRM successfully predicted the forest cover for the period 2010 with reasonably high accuracy (ROC = 92.97 % and 70.26 %).
NASA Astrophysics Data System (ADS)
Sidi, P.; Mamat, M.; Sukono; Supian, S.
2017-01-01
Floods have always occurred in the Citarum river basin. The adverse effects caused by floods can cover all their property, including the destruction of houses. The impact due to damage to residential buildings is usually not small. Indeed, each of flooding, the government and several social organizations providing funds to repair the building. But the donations are given very limited, so it cannot cover the entire cost of repair was necessary. The presence of insurance products for property damage caused by the floods is considered very important. However, if its presence is also considered necessary by the public or not? In this paper, the factors that affect the supply and demand of insurance product for damaged building due to floods are analyzed. The method used in this analysis is the ordinal logistic regression. Based on the analysis that the factors that affect the supply and demand of insurance product for damaged building due to floods, it is included: age, economic circumstances, family situations, insurance motivations, and lifestyle. Simultaneously that the factors affecting supply and demand of insurance product for damaged building due to floods mounted to 65.7%.
Allahyari, Elahe; Jafari, Peyman; Bagheri, Zahra
2016-01-01
Objective. The present study uses simulated data to find what the optimal number of response categories is to achieve adequate power in ordinal logistic regression (OLR) model for differential item functioning (DIF) analysis in psychometric research. Methods. A hypothetical ten-item quality of life scale with three, four, and five response categories was simulated. The power and type I error rates of OLR model for detecting uniform DIF were investigated under different combinations of ability distribution (θ), sample size, sample size ratio, and the magnitude of uniform DIF across reference and focal groups. Results. When θ was distributed identically in the reference and focal groups, increasing the number of response categories from 3 to 5 resulted in an increase of approximately 8% in power of OLR model for detecting uniform DIF. The power of OLR was less than 0.36 when ability distribution in the reference and focal groups was highly skewed to the left and right, respectively. Conclusions. The clearest conclusion from this research is that the minimum number of response categories for DIF analysis using OLR is five. However, the impact of the number of response categories in detecting DIF was lower than might be expected. PMID:27403207
NASA Astrophysics Data System (ADS)
Bornaetxea, Txomin; Antigüedad, Iñaki; Ormaetxea, Orbange
2016-04-01
In the Oria river basin (885 km2) shallow landslides are very frequent and they produce several roadblocks and damage in the infrastructure and properties, causing big economic loss every year. Considering that the zonification of the territory in different landslide susceptibility levels provides a useful tool for the territorial planning and natural risk management, this study has the objective of identifying the most prone landslide places applying an objective and reproducible methodology. To do so, a quantitative multivariate methodology, the logistic regression, has been used. Fieldwork landslide points and randomly selected stable points have been used along with Lithology, Land Use, Distance to the transport infrastructure, Altitude, Senoidal Slope and Normalized Difference Vegetation Index (NDVI) independent variables to carry out a landslide susceptibility map. The model has been validated by the prediction and success rate curves and their corresponding area under the curve (AUC). In addition, the result has been compared to those from two landslide susceptibility models, covering the study area previously applied in different scales, such as ELSUS1000 version 1 (2013) and Landslide Susceptibility Map of Gipuzkoa (2007). Validation results show an excellent prediction capacity of the proposed model (AUC 0,962), and comparisons highlight big differences with previous studies.
Hung, J.; Chaitman, B.R.; Lam, J.; Lesperance, J.; Dupras, G.; Fines, P.; Cherkaoui, O.; Robert, P.; Bourassa, M.G.
1985-08-01
The incremental diagnostic yield of clinical data, exercise ECG, stress thallium scintigraphy, and cardiac fluoroscopy to predict coronary and multivessel disease was assessed in 171 symptomatic men by means of multiple logistic regression analyses. When clinical variables alone were analyzed, chest pain type and age were predictive of coronary disease, whereas chest pain type, age, a family history of premature coronary disease before age 55 years, and abnormal ST-T wave changes on the rest ECG were predictive of multivessel disease. The percentage of patients correctly classified by cardiac fluoroscopy (presence or absence of coronary artery calcification), exercise ECG, and thallium scintigraphy was 9%, 25%, and 50%, respectively, greater than for clinical variables, when the presence or absence of coronary disease was the outcome, and 13%, 25%, and 29%, respectively, when multivessel disease was studied; 5% of patients were misclassified. When the 37 clinical and noninvasive test variables were analyzed jointly, the most significant variable predictive of coronary disease was an abnormal thallium scan and for multivessel disease, the amount of exercise performed. The data from this study provide a quantitative model and confirm previous reports that optimal diagnostic efficacy is obtained when noninvasive tests are ordered sequentially. In symptomatic men, cardiac fluoroscopy is a relatively ineffective test when compared to exercise ECG and thallium scintigraphy.
Mumcu Kucuker, Derya; Baskent, Emin Zeki
2015-01-01
Integration of non-wood forest products (NWFPs) into forest management planning has become an increasingly important issue in forestry over the last decade. Among NWFPs, mushrooms are valued due to their medicinal, commercial, high nutritional and recreational importance. Commercial mushroom harvesting also provides important income to local dwellers and contributes to the economic value of regional forests. Sustainable management of these products at the regional scale requires information on their locations in diverse forest settings and the ability to predict and map their spatial distributions over the landscape. This study focuses on modeling the spatial distribution of commercially harvested Lactarius deliciosus and L. salmonicolor mushrooms in the Kızılcasu Forest Planning Unit, Turkey. The best models were developed based on topographic, climatic and stand characteristics, separately through logistic regression analysis using SPSS™. The best topographic model provided better classification success (69.3 %) than the best climatic (65.4 %) and stand (65 %) models. However, the overall best model, with 73 % overall classification success, used a mix of several variables. The best models were integrated into an Arc/Info GIS program to create spatial distribution maps of L. deliciosus and L. salmonicolor in the planning area. Our approach may be useful to predict the occurrence and distribution of other NWFPs and provide a valuable tool for designing silvicultural prescriptions and preparing multiple-use forest management plans.
ERIC Educational Resources Information Center
Tipton, Elizabeth; Pustejovsky, James E.
2015-01-01
Randomized experiments are commonly used to evaluate the effectiveness of educational interventions. The goal of the present investigation is to develop small-sample corrections for multiple contrast hypothesis tests (i.e., F-tests) such as the omnibus test of meta-regression fit or a test for equality of three or more levels of a categorical…
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Mendes, Renata G.; de Souza, César R.; Machado, Maurício N.; Correa, Paulo R.; Di Thommazo-Luporini, Luciana; Arena, Ross; Myers, Jonathan; Pizzolato, Ednaldo B.
2015-01-01
Introduction In coronary artery bypass (CABG) surgery, the common complications are the need for reintubation, prolonged mechanical ventilation (PMV) and death. Thus, a reliable model for the prognostic evaluation of those particular outcomes is a worthwhile pursuit. The existence of such a system would lead to better resource planning, cost reductions and an increased ability to guide preventive strategies. The aim of this study was to compare different methods – logistic regression (LR) and artificial neural networks (ANNs) – in accomplishing this goal. Material and methods Subjects undergoing CABG (n = 1315) were divided into training (n = 1053) and validation (n = 262) groups. The set of independent variables consisted of age, gender, weight, height, body mass index, diabetes, creatinine level, cardiopulmonary bypass, presence of preserved ventricular function, moderate and severe ventricular dysfunction and total number of grafts. The PMV was also an input for the prediction of death. The ability of ANN to discriminate outcomes was assessed using receiver-operating characteristic (ROC) analysis and the results were compared using a multivariate LR. Results The ROC curve areas for LR and ANN models, respectively, were: for reintubation 0.62 (CI: 0.50–0.75) and 0.65 (CI: 0.53–0.77); for PMV 0.67 (CI: 0.57–0.78) and 0.72 (CI: 0.64–0.81); and for death 0.86 (CI: 0.79–0.93) and 0.85 (CI: 0.80–0.91). No differences were observed between models. Conclusions The ANN has similar discriminating power in predicting reintubation, PMV and death outcomes. Thus, both models may be applicable as a predictor for these outcomes in subjects undergoing CABG. PMID:26322087
Obeso, José M.; García, Pilar; Martínez, Beatriz; Arroyo-López, Francisco Noé; Garrido-Fernández, Antonio; Rodriguez, Ana
2010-01-01
The use of bacteriophages provides an attractive approach to the fight against food-borne pathogenic bacteria, since they can be found in different environments and are unable to infect humans, both characteristics of which support their use as biocontrol agents. Two lytic bacteriophages, vB_SauS-phiIPLA35 (phiIPLA35) and vB_SauS-phiIPLA88 (phiIPLA88), previously isolated from the dairy environment inhibited the growth of Staphylococcus aureus. To facilitate the successful application of both bacteriophages as biocontrol agents, probabilistic models for predicting S. aureus inactivation by the phages in pasteurized milk were developed. A linear logistic regression procedure was used to describe the survival/death interface of S. aureus after 8 h of storage as a function of the initial phage titer (2 to 8 log10 PFU/ml), initial bacterial contamination (2 to 6 log10 CFU/ml), and temperature (15 to 37°C). Two successive models were built, with the first including only data from the experimental design and a global one in which results derived from the validation experiments were also included. The temperature, interaction temperature-initial level of bacterial contamination, and initial level of bacterial contamination-phage titer contributed significantly to the first model prediction. However, only the phage titer and temperature were significantly involved in the global model prediction. The predictions of both models were fail-safe and highly consistent with the observed S. aureus responses. Nevertheless, the global model, deduced from a higher number of experiments (with a higher degree of freedom), was dependent on a lower number of variables and had an apparent better fit. Therefore, it can be considered a convenient evolution of the first model. Besides, the global model provides the minimum phage concentration (about 2 × 108 PFU/ml) required to inactivate S. aureus in milk at different temperatures, irrespective of the bacterial contamination level. PMID
2013-01-01
Background Osteoporotic hip fractures with a significant morbidity and excess mortality among the elderly have imposed huge health and economic burdens on societies worldwide. In this age- and sex-matched case control study, we examined the risk factors of hip fractures and assessed the fracture risk by conditional logistic regression (CLR) and ensemble artificial neural network (ANN). The performances of these two classifiers were compared. Methods The study population consisted of 217 pairs (149 women and 68 men) of fractures and controls with an age older than 60 years. All the participants were interviewed with the same standardized questionnaire including questions on 66 risk factors in 12 categories. Univariate CLR analysis was initially conducted to examine the unadjusted odds ratio of all potential risk factors. The significant risk factors were then tested by multivariate analyses. For fracture risk assessment, the participants were randomly divided into modeling and testing datasets for 10-fold cross validation analyses. The predicting models built by CLR and ANN in modeling datasets were applied to testing datasets for generalization study. The performances, including discrimination and calibration, were compared with non-parametric Wilcoxon tests. Results In univariate CLR analyses, 16 variables achieved significant level, and six of them remained significant in multivariate analyses, including low T score, low BMI, low MMSE score, milk intake, walking difficulty, and significant fall at home. For discrimination, ANN outperformed CLR in both 16- and 6-variable analyses in modeling and testing datasets (p?
Campillo-Gimenez, Boris; Jouini, Wassim; Bayat, Sahar; Cuggia, Marc
2013-01-01
Introduction Case-based reasoning (CBR) is an emerging decision making paradigm in medical research where new cases are solved relying on previously solved similar cases. Usually, a database of solved cases is provided, and every case is described through a set of attributes (inputs) and a label (output). Extracting useful information from this database can help the CBR system providing more reliable results on the yet to be solved cases. Objective We suggest a general framework where a CBR system, viz. K-Nearest Neighbour (K-NN) algorithm, is combined with various information obtained from a Logistic Regression (LR) model, in order to improve prediction of access to the transplant waiting list. Methods LR is applied, on the case database, to assign weights to the attributes as well as the solved cases. Thus, five possible decision making systems based on K-NN and/or LR were identified: a standalone K-NN, a standalone LR and three soft K-NN algorithms that rely on the weights based on the results of the LR. The evaluation was performed under two conditions, either using predictive factors known to be related to registration, or using a combination of factors related and not related to registration. Results and Conclusion The results show that our suggested approach, where the K-NN algorithm relies on both weighted attributes and cases, can efficiently deal with non relevant attributes, whereas the four other approaches suffer from this kind of noisy setups. The robustness of this approach suggests interesting perspectives for medical problem solving tools using CBR methodology. PMID:24039730
Jahandideh, Samad; Abdolmaleki, Parviz; Movahedi, Mohammad Mehdi
2010-02-01
Various studies have been reported on the bioeffects of magnetic field exposure; however, no consensus or guideline is available for experimental designs relating to exposure conditions as yet. In this study, logistic regression (LR) and artificial neural networks (ANNs) were used in order to analyze and predict the melatonin excretion patterns in the rat exposed to extremely low frequency magnetic fields (ELF-MF). Subsequently, on a database containing 33 experiments, performances of LR and ANNs were compared through resubstitution and jackknife tests. Predictor variables were more effective parameters and included frequency, polarization, exposure duration, and strength of magnetic fields. Also, five performance measures including accuracy, sensitivity, specificity, Matthew's Correlation Coefficient (MCC) and normalized percentage, better than random (S) were used to evaluate the performance of models. The LR as a conventional model obtained poor prediction performance. Nonetheless, LR distinguished the duration of magnetic fields as a statistically significant parameter. Also, horizontal polarization of magnetic fields with the highest logit coefficient (or parameter estimate) with negative sign was found to be the strongest indicator for experimental designs relating to exposure conditions. This means that each experiment with horizontal polarization of magnetic fields has a higher probability to result in "not changed melatonin level" pattern. On the other hand, ANNs, a more powerful model which has not been introduced in predicting melatonin excretion patterns in the rat exposed to ELF-MF, showed high performance measure values and higher reliability, especially obtaining 0.55 value of MCC through jackknife tests. Obtained results showed that such predictor models are promising and may play a useful role in defining guidelines for experimental designs relating to exposure conditions. In conclusion, analysis of the bioelectromagnetic data could result in
Wang, Tien-ni; Wu, Ching-yi; Chen, Chia-ling; Shieh, Jeng-yi; Lu, Lu; Lin, Keh-chung
2013-03-01
Given the growing evidence for the effects of constraint-induced therapy (CIT) in children with cerebral palsy (CP), there is a need for investigating the characteristics of potential participants who may benefit most from this intervention. This study aimed to establish predictive models for the effects of pediatric CIT on motor and functional outcomes. Therapists administered CIT to 49 children (aged 3-11 years) with CP. Sessions were 1-3.5h a day, twice a week, for 3-4 weeks. Parents were asked to document the number of restraint hours outside of the therapy sessions. Domains of treatment outcomes included motor capacity (measured by the Peabody Developmental Motor Scales II), motor performance (measured by the Pediatric Motor Activity Log), and functional independence (measured by the Pediatric Functional Independence Measure). Potential predictors included age, affected side, compliance (measured by time of restraint), and the initial level of motor impairment severity. Tests were administered before, immediately after, and 3 months after the intervention. Logistic regression analyses showed that total amount of restraint time was the only significant predictor for improved motor capacity immediately after CIT. Younger children who restrained the less affected arm for a longer time had a greater chance to achieve clinically significant improvements in motor performance. For outcomes of functional independence in daily life, younger age was associated with clinically meaningful improvement in the self-care domain. Baseline motor abilities were significantly predictive of better improvement in mobility and cognition. Significant predictors varied according to the aspects of motor outcomes after 3 months of follow-up. The potential predictors identified in this study allow clinicians to target those children who may benefit most from CIT.
Pecori, Biagio; Lastoria, Secondo; Caracò, Corradina; Celentani, Marco; Tatangelo, Fabiana; Avallone, Antonio; Rega, Daniela; De Palma, Giampaolo; Mormile, Maria; Budillon, Alfredo; Muto, Paolo; Bianco, Francesco; Aloj, Luigi; Petrillo, Antonella; Delrio, Paolo
2017-01-01
Previous studies indicate that FDG PET/CT may predict pathological response in patients undergoing neoadjuvant chemo-radiotherapy for locally advanced rectal cancer (LARC). Aim of the current study is evaluate if pathological response can be similarly predicted in LARC patients after short course radiation therapy alone. Methods: Thirty-three patients with cT2-3, N0-2, M0 rectal adenocarcinoma treated with hypo fractionated short course neoadjuvant RT (5x5 Gy) with delayed surgery (SCRTDS) were prospectively studied. All patients underwent 3 PET/CT studies at baseline, 10 days from RT end (early), and 53 days from RT end (delayed). Maximal standardized uptake value (SUVmax), mean standardized uptake value (SUVmean) and total lesion glycolysis (TLG) of the primary tumor were measured and recorded at each PET/CT study. We use logistic regression analysis to aggregate different measures of metabolic response to predict the pathological response in the course of SCRTDS. Results: We provide straightforward formulas to classify response and estimate the probability of being a major responder (TRG1-2) or a complete responder (TRG1) for each individual. The formulas are based on the level of TLG at the early PET and on the overall proportional reduction of TLG between baseline and delayed PET studies. Conclusions: This study demonstrates that in the course of SCRTDS it is possible to estimate the probabilities of pathological tumor responses on the basis of PET/CT with FDG. Our formulas make it possible to assess the risks associated to LARC borne by a patient in the course of SCRTDS. These risk assessments can be balanced against other health risks associated with further treatments and can therefore be used to make informed therapy adjustments during SCRTDS. PMID:28060889
NASA Astrophysics Data System (ADS)
Cama, Mariaelena; Cristi Nicu, Ionut; Conoscenti, Christian; Quénéhervé, Geraldine; Maerker, Michael
2016-04-01
Landslide susceptibility can be defined as the likelihood of a landslide occurring in a given area on the basis of local terrain conditions. In the last decades many research focused on its evaluation by means of stochastic approaches under the assumption that 'the past is the key to the future' which means that if a model is able to reproduce a known landslide spatial distribution, it will be able to predict the future locations of new (i.e. unknown) slope failures. Among the various stochastic approaches, Binary Logistic Regression (BLR) is one of the most used because it calculates the susceptibility in probabilistic terms and its results are easily interpretable from a geomorphological point of view. However, very often not much importance is given to multicollinearity assessment whose effect is that the coefficient estimates are unstable, with opposite sign and therefore difficult to interpret. Therefore, it should be evaluated every time in order to make a model whose results are geomorphologically correct. In this study the effects of multicollinearity in the predictive performance and robustness of landslide susceptibility models are analyzed. In particular, the multicollinearity is estimated by means of Variation Inflation Index (VIF) which is also used as selection criterion for the independent variables (VIF Stepwise Selection) and compared to the more commonly used AIC Stepwise Selection. The robustness of the results is evaluated through 100 replicates of the dataset. The study area selected to perform this analysis is the Moldavian Plateau where landslides are among the most frequent geomorphological processes. This area has an increasing trend of urbanization and a very high potential regarding the cultural heritage, being the place of discovery of the largest settlement belonging to the Cucuteni Culture from Eastern Europe (that led to the development of the great complex Cucuteni-Tripyllia). Therefore, identifying the areas susceptible to
Diaz, Andres; Rossow, Stephanie; Ciarlet, Max; Marthaler, Douglas
2016-01-01
Rotaviruses (RV) are important causes of diarrhea in animals, especially in domestic animals. Of the 9 RV species, rotavirus A, B, and C (RVA, RVB, and RVC, respectively) had been established as important causes of diarrhea in pigs. The Minnesota Veterinary Diagnostic Laboratory receives swine stool samples from North America to determine the etiologic agents of disease. Between November 2009 and October 2011, 7,508 samples from pigs with diarrhea were submitted to determine if enteric pathogens, including RV, were present in the samples. All samples were tested for RVA, RVB, and RVC by real time RT-PCR. The majority of the samples (82%) were positive for RVA, RVB, and/or RVC. To better understand the risk factors associated with RV infections in swine diagnostic samples, three-level mixed-effects logistic regression models (3L-MLMs) were used to estimate associations among RV species, age, and geographical variability within the major swine production regions in North America. The conditional odds ratios (cORs) for RVA and RVB detection were lower for 1–3 day old pigs when compared to any other age group. However, the cOR of RVC detection in 1–3 day old pigs was significantly higher (p < 0.001) than pigs in the 4–20 days old and >55 day old age groups. Furthermore, pigs in the 21–55 day old age group had statistically higher cORs of RV co-detection compared to 1–3 day old pigs (p < 0.001). The 3L-MLMs indicated that RV status was more similar within states than among states or within each region. Our results indicated that 3L-MLMs are a powerful and adaptable tool to handle and analyze large-hierarchical datasets. In addition, our results indicated that, overall, swine RV epidemiology is complex, and RV species are associated with different age groups and vary by regions in North America. PMID:27145176
Wang, Xiaoli; Wu, Shuangsheng; MacIntyre, C. Raina; Zhang, Hongbin; Shi, Weixian; Peng, Xiaomin; Duan, Wei; Yang, Peng; Zhang, Yi; Wang, Quanyi
2015-01-01
Serfling-type periodic regression models have been widely used to identify and analyse epidemic of influenza. In these approaches, the baseline is traditionally determined using cleaned historical non-epidemic data. However, we found that the previous exclusion of epidemic seasons was empirical, since year-year variations in the seasonal pattern of activity had been ignored. Therefore, excluding fixed ‘epidemic’ months did not seem reasonable. We made some adjustments in the rule of epidemic-period removal to avoid potentially subjective definition of the start and end of epidemic periods. We fitted the baseline iteratively. Firstly, we established a Serfling regression model based on the actual observations without any removals. After that, instead of manually excluding a predefined ‘epidemic’ period (the traditional method), we excluded observations which exceeded a calculated boundary. We then established Serfling regression once more using the cleaned data and excluded observations which exceeded a calculated boundary. We repeated this process until the R2 value stopped to increase. In addition, the definitions of the onset of influenza epidemic were heterogeneous, which might make it impossible to accurately evaluate the performance of alternative approaches. We then used this modified model to detect the peak timing of influenza instead of the onset of epidemic and compared this model with traditional Serfling models using observed weekly case counts of influenza-like illness (ILIs), in terms of sensitivity, specificity and lead time. A better performance was observed. In summary, we provide an adjusted Serfling model which may have improved performance over traditional models in early warning at arrival of peak timing of influenza. PMID:25756205
Agogo, George O
2017-01-01
Measurement error in exposure variables is a serious impediment in epidemiological studies that relate exposures to health outcomes. In nutritional studies, interest could be in the association between long-term dietary intake and disease occurrence. Long-term intake is usually assessed with food frequency questionnaire (FFQ), which is prone to recall bias. Measurement error in FFQ-reported intakes leads to bias in parameter estimate that quantifies the association. To adjust for bias in the association, a calibration study is required to obtain unbiased intake measurements using a short-term instrument such as 24-hour recall (24HR). The 24HR intakes are used as response in regression calibration to adjust for bias in the association. For foods not consumed daily, 24HR-reported intakes are usually characterized by excess zeroes, right skewness, and heteroscedasticity posing serious challenge in regression calibration modeling. We proposed a zero-augmented calibration model to adjust for measurement error in reported intake, while handling excess zeroes, skewness, and heteroscedasticity simultaneously without transforming 24HR intake values. We compared the proposed calibration method with the standard method and with methods that ignore measurement error by estimating long-term intake with 24HR and FFQ-reported intakes. The comparison was done in real and simulated datasets. With the 24HR, the mean increase in mercury level per ounce fish intake was about 0.4; with the FFQ intake, the increase was about 1.2. With both calibration methods, the mean increase was about 2.0. Similar trend was observed in the simulation study. In conclusion, the proposed calibration method performs at least as good as the standard method.
Kjelstrom, L.C.
1995-01-01
Previously developed U.S. Geological Survey regional regression models of runoff and 11 chemical constituents were evaluated to assess their suitability for use in urban areas in Boise and Garden City. Data collected in the study area were used to develop adjusted regional models of storm-runoff volumes and mean concentrations and loads of chemical oxygen demand, dissolved and suspended solids, total nitrogen and total ammonia plus organic nitrogen as nitrogen, total and dissolved phosphorus, and total recoverable cadmium, copper, lead, and zinc. Explanatory variables used in these models were drainage area, impervious area, land-use information, and precipitation data. Mean annual runoff volume and loads at the five outfalls were estimated from 904 individual storms during 1976 through 1993. Two methods were used to compute individual storm loads. The first method used adjusted regional models of storm loads and the second used adjusted regional models for mean concentration and runoff volume. For large storms, the first method seemed to produce excessively high loads for some constituents and the second method provided more reliable results for all constituents except suspended solids. The first method provided more reliable results for large storms for suspended solids.
Introduction to the use of regression models in epidemiology.
Bender, Ralf
2009-01-01
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
Asquith, William H.; Roussel, Meghan C.
2009-01-01
Annual peak-streamflow frequency estimates are needed for flood-plain management; for objective assessment of flood risk; for cost-effective design of dams, levees, and other flood-control structures; and for design of roads, bridges, and culverts. Annual peak-streamflow frequency represents the peak streamflow for nine recurrence intervals of 2, 5, 10, 25, 50, 100, 200, 250, and 500 years. Common methods for estimation of peak-streamflow frequency for ungaged or unmonitored watersheds are regression equations for each recurrence interval developed for one or more regions; such regional equations are the subject of this report. The method is based on analysis of annual peak-streamflow data from U.S. Geological Survey streamflow-gaging stations (stations). Beginning in 2007, the U.S. Geological Survey, in cooperation with the Texas Department of Transportation and in partnership with Texas Tech University, began a 3-year investigation concerning the development of regional equations to estimate annual peak-streamflow frequency for undeveloped watersheds in Texas. The investigation focuses primarily on 638 stations with 8 or more years of data from undeveloped watersheds and other criteria. The general approach is explicitly limited to the use of L-moment statistics, which are used in conjunction with a technique of multi-linear regression referred to as PRESS minimization. The approach used to develop the regional equations, which was refined during the investigation, is referred to as the 'L-moment-based, PRESS-minimized, residual-adjusted approach'. For the approach, seven unique distributions are fit to the sample L-moments of the data for each of 638 stations and trimmed means of the seven results of the distributions for each recurrence interval are used to define the station specific, peak-streamflow frequency. As a first iteration of regression, nine weighted-least-squares, PRESS-minimized, multi-linear regression equations are computed using the watershed
NASA Astrophysics Data System (ADS)
Ciprian Margarint, Mihai; Niculita, Mihai
2014-05-01
The regions with monoclinic geological structure are large portions of earth surface where the repetition of similar landform patterns is very distinguished, the scarps of cuestas being characterized by similar values of morphometrical variables. Landslides are associated with these scarps of cuestas and consequently, a very high value of landslide susceptibility can be reported on its surface. In these regions, landslide susceptibility mapping can be realized for the entire region, or for test areas, with accurate, reliable, and available datasets, concerning multi-temporal inventories and landslide predictors. Because of the similar geomorphologic and landslide distribution we think that if any relevance of using test areas for extrapolating susceptibility models is present, these areas should be targeted first. This study case try to establish the level of usability of landslide predictors influence, obtained for a 90 km2 sample located in the northern part of the Moldavian Plateau (N-E Romania), in other areas of the same physio-geographic region. In a first phase, landslide susceptibility assessment was carried out and validated using logistic regression (LR) approach, using a multiple landslide inventory. This inventory was created using ortorectified aerial images from 1978 and 2005, for each period being considered both old and active landslides. The modeling strategy was based on a distinctly inventory of depletion areas of all landslide, for 1978 phase, and on a number of 30 covariates extracted from topographical and aerial images (both from 1978 and 2005 periods). The geomorphometric variables were computed from a Digital Elevation Model (DEM) obtained by interpolation from 1:5000 contour data (2.5 m equidistance), at 10x10 m resolution. Distance from river network, distance from roads and land use were extracted from topographic maps and aerial images. By applying Akaike Information Criterion (AIC) the covariates with significance under 0.001 level
NASA Technical Reports Server (NTRS)
Duda, David P.; Minnis, Patrick
2009-01-01
Straightforward application of the Schmidt-Appleman contrail formation criteria to diagnose persistent contrail occurrence from numerical weather prediction data is hindered by significant bias errors in the upper tropospheric humidity. Logistic models of contrail occurrence have been proposed to overcome this problem, but basic questions remain about how random measurement error may affect their accuracy. A set of 5000 synthetic contrail observations is created to study the effects of random error in these probabilistic models. The simulated observations are based on distributions of temperature, humidity, and vertical velocity derived from Advanced Regional Prediction System (ARPS) weather analyses. The logistic models created from the simulated observations were evaluated using two common statistical measures of model accuracy, the percent correct (PC) and the Hanssen-Kuipers discriminant (HKD). To convert the probabilistic results of the logistic models into a dichotomous yes/no choice suitable for the statistical measures, two critical probability thresholds are considered. The HKD scores are higher when the climatological frequency of contrail occurrence is used as the critical threshold, while the PC scores are higher when the critical probability threshold is 0.5. For both thresholds, typical random errors in temperature, relative humidity, and vertical velocity are found to be small enough to allow for accurate logistic models of contrail occurrence. The accuracy of the models developed from synthetic data is over 85 percent for both the prediction of contrail occurrence and non-occurrence, although in practice, larger errors would be anticipated.
ERIC Educational Resources Information Center
Gómez-Benito, Juana; Hidalgo, Maria Dolores; Zumbo, Bruno D.
2013-01-01
The objective of this article was to find an optimal decision rule for identifying polytomous items with large or moderate amounts of differential functioning. The effectiveness of combining statistical tests with effect size measures was assessed using logistic discriminant function analysis and two effect size measures: R[superscript 2] and…
NASA Technical Reports Server (NTRS)
Duda, David P.; Minnis, Patrick
2009-01-01
Previous studies have shown that probabilistic forecasting may be a useful method for predicting persistent contrail formation. A probabilistic forecast to accurately predict contrail formation over the contiguous United States (CONUS) is created by using meteorological data based on hourly meteorological analyses from the Advanced Regional Prediction System (ARPS) and from the Rapid Update Cycle (RUC) as well as GOES water vapor channel measurements, combined with surface and satellite observations of contrails. Two groups of logistic models were created. The first group of models (SURFACE models) is based on surface-based contrail observations supplemented with satellite observations of contrail occurrence. The second group of models (OUTBREAK models) is derived from a selected subgroup of satellite-based observations of widespread persistent contrails. The mean accuracies for both the SURFACE and OUTBREAK models typically exceeded 75 percent when based on the RUC or ARPS analysis data, but decreased when the logistic models were derived from ARPS forecast data.
NASA Astrophysics Data System (ADS)
Roşca, S.; Bilaşco, Ş.; Petrea, D.; Fodorean, I.; Vescan, I.; Filip, S.; Măguţ, F.-L.
2015-11-01
The existence of a large number of GIS models for the identification of landslide occurrence probability makes difficult the selection of a specific one. The present study focuses on the application of two quantitative models: the logistic and the BSA models. The comparative analysis of the results aims at identifying the most suitable model. The territory corresponding to the Niraj Mic Basin (87 km2) is an area characterised by a wide variety of the landforms with their morphometric, morphographical and geological characteristics as well as by a high complexity of the land use types where active landslides exist. This is the reason why it represents the test area for applying the two models and for the comparison of the results. The large complexity of input variables is illustrated by 16 factors which were represented as 72 dummy variables, analysed on the basis of their importance within the model structures. The testing of the statistical significance corresponding to each variable reduced the number of dummy variables to 12 which were considered significant for the test area within the logistic model, whereas for the BSA model all the variables were employed. The predictability degree of the models was tested through the identification of the area under the ROC curve which indicated a good accuracy (AUROC = 0.86 for the testing area) and predictability of the logistic model (AUROC = 0.63 for the validation area).
Chung, Doo Yong; Cho, Kang Su; Lee, Dae Hun; Han, Jang Hee; Kang, Dong Hyuk; Jung, Hae Do; Kown, Jong Kyou; Ham, Won Sik; Choi, Young Deuk; Lee, Joo Yong
2015-01-01
Purpose This study was conducted to evaluate colic pain as a prognostic pretreatment factor that can influence ureter stone clearance and to estimate the probability of stone-free status in shock wave lithotripsy (SWL) patients with a ureter stone. Materials and Methods We retrospectively reviewed the medical records of 1,418 patients who underwent their first SWL between 2005 and 2013. Among these patients, 551 had a ureter stone measuring 4–20 mm and were thus eligible for our analyses. The colic pain as the chief complaint was defined as either subjective flank pain during history taking and physical examination. Propensity-scores for established for colic pain was calculated for each patient using multivariate logistic regression based upon the following covariates: age, maximal stone length (MSL), and mean stone density (MSD). Each factor was evaluated as predictor for stone-free status by Bayesian and non-Bayesian logistic regression model. Results After propensity-score matching, 217 patients were extracted in each group from the total patient cohort. There were no statistical differences in variables used in propensity- score matching. One-session success and stone-free rate were also higher in the painful group (73.7% and 71.0%, respectively) than in the painless group (63.6% and 60.4%, respectively). In multivariate non-Bayesian and Bayesian logistic regression models, a painful stone, shorter MSL, and lower MSD were significant factors for one-session stone-free status in patients who underwent SWL. Conclusions Colic pain in patients with ureter calculi was one of the significant predicting factors including MSL and MSD for one-session stone-free status of SWL. PMID:25902059
ERIC Educational Resources Information Center
Davidson, J. Cody
2016-01-01
Mathematics is the most common subject area of remedial need and the majority of remedial math students never pass a college-level credit-bearing math class. The majorities of studies that investigate this phenomenon are conducted at community colleges and use some type of regression model; however, none have used a continuation ratio model. The…
Fujita, A; Takabatake, H; Tagaki, S; Sohda, T; Sekine, K
1996-03-01
To evaluate the effect of chemotherapy on QOL, the survival period was categorized by 3 intervals: one in the hospital for chemotherapy (TOX), on an outpatient basis (TWiST Time without Symptom and Toxicity), and in the hospital for conservative therapy (REL). Coefficients showing the QOL level were expressed as ut, uw and ur. If uw was 1 and ut and ur were plotted at less than 1, ut TOX+uwTWiST+urREL could be a quality-adjusted value relative to TWiST (Q-TWiST). One hundred five patients with stage IV non-small cell lung cancer were included. Sixty-five were given chemotherapy, and the other 40 were not. The observation period was 2 years. Q-TWiST values for age, sex, PS, histology and chemotherapy were calculated. Their quantification was performed employing a regression tree type method. Chemotherapy contributed to Q-TWiST when ut approached 1 i.e., no side effect was supposed). When ut was less than 0.5, PS and sex had an appreciable role.
AMINI, Payam; AHMADINIA, Hasan; POOROLAJAL, Jalal; MOQADDASI AMIRI, Mohammad
2016-01-01
Background: We aimed to assess the high-risk group for suicide using different classification methods includinglogistic regression (LR), decision tree (DT), artificial neural network (ANN), and support vector machine (SVM). Methods: We used the dataset of a study conducted to predict risk factors of completed suicide in Hamadan Province, the west of Iran, in 2010. To evaluate the high-risk groups for suicide, LR, SVM, DT and ANN were performed. The applied methods were compared using sensitivity, specificity, positive predicted value, negative predicted value, accuracy and the area under curve. Cochran-Q test was implied to check differences in proportion among methods. To assess the association between the observed and predicted values, Ø coefficient, contingency coefficient, and Kendall tau-b were calculated. Results: Gender, age, and job were the most important risk factors for fatal suicide attempts in common for four methods. SVM method showed the highest accuracy 0.68 and 0.67 for training and testing sample, respectively. However, this method resulted in the highest specificity (0.67 for training and 0.68 for testing sample) and the highest sensitivity for training sample (0.85), but the lowest sensitivity for the testing sample (0.53). Cochran-Q test resulted in differences between proportions in different methods (P<0.001). The association of SVM predictions and observed values, Ø coefficient, contingency coefficient, and Kendall tau-b were 0.239, 0.232 and 0.239, respectively. Conclusion: SVM had the best performance to classify fatal suicide attempts comparing to DT, LR and ANN. PMID:27957463
Keith, Scott W; Allison, David B
2014-09-29
This paper details the design, evaluation, and implementation of a framework for detecting and modeling nonlinearity between a binary outcome and a continuous predictor variable adjusted for covariates in complex samples. The framework provides familiar-looking parameterizations of output in terms of linear slope coefficients and odds ratios. Estimation methods focus on maximum likelihood optimization of piecewise linear free-knot splines formulated as B-splines. Correctly specifying the optimal number and positions of the knots improves the model, but is marked by computational intensity and numerical instability. Our inference methods utilize both parametric and nonparametric bootstrapping. Unlike other nonlinear modeling packages, this framework is designed to incorporate multistage survey sample designs common to nationally representative datasets. We illustrate the approach and evaluate its performance in specifying the correct number of knots under various conditions with an example using body mass index (BMI; kg/m(2)) and the complex multi-stage sampling design from the Third National Health and Nutrition Examination Survey to simulate binary mortality outcomes data having realistic nonlinear sample-weighted risk associations with BMI. BMI and mortality data provide a particularly apt example and area of application since BMI is commonly recorded in large health surveys with complex designs, often categorized for modeling, and nonlinearly related to mortality. When complex sample design considerations were ignored, our method was generally similar to or more accurate than two common model selection procedures, Schwarz's Bayesian Information Criterion (BIC) and Akaike's Information Criterion (AIC), in terms of correctly selecting the correct number of knots. Our approach provided accurate knot selections when complex sampling weights were incorporated, while AIC and BIC were not effective under these conditions.
Keith, Scott W.; Allison, David B.
2014-01-01
This paper details the design, evaluation, and implementation of a framework for detecting and modeling non-linearity between a binary outcome and a continuous predictor variable adjusted for covariates in complex samples. The framework provides familiar-looking parameterizations of output in terms of linear slope coefficients and odds ratios. Estimation methods focus on maximum likelihood optimization of piecewise linear free-knot splines formulated as B-splines. Correctly specifying the optimal number and positions of the knots improves the model, but is marked by computational intensity and numerical instability. Our inference methods utilize both parametric and non-parametric bootstrapping. Unlike other non-linear modeling packages, this framework is designed to incorporate multistage survey sample designs common to nationally representative datasets. We illustrate the approach and evaluate its performance in specifying the correct number of knots under various conditions with an example using body mass index (BMI, kg/m2) and the complex multistage sampling design from the Third National Health and Nutrition Examination Survey to simulate binary mortality outcomes data having realistic non-linear sample-weighted risk associations with BMI. BMI and mortality data provide a particularly apt example and area of application since BMI is commonly recorded in large health surveys with complex designs, often categorized for modeling, and non-linearly related to mortality. When complex sample design considerations were ignored, our method was generally similar to or more accurate than two common model selection procedures, Schwarz’s Bayesian Information Criterion (BIC) and Akaike’s Information Criterion (AIC), in terms of correctly selecting the correct number of knots. Our approach provided accurate knot selections when complex sampling weights were incorporated, while AIC and BIC were not effective under these conditions. PMID:25610831
Laubender, Ruediger P; Bender, Ralf
2014-02-28
Recently, Laubender and Bender (Stat. Med. 2010; 29: 851-859) applied the average risk difference (RD) approach to estimate adjusted RD and corresponding number needed to treat measures in the Cox proportional hazards model. We calculated standard errors and confidence intervals by using bootstrap techniques. In this paper, we develop asymptotic variance estimates of the adjusted RD measures and corresponding asymptotic confidence intervals within the counting process theory and evaluated them in a simulation study. We illustrate the use of the asymptotic confidence intervals by means of data of the Düsseldorf Obesity Mortality Study.
A Latent Transition Model with Logistic Regression
ERIC Educational Resources Information Center
Chung, Hwan; Walls, Theodore A.; Park, Yousung
2007-01-01
Latent transition models increasingly include covariates that predict prevalence of latent classes at a given time or transition rates among classes over time. In many situations, the covariate of interest may be latent. This paper describes an approach for handling both manifest and latent covariates in a latent transition model. A Bayesian…
NASA Astrophysics Data System (ADS)
Liberman, Neomi; Ben-David Kolikant, Yifat; Beeri, Catriel
2012-09-01
Due to a program reform in Israel, experienced CS high-school teachers faced the need to master and teach a new programming paradigm. This situation served as an opportunity to explore the relationship between teachers' content knowledge (CK) and their pedagogical content knowledge (PCK). This article focuses on three case studies, with emphasis on one of them. Using observations and interviews, we examine how the teachers, we observed taught and what development of their teaching occurred as a result of their teaching experience, if at all. Our findings suggest that this situation creates a new hybrid state of teachers, which we term "regressed experts." These teachers incorporate in their professional practice some elements typical of novices and some typical of experts. We also found that these teachers' experience, although established when teaching a different CK, serve as a leverage to improve their knowledge and understanding of aspects of the new content.
Foster, Guy M.; Graham, Jennifer L.
2016-04-06
The Kansas River is a primary source of drinking water for about 800,000 people in northeastern Kansas. Source-water supplies are treated by a combination of chemical and physical processes to remove contaminants before distribution. Advanced notification of changing water-quality conditions and cyanobacteria and associated toxin and taste-and-odor compounds provides drinking-water treatment facilities time to develop and implement adequate treatment strategies. The U.S. Geological Survey (USGS), in cooperation with the Kansas Water Office (funded in part through the Kansas State Water Plan Fund), and the City of Lawrence, the City of Topeka, the City of Olathe, and Johnson County Water One, began a study in July 2012 to develop statistical models at two Kansas River sites located upstream from drinking-water intakes. Continuous water-quality monitors have been operated and discrete-water quality samples have been collected on the Kansas River at Wamego (USGS site number 06887500) and De Soto (USGS site number 06892350) since July 2012. Continuous and discrete water-quality data collected during July 2012 through June 2015 were used to develop statistical models for constituents of interest at the Wamego and De Soto sites. Logistic models to continuously estimate the probability of occurrence above selected thresholds were developed for cyanobacteria, microcystin, and geosmin. Linear regression models to continuously estimate constituent concentrations were developed for major ions, dissolved solids, alkalinity, nutrients (nitrogen and phosphorus species), suspended sediment, indicator bacteria (Escherichia coli, fecal coliform, and enterococci), and actinomycetes bacteria. These models will be used to provide real-time estimates of the probability that cyanobacteria and associated compounds exceed thresholds and of the concentrations of other water-quality constituents in the Kansas River. The models documented in this report are useful for characterizing changes
2011-01-01
Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non parametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, Area under the ROC curve and Press'Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using the Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the larger overall classification accuracy (Median (Me) = 0.76) an area under the ROC (Me = 0.90). However this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73) with high area under the ROC (Me = 0.73) specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72) specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed
Chai, Zhiguo; Chen, Jihua; Zhang, Shaofeng
2014-01-01
Background Removable dentures are subject to plaque and/or staining problems. Denture hygiene habits and risk factors differ among countries and regions. The aims of this study were to assess hygiene habits and denture plaque and staining risk factors in Chinese removable denture wearers aged >40 years in Xi’an through multiple logistic regression analysis (MLRA). Methods Questionnaires were administered to 222 patients whose removable dentures were examined clinically to assess wear status and levels of plaque and staining. Univariate analyses were performed to identify potential risk factors for denture plaque/staining. MLRA was performed to identify significant risk factors. Results Brushing (77.93%) was the most prevalent cleaning method in the present study. Only 16.4% of patients regularly used commercial cleansers. Most (81.08%) patients removed their dentures overnight. MLRA indicated that potential risk factors for denture plaque were the duration of denture use (reference, ≤0.5 years; 2.1–5 years: OR = 4.155, P = 0.001; >5 years: OR = 7.238, P<0.001) and cleaning method (reference, chemical cleanser; running water: OR = 7.081, P = 0.010; brushing: OR = 3.567, P = 0.005). Potential risk factors for denture staining were female gender (OR = 0.377, P = 0.013), smoking (OR = 5.471, P = 0.031), tea consumption (OR = 3.957, P = 0.002), denture scratching (OR = 4.557, P = 0.036), duration of denture use (reference, ≤0.5 years; 2.1–5 years: OR = 7.899, P = 0.001; >5 years: OR = 27.226, P<0.001), and cleaning method (reference, chemical cleanser; running water: OR = 29.184, P<0.001; brushing: OR = 4.236, P = 0.007). Conclusion Denture hygiene habits need further improvement. An understanding of the risk factors for denture plaque and staining may provide the basis for preventive efforts. PMID:24498369
NASA Astrophysics Data System (ADS)
Wu, Chunhung
2016-04-01
Few researches have discussed about the applicability of applying the statistical landslide susceptibility (LS) model for extreme rainfall-induced landslide events. The researches focuses on the comparison and applicability of LS models based on four methods, including landslide ratio-based logistic regression (LRBLR), frequency ratio (FR), weight of evidence (WOE), and instability index (II) methods, in an extreme rainfall-induced landslide cases. The landslide inventory in the Chishan river watershed, Southwestern Taiwan, after 2009 Typhoon Morakot is the main materials in this research. The Chishan river watershed is a tributary watershed of Kaoping river watershed, which is a landslide- and erosion-prone watershed with the annual average suspended load of 3.6×107 MT/yr (ranks 11th in the world). Typhoon Morakot struck Southern Taiwan from Aug. 6-10 in 2009 and dumped nearly 2,000 mm of rainfall in the Chishan river watershed. The 24-hour, 48-hour, and 72-hours accumulated rainfall in the Chishan river watershed exceeded the 200-year return period accumulated rainfall. 2,389 landslide polygons in the Chishan river watershed were extracted from SPOT 5 images after 2009 Typhoon Morakot. The total landslide area is around 33.5 km2, equals to the landslide ratio of 4.1%. The main landslide types based on Varnes' (1978) classification are rotational and translational slides. The two characteristics of extreme rainfall-induced landslide event are dense landslide distribution and large occupation of downslope landslide areas owing to headward erosion and bank erosion in the flooding processes. The area of downslope landslide in the Chishan river watershed after 2009 Typhoon Morakot is 3.2 times higher than that of upslope landslide areas. The prediction accuracy of LS models based on LRBLR, FR, WOE, and II methods have been proven over 70%. The model performance and applicability of four models in a landslide-prone watershed with dense distribution of rainfall
Bias associated with using the estimated propensity score as a regression covariate.
Hade, Erinn M; Lu, Bo
2014-01-15
The use of propensity score methods to adjust for selection bias in observational studies has become increasingly popular in public health and medical research. A substantial portion of studies using propensity score adjustment treat the propensity score as a conventional regression predictor. Through a Monte Carlo simulation study, Austin and colleagues. investigated the bias associated with treatment effect estimation when the propensity score is used as a covariate in nonlinear regression models, such as logistic regression and Cox proportional hazards models. We show that the bias exists even in a linear regression model when the estimated propensity score is used and derive the explicit form of the bias. We also conduct an extensive simulation study to compare the performance of such covariate adjustment with propensity score stratification, propensity score matching, inverse probability of treatment weighted method, and nonparametric functional estimation using splines. The simulation scenarios are designed to reflect real data analysis practice. Instead of specifying a known parametric propensity score model, we generate the data by considering various degrees of overlap of the covariate distributions between treated and control groups. Propensity score matching excels when the treated group is contained within a larger control pool, while the model-based adjustment may have an edge when treated and control groups do not have too much overlap. Overall, adjusting for the propensity score through stratification or matching followed by regression or using splines, appears to be a good practical strategy.
Complementary Log Regression for Sufficient-Cause Modeling of Epidemiologic Data.
Lin, Jui-Hsiang; Lee, Wen-Chung
2016-12-13
The logistic regression model is the workhorse of epidemiological data analysis. The model helps to clarify the relationship between multiple exposures and a binary outcome. Logistic regression analysis is readily implemented using existing statistical software, and this has contributed to it becoming a routine procedure for epidemiologists. In this paper, the authors focus on a causal model which has recently received much attention from the epidemiologic community, namely, the sufficient-component cause model (causal-pie model). The authors show that the sufficient-component cause model is associated with a particular 'link' function: the complementary log link. In a complementary log regression, the exponentiated coefficient of a main-effect term corresponds to an adjusted 'peril ratio', and the coefficient of a cross-product term can be used directly to test for causal mechanistic interaction (sufficient-cause interaction). The authors provide detailed instructions on how to perform a complementary log regression using existing statistical software and use three datasets to illustrate the methodology. Complementary log regression is the model of choice for sufficient-cause analysis of binary outcomes. Its implementation is as easy as conventional logistic regression.
Killeen, Peter R
2015-07-01
The generalized matching law (GML) is reconstructed as a logistic regression equation that privileges no particular value of the sensitivity parameter, a. That value will often approach 1 due to the feedback that drives switching that is intrinsic to most concurrent schedules. A model of that feedback reproduced some features of concurrent data. The GML is a law only in the strained sense that any equation that maps data is a law. The machine under the hood of matching is in all likelihood the very law that was displaced by the Matching Law. It is now time to return the Law of Effect to centrality in our science.
Wang, Ming; Flanders, W Dana; Bostick, Roberd M; Long, Qi
2012-12-20
Measurement error is common in epidemiological and biomedical studies. When biomarkers are measured in batches or groups, measurement error is potentially correlated within each batch or group. In regression analysis, most existing methods are not applicable in the presence of batch-specific measurement error in predictors. We propose a robust conditional likelihood approach to account for batch-specific error in predictors when batch effect is additive and the predominant source of error, which requires no assumptions on the distribution of measurement error. Although a regression model with batch as a categorical covariable yields the same parameter estimates as the proposed conditional likelihood approach for linear regression, this result does not hold in general for all generalized linear models, in particular, logistic regression. Our simulation studies show that the conditional likelihood approach achieves better finite sample performance than the regression calibration approach or a naive approach without adjustment for measurement error. In the case of logistic regression, our proposed approach is shown to also outperform the regression approach with batch as a categorical covariate. In addition, we also examine a 'hybrid' approach combining the conditional likelihood method and the regression calibration method, which is shown in simulations to achieve good performance in the presence of both batch-specific and measurement-specific errors. We illustrate our method by using data from a colorectal adenoma study.
Country logistics performance and disaster impact.
Vaillancourt, Alain; Haavisto, Ira
2016-04-01
The aim of this paper is to deepen the understanding of the relationship between country logistics performance and disaster impact. The relationship is analysed through correlation analysis and regression models for 117 countries for the years 2007 to 2012 with disaster impact variables from the International Disaster Database (EM-DAT) and logistics performance indicators from the World Bank. The results show a significant relationship between country logistics performance and disaster impact overall and for five out of six specific logistic performance indicators. These specific indicators were further used to explore the relationship between country logistic performance and disaster impact for three specific disaster types (epidemic, flood and storm). The findings enhance the understanding of the role of logistics in a humanitarian context with empirical evidence of the importance of country logistics performance in disaster response operations.
A regularization corrected score method for nonlinear regression models with covariate error.
Zucker, David M; Gorfine, Malka; Li, Yi; Tadesse, Mahlet G; Spiegelman, Donna
2013-03-01
Many regression analyses involve explanatory variables that are measured with error, and failing to account for this error is well known to lead to biased point and interval estimates of the regression coefficients. We present here a new general method for adjusting for covariate error. Our method consists of an approximate version of the Stefanski-Nakamura corrected score approach, using the method of regularization to obtain an approximate solution of the relevant integral equation. We develop the theory in the setting of classical likelihood models; this setting covers, for example, linear regression, nonlinear regression, logistic regression, and Poisson regression. The method is extremely general in terms of the types of measurement error models covered, and is a functional method in the sense of not involving assumptions on the distribution of the true covariate. We discuss the theoretical properties of the method and present simulation results in the logistic regression setting (univariate and multivariate). For illustration, we apply the method to data from the Harvard Nurses' Health Study concerning the relationship between physical activity and breast cancer mortality in the period following a diagnosis of breast cancer.
J. Richard Hess; Kevin L. Kenney; William A. Smith; Ian Bonner; David J. Muth
2015-04-01
Equipment manufacturers have made rapid improvements in biomass harvesting and handling equipment. These improvements have increased transportation and handling efficiencies due to higher biomass densities and reduced losses. Improvements in grinder efficiencies and capacity have reduced biomass grinding costs. Biomass collection efficiencies (the ratio of biomass collected to the amount available in the field) as high as 75% for crop residues and greater than 90% for perennial energy crops have also been demonstrated. However, as collection rates increase, the fraction of entrained soil in the biomass increases, and high biomass residue removal rates can violate agronomic sustainability limits. Advancements in quantifying multi-factor sustainability limits to increase removal rate as guided by sustainable residue removal plans, and mitigating soil contamination through targeted removal rates based on soil type and residue type/fraction is allowing the use of new high efficiency harvesting equipment and methods. As another consideration, single pass harvesting and other technologies that improve harvesting costs cause biomass storage moisture management challenges, which challenges are further perturbed by annual variability in biomass moisture content. Monitoring, sampling, simulation, and analysis provide basis for moisture, time, and quality relationships in storage, which has allowed the development of moisture tolerant storage systems and best management processes that combine moisture content and time to accommodate baled storage of wet material based upon “shelf-life.” The key to improving biomass supply logistics costs has been developing the associated agronomic sustainability and biomass quality technologies and processes that allow the implementation of equipment engineering solutions.
Real, Jordi; Forné, Carles; Roso-Llorach, Albert; Martínez-Sánchez, Jose M
2016-05-01
Controlling for confounders is a crucial step in analytical observational studies, and multivariable models are widely used as statistical adjustment techniques. However, the validation of the assumptions of the multivariable regression models (MRMs) should be made clear in scientific reporting. The objective of this study is to review the quality of statistical reporting of the most commonly used MRMs (logistic, linear, and Cox regression) that were applied in analytical observational studies published between 2003 and 2014 by journals indexed in MEDLINE.Review of a representative sample of articles indexed in MEDLINE (n = 428) with observational design and use of MRMs (logistic, linear, and Cox regression). We assessed the quality of reporting about: model assumptions and goodness-of-fit, interactions, sensitivity analysis, crude and adjusted effect estimate, and specification of more than 1 adjusted model.The tests of underlying assumptions or goodness-of-fit of the MRMs used were described in 26.2% (95% CI: 22.0-30.3) of the articles and 18.5% (95% CI: 14.8-22.1) reported the interaction analysis. Reporting of all items assessed was higher in articles published in journals with a higher impact factor.A low percentage of articles indexed in MEDLINE that used multivariable techniques provided information demonstrating rigorous application of the model selected as an adjustment method. Given the importance of these methods to the final results and conclusions of observational studies, greater rigor is required in reporting the use of MRMs in the scientific literature.
Zou, G Y; Donner, Allan
2013-12-01
The Poisson regression model using a sandwich variance estimator has become a viable alternative to the logistic regression model for the analysis of prospective studies with independent binary outcomes. The primary advantage of this approach is that it readily provides covariate-adjusted risk ratios and associated standard errors. In this article, the model is extended to studies with correlated binary outcomes as arise in longitudinal or cluster randomization studies. The key step involves a cluster-level grouping strategy for the computation of the middle term in the sandwich estimator. For a single binary exposure variable without covariate adjustment, this approach results in risk ratio estimates and standard errors that are identical to those found in the survey sampling literature. Simulation results suggest that it is reliable for studies with correlated binary data, provided the total number of clusters is at least 50. Data from observational and cluster randomized studies are used to illustrate the methods.
Survival Data and Regression Models
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, in spite we recall some essential results about the ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
Barros, Márcio Vinícius Lins de; Arancibia, Ana Elisa Loyola; Costa, Ana Paula; Bueno, Fernando Brito; Martins, Marcela Aparecida Corrêa; Magalhães, Maria Cláudia; Silva, José Luiz Padilha; Bastos, Marcos de
2016-04-01
Deep venous thrombosis (DVT) management includes prediction rule evaluation to define standard pretest DVT probabilities in symptomatic patients. The aim of this study was to evaluate the incremental usefulness of hormonal therapy to the Wells prediction rules for DVT in women. We studied women undertaking compressive ultrasound scanning for suspected DVT. We adjusted the Wells score for DVT, taking into account the β-coefficients of the logistic regression model. Data discrimination was evaluated by the receiver operating characteristic (ROC) curve. The adjusted score calibration was assessed graphically and by the Hosmer-Lemeshow test. Reclassification tables and the net reclassification index were used for the adjusted score comparison with the Wells score for DVT. We observed 461 women including 103 DVT events. The mean age was 56 years (±21 years). The adjusted logistic regression model included hormonal therapy and six Wells prediction rules for DVT. The adjusted score weights ranged from -4 to 4. Hosmer-Lemeshow test showed a nonsignificant P value (0.69) and the calibration graph showed no differences between the expected and the observed values. The area under the ROC curve was 0.92 [95% confidence interval (CI) 0.90-0.95] for the adjusted model and 0.87 (95% CI 0.84-0.91) for the Wells score for DVT (Delong test, P value < 0.01). Net reclassification index for the adjusted score was 0.22 (95% CI 0.11-0.33, P value < 0.01). Our results suggest an incremental usefulness of hormonal therapy as an independent DVT prediction rule in women compared with the Wells score for DVT. The adjusted score must be evaluated in different populations before clinical use.
Kim, Min-hyung; Banerjee, Samprit; Park, Sang Min; Pathak, Jyotishman
2016-01-01
Depression, despite its high prevalence, remains severely under-diagnosed across the healthcare system. This demands the development of data-driven approaches that can help screen patients who are at a high risk of depression. In this work, we develop depression risk prediction models that incorporate disease co-morbidities using logistic regression with Elastic Net. Using data from the one million twelve-year longitudinal cohort from Korean National Health Insurance Services (KNHIS), our model achieved an Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) of 0.7818, compared to a traditional logistic regression model without co-morbidity analysis (AUC of 0.6992). We also showed co-morbidity adjusted Odds Ratios (ORs), which may be more accurate independent estimate of each predictor variable. In conclusion, inclusion of co-morbidity analysis improved the performance of depression risk prediction models. PMID:28269945
Kim, Min-Hyung; Banerjee, Samprit; Park, Sang Min; Pathak, Jyotishman
2016-01-01
Depression, despite its high prevalence, remains severely under-diagnosed across the healthcare system. This demands the development of data-driven approaches that can help screen patients who are at a high risk of depression. In this work, we develop depression risk prediction models that incorporate disease co-morbidities using logistic regression with Elastic Net. Using data from the one million twelve-year longitudinal cohort from Korean National Health Insurance Services (KNHIS), our model achieved an Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) of 0.7818, compared to a traditional logistic regression model without co-morbidity analysis (AUC of 0.6992). We also showed co-morbidity adjusted Odds Ratios (ORs), which may be more accurate independent estimate of each predictor variable. In conclusion, inclusion of co-morbidity analysis improved the performance of depression risk prediction models.
Incremental logistic regression for customizing automatic diagnostic models.
Tortajada, Salvador; Robles, Montserrat; García-Gómez, Juan Miguel
2015-01-01
In the last decades, and following the new trends in medicine, statistical learning techniques have been used for developing automatic diagnostic models for aiding the clinical experts throughout the use of Clinical Decision Support Systems. The development of these models requires a large, representative amount of data, which is commonly obtained from one hospital or a group of hospitals after an expensive and time-consuming gathering, preprocess, and validation of cases. After the model development, it has to overcome an external validation that is often carried out in a different hospital or health center. The experience is that the models show underperformed expectations. Furthermore, patient data needs ethical approval and patient consent to send and store data. For these reasons, we introduce an incremental learning algorithm base on the Bayesian inference approach that may allow us to build an initial model with a smaller number of cases and update it incrementally when new data are collected or even perform a new calibration of a model from a different center by using a reduced number of cases. The performance of our algorithm is demonstrated by employing different benchmark datasets and a real brain tumor dataset; and we compare its performance to a previous incremental algorithm and a non-incremental Bayesian model, showing that the algorithm is independent of the data model, iterative, and has a good convergence.
Assessing Lake Trophic Status: A Proportional Odds Logistic Regression Model
Lake trophic state classifications are good predictors of ecosystem condition and are indicative of both ecosystem services (e.g., recreation and aesthetics), and disservices (e.g., harmful algal blooms). Methods for classifying trophic state are based off the foundational work o...
Analyzing Army Reserve Unsatisfactory Participants through Logistic Regression
2012-06-08
interactions, and referenced Dr. Fredric Jablin’s four stages of socialization (anticipatory socialization, organizational encounter, metamorphosis ...Soldier’s encounter is positive he begins the metamorphosis stage, which “marks the newcomer’s alignment of expectations to those of the organization...anticipatory socialization, encounter, and metamorphosis stages do not have enough time in the unit to make educated decisions. The metamorphosis stage should
Measuring NAVSPASUR Sensor Performance using Logistic Regression Models
1992-09-01
consists of three transmitting and six receiving stations or sites. The three transmitting sites, located in Lake Kickapoo TX, Gila River AZ and Jordan...frequency of 216.980 MHz. The largest of the transmitters, located in Lake Kickapoo TX, operates at 810 kW of output power and consists of eighteen...parameters for the Lake Kickapoo transmitter and the San Diego receiver’ will be used. The following parameters are given [Ref. 1: p. 6-7]: P, = 6.76 X 10
Interpreting Multiple Logistic Regression Coefficients in Prospective Observational Studies
1982-11-01
prompted close examination of the issue at a workshop on hypertriglyceridemia where some of the cautions and perspectives given in this paper were...longevity," Circulation, 34, 679-697, (1966). 19. Lippel, K., Tyroler, H., Eder, H., Gotto, A., and Vahouny, G. "Rela- tionship of hypertriglyceridemia
Background or Experience? Using Logistic Regression to Predict College Retention
ERIC Educational Resources Information Center
Synco, Tracee M.
2012-01-01
Tinto, Astin and countless others have researched the retention and attrition of students from college for more than thirty years. However, the six year graduation rate for all first-time full-time freshmen for the 2002 cohort was 57%. This study sought to determine the retention variables that predicted continued enrollment of entering freshmen…
NASA Technical Reports Server (NTRS)
Tellado, Joseph
2014-01-01
The presentation contains a status of KSC ISS Logistics Operations. It basically presents current top level ISS Logistics tasks being conducted at KSC, current International Partner activities, hardware processing flow focussing on late Stow operations, list of KSC Logistics POC's, and a backup list of Logistics launch site services. This presentation is being given at the annual International Space Station (ISS) Multi-lateral Logistics Maintenance Control Panel meeting to be held in Turin, Italy during the week of May 13-16. The presentatiuon content doesn't contain any potential lessons learned.
ERIC Educational Resources Information Center
Pedrini, D. T.; Pedrini, Bonnie C.
Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…
Rank regression: an alternative regression approach for data with outliers.
Chen, Tian; Tang, Wan; Lu, Ying; Tu, Xin
2014-10-01
Linear regression models are widely used in mental health and related health services research. However, the classic linear regression analysis assumes that the data are normally distributed, an assumption that is not met by the data obtained in many studies. One method of dealing with this problem is to use semi-parametric models, which do not require that the data be normally distributed. But semi-parametric models are quite sensitive to outlying observations, so the generated estimates are unreliable when study data includes outliers. In this situation, some researchers trim the extreme values prior to conducting the analysis, but the ad-hoc rules used for data trimming are based on subjective criteria so different methods of adjustment can yield different results. Rank regression provides a more objective approach to dealing with non-normal data that includes outliers. This paper uses simulated and real data to illustrate this useful regression approach for dealing with outliers and compares it to the results generated using classical regression models and semi-parametric regression models.
... structural alignment and improve your body's physical function. Low back pain, neck pain and headache are the most common ... treated. Chiropractic adjustment can be effective in treating low back pain, although much of the research done shows only ...
... from other people Skipped heartbeats and other physical complaints Trembling or twitching To have adjustment disorder, you ... ADAM Health Solutions. About MedlinePlus Site Map FAQs Customer Support Get email updates Subscribe to RSS Follow ...
2006-02-07
Logistics modernization, end-to-end logistics, logistics transformation, and just - in - time logistics -- whichever name it is being called today, it is...demonstrations as to what these systems will do for the end user, consumer confidence will increase and " just - in - time " logistics will lead to a lighter and leaner combat logistics support.
Focused Logistics: Putting Agility in Agile Logistics
2011-05-19
envisioned an agile and adaptable logstics system built around common situational understanding.5 The Focused Logistics concept specified the requirement to...the tracking of resources moving through the TD network. As a result, 112 Ibid, 20. 113 Ibid, 18-20. 39 distribution centers tagged inbound
ERIC Educational Resources Information Center
Walton, Joseph M.; And Others
1978-01-01
Ridge regression is an approach to the problem of large standard errors of regression estimates of intercorrelated regressors. The effect of ridge regression on the estimated squared multiple correlation coefficient is discussed and illustrated. (JKS)
The "Smarter Regression" Add-In for Linear and Logistic Regression in Excel
2007-07-01
only Visual Basic , the built-in programming language of Excel (Walkenbach, 1999). We wanted to avoid the use of external Dynamic Linked Libraries...least two different ways of entering results into workbook cells in Visual Basic . One is to establish an array in Visual Basic , fill up the elements of
Chang, Chun-Ming; Yin, Wen-Yao; Wei, Chang-Kao; Wu, Chin-Chia; Su, Yu-Chieh; Yu, Chia-Hui; Lee, Ching-Chih
2016-01-01
Background Identification of patients at risk of death from cancer surgery should aid in preoperative preparation. The purpose of this study is to assess and adjust the age-adjusted Charlson comorbidity index (ACCI) to identify cancer patients with increased risk of perioperative mortality. Methods We identified 156,151 patients undergoing surgery for one of the ten common cancers between 2007 and 2011 in the Taiwan National Health Insurance Research Database. Half of the patients were randomly selected, and a multivariate logistic regression analysis was used to develop an adjusted-ACCI score for estimating the risk of 90-day mortality by variables from the original ACCI. The score was validated. The association between the score and perioperative mortality was analyzed. Results The adjusted-ACCI score yield a better discrimination on mortality after cancer surgery than the original ACCI score, with c-statics of 0.75 versus 0.71. Over 80 years of age, 70–80 years, and renal disease had the strongest impact on mortality, hazard ratios 8.40, 3.63, and 3.09 (P < 0.001), respectively. The overall 90-day mortality rates in the entire cohort varied from 0.9%, 2.9%, 7.0%, and 13.2% in four risk groups stratifying by the adjusted-ACCI score; the adjusted hazard ratio for score 4–7, 8–11, and ≥ 12 was 2.84, 6.07, and 11.17 (P < 0.001), respectively, in 90-day mortality compared to score 0–3. Conclusions The adjusted-ACCI score helps to identify patients with a higher risk of 90-day mortality after cancer surgery. It might be particularly helpful for preoperative evaluation of patients over 80 years of age. PMID:26848761
NASA Astrophysics Data System (ADS)
Cempírek, Václav; Nachtigall, Petr; Široký, Jaromír
2016-12-01
This paper deals with security of logistic chains according to incorrect declaration of transported goods, fraudulent transport and forwarding companies and possible threats caused by political influences. The main goal of this paper is to highlight possible logistic costs increase due to these fraudulent threats. An analysis of technological processes will beis provided, and an increase of these transport times considering the possible threatswhich will beis evaluated economic costs-wise. In the conclusion, possible threat of companies'` efficiency in logistics due to the costs`, means of transport and increase in human resources` increase will beare pointed out.
Harry, H.H.
1988-03-11
Abstract and method for the adjustment and alignment of shafts in high power devices. A plurality of adjacent rotatable angled cylinders are positioned between a base and the shaft to be aligned which when rotated introduce an axial offset. The apparatus is electrically conductive and constructed of a structurally rigid material. The angled cylinders allow the shaft such as the center conductor in a pulse line machine to be offset in any desired alignment position within the range of the apparatus. 3 figs.
Harry, Herbert H.
1989-01-01
Apparatus and method for the adjustment and alignment of shafts in high power devices. A plurality of adjacent rotatable angled cylinders are positioned between a base and the shaft to be aligned which when rotated introduce an axial offset. The apparatus is electrically conductive and constructed of a structurally rigid material. The angled cylinders allow the shaft such as the center conductor in a pulse line machine to be offset in any desired alignment position within the range of the apparatus.
NASA Astrophysics Data System (ADS)
Chang, Yoon S.; Oh, Chang H.
Nowadays, environmental management becomes a critical business consideration for companies to survive from many regulations and tough business requirements. Most of world-leading companies are now aware that environment friendly technology and management are critical to the sustainable growth of the company. The environment market has seen continuous growth marking 532B in 2000, and 590B in 2004. This growth rate is expected to grow to 700B in 2010. It is not hard to see the environment-friendly efforts in almost all aspects of business operations. Such trends can be easily found in logistics area. Green logistics aims to make environmental friendly decisions throughout a product lifecycle. Therefore for the success of green logistics, it is critical to have real time tracking capability on the product throughout the product lifecycle and smart solution service architecture. In this chapter, we introduce an RFID based green logistics solution and service.
Lunar Commercial Mining Logistics
NASA Astrophysics Data System (ADS)
Kistler, Walter P.; Citron, Bob; Taylor, Thomas C.
2008-01-01
Innovative commercial logistics is required for supporting lunar resource recovery operations and assisting larger consortiums in lunar mining, base operations, camp consumables and the future commercial sales of propellant over the next 50 years. To assist in lowering overall development costs, ``reuse'' innovation is suggested in reusing modified LTS in-space hardware for use on the moon's surface, developing product lines for recovered gases, regolith construction materials, surface logistics services, and other services as they evolve, (Kistler, Citron and Taylor, 2005) Surface logistics architecture is designed to have sustainable growth over 50 years, financed by private sector partners and capable of cargo transportation in both directions in support of lunar development and resource recovery development. The author's perspective on the importance of logistics is based on five years experience at remote sites on Earth, where remote base supply chain logistics didn't always work, (Taylor, 1975a). The planning and control of the flow of goods and materials to and from the moon's surface may be the most complicated logistics challenges yet to be attempted. Affordability is tied to the innovation and ingenuity used to keep the transportation and surface operations costs as low as practical. Eleven innovations are proposed and discussed by an entrepreneurial commercial space startup team that has had success in introducing commercial space innovation and reducing the cost of space operations in the past. This logistics architecture offers NASA and other exploring nations a commercial alternative for non-essential cargo. Five transportation technologies and eleven surface innovations create the logistics transportation system discussed.
Space Station fluid management logistics
NASA Technical Reports Server (NTRS)
Dominick, Sam M.
1990-01-01
Viewgraphs and discussion on space station fluid management logistics are presented. Topics covered include: fluid management logistics - issues for Space Station Freedom evolution; current fluid logistics approach; evolution of Space Station Freedom fluid resupply; launch vehicle evolution; ELV logistics system approach; logistics carrier configuration; expendable fluid/propellant carrier description; fluid carrier design concept; logistics carrier orbital operations; carrier operations at space station; summary/status of orbital fluid transfer techniques; Soviet progress tanker system; and Soviet propellant resupply system observations.
Regressive systemic sclerosis.
Black, C; Dieppe, P; Huskisson, T; Hart, F D
1986-01-01
Systemic sclerosis is a disease which usually progresses or reaches a plateau with persistence of symptoms and signs. Regression is extremely unusual. Four cases of established scleroderma are described in which regression is well documented. The significance of this observation and possible mechanisms of disease regression are discussed. Images PMID:3718012
Tharrington, Arnold N.
2015-09-09
The NCCS Regression Test Harness is a software package that provides a framework to perform regression and acceptance testing on NCCS High Performance Computers. The package is written in Python and has only the dependency of a Subversion repository to store the regression tests.
Ehrsam, Eric; Kallini, Joseph R.; Lebas, Damien; Modiano, Philippe; Cotten, Hervé
2016-01-01
Fully regressive melanoma is a phenomenon in which the primary cutaneous melanoma becomes completely replaced by fibrotic components as a result of host immune response. Although 10 to 35 percent of cases of cutaneous melanomas may partially regress, fully regressive melanoma is very rare; only 47 cases have been reported in the literature to date. AH of the cases of fully regressive melanoma reported in the literature were diagnosed in conjunction with metastasis on a patient. The authors describe a case of fully regressive melanoma without any metastases at the time of its diagnosis. Characteristic findings on dermoscopy, as well as the absence of melanoma on final biopsy, confirmed the diagnosis. PMID:27672418
Weine, Stevan Merrill; Ware, Norma; Tugenberg, Toni; Hakizimana, Leonce; Dahnweih, Gonwo; Currie, Madeleine; Wagner, Maureen; Levin, Elise
2013-01-01
Objectives The purpose of this mixed method study was to characterize the patterns of psychosocial adjustment among adolescent African refugees in U.S. resettlement. Methods A purposive sample of 73 recently resettled refugee adolescents from Burundi and Liberia were followed for two years and qualitative and quantitative data was analyzed using a mixed methods exploratory design. Results Protective resources identified were the family and community capacities that can promote youth psychosocial adjustment through: 1) Finances for necessities; 2) English proficiency; 3) Social support networks; 4) Engaged parenting; 5) Family cohesion; 6) Cultural adherence and guidance; 7) Educational support; and, 8) Faith and religious involvement. The researchers first inductively identified 19 thriving, 29 managing, and 25 struggling youths based on review of cases. Univariate analyses then indicated significant associations with country of origin, parental education, and parental employment. Multiple regressions indicated that better psychosocial adjustment was associated with Liberians and living with both parents. Logistic regressions showed that thriving was associated with Liberians and higher parental education, managing with more parental education, and struggling with Burundians and living parents. Qualitative analysis identified how these factors were proxy indicators for protective resources in families and communities. Conclusion These three trajectories of psychosocial adjustment and six domains of protective resources could assist in developing targeted prevention programs and policies for refugee youth. Further rigorous longitudinal mixed-methods study of adolescent refugees in U.S. resettlement are needed. PMID:24205467
Hussey, Michael A; Koch, Gary G; Preisser, John S; Saville, Benjamin R
2016-01-01
Time-to-event or dichotomous outcomes in randomized clinical trials often have analyses using the Cox proportional hazards model or conditional logistic regression, respectively, to obtain covariate-adjusted log hazard (or odds) ratios. Nonparametric Randomization-Based Analysis of Covariance (NPANCOVA) can be applied to unadjusted log hazard (or odds) ratios estimated from a model containing treatment as the only explanatory variable. These adjusted estimates are stratified population-averaged treatment effects and only require a valid randomization to the two treatment groups and avoid key modeling assumptions (e.g., proportional hazards in the case of a Cox model) for the adjustment variables. The methodology has application in the regulatory environment where such assumptions cannot be verified a priori. Application of the methodology is illustrated through three examples on real data from two randomized trials.
NASA Astrophysics Data System (ADS)
Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.
2013-02-01
Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R2 = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19. The results from GWR indicated a significant spatial variation in the local
Nuclear Shuttle Logistics Configuration
NASA Technical Reports Server (NTRS)
1971-01-01
This 1971 artist's concept shows the Nuclear Shuttle in both its lunar logistics configuraton and geosynchronous station configuration. As envisioned by Marshall Space Flight Center Program Development persornel, the Nuclear Shuttle would deliver payloads to lunar orbits or other destinations then return to Earth orbit for refueling and additional missions.
Optimal estimation for regression models on τ-year survival probability.
Kwak, Minjung; Kim, Jinseog; Jung, Sin-Ho
2015-01-01
A logistic regression method can be applied to regressing the [Formula: see text]-year survival probability to covariates, if there are no censored observations before time [Formula: see text]. But if some observations are incomplete due to censoring before time [Formula: see text], then the logistic regression cannot be applied. Jung (1996) proposed to modify the score function for logistic regression to accommodate the right-censored observations. His modified score function, motivated for a consistent estimation of regression parameters, becomes a regular logistic score function if no observations are censored before time [Formula: see text]. In this article, we propose a modification of Jung's estimating function for an optimal estimation for the regression parameters in addition to consistency. We prove that the optimal estimator is more efficient than Jung's estimator. This theoretical comparison is illustrated with a real example data analysis and simulations.
Efforts to adjust for confounding by neighborhood using complex survey data.
Brumback, Babette A; Dailey, Amy B; He, Zhulin; Brumback, Lyndia C; Livingston, Melvin D
2010-08-15
In social epidemiology, one often considers neighborhood or contextual effects on health outcomes, in addition to effects of individual exposures. This paper is concerned with the estimation of an individual exposure effect in the presence of confounding by neighborhood effects, motivated by an analysis of National Health Interview Survey (NHIS) data. In the analysis, we operationalize neighborhood as the secondary sampling unit of the survey, which consists of small groups of neighboring census blocks. Thus the neighborhoods are sampled with unequal probabilities, as are individuals within neighborhoods. We develop and compare several approaches for the analysis of the effect of dichotomized individual-level education on the receipt of adequate mammography screening. In the analysis, neighborhood effects are likely to confound the individual effects, due to such factors as differential availability of health services and differential neighborhood culture. The approaches can be grouped into three broad classes: ordinary logistic regression for survey data, with either no effect or a fixed effect for each cluster; conditional logistic regression extended for survey data; and generalized linear mixed model (GLMM) regression for survey data. Standard use of GLMMs with small clusters fails to adjust for confounding by cluster (e.g. neighborhood); this motivated us to develop an adaptation. We use theory, simulation, and analyses of the NHIS data to compare and contrast all of these methods. One conclusion is that all of the methods perform poorly when the sampling bias is strong; more research and new methods are clearly needed.
Assessing risk factors for periodontitis using regression
NASA Astrophysics Data System (ADS)
Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa
2013-10-01
Multivariate statistical analysis is indispensable to assess the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using regression, linear and logistic models, we can assess the relevance, as risk factors for periodontitis disease, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking status and Plaque Index. The multiple linear regression analysis model was built to evaluate the influence of IVs on mean Attachment Loss (AL). Thus, the regression coefficients along with respective p-values will be obtained as well as the respective p-values from the significance tests. The classification of a case (individual) adopted in the logistic model was the extent of the destruction of periodontal tissues defined by an Attachment Loss greater than or equal to 4 mm in 25% (AL≥4mm/≥25%) of sites surveyed. The association measures include the Odds Ratios together with the correspondent 95% confidence intervals.
Gerber, Samuel; Rubel, Oliver; Bremer, Peer -Timo; Pascucci, Valerio; Whitaker, Ross T.
2012-01-19
This paper introduces a novel partition-based regression approach that incorporates topological information. Partition-based regression typically introduces a quality-of-fit-driven decomposition of the domain. The emphasis in this work is on a topologically meaningful segmentation. Thus, the proposed regression approach is based on a segmentation induced by a discrete approximation of the Morse–Smale complex. This yields a segmentation with partitions corresponding to regions of the function with a single minimum and maximum that are often well approximated by a linear model. This approach yields regression models that are amenable to interpretation and have good predictive capacity. Typically, regression estimates are quantified by their geometrical accuracy. For the proposed regression, an important aspect is the quality of the segmentation itself. Thus, this article introduces a new criterion that measures the topological accuracy of the estimate. The topological accuracy provides a complementary measure to the classical geometrical error measures and is very sensitive to overfitting. The Morse–Smale regression is compared to state-of-the-art approaches in terms of geometry and topology and yields comparable or improved fits in many cases. Finally, a detailed study on climate-simulation data demonstrates the application of the Morse–Smale regression. Supplementary Materials are available online and contain an implementation of the proposed approach in the R package msr, an analysis and simulations on the stability of the Morse–Smale complex approximation, and additional tables for the climate-simulation study.
Improved Regression Calibration
ERIC Educational Resources Information Center
Skrondal, Anders; Kuha, Jouni
2012-01-01
The likelihood for generalized linear models with covariate measurement error cannot in general be expressed in closed form, which makes maximum likelihood estimation taxing. A popular alternative is regression calibration which is computationally efficient at the cost of inconsistent estimation. We propose an improved regression calibration…
Gerber, Samuel; Rübel, Oliver; Bremer, Peer-Timo; Pascucci, Valerio; Whitaker, Ross T.
2012-01-01
This paper introduces a novel partition-based regression approach that incorporates topological information. Partition-based regression typically introduce a quality-of-fit-driven decomposition of the domain. The emphasis in this work is on a topologically meaningful segmentation. Thus, the proposed regression approach is based on a segmentation induced by a discrete approximation of the Morse-Smale complex. This yields a segmentation with partitions corresponding to regions of the function with a single minimum and maximum that are often well approximated by a linear model. This approach yields regression models that are amenable to interpretation and have good predictive capacity. Typically, regression estimates are quantified by their geometrical accuracy. For the proposed regression, an important aspect is the quality of the segmentation itself. Thus, this paper introduces a new criterion that measures the topological accuracy of the estimate. The topological accuracy provides a complementary measure to the classical geometrical error measures and is very sensitive to over-fitting. The Morse-Smale regression is compared to state-of-the-art approaches in terms of geometry and topology and yields comparable or improved fits in many cases. Finally, a detailed study on climate-simulation data demonstrates the application of the Morse-Smale regression. Supplementary materials are available online and contain an implementation of the proposed approach in the R package msr, an analysis and simulations on the stability of the Morse-Smale complex approximation and additional tables for the climate-simulation study. PMID:23687424
Bonnín, C M; Torrent, C; Goikolea, J M; Reinares, M; Solé, B; Valentí, M; Sánchez-Moreno, J; Hidalgo, D; Tabarés-Seisdedos, R; Martínez-Arán, A; Vieta, E
2014-04-01
The aim of this study was to study the clinical and neurocognitive variables that best explain poor work adjustment in a sample of bipolar I euthymic patients. Eighty-five euthymic patients at the Hospital Clinic of Barcelona were assessed for this study by means of a comprehensive neuropsychological battery and a work-focused interview to determine work adjustment. Clinical and sociodemographic variables were also collected. Direct logistic regression was performed to assess the impact of demographic, clinical and neuropsychological variables on the likelihood of presenting poor work adjustment. The model that best fitted contained five variables (Hamilton Depression Rating scores, number of manic episodes, number of perseverative errors in the Wisconsin Card Sorting Test (WCST), number of depressive episodes and Trail Making Test-part B). However, only two out of these variables made a unique statistically significant contribution to the model, which were number of manic episodes (OR 1.401; CI 1.05-1.86; p = 0.021) and number of perseverative errors in the WCST (OR 1.062; CI 1.00-1.12; p = 0.044). The model explained up to 36 % of the variance in work adjustment. This study highlights the role of manic relapses and neurocognitive impairment, specifically the role of executive function, in work adjustment.
Logistics Assessment Guidebook
2011-07-01
significant. Another case involved the integration of an aircraft and ground vehicle system. Integration issues were identified early in the design...for height and width; insufficient power requirements to support maintenance actions; and insufficient design of the autonomic logistics system...evaluation at IOT &E and thus on track for Initial Operational Capability (IOC). If not, a plan is in place to ensure they are met (ref DoDI 5000; CJCSM
Logistics Software Implementation.
1986-02-28
to obtain benchmark performance results. The list presented in Appendix A is not final and should be considered only as the initial worksheet . d...has been used in more applications and its imple-... mentation is better understood. All of the preliminary tests have indi- cated excellent ...environment. The fifth-generation languages, such as PROLOG, also provide excellent dynamic data base support. The proposed approach for the Logistics
2014-12-04
2014 by: , Director, Graduate Degree Programs Robert F. Baumann, PhD The opinions and conclusions expressed herein are those of the...Washington, DC: Center for Military History, 1993), 244. 24 Spencer C. Tucker and Priscilla Mary Roberts , Encyclopedia of World War I (Santa Barbara...Theater of Operations. See Richard M. Leighton and Robert W. Coakley, Global Logistics and Strategy 1940-1943 (Washington, DC: Center for Military
Relative risk regression analysis of epidemiologic data.
Prentice, R L
1985-11-01
Relative risk regression methods are described. These methods provide a unified approach to a range of data analysis problems in environmental risk assessment and in the study of disease risk factors more generally. Relative risk regression methods are most readily viewed as an outgrowth of Cox's regression and life model. They can also be viewed as a regression generalization of more classical epidemiologic procedures, such as that due to Mantel and Haenszel. In the context of an epidemiologic cohort study, relative risk regression methods extend conventional survival data methods and binary response (e.g., logistic) regression models by taking explicit account of the time to disease occurrence while allowing arbitrary baseline disease rates, general censorship, and time-varying risk factors. This latter feature is particularly relevant to many environmental risk assessment problems wherein one wishes to relate disease rates at a particular point in time to aspects of a preceding risk factor history. Relative risk regression methods also adapt readily to time-matched case-control studies and to certain less standard designs. The uses of relative risk regression methods are illustrated and the state of development of these procedures is discussed. It is argued that asymptotic partial likelihood estimation techniques are now well developed in the important special case in which the disease rates of interest have interpretations as counting process intensity functions. Estimation of relative risks processes corresponding to disease rates falling outside this class has, however, received limited attention. The general area of relative risk regression model criticism has, as yet, not been thoroughly studied, though a number of statistical groups are studying such features as tests of fit, residuals, diagnostics and graphical procedures. Most such studies have been restricted to exponential form relative risks as have simulation studies of relative risk estimation
Schmid, Matthias; Wickler, Florian; Maloney, Kelly O.; Mitchell, Richard; Fenske, Nora; Mayr, Andreas
2013-01-01
Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures. PMID:23626706
Federal Register 2010, 2011, 2012, 2013, 2014
2010-09-13
... Logistics, Inc., Florence, SC; Amended Certification Regarding Eligibility To Apply for Worker Adjustment... Schenkers Logistics, Inc., Florence, South Carolina (subject firm). The Department's notice was published in... Logistics, Inc., Florence, South Carolina, who became totally or partially separated from employment on...
George: Gaussian Process regression
NASA Astrophysics Data System (ADS)
Foreman-Mackey, Daniel
2015-11-01
George is a fast and flexible library, implemented in C++ with Python bindings, for Gaussian Process regression useful for accounting for correlated noise in astronomical datasets, including those for transiting exoplanet discovery and characterization and stellar population modeling.
Understanding poisson regression.
Hayat, Matthew J; Higgins, Melinda
2014-04-01
Nurse investigators often collect study data in the form of counts. Traditional methods of data analysis have historically approached analysis of count data either as if the count data were continuous and normally distributed or with dichotomization of the counts into the categories of occurred or did not occur. These outdated methods for analyzing count data have been replaced with more appropriate statistical methods that make use of the Poisson probability distribution, which is useful for analyzing count data. The purpose of this article is to provide an overview of the Poisson distribution and its use in Poisson regression. Assumption violations for the standard Poisson regression model are addressed with alternative approaches, including addition of an overdispersion parameter or negative binomial regression. An illustrative example is presented with an application from the ENSPIRE study, and regression modeling of comorbidity data is included for illustrative purposes.
NASA Astrophysics Data System (ADS)
de Carvalho, R. Egydio; Leonel, Edson D.
2016-12-01
A periodic time perturbation is introduced in the logistic map as an attempt to investigate new scenarios of bifurcations and new mechanisms toward the chaos. With a squared sine perturbation we observe that a point attractor reaches the chaotic attractor without following a cascade of bifurcations. One fixed point of the system presents a new scenario of bifurcations through an infinite sequence of alternating changes of stability. At the bifurcations, the perturbation does not modify the scaling features observed in the convergence toward the stationary state.
Estimation of the Regression Effect Using a Latent Trait Model.
ERIC Educational Resources Information Center
Quinn, Jimmy L.
A logistic model was used to generate data to serve as a proxy for an immediate retest from item responses to a fourth grade standardized reading comprehension test of 45 items. Assuming that the actual test may be considered a pretest and the proxy data may be considered a retest, the effect of regression was investigated using a percentage of…
Ridge Regression: A Regression Procedure for Analyzing correlated Independent Variables
ERIC Educational Resources Information Center
Rakow, Ernest A.
1978-01-01
Ridge regression is a technique used to ameliorate the problem of highly correlated independent variables in multiple regression analysis. This paper explains the fundamentals of ridge regression and illustrates its use. (JKS)
Developing Future Strategic Logistics Leaders
2013-03-01
and our focus on tactical level lessons learned has 12 skewed our emphasis in PME to a very narrow band of excellence. As such our current...A partnership between senior logistics leaders, PME developers, and personnel managers is essential to constructing and maintaining strategic...short term and inconsistent PME system that does not produce strategic logistic leaders. The Logistics Corps, sustainment branches, LOO lead for leader
Space Shuttle operational logistics plan
NASA Technical Reports Server (NTRS)
Botts, J. W.
1983-01-01
The Kennedy Space Center plan for logistics to support Space Shuttle Operations and to establish the related policies, requirements, and responsibilities are described. The Directorate of Shuttle Management and Operations logistics responsibilities required by the Kennedy Organizational Manual, and the self-sufficiency contracting concept are implemented. The Space Shuttle Program Level 1 and Level 2 logistics policies and requirements applicable to KSC that are presented in HQ NASA and Johnson Space Center directives are also implemented.
Modern Regression Discontinuity Analysis
ERIC Educational Resources Information Center
Bloom, Howard S.
2012-01-01
This article provides a detailed discussion of the theory and practice of modern regression discontinuity (RD) analysis for estimating the effects of interventions or treatments. Part 1 briefly chronicles the history of RD analysis and summarizes its past applications. Part 2 explains how in theory an RD analysis can identify an average effect of…
Multiple linear regression analysis
NASA Technical Reports Server (NTRS)
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
Explorations in Statistics: Regression
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2011-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This seventh installment of "Explorations in Statistics" explores regression, a technique that estimates the nature of the relationship between two things for which we may only surmise a mechanistic or predictive…
2011-01-01
Background Although the validity and safety of antipsychotic polypharmacy remains unclear, it is commonplace in the treatment of schizophrenia. This study aimed to investigate the degree that antipsychotic polypharmacy contributed to metabolic syndrome in outpatients with schizophrenia, after adjustment for the effects of lifestyle. Methods A cross-sectional survey was carried out between April 2007 and October 2007 at Yamanashi Prefectural KITA hospital in Japan. 334 patients consented to this cross-sectional study. We measured the components consisting metabolic syndrome, and interviewed the participants about their lifestyle. We classified metabolic syndrome into four groups according to the severity of metabolic disturbance: the metabolic syndrome; the pre-metabolic syndrome; the visceral fat obesity; and the normal group. We used multinomial logistic regression models to assess the association of metabolic syndrome with antipsychotic polypharmacy, adjusting for lifestyle. Results Seventy-four (22.2%) patients were in the metabolic syndrome group, 61 (18.3%) patients were in the pre-metabolic syndrome group, and 41 (12.3%) patients were in visceral fat obesity group. Antipsychotic polypharmacy was present in 167 (50.0%) patients. In multinomial logistic regression analyses, antipsychotic polypharmacy was significantly associated with the pre-metabolic syndrome group (adjusted odds ratio [AOR], 2.348; 95% confidence interval [CI], 1.181-4.668), but not with the metabolic syndrome group (AOR, 1.269; 95%CI, 0.679-2.371). Conclusions These results suggest that antipsychotic polypharmacy, compared with monotherapy, may be independently associated with an increased risk of having pre-metabolic syndrome, even after adjusting for patients' lifestyle characteristics. As metabolic syndrome is associated with an increased risk of cardiovascular mortality, further studies are needed to clarify the validity and safety of antipsychotic polypharmacy. PMID:21791046
NASA Technical Reports Server (NTRS)
1963-01-01
This document has been prepared to incorporate all presentation aid material, together with some explanatory text, used during an oral briefing on the Nuclear Lunar Logistics System given at the George C. Marshall Space Flight Center, National Aeronautics and Space Administration, on 18 July 1963. The briefing and this document are intended to present the general status of the NERVA (Nuclear Engine for Rocket Vehicle Application) nuclear rocket development, the characteristics of certain operational NERVA-class engines, and appropriate technical and schedule information. Some of the information presented herein is preliminary in nature and will be subject to further verification, checking and analysis during the remainder of the study program. In addition, more detailed information will be prepared in many areas for inclusion in a final summary report. This work has been performed by REON, a division of Aerojet-General Corporation under Subcontract 74-10039 from the Lockheed Missiles and Space Company. The presentation and this document have been prepared in partial fulfillment of the provisions of the subcontract. From the inception of the NERVA program in July 1961, the stated emphasis has centered around the demonstration of the ability of a nuclear rocket to perform safely and reliably in the space environment, with the understanding that the assignment of a mission (or missions) would place undue emphasis on performance and operational flexibility. However, all were aware that the ultimate justification for the development program must lie in the application of the nuclear propulsion system to the national space objectives.
Selected Logistics Models and Techniques.
1984-09-01
Programmable Calculator LCC...Program 27 TI-59 Programmable Calculator LCC Model 30 Unmanned Spacecraft Cost Model 31 iv I: TABLE OF CONTENTS (CONT’D) (Subject Index) LOGISTICS...34"" - % - "° > - " ° .° - " .’ > -% > ]*° - LOGISTICS ANALYSIS MODEL/TECHNIQUE DATA MODEL/TECHNIQUE NAME: TI-59 Programmable Calculator LCC Model TYPE MODEL: Cost Estimating DEVELOPED BY:
Saboonchi, Fredrik; Petersson, Lena-Marie; Wennman-Larsen, Agneta; Alexanderson, Kristina; Vaez, Marjan
2015-01-01
Anxiety is one of the main components of distress among women with breast cancer (BC), particularly in the early stages of the disease. Changes in anxiety over time may reflect the process of adjustment or lack thereof. The process of adjustment in the traverse of acute to transitional stages of survivorship warrants further examination. To examine the trajectory of anxiety and the specific patterns that may indicate a lack of adjustment within 2 years following BC surgery, survey data from a 2-year prospective cohort study of 725 women with BC were analyzed by Mixture Growth Modelling and logistic regression and Analysis of Variance. A piece-wise growth curve displayed the best fit to the data, indicating a significant decrease in anxiety in the first year, followed by a slower rate of change during the second year. Four classes of trajectories were identified: High Stable, High Decrease, Mild Decrease, and Low Decrease. Of these, High Stable anxiety showed the most substantive indications of lack of adjustment. This subgroup was predominantly characterized by sociodemographic variables such as financial difficulties. Our results support an emphasis on the transitional nature of the stage that follows the end of primary active treatment and imply a need for supportive follow up care for those who display lack of adjustment at this stage.
A logistic mixture model for a family-based association study
Xing, Guan; Xing, Chao; Lu, Qing; Elston, Robert C
2007-01-01
A family-based association study design is not only able to localize causative genes more precisely than linkage analysis, but it also helps explain the genetic mechanism underlying the trait under study. Therefore, it can be used to follow up an initial linkage scan. For an association study of binary traits in general pedigrees, we propose a logistic mixture model that regresses the trait value on the genotypic values of markers under investigation and other covariates such as environmental factors. We first tested both the validity and power of the new model by simulating nuclear families inheriting a simple Mendelian trait. It is powerful when the correct disease model is specified and shows much loss of power when the dominance of a model is inversely specified, i.e., a dominant model is wrongly specified as recessive or vice versa. We then applied the new model to the Genetic Analysis Workshop (GAW) 15 simulation data to test the performance of the model when adjusting for covariates in the case of complex traits. Adjusting for the covariate that interacts with disease loci improves the power to detect association. The simplest version of the model only takes monogenic inheritance into account, but analysis of the GAW simulation data shows that even this simple model can be powerful for complex traits. PMID:18466543
A logistic mixture model for a family-based association study.
Xing, Guan; Xing, Chao; Lu, Qing; Elston, Robert C
2007-01-01
A family-based association study design is not only able to localize causative genes more precisely than linkage analysis, but it also helps explain the genetic mechanism underlying the trait under study. Therefore, it can be used to follow up an initial linkage scan. For an association study of binary traits in general pedigrees, we propose a logistic mixture model that regresses the trait value on the genotypic values of markers under investigation and other covariates such as environmental factors. We first tested both the validity and power of the new model by simulating nuclear families inheriting a simple Mendelian trait. It is powerful when the correct disease model is specified and shows much loss of power when the dominance of a model is inversely specified, i.e., a dominant model is wrongly specified as recessive or vice versa. We then applied the new model to the Genetic Analysis Workshop (GAW) 15 simulation data to test the performance of the model when adjusting for covariates in the case of complex traits. Adjusting for the covariate that interacts with disease loci improves the power to detect association. The simplest version of the model only takes monogenic inheritance into account, but analysis of the GAW simulation data shows that even this simple model can be powerful for complex traits.
Akacha, Mouna; Hutton, Jane L
2011-05-10
The Collaborative Ankle Support Trial (CAST) is a longitudinal trial of treatments for severe ankle sprains in which interest lies in the rate of improvement, the effectiveness of reminders and potentially informative missingness. A model is proposed for continuous longitudinal data with non-ignorable or informative missingness, taking into account the nature of attempts made to contact initial non-responders. The model combines a non-linear mixed model for the outcome model with logistic regression models for the reminder processes. A sensitivity analysis is used to contrast this model with the traditional selection model, where we adjust for missingness by modelling the missingness process. The conclusions that recovery is slower, and less satisfactory with age and more rapid with below knee cast than with a tubular bandage do not alter materially across all models investigated. The results also suggest that phone calls are most effective in retrieving questionnaires.
Calculating a Stepwise Ridge Regression.
ERIC Educational Resources Information Center
Morris, John D.
1986-01-01
Although methods for using ordinary least squares regression computer programs to calculate a ridge regression are available, the calculation of a stepwise ridge regression requires a special purpose algorithm and computer program. The correct stepwise ridge regression procedure is given, and a parallel FORTRAN computer program is described.…
Orthogonal Regression: A Teaching Perspective
ERIC Educational Resources Information Center
Carr, James R.
2012-01-01
A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
Kramer, S.
1996-12-31
In many real-world domains the task of machine learning algorithms is to learn a theory for predicting numerical values. In particular several standard test domains used in Inductive Logic Programming (ILP) are concerned with predicting numerical values from examples and relational and mostly non-determinate background knowledge. However, so far no ILP algorithm except one can predict numbers and cope with nondeterminate background knowledge. (The only exception is a covering algorithm called FORS.) In this paper we present Structural Regression Trees (SRT), a new algorithm which can be applied to the above class of problems. SRT integrates the statistical method of regression trees into ILP. It constructs a tree containing a literal (an atomic formula or its negation) or a conjunction of literals in each node, and assigns a numerical value to each leaf. SRT provides more comprehensible results than purely statistical methods, and can be applied to a class of problems most other ILP systems cannot handle. Experiments in several real-world domains demonstrate that the approach is competitive with existing methods, indicating that the advantages are not at the expense of predictive accuracy.
Logistics in smallpox: the legacy.
Wickett, John; Carrasco, Peter
2011-12-30
Logistics, defined as "the time-related positioning of resources" was critical to the implementation of the smallpox eradication strategy of surveillance and containment. Logistical challenges in the smallpox programme included vaccine delivery, supplies, staffing, vehicle maintenance, and financing. Ensuring mobility was essential as health workers had to travel to outbreaks to contain them. Three examples illustrate a range of logistic challenges which required imagination and innovation. Standard price lists were developed to expedite vehicle maintenance and repair in Bihar, India. Innovative staffing ensured an adequate infrastructure for vehicle maintenance in Bangladesh. The use of disaster relief mechanisms in Somalia provided airlifts, vehicles and funding within 27 days of their initiation. In contrast the Expanded Programme on Immunization (EPI) faces more complex logistical challenges.
What Exactly is Space Logistics?
2011-01-01
information on the space shuttle in the news and have inferred by now that resupply missions to the International Space Sta- tion, Hubble Space ... Telescope repair missions, and satellite deployment missions are space logistics—and you would be technically correct. The science of logistics as applied...What Exactly is Space Logistics? James C. Breidenbach Report Documentation Page Form ApprovedOMB No. 0704-0188 Public reporting burden for the
Logistics Transformation: The Paradigm Shift
2007-05-14
NUMBER 6. AUTHOR(S) MAJ Derrick A. Corbett 5d. PROJECT NUMBER 5e. TASK NUMBER 5f. WORK UNIT NUMBER 7. PERFORMING ORGANIZATION NAME(S) AND...ADDRESS(ES) 8. PERFORMING ORGANIZATION REPORT U.S. Army Command and General Staff College School of Advanced Military Studies 250 Gibbon...logistics transformation be defined. In the 18th Century, the French invented “a third military science which they called Logistique , or Logistics
NASA Technical Reports Server (NTRS)
Kuhl, Mark R.
1990-01-01
Current navigation requirements depend on a geometric dilution of precision (GDOP) criterion. As long as the GDOP stays below a specific value, navigation requirements are met. The GDOP will exceed the specified value when the measurement geometry becomes too collinear. A new signal processing technique, called Ridge Regression Processing, can reduce the effects of nearly collinear measurement geometry; thereby reducing the inflation of the measurement errors. It is shown that the Ridge signal processor gives a consistently better mean squared error (MSE) in position than the Ordinary Least Mean Squares (OLS) estimator. The applicability of this technique is currently being investigated to improve the following areas: receiver autonomous integrity monitoring (RAIM), coverage requirements, availability requirements, and precision approaches.
NASA Space Rocket Logistics Challenges
NASA Technical Reports Server (NTRS)
Neeley, James R.; Jones, James V.; Watson, Michael D.; Bramon, Christopher J.; Inman, Sharon K.; Tuttle, Loraine
2014-01-01
The Space Launch System (SLS) is the new NASA heavy lift launch vehicle and is scheduled for its first mission in 2017. The goal of the first mission, which will be uncrewed, is to demonstrate the integrated system performance of the SLS rocket and spacecraft before a crewed flight in 2021. SLS has many of the same logistics challenges as any other large scale program. Common logistics concerns for SLS include integration of discreet programs geographically separated, multiple prime contractors with distinct and different goals, schedule pressures and funding constraints. However, SLS also faces unique challenges. The new program is a confluence of new hardware and heritage, with heritage hardware constituting seventy-five percent of the program. This unique approach to design makes logistics concerns such as commonality especially problematic. Additionally, a very low manifest rate of one flight every four years makes logistics comparatively expensive. That, along with the SLS architecture being developed using a block upgrade evolutionary approach, exacerbates long-range planning for supportability considerations. These common and unique logistics challenges must be clearly identified and tackled to allow SLS to have a successful program. This paper will address the common and unique challenges facing the SLS programs, along with the analysis and decisions the NASA Logistics engineers are making to mitigate the threats posed by each.
A development of logistics management models for the Space Transportation System
NASA Technical Reports Server (NTRS)
Carrillo, M. J.; Jacobsen, S. E.; Abell, J. B.; Lippiatt, T. F.
1983-01-01
A new analytic queueing approach was described which relates stockage levels, repair level decisions, and the project network schedule of prelaunch operations directly to the probability distribution of the space transportation system launch delay. Finite source population and limited repair capability were additional factors included in this logistics management model developed specifically for STS maintenance requirements. Data presently available to support logistics decisions were based on a comparability study of heavy aircraft components. A two-phase program is recommended by which NASA would implement an integrated data collection system, assemble logistics data from previous STS flights, revise extant logistics planning and resource requirement parameters using Bayes-Lin techniques, and adjust for uncertainty surrounding logistics systems performance parameters. The implementation of these recommendations can be expected to deliver more cost-effective logistics support.
Logistic Mixed Models to Investigate Implicit and Explicit Belief Tracking
Lages, Martin; Scheel, Anne
2016-01-01
We investigated the proposition of a two-systems Theory of Mind in adults’ belief tracking. A sample of N = 45 participants predicted the choice of one of two opponent players after observing several rounds in an animated card game. Three matches of this card game were played and initial gaze direction on target and subsequent choice predictions were recorded for each belief task and participant. We conducted logistic regressions with mixed effects on the binary data and developed Bayesian logistic mixed models to infer implicit and explicit mentalizing in true belief and false belief tasks. Although logistic regressions with mixed effects predicted the data well a Bayesian logistic mixed model with latent task- and subject-specific parameters gave a better account of the data. As expected explicit choice predictions suggested a clear understanding of true and false beliefs (TB/FB). Surprisingly, however, model parameters for initial gaze direction also indicated belief tracking. We discuss why task-specific parameters for initial gaze directions are different from choice predictions yet reflect second-order perspective taking. PMID:27853440
Do subfertile women adjust their habits when trying to conceive?
Joelsson, Lana Salih; Berglund, Anna; Wånggren, Kjell; Lood, Mikael; Rosenblad, Andreas; Tydén, Tanja
2016-01-01
Aim The aim of this study was to investigate lifestyle habits and lifestyle adjustments among subfertile women trying to conceive. Materials and methods Women (n = 747) were recruited consecutively at their first visit to fertility clinics in mid-Sweden. Participants completed a questionnaire. Data were analyzed using logistic regression, t tests, and chi-square tests. Results The response rate was 62% (n = 466). Mean duration of infertility was 1.9 years. During this time 13.2% used tobacco daily, 13.6% drank more than three cups of coffee per day, and 11.6% consumed more than two glasses of alcohol weekly. In this sample, 23.9% of the women were overweight (body mass index, BMI 25–29.9 kg/m2), and 12.5% were obese (BMI ≥30 kg/m2). Obese women exercised more and changed to healthy diets more frequently than normal-weight women (odds ratio 7.43; 95% confidence interval 3.7–14.9). Six out of ten women (n = 266) took folic acid when they started trying to conceive, but 11% stopped taking folic acid after some time. Taking folic acid was associated with a higher level of education (p < 0.001). Conclusions Among subfertile women, one-third were overweight or obese, and some had other lifestyle factors with known adverse effects on fertility such as use of tobacco. Overweight and obese women adjusted their habits but did not reduce their body mass index. Women of fertile age would benefit from preconception counseling, and the treatment of infertility should routinely offer interventions for lifestyle changes. PMID:27216564
Joint regression analysis of correlated data using Gaussian copulas.
Song, Peter X-K; Li, Mingyao; Yuan, Ying
2009-03-01
This article concerns a new joint modeling approach for correlated data analysis. Utilizing Gaussian copulas, we present a unified and flexible machinery to integrate separate one-dimensional generalized linear models (GLMs) into a joint regression analysis of continuous, discrete, and mixed correlated outcomes. This essentially leads to a multivariate analogue of the univariate GLM theory and hence an efficiency gain in the estimation of regression coefficients. The availability of joint probability models enables us to develop a full maximum likelihood inference. Numerical illustrations are focused on regression models for discrete correlated data, including multidimensional logistic regression models and a joint model for mixed normal and binary outcomes. In the simulation studies, the proposed copula-based joint model is compared to the popular generalized estimating equations, which is a moment-based estimating equation method to join univariate GLMs. Two real-world data examples are used in the illustration.
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
Logistics Management: New trends in the Reverse Logistics
NASA Astrophysics Data System (ADS)
Antonyová, A.; Antony, P.; Soewito, B.
2016-04-01
Present level and quality of the environment are directly dependent on our access to natural resources, as well as their sustainability. In particular production activities and phenomena associated with it have a direct impact on the future of our planet. Recycling process, which in large enterprises often becomes an important and integral part of the production program, is usually in small and medium-sized enterprises problematic. We can specify a few factors, which have direct impact on the development and successful application of the effective reverse logistics system. Find the ways to economically acceptable model of reverse logistics, focusing on converting waste materials for renewable energy, is the task in progress.
Regression analysis for solving diagnosis problem of children's health
NASA Astrophysics Data System (ADS)
Cherkashina, Yu A.; Gerget, O. M.
2016-04-01
The paper includes results of scientific researches. These researches are devoted to the application of statistical techniques, namely, regression analysis, to assess the health status of children in the neonatal period based on medical data (hemostatic parameters, parameters of blood tests, the gestational age, vascular-endothelial growth factor) measured at 3-5 days of children's life. In this paper a detailed description of the studied medical data is given. A binary logistic regression procedure is discussed in the paper. Basic results of the research are presented. A classification table of predicted values and factual observed values is shown, the overall percentage of correct recognition is determined. Regression equation coefficients are calculated, the general regression equation is written based on them. Based on the results of logistic regression, ROC analysis was performed, sensitivity and specificity of the model are calculated and ROC curves are constructed. These mathematical techniques allow carrying out diagnostics of health of children providing a high quality of recognition. The results make a significant contribution to the development of evidence-based medicine and have a high practical importance in the professional activity of the author.
Operational Logistics: Lessons from the Inchon Landing.
2007-11-02
logistics at the strategic, operational, or tactical levels. An analysis of Operation Chromite , the amphibious landing at Inchon, reveals that logistical...commander’s intent, and define the logistics focus of effort demonstrate that operational logistics was a key enabler in Operation Chromite . This analysis
Dose-Weighted Adjusted Mantel-Haenszel Tests for Numeric Scaled Strata in a Randomized Trial
Gansky, Stuart A.; Cheng, Nancy F.; Koch, Gary G.
2011-01-01
A recent three-arm parallel groups randomized clinical prevention trial had a protocol deviation causing participants to have fewer active doses of an in-office treatment than planned. The original statistical analysis plan stipulated a minimal assumption randomization-based extended Mantel-Haenszel (EMH) trend test of the high frequency, low frequency, and zero frequency treatment groups and a binary outcome. Thus a dose-weighted adjusted EMH (DWAEMH) test was developed with an extra set of weights corresponding to the number of active doses actually available, in the spirit of a pattern mixture model. The method can easily be implemented using standard statistical software. A set of Monte Carlo simulations using a logistic model was undertaken with (and without) actual dose-response effects through 1000 replicates for empirical power estimates (and 2100 for empirical size). Results showed size was maintained and power was improved for DWAEMH versus EMH and logistic regression Wald tests in the presence of a dose effect and treatment by dose interaction. PMID:21709814
Logistics background study: underground mining
Hanslovan, J. J.; Visovsky, R. G.
1982-02-01
Logistical functions that are normally associated with US underground coal mining are investigated and analyzed. These functions imply all activities and services that support the producing sections of the mine. The report provides a better understanding of how these functions impact coal production in terms of time, cost, and safety. Major underground logistics activities are analyzed and include: transportation and personnel, supplies and equipment; transportation of coal and rock; electrical distribution and communications systems; water handling; hydraulics; and ventilation systems. Recommended areas for future research are identified and prioritized.
Continual Improvement in Shuttle Logistics
NASA Technical Reports Server (NTRS)
Flowers, Jean; Schafer, Loraine
1995-01-01
It has been said that Continual Improvement (CI) is difficult to apply to service oriented functions, especially in a government agency such as NASA. However, a constrained budget and increasing requirements are a way of life at NASA Kennedy Space Center (KSC), making it a natural environment for the application of CI tools and techniques. This paper describes how KSC, and specifically the Space Shuttle Logistics Project, a key contributor to KSC's mission, has embraced the CI management approach as a means of achieving its strategic goals and objectives. An overview of how the KSC Space Shuttle Logistics Project has structured its CI effort and examples of some of the initiatives are provided.
Evaluating differential effects using regression interactions and regression mixture models
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, a relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903
NASA Space Rocket Logistics Challenges
NASA Technical Reports Server (NTRS)
Bramon, Chris; Neeley, James R.; Jones, James V.; Watson, Michael D.; Inman, Sharon K.; Tuttle, Loraine
2014-01-01
The Space Launch System (SLS) is the new NASA heavy lift launch vehicle in development and is scheduled for its first mission in 2017. SLS has many of the same logistics challenges as any other large scale program. However, SLS also faces unique challenges. This presentation will address the SLS challenges, along with the analysis and decisions to mitigate the threats posed by each.
Equivalent Linear Logistic Test Models.
ERIC Educational Resources Information Center
Bechger, Timo M.; Verstralen, Huub H. F. M.; Verhelst, Norma D.
2002-01-01
Discusses the Linear Logistic Test Model (LLTM) and demonstrates that there are many equivalent ways to specify a model. Analyzed a real data set (300 responses to 5 analogies) using a Lagrange multiplier test for the specification of the model, and demonstrated that there may be many ways to change the specification of an LLTM and achieve the…
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
Error bounds in cascading regressions
Karlinger, M.R.; Troutman, B.M.
1985-01-01
Cascading regressions is a technique for predicting a value of a dependent variable when no paired measurements exist to perform a standard regression analysis. Biases in coefficients of a cascaded-regression line as well as error variance of points about the line are functions of the correlation coefficient between dependent and independent variables. Although this correlation cannot be computed because of the lack of paired data, bounds can be placed on errors through the required properties of the correlation coefficient. The potential meansquared error of a cascaded-regression prediction can be large, as illustrated through an example using geomorphologic data. ?? 1985 Plenum Publishing Corporation.
2015-01-01
Background Over the past 50,000 years, shifts in human-environmental or human-human interactions shaped genetic differences within and among human populations, including variants under positive selection. Shaped by environmental factors, such variants influence the genetics of modern health, disease, and treatment outcome. Because evolutionary processes tend to act on gene regulation, we test whether regulatory variants are under positive selection. We introduce a new approach to enhance detection of genetic markers undergoing positive selection, using conditional entropy to capture recent local selection signals. Results We use conditional logistic regression to compare our Adjusted Haplotype Conditional Entropy (H|H) measure of positive selection to existing positive selection measures. H|H and existing measures were applied to published regulatory variants acting in cis (cis-eQTLs), with conditional logistic regression testing whether regulatory variants undergo stronger positive selection than the surrounding gene. These cis-eQTLs were drawn from six independent studies of genotype and RNA expression. The conditional logistic regression shows that, overall, H|H is substantially more powerful than existing positive-selection methods in identifying cis-eQTLs against other Single Nucleotide Polymorphisms (SNPs) in the same genes. When broken down by Gene Ontology, H|H predictions are particularly strong in some biological process categories, where regulatory variants are under strong positive selection compared to the bulk of the gene, distinct from those GO categories under overall positive selection. . However, cis-eQTLs in a second group of genes lack positive selection signatures detectable by H|H, consistent with ancient short haplotypes compared to the surrounding gene (for example, in innate immunity GO:0042742); under such other modes of selection, H|H would not be expected to be a strong predictor.. These conditional logistic regression models are
Logistics, electronic commerce, and the environment
NASA Astrophysics Data System (ADS)
Sarkis, Joseph; Meade, Laura; Talluri, Srinivas
2002-02-01
Organizations realize that a strong supporting logistics or electronic logistics (e-logistics) function is important from both commercial and consumer perspectives. The implications of e-logistics models and practices cover the forward and reverse logistics functions of organizations. They also have direct and profound impact on the natural environment. This paper will focus on a discussion of forward and reverse e-logistics and their relationship to the natural environment. After discussion of the many pertinent issues in these areas, directions of practice and implications for study and research are then described.
Precision Efficacy Analysis for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.
When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross- validity approach to select sample sizes…
Pazvakawambwa, Lillian; Indongo, Nelago; Kazembe, Lawrence N.
2013-01-01
Background Marriage is a significant event in life-course of individuals, and creates a system that characterizes societal and economic structures. Marital patterns and dynamics over the years have changed a lot, with decreasing proportions of marriage, increased levels of divorce and co-habitation in developing countries. Although, such changes have been reported in African societies including Namibia, they have largely remained unexplained. Objectives and Methods In this paper, we examined trends and patterns of marital status of women of marriageable age: 15 to 49 years, in Namibia using the 1992, 2000 and 2006 Demographic and Health Survey (DHS) data. Trends were established for selected demographic variables. Two binary logistic regression models for ever-married versus never married, and cohabitation versus married were fitted to establish factors associated with such nuptial systems. Further a multinomial logistic regression models, adjusted for bio-demographic and socio-economic variables, were fitted separately for each year, to establish determinants of type of union (never married, married and cohabitation). Results and Conclusions Findings indicate a general change away from marriage, with a shift in singulate mean age at marriage. Cohabitation was prevalent among those less than 30 years of age, the odds were higher in urban areas and increased since 1992. Be as it may marriage remained a persistent nuptiality pattern, and common among the less educated and employed, but lower odds in urban areas. Results from multinomial model suggest that marital status was associated with age at marriage, total children born, region, place of residence, education level and religion. We conclude that marital patterns have undergone significant transformation over the past two decades in Namibia, with a coexistence of traditional marriage framework with co-habitation, and sizeable proportion remaining unmarried to the late 30s. A shift in the singulate mean age is
Logistics support of lunar bases
NASA Astrophysics Data System (ADS)
Woodcock, Gordon R.
Concepts for lunar base design and operations have been studied for over twenty years. A brief summary of past concepts is presented, with reference to new ideas on design, uses and logistics schemes. Transportation systems and mission modes are reviewed, showing the significant cost benefits of reusable transportation at the higher traffic rates representative of lunar base buildup and logistics operations. Parametric studies are presented over the range of base size from 6 crew to 1000 crew, and the "choke points", or barriers to growth are identified and means for their resolution presented. The leverages of food growth on the Moon, crew stay time, use of lunar oxygen, indigenous resources for construction of facilities, and transportation systems size and operations modes are presented. It is concluded that bases as large as 1000 people are affordable at less than twice the cost of an initial base of six people if these leverages of advanced basing technologies are exploited.
Logistic analysis of algae cultivation.
Slegers, P M; Leduc, S; Wijffels, R H; van Straten, G; van Boxtel, A J B
2015-03-01
Energy requirements for resource transport of algae cultivation are unknown. This work describes the quantitative analysis of energy requirements for water and CO2 transport. Algae cultivation models were combined with the quantitative logistic decision model 'BeWhere' for the regions Benelux (Northwest Europe), southern France and Sahara. For photobioreactors, the energy consumed for transport of water and CO2 turns out to be a small percentage of the energy contained in the algae biomass (0.1-3.6%). For raceway ponds the share for transport is higher (0.7-38.5%). The energy consumption for transport is the lowest in the Benelux due to good availability of both water and CO2. Analysing transport logistics is still important, despite the low energy consumption for transport. The results demonstrate that resource requirements, resource distribution and availability and transport networks have a profound effect on the location choices for algae cultivation.
Logistical teamwork tames Madagascar wildcats
Twa, W.
1986-03-01
Amoco Production Company's exploration program in western Madagascar's Sakalaya coastal plain exemplifies the unique logistical challenges both operator and drilling contractor must undergo to reach the few remaining onshore frontier areas. Sakalava is characterized by deep rivers, flood prone tributaries, and a lone 40 km hard-surface road. Problems caused by a lack of port facilities and oil field services are complicated by thousands of square miles of unimproved wooded plains. Rainwater from the nearby mountains of central Madagascar frequently floods rivers in the Sakalava coastal plain leaving impassable marshes in their wake. Prior to this project, about 45 wells had been drilled in Madagascar. Most recently, state oil company Omnis contracted Bawden Drilling International Inc. to drill nine wells for its heavy oil project on the Tsimioro oil prospect. Bawden provided both logistical and drilling services for that program.
Fleet Logistics Center, Puget Sound
2012-08-01
ORGANIZATION NAME(S) AND ADDRESS(ES) Naval Supply Systems Command,Fleet Logistics Center, Puget Sound,467 W Street , Bremerton ,WA,98314-5100 8... Bremerton , WA Established: October 1967 Name Changes: Naval Supply Center Puget Sound, Fleet and Industrial Supply Center Puget...or Sasebo) deployed Ships in the Western Pacific (WestPac) Naval Base Kitsap at Bremerton and Bangor (NBK at Bremerton or Bangor) Navy Region
Frank, Laura K.; Jannasch, Franziska; Kröger, Janine; Bedu-Addo, George; Mockenhaupt, Frank P.; Schulze, Matthias B.; Danquah, Ina
2015-01-01
Reduced rank regression (RRR) is an innovative technique to establish dietary patterns related to biochemical risk factors for type 2 diabetes, but has not been applied in sub-Saharan Africa. In a hospital-based case-control study for type 2 diabetes in Kumasi (diabetes cases, 538; controls, 668) dietary intake was assessed by a specific food frequency questionnaire. After random split of our study population, we derived a dietary pattern in the training set using RRR with adiponectin, HDL-cholesterol and triglycerides as responses and 35 food items as predictors. This pattern score was applied to the validation set, and its association with type 2 diabetes was examined by logistic regression. The dietary pattern was characterized by a high consumption of plantain, cassava, and garden egg, and a low intake of rice, juice, vegetable oil, eggs, chocolate drink, sweets, and red meat; the score correlated positively with serum triglycerides and negatively with adiponectin. The multivariate-adjusted odds ratio of type 2 diabetes for the highest quintile compared to the lowest was 4.43 (95% confidence interval: 1.87–10.50, p for trend < 0.001). The identified dietary pattern increases the odds of type 2 diabetes in urban Ghanaians, which is mainly attributed to increased serum triglycerides. PMID:26198248
A regressive model analysis of congenital sensorineural deafness in German Dalmatian dogs.
Juraschko, Kathrin; Meyer-Lindenberg, Andrea; Nolte, Ingo; Distl, Ottmar
2003-08-01
The objective of the present study was to analyze the mode of inheritance for congenital sensorineural deafness (CSD) in German Dalmatian dogs by consideration of association between phenotypic breed characteristics and CSD. Segregation analysis with regressive logistic models was employed to test for different mechanisms of genetic transmission. Data were obtained from all three Dalmatian kennel clubs associated with the German Association for Dog Breeding and Husbandry (VDH). CSD was tested by veterinary practitioners using standardized protocols for Brainstem Auditory-Evoked Response (BAER). The sample included 1899 Dalmatian dogs from 354 litters in 169 different kennels. BAER testing results were from the years 1986 to 1999. Pedigree information was available for up to seven generations. The segregation analysis showed that a mixed monogenic-polygenic model including eye color as covariate among all other tested models best explained the segregation of affected animals in the pedigrees. The recessive major gene segregated in dogs with blue and brown eye color as well as in dogs with and without pigmented coat patches. Models which took into account the occurrence of patches, percentage of puppies tested per litter, or inbreeding coefficient gave no better adjustment to the most general (saturated) model. A procedure for the simultaneous prediction of breeding values and the estimation of genotype probabilities for CSD is expected to improve breeding programs significantly.
Remotely Adjustable Hydraulic Pump
NASA Technical Reports Server (NTRS)
Kouns, H. H.; Gardner, L. D.
1987-01-01
Outlet pressure adjusted to match varying loads. Electrohydraulic servo has positioned sleeve in leftmost position, adjusting outlet pressure to maximum value. Sleeve in equilibrium position, with control land covering control port. For lowest pressure setting, sleeve shifted toward right by increased pressure on sleeve shoulder from servovalve. Pump used in aircraft and robots, where hydraulic actuators repeatedly turned on and off, changing pump load frequently and over wide range.
NASA Technical Reports Server (NTRS)
Ashby, George C., Jr.; Robbins, W. Eugene; Horsley, Lewis A.
1991-01-01
Probe readily positionable in core of uniform flow in hypersonic wind tunnel. Formed of pair of mating cylindrical housings: transducer housing and pitot-tube housing. Pitot tube supported by adjustable wedge fairing attached to top of pitot-tube housing with semicircular foot. Probe adjusted both radially and circumferentially. In addition, pressure-sensing transducer cooled internally by water or other cooling fluid passing through annulus of cooling system.
A Predictive Logistic Regression Model of World Conflict Using Open Source Data
2015-03-26
model that predicts the future state of the world where nations will either be in a state of “ violent conflict” or “not in violent conflict” based on...state of violent conflict in 2015, seventeen of them are new to conflict since the last published list in 2013. A prediction tool is created to allow...war- game subject matter experts and students to identify future predicted violent conflict and the responsible variables. v Dedication
Comparative Analysis of Decision Trees with Logistic Regression in Predicting Fault-Prone Classes
NASA Astrophysics Data System (ADS)
Singh, Yogesh; Takkar, Arvinder Kaur; Malhotra, Ruchika
There are available metrics for predicting fault prone classes, which may help software organizations for planning and performing testing activities. This may be possible due to proper allocation of resources on fault prone parts of the design and code of the software. Hence, importance and usefulness of such metrics is understandable, but empirical validation of these metrics is always a great challenge. Decision Tree (DT) methods have been successfully applied for solving classification problems in many applications. This paper evaluates the capability of three DT methods and compares its performance with statistical method in predicting fault prone software classes using publicly available NASA data set. The results indicate that the prediction performance of DT is generally better than statistical model. However, similar types of studies are required to be carried out in order to establish the acceptability of the DT models.
ERIC Educational Resources Information Center
Breidenbach, Daniel H.; French, Brian F.
2011-01-01
Many factors can influence a student's decision to withdraw from college. Intervention programs aimed at retention can benefit from understanding the factors related to such decisions, especially in underrepresented groups. The Institutional Integration Scale (IIS) has been suggested as a predictor of student persistence. Accurate prediction of…
Narumalani, S.; Jensen, J.R.; Althausen, J.D.; Burkhalter, S.; Mackey, H.E. Jr.
1994-06-01
Since aquatic macrophytes have an important influence on the physical and chemical processes of an ecosystem while simultaneously affecting human activity, it is imperative that they be inventoried and managed wisely. However, mapping wetlands can be a major challenge because they are found in diverse geographic areas ranging from small tributary streams, to shrub or scrub and marsh communities, to open water lacustrian environments. In addition, the type and spatial distribution of wetlands can change dramatically from season to season, especially when nonpersistent species are present. This research, focuses on developing a model for predicting the future growth and distribution of aquatic macrophytes. This model will use a geographic information system (GIS) to analyze some of the biophysical variables that affect aquatic macrophyte growth and distribution. The data will provide scientists information on the future spatial growth and distribution of aquatic macrophytes. This study focuses on the Savannah River Site Par Pond (1,000 ha) and L Lake (400 ha) these are two cooling ponds that have received thermal effluent from nuclear reactor operations. Par Pond was constructed in 1958, and natural invasion of wetland has occurred over its 35-year history, with much of the shoreline having developed extensive beds of persistent and non-persistent aquatic macrophytes.
ERIC Educational Resources Information Center
Liu, Xing
2008-01-01
The proportional odds (PO) model, which is also called cumulative odds model (Agresti, 1996, 2002 ; Armstrong & Sloan, 1989; Long, 1997, Long & Freese, 2006; McCullagh, 1980; McCullagh & Nelder, 1989; Powers & Xie, 2000; O'Connell, 2006), is one of the most commonly used models for the analysis of ordinal categorical data and comes from the class…
Technology Transfer Automated Retrieval System (TEKTRAN)
Improvement of cold tolerance of winter wheat (Triticum aestivum L.) through breeding methods has been problematic. A better understanding of how individual wheat cultivars respond to components of the freezing process may provide new information that can be used to develop more cold tolerance culti...
ERIC Educational Resources Information Center
Miller, Thomas E.; Herreid, Charlene H.
2009-01-01
This is the fifth in a series of articles describing an attrition prediction and intervention project at the University of South Florida (USF) in Tampa. The project was originally presented in the 83(2) issue (Miller 2007). The statistical model for predicting attrition was described in the 83(3) issue (Miller and Herreid 2008). The methods and…
AP® Potential Predicted by PSAT/NMSQT® Scores Using Logistic Regression. Statistical Report 2014-1
ERIC Educational Resources Information Center
Zhang, Xiuyuan; Patel, Priyank; Ewing, Maureen
2014-01-01
AP Potential™ is an educational guidance tool that uses PSAT/NMSQT® scores to identify students who have the potential to do well on one or more Advanced Placement® (AP®) Exams. Students identified as having AP potential, perhaps students who would not have been otherwise identified, should consider enrolling in the corresponding AP course if they…
ERIC Educational Resources Information Center
Kellermeyer, Steven Bruce
2011-01-01
In the last few decades high-stakes testing has become more political than educational. The Districts within Arizona are bound by the mandates of both AZ LEARNS and the No Child Left Behind Act of 2001. At the time of this writing, both legislative mandates relied on the Arizona Instrument for Measuring Standards (AIMS) as State Tests for gauging…
ERIC Educational Resources Information Center
Del Prette, Zilda Aparecida Pereira; Prette, Almir Del; De Oliveira, Lael Almeida; Gresham, Frank M.; Vance, Michael J.
2012-01-01
Social skills are specific behaviors that individuals exhibit in order to successfully complete social tasks whereas social competence represents judgments by significant others that these social tasks have been successfully accomplished. The present investigation identified the best sociobehavioral predictors obtained from different raters…
2012-03-22
preconditions for unsafe acts which ultimately result in active failures (mishaps). The DoD HFACS classification system has been used to categorize the...reoccur as future individuals repeat the same locally rational acts , while a classification scheme on these errors merely provides a label to what in...boundaries, or public endangerment ”. The second document, AFRLMAN 99-103, defines Class A through C mishaps much like the DoD classification
Exploring Person Fit with an Approach Based on Multilevel Logistic Regression
ERIC Educational Resources Information Center
Walker, A. Adrienne; Engelhard, George, Jr.
2015-01-01
The idea that test scores may not be valid representations of what students know, can do, and should learn next is well known. Person fit provides an important aspect of validity evidence. Person fit analyses at the individual student level are not typically conducted and person fit information is not communicated to educational stakeholders. In…
NASA Astrophysics Data System (ADS)
Uzunova, Yordanka; Prodanova, Krasimira; Spassov, Lubomir
2016-12-01
Orthotopic liver transplantation (OLT) is the only curative treatment for end-stage liver disease. Early diagnosis and treatment of infections after OLT are usually associated with improved outcomes. This study's objective is to identify reliable factors that can predict postoperative infectious morbidity. 27 children were included in the analysis. They underwent liver transplantation in our department. The correlation between two parameters (the level of blood glucose at 5th postoperative day and the duration of the anhepatic phase) and postoperative infections was analyzed, using univariate analysis. In this analysis, an independent predictive factor was derived which adequately identifies patients at risk of infectious complications after a liver transplantation.
ERIC Educational Resources Information Center
Tang, Hui; Kirk, John; Pienta, Norbert J.
2014-01-01
This paper includes two experiments, one investigating complexity factors in stoichiometry word problems, and the other identifying students' problem-solving protocols by using eye-tracking technology. The word problems used in this study had five different complexity factors, which were randomly assigned by a Web-based tool that we developed. The…
Comparing the Discrete and Continuous Logistic Models
ERIC Educational Resources Information Center
Gordon, Sheldon P.
2008-01-01
The solutions of the discrete logistic growth model based on a difference equation and the continuous logistic growth model based on a differential equation are compared and contrasted. The investigation is conducted using a dynamic interactive spreadsheet. (Contains 5 figures.)
EMS incident management: emergency medical logistics.
Maniscalco, P M; Christen, H T
1999-01-01
If you had to get x amount of supplies to point A or point B, or both, in 10 minutes, how would you do it? The answer lies in the following steps: 1. Develop a logistics plan. 2. Use emergency management as a partner agency for developing your logistics plan. 3. Implement a push logistics system by determining what supplies/medications and equipment are important. 4. Place mass casualty/disaster caches at key locations for rapid deployment. Have medication/fluid caches available at local hospitals. 5. Develop and implement command caches for key supervisors and managers. 6. Anticipate the logistics requirements of a terrorism/tactical violence event based on a community threat assessment. 7. Educate the public about preparing a BLS family disaster kit. 8. Test logistics capabilities at disaster exercises. 9. Budget for logistics needs. 10. Never underestimate the importance of logistics. When logistics support fails, the EMS system fails.
Weighted triangulation adjustment
Anderson, Walter L.
1969-01-01
The variation of coordinates method is employed to perform a weighted least squares adjustment of horizontal survey networks. Geodetic coordinates are required for each fixed and adjustable station. A preliminary inverse geodetic position computation is made for each observed line. Weights associated with each observed equation for direction, azimuth, and distance are applied in the formation of the normal equations in-the least squares adjustment. The number of normal equations that may be solved is twice the number of new stations and less than 150. When the normal equations are solved, shifts are produced at adjustable stations. Previously computed correction factors are applied to the shifts and a most probable geodetic position is found for each adjustable station. Pinal azimuths and distances are computed. These may be written onto magnetic tape for subsequent computation of state plane or grid coordinates. Input consists of punch cards containing project identification, program options, and position and observation information. Results listed include preliminary and final positions, residuals, observation equations, solution of the normal equations showing magnitudes of shifts, and a plot of each adjusted and fixed station. During processing, data sets containing irrecoverable errors are rejected and the type of error is listed. The computer resumes processing of additional data sets.. Other conditions cause warning-errors to be issued, and processing continues with the current data set.
Practical Session: Simple Linear Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
Two exercises are proposed to illustrate the simple linear regression. The first one is based on the famous Galton's data set on heredity. We use the lm R command and get coefficients estimates, standard error of the error, R2, residuals …In the second example, devoted to data related to the vapor tension of mercury, we fit a simple linear regression, predict values, and anticipate on multiple linear regression. This pratical session is an excerpt from practical exercises proposed by A. Dalalyan at EPNC (see Exercises 1 and 2 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_4.pdf).
The Evolution of Centralized Operational Logistics
2012-05-17
Command Reports 1-10; Lieutenant General William G. Pagonis and Jeffery L . Cruikshank. Moving Mountains: Lessons in Leadership and Logistics from the Gulf...1943- 1945, (Washington, DC: Government Printing Office, 1955), 320-321. 22 Alan Gropman, The Big " L "- American Logistics in World War II...29 Carter B. Magruder, Recurring Logistic Problems As I Have Observed Them, 29. 30 Ibid., 245; Lieutenant Colonel Robert L . Burke, “Corps Logistic
Research on 6R Military Logistics Network
NASA Astrophysics Data System (ADS)
Jie, Wan; Wen, Wang
The building of military logistics network is an important issue for the construction of new forces. This paper has thrown out a concept model of 6R military logistics network model based on JIT. Then we conceive of axis spoke y logistics centers network, flexible 6R organizational network, lean 6R military information network based grid. And then the strategy and proposal for the construction of the three sub networks of 6Rmilitary logistics network are given.
Nowcasting sunshine number using logistic modeling
NASA Astrophysics Data System (ADS)
Brabec, Marek; Badescu, Viorel; Paulescu, Marius
2013-04-01
In this paper, we present a formalized approach to statistical modeling of the sunshine number, binary indicator of whether the Sun is covered by clouds introduced previously by Badescu (Theor Appl Climatol 72:127-136, 2002). Our statistical approach is based on Markov chain and logistic regression and yields fully specified probability models that are relatively easily identified (and their unknown parameters estimated) from a set of empirical data (observed sunshine number and sunshine stability number series). We discuss general structure of the model and its advantages, demonstrate its performance on real data and compare its results to classical ARIMA approach as to a competitor. Since the model parameters have clear interpretation, we also illustrate how, e.g., their inter-seasonal stability can be tested. We conclude with an outlook to future developments oriented to construction of models allowing for practically desirable smooth transition between data observed with different frequencies and with a short discussion of technical problems that such a goal brings.
Crisis Management- Operational Logistics & Asset Visibility Technologies
2006-06-01
greatly increase the effectiveness of future crisis response operations. The proposed logistics framework serves as a viable solution for common...increase the effectiveness of future crisis response operations. The proposed logistics framework serves as a viable solution for common logistical...FRAMEWORK............................................................29 A. INTEGRATED COMMUNICATIONS NETWORK MODEL ................29 B. INTEGRATED
Forecasting Workload for Defense Logistics Agency Distribution
2014-12-01
NAVAL POSTGRADUATE SCHOOL MONTEREY, CALIFORNIA MBA PROFESSIONAL REPORT FORECASTING WORKLOAD FOR DEFENSE LOGISTICS AGENCY...DATE December 2014 3. REPORT TYPE AND DATES COVERED MBA Professional Report 4. TITLE AND SUBTITLE FORECASTING WORKLOAD FOR DEFENSE LOGISTICS ...maximum 200 words) The Defense Logistics Agency (DLA) predicts issue and receipt workload for its distribution agency in order to maintain
Naval Logistics Integration Through Interoperable Supply Systems
2014-06-13
NAVAL LOGISTICS INTEGRATION THROUGH INTEROPERABLE SUPPLY SYSTEMS A thesis presented to the Faculty of the U.S. Army Command...Thesis 3. DATES COVERED (From - To) AUG 2013 – JUNE 2014 4. TITLE AND SUBTITLE Naval Logistics Integration through Interoperable Supply Systems...Unlimited 13. SUPPLEMENTARY NOTES 14. ABSTRACT This research investigates how the Navy and the Marine Corps could increase Naval Logistics
Logistics hardware and services control system
NASA Technical Reports Server (NTRS)
Koromilas, A.; Miller, K.; Lamb, T.
1973-01-01
Software system permits onsite direct control of logistics operations, which include spare parts, initial installation, tool control, and repairable parts status and control, through all facets of operations. System integrates logistics actions and controls receipts, issues, loans, repairs, fabrications, and modifications and assets in predicting and allocating logistics parts and services effectively.
Regional regression of flood characteristics employing historical information
Tasker, Gary D.; Stedinger, J.R.
1987-01-01
Streamflow gauging networks provide hydrologic information for use in estimating the parameters of regional regression models. The regional regression models can be used to estimate flood statistics, such as the 100 yr peak, at ungauged sites as functions of drainage basin characteristics. A recent innovation in regional regression is the use of a generalized least squares (GLS) estimator that accounts for unequal station record lengths and sample cross correlation among the flows. However, this technique does not account for historical flood information. A method is proposed here to adjust this generalized least squares estimator to account for possible information about historical floods available at some stations in a region. The historical information is assumed to be in the form of observations of all peaks above a threshold during a long period outside the systematic record period. A Monte Carlo simulation experiment was performed to compare the GLS estimator adjusted for historical floods with the unadjusted GLS estimator and the ordinary least squares estimator. Results indicate that using the GLS estimator adjusted for historical information significantly improves the regression model. ?? 1987.
Multiple Regression and Its Discontents
ERIC Educational Resources Information Center
Snell, Joel C.; Marsh, Mitchell
2012-01-01
Multiple regression is part of a larger statistical strategy originated by Gauss. The authors raise questions about the theory and suggest some changes that would make room for Mandelbrot and Serendipity.
Wrong Signs in Regression Coefficients
NASA Technical Reports Server (NTRS)
McGee, Holly
1999-01-01
When using parametric cost estimation, it is important to note the possibility of the regression coefficients having the wrong sign. A wrong sign is defined as a sign on the regression coefficient opposite to the researcher's intuition and experience. Some possible causes for the wrong sign discussed in this paper are a small range of x's, leverage points, missing variables, multicollinearity, and computational error. Additionally, techniques for determining the cause of the wrong sign are given.
Recirculating valve lash adjuster
Stoody, R.R.
1987-02-24
This patent describes an internal combustion engine with a valve assembly of the type including overhead valves supported by a cylinder head for opening and closing movements in a substantially vertical direction and a rotatable overhead camshaft thereabove lubricated by engine oil pumped by an engine oil pump. A hydraulic lash adjuster with an internal reservoir therein is solely supplied with run-off lubricating oil from the camshaft which oil is pumped into the internal reservoir of the lash adjuster by self-pumping operation of the lash adjuster produced by lateral forces thereon by the rotative operation of the camshaft comprising: a housing of the lash adjuster including an axially extending bore therethrough with a lower wall means of the housing closing the lower end thereof; a first plunger member being closely slidably received in the bore of the housing and having wall means defining a fluid filled power chamber with the lower wall means of the housing; and a second plunger member of the lash adjuster having a portion being loosely slidably received and extending into the bore of the housing for reciprocation therein. Another portion extends upwardly from the housing to operatively receive alternating side-to-side force inputs from operation of the camshaft.
Mini pressurized logistics module (MPLM)
NASA Astrophysics Data System (ADS)
Vallerani, E.; Brondolo, D.; Basile, L.
1996-06-01
The MPLM Program was initiated through a Memorandum of Understanding (MOU) between the United States' National Aeronautics and Space Administration (NASA) and Italy's ASI, the Italian Space Agency, that was signed on 6 December 1991. The MPLM is a pressurized logistics module that will be used to transport supplies and materials (up to 20,000 lb), including user experiments, between Earth and International Space Station Alpha (ISSA) using the Shuttle, to support active and passive storage, and to provide a habitable environment for two people when docked to the Station. The Italian Space Agency has selected Alenia Spazio to develop MPLM modules that have always been considered a key element for the new International Space Station taking benefit from its design flexibility and consequent possible cost saving based on the maximum utilization of the Shuttle launch capability for any mission. In the frame of the very recent agreement between the U.S. and Russia for cooperation in space, that foresees the utilization of MIR 1 hardware, the Italian MPLM will remain an important element of the logistics system, being the only pressurized module designed for re-entry. Within the new scenario of anticipated Shuttle flights to MIR 1 during Space Station phase 1, MPLM remains a candidate for one or more missions to provide MIR 1 resupply capabilities and advanced ISSA hardware/procedures verification. Based on the concept of Flexible Carriers, Alenia Spazio is providing NASA with three MPLM flight units that can be configured according to the requirements of the Human-Tended Capability (HTC) and Permanent Human Capability (PHC) of the Space Station. Configurability will allow transportation of passive cargo only, or a combination of passive and cold cargo accommodated in R/F racks. Having developed and qualified the baseline configuration with respect to the worst enveloping condition, each unit could be easily configured to the passive or active version depending upon the
Background stratified Poisson regression analysis of cohort data.
Richardson, David B; Langholz, Bryan
2012-03-01
Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.
Incremental learning for ν-Support Vector Regression.
Gu, Bin; Sheng, Victor S; Wang, Zhijie; Ho, Derek; Osman, Said; Li, Shuo
2015-07-01
The ν-Support Vector Regression (ν-SVR) is an effective regression learning algorithm, which has the advantage of using a parameter ν on controlling the number of support vectors and adjusting the width of the tube automatically. However, compared to ν-Support Vector Classification (ν-SVC) (Schölkopf et al., 2000), ν-SVR introduces an additional linear term into its objective function. Thus, directly applying the accurate on-line ν-SVC algorithm (AONSVM) to ν-SVR will not generate an effective initial solution. It is the main challenge to design an incremental ν-SVR learning algorithm. To overcome this challenge, we propose a special procedure called initial adjustments in this paper. This procedure adjusts the weights of ν-SVC based on the Karush-Kuhn-Tucker (KKT) conditions to prepare an initial solution for the incremental learning. Combining the initial adjustments with the two steps of AONSVM produces an exact and effective incremental ν-SVR learning algorithm (INSVR). Theoretical analysis has proven the existence of the three key inverse matrices, which are the cornerstones of the three steps of INSVR (including the initial adjustments), respectively. The experiments on benchmark datasets demonstrate that INSVR can avoid the infeasible updating paths as far as possible, and successfully converges to the optimal solution. The results also show that INSVR is faster than batch ν-SVR algorithms with both cold and warm starts.
Eugster, Patrick; Sennhauser, Michèle; Zweifel, Peter
2010-07-01
When premiums are community-rated, risk adjustment (RA) serves to mitigate competitive insurers' incentive to select favorable risks. However, unless fully prospective, it also undermines their incentives for efficiency. By capping its volume, one may try to counteract this tendency, exposing insurers to some financial risk. This in term runs counter the quest to refine the RA formula, which would increase RA volume. Specifically, the adjuster, "Hospitalization or living in a nursing home during the previous year" will be added in Switzerland starting 2012. This paper investigates how to minimize the opportunity cost of capping RA in terms of increased incentives for risk selection.
Accounting for Slipping and Other False Negatives in Logistic Models of Student Learning
ERIC Educational Resources Information Center
MacLellan, Christopher J.; Liu, Ran; Koedinger, Kenneth R.
2015-01-01
Additive Factors Model (AFM) and Performance Factors Analysis (PFA) are two popular models of student learning that employ logistic regression to estimate parameters and predict performance. This is in contrast to Bayesian Knowledge Tracing (BKT) which uses a Hidden Markov Model formalism. While all three models tend to make similar predictions,…
Soh, Chang-Heok; Harrington, David P; Zaslavsky, Alan M
2008-03-01
When variable selection with stepwise regression and model fitting are conducted on the same data set, competition for inclusion in the model induces a selection bias in coefficient estimators away from zero. In proportional hazards regression with right-censored data, selection bias inflates the absolute value of parameter estimate of selected parameters, while the omission of other variables may shrink coefficients toward zero. This paper explores the extent of the bias in parameter estimates from stepwise proportional hazards regression and proposes a bootstrap method, similar to those proposed by Miller (Subset Selection in Regression, 2nd edn. Chapman & Hall/CRC, 2002) for linear regression, to correct for selection bias. We also use bootstrap methods to estimate the standard error of the adjusted estimators. Simulation results show that substantial biases could be present in uncorrected stepwise estimators and, for binary covariates, could exceed 250% of the true parameter value. The simulations also show that the conditional mean of the proposed bootstrap bias-corrected parameter estimator, given that a variable is selected, is moved closer to the unconditional mean of the standard partial likelihood estimator in the chosen model, and to the population value of the parameter. We also explore the effect of the adjustment on estimates of log relative risk, given the values of the covariates in a selected model. The proposed method is illustrated with data sets in primary biliary cirrhosis and in multiple myeloma from the Eastern Cooperative Oncology Group.
XRA image segmentation using regression
NASA Astrophysics Data System (ADS)
Jin, Jesse S.
1996-04-01
Segmentation is an important step in image analysis. Thresholding is one of the most important approaches. There are several difficulties in segmentation, such as automatic selecting threshold, dealing with intensity distortion and noise removal. We have developed an adaptive segmentation scheme by applying the Central Limit Theorem in regression. A Gaussian regression is used to separate the distribution of background from foreground in a single peak histogram. The separation will help to automatically determine the threshold. A small 3 by 3 widow is applied and the modal of the local histogram is used to overcome noise. Thresholding is based on local weighting, where regression is used again for parameter estimation. A connectivity test is applied to the final results to remove impulse noise. We have applied the algorithm to x-ray angiogram images to extract brain arteries. The algorithm works well for single peak distribution where there is no valley in the histogram. The regression provides a method to apply knowledge in clustering. Extending regression for multiple-level segmentation needs further investigation.
Psychological Adjustment and Homosexuality.
ERIC Educational Resources Information Center
Gonsiorek, John C.
In this paper, the diverse literature bearing on the topic of homosexuality and psychological adjustment is critically reviewed and synthesized. The first chapter discusses the most crucial methodological issue in this area, the problem of sampling. The kinds of samples used to date are critically examined, and some suggestions for improved…
NASA Technical Reports Server (NTRS)
1986-01-01
Corning Glass Works' Serengeti Driver sunglasses are unique in that their lenses self-adjust and filter light while suppressing glare. They eliminate more than 99% of the ultraviolet rays in sunlight. The frames are based on the NASA Anthropometric Source Book.
Hunter, Steven L.
2002-01-01
An inclinometer utilizing synchronous demodulation for high resolution and electronic offset adjustment provides a wide dynamic range without any moving components. A device encompassing a tiltmeter and accompanying electronic circuitry provides quasi-leveled tilt sensors that detect highly resolved tilt change without signal saturation.
Regressive evolution in Astyanax cavefish.
Jeffery, William R
2009-01-01
A diverse group of animals, including members of most major phyla, have adapted to life in the perpetual darkness of caves. These animals are united by the convergence of two regressive phenotypes, loss of eyes and pigmentation. The mechanisms of regressive evolution are poorly understood. The teleost Astyanax mexicanus is of special significance in studies of regressive evolution in cave animals. This species includes an ancestral surface dwelling form and many con-specific cave-dwelling forms, some of which have evolved their recessive phenotypes independently. Recent advances in Astyanax development and genetics have provided new information about how eyes and pigment are lost during cavefish evolution; namely, they have revealed some of the molecular and cellular mechanisms involved in trait modification, the number and identity of the underlying genes and mutations, the molecular basis of parallel evolution, and the evolutionary forces driving adaptation to the cave environment.
2014-01-01
Background Risk adjustment is crucial for comparison of outcome in medical care. Knowledge of the external factors that impact measured outcome but that cannot be influenced by the physician is a prerequisite for this adjustment. To date, a universal and reproducible method for identification of the relevant external factors has not been published. The selection of external factors in current quality assurance programmes is mainly based on expert opinion. We propose and demonstrate a methodology for identification of external factors requiring risk adjustment of outcome indicators and we apply it to a cataract surgery register. Methods Defined test criteria to determine the relevance for risk adjustment are “clinical relevance” and “statistical significance”. Clinical relevance of the association is presumed when observed success rates of the indicator in the presence and absence of the external factor exceed a pre-specified range of 10%. Statistical significance of the association between the external factor and outcome indicators is assessed by univariate stratification and multivariate logistic regression adjustment. The cataract surgery register was set up as part of a German multi-centre register trial for out-patient cataract surgery in three high-volume surgical sites. A total of 14,924 patient follow-ups have been documented since 2005. Eight external factors potentially relevant for risk adjustment were related to the outcome indicators “refractive accuracy” and “visual rehabilitation” 2–5 weeks after surgery. Results The clinical relevance criterion confirmed 2 (“refractive accuracy”) and 5 (“visual rehabilitation”) external factors. The significance criterion was verified in two ways. Univariate and multivariate analyses revealed almost identical external factors: 4 were related to “refractive accuracy” and 7 (6) to “visual rehabilitation”. Two (“refractive accuracy”) and 5 (“visual rehabilitation”) factors
A linear regression solution to the spatial autocorrelation problem
NASA Astrophysics Data System (ADS)
Griffith, Daniel A.
The Moran Coefficient spatial autocorrelation index can be decomposed into orthogonal map pattern components. This decomposition relates it directly to standard linear regression, in which corresponding eigenvectors can be used as predictors. This paper reports comparative results between these linear regressions and their auto-Gaussian counterparts for the following georeferenced data sets: Columbus (Ohio) crime, Ottawa-Hull median family income, Toronto population density, southwest Ohio unemployment, Syracuse pediatric lead poisoning, and Glasgow standard mortality rates, and a small remotely sensed image of the High Peak district. This methodology is extended to auto-logistic and auto-Poisson situations, with selected data analyses including percentage of urban population across Puerto Rico, and the frequency of SIDs cases across North Carolina. These data analytic results suggest that this approach to georeferenced data analysis offers considerable promise.
Cactus: An Introduction to Regression
ERIC Educational Resources Information Center
Hyde, Hartley
2008-01-01
When the author first used "VisiCalc," the author thought it a very useful tool when he had the formulas. But how could he design a spreadsheet if there was no known formula for the quantities he was trying to predict? A few months later, the author relates he learned to use multiple linear regression software and suddenly it all clicked into…
Multiple Regression: A Leisurely Primer.
ERIC Educational Resources Information Center
Daniel, Larry G.; Onwuegbuzie, Anthony J.
Multiple regression is a useful statistical technique when the researcher is considering situations in which variables of interest are theorized to be multiply caused. It may also be useful in those situations in which the researchers is interested in studies of predictability of phenomena of interest. This paper provides an introduction to…
Weighting Regressions by Propensity Scores
ERIC Educational Resources Information Center
Freedman, David A.; Berk, Richard A.
2008-01-01
Regressions can be weighted by propensity scores in order to reduce bias. However, weighting is likely to increase random error in the estimates, and to bias the estimated standard errors downward, even when selection mechanisms are well understood. Moreover, in some cases, weighting will increase the bias in estimated causal parameters. If…
Quantile Regression with Censored Data
ERIC Educational Resources Information Center
Lin, Guixian
2009-01-01
The Cox proportional hazards model and the accelerated failure time model are frequently used in survival data analysis. They are powerful, yet have limitation due to their model assumptions. Quantile regression offers a semiparametric approach to model data with possible heterogeneity. It is particularly powerful for censored responses, where the…
Integrated Computer System of Management in Logistics
NASA Astrophysics Data System (ADS)
Chwesiuk, Krzysztof
2011-06-01
This paper aims at presenting a concept of an integrated computer system of management in logistics, particularly in supply and distribution chains. Consequently, the paper includes the basic idea of the concept of computer-based management in logistics and components of the system, such as CAM and CIM systems in production processes, and management systems for storage, materials flow, and for managing transport, forwarding and logistics companies. The platform which integrates computer-aided management systems is that of electronic data interchange.
Optimal distributions for multiplex logistic networks.
Solá Conde, Luis E; Used, Javier; Romance, Miguel
2016-06-01
This paper presents some mathematical models for distribution of goods in logistic networks based on spectral analysis of complex networks. Given a steady distribution of a finished product, some numerical algorithms are presented for computing the weights in a multiplex logistic network that reach the equilibrium dynamics with high convergence rate. As an application, the logistic networks of Germany and Spain are analyzed in terms of their convergence rates.
Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors
Woodard, Dawn B.; Crainiceanu, Ciprian; Ruppert, David
2013-01-01
We propose a new method for regression using a parsimonious and scientifically interpretable representation of functional predictors. Our approach is designed for data that exhibit features such as spikes, dips, and plateaus whose frequency, location, size, and shape varies stochastically across subjects. We propose Bayesian inference of the joint functional and exposure models, and give a method for efficient computation. We contrast our approach with existing state-of-the-art methods for regression with functional predictors, and show that our method is more effective and efficient for data that include features occurring at varying locations. We apply our methodology to a large and complex dataset from the Sleep Heart Health Study, to quantify the association between sleep characteristics and health outcomes. Software and technical appendices are provided in online supplemental materials. PMID:24293988
NASA Technical Reports Server (NTRS)
Malin, Jane T.; Schrenkenghost, Debra K.
2001-01-01
The Adjustable Autonomy Testbed (AAT) is a simulation-based testbed located in the Intelligent Systems Laboratory in the Automation, Robotics and Simulation Division at NASA Johnson Space Center. The purpose of the testbed is to support evaluation and validation of prototypes of adjustable autonomous agent software for control and fault management for complex systems. The AA T project has developed prototype adjustable autonomous agent software and human interfaces for cooperative fault management. This software builds on current autonomous agent technology by altering the architecture, components and interfaces for effective teamwork between autonomous systems and human experts. Autonomous agents include a planner, flexible executive, low level control and deductive model-based fault isolation. Adjustable autonomy is intended to increase the flexibility and effectiveness of fault management with an autonomous system. The test domain for this work is control of advanced life support systems for habitats for planetary exploration. The CONFIG hybrid discrete event simulation environment provides flexible and dynamically reconfigurable models of the behavior of components and fluids in the life support systems. Both discrete event and continuous (discrete time) simulation are supported, and flows and pressures are computed globally. This provides fast dynamic simulations of interacting hardware systems in closed loops that can be reconfigured during operations scenarios, producing complex cascading effects of operations and failures. Current object-oriented model libraries support modeling of fluid systems, and models have been developed of physico-chemical and biological subsystems for processing advanced life support gases. In FY01, water recovery system models will be developed.
Cutburth, Ronald W.; Silva, Leonard L.
1988-01-01
An improved mounting stage of the type used for the detection of laser beams is disclosed. A stage center block is mounted on each of two opposite sides by a pair of spaced ball bearing tracks which provide stability as well as simplicity. The use of the spaced ball bearing pairs in conjunction with an adjustment screw which also provides support eliminates extraneous stabilization components and permits maximization of the area of the center block laser transmission hole.
Logistics Reduction Technologies for Exploration Missions
NASA Technical Reports Server (NTRS)
Broyan, James L., Jr.; Ewert, Michael K.; Fink, Patrick W.
2014-01-01
Human exploration missions under study are limited by the launch mass capacity of existing and planned launch vehicles. The logistical mass of crew items is typically considered separate from the vehicle structure, habitat outfitting, and life support systems. Although mass is typically the focus of exploration missions, due to its strong impact on launch vehicle and habitable volume for the crew, logistics volume also needs to be considered. NASA's Advanced Exploration Systems (AES) Logistics Reduction and Repurposing (LRR) Project is developing six logistics technologies guided by a systems engineering cradle-to-grave approach to enable after-use crew items to augment vehicle systems. Specifically, AES LRR is investigating the direct reduction of clothing mass, the repurposing of logistical packaging, the use of autonomous logistics management technologies, the processing of spent crew items to benefit radiation shielding and water recovery, and the conversion of trash to propulsion gases. Reduction of mass has a corresponding and significant impact to logistical volume. The reduction of logistical volume can reduce the overall pressurized vehicle mass directly, or indirectly benefit the mission by allowing for an increase in habitable volume during the mission. The systematic implementation of these types of technologies will increase launch mass efficiency by enabling items to be used for secondary purposes and improve the habitability of the vehicle as mission durations increase. Early studies have shown that the use of advanced logistics technologies can save approximately 20 m(sup 3) of volume during transit alone for a six-person Mars conjunction class mission.
Analysis of Jingdong Mall Logistics Distribution Model
NASA Astrophysics Data System (ADS)
Shao, Kang; Cheng, Feng
In recent years, the development of electronic commerce in our country to speed up the pace. The role of logistics has been highlighted, more and more electronic commerce enterprise are beginning to realize the importance of logistics in the success or failure of the enterprise. In this paper, the author take Jingdong Mall for example, performing a SWOT analysis of their current situation of self-built logistics system, find out the problems existing in the current Jingdong Mall logistics distribution and give appropriate recommendations.
Regression Verification Using Impact Summaries
NASA Technical Reports Server (NTRS)
Backes, John; Person, Suzette J.; Rungta, Neha; Thachuk, Oksana
2013-01-01
Regression verification techniques are used to prove equivalence of syntactically similar programs. Checking equivalence of large programs, however, can be computationally expensive. Existing regression verification techniques rely on abstraction and decomposition techniques to reduce the computational effort of checking equivalence of the entire program. These techniques are sound but not complete. In this work, we propose a novel approach to improve scalability of regression verification by classifying the program behaviors generated during symbolic execution as either impacted or unimpacted. Our technique uses a combination of static analysis and symbolic execution to generate summaries of impacted program behaviors. The impact summaries are then checked for equivalence using an o-the-shelf decision procedure. We prove that our approach is both sound and complete for sequential programs, with respect to the depth bound of symbolic execution. Our evaluation on a set of sequential C artifacts shows that reducing the size of the summaries can help reduce the cost of software equivalence checking. Various reduction, abstraction, and compositional techniques have been developed to help scale software verification techniques to industrial-sized systems. Although such techniques have greatly increased the size and complexity of systems that can be checked, analysis of large software systems remains costly. Regression analysis techniques, e.g., regression testing [16], regression model checking [22], and regression verification [19], restrict the scope of the analysis by leveraging the differences between program versions. These techniques are based on the idea that if code is checked early in development, then subsequent versions can be checked against a prior (checked) version, leveraging the results of the previous analysis to reduce analysis cost of the current version. Regression verification addresses the problem of proving equivalence of closely related program
Psychosocial adjustment to ALS: a longitudinal study.
Matuz, Tamara; Birbaumer, Niels; Hautzinger, Martin; Kübler, Andrea
2015-01-01
For the current study the Lazarian stress-coping theory and the appendant model of psychosocial adjustment to chronic illness and disabilities (Pakenham, 1999) has shaped the foundation for identifying determinants of adjustment to ALS. We aimed to investigate the evolution of psychosocial adjustment to ALS and to determine its long-term predictors. A longitudinal study design with four measurement time points was therefore, used to assess patients' quality of life, depression, and stress-coping model related aspects, such as illness characteristics, social support, cognitive appraisals, and coping strategies during a period of 2 years. Regression analyses revealed that 55% of the variance of severity of depressive symptoms and 47% of the variance in quality of life at T2 was accounted for by all the T1 predictor variables taken together. On the level of individual contributions, protective buffering, and appraisal of own coping potential accounted for a significant percentage in the variance in severity of depressive symptoms, whereas problem management coping strategies explained variance in quality of life scores. Illness characteristics at T2 did not explain any variance of both adjustment outcomes. Overall, the pattern of the longitudinal results indicated stable depressive symptoms and quality of life indices reflecting a successful adjustment to the disease across four measurement time points during a period of about two years. Empirical evidence is provided for the predictive value of social support, cognitive appraisals, and coping strategies, but not illness parameters such as severity and duration for adaptation to ALS. The current study contributes to a better conceptualization of adjustment, allowing us to provide evidence-based support beyond medical and physical intervention for people with ALS.
Psychosocial adjustment to ALS: a longitudinal study
Matuz, Tamara; Birbaumer, Niels; Hautzinger, Martin; Kübler, Andrea
2015-01-01
For the current study the Lazarian stress-coping theory and the appendant model of psychosocial adjustment to chronic illness and disabilities (Pakenham, 1999) has shaped the foundation for identifying determinants of adjustment to ALS. We aimed to investigate the evolution of psychosocial adjustment to ALS and to determine its long-term predictors. A longitudinal study design with four measurement time points was therefore, used to assess patients' quality of life, depression, and stress-coping model related aspects, such as illness characteristics, social support, cognitive appraisals, and coping strategies during a period of 2 years. Regression analyses revealed that 55% of the variance of severity of depressive symptoms and 47% of the variance in quality of life at T2 was accounted for by all the T1 predictor variables taken together. On the level of individual contributions, protective buffering, and appraisal of own coping potential accounted for a significant percentage in the variance in severity of depressive symptoms, whereas problem management coping strategies explained variance in quality of life scores. Illness characteristics at T2 did not explain any variance of both adjustment outcomes. Overall, the pattern of the longitudinal results indicated stable depressive symptoms and quality of life indices reflecting a successful adjustment to the disease across four measurement time points during a period of about two years. Empirical evidence is provided for the predictive value of social support, cognitive appraisals, and coping strategies, but not illness parameters such as severity and duration for adaptation to ALS. The current study contributes to a better conceptualization of adjustment, allowing us to provide evidence-based support beyond medical and physical intervention for people with ALS. PMID:26441696
Combining biomarkers for classification with covariate adjustment.
Kim, Soyoung; Huang, Ying
2017-03-09
Combining multiple markers can improve classification accuracy compared with using a single marker. In practice, covariates associated with markers or disease outcome can affect the performance of a biomarker or biomarker combination in the population. The covariate-adjusted receiver operating characteristic (ROC) curve has been proposed as a tool to tease out the covariate effect in the evaluation of a single marker; this curve characterizes the classification accuracy solely because of the marker of interest. However, research on the effect of covariates on the performance of marker combinations and on how to adjust for the covariate effect when combining markers is still lacking. In this article, we examine the effect of covariates on classification performance of linear marker combinations and propose to adjust for covariates in combining markers by maximizing the nonparametric estimate of the area under the covariate-adjusted ROC curve. The proposed method provides a way to estimate the best linear biomarker combination that is robust to risk model assumptions underlying alternative regression-model-based methods. The proposed estimator is shown to be consistent and asymptotically normally distributed. We conduct simulations to evaluate the performance of our estimator in cohort and case/control designs and compare several different weighting strategies during estimation with respect to efficiency. Our estimator is also compared with alternative regression-model-based estimators or estimators that maximize the empirical area under the ROC curve, with respect to bias and efficiency. We apply the proposed method to a biomarker study from an human immunodeficiency virus vaccine trial. Copyright © 2017 John Wiley & Sons, Ltd.
3D Regression Heat Map Analysis of Population Study Data.
Klemm, Paul; Lawonn, Kai; Glaßer, Sylvia; Niemann, Uli; Hegenscheid, Katrin; Völzke, Henry; Preim, Bernhard
2016-01-01
Epidemiological studies comprise heterogeneous data about a subject group to define disease-specific risk factors. These data contain information (features) about a subject's lifestyle, medical status as well as medical image data. Statistical regression analysis is used to evaluate these features and to identify feature combinations indicating a disease (the target feature). We propose an analysis approach of epidemiological data sets by incorporating all features in an exhaustive regression-based analysis. This approach combines all independent features w.r.t. a target feature. It provides a visualization that reveals insights into the data by highlighting relationships. The 3D Regression Heat Map, a novel 3D visual encoding, acts as an overview of the whole data set. It shows all combinations of two to three independent features with a specific target disease. Slicing through the 3D Regression Heat Map allows for the detailed analysis of the underlying relationships. Expert knowledge about disease-specific hypotheses can be included into the analysis by adjusting the regression model formulas. Furthermore, the influences of features can be assessed using a difference view comparing different calculation results. We applied our 3D Regression Heat Map method to a hepatic steatosis data set to reproduce results from a data mining-driven analysis. A qualitative analysis was conducted on a breast density data set. We were able to derive new hypotheses about relations between breast density and breast lesions with breast cancer. With the 3D Regression Heat Map, we present a visual overview of epidemiological data that allows for the first time an interactive regression-based analysis of large feature sets with respect to a disease.
Interaction Models for Functional Regression
USSET, JOSEPH; STAICU, ANA-MARIA; MAITY, ARNAB
2015-01-01
A functional regression model with a scalar response and multiple functional predictors is proposed that accommodates two-way interactions in addition to their main effects. The proposed estimation procedure models the main effects using penalized regression splines, and the interaction effect by a tensor product basis. Extensions to generalized linear models and data observed on sparse grids or with measurement error are presented. A hypothesis testing procedure for the functional interaction effect is described. The proposed method can be easily implemented through existing software. Numerical studies show that fitting an additive model in the presence of interaction leads to both poor estimation performance and lost prediction power, while fitting an interaction model where there is in fact no interaction leads to negligible losses. The methodology is illustrated on the AneuRisk65 study data. PMID:26744549
Astronomical Methods for Nonparametric Regression
NASA Astrophysics Data System (ADS)
Steinhardt, Charles L.; Jermyn, Adam
2017-01-01
I will discuss commonly used techniques for nonparametric regression in astronomy. We find that several of them, particularly running averages and running medians, are generically biased, asymmetric between dependent and independent variables, and perform poorly in recovering the underlying function, even when errors are present only in one variable. We then examine less-commonly used techniques such as Multivariate Adaptive Regressive Splines and Boosted Trees and find them superior in bias, asymmetry, and variance both theoretically and in practice under a wide range of numerical benchmarks. In this context the chief advantage of the common techniques is runtime, which even for large datasets is now measured in microseconds compared with milliseconds for the more statistically robust techniques. This points to a tradeoff between bias, variance, and computational resources which in recent years has shifted heavily in favor of the more advanced methods, primarily driven by Moore's Law. Along these lines, we also propose a new algorithm which has better overall statistical properties than all techniques examined thus far, at the cost of significantly worse runtime, in addition to providing guidance on choosing the nonparametric regression technique most suitable to any specific problem. We then examine the more general problem of errors in both variables and provide a new algorithm which performs well in most cases and lacks the clear asymmetry of existing non-parametric methods, which fail to account for errors in both variables.
Biomass Supply Logistics and Infrastructure
Sokhansanj, Shahabaddine
2009-04-01
Feedstock supply system encompasses numerous unit operations necessary to move lignocellulosic feedstock from the place where it is produced (in the field or on the stump) to the start of the conversion process (reactor throat) of the Biorefinery. These unit operations, which include collection, storage, preprocessing, handling, and transportation, represent one of the largest technical and logistics challenges to the emerging lignocellulosic biorefining industry. This chapter briefly reviews methods of estimating the quantities of biomass followed by harvesting and collection processes based on current practices on handling wet and dry forage materials. Storage and queuing are used to deal with seasonal harvest times, variable yields, and delivery schedules. Preprocessing can be as simple as grinding and formatting the biomass for increased bulk density or improved conversion efficiency, or it can be as complex as improving feedstock quality through fractionation, tissue separation, drying, blending, and densification. Handling and Transportation consists of using a variety of transport equipment (truck, train, ship) for moving the biomass from one point to another. The chapter also provides typical cost figures for harvest and processing of biomass.
Biomass supply logistics and infrastructure.
Sokhansanj, Shahabaddine; Hess, J Richard
2009-01-01
Feedstock supply system encompasses numerous unit operations necessary to move lignocellulosic feedstock from the place where it is produced (in the field or on the stump) to the start of the conversion process (reactor throat) of the biorefinery. These unit operations, which include collection, storage, preprocessing, handling, and transportation, represent one of the largest technical and logistics challenges to the emerging lignocellulosic biorefining industry. This chapter briefly reviews the methods of estimating the quantities of biomass, followed by harvesting and collection processes based on current practices on handling wet and dry forage materials. Storage and queuing are used to deal with seasonal harvest times, variable yields, and delivery schedules. Preprocessing can be as simple as grinding and formatting the biomass for increased bulk density or improved conversion efficiency, or it can be as complex as improving feedstock quality through fractionation, tissue separation, drying, blending, and densification. Handling and transportation consists of using a variety of transport equipment (truck, train, ship) for moving the biomass from one point to another. The chapter also provides typical cost figures for harvest and processing of biomass.
FUTURE LOGISTICS AND OPERATIONAL ADAPTABILITY
Houck, Roger P.
2009-10-01
While we cannot predict the future, we can ascertain trends and examine them through the use of alternative futures methodologies and tools. From a logistics perspective, we know that many different futures are possible, all of which are obviously dependent on decisions we make in the present. As professional logisticians we are obligated to provide the field - our Soldiers - with our best professional opinion of what will result in success on the battlefield. Our view of the future should take history and contemporary conflict into account, but it must also consider that continuity with the past cannot be taken for granted. If we are too focused on past and current experience, then our vision of the future will be limited indeed. On the one hand, the future must be explained in language that does not defy common sense. On the other hand, the pace of change is such that we must conduct qualitative and quantitative trend analyses, forecasting, and explorative scenario development in ways that allow for significant breaks - or "shocks" - that may "change the game". We will need capabilities and solutions that are constantly evolving - and improving - to match the operational tempo of a radically changing threat environment. For those who provide quartermaster services, this article will briefly examine what this means from the perspective of creating what might be termed a preferred future.
Improvement in fresh fruit and vegetable logistics quality: berry logistics field studies.
do Nascimento Nunes, M Cecilia; Nicometo, Mike; Emond, Jean Pierre; Melis, Ricardo Badia; Uysal, Ismail
2014-06-13
Shelf life of fresh fruits and vegetables is greatly influenced by environmental conditions. Increasing temperature usually results in accelerated loss of quality and shelf-life reduction, which is not physically visible until too late in the supply chain to adjust logistics to match shelf life. A blackberry study showed that temperatures inside pallets varied significantly and 57% of the berries arriving at the packinghouse did not have enough remaining shelf life for the longest supply routes. Yet, the advanced shelf-life loss was not physically visible. Some of those pallets would be sent on longer supply routes than necessary, creating avoidable waste. Other studies showed that variable pre-cooling at the centre of pallets resulted in physically invisible uneven shelf life. We have shown that using simple temperature measurements much waste can be avoided using 'first expiring first out'. Results from our studies showed that shelf-life prediction should not be based on a single quality factor as, depending on the temperature history, the quality attribute that limits shelf life may vary. Finally, methods to use air temperature to predict product temperature for highest shelf-life prediction accuracy in the absence of individual sensors for each monitored product have been developed. Our results show a significant reduction of up to 98% in the root-mean-square-error difference between the product temperature and air temperature when advanced estimation methods are used.
NASA Technical Reports Server (NTRS)
Farley, Gary L.
1994-01-01
Local characteristics of fabrics varied to suit special applications. Adjustable reed machinery proposed for use in weaving fabrics in various net shapes, widths, yarn spacings, and yarn angles. Locations of edges of fabric and configuration of warp and filling yarns varied along fabric to obtain specified properties. In machinery, reed wires mounted in groups on sliders, mounted on lengthwise rails in reed frame. Mechanisms incorporated to move sliders lengthwise, parallel to warp yarns, by sliding them along rails; move sliders crosswise by translating reed frame rails perpendicular to warp yarns; and crosswise by spreading reed rails within group. Profile of reed wires in group on each slider changed.
The Automated Logistics Element Planning System (ALEPS)
NASA Technical Reports Server (NTRS)
Schwaab, Douglas G.
1991-01-01
The design and functions of ALEPS (Automated Logistics Element Planning System) is a computer system that will automate planning and decision support for Space Station Freedom Logistical Elements (LEs) resupply and return operations. ALEPS provides data management, planning, analysis, monitoring, interfacing, and flight certification for support of LE flight load planning activities. The prototype ALEPS algorithm development is described.
Visualizing the Logistic Map with a Microcontroller
ERIC Educational Resources Information Center
Serna, Juan D.; Joshi, Amitabh
2012-01-01
The logistic map is one of the simplest nonlinear dynamical systems that clearly exhibits the route to chaos. In this paper, we explore the evolution of the logistic map using an open-source microcontroller connected to an array of light-emitting diodes (LEDs). We divide the one-dimensional domain interval [0,1] into ten equal parts, an associate…
Exploration Mission Benefits From Logistics Reduction Technologies
NASA Technical Reports Server (NTRS)
Broyan, James Lee, Jr.; Ewert, Michael K.; Schlesinger, Thilini
2016-01-01
Technologies that reduce logistical mass, volume, and the crew time dedicated to logistics management become more important as exploration missions extend further from the Earth. Even modest reductions in logistical mass can have a significant impact because it also reduces the packaging burden. NASA's Advanced Exploration Systems' Logistics Reduction Project is developing technologies that can directly reduce the mass and volume of crew clothing and metabolic waste collection. Also, cargo bags have been developed that can be reconfigured for crew outfitting, and trash processing technologies are under development to increase habitable volume and improve protection against solar storm events. Additionally, Mars class missions are sufficiently distant that even logistics management without resupply can be problematic due to the communication time delay with Earth. Although exploration vehicles are launched with all consumables and logistics in a defined configuration, the configuration continually changes as the mission progresses. Traditionally significant ground and crew time has been required to understand the evolving configuration and to help locate misplaced items. For key mission events and unplanned contingencies, the crew will not be able to rely on the ground for logistics localization assistance. NASA has been developing a radio-frequency-identification autonomous logistics management system to reduce crew time for general inventory and enable greater crew self-response to unplanned events when a wide range of items may need to be located in a very short time period. This paper provides a status of the technologies being developed and their mission benefits for exploration missions.
The Impact of Financial Sophistication on Adjustable Rate Mortgage Ownership
ERIC Educational Resources Information Center
Smith, Hyrum; Finke, Michael S.; Huston, Sandra J.
2011-01-01
The influence of a financial sophistication scale on adjustable-rate mortgage (ARM) borrowing is explored. Descriptive statistics and regression analysis using recent data from the Survey of Consumer Finances reveal that ARM borrowing is driven by both the least and most financially sophisticated households but for different reasons. Less…
Effects of Relational Authenticity on Adjustment to College
ERIC Educational Resources Information Center
Lenz, A. Stephen; Holman, Rachel L.; Lancaster, Chloe; Gotay, Stephanie G.
2016-01-01
The authors examined the association between relational health and student adjustment to college. Data were collected from 138 undergraduate students completing their 1st semester at a large university in the mid-southern United States. Regression analysis indicated that higher levels of relational authenticity were a predictor of success during…
The Automated Logistics Element Planning System (ALEPS)
NASA Technical Reports Server (NTRS)
Schwaab, Douglas G.
1992-01-01
ALEPS, which is being developed to provide the SSF program with a computer system to automate logistics resupply/return cargo load planning and verification, is presented. ALEPS will make it possible to simultaneously optimize both the resupply flight load plan and the return flight reload plan for any of the logistics carriers. In the verification mode ALEPS will support the carrier's flight readiness reviews and control proper execution of the approved plans. It will also support the SSF inventory management system by providing electronic block updates to the inventory database on the cargo arriving at or departing the station aboard a logistics carrier. A prototype drawer packing algorithm is described which is capable of generating solutions for 3D packing of cargo items into a logistics carrier storage accommodation. It is concluded that ALEPS will provide the capability to generate and modify optimized loading plans for the logistics elements fleet.
Pakenham, Kenneth I; Samios, Christina; Sofronoff, Kate
2005-05-01
The present study examined the applicability of the double ABCX model of family adjustment in explaining maternal adjustment to caring for a child diagnosed with Asperger syndrome. Forty-seven mothers completed questionnaires at a university clinic while their children were participating in an anxiety intervention. The children were aged between 10 and 12 years. Results of correlations showed that each of the model components was related to one or more domains of maternal adjustment in the direction predicted, with the exception of problem-focused coping. Hierarchical regression analyses demonstrated that, after controlling for the effects of relevant demographics, stressor severity, pile-up of demands and coping were related to adjustment. Findings indicate the utility of the double ABCX model in guiding research into parental adjustment when caring for a child with Asperger syndrome. Limitations of the study and clinical implications are discussed.
Continuously adjustable Pulfrich spectacles
NASA Astrophysics Data System (ADS)
Jacobs, Ken; Karpf, Ron
2011-03-01
A number of Pulfrich 3-D movies and TV shows have been produced, but the standard implementation has inherent drawbacks. The movie and TV industries have correctly concluded that the standard Pulfrich 3-D implementation is not a useful 3-D technique. Continuously Adjustable Pulfrich Spectacles (CAPS) is a new implementation of the Pulfrich effect that allows any scene containing movement in a standard 2-D movie, which are most scenes, to be optionally viewed in 3-D using inexpensive viewing specs. Recent scientific results in the fields of human perception, optoelectronics, video compression and video format conversion are translated into a new implementation of Pulfrich 3- D. CAPS uses these results to continuously adjust to the movie so that the viewing spectacles always conform to the optical density that optimizes the Pulfrich stereoscopic illusion. CAPS instantly provides 3-D immersion to any moving scene in any 2-D movie. Without the glasses, the movie will appear as a normal 2-D image. CAPS work on any viewing device, and with any distribution medium. CAPS is appropriate for viewing Internet streamed movies in 3-D.
An assessment of precipitation adjustment and feedback computation methods
NASA Astrophysics Data System (ADS)
Richardson, T. B.; Samset, B. H.; Andrews, T.; Myhre, G.; Forster, P. M.
2016-10-01
The precipitation adjustment and feedback framework is a useful tool for understanding global and regional precipitation changes. However, there is no definitive method for making the decomposition. In this study we highlight important differences which arise in results due to methodological choices. The responses to five different forcing agents (CO2, CH4, SO4, black carbon, and solar insolation) are analyzed using global climate model simulations. Three decomposition methods are compared: using fixed sea surface temperature experiments (fSST), regressing transient climate change after an abrupt forcing (regression), and separating based on timescale using the first year of coupled simulations (YR1). The YR1 method is found to incorporate significant SST-driven feedbacks into the adjustment and is therefore not suitable for making the decomposition. Globally, the regression and fSST methods produce generally consistent results; however, the regression values are dependent on the number of years analyzed and have considerably larger uncertainties. Regionally, there are substantial differences between methods. The pattern of change calculated using regression reverses sign in many regions as the number of years analyzed increases. This makes it difficult to establish what effects are included in the decomposition. The fSST method provides a more clear-cut separation in terms of what physical drivers are included in each component. The fSST results are less affected by methodological choices and exhibit much less variability. We find that the precipitation adjustment is weakly affected by the choice of SST climatology.
Fazaeli, S; Yousefi, M; Banikazemi, S H; Ghazizadeh Hashemi, Sah; Khorsand, A; Badiee, Sh
2015-01-01
Responsiveness was proposed via WHO as a fundamental sign to evaluate the enforcement of wellness practices and evaluates with a standard organization of fields that are classified to 2 principal classes "Respect as characters" and "customer adjustment". The current research included the value of customer adjustment areas in low and high-income communities of Mashhad. In the current descriptive research, an example of 923 families was chosen stochastically of 2 low and high pay areas of Mashhad. WHO survey employed for information gathering. Regular rate reviews and Ordinal Logistic Regression (OLR) applied for information investigation. In overall, respondents chose basic amenities quality as the primary area, and the path to social care networks recognized as the wicked primary area. Families in high-income states obtained higher areas of immediate notations and selection associated with low-income. There is a meaningful correlation among parameters of ages, having a part whom required care and self-imposed health via the ranking of customer adjustment areas. The investigation of the homes' viewpoint concerning the classification of non-clinical perspectives of care quality, particularly while confronted by restricted sources, can assist in managing enterprises towards topics that are more relevant and results in the development of the wellness policy achievement and fecundity.
Regression analysis of cytopathological data
Whittemore, A.S.; McLarty, J.W.; Fortson, N.; Anderson, K.
1982-12-01
Epithelial cells from the human body are frequently labelled according to one of several ordered levels of abnormality, ranging from normal to malignant. The label of the most abnormal cell in a specimen determines the score for the specimen. This paper presents a model for the regression of specimen scores against continuous and discrete variables, as in host exposure to carcinogens. Application to data and tests for adequacy of model fit are illustrated using sputum specimens obtained from a cohort of former asbestos workers.
Logistics Modeling for Lunar Exploration Systems
NASA Technical Reports Server (NTRS)
Andraschko, Mark R.; Merrill, R. Gabe; Earle, Kevin D.
2008-01-01
The extensive logistics required to support extended crewed operations in space make effective modeling of logistics requirements and deployment critical to predicting the behavior of human lunar exploration systems. This paper discusses the software that has been developed as part of the Campaign Manifest Analysis Tool in support of strategic analysis activities under the Constellation Architecture Team - Lunar. The described logistics module enables definition of logistics requirements across multiple surface locations and allows for the transfer of logistics between those locations. A key feature of the module is the loading algorithm that is used to efficiently load logistics by type into carriers and then onto landers. Attention is given to the capabilities and limitations of this loading algorithm, particularly with regard to surface transfers. These capabilities are described within the context of the object-oriented software implementation, with details provided on the applicability of using this approach to model other human exploration scenarios. Some challenges of incorporating probabilistics into this type of logistics analysis model are discussed at a high level.
ISS Logistics Hardware Disposition and Metrics Validation
NASA Technical Reports Server (NTRS)
Rogers, Toneka R.
2010-01-01
I was assigned to the Logistics Division of the International Space Station (ISS)/Spacecraft Processing Directorate. The Division consists of eight NASA engineers and specialists that oversee the logistics portion of the Checkout, Assembly, and Payload Processing Services (CAPPS) contract. Boeing, their sub-contractors and the Boeing Prime contract out of Johnson Space Center, provide the Integrated Logistics Support for the ISS activities at Kennedy Space Center. Essentially they ensure that spares are available to support flight hardware processing and the associated ground support equipment (GSE). Boeing maintains a Depot for electrical, mechanical and structural modifications and/or repair capability as required. My assigned task was to learn project management techniques utilized by NASA and its' contractors to provide an efficient and effective logistics support infrastructure to the ISS program. Within the Space Station Processing Facility (SSPF) I was exposed to Logistics support components, such as, the NASA Spacecraft Services Depot (NSSD) capabilities, Mission Processing tools, techniques and Warehouse support issues, required for integrating Space Station elements at the Kennedy Space Center. I also supported the identification of near-term ISS Hardware and Ground Support Equipment (GSE) candidates for excessing/disposition prior to October 2010; and the validation of several Logistics Metrics used by the contractor to measure logistics support effectiveness.
Exploration Mission Benefits From Logistics Reduction Technologies
NASA Technical Reports Server (NTRS)
Broyan, James Lee, Jr.; Schlesinger, Thilini; Ewert, Michael K.
2016-01-01
Technologies that reduce logistical mass, volume, and the crew time dedicated to logistics management become more important as exploration missions extend further from the Earth. Even modest reductions in logical mass can have a significant impact because it also reduces the packing burden. NASA's Advanced Exploration Systems' Logistics Reduction Project is developing technologies that can directly reduce the mass and volume of crew clothing and metabolic waste collection. Also, cargo bags have been developed that can be reconfigured for crew outfitting and trash processing technologies to increase habitable volume and improve protection against solar storm events are under development. Additionally, Mars class missions are sufficiently distant that even logistics management without resupply can be problematic due to the communication time delay with Earth. Although exploration vehicles are launched with all consumables and logistics in a defined configuration, the configuration continually changes as the mission progresses. Traditionally significant ground and crew time has been required to understand the evolving configuration and locate misplaced items. For key mission events and unplanned contingencies, the crew will not be able to rely on the ground for logistics localization assistance. NASA has been developing a radio frequency identification autonomous logistics management system to reduce crew time for general inventory and enable greater crew self-response to unplanned events when a wide range of items may need to be located in a very short time period. This paper provides a status of the technologies being developed and there mission benefits for exploration missions.
Lees, Mackenzie C.; Merani, Shaheed; Tauh, Keerit; Khadaroo, Rachel G.
2015-01-01
Background Older adults (≥ 65 yr) are the fastest growing population and are presenting in increasing numbers for acute surgical care. Emergency surgery is frequently life threatening for older patients. Our objective was to identify predictors of mortality and poor outcome among elderly patients undergoing emergency general surgery. Methods We conducted a retrospective cohort study of patients aged 65–80 years undergoing emergency general surgery between 2009 and 2010 at a tertiary care centre. Demographics, comorbidities, in-hospital complications, mortality and disposition characteristics of patients were collected. Logistic regression analysis was used to identify covariate-adjusted predictors of in-hospital mortality and discharge of patients home. Results Our analysis included 257 patients with a mean age of 72 years; 52% were men. In-hospital mortality was 12%. Mortality was associated with patients who had higher American Society of Anesthesiologists (ASA) class (odds ratio [OR] 3.85, 95% confidence interval [CI] 1.43–10.33, p = 0.008) and in-hospital complications (OR 1.93, 95% CI 1.32–2.83, p = 0.001). Nearly two-thirds of patients discharged home were younger (OR 0.92, 95% CI 0.85–0.99, p = 0.036), had lower ASA class (OR 0.45, 95% CI 0.27–0.74, p = 0.002) and fewer in-hospital complications (OR 0.69, 95% CI 0.53–0.90, p = 0.007). Conclusion American Society of Anesthesiologists class and in-hospital complications are perioperative predictors of mortality and disposition in the older surgical population. Understanding the predictors of poor outcome and the importance of preventing in-hospital complications in older patients will have important clinical utility in terms of preoperative counselling, improving health care and discharging patients home. PMID:26204143