Sample records for logistic regression probability

  1. Using Multiple and Logistic Regression to Estimate the Median Will-Cost and Probability of Cost and Schedule Overrun for Program Managers

    DTIC Science & Technology

    2017-03-23

    PUBLIC RELEASE; DISTRIBUTION UNLIMITED Using Multiple and Logistic Regression to Estimate the Median Will-Cost and Probability of Cost and... Cost and Probability of Cost and Schedule Overrun for Program Managers Ryan C. Trudelle Follow this and additional works at: https://scholar.afit.edu...afit.edu. Recommended Citation Trudelle, Ryan C., "Using Multiple and Logistic Regression to Estimate the Median Will-Cost and Probability of Cost and

  2. Use and interpretation of logistic regression in habitat-selection studies

    USGS Publications Warehouse

    Keating, Kim A.; Cherry, Steve

    2004-01-01

     Logistic regression is an important tool for wildlife habitat-selection studies, but the method frequently has been misapplied due to an inadequate understanding of the logistic model, its interpretation, and the influence of sampling design. To promote better use of this method, we review its application and interpretation under 3 sampling designs: random, case-control, and use-availability. Logistic regression is appropriate for habitat use-nonuse studies employing random sampling and can be used to directly model the conditional probability of use in such cases. Logistic regression also is appropriate for studies employing case-control sampling designs, but careful attention is required to interpret results correctly. Unless bias can be estimated or probability of use is small for all habitats, results of case-control studies should be interpreted as odds ratios, rather than probability of use or relative probability of use. When data are gathered under a use-availability design, logistic regression can be used to estimate approximate odds ratios if probability of use is small, at least on average. More generally, however, logistic regression is inappropriate for modeling habitat selection in use-availability studies. In particular, using logistic regression to fit the exponential model of Manly et al. (2002:100) does not guarantee maximum-likelihood estimates, valid probabilities, or valid likelihoods. We show that the resource selection function (RSF) commonly used for the exponential model is proportional to a logistic discriminant function. Thus, it may be used to rank habitats with respect to probability of use and to identify important habitat characteristics or their surrogates, but it is not guaranteed to be proportional to probability of use. Other problems associated with the exponential model also are discussed. 
We describe an alternative model based on Lancaster and Imbens (1996) that offers a method for estimating conditional probability of use in use-availability studies. Although promising, this model fails to converge to a unique solution in some important situations. Further work is needed to obtain a robust method that is broadly applicable to use-availability studies.
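
The advice to read case-control results as odds ratios can be illustrated with a standard textbook derivation. It is a sketch and is not taken from the paper. The symbols $\beta_0$, $\beta$, $\pi_1$ and $\pi_0$ are assumptions introduced here: a population logistic model with intercept $\beta_0$ and slopes $\beta$, with used sites sampled with probability $\pi_1$ and unused sites with probability $\pi_0$, both independent of the covariates $x$.

```latex
\begin{align*}
\operatorname{logit} P(y = 1 \mid x) &= \beta_0 + \beta^{\top} x
  && \text{(population model)} \\
P(y = 1 \mid x, \text{sampled}) &=
  \frac{\pi_1 \, P(y = 1 \mid x)}
       {\pi_1 \, P(y = 1 \mid x) + \pi_0 \, P(y = 0 \mid x)} \\
\operatorname{logit} P(y = 1 \mid x, \text{sampled}) &=
  \beta_0 + \log\frac{\pi_1}{\pi_0} + \beta^{\top} x
\end{align*}
```

Under these assumptions only the intercept is shifted, by $\log(\pi_1/\pi_0)$. The slopes, and therefore the odds ratios $e^{\beta_k}$, are unchanged, but fitted probabilities are biased unless the shift can be estimated.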

  3. Applying Kaplan-Meier to Item Response Data

    ERIC Educational Resources Information Center

    McNeish, Daniel

    2018-01-01

    Some IRT models can be equivalently modeled in alternative frameworks such as logistic regression. Logistic regression can also model time-to-event data, which concerns the probability of an event occurring over time. Using the relation between time-to-event models and logistic regression and the relation between logistic regression and IRT, this…

  4. Multinomial Logistic Regression Predicted Probability Map To Visualize The Influence Of Socio-Economic Factors On Breast Cancer Occurrence in Southern Karnataka

    NASA Astrophysics Data System (ADS)

    Madhu, B.; Ashok, N. C.; Balasubramanian, S.

    2014-11-01

    Multinomial logistic regression analysis was used to develop a statistical model that can predict the probability of breast cancer in Southern Karnataka using breast cancer occurrence data from 2007-2011. Independent socio-economic variables describing breast cancer occurrence, such as age, education, occupation, parity, type of family, health insurance coverage, residential locality and socioeconomic status, were obtained for each case. The models were developed as follows: i) The urban-rural distribution of breast cancer cases obtained from the Bharat Hospital and Institute of Oncology was visualized spatially. ii) Socio-economic risk factors describing the breast cancer occurrences were compiled for each case. These data were then analysed using multinomial logistic regression in SPSS statistical software; relations between the occurrence of breast cancer across socio-economic status and the influence of other socio-economic variables were evaluated, and multinomial logistic regression models were constructed. iii) The model that best predicted the occurrence of breast cancer was identified. This multivariate logistic regression model was entered into a geographic information system, and maps showing the predicted probability of breast cancer occurrence in Southern Karnataka were created. This study demonstrates that multinomial logistic regression is a valuable tool for developing models that predict the probability of breast cancer occurrence in Southern Karnataka.

  5. Using Logistic Regression To Predict the Probability of Debris Flows Occurring in Areas Recently Burned By Wildland Fires

    USGS Publications Warehouse

    Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.

    2003-01-01

    Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. 
The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity in each basin, particle size sorting, average storm intensity (millimeters per hour), soil organic matter content, soil permeability, and soil drainage. The results of this study demonstrate that logistic regression is a valuable tool for predicting the probability of debris flows occurring in recently-burned landscapes.
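
The step from a fitted model to a probability map can be sketched as follows. This is a minimal illustration, not the authors' model: the intercept, the coefficients, and the three basin values are hypothetical numbers chosen only to show how a linear predictor is passed through the logistic function.

```python
import math


def logistic(z):
    """Map a linear predictor z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(intercept, coefs, x):
    """Probability of the event for one basin with covariate values x."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return logistic(z)


# Hypothetical coefficients (NOT the published model) for three covariates,
# e.g. percent steep slope, percent burned at medium/high severity,
# and average storm intensity in mm/h.
intercept = -2.0
coefs = [0.03, 0.02, 0.05]
basin = [40.0, 55.0, 12.0]

p = predict_probability(intercept, coefs, basin)
print(f"Estimated debris-flow probability: {p:.3f}")
```

Applying `predict_probability` to every basin in a GIS layer would give the kind of probability map the abstract describes.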

  6. Robust mislabel logistic regression without modeling mislabel probabilities.

    PubMed

    Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun

    2018-03-01

    Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into consideration. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.

  7. Predicting U.S. Army Reserve Unit Manning Using Market Demographics

    DTIC Science & Technology

    2015-06-01

    develops linear regression, classification tree, and logistic regression models to determine the ability of the location to support manning requirements... logistic regression model delivers predictive results that allow decision-makers to identify locations with a high probability of meeting unit...manning requirements. The recommendation of this thesis is that the USAR implement the logistic regression model.

  8. Estimating the exceedance probability of rain rate by logistic regression

    NASA Technical Reports Server (NTRS)

    Chiu, Long S.; Kedem, Benjamin

    1990-01-01

    Recent studies have shown that the fraction of an area with rain intensity above a fixed threshold is highly correlated with the area-averaged rain rate. To estimate the fractional rainy area, a logistic regression model, which estimates the conditional probability that rain rate over an area exceeds a fixed threshold given the values of related covariates, is developed. The problem of dependency in the data in the estimation procedure is bypassed by the method of partial likelihood. Analyses of simulated scanning multichannel microwave radiometer and observed electrically scanning microwave radiometer data during the Global Atlantic Tropical Experiment period show that the use of logistic regression in pixel classification is superior to multiple regression in predicting whether rain rate at each pixel exceeds a given threshold, even in the presence of noisy data. The potential of the logistic regression technique in satellite rain rate estimation is discussed.

  9. Estimating the Probability of Rare Events Occurring Using a Local Model Averaging.

    PubMed

    Chen, Jin-Hua; Chen, Chun-Shu; Huang, Meng-Fan; Lin, Hung-Chih

    2016-10-01

    In statistical applications, logistic regression is a popular method for analyzing binary data accompanied by explanatory variables. But when one of the two outcomes is rare, the estimation of model parameters has been shown to be severely biased and hence estimating the probability of rare events occurring based on a logistic regression model would be inaccurate. In this article, we focus on estimating the probability of rare events occurring based on logistic regression models. Instead of selecting a best model, we propose a local model averaging procedure based on a data perturbation technique applied to different information criteria to obtain different probability estimates of rare events occurring. Then an approximately unbiased estimator of Kullback-Leibler loss is used to choose the best one among them. We design complete simulations to show the effectiveness of our approach. For illustration, a necrotizing enterocolitis (NEC) data set is analyzed. © 2016 Society for Risk Analysis.

  10. On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis

    ERIC Educational Resources Information Center

    Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas

    2011-01-01

    The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…

  11. Comparison of naïve Bayes and logistic regression for computer-aided diagnosis of breast masses using ultrasound imaging

    NASA Astrophysics Data System (ADS)

    Cary, Theodore W.; Cwanger, Alyssa; Venkatesh, Santosh S.; Conant, Emily F.; Sehgal, Chandra M.

    2012-03-01

    This study compares the performance of two proven but very different machine learners, Naïve Bayes and logistic regression, for differentiating malignant and benign breast masses using ultrasound imaging. Ultrasound images of 266 masses were analyzed quantitatively for shape, echogenicity, margin characteristics, and texture features. These features along with patient age, race, and mammographic BI-RADS category were used to train Naïve Bayes and logistic regression classifiers to diagnose lesions as malignant or benign. ROC analysis was performed using all of the features and using only a subset that maximized information gain. Performance was determined by the area under the ROC curve, Az, obtained from leave-one-out cross validation. Naïve Bayes showed significant variation (Az 0.733 +/- 0.035 to 0.840 +/- 0.029, P < 0.002) with the choice of features, but the performance of logistic regression was relatively unchanged under feature selection (Az 0.839 +/- 0.029 to 0.859 +/- 0.028, P = 0.605). Out of 34 features, a subset of 6 gave the highest information gain: brightness difference, margin sharpness, depth-to-width, mammographic BI-RADs, age, and race. The probabilities of malignancy determined by Naïve Bayes and logistic regression after feature selection showed significant correlation (R2= 0.87, P < 0.0001). The diagnostic performance of Naïve Bayes and logistic regression can be comparable, but logistic regression is more robust. Since probability of malignancy cannot be measured directly, high correlation between the probabilities derived from two basic but dissimilar models increases confidence in the predictive power of machine learning models for characterizing solid breast masses on ultrasound.
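
The area under the ROC curve used above to compare classifiers can be computed empirically as the fraction of (malignant, benign) pairs in which the malignant case receives the higher score. The sketch below shows that pairwise form only. It is an assumption that this matches the study's Az, which may have come from a fitted ROC curve, and the function name `auc` and the toy scores are hypothetical.

```python
def auc(scores, labels):
    """Empirical AUC: probability that a positive outscores a negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy predicted probabilities of malignancy (hypothetical values).
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 1, 0, 0]
print(auc(scores, labels))
```

In a leave-one-out setting, each score would be the prediction for a case from a model trained on all the other cases.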

  12. Determination of riverbank erosion probability using Locally Weighted Logistic Regression

    NASA Astrophysics Data System (ADS)

    Ioannidou, Elena; Flori, Aikaterini; Varouchakis, Emmanouil A.; Giannakis, Georgios; Vozinaki, Anthi Eirini K.; Karatzas, George P.; Nikolaidis, Nikolaos

    2015-04-01

    Riverbank erosion is a natural geomorphologic process that affects the fluvial environment. The most important issue concerning riverbank erosion is the identification of the vulnerable locations. An alternative to the usual hydrodynamic models to predict vulnerable locations is to quantify the probability of erosion occurrence. This can be achieved by identifying the underlying relations between riverbank erosion and the geomorphological or hydrological variables that prevent or stimulate erosion. Thus, riverbank erosion can be determined by a regression model using independent variables that are considered to affect the erosion process. The impact of such variables may vary spatially, therefore, a non-stationary regression model is preferred instead of a stationary equivalent. Locally Weighted Regression (LWR) is proposed as a suitable choice. This method can be extended to predict the binary presence or absence of erosion based on a series of independent local variables by using the logistic regression model. It is referred to as Locally Weighted Logistic Regression (LWLR). Logistic regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable (e.g. binary response) based on one or more predictor variables. The method can be combined with LWR to assign weights to local independent variables of the dependent one. LWR allows model parameters to vary over space in order to reflect spatial heterogeneity. The probabilities of the possible outcomes are modelled as a function of the independent variables using a logistic function. Logistic regression measures the relationship between a categorical dependent variable and, usually, one or several continuous independent variables by converting the dependent variable to probability scores. Then, a logistic regression is formed, which predicts success or failure of a given binary variable (e.g. erosion presence or absence) for any value of the independent variables. 
The erosion occurrence probability can be calculated in conjunction with the model deviance regarding the independent variables tested. The most straightforward measure for goodness of fit is the G statistic. It is a simple and effective way to study and evaluate the Logistic Regression model efficiency and the reliability of each independent variable. The developed statistical model is applied to the Koiliaris River Basin on the island of Crete, Greece. Two datasets of river bank slope, river cross-section width and indications of erosion were available for the analysis (12 and 8 locations). Two different types of spatial dependence functions, exponential and tricubic, were examined to determine the local spatial dependence of the independent variables at the measurement locations. The results show a significant improvement when the tricubic function is applied as the erosion probability is accurately predicted at all eight validation locations. Results for the model deviance show that cross-section width is more important than bank slope in the estimation of erosion probability along the Koiliaris riverbanks. The proposed statistical model is a useful tool that quantifies the erosion probability along the riverbanks and can be used to assist managing erosion and flooding events. Acknowledgements This work is part of an on-going THALES project (CYBERSENSORS - High Frequency Monitoring System for Integrated Water Resources Management of Rivers). The project has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES. Investing in knowledge society through the European Social Fund.
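
The two spatial dependence functions mentioned above can be sketched with their common textbook forms. The abstract does not give the exact parameterization the authors used, so the bandwidth `h`, the function names, and the weighted log-likelihood helper are assumptions for illustration.

```python
import math


def tricube_weight(d, h):
    """Tricube kernel: weight 1 at distance 0, falling to 0 at d >= h."""
    if d >= h:
        return 0.0
    return (1.0 - (d / h) ** 3) ** 3


def exponential_weight(d, h):
    """Exponential kernel: decays with distance but never reaches 0."""
    return math.exp(-d / h)


def weighted_log_likelihood(weights, probs, outcomes):
    """Locally weighted Bernoulli log-likelihood for one target location.

    weights:  kernel weight of each observation relative to the target
    probs:    model probabilities of erosion at each observation
    outcomes: observed erosion indicators (1 = present, 0 = absent)
    """
    total = 0.0
    for w, p, y in zip(weights, probs, outcomes):
        total += w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total


# Distances (hypothetical, in km) from a target location to three sites.
distances = [0.0, 1.0, 3.0]
print([tricube_weight(d, 2.0) for d in distances])
print([exponential_weight(d, 2.0) for d in distances])
```

Maximizing the weighted log-likelihood separately at each target location gives coefficients that vary over space, which is the sense in which the model is non-stationary.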

  13. Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models.

    PubMed

    Schmidt, Amand F; Klungel, Olaf H; Groenwold, Rolf H H

    2016-01-01

    Postlaunch data on medical treatments can be analyzed to explore adverse events or relative effectiveness in real-life settings. These analyses are often complicated by the number of potential confounders and the possibility of model misspecification. We conducted a simulation study to compare the performance of logistic regression, propensity score, disease risk score, and stabilized inverse probability weighting methods to adjust for confounding. Model misspecification was induced in the independent derivation dataset. We evaluated performance using relative bias and confidence interval coverage of the true effect, among other metrics. At low events per coefficient (1.0 and 0.5), the logistic regression estimates had a large relative bias (greater than -100%). Bias of the disease risk score estimates was at most 13.48% and 18.83%. For the propensity score model, this was 8.74% and >100%, respectively. At events per coefficient of 1.0 and 0.5, inverse probability weighting frequently failed or reduced to a crude regression, resulting in biases of -8.49% and 24.55%. Coverage of logistic regression estimates became less than the nominal level at events per coefficient ≤5. For the disease risk score, inverse probability weighting, and propensity score, coverage became less than nominal at events per coefficient ≤2.5, ≤1.0, and ≤1.0, respectively. Bias of misspecified disease risk score models was 16.55%. In settings with low events/exposed subjects per coefficient, disease risk score methods can be useful alternatives to logistic regression models, especially when propensity score models cannot be used. Despite better performance of disease risk score methods than logistic regression and propensity score models in small events per coefficient settings, bias and coverage still deviated from nominal.
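
Stabilized inverse probability weights, one of the methods compared above, are commonly defined as the marginal probability of the treatment actually received divided by its probability given the confounders. The sketch below uses that usual definition. It is an assumption that the study used exactly this form, and the function name and inputs are hypothetical.

```python
def stabilized_weights(treated, propensity):
    """Stabilized inverse probability of treatment weights.

    treated:    list of 0/1 treatment indicators
    propensity: estimated P(treated = 1 | confounders) for each subject,
                e.g. from a logistic regression (each strictly in (0, 1))
    """
    p_treated = sum(treated) / len(treated)  # marginal treatment probability
    weights = []
    for a, e in zip(treated, propensity):
        if a == 1:
            weights.append(p_treated / e)
        else:
            weights.append((1.0 - p_treated) / (1.0 - e))
    return weights


# Hypothetical data: four subjects, two treated.
print(stabilized_weights([1, 1, 0, 0], [0.8, 0.4, 0.5, 0.2]))
```

Propensities close to 0 or 1 produce very large weights, which is one reason weighting can become unstable when events are few.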

  14. Using Logistic Regression to Predict the Probability of Debris Flows in Areas Burned by Wildfires, Southern California, 2003-2006

    USGS Publications Warehouse

    Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.; Michael, John A.; Helsel, Dennis R.

    2008-01-01

    Logistic regression was used to develop statistical models that can be used to predict the probability of debris flows in areas recently burned by wildfires by using data from 14 wildfires that burned in southern California during 2003-2006. Twenty-eight independent variables describing the basin morphology, burn severity, rainfall, and soil properties of 306 drainage basins located within those burned areas were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows soon after the 2003 to 2006 fires were delineated from data in the National Elevation Dataset using a geographic information system; (2) Data describing the basin morphology, burn severity, rainfall, and soil properties were compiled for each basin. These data were then input to a statistics software package for analysis using logistic regression; and (3) Relations between the occurrence or absence of debris flows and the basin morphology, burn severity, rainfall, and soil properties were evaluated, and five multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combinations produced the most effective models, and the multivariate models that best predicted the occurrence of debris flows were identified. Percentage of high burn severity and 3-hour peak rainfall intensity were significant variables in all models. Soil organic matter content and soil clay content were significant variables in all models except Model 5. Soil slope was a significant variable in all models except Model 4. The most suitable model can be selected from these five models on the basis of the availability of independent variables in the particular area of interest and field checking of probability maps. 
The multivariate logistic regression models can be entered into a geographic information system, and maps showing the probability of debris flows can be constructed in recently burned areas of southern California. This study demonstrates that logistic regression is a valuable tool for developing models that predict the probability of debris flows occurring in recently burned landscapes.
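
Evaluating all possible combinations of independent variables amounts to enumerating every non-empty subset of the candidates. The sketch below shows only the enumeration. The variable names are hypothetical, and how each subset was scored is not described in the abstract.

```python
from itertools import combinations


def all_subsets(variables):
    """Return every non-empty subset of the candidate variables."""
    subsets = []
    for k in range(1, len(variables) + 1):
        subsets.extend(combinations(variables, k))
    return subsets


# Hypothetical candidate variables; n candidates give 2**n - 1 subsets.
candidates = ["burn_severity", "rain_intensity", "clay_content", "slope"]
subsets = all_subsets(candidates)
print(len(subsets))
```

The number of subsets doubles with each added candidate, so an exhaustive search becomes expensive as the candidate list grows.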

  15. Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

    NASA Astrophysics Data System (ADS)

    Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami

    2017-06-01

    A regression model represents the relationship between independent variables and a dependent variable. When the dependent variable is categorical, a logistic regression model is used to calculate the odds; when its categories are ordered levels, the logistic regression model is ordinal. The GWOLR model is an ordinal logistic regression model that is influenced by the geographical location of the observation site. Parameter estimation is needed to infer population values from a sample. The purpose of this research is to estimate the parameters of the GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City. The observation units are 144 villages in Semarang City. The research yields a local GWOLR model for each village and the probabilities of the dengue fever patient-count categories.
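
An ordinal logistic model with location-dependent parameters is often written in cumulative-logit form. The equations below are a sketch of that common form, not the authors' exact specification. The symbols are assumptions: $(u_i, v_i)$ are the coordinates of village $i$, $\alpha_j$ are the category thresholds, $\beta$ is the slope vector, and $J$ is the number of ordered categories. Sign conventions vary between texts.

```latex
\begin{align*}
\operatorname{logit} P(Y_i \le j \mid x_i)
  &= \alpha_j(u_i, v_i) - x_i^{\top} \beta(u_i, v_i),
  \qquad j = 1, \dots, J - 1, \\
P(Y_i = j \mid x_i)
  &= P(Y_i \le j \mid x_i) - P(Y_i \le j - 1 \mid x_i),
\end{align*}
```

with $P(Y_i \le 0 \mid x_i) = 0$ and $P(Y_i \le J \mid x_i) = 1$. The second line is how a probability for each patient-count category is obtained at each location.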

  16. Calibrating random forests for probability estimation.

    PubMed

    Dankowski, Theresa; Ziegler, Andreas

    2016-09-30

    Probabilities can be consistently estimated using random forests. It is, however, unclear how random forests should be updated to make predictions for other centers or at different time points. In this work, we present two approaches for updating random forests for probability estimation. The first method has been proposed by Elkan and may be used for updating any machine learning approach yielding consistent probabilities, so-called probability machines. The second approach is a new strategy specifically developed for random forests. Using the terminal nodes, which represent conditional probabilities, the random forest is first translated to logistic regression models. These are, in turn, used for re-calibration. The two updating strategies were compared in a simulation study and are illustrated with data from the German Stroke Study Collaboration. In most simulation scenarios, both methods led to similar improvements. In the simulation scenario in which the stricter assumptions of Elkan's method were not met, the logistic regression-based re-calibration approach for random forests outperformed Elkan's method. It also performed better on the stroke data than Elkan's method. The strength of Elkan's method is its general applicability to any probability machine. However, if the strict assumptions underlying this approach are not met, the logistic regression-based approach is preferable for updating random forests for probability estimation. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  17. 2012 Workplace and Gender Relations Survey of Reserve Component Members: Statistical Methodology Report

    DTIC Science & Technology

    2012-09-01

    ...logistic regression model was used to predict the probability of eligibility for the survey (known eligibility vs. unknown eligibility). A second logistic...regression model was used to predict the probability of response among eligible sample members (complete response vs. non-response). CHAID (Chi

  18. Latin hypercube approach to estimate uncertainty in ground water vulnerability

    USGS Publications Warehouse

    Gurdak, J.J.; McCray, J.E.; Thyne, G.; Qi, S.L.

    2007-01-01

    A methodology is proposed to quantify prediction uncertainty associated with ground water vulnerability models that were developed through an approach that coupled multivariate logistic regression with a geographic information system (GIS). This method uses Latin hypercube sampling (LHS) to illustrate the propagation of input error and estimate uncertainty associated with the logistic regression predictions of ground water vulnerability. Central to the proposed method is the assumption that prediction uncertainty in ground water vulnerability models is a function of input error propagation from uncertainty in the estimated logistic regression model coefficients (model error) and the values of explanatory variables represented in the GIS (data error). Input probability distributions that represent both model and data error sources of uncertainty were simultaneously sampled using a Latin hypercube approach with logistic regression calculations of probability of elevated nonpoint source contaminants in ground water. The resulting probability distribution represents the prediction intervals and associated uncertainty of the ground water vulnerability predictions. The method is illustrated through a ground water vulnerability assessment of the High Plains regional aquifer. Results of the LHS simulations reveal significant prediction uncertainties that vary spatially across the regional aquifer. Additionally, the proposed method enables a spatial deconstruction of the prediction uncertainty that can lead to improved prediction of ground water vulnerability. © 2007 National Ground Water Association.
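
Latin hypercube sampling splits each input's range into equally probable strata and draws exactly one value from each stratum. The sketch below samples the unit cube and propagates the draws through a logistic model. The uniform input ranges, the coefficient values, and the function names are assumptions for illustration, not the distributions used in the study.

```python
import math
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Draw n_samples points in [0, 1)^n_dims, one per stratum per dimension."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)  # pair strata randomly across dimensions
        columns.append([(s + rng.random()) / n_samples for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def logistic(z):
    """Map a linear predictor z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
samples = latin_hypercube(10, 2, rng)

# Hypothetical error model: an uncertain coefficient in [0.5, 1.5) (model
# error) and an uncertain explanatory value in [0, 4) (data error).
probs = sorted(logistic(-2.0 + (0.5 + u) * (4.0 * v)) for u, v in samples)
print(f"min={probs[0]:.3f} max={probs[-1]:.3f}")
```

The spread of `probs` is a simple stand-in for the prediction interval that the abstract describes at each mapped location.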

  19. Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory.

    PubMed

    Kruppa, Jochen; Liu, Yufeng; Biau, Gérard; Kohler, Michael; König, Inke R; Malley, James D; Ziegler, Andreas

    2014-07-01

    Probability estimation for binary and multicategory outcome using logistic and multinomial logistic regression has a long-standing tradition in biostatistics. However, biases may occur if the model is misspecified. In contrast, outcome probabilities for individuals can be estimated consistently with machine learning approaches, including k-nearest neighbors (k-NN), bagged nearest neighbors (b-NN), random forests (RF), and support vector machines (SVM). Because machine learning methods are rarely used by applied biostatisticians, the primary goal of this paper is to explain the concept of probability estimation with these methods and to summarize recent theoretical findings. Probability estimation in k-NN, b-NN, and RF can be embedded into the class of nonparametric regression learning machines; therefore, we start with the construction of nonparametric regression estimates and review results on consistency and rates of convergence. In SVMs, outcome probabilities for individuals are estimated consistently by repeatedly solving classification problems. For SVMs we review the classification problem and then dichotomous probability estimation. Next we extend the algorithms for estimating probabilities using k-NN, b-NN, and RF to multicategory outcomes and discuss approaches for the multicategory probability estimation problem using SVM. In simulation studies for dichotomous and multicategory dependent variables we demonstrate the general validity of the machine learning methods and compare them with logistic regression. However, each method fails in at least one simulation scenario. We conclude with a discussion of the failures and give recommendations for selecting and tuning the methods. Applications to real data and example code are provided in a companion article (doi:10.1002/bimj.201300077). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Two-factor logistic regression in pediatric liver transplantation

    NASA Astrophysics Data System (ADS)

    Uzunova, Yordanka; Prodanova, Krasimira; Spasov, Lyubomir

    2017-12-01

    Using a two-factor logistic regression analysis an estimate is derived for the probability of absence of infections in the early postoperative period after pediatric liver transplantation. The influence of both the bilirubin level and the international normalized ratio of prothrombin time of blood coagulation at the 5th postoperative day is studied.
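
A two-factor logistic model of this kind is typically written as below. This is a sketch with assumed notation, not the authors' fitted model: $B$ is the bilirubin level and $I$ the international normalized ratio on the 5th postoperative day, $\beta_0, \beta_1, \beta_2$ are coefficients to be estimated, and an additive form without an interaction term is assumed.

```latex
\begin{align*}
P(\text{no infection} \mid B, I)
  &= \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 B + \beta_2 I)\}}, \\
\log \frac{P(\text{no infection} \mid B, I)}{1 - P(\text{no infection} \mid B, I)}
  &= \beta_0 + \beta_1 B + \beta_2 I .
\end{align*}
```

Under this form, $e^{\beta_1}$ and $e^{\beta_2}$ are the odds ratios for a one-unit increase in each factor with the other held fixed.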

  1. Accuracy of Bayes and Logistic Regression Subscale Probabilities for Educational and Certification Tests

    ERIC Educational Resources Information Center

    Rudner, Lawrence

    2016-01-01

    In the machine learning literature, it is commonly accepted as fact that as calibration sample sizes increase, Naïve Bayes classifiers initially outperform Logistic Regression classifiers in terms of classification accuracy. Applied to subtests from an on-line final examination and from a highly regarded certification examination, this study shows…

  2. Comparing Linear Discriminant Function with Logistic Regression for the Two-Group Classification Problem.

    ERIC Educational Resources Information Center

    Fan, Xitao; Wang, Lin

    The Monte Carlo study compared the performance of predictive discriminant analysis (PDA) and that of logistic regression (LR) for the two-group classification problem. Prior probabilities were used for classification, but the cost of misclassification was assumed to be equal. The study used a fully crossed three-factor experimental design (with…

  3. Simulating land-use changes by incorporating spatial autocorrelation and self-organization in CLUE-S modeling: a case study in Zengcheng District, Guangzhou, China

    NASA Astrophysics Data System (ADS)

    Mei, Zhixiong; Wu, Hao; Li, Shiyun

    2018-06-01

    The Conversion of Land Use and its Effects at Small regional extent (CLUE-S), which is a widely used model for land-use simulation, utilizes logistic regression to estimate the relationships between land use and its drivers, and thus, predict land-use change probabilities. However, logistic regression disregards possible spatial autocorrelation and self-organization in land-use data. Autologistic regression can depict spatial autocorrelation but cannot address self-organization, while logistic regression that considers only self-organization (NE-logistic regression) fails to capture spatial autocorrelation. Therefore, this study developed a regression (NE-autologistic regression) method, which incorporated both spatial autocorrelation and self-organization, to improve CLUE-S. The Zengcheng District of Guangzhou, China was selected as the study area. The land-use data of 2001, 2005, and 2009, as well as 10 typical driving factors, were used to validate the proposed regression method and the improved CLUE-S model. Then, three future land-use scenarios in 2020: the natural growth scenario, ecological protection scenario, and economic development scenario, were simulated using the improved model. Validation results showed that NE-autologistic regression performed better than logistic regression, autologistic regression, and NE-logistic regression in predicting land-use change probabilities. The spatial allocation accuracy and kappa values of NE-autologistic-CLUE-S were higher than those of logistic-CLUE-S, autologistic-CLUE-S, and NE-logistic-CLUE-S for the simulations of two periods, 2001-2009 and 2005-2009, which proved that the improved CLUE-S model achieved the best simulation and was thereby effective to a certain extent. The scenario simulation results indicated that under all three scenarios, traffic land and residential/industrial land would increase, whereas arable land and unused land would decrease during 2009-2020. Apparent differences also existed in the simulated change sizes and locations of each land-use type under different scenarios. The results not only demonstrate the validity of the improved model but also provide a valuable reference for relevant policy-makers.

  4. A logistic regression equation for estimating the probability of a stream in Vermont having intermittent flow

    USGS Publications Warehouse

    Olson, Scott A.; Brouillette, Michael C.

    2006-01-01

    A logistic regression equation was developed for estimating the probability of a stream flowing intermittently at unregulated, rural stream sites in Vermont. These determinations can be used for a wide variety of regulatory and planning efforts at the Federal, State, regional, county and town levels, including such applications as assessing fish and wildlife habitats, wetlands classifications, recreational opportunities, water-supply potential, waste-assimilation capacities, and sediment transport. The equation will be used to create a derived product for the Vermont Hydrography Dataset having the streamflow characteristic of 'intermittent' or 'perennial.' The Vermont Hydrography Dataset is Vermont's implementation of the National Hydrography Dataset and was created at a scale of 1:5,000 based on statewide digital orthophotos. The equation was developed by relating field-verified perennial or intermittent status of a stream site during normal summer low-streamflow conditions in the summer of 2005 to selected basin characteristics of naturally flowing streams in Vermont. The database used to develop the equation included 682 stream sites with drainage areas ranging from 0.05 to 5.0 square miles. When the 682 sites were observed, 126 were intermittent (had no flow at the time of the observation) and 556 were perennial (had flowing water at the time of the observation). The results of the logistic regression analysis indicate that the probability of a stream having intermittent flow in Vermont is a function of drainage area, elevation of the site, the ratio of basin relief to basin perimeter, and the areal percentage of well- and moderately well-drained soils in the basin. Using a probability cutpoint (a lower probability indicates the site has perennial flow and a higher probability indicates the site has intermittent flow) of 0.5, the logistic regression equation correctly predicted the perennial or intermittent status of 116 test sites 85 percent of the time.
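
    The way such an equation is applied can be sketched as follows: a linear predictor is passed through the logistic function, and the resulting probability is compared with the 0.5 cutpoint. The intercept, coefficients, and feature values below are hypothetical placeholders, not the published Vermont equation, and treating a probability of exactly 0.5 as intermittent is an assumption of this example.

```python
import math

def intermittent_probability(intercept, coefs, features):
    """Logistic equation: probability that a site has intermittent flow."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))

def classify(prob, cutpoint=0.5):
    # Higher probability -> intermittent, lower -> perennial (tie handling is an assumption).
    return "intermittent" if prob >= cutpoint else "perennial"

# Hypothetical coefficients for two made-up basin characteristics.
INTERCEPT = 1.0
COEFS = (-2.0, 0.5)

small_basin = intermittent_probability(INTERCEPT, COEFS, (0.1, 0.2))
large_basin = intermittent_probability(INTERCEPT, COEFS, (3.0, 0.2))
labels = (classify(small_basin), classify(large_basin))
```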

  5. A Comparison of Logistic Regression, Neural Networks, and Classification Trees Predicting Success of Actuarial Students

    ERIC Educational Resources Information Center

    Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard

    2010-01-01

    The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…

  6. A Method for Calculating the Probability of Successfully Completing a Rocket Propulsion Ground Test

    NASA Technical Reports Server (NTRS)

    Messer, Bradley

    2007-01-01

    Propulsion ground test facilities face the daily challenge of scheduling multiple customers into limited facility space and successfully completing their propulsion test projects. Over the last decade NASA's propulsion test facilities have performed hundreds of tests, collected thousands of seconds of test data, and exceeded the capabilities of numerous test facility and test article components. A logistic regression mathematical modeling technique has been developed to predict the probability of successfully completing a rocket propulsion test. A logistic regression model is a mathematical modeling approach that can be used to describe the relationship of several independent predictor variables X_1, X_2, ..., X_k to a binary or dichotomous dependent variable Y, where Y can only be one of two possible outcomes, in this case Success or Failure of accomplishing a full duration test. The use of logistic regression modeling is not new; however, modeling propulsion ground test facilities using logistic regression is both a new and unique application of the statistical technique. Results from this type of model provide project managers with insight and confidence into the effectiveness of rocket propulsion ground testing.
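
    For reference, the general form of the model described here can be written as below, with Y = 1 denoting Success. The coefficient symbols \beta_0, ..., \beta_k are notation assumed for this note; the report's fitted values are not given in the abstract.

```latex
P(Y = 1 \mid X_1, \dots, X_k)
  = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{i=1}^{k} \beta_i X_i\right)\right)}
```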

  7. Assessing the potential for improving S2S forecast skill through multimodel ensembling

    NASA Astrophysics Data System (ADS)

    Vigaud, N.; Robertson, A. W.; Tippett, M. K.; Wang, L.; Bell, M. J.

    2016-12-01

    Non-linear logistic regression is well suited to probability forecasting and has been successfully applied in the past to ensemble weather and climate predictions, providing access to the full probability distribution without any Gaussian assumption. However, little work has been done at sub-monthly lead times where relatively small re-forecast ensembles and lengths represent new challenges for which post-processing avenues have yet to be investigated. A promising approach consists in extending the definition of non-linear logistic regression by including the quantile of the forecast distribution as one of the predictors. So-called Extended Logistic Regression (ELR), which enables mutually consistent individual threshold probabilities, is here applied to ECMWF, CFSv2 and CMA re-forecasts from the S2S database in order to produce rainfall probabilities at weekly resolution. The ELR model is trained on seasonally-varying tercile categories computed for lead times of 1 to 4 weeks. It is then tested in a cross-validated manner, i.e. allowing real-time predictability applications, to produce rainfall tercile probabilities from individual weekly hindcasts that are finally combined by equal pooling. Results will be discussed over a broader North American region, where individual and MME forecasts generated out to 4 weeks lead are characterized by good probabilistic reliability but low sharpness, exhibiting systematically more skill in winter than summer.
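
    The idea of adding the threshold itself as a predictor can be sketched as follows. Because one equation serves every threshold and the threshold term increases with the threshold, the cumulative probabilities never cross, so the tercile probabilities derived from them are non-negative and sum to one. The square-root transforms, the coefficients, and the tercile boundaries are assumptions chosen for illustration; they are not the values fitted in this study.

```python
import math

def elr_cumulative_probability(ens_mean, threshold, b0, b1, b2):
    """P(rainfall <= threshold) from a single equation shared by all thresholds."""
    # The threshold enters through b2 * sqrt(threshold); with b2 > 0 the result
    # is non-decreasing in the threshold.
    z = b0 + b1 * math.sqrt(ens_mean) + b2 * math.sqrt(threshold)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients and tercile boundaries (mm per week).
B0, B1, B2 = -1.0, -0.8, 1.2
LOWER_TERCILE, UPPER_TERCILE = 4.0, 16.0
ensemble_mean = 9.0

p_below_lower = elr_cumulative_probability(ensemble_mean, LOWER_TERCILE, B0, B1, B2)
p_below_upper = elr_cumulative_probability(ensemble_mean, UPPER_TERCILE, B0, B1, B2)

tercile_probs = {
    "below": p_below_lower,
    "normal": p_below_upper - p_below_lower,
    "above": 1.0 - p_below_upper,
}
```

    Fitting a separate logistic regression per threshold would not give this guarantee, since independently fitted curves can cross.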

  8. A nonparametric multiple imputation approach for missing categorical data.

    PubMed

    Zhou, Muhan; He, Yulei; Yu, Mandi; Hsu, Chiu-Hsieh

    2017-06-06

    Incomplete categorical variables with more than two categories are common in public health data. However, most of the existing missing-data methods do not use the information from nonresponse (missingness) probabilities. We propose a nearest-neighbour multiple imputation approach to impute a missing at random categorical outcome and to estimate the proportion of each category. The donor set for imputation is formed by measuring distances between each missing value with other non-missing values. The distance function is calculated based on a predictive score, which is derived from two working models: one fits a multinomial logistic regression for predicting the missing categorical outcome (the outcome model) and the other fits a logistic regression for predicting missingness probabilities (the missingness model). A weighting scheme is used to accommodate contributions from two working models when generating the predictive score. A missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented. The simulation study suggests that the proposed method performs well when missingness probabilities are not extreme under some misspecifications of the working models. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, proper weights need to be chosen to balance the contributions from the two working models and achieve optimal results for the proposed method. We conclude that the proposed multiple imputation method is a reasonable approach to dealing with missing categorical outcome data with more than two levels for assessing the distribution of the outcome. In terms of the choices for the working models, we suggest a multinomial logistic regression for predicting the missing outcome and a binary logistic regression for predicting the missingness probability.
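
    The donor-selection step can be sketched roughly as below. Each record is reduced to two scalar scores, one from the outcome model and one from the missingness model, and the weighted distance between scores defines the nearest donors. Treating each score as a single number, the specific weighted Euclidean distance, and all names and values are simplifying assumptions of this example rather than the authors' exact formulation.

```python
import math
import random

def impute_from_nearest(target_scores, donors, weight, n_donors, rng):
    """Draw one observed category at random from the n_donors closest donors.

    donors: list of (outcome_score, missingness_score, observed_category).
    weight: contribution of the outcome-model score (1 - weight goes to the
            missingness-model score).
    """
    def distance(donor):
        d_outcome = donor[0] - target_scores[0]
        d_missing = donor[1] - target_scores[1]
        return math.sqrt(weight * d_outcome ** 2 + (1.0 - weight) * d_missing ** 2)

    nearest = sorted(donors, key=distance)[:n_donors]
    return rng.choice(nearest)[2]

# Hypothetical scores for four complete cases and one incomplete case.
donors = [
    (0.10, 0.20, "A"),
    (0.12, 0.25, "A"),
    (0.90, 0.80, "B"),
    (0.85, 0.70, "C"),
]
imputed = impute_from_nearest((0.11, 0.22), donors, weight=0.5, n_donors=2,
                              rng=random.Random(0))
```

    Repeating the random draw several times yields the multiple imputed data sets.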

  9. Risk estimation using probability machines

    PubMed Central

    2014-01-01

    Background Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios. Results We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented. Conclusions The models we propose make no assumptions about the data structure, and capture the patterns in the data by just specifying the predictors involved and not any particular model structure. So they do not run the same risks of model mis-specification and the resultant estimation biases as a logistic model. This methodology, which we call a “risk machine”, will share properties from the statistical machine that it is derived from. PMID:24581306

  10. Risk estimation using probability machines.

    PubMed

    Dasgupta, Abhijit; Szymczak, Silke; Moore, Jason H; Bailey-Wilson, Joan E; Malley, James D

    2014-03-01

    Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios. We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented. The models we propose make no assumptions about the data structure, and capture the patterns in the data by just specifying the predictors involved and not any particular model structure. So they do not run the same risks of model mis-specification and the resultant estimation biases as a logistic model. This methodology, which we call a "risk machine", will share properties from the statistical machine that it is derived from.

  11. Fuzzy multinomial logistic regression analysis: A multi-objective programming approach

    NASA Astrophysics Data System (ADS)

    Abdalla, Hesham A.; El-Sayed, Amany A.; Hamed, Ramadan

    2017-05-01

    Parameter estimation for multinomial logistic regression is usually based on maximizing the likelihood function. For large well-balanced datasets, Maximum Likelihood (ML) estimation is a satisfactory approach. Unfortunately, ML can fail completely or at least produce poor results in terms of estimated probabilities and confidence intervals of parameters, especially for small datasets. In this study, a new approach based on fuzzy concepts is proposed to estimate parameters of the multinomial logistic regression. The study assumes that the parameters of multinomial logistic regression are fuzzy. Based on the extension principle stated by Zadeh and Bárdossy's proposition, a multi-objective programming approach is suggested to estimate these fuzzy parameters. A simulation study is used to evaluate the performance of the new approach versus the Maximum Likelihood (ML) approach. Results show that the new proposed model outperforms ML in cases of small datasets.

  12. Collapse susceptibility mapping in karstified gypsum terrain (Sivas basin - Turkey) by conditional probability, logistic regression, artificial neural network models

    NASA Astrophysics Data System (ADS)

    Yilmaz, Isik; Keskin, Inan; Marschalko, Marian; Bednarik, Martin

    2010-05-01

    This study compares GIS-based collapse susceptibility mapping methods, namely conditional probability (CP), logistic regression (LR) and artificial neural networks (ANN), applied to gypsum rock masses in the Sivas basin (Turkey). A Digital Elevation Model (DEM) was first constructed using GIS software. Collapse-related factors, directly or indirectly related to the causes of collapse occurrence, such as distance from faults, slope angle and aspect, topographical elevation, distance from drainage, topographic wetness index (TWI), stream power index (SPI), Normalized Difference Vegetation Index (NDVI) as an indicator of vegetation cover, and distance from roads and settlements, were used in the collapse susceptibility analyses. In the last stage of the analyses, collapse susceptibility maps were produced from the CP, LR and ANN models, and they were then compared on the basis of their validation results. Area Under Curve (AUC) values obtained from all three methodologies showed that the map obtained from the ANN model appears more accurate than those from the other models. The results also showed that artificial neural networks are a useful tool for preparing collapse susceptibility maps and are highly compatible with GIS operating features. Key words: Collapse; doline; susceptibility map; gypsum; GIS; conditional probability; logistic regression; artificial neural networks.

  13. Does clinical pretest probability influence image quality and diagnostic accuracy in dual-source coronary CT angiography?

    PubMed

    Thomas, Christoph; Brodoefel, Harald; Tsiflikas, Ilias; Bruckner, Friederike; Reimann, Anja; Ketelsen, Dominik; Drosch, Tanja; Claussen, Claus D; Kopp, Andreas; Heuschmid, Martin; Burgstahler, Christof

    2010-02-01

    To prospectively evaluate the influence of the clinical pretest probability assessed by the Morise score onto image quality and diagnostic accuracy in coronary dual-source computed tomography angiography (DSCTA). In 61 patients, DSCTA and invasive coronary angiography were performed. Subjective image quality and accuracy for stenosis detection (>50%) of DSCTA with invasive coronary angiography as gold standard were evaluated. The influence of pretest probability onto image quality and accuracy was assessed by logistic regression and chi-square testing. Correlations of image quality and accuracy with the Morise score were determined using linear regression. Thirty-eight patients were categorized into the high, 21 into the intermediate, and 2 into the low probability group. Accuracies for the detection of significant stenoses were 0.94, 0.97, and 1.00, respectively. Logistic regressions and chi-square tests showed statistically significant correlations between Morise score and image quality (P < .0001 and P < .001) and accuracy (P = .0049 and P = .027). Linear regression revealed a cutoff Morise score for a good image quality of 16 and a cutoff for a barely diagnostic image quality beyond the upper Morise scale. Pretest probability is a weak predictor of image quality and diagnostic accuracy in coronary DSCTA. A sufficient image quality for diagnostic images can be reached with all pretest probabilities. Therefore, coronary DSCTA might be suitable also for patients with a high pretest probability. Copyright 2010 AUR. Published by Elsevier Inc. All rights reserved.

  14. Three methods to construct predictive models using logistic regression and likelihood ratios to facilitate adjustment for pretest probability give similar results.

    PubMed

    Chan, Siew Foong; Deeks, Jonathan J; Macaskill, Petra; Irwig, Les

    2008-01-01

    To compare three predictive models based on logistic regression to estimate adjusted likelihood ratios allowing for interdependency between diagnostic variables (tests). This study was a review of the theoretical basis, assumptions, and limitations of published models; and a statistical extension of methods and application to a case study of the diagnosis of obstructive airways disease based on history and clinical examination. Albert's method includes an offset term to estimate an adjusted likelihood ratio for combinations of tests. The Spiegelhalter and Knill-Jones method uses the unadjusted likelihood ratio for each test as a predictor and computes shrinkage factors to allow for interdependence. Knottnerus' method differs from the other methods because it requires sequencing of tests, which limits its application to situations where there are few tests and substantial data. Although parameter estimates differed between the models, predicted "posttest" probabilities were generally similar. Construction of predictive models using logistic regression is preferred to the independence Bayes' approach when it is important to adjust for dependency of test errors. Methods to estimate adjusted likelihood ratios from predictive models should be considered in preference to a standard logistic regression model to facilitate ease of interpretation and application. Albert's method provides the most straightforward approach.
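
    The reason likelihood ratios make adjustment for pretest probability easy is the odds form of Bayes' theorem: posttest odds equal pretest odds times the likelihood ratio. A minimal sketch follows; the function name and the numbers are illustrative assumptions, not values from the case study.

```python
def posttest_probability(pretest_prob, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)

# Hypothetical example: 20% pretest probability, positive finding with LR = 4.
after_positive = posttest_probability(0.20, 4.0)

# The same likelihood ratio applied in a setting with a higher pretest probability.
after_positive_high_prevalence = posttest_probability(0.50, 4.0)
```

    Multiplying the unadjusted likelihood ratios of several tests in this way assumes the tests are independent; the adjusted likelihood ratios discussed in the abstract are meant to relax that assumption.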

  15. Methods for estimating drought streamflow probabilities for Virginia streams

    USGS Publications Warehouse

    Austin, Samuel H.

    2014-01-01

    Maximum likelihood logistic regression model equations used to estimate drought flow probabilities for Virginia streams are presented for 259 hydrologic basins in Virginia. Winter streamflows were used to estimate the likelihood of streamflows during the subsequent drought-prone summer months. The maximum likelihood logistic regression models identify probable streamflows from 5 to 8 months in advance. More than 5 million streamflow daily values collected over the period of record (January 1, 1900 through May 16, 2012) were compiled and analyzed over a minimum 10-year (maximum 112-year) period of record. The analysis yielded the 46,704 equations with statistically significant fit statistics and parameter ranges published in two tables in this report. These model equations produce summer month (July, August, and September) drought flow threshold probabilities as a function of streamflows during the previous winter months (November, December, January, and February). Example calculations are provided, demonstrating how to use the equations to estimate probable streamflows as much as 8 months in advance.

  16. Logistic regression trees for initial selection of interesting loci in case-control studies

    PubMed Central

    Nickolov, Radoslav Z; Milanov, Valentin B

    2007-01-01

    Modern genetic epidemiology faces the challenge of dealing with hundreds of thousands of genetic markers. The selection of a small initial subset of interesting markers for further investigation can greatly facilitate genetic studies. In this contribution we suggest the use of a logistic regression tree algorithm known as logistic tree with unbiased selection. Using the simulated data provided for Genetic Analysis Workshop 15, we show how this algorithm, with incorporation of multifactor dimensionality reduction method, can reduce an initial large pool of markers to a small set that includes the interesting markers with high probability. PMID:18466557

  17. Modeling the probability of giving birth at health institutions among pregnant women attending antenatal care in West Shewa Zone, Oromia, Ethiopia: a cross sectional study.

    PubMed

    Dida, Nagasa; Birhanu, Zewdie; Gerbaba, Mulusew; Tilahun, Dejen; Morankar, Sudhakar

    2014-06-01

    Although antenatal care and institutional delivery are effective means for reducing maternal morbidity and mortality, the probability of giving birth at health institutions among antenatal care attendants has not been modeled in Ethiopia. Therefore, the objective of this study was to model predictors of giving birth at health institutions among expectant mothers following antenatal care. A facility-based cross-sectional study was conducted among 322 consecutively selected mothers who were following antenatal care in two districts of West Shewa Zone, Oromia Regional State, Ethiopia. Participants were proportionally recruited from six health institutions. The data were analyzed using SPSS version 17.0. Multivariable logistic regression was employed to develop the prediction model. The final regression model had good discrimination power (89.2%), optimum sensitivity (89.0%) and specificity (80.0%) to predict the probability of giving birth at health institutions. Accordingly, self-efficacy (beta=0.41), perceived barrier (beta=-0.31) and perceived susceptibility (beta=0.29) significantly predicted the probability of giving birth at health institutions. The present study showed that the logistic regression model predicted the probability of giving birth at health institutions and identified significant predictors which health care providers should take into account in promotion of institutional delivery.
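
    Sensitivity and specificity of the kind reported here are computed by dichotomizing predicted probabilities at a cutoff and comparing them with the observed outcomes. The sketch below shows that calculation on made-up data; the 0.5 cutoff and the numbers are assumptions, since the abstract does not state the cutoff used.

```python
def sensitivity_specificity(probs, labels, cutoff=0.5):
    """Return (sensitivity, specificity) for predicted probabilities and 0/1 labels."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical predicted probabilities and observed outcomes (1 = institutional birth).
predicted = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
observed = [1, 1, 1, 0, 0, 0]

sens, spec = sensitivity_specificity(predicted, observed)
```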

  18. Filtering data from the collaborative initial glaucoma treatment study for improved identification of glaucoma progression.

    PubMed

    Schell, Greggory J; Lavieri, Mariel S; Stein, Joshua D; Musch, David C

    2013-12-21

    Open-angle glaucoma (OAG) is a prevalent, degenerative ocular disease which can lead to blindness without proper clinical management. The tests used to assess disease progression are susceptible to process and measurement noise. The aim of this study was to develop a methodology which accounts for the inherent noise in the data and improves identification of significant disease progression. Longitudinal observations from the Collaborative Initial Glaucoma Treatment Study (CIGTS) were used to parameterize and validate a Kalman filter model and logistic regression function. The Kalman filter estimates the true value of biomarkers associated with OAG and forecasts future values of these variables. We develop two logistic regression models via generalized estimating equations (GEE) for calculating the probability of experiencing significant OAG progression: one model based on the raw measurements from CIGTS and another model based on the Kalman filter estimates of the CIGTS data. Receiver operating characteristic (ROC) curves and associated area under the ROC curve (AUC) estimates are calculated using cross-fold validation. The logistic regression model developed using Kalman filter estimates as data input achieves higher sensitivity and specificity than the model developed using raw measurements. The mean AUC for the Kalman filter-based model is 0.961 while the mean AUC for the raw measurements model is 0.889. Hence, using the probability function generated via Kalman filter estimates and GEE for logistic regression, we are able to more accurately classify patients and instances as experiencing significant OAG progression. A Kalman filter approach for estimating the true value of OAG biomarkers resulted in data input which improved the accuracy of a logistic regression classification model compared to a model using raw measurements as input. This methodology accounts for process and measurement noise to enable improved discrimination between progression and nonprogression in chronic diseases.
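
    To show what filtering noisy measurements before regression looks like, the sketch below runs a scalar Kalman filter with a random-walk state model over a short series. This is a deliberately simplified stand-in: the state model, the variances, and the readings are assumptions for illustration, and the study's actual filter for the CIGTS biomarkers is not specified in the abstract.

```python
def kalman_filter_1d(measurements, process_var, measurement_var,
                     initial_mean, initial_var):
    """Return filtered state estimates for a scalar random-walk model."""
    mean, var = initial_mean, initial_var
    estimates = []
    for z in measurements:
        # Predict: the state is assumed to drift randomly, so uncertainty grows.
        var = var + process_var
        # Update: blend prediction and measurement according to the Kalman gain.
        gain = var / (var + measurement_var)
        mean = mean + gain * (z - mean)
        var = (1.0 - gain) * var
        estimates.append(mean)
    return estimates

# Hypothetical noisy readings of a biomarker whose true value is near 10.
readings = [10.0, 12.0, 8.0, 11.0, 9.0]
filtered = kalman_filter_1d(readings, process_var=0.01, measurement_var=4.0,
                            initial_mean=10.0, initial_var=1.0)
```

    The filtered values, rather than the raw readings, would then serve as inputs to the logistic regression.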

  19. Assessing landslide susceptibility by statistical data analysis and GIS: the case of Daunia (Apulian Apennines, Italy)

    NASA Astrophysics Data System (ADS)

    Ceppi, C.; Mancini, F.; Ritrovato, G.

    2009-04-01

    This study aims at landslide susceptibility mapping within an area of the Daunia (Apulian Apennines, Italy) by a multivariate statistical method and data manipulation in a Geographical Information System (GIS) environment. Among the variety of existing statistical data analysis techniques, logistic regression was chosen to produce a susceptibility map over an area where small settlements are historically threatened by landslide phenomena. Logistic regression performs a best fit between the presence or absence of landslide (dependent variable) and the set of independent variables on the basis of a maximum likelihood criterion, leading to the estimation of regression coefficients. The reliability of such analysis is therefore due to the ability to quantify the proneness to landslide occurrences by the probability level produced by the analysis. The inventory of dependent and independent variables was managed in a GIS, where geometric properties and attributes have been translated into raster cells in order to proceed with the logistic regression by means of the SPSS (Statistical Package for the Social Sciences) package. A landslide inventory was used to produce the binary dependent variable, whereas the set of independent variables comprised slope, aspect, elevation, curvature, drained area, lithology and land use after their reduction to dummy variables. The effect of independent parameters on landslide occurrence was assessed by the corresponding coefficient in the logistic regression function, highlighting a major role played by the land use variable in determining occurrence and distribution of phenomena. Once the outcomes of the logistic regression are determined, data are re-introduced in the GIS to produce a map reporting the proneness to landslide as predicted level of probability. To validate the results and the regression model, a cell-by-cell comparison between the susceptibility map and the initial inventory of landslide events was performed, and agreement at the 75% level was achieved.

  20. Detecting Anomalies in Process Control Networks

    NASA Astrophysics Data System (ADS)

    Rrushi, Julian; Kang, Kyoung-Don

    This paper presents the estimation-inspection algorithm, a statistical algorithm for anomaly detection in process control networks. The algorithm determines if the payload of a network packet that is about to be processed by a control system is normal or abnormal based on the effect that the packet will have on a variable stored in control system memory. The estimation part of the algorithm uses logistic regression integrated with maximum likelihood estimation in an inductive machine learning process to estimate a series of statistical parameters; these parameters are used in conjunction with logistic regression formulas to form a probability mass function for each variable stored in control system memory. The inspection part of the algorithm uses the probability mass functions to estimate the normalcy probability of a specific value that a network packet writes to a variable. Experimental results demonstrate that the algorithm is very effective at detecting anomalies in process control networks.

  1. Modelling of binary logistic regression for obesity among secondary students in a rural area of Kedah

    NASA Astrophysics Data System (ADS)

    Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.

    2014-07-01

    Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used to select important predictor variables; diagnostics are used to check that assumptions are valid, including independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers; and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographic profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
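
    The link between the linear log-odds model and the odds ratio can be checked numerically: changing one binary predictor from 0 to 1 while holding the others fixed multiplies the odds by exp(beta) for that predictor. The intercept, coefficients, and predictor values below are hypothetical and are not estimates from this study.

```python
import math

def log_odds(intercept, coefs, x):
    """Linear predictor of the logit model."""
    return intercept + sum(b * v for b, v in zip(coefs, x))

def odds(intercept, coefs, x):
    return math.exp(log_odds(intercept, coefs, x))

# Hypothetical model: first predictor is a 0/1 family-history flag,
# second is a made-up count of routine meals per day.
INTERCEPT = -1.5
COEFS = (0.7, 0.2)

odds_with_history = odds(INTERCEPT, COEFS, (1, 3))
odds_without_history = odds(INTERCEPT, COEFS, (0, 3))
odds_ratio = odds_with_history / odds_without_history
```

    Here `odds_ratio` equals exp(0.7), roughly 2, whatever value the second predictor takes, which is why coefficients are usually reported as odds ratios.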

  2. Combination of a Stressor-Response Model with a Conditional Probability Analysis Approach for Developing Candidate Criteria from MBSS

    EPA Science Inventory

    I show that a conditional probability analysis using a stressor-response model based on a logistic regression provides a useful approach for developing candidate water quality criteria from empirical data, such as the Maryland Biological Streams Survey (MBSS) data.

  3. Identification of immune correlates of protection in Shigella infection by application of machine learning.

    PubMed

    Arevalillo, Jorge M; Sztein, Marcelo B; Kotloff, Karen L; Levine, Myron M; Simon, Jakub K

    2017-10-01

    Immunologic correlates of protection are important in vaccine development because they give insight into mechanisms of protection, assist in the identification of promising vaccine candidates, and serve as endpoints in bridging clinical vaccine studies. Our goal is the development of a methodology to identify immunologic correlates of protection using the Shigella challenge as a model. The proposed methodology utilizes the Random Forests (RF) machine learning algorithm as well as Classification and Regression Trees (CART) to detect immune markers that predict protection, identify interactions between variables, and define optimal cutoffs. Logistic regression modeling is applied to estimate the probability of protection and the confidence interval (CI) for such a probability is computed by bootstrapping the logistic regression models. The results demonstrate that the combination of Classification and Regression Trees and Random Forests complements the standard logistic regression and uncovers subtle immune interactions. Specific levels of immunoglobulin IgG antibody in blood on the day of challenge predicted protection in 75% (95% CI 67-86). Of those subjects that did not have blood IgG at or above a defined threshold, 100% were protected if they had IgA antibody secreting cells above a defined threshold. Comparison with the results obtained by applying only logistic regression modeling with standard Akaike Information Criterion for model selection shows the usefulness of the proposed method. Given the complexity of the immune system, the use of machine learning methods may enhance traditional statistical approaches. When applied together, they offer a novel way to quantify important immune correlates of protection that may help the development of vaccines. Copyright © 2017 Elsevier Inc. All rights reserved.

  4. Validation of use of the International Consultation on Incontinence Questionnaire-Urinary Incontinence-Short Form (ICIQ-UI-SF) for impairment rating: a transversal retrospective study of 120 patients.

    PubMed

    Timmermans, Luc; Falez, Freddy; Mélot, Christian; Wespes, Eric

    2013-09-01

    A urinary incontinence impairment rating must be a highly accurate, non-invasive exploration of the condition using International Classification of Functioning (ICF)-based assessment tools. The objective of this study was to identify the best evaluation test and to determine an impairment rating model of urinary incontinence. In performing a cross-sectional study comparing successive urodynamic tests using both the International Consultation on Incontinence Questionnaire-Urinary Incontinence-Short Form (ICIQ-UI-SF) and the 1-hr pad-weighing test in 120 patients, we performed statistical likelihood ratio analysis and used logistic regression to calculate the probability of urodynamic incontinence using the most significant independent predictors. Subsequently, we created a template that was based on the significant predictors and the probability of urodynamic incontinence. The mean ICIQ-UI-SF score was 13.5 ± 4.6, and the median pad test value was 8 g. The discrimination statistic (receiver operating characteristic) described how well the urodynamic observations matched the ICIQ-UI-SF scores (under curve area (UDA): 0.689) and the pad test data (UDA: 0.693). Using logistic regression analysis, we demonstrated that the best independent predictors of urodynamic incontinence were the patient's age and the ICIQ-UI-SF score. The logistic regression model permitted us to construct an equation to determine the probability of urodynamic incontinence. Using these tools, we created a template to generate a probability index of urodynamic urinary incontinence. Using this probability index, relative to the patient and to the maximum impairment of the whole person (MIWP) relative to urinary incontinence, we were able to calculate a patient's permanent impairment. Copyright © 2012 Wiley Periodicals, Inc.

  5. Ensemble learning of inverse probability weights for marginal structural modeling in large observational datasets.

    PubMed

    Gruber, Susan; Logan, Roger W; Jarrín, Inmaculada; Monge, Susana; Hernán, Miguel A

    2015-01-15

    Inverse probability weights used to fit marginal structural models are typically estimated using logistic regression. However, a data-adaptive procedure may be able to better exploit information available in measured covariates. By combining predictions from multiple algorithms, ensemble learning offers an alternative to logistic regression modeling to further reduce bias in estimated marginal structural model parameters. We describe the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V-fold cross validation, and an ensemble learner (EL) that creates a single partition of the data into training and validation sets. Longitudinal data from two multicenter cohort studies in Spain (CoRIS and CoRIS-MD) were analyzed to estimate the mortality hazard ratio for initiation versus no initiation of combined antiretroviral therapy among HIV positive subjects. Both ensemble approaches produced hazard ratio estimates further away from the null, and with tighter confidence intervals, than logistic regression modeling. Computation time for EL was less than half that of SL. We conclude that ensemble learning using a library of diverse candidate algorithms offers an alternative to parametric modeling of inverse probability weights when fitting marginal structural models. With large datasets, EL provides a rich search over the solution space in less time than SL with comparable results. Copyright © 2014 John Wiley & Sons, Ltd.
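
    As a rough illustration of the stabilized weights mentioned in this abstract, the sketch below computes them for a single time point: the marginal probability of the treatment actually received divided by its estimated conditional probability given covariates. The function name, the treatment indicators and the propensity values are hypothetical; in longitudinal analyses such as the one described, the weights are typically products of such ratios over time points, and the conditional probabilities would come from logistic regression or an ensemble learner.

```python
def stabilized_weights(treated, propensity):
    """Single-time-point stabilized inverse probability weights.

    treated:    list of 0/1 treatment indicators
    propensity: estimated P(treated = 1 | covariates) for each subject
    """
    p_treated = sum(treated) / len(treated)  # marginal P(treated = 1)
    weights = []
    for a, ps in zip(treated, propensity):
        numerator = p_treated if a == 1 else 1.0 - p_treated
        denominator = ps if a == 1 else 1.0 - ps
        weights.append(numerator / denominator)
    return weights

# Hypothetical inputs, for illustration only.
treated = [1, 0, 1, 0]
propensity = [0.8, 0.4, 0.5, 0.2]
w = stabilized_weights(treated, propensity)
print(w)
```

    Subjects whose received treatment was unlikely given their covariates get weights above 1, and those whose treatment was very likely get weights below 1.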

  6. Ensemble learning of inverse probability weights for marginal structural modeling in large observational datasets

    PubMed Central

    Gruber, Susan; Logan, Roger W.; Jarrín, Inmaculada; Monge, Susana; Hernán, Miguel A.

    2014-01-01

    Inverse probability weights used to fit marginal structural models are typically estimated using logistic regression. However, a data-adaptive procedure may be able to better exploit information available in measured covariates. By combining predictions from multiple algorithms, ensemble learning offers an alternative to logistic regression modeling to further reduce bias in estimated marginal structural model parameters. We describe the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V-fold cross validation, and an ensemble learner (EL) that creates a single partition of the data into training and validation sets. Longitudinal data from two multicenter cohort studies in Spain (CoRIS and CoRIS-MD) were analyzed to estimate the mortality hazard ratio for initiation versus no initiation of combined antiretroviral therapy among HIV positive subjects. Both ensemble approaches produced hazard ratio estimates further away from the null, and with tighter confidence intervals, than logistic regression modeling. Computation time for EL was less than half that of SL. We conclude that ensemble learning using a library of diverse candidate algorithms offers an alternative to parametric modeling of inverse probability weights when fitting marginal structural models. With large datasets, EL provides a rich search over the solution space in less time than SL with comparable results. PMID:25316152

  7. Application of classification tree and logistic regression for the management and health intervention plans in a community-based study.

    PubMed

    Teng, Ju-Hsi; Lin, Kuan-Chia; Ho, Bin-Shenq

    2007-10-01

    A community-based aboriginal study was conducted and analysed to explore the application of classification tree and logistic regression. A total of 1066 aboriginal residents in Yilan County were screened during 2003-2004. The independent variables include demographic characteristics, physical examinations, geographic location, health behaviours, dietary habits and family hereditary diseases history. Risk factors of cardiovascular diseases were selected as the dependent variables in further analysis. The completion rate for the health interview is 88.9%. The classification tree results show that if body mass index is higher than 25.72 kg/m² and the age is above 51 years, the predicted probability for number of cardiovascular risk factors ≥3 is 73.6% and the population is 322. If body mass index is higher than 26.35 kg/m² and geographical latitude of the village is lower than 24°22.8', the predicted probability for number of cardiovascular risk factors ≥4 is 60.8% and the population is 74. The logistic regression results indicate that body mass index, drinking habit and menopause are the top three significant independent variables. The classification tree model specifically shows the discrimination paths and interactions between the risk groups. The logistic regression model presents and analyses the statistical independent factors of cardiovascular risks. Applying both models to specific situations will provide a different angle for the design and management of future health intervention plans after community-based study.

  8. Attributes associated with probability of infestation by the pinon ips, Ips confusus, (Coleoptera: Scolytidae) in pinon pine, Pinus edulis

    Treesearch

    Jose F. Negron; Jill L. Wilson

    2003-01-01

    We examined attributes of pinon pine (Pinus edulis) associated with the probability of infestation by pinon ips (Ips confusus) in an outbreak in the Coconino National Forest, Arizona. We used data collected from 87 plots, 59 infested and 28 uninfested, and a logistic regression approach to estimate the probability of infestation based on plot- and tree-level attributes....

  9. A Longitudinal Study of Welfare Exit among American Indian Families

    ERIC Educational Resources Information Center

    Pandey, Shanta; Guo, Baorong

    2007-01-01

    Data from a longitudinal survey of families from three reservations (Navajo Nation, San Carlos, and Salt River) in Arizona were used to examine their probability of welfare use. Logistic regression models were used to estimate the effects of individual, family, and structural factors on welfare exit. Results indicate that their probability of…

  10. Predicting postfire Douglas-fir beetle attacks and tree mortality in the northern Rocky Mountains

    Treesearch

    Sharon Hood; Barbara Bentz

    2007-01-01

    Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco) were monitored for 4 years following three wildfires. Logistic regression analyses were used to develop models predicting the probability of attack by Douglas-fir beetle (Dendroctonus pseudotsugae Hopkins, 1905) and the probability of Douglas-fir mortality within 4 years following...

  11. Combination of a Stressor-Response Model with a Conditional Probability Analysis Approach to Develop Candidate Criteria from Empirical Data

    EPA Science Inventory

    We show that a conditional probability analysis that utilizes a stressor-response model based on a logistic regression provides a useful approach for developing candidate water quality criteria from empirical data. The critical step in this approach is transforming the response ...

  12. Multinomial logistic regression in workers' health

    NASA Astrophysics Data System (ADS)

    Grilo, Luís M.; Grilo, Helena L.; Gonçalves, Sónia P.; Junça, Ana

    2017-11-01

    In European countries, namely in Portugal, it is common to hear some people mentioning that they are exposed to excessive and continuous psychosocial stressors at work. This is increasing in diverse activity sectors, such as the Services sector. A representative sample was collected from a Portuguese Services' organization by applying an internationally validated survey, whose variables were measured in five ordered categories on a Likert-type scale. A multinomial logistic regression model is used to estimate the probability of each category of the dependent variable, general health perception, where, among other independent variables, burnout appears as statistically significant.

  13. Multinomial logistic regression modelling of obesity and overweight among primary school students in a rural area of Negeri Sembilan

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd

    Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.

  14. Multinomial logistic regression modelling of obesity and overweight among primary school students in a rural area of Negeri Sembilan

    NASA Astrophysics Data System (ADS)

    Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam

    2015-10-01

    Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.
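
    The statement that q response categories lead to q-1 logit equations can be sketched as follows. This assumes the common baseline-category parameterization, in which each equation gives the log odds of one category against a reference category; the function name and the linear predictor values are hypothetical.

```python
import math

def category_probabilities(linear_predictors):
    """Convert q-1 baseline-category logits into q probabilities.

    linear_predictors[k] is log(P(Y = k) / P(Y = reference)) for the
    k-th non-reference category. The reference category is returned last.
    """
    exps = [math.exp(eta) for eta in linear_predictors]
    denom = 1.0 + sum(exps)          # the reference category contributes exp(0) = 1
    return [e / denom for e in exps] + [1.0 / denom]

# Hypothetical logits for q = 3 categories (e.g. obese, overweight, normal as reference).
probs = category_probabilities([0.4, -0.7])
print([round(p, 3) for p in probs], round(sum(probs), 3))
```

    Because the reference category's probability is fixed by the other q-1 equations, the probabilities always sum to one without a separate equation for the reference.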

  15. Development of a statistical model for the determination of the probability of riverbank erosion in a Mediterranean river basin

    NASA Astrophysics Data System (ADS)

    Varouchakis, Emmanouil; Kourgialas, Nektarios; Karatzas, George; Giannakis, Georgios; Lilli, Maria; Nikolaidis, Nikolaos

    2014-05-01

    Riverbank erosion affects the river morphology and the local habitat and results in riparian land loss, damage to property and infrastructures, ultimately weakening flood defences. An important issue concerning riverbank erosion is the identification of the areas vulnerable to erosion, as it allows for predicting changes and assists with stream management and restoration. One way to predict the areas vulnerable to erosion is to determine the erosion probability by identifying the underlying relations between riverbank erosion and the geomorphological and/or hydrological variables that prevent or stimulate erosion. A statistical model for evaluating the probability of erosion based on a series of independent local variables and by using logistic regression is developed in this work. The main variables affecting erosion are vegetation index (stability), the presence or absence of meanders, bank material (classification), stream power, bank height, river bank slope, riverbed slope, cross section width and water velocities (Luppi et al. 2009). In statistics, logistic regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable, e.g. binary response, based on one or more predictor variables (continuous or categorical). The probabilities of the possible outcomes are modelled as a function of independent variables using a logistic function. Logistic regression measures the relationship between a categorical dependent variable and, usually, one or several continuous independent variables by converting the dependent variable to probability scores. Then, a logistic regression is formed, which predicts success or failure of a given binary variable (e.g. 1 = "presence of erosion" and 0 = "no erosion") for any value of the independent variables. The regression coefficients are estimated by using maximum likelihood estimation. The erosion occurrence probability can be calculated in conjunction with the model deviance regarding the independent variables tested (Atkinson et al. 2003). The developed statistical model is applied to the Koiliaris River Basin on the island of Crete, Greece. The aim is to determine the probability of erosion along the Koiliaris' riverbanks considering a series of independent geomorphological and/or hydrological variables. Data for the river bank slope and for the river cross section width are available at ten locations along the river. The riverbank has indications of erosion at six of the ten locations while four have remained stable. Based on a recent work, measurements for the two independent variables and data regarding bank stability are available at eight different locations along the river. These locations were used as validation points for the proposed statistical model. The results show a very close agreement between the observed erosion indications and the statistical model as the probability of erosion was accurately predicted at seven out of the eight locations. The next step is to apply the model at more locations along the riverbanks. In November 2013, stakes were inserted at selected locations in order to be able to identify the presence or absence of erosion after the winter period. In April 2014 the presence or absence of erosion will be identified and the model results will be compared to the field data. Our intent is to extend the model by increasing the number of independent variables in order to identify the key factors favouring erosion along the Koiliaris River. We aim at developing an easy-to-use statistical tool that will provide a quantified measure of the erosion probability along the riverbanks, which could consequently be used to prevent erosion and flooding events. Atkinson, P. M., German, S. E., Sear, D. A. and Clark, M. J. 2003. Exploring the relations between riverbank erosion and geomorphological controls using geographically weighted logistic regression. Geographical Analysis, 35 (1), 58-82. Luppi, L., Rinaldi, M., Teruggi, L. B., Darby, S. E. and Nardi, L. 2009. Monitoring and numerical modelling of riverbank erosion processes: A case study along the Cecina River (central Italy). Earth Surface Processes and Landforms, 34 (4), 530-546. Acknowledgements: This work is part of an on-going THALES project (CYBERSENSORS - High Frequency Monitoring System for Integrated Water Resources Management of Rivers). The project has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES. Investing in knowledge society through the European Social Fund.
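
    A minimal sketch of the kind of fit this abstract describes is given below: a single-predictor logistic regression estimated by maximum likelihood (here via Newton's method, an assumption, since the abstract does not say which optimizer was used), followed by the model deviance. The ten (predictor, erosion) pairs are invented placeholders and are not the Koiliaris data; the function names are likewise hypothetical.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Maximum-likelihood fit of P(y=1) = 1 / (1 + exp(-(b0 + b1*x)))."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Negative Hessian (a 2x2 matrix), solved in closed form.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def deviance(x, y, b0, b1):
    """Deviance: -2 times the log-likelihood of the fitted model."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return -2.0 * total

# Invented placeholder data: predictor value and erosion indicator (1 = erosion).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(x, y)
fitted_deviance = deviance(x, y, b0, b1)
null_deviance = deviance(x, y, 0.0, 0.0)   # baseline: probability 0.5 everywhere
print(round(b0, 3), round(b1, 3), round(fitted_deviance, 3), round(null_deviance, 3))
```

    A lower deviance than the baseline indicates that the predictor improves the fit. With data that are perfectly separated by the predictor the maximum-likelihood estimate does not exist and this simple loop would not converge, so the sketch is only suitable for overlapping data like the placeholder above.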

  16. A comparison between Bayes discriminant analysis and logistic regression for prediction of debris flow in southwest Sichuan, China

    NASA Astrophysics Data System (ADS)

    Xu, Wenbo; Jing, Shaocai; Yu, Wenjuan; Wang, Zhaoxian; Zhang, Guoping; Huang, Jianxi

    2013-11-01

    In this study, the areas of Sichuan Province at high risk of debris flow, Panzhihua and Liangshan Yi Autonomous Prefecture, were taken as the study areas. Using rainfall and environmental factors as predictors, and under different prior probability combinations for debris flows, the prediction of debris flows in these areas was compared between two statistical methods: logistic regression (LR) and Bayes discriminant analysis (BDA). The comprehensive analysis shows that (a) with mid-range prior probabilities, the overall prediction accuracy of BDA is higher than that of LR; (b) with equal and extreme prior probabilities, the overall prediction accuracy of LR is higher than that of BDA; (c) the regional prediction models that use rainfall factors only perform worse than those that also include environmental factors, and the prediction accuracies for occurrence and nonoccurrence of debris flows change in opposite directions when the supplementary information is added.

  17. Nowcasting of Low-Visibility Procedure States with Ordered Logistic Regression at Vienna International Airport

    NASA Astrophysics Data System (ADS)

    Kneringer, Philipp; Dietz, Sebastian; Mayr, Georg J.; Zeileis, Achim

    2017-04-01

    Low-visibility conditions have a large impact on aviation safety and economic efficiency of airports and airlines. To support decision makers, we develop a statistical probabilistic nowcasting tool for the occurrence of capacity-reducing operations related to low visibility. The probabilities of four different low visibility classes are predicted with an ordered logistic regression model based on time series of meteorological point measurements. Potential predictor variables for the statistical models are visibility, humidity, temperature and wind measurements at several measurement sites. A stepwise variable selection method indicates that visibility and humidity measurements are the most important model inputs. The forecasts are tested with a 30 minute forecast interval up to two hours, which is a sufficient time span for tactical planning at Vienna Airport. The ordered logistic regression models outperform persistence and are competitive with human forecasters.
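
    How an ordered logistic regression turns one linear predictor into probabilities for four ordered classes can be sketched as below. This assumes the usual cumulative-logit (proportional odds) form, P(class ≤ k) = logistic(threshold_k − η); sign conventions differ between software packages, and the threshold and predictor values here are hypothetical rather than taken from the Vienna study.

```python
import math

def ordered_class_probabilities(eta, thresholds):
    """Class probabilities from a cumulative-logit model.

    eta:        linear predictor for one observation
    thresholds: increasing cut points; len(thresholds) + 1 classes result
    """
    cumulative = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds]
    cumulative.append(1.0)            # P(class <= highest class) is 1
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)    # difference of adjacent cumulative probabilities
        previous = c
    return probs

# Hypothetical values: three cut points give four ordered classes.
class_probs = ordered_class_probabilities(eta=0.5, thresholds=[-1.0, 0.8, 2.5])
print([round(p, 3) for p in class_probs])
```

    Increasing the linear predictor shifts probability mass towards the higher classes while the cut points stay fixed, which is what makes a single set of coefficients sufficient for all four classes.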

  18. Investigation of possibility of surface rupture derived from PFDHA and calculation of surface displacement based on dislocation

    NASA Astrophysics Data System (ADS)

    Inoue, N.; Kitada, N.; Irikura, K.

    2013-12-01

    The probability of surface rupture is important for configuring the seismic source, such as area sources or fault models, in a seismic hazard evaluation. In Japan, Takemura (1998) estimated the probability based on historical earthquake data. Kagawa et al. (2004) evaluated the probability based on a numerical simulation of surface displacements. The estimated probability follows a sigmoid curve and increases between Mj (the local magnitude defined and calculated by the Japan Meteorological Agency) = 6.5 and Mj = 7.0. The probability of surface rupture is also used in probabilistic fault displacement hazard analysis (PFDHA). The probability is determined from a collected earthquake catalog whose events are classified into two categories: with surface rupture or without surface rupture. Logistic regression is performed on the classified earthquake data. Youngs et al. (2003), Ross and Moss (2011) and Petersen et al. (2011) present logistic curves of the probability of surface rupture for normal, reverse and strike-slip faults, respectively. Takao et al. (2013) show the logistic curve derived from Japanese earthquake data only. The Japanese probability curve rises sharply within a narrow magnitude range compared with the other curves. In this study, we estimated the probability of surface rupture by applying logistic analysis to surface displacements derived from a surface displacement calculation. A source fault was defined according to the procedure of Kagawa et al. (2004), which determines a seismic moment from a magnitude and estimates the area of the asperity and the amount of slip. Strike-slip and reverse faults were considered as source faults. We applied Wang et al. (2003) for the calculations. The surface displacements for the defined source faults were calculated by varying the depth of the fault. A surface displacement threshold of 5 cm was used to decide whether or not a rupture reaches the surface. We carried out logistic regression analysis on the calculated displacements, which were classified by the above threshold. The estimated probability curve showed a trend similar to the result of Takao et al. (2013). The probability for reverse faults is larger than that for strike-slip faults. On the other hand, PFDHA results show different trends: the probability for reverse faults at higher magnitudes is lower than that for strike-slip and normal faults. Ross and Moss (2011) suggested that the sediment and/or rock over the fault compresses, so that the displacement does not fully reach the surface. The numerical theory applied in this study cannot deal with a complex initial situation such as topography.

  19. Attributes associated with probability of infestation by the pinon Ips, Ips confusus, (Coleoptera: Scolytidae) in pinon pine, Pinus edulis

    Treesearch

    Jose F. Negron; Jill L. Wilson

    2008-01-01

    (Please note, this is an abstract only) We examined attributes associated with the probability of infestation by pinon ips (Ips confusus), in pinon pine (Pinus edulis), in an outbreak in the Coconino National Forest, Arizona. We used data collected from 87 plots, 59 infested and 28 uninfested, and a logistic regression approach to estimate the probability of...

  20. Satellite rainfall retrieval by logistic regression

    NASA Technical Reports Server (NTRS)

    Chiu, Long S.

    1986-01-01

    The potential use of logistic regression in rainfall estimation from satellite measurements is investigated. Satellite measurements provide covariate information in terms of radiances from different remote sensors. The logistic regression technique can effectively accommodate many covariates and test their significance in the estimation. The outcome from the logistical model is the probability that the rainrate of a satellite pixel is above a certain threshold. By varying the thresholds, a rainrate histogram can be obtained, from which the mean and the variance can be estimated. A logistical model is developed and applied to rainfall data collected during GATE, using as covariates the fractional rain area and a radiance measurement which is deduced from a microwave temperature-rainrate relation. It is demonstrated that the fractional rain area is an important covariate in the model, consistent with the use of the so-called Area Time Integral in estimating total rain volume in other studies. To calibrate the logistical model, simulated rain fields generated by rainfield models with prescribed parameters are needed. A stringent test of the logistical model is its ability to recover the prescribed parameters of simulated rain fields. A rain field simulation model which preserves the fractional rain area and lognormality of rainrates as found in GATE is developed. A stochastic regression model of branching and immigration whose solutions are lognormally distributed in some asymptotic limits has also been developed.
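
    The step from threshold-exceedance probabilities to a rainrate histogram can be sketched as follows. The thresholds, exceedance probabilities and per-bin representative rainrates are all hypothetical; in particular, assigning a single representative value to each bin (including the open-ended top bin) is an assumption made here so that a mean and a spread can be computed, and is not described in the abstract.

```python
def rainrate_histogram(thresholds, exceedance):
    """Bin probabilities from P(rainrate > t) at increasing thresholds t.

    Returns len(thresholds) + 1 probabilities: at or below the first
    threshold, between consecutive thresholds, and above the last threshold.
    """
    probs = [1.0 - exceedance[0]]
    for k in range(len(thresholds) - 1):
        probs.append(exceedance[k] - exceedance[k + 1])
    probs.append(exceedance[-1])
    return probs

def mean_and_variance(values, probs):
    """Mean and variance of a discrete distribution over representative values."""
    mean = sum(v * p for v, p in zip(values, probs))
    variance = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    return mean, variance

# Hypothetical inputs: thresholds in mm/h and modelled exceedance probabilities.
thresholds = [0.0, 2.0, 8.0]
exceedance = [0.30, 0.15, 0.05]
bin_probs = rainrate_histogram(thresholds, exceedance)

# Assumed representative rainrate for each bin (no rain, 0-2, 2-8, above 8 mm/h).
bin_values = [0.0, 1.0, 5.0, 12.0]
mean_rate, var_rate = mean_and_variance(bin_values, bin_probs)
print([round(p, 3) for p in bin_probs], round(mean_rate, 3), round(var_rate, 3))
```

    Each threshold would require its own fitted logistic model in practice, and the exceedance probabilities must be non-increasing in the threshold for the bin probabilities to be valid.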

  1. A local equation for differential diagnosis of β-thalassemia trait and iron deficiency anemia by logistic regression analysis in Southeast Iran.

    PubMed

    Sargolzaie, Narjes; Miri-Moghaddam, Ebrahim

    2014-01-01

    The most common differential diagnosis of β-thalassemia (β-thal) trait is iron deficiency anemia. Several red blood cell equations were introduced during different studies for differential diagnosis between β-thal trait and iron deficiency anemia. Due to genetic variations in different regions, these equations cannot be useful in all populations. The aim of this study was to determine a native equation with high accuracy for differential diagnosis of β-thal trait and iron deficiency anemia for the Sistan and Baluchestan population by logistic regression analysis. We selected 77 iron deficiency anemia and 100 β-thal trait cases. We used binary logistic regression analysis and determined the best equation for probability prediction of β-thal trait against iron deficiency anemia in our population. We compared diagnostic values and the receiver operating characteristic (ROC) curve related to this equation and another 10 published equations in discriminating β-thal trait and iron deficiency anemia. The binary logistic regression analysis determined the best equation for probability prediction of β-thal trait against iron deficiency anemia with area under curve (AUC) 0.998. Based on ROC curves and AUC, Green & King, England & Frazer, and then Sirdah indices, respectively, had the most accuracy after our equation. We suggest that to get the best equation and cut-off in each region, one needs to evaluate specific information of each region, specifically in areas where populations are homogeneous, to provide a specific formula for differentiating between β-thal trait and iron deficiency anemia.

  2. Optimized endogenous post-stratification in forest inventories

    Treesearch

    Paul L. Patterson

    2012-01-01

    An example of endogenous post-stratification is the use of remote sensing data with a sample of ground data to build a logistic regression model to predict the probability that a plot is forested and using the predicted probabilities to form categories for post-stratification. An optimized endogenous post-stratified estimator of the proportion of forest has been...

  3. Regression trees for predicting mortality in patients with cardiovascular disease: What improvement is achieved by using ensemble-based methods?

    PubMed Central

    Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V

    2012-01-01

    In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999

  4. Comparison of Different Risk Perception Measures in Predicting Seasonal Influenza Vaccination among Healthy Chinese Adults in Hong Kong: A Prospective Longitudinal Study

    PubMed Central

    Liao, Qiuyan; Wong, Wing Sze; Fielding, Richard

    2013-01-01

    Background: Risk perception is a reported predictor of vaccination uptake, but which measures of risk perception best predict influenza vaccination uptake remain unclear. Methodology: During the main influenza seasons (between January and March) of 2009 (Wave 1) and 2010 (Wave 2), 505 Chinese students and employees from a Hong Kong university completed an online survey. Multivariate logistic regression models were conducted to assess how well different risk perception measures in Wave 1 predicted vaccination uptake against seasonal influenza in Wave 2. Principal Findings: The results of the multivariate logistic regression models showed that feeling at risk (β = 0.25, p = 0.021) was the better predictor compared with probability judgment, while probability judgment (β = 0.25, p = 0.029) was better than beliefs about risk in predicting subsequent influenza vaccination uptake. Beliefs about risk and feeling at risk seemed to predict the same aspect of subsequent vaccination uptake because their associations with vaccination uptake became insignificant when paired into the logistic regression model. Similarly, to compare the four scales for assessing probability judgment in predicting vaccination uptake, the 7-point verbal scale remained a significant and stronger predictor for vaccination uptake when paired with the other three scales; the 6-point verbal scale was a significant and stronger predictor when paired with the percentage scale or the 2-point verbal scale; and the percentage scale was a significant and stronger predictor only when paired with the 2-point verbal scale. Conclusions/Significance: Beliefs about risk and feeling at risk are not well differentiated by Hong Kong Chinese people. Feeling at risk, an affective-cognitive dimension of risk perception, predicts subsequent vaccination uptake better than do probability judgments. Among the four scales for assessing risk probability judgment, the 7-point verbal scale offered the best predictive power for subsequent vaccination uptake. PMID:23894292

  5. Comparison of different risk perception measures in predicting seasonal influenza vaccination among healthy Chinese adults in Hong Kong: a prospective longitudinal study.

    PubMed

    Liao, Qiuyan; Wong, Wing Sze; Fielding, Richard

    2013-01-01

    Risk perception is a reported predictor of vaccination uptake, but which measures of risk perception best predict influenza vaccination uptake remain unclear. During the main influenza seasons (between January and March) of 2009 (Wave 1) and 2010 (Wave 2), 505 Chinese students and employees from a Hong Kong university completed an online survey. Multivariate logistic regression models were conducted to assess how well different risk perception measures in Wave 1 predicted vaccination uptake against seasonal influenza in Wave 2. The results of the multivariate logistic regression models showed that feeling at risk (β = 0.25, p = 0.021) was the better predictor compared with probability judgment, while probability judgment (β = 0.25, p = 0.029) was better than beliefs about risk in predicting subsequent influenza vaccination uptake. Beliefs about risk and feeling at risk seemed to predict the same aspect of subsequent vaccination uptake because their associations with vaccination uptake became insignificant when paired into the logistic regression model. Similarly, to compare the four scales for assessing probability judgment in predicting vaccination uptake, the 7-point verbal scale remained a significant and stronger predictor for vaccination uptake when paired with the other three scales; the 6-point verbal scale was a significant and stronger predictor when paired with the percentage scale or the 2-point verbal scale; and the percentage scale was a significant and stronger predictor only when paired with the 2-point verbal scale. Beliefs about risk and feeling at risk are not well differentiated by Hong Kong Chinese people. Feeling at risk, an affective-cognitive dimension of risk perception, predicts subsequent vaccination uptake better than do probability judgments. Among the four scales for assessing risk probability judgment, the 7-point verbal scale offered the best predictive power for subsequent vaccination uptake.

  6. Efficient estimation of the attributable fraction when there are monotonicity constraints and interactions.

    PubMed

    Traskin, Mikhail; Wang, Wei; Ten Have, Thomas R; Small, Dylan S

    2013-01-01

    The PAF for an exposure is the fraction of disease cases in a population that can be attributed to that exposure. One method of estimating the PAF involves estimating the probability of having the disease given the exposure and confounding variables. In many settings, the exposure will interact with the confounders and the confounders will interact with each other. Also, in many settings, the probability of having the disease is thought, based on subject matter knowledge, to be a monotone increasing function of the exposure and possibly of some of the confounders. We develop an efficient approach for estimating logistic regression models with interactions and monotonicity constraints, and apply this approach to estimating the population attributable fraction (PAF). Our approach produces substantially more accurate estimates of the PAF in some settings than the usual approach which uses logistic regression without monotonicity constraints.
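
    The plug-in idea described above (estimate the probability of disease given exposure and confounders, then compare observed risk with the risk if nobody were exposed) can be sketched as follows. This is only an illustration: the coefficients, the single confounder, and the function names are hypothetical, and it uses an ordinary logistic model with one interaction term, not the monotonicity-constrained estimator the authors develop.

```python
import math

def disease_probability(exposed, confounder,
                        b0=-3.0, b_exp=1.0, b_conf=0.5, b_int=0.2):
    """Logistic model with an exposure-by-confounder interaction.

    All coefficients are made-up illustration values, not estimates
    from the paper.
    """
    z = b0 + b_exp * exposed + b_conf * confounder + b_int * exposed * confounder
    return 1.0 / (1.0 + math.exp(-z))

def attributable_fraction(population):
    """Plug-in PAF: 1 - (expected cases with exposure removed) / (expected cases).

    `population` is a list of (exposed, confounder) pairs.
    """
    expected_cases = sum(disease_probability(e, c) for e, c in population)
    cases_if_unexposed = sum(disease_probability(0, c) for _, c in population)
    return 1.0 - cases_if_unexposed / expected_cases

# Hypothetical population: (exposure indicator, confounder level).
population = [(1, 0), (1, 1), (0, 0), (0, 1), (1, 2), (0, 2)]
paf = attributable_fraction(population)
print(f"Estimated PAF: {paf:.3f}")
```

    With a harmful exposure (positive exposure and interaction coefficients) the result lies strictly between 0 and 1, and it is exactly 0 when no one in the population is exposed.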

  7. Probability of Elevated Volatile Organic Compound (VOC) Concentrations in Groundwater in the Eagle River Watershed Valley-Fill Aquifer, Eagle County, North-Central Colorado, 2006-2007

    USGS Publications Warehouse

    Rupert, Michael G.; Plummer, Niel

    2009-01-01

    This raster data set delineates the predicted probability of elevated volatile organic compound (VOC) concentrations in groundwater in the Eagle River watershed valley-fill aquifer, Eagle County, North-Central Colorado, 2006-2007. This data set was developed by a cooperative project between the U.S. Geological Survey, Eagle County, the Eagle River Water and Sanitation District, the Town of Eagle, the Town of Gypsum, and the Upper Eagle Regional Water Authority. This project was designed to evaluate potential land-development effects on groundwater and surface-water resources so that informed land-use and water management decisions can be made. This groundwater probability map and its associated probability maps were developed as follows: (1) A point data set of wells with groundwater quality and groundwater age data was overlaid with thematic layers of anthropogenic (related to human activities) and hydrogeologic data by using a geographic information system to assign each well values for depth to groundwater, distance to major streams and canals, distance to gypsum beds, precipitation, soils, and well depth. These data then were downloaded to a statistical software package for analysis by logistic regression. (2) Statistical models predicting the probability of elevated nitrate concentrations, the probability of unmixed young water (using chlorofluorocarbon-11 concentrations and tritium activities), and the probability of elevated volatile organic compound concentrations were developed using logistic regression techniques. (3) The statistical models were entered into a GIS and the probability map was constructed.

  8. Probability of Elevated Nitrate Concentrations in Groundwater in the Eagle River Watershed Valley-Fill Aquifer, Eagle County, North-Central Colorado, 2006-2007

    USGS Publications Warehouse

    Rupert, Michael G.; Plummer, Niel

    2009-01-01

    This raster data set delineates the predicted probability of elevated nitrate concentrations in groundwater in the Eagle River watershed valley-fill aquifer, Eagle County, North-Central Colorado, 2006-2007. This data set was developed by a cooperative project between the U.S. Geological Survey, Eagle County, the Eagle River Water and Sanitation District, the Town of Eagle, the Town of Gypsum, and the Upper Eagle Regional Water Authority. This project was designed to evaluate potential land-development effects on groundwater and surface-water resources so that informed land-use and water management decisions can be made. This groundwater probability map and its associated probability maps were developed as follows: (1) A point data set of wells with groundwater quality and groundwater age data was overlaid with thematic layers of anthropogenic (related to human activities) and hydrogeologic data by using a geographic information system to assign each well values for depth to groundwater, distance to major streams and canals, distance to gypsum beds, precipitation, soils, and well depth. These data then were downloaded to a statistical software package for analysis by logistic regression. (2) Statistical models predicting the probability of elevated nitrate concentrations, the probability of unmixed young water (using chlorofluorocarbon-11 concentrations and tritium activities), and the probability of elevated volatile organic compound concentrations were developed using logistic regression techniques. (3) The statistical models were entered into a GIS and the probability map was constructed.

  9. Factors associated with automobile accidents and survival.

    PubMed

    Kim, Hong Sok; Kim, Hyung Jin; Son, Bongsoo

    2006-09-01

    This paper develops an econometric model for vehicles' inherent mortality rate and estimates the probability of accidents and survival in the United States. A logistic regression model is used to estimate the probability of survival, and a censored regression model is used to estimate the probability of accidents. The estimation results indicate that the probabilities of accident and survival are influenced by the physical characteristics of the vehicles involved in the accident and by the characteristics of the driver and the occupants. Using a restraint system and riding in a heavy vehicle increased the survival rate. Middle-aged drivers are less likely to be involved in an accident, and, surprisingly, female drivers are more likely to have an accident than male drivers. Riding in powerful (high-horsepower) vehicles and driving late at night increase the probability of an accident. Overall, driving behavior and vehicle characteristics matter and affect the probability of having a fatal accident for different types of vehicles.

  10. Using multiple logistic regression and GIS technology to predict landslide hazard in northeast Kansas, USA

    USGS Publications Warehouse

    Ohlmacher, G.C.; Davis, J.C.

    2003-01-01

    Landslides in the hilly terrain along the Kansas and Missouri rivers in northeastern Kansas have caused millions of dollars in property damage during the last decade. To address this problem, a statistical method called multiple logistic regression has been used to create a landslide-hazard map for Atchison, Kansas, and surrounding areas. Data included digitized geology, slopes, and landslides, manipulated using ArcView GIS. Logistic regression relates predictor variables to the occurrence or nonoccurrence of landslides within geographic cells and uses the relationship to produce a map showing the probability of future landslides, given local slopes and geologic units. Results indicated that slope is the most important variable for estimating landslide hazard in the study area. Geologic units consisting mostly of shale, siltstone, and sandstone were most susceptible to landslides. Soil type and aspect ratio were considered but excluded from the final analysis because these variables did not significantly add to the predictive power of the logistic regression. Soil types were highly correlated with the geologic units, and no significant relationships existed between landslides and slope aspect. © 2003 Elsevier Science B.V. All rights reserved.

  11. Spatial analysis of land use and shallow groundwater vulnerability in the watershed adjacent to Assateague Island National Seashore, Maryland and Virginia, USA

    USGS Publications Warehouse

    LaMotte, A.E.; Greene, E.A.

    2007-01-01

    Spatial relations between land use and groundwater quality in the watershed adjacent to Assateague Island National Seashore, Maryland and Virginia, USA were analyzed by the use of two spatial models. One model used a logit analysis and the other was based on geostatistics. The models were developed and compared on the basis of existing concentrations of nitrate as nitrogen in samples from 529 domestic wells. The models were applied to produce spatial probability maps that show areas in the watershed where concentrations of nitrate in groundwater are likely to exceed a predetermined management threshold value. Maps of the watershed generated by logistic regression and probability kriging analysis showing where the probability of nitrate concentrations would exceed 3 mg/L (>0.50) compared favorably. Logistic regression was less dependent on the spatial distribution of sampled wells, and identified an additional high probability area within the watershed that was missed by probability kriging. The spatial probability maps could be used to determine the natural or anthropogenic factors that best explain the occurrence and distribution of elevated concentrations of nitrate (or other constituents) in shallow groundwater. This information can be used by local land-use planners, ecologists, and managers to protect water supplies and identify land-use planning solutions and monitoring programs in vulnerable areas. © 2006 Springer-Verlag.

  12. Quantitative Analysis of Land Loss in Coastal Louisiana Using Remote Sensing

    NASA Astrophysics Data System (ADS)

    Wales, P. M.; Kuszmaul, J.; Roberts, C.

    2005-12-01

    For the past thirty-five years the land loss along the Louisiana Coast has been recognized as a growing problem. One of the clearest indicators of this land loss is that in 2000 smooth cordgrass (Spartina alterniflora) was turning brown well before its normal hibernation period. Over 100,000 acres of marsh were affected by the 2000 browning. In 2001 data were collected using low-altitude helicopter-based transects of the coast, with 7,400 data points being collected by researchers at the USGS, National Wetlands Research Center, and Louisiana Department of Natural Resources. The surveys contained data describing the characteristics of the marsh, including latitude, longitude, marsh condition, marsh color, percent vegetated, and marsh die-back. The ultimate goal of the study is to create a model that combines remote sensing images, field data, and statistical analysis to develop a methodology for estimating the margin of error in measurements of coastal land loss (erosion). A model was successfully created using a series of band combinations (used as predictive variables). The most successful band combinations or predictive variables were the braud value [(Sum Visible TM Bands - Sum Infrared TM Bands)/(Sum Visible TM Bands + Sum Infrared TM Bands)], TM band 7/TM band 2, brightness, NDVI, wetness, vegetation index, and a 7x7 autocovariate nearest neighbor floating window. The model values were used to generate the logistic regression model. A new image was created based on the logistic regression probability equation where each pixel represents the probability of finding water or non-water at that location in each image. Pixels within each image that have a high probability of representing water have a value close to 1 and pixels with a low probability of representing water have a value close to 0. A logistic regression model is proposed that uses seven independent variables. This model yields an accurate classification in 86.5% of the locations considered in the 1997 and 2001 survey locations. When the logistic regression was applied to the satellite imagery of the entire Louisiana Coast study area, a statewide loss was estimated to be 358 mi² to 368 mi² from 1997 to 2001, using two different methods for estimating land loss.
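
    The per-pixel probability image described above can be sketched in a few lines. This is a minimal illustration, not the study's model: the intercept, the coefficients, and the use of two predictors (standing in for, say, NDVI and wetness) rather than the seven in the abstract are all assumptions.

```python
import math

def water_probability(pixel_features, intercept, coefficients):
    """Logistic regression probability that one pixel represents water."""
    z = intercept + sum(b * x for b, x in zip(coefficients, pixel_features))
    return 1.0 / (1.0 + math.exp(-z))

def classify_image(image, intercept, coefficients, threshold=0.5):
    """Return (probability map, water mask) for a 2-D grid of feature vectors."""
    prob_map = [[water_probability(px, intercept, coefficients) for px in row]
                for row in image]
    mask = [[p >= threshold for p in row] for row in prob_map]
    return prob_map, mask

# Hypothetical coefficients for two predictors; not taken from the study.
INTERCEPT = -1.0
COEFFS = [-4.0, 3.0]

# A 2 x 2 "image" in which each pixel carries its two predictor values.
image = [[[0.8, 0.1], [-0.2, 0.9]],
         [[0.0, 0.0], [-0.5, 0.5]]]

prob_map, mask = classify_image(image, INTERCEPT, COEFFS)
for row in prob_map:
    print([round(p, 3) for p in row])
```

    Values near 1 mark likely water and values near 0 mark likely non-water; comparing thresholded masks from two dates is one simple way such probability images could feed a land-loss estimate.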

  13. Forecasting the probability of future groundwater levels declining below specified low thresholds in the conterminous U.S.

    USGS Publications Warehouse

    Dudley, Robert W.; Hodgkins, Glenn A.; Dickinson, Jesse

    2017-01-01

    We present a logistic regression approach for forecasting the probability of future groundwater levels declining or maintaining below specific groundwater-level thresholds. We tested our approach on 102 groundwater wells in different climatic regions and aquifers of the United States that are part of the U.S. Geological Survey Groundwater Climate Response Network. We evaluated the importance of current groundwater levels, precipitation, streamflow, seasonal variability, Palmer Drought Severity Index, and atmosphere/ocean indices for developing the logistic regression equations. Several diagnostics of model fit were used to evaluate the regression equations, including testing of autocorrelation of residuals, goodness-of-fit metrics, and bootstrap validation testing. The probabilistic predictions were most successful at wells with high persistence (low month-to-month variability) in their groundwater records and at wells where the groundwater level remained below the defined low threshold for sustained periods (generally three months or longer). The model fit was weakest at wells with strong seasonal variability in levels and with shorter duration low-threshold events. We identified challenges in deriving probabilistic-forecasting models and possible approaches for addressing those challenges.

  14. A simple approach to power and sample size calculations in logistic regression and Cox regression models.

    PubMed

    Vaeth, Michael; Skovlund, Eva

    2004-06-15

    For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
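
    The equivalent two-sample construction for logistic regression can be sketched as below, assuming the "parameters" that differ are the log-odds of the two groups and that keeping the expected number of events unchanged means the two group probabilities average to the overall response probability. The function names are hypothetical, and the sample-size step uses the standard normal-approximation formula for comparing two proportions, which may differ in detail from the formulas in the paper.

```python
import math
from statistics import NormalDist

def logit(p):
    return math.log(p / (1.0 - p))

def equivalent_two_sample(p_bar, slope, sd):
    """Find p1 < p2 with logit(p2) - logit(p1) = 2*|slope|*sd and mean p_bar."""
    if slope == 0 or sd <= 0:
        raise ValueError("slope must be non-zero and sd positive")
    target = 2.0 * abs(slope) * sd
    lo = 0.0
    hi = min(p_bar, 1.0 - p_bar) * (1.0 - 1e-12)  # keep p1, p2 inside (0, 1)
    for _ in range(200):  # bisection on the half-difference d
        d = (lo + hi) / 2.0
        if logit(p_bar + d) - logit(p_bar - d) < target:
            lo = d
        else:
            hi = d
    d = (lo + hi) / 2.0
    return p_bar - d, p_bar + d

def total_sample_size(p_bar, slope, sd, alpha=0.05, power=0.80):
    """Total n from the two-proportion formula applied to the equivalent problem."""
    p1, p2 = equivalent_two_sample(p_bar, slope, sd)
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    numerator = (z_a * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_b * math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))) ** 2
    n_per_group = math.ceil(numerator / (p2 - p1) ** 2)
    return 2 * n_per_group

# Hypothetical design: 30% overall response, log-odds slope 0.5 per unit, SD 1.
print(equivalent_two_sample(0.3, 0.5, 1.0))
print(total_sample_size(0.3, 0.5, 1.0))
```

    Bisection is used because the log-odds difference increases monotonically with the gap between the two group probabilities, so the pair satisfying both conditions is unique.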

  15. Wildfire Risk Mapping over the State of Mississippi: Land Surface Modeling Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cooke, William H.; Mostovoy, Georgy; Anantharaj, Valentine G

    2012-01-01

    Three fire risk indexes based on soil moisture estimates were applied to simulate wildfire probability over the southern part of Mississippi using the logistic regression approach. The fire indexes were retrieved from: (1) accumulated difference between daily precipitation and potential evapotranspiration (P-E); (2) top 10 cm soil moisture content simulated by the Mosaic land surface model; and (3) the Keetch-Byram drought index (KBDI). The P-E, KBDI, and soil moisture based indexes were estimated from gridded atmospheric and Mosaic-simulated soil moisture data available from the North American Land Data Assimilation System (NLDAS-2). Normalized deviations of these indexes from the 31-year mean (1980-2010) were fitted into the logistic regression model describing the probability of wildfire occurrence as a function of the fire index. It was assumed that such normalization provides a more robust and adequate description of the temporal dynamics of soil moisture anomalies than the original (not normalized) set of indexes. The logistic model parameters were evaluated for 0.25 x 0.25 latitude/longitude cells and for the probability that at least one fire event occurred during 5 consecutive days. A 23-year (1986-2008) forest fire record was used. Two periods were selected and examined (January to mid-June and mid-September to December). The application of the logistic model provides an overall good agreement between empirical/observed and model-fitted fire probabilities over the study area during both seasons. The fire risk indexes based on the top 10 cm soil moisture and KBDI have the largest impact on the wildfire odds (increasing them by almost 2 times in response to each unit change of the corresponding fire risk index during the January to mid-June period and by nearly 1.5 times during mid-September to December) observed over 0.25 x 0.25 cells located along the Mississippi coastline. This result suggests a rather strong control of fire risk indexes on fire occurrence probability over this region.
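
    Because the effect above is reported as a multiplier on the odds, a short sketch may help show what "almost 2 times" means for a probability. The baseline probability below is a made-up value for illustration; only the multiplier of about 2 comes from the abstract.

```python
import math

def odds_ratio(beta):
    """Odds multiplier for a one-unit increase in a logistic regression predictor."""
    return math.exp(beta)

def shift_probability(p0, odds_multiplier):
    """Apply an odds multiplier to a baseline probability p0."""
    odds = p0 / (1.0 - p0)
    new_odds = odds * odds_multiplier
    return new_odds / (1.0 + new_odds)

p0 = 0.10                                   # hypothetical baseline fire probability
p1 = shift_probability(p0, 2.0)             # odds doubled by a one-unit index change
beta = math.log(2.0)                        # the coefficient implied by doubled odds
print(round(p1, 4), round(odds_ratio(beta), 4))
```

    Doubling the odds moves a 10% probability to 2/11 (about 18%), not to 20%; the gap between the odds scale and the probability scale grows as the baseline probability rises.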

  16. The alarming problems of confounding equivalence using logistic regression models in the perspective of causal diagrams.

    PubMed

    Yu, Yuanyuan; Li, Hongkai; Sun, Xiaoru; Su, Ping; Wang, Tingting; Liu, Yi; Yuan, Zhongshang; Liu, Yanxun; Xue, Fuzhong

    2017-12-28

    Confounders can produce spurious associations between exposure and outcome in observational studies. For the majority of epidemiologists, adjusting for confounders using a logistic regression model is the habitual method, though it has some problems in accuracy and precision. It is, therefore, important to highlight the problems of logistic regression and search for an alternative method. Four causal diagram models were defined to summarize confounding equivalence. Both theoretical proofs and simulation studies were performed to verify whether conditioning on different confounding equivalence sets had the same bias-reducing potential and then to select the optimum adjusting strategy, in which the logistic regression model and the inverse probability weighting based marginal structural model (IPW-based-MSM) were compared. The "do-calculus" was used to calculate the true causal effect of exposure on outcome, then the bias and standard error were used to evaluate the performances of different strategies. Adjusting for different sets of confounding equivalence, as judged by identical Markov boundaries, produced different bias-reducing potential in the logistic regression model. For the sets that satisfied G-admissibility, adjusting for the set including all the confounders achieved bias reduction equivalent to that of the set containing the parent nodes of the outcome, while the bias after adjusting for the parent nodes of exposure was not equivalent to them. In addition, all causal effect estimations through logistic regression were biased, although the estimation after adjusting for the parent nodes of exposure was nearest to the true causal effect. However, conditioning on different confounding equivalence sets had the same bias-reducing potential under IPW-based-MSM. Compared with logistic regression, the IPW-based-MSM could obtain unbiased causal effect estimation when the adjusted confounders satisfied G-admissibility, and the optimal strategy was to adjust for the parent nodes of outcome, which obtained the highest precision. All adjustment strategies through logistic regression were biased for causal effect estimation, while IPW-based-MSM could always obtain unbiased estimation when the adjusted set satisfied G-admissibility. Thus, IPW-based-MSM was recommended for adjusting for confounder sets.

  17. Allocating Fire Mitigation Funds on the Basis of the Predicted Probabilities of Forest Wildfire

    Treesearch

    Ronald E. McRoberts; Greg C. Liknes; Mark D. Nelson; Krista M. Gebert; R. James Barbour; Susan L. Odell; Steven C. Yaddof

    2005-01-01

    A logistic regression model was used with map-based information to predict the probability of forest fire for forested areas of the United States. Model parameters were estimated using a digital layer depicting the locations of wildfires and satellite imagery depicting thermal hotspots. The area of the United States in the upper 50th percentile with respect to...

  18. Steganalysis using logistic regression

    NASA Astrophysics Data System (ADS)

    Lubenko, Ivans; Ker, Andrew D.

    2011-02-01

    We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-art 686-dimensional SPAM feature set, in three image sets.

  19. Deciphering factors controlling groundwater arsenic spatial variability in Bangladesh

    NASA Astrophysics Data System (ADS)

    Tan, Z.; Yang, Q.; Zheng, C.; Zheng, Y.

    2017-12-01

    Elevated concentrations of geogenic arsenic in groundwater have been found in many countries to exceed 10 μg/L, the WHO's guideline value for drinking water. A common yet unexplained characteristic of groundwater arsenic spatial distribution is the extensive variability at various spatial scales. This study investigates factors influencing the spatial variability of groundwater arsenic in Bangladesh to improve the accuracy of models predicting arsenic exceedance rate spatially. A novel boosted regression tree method is used to establish a weak-learning ensemble model, which is compared to a linear model using a conventional stepwise logistic regression method. The boosted regression tree models offer the advantage of parametric interaction when big datasets are analyzed in comparison to the logistic regression. The point data set (n=3,538) of groundwater hydrochemistry with 19 parameters was obtained by the British Geological Survey in 2001. The spatial data sets of geological parameters (n=13) were from the Consortium for Spatial Information, Technical University of Denmark, University of East Anglia and the FAO, while the soil parameters (n=42) were from the Harmonized World Soil Database. The aforementioned parameters were regressed to categorical groundwater arsenic concentrations below or above three thresholds (5 μg/L, 10 μg/L and 50 μg/L) to identify respective controlling factors. The boosted regression tree method outperformed logistic regression methods at all three threshold levels in terms of accuracy, specificity and sensitivity, resulting in an improved spatial distribution map of the probability of groundwater arsenic exceeding all three thresholds when compared to a disjunctive-kriging interpolated spatial arsenic map using the same groundwater arsenic dataset. Boosted regression tree models also show that the most important controlling factors of groundwater arsenic distribution include groundwater iron content and well depth for all three thresholds. The probability that a well with iron content higher than 5 mg/L contains greater than 5 μg/L, 10 μg/L and 50 μg/L As is estimated to be more than 91%, 85% and 51%, respectively, while the probability that a well deeper than 160 m contains more than 5 μg/L, 10 μg/L and 50 μg/L As is estimated to be less than 38%, 25% and 14%, respectively.

  20. Modeling the dynamics of urban growth using multinomial logistic regression: a case study of Jiayu County, Hubei Province, China

    NASA Astrophysics Data System (ADS)

    Nong, Yu; Du, Qingyun; Wang, Kun; Miao, Lei; Zhang, Weiwei

    2008-10-01

    Urban growth modeling, one of the most important aspects of land use and land cover change study, has attracted substantial attention because it helps to comprehend the mechanisms of land use change and thus helps in making relevant policies. This study applied multinomial logistic regression to model urban growth in Jiayu County, Hubei Province, China, to discover the relationship between urban growth and its driving forces, for which biophysical and socio-economic factors are selected as independent variables. This type of regression is similar to binary logistic regression, but it is more general because the dependent variable is not restricted to two categories, as it was in previous studies. The multinomial model can simulate the process of multiple land use competition between urban land, bare land, cultivated land and orchard land. Taking the urban land use type as the reference category, parameters could be estimated as odds ratios. A probability map is generated from the model to predict where urban growth will occur as a result of the computation.
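
    A multinomial logistic model with a reference category can be sketched as follows. The four land use classes match the abstract, but the two predictors and every coefficient are hypothetical values chosen only to make the example run; the reference category (urban) has its linear predictor fixed at zero.

```python
import math

def multinomial_probabilities(x, coefs, reference):
    """Class probabilities for one cell.

    coefs maps each non-reference category to (intercept, [slopes]);
    the reference category's score is fixed at 0.
    """
    scores = {reference: 0.0}
    for category, (b0, slopes) in coefs.items():
        scores[category] = b0 + sum(b * xi for b, xi in zip(slopes, x))
    top = max(scores.values())               # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

# Hypothetical coefficients for two predictors (e.g. distance to road, slope).
coefs = {
    "bare":       (-0.5, [0.8, 0.1]),
    "cultivated": (0.3,  [1.2, -0.4]),
    "orchard":    (-1.0, [0.6, 0.5]),
}
cell = [0.5, 1.0]
probs = multinomial_probabilities(cell, coefs, reference="urban")
for land_use, p in probs.items():
    print(land_use, round(p, 3))
```

    The ratio of any category's probability to the reference probability equals the exponential of that category's linear predictor, which is why coefficients in such a model are read as odds ratios relative to the reference class.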

  1. Probability of Unmixed Young Groundwater (defined using chlorofluorocarbon-11 concentrations and tritium activities) in the Eagle River Watershed Valley-Fill Aquifer, Eagle County, North-Central Colorado, 2006-2007

    USGS Publications Warehouse

    Rupert, Michael G.; Plummer, Niel

    2009-01-01

    This raster data set delineates the predicted probability of unmixed young groundwater (defined using chlorofluorocarbon-11 concentrations and tritium activities) in groundwater in the Eagle River watershed valley-fill aquifer, Eagle County, North-Central Colorado, 2006-2007. This data set was developed by a cooperative project between the U.S. Geological Survey, Eagle County, the Eagle River Water and Sanitation District, the Town of Eagle, the Town of Gypsum, and the Upper Eagle Regional Water Authority. This project was designed to evaluate potential land-development effects on groundwater and surface-water resources so that informed land-use and water management decisions can be made. This groundwater probability map and its associated probability maps were developed as follows: (1) A point data set of wells with groundwater quality and groundwater age data was overlaid with thematic layers of anthropogenic (related to human activities) and hydrogeologic data by using a geographic information system to assign each well values for depth to groundwater, distance to major streams and canals, distance to gypsum beds, precipitation, soils, and well depth. These data then were downloaded to a statistical software package for analysis by logistic regression. (2) Statistical models predicting the probability of elevated nitrate concentrations, the probability of unmixed young water (using chlorofluorocarbon-11 concentrations and tritium activities), and the probability of elevated volatile organic compound concentrations were developed using logistic regression techniques. (3) The statistical models were entered into a GIS and the probability map was constructed.

  2. Neuroimaging Characteristics of Small-Vessel Disease in Older Adults with Normal Cognition, Mild Cognitive Impairment, and Alzheimer Disease.

    PubMed

    Mimenza-Alvarado, Alberto; Aguilar-Navarro, Sara G; Yeverino-Castro, Sara; Mendoza-Franco, César; Ávila-Funes, José Alberto; Román, Gustavo C

    2018-01-01

    Cerebral small-vessel disease (SVD) represents the most frequent type of vascular brain lesions, often coexisting with Alzheimer disease (AD). By quantifying white matter hyperintensities (WMH) and hippocampal and parietal atrophy, we aimed to describe the prevalence and severity of SVD among older adults with normal cognition (NC), mild cognitive impairment (MCI), and probable AD and to describe associated risk factors. This study included 105 older adults evaluated with magnetic resonance imaging and clinical and neuropsychological tests. We used the Fazekas scale (FS) for quantification of WMH, the Scheltens scale (SS) for hippocampal atrophy, and the Koedam scale (KS) for parietal atrophy. Logistic regression models were performed to determine the association between FS, SS, and KS scores and the presence of NC, MCI, or probable AD. Compared to NC subjects, SVD was more prevalent in MCI and probable AD subjects. After adjusting for confounding factors, logistic regression showed a positive association between higher scores on the FS and probable AD (OR = 7.6, 95% CI 2.7-20, p < 0.001). With the use of the SS and KS (OR = 4.5, 95% CI 3.5-58, p = 0.003 and OR = 8.9, 95% CI 1-72, p = 0.04, respectively), the risk also remained significant for probable AD. These results suggest an association between severity of vascular brain lesions and neurodegeneration.
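
    Odds ratios such as those reported above are conventionally obtained by exponentiating a logistic regression coefficient, with a Wald interval of exp(beta ± z·SE). The sketch below shows that conversion; the function names are hypothetical, and the abstract reports neither coefficients nor standard errors, so the inputs in the usage lines are illustrative only.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a logistic coefficient and its standard error."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

def coefficient_from_odds_ratio(odds_ratio):
    """Invert the conversion: the log of an odds ratio is the coefficient."""
    return math.log(odds_ratio)

# Illustrative inputs only (not taken from the study's model output).
or_value, lower, upper = odds_ratio_with_ci(beta=0.7, se=0.25)
print(f"OR = {or_value:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

    Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself, which is why reported intervals such as 2.7-20 around 7.6 look lopsided.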

  3. Using methods from the data mining and machine learning literature for disease classification and prediction: A case study examining classification of heart failure sub-types

    PubMed Central

    Austin, Peter C.; Tu, Jack V.; Ho, Jennifer E.; Levy, Daniel; Lee, Douglas S.

    2014-01-01

    Objective Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study design and Setting We compared the performance of these classification methods with those of conventional classification trees to classify patients with heart failure according to the following sub-types: heart failure with preserved ejection fraction (HFPEF) vs. heart failure with reduced ejection fraction (HFREF). We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results We found that modern, flexible tree-based methods from the data mining literature offer substantial improvement in prediction and classification of heart failure sub-type compared to conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared to the methods proposed in the data mining literature. Conclusion The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying heart failure subtypes in a population-based sample of patients from Ontario. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF. PMID:23384592

  4. Analyses of non-fatal accidents in an opencast mine by logistic regression model - a case study.

    PubMed

    Onder, Seyhan; Mutlu, Mert

    2017-09-01

    Accidents cause major damage for both workers and enterprises in the mining industry. To reduce the number of occupational accidents, these incidents should be properly registered and carefully analysed. This study examines the Aegean Lignite Enterprise (ELI) of Turkish Coal Enterprises (TKI) in Soma between 2006 and 2011, using opencast coal mine occupational accident records for the statistical analyses. A total of 231 occupational accidents were analysed. The accident records were categorized into seven groups: area, reason, occupation, part of body, age, shift hour and lost days. The SPSS package was used for logistic regression analyses, which predicted the probability of accidents resulting in greater or less than 3 lost workdays for non-fatal injuries. Social facilities-area of surface installations, workshops and opencast mining areas are the areas with the highest probability for accidents with greater than 3 lost workdays for non-fatal injuries, while the reasons with the highest probability for these types of accidents are transporting and manual handling. Additionally, the model was tested on reported accidents that occurred in 2012 at the ELI in Soma, and it correctly estimated the probability of exposure to accidents with lost workdays in 70% of cases.

  5. Developing logistic regression models using purchase attributes and demographics to predict the probability of purchases of regular and specialty eggs.

    PubMed

    Bejaei, M; Wiseman, K; Cheng, K M

    2015-01-01

    Consumers' interest in specialty eggs appears to be growing in Europe and North America. The objective of this research was to develop logistic regression models that utilise purchaser attributes and demographics to predict the probability of a consumer purchasing a specific type of table egg including regular (white and brown), non-caged (free-run, free-range and organic) or nutrient-enhanced eggs. These purchase prediction models, together with the purchasers' attributes, can be used to assess market opportunities of different egg types specifically in British Columbia (BC). An online survey was used to gather data for the models. A total of 702 completed questionnaires were submitted by BC residents. Selected independent variables were included in the logistic regression to develop models for different egg types to predict the probability of a consumer purchasing a specific type of table egg. The variables used in the model accounted for 54% and 49% of variances in the purchase of regular and non-caged eggs, respectively. Research results indicate that consumers of different egg types exhibit a set of unique and statistically significant characteristics and/or demographics. For example, consumers of regular eggs were less educated, older, price sensitive, major chain store buyers, and store flyer users, and had lower awareness about different types of eggs and less concern regarding animal welfare issues. However, most of the non-caged egg consumers were less concerned about price, had higher awareness about different types of table eggs, purchased their eggs from local/organic grocery stores, farm gates or farmers markets, and were more concerned about care and feeding of hens compared to consumers of other egg types.

  6. Nomogram for prediction of level 2 axillary lymph node metastasis in proven level 1 node-positive breast cancer patients.

    PubMed

    Jiang, Yanlin; Xu, Hong; Zhang, Hao; Ou, Xunyan; Xu, Zhen; Ai, Liping; Sun, Lisha; Liu, Caigang

    2017-09-22

    The current management of the axilla in level 1 node-positive breast cancer patients is axillary lymph node dissection regardless of the status of the level 2 axillary lymph nodes. The goal of this study was to develop a nomogram predicting the probability of level 2 axillary lymph node metastasis (L-2-ALNM) in patients with level 1 axillary node-positive breast cancer. We reviewed the records of 974 patients with pathology-confirmed level 1 node-positive breast cancer between 2010 and 2014 at the Liaoning Cancer Hospital and Institute. The patients were randomized 1:1 and divided into a modeling group and a validation group. Clinical and pathological features of the patients were assessed with uni- and multivariate logistic regression. A nomogram based on independent predictors for the L-2-ALNM identified by multivariate logistic regression was constructed. Independent predictors of L-2-ALNM by the multivariate logistic regression analysis included tumor size, Ki-67 status, histological grade, and number of positive level 1 axillary lymph nodes. The areas under the receiver operating characteristic curve of the modeling set and the validation set were 0.828 and 0.816, respectively. The false-negative rates of the L-2-ALNM nomogram were 1.82% and 7.41% for the predicted probability cut-off points of < 6% and < 10%, respectively, when applied to the validation group. Our nomogram could help predict L-2-ALNM in patients with level 1 axillary lymph node metastasis. Patients with a low probability of L-2-ALNM could be spared level 2 axillary lymph node dissection, thereby reducing postoperative morbidity.
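
    The area under the receiver operating characteristic curve quoted here (and in several other records in this list) equals the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case. A minimal sketch of that rank-based definition follows; the function name and the toy inputs are illustrative assumptions, not data from the study.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Quadratic in the number of cases, so suitable only for small examples.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

    Computing the value separately on the modeling and validation groups, as the abstract does, gives a quick check on whether discrimination holds up out of sample.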

  7. Functional Data Analysis Applied to Modeling of Severe Acute Mucositis and Dysphagia Resulting From Head and Neck Radiation Therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dean, Jamie A., E-mail: jamie.dean@icr.ac.uk; Wong, Kee H.; Gay, Hiram

    Purpose: Current normal tissue complication probability modeling using logistic regression suffers from bias and high uncertainty in the presence of highly correlated radiation therapy (RT) dose data. This hinders robust estimates of dose-response associations and, hence, optimal normal tissue–sparing strategies from being elucidated. Using functional data analysis (FDA) to reduce the dimensionality of the dose data could overcome this limitation. Methods and Materials: FDA was applied to modeling of severe acute mucositis and dysphagia resulting from head and neck RT. Functional partial least squares regression (FPLS) and functional principal component analysis were used for dimensionality reduction of the dose-volume histogram data. The reduced dose data were input into functional logistic regression models (functional partial least squares–logistic regression [FPLS-LR] and functional principal component–logistic regression [FPC-LR]) along with clinical data. This approach was compared with penalized logistic regression (PLR) in terms of predictive performance and the significance of treatment covariate–response associations, assessed using bootstrapping. Results: The area under the receiver operating characteristic curve for the PLR, FPC-LR, and FPLS-LR models was 0.65, 0.69, and 0.67, respectively, for mucositis (internal validation) and 0.81, 0.83, and 0.83, respectively, for dysphagia (external validation). The calibration slopes/intercepts for the PLR, FPC-LR, and FPLS-LR models were 1.6/−0.67, 0.45/0.47, and 0.40/0.49, respectively, for mucositis (internal validation) and 2.5/−0.96, 0.79/−0.04, and 0.79/0.00, respectively, for dysphagia (external validation). The bootstrapped odds ratios indicated significant associations between RT dose and severe toxicity in the mucositis and dysphagia FDA models. Cisplatin was significantly associated with severe dysphagia in the FDA models. None of the covariates was significantly associated with severe toxicity in the PLR models. Dose levels greater than approximately 1.0 Gy/fraction were most strongly associated with severe acute mucositis and dysphagia in the FDA models. Conclusions: FPLS and functional principal component analysis marginally improved predictive performance compared with PLR and provided robust dose-response associations. FDA is recommended for use in normal tissue complication probability modeling.

  8. Functional Data Analysis Applied to Modeling of Severe Acute Mucositis and Dysphagia Resulting From Head and Neck Radiation Therapy.

    PubMed

    Dean, Jamie A; Wong, Kee H; Gay, Hiram; Welsh, Liam C; Jones, Ann-Britt; Schick, Ulrike; Oh, Jung Hun; Apte, Aditya; Newbold, Kate L; Bhide, Shreerang A; Harrington, Kevin J; Deasy, Joseph O; Nutting, Christopher M; Gulliford, Sarah L

    2016-11-15

    Current normal tissue complication probability modeling using logistic regression suffers from bias and high uncertainty in the presence of highly correlated radiation therapy (RT) dose data. This hinders robust estimates of dose-response associations and, hence, optimal normal tissue-sparing strategies from being elucidated. Using functional data analysis (FDA) to reduce the dimensionality of the dose data could overcome this limitation. FDA was applied to modeling of severe acute mucositis and dysphagia resulting from head and neck RT. Functional partial least squares regression (FPLS) and functional principal component analysis were used for dimensionality reduction of the dose-volume histogram data. The reduced dose data were input into functional logistic regression models (functional partial least squares-logistic regression [FPLS-LR] and functional principal component-logistic regression [FPC-LR]) along with clinical data. This approach was compared with penalized logistic regression (PLR) in terms of predictive performance and the significance of treatment covariate-response associations, assessed using bootstrapping. The area under the receiver operating characteristic curve for the PLR, FPC-LR, and FPLS-LR models was 0.65, 0.69, and 0.67, respectively, for mucositis (internal validation) and 0.81, 0.83, and 0.83, respectively, for dysphagia (external validation). The calibration slopes/intercepts for the PLR, FPC-LR, and FPLS-LR models were 1.6/-0.67, 0.45/0.47, and 0.40/0.49, respectively, for mucositis (internal validation) and 2.5/-0.96, 0.79/-0.04, and 0.79/0.00, respectively, for dysphagia (external validation). The bootstrapped odds ratios indicated significant associations between RT dose and severe toxicity in the mucositis and dysphagia FDA models. Cisplatin was significantly associated with severe dysphagia in the FDA models. None of the covariates was significantly associated with severe toxicity in the PLR models. Dose levels greater than approximately 1.0 Gy/fraction were most strongly associated with severe acute mucositis and dysphagia in the FDA models. FPLS and functional principal component analysis marginally improved predictive performance compared with PLR and provided robust dose-response associations. FDA is recommended for use in normal tissue complication probability modeling. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.

  9. Estimating a Logistic Discrimination Functions When One of the Training Samples Is Subject to Misclassification: A Maximum Likelihood Approach.

    PubMed

    Nagelkerke, Nico; Fidler, Vaclav

    2015-01-01

    The problem of discrimination and classification is central to much of epidemiology. Here we consider the estimation of a logistic regression/discrimination function from training samples, when one of the training samples is subject to misclassification or mislabeling, e.g. diseased individuals are incorrectly classified/labeled as healthy controls. We show that this leads to zero-inflated binomial model with a defective logistic regression or discrimination function, whose parameters can be estimated using standard statistical methods such as maximum likelihood. These parameters can be used to estimate the probability of true group membership among those, possibly erroneously, classified as controls. Two examples are analyzed and discussed. A simulation study explores properties of the maximum likelihood parameter estimates and the estimates of the number of mislabeled observations.

  10. Conditional Poisson models: a flexible alternative to conditional logistic case cross-over analysis.

    PubMed

    Armstrong, Ben G; Gasparrini, Antonio; Tobias, Aurelio

    2014-11-24

    The time stratified case cross-over approach is a popular alternative to conventional time series regression for analysing associations between time series of environmental exposures (air pollution, weather) and counts of health outcomes. These are almost always analyzed using conditional logistic regression on data expanded to case-control (case crossover) format, but this has some limitations. In particular adjusting for overdispersion and auto-correlation in the counts is not possible. It has been established that a Poisson model for counts with stratum indicators gives identical estimates to those from conditional logistic regression and does not have these limitations, but it is little used, probably because of the overheads in estimating many stratum parameters. The conditional Poisson model avoids estimating stratum parameters by conditioning on the total event count in each stratum, thus simplifying the computing and increasing the number of strata for which fitting is feasible compared with the standard unconditional Poisson model. Unlike the conditional logistic model, the conditional Poisson model does not require expanding the data, and can adjust for overdispersion and auto-correlation. It is available in Stata, R, and other packages. By applying to some real data and using simulations, we demonstrate that conditional Poisson models were simpler to code and shorter to run than are conditional logistic analyses and can be fitted to larger data sets than possible with standard Poisson models. Allowing for overdispersion or autocorrelation was possible with the conditional Poisson model but when not required this model gave identical estimates to those from conditional logistic regression. Conditional Poisson regression models provide an alternative to case crossover analysis of stratified time series data with some advantages. The conditional Poisson model can also be used in other contexts in which primary control for confounding is by fine stratification.

  11. Susceptibility assessment of earthquake-triggered landslides in El Salvador using logistic regression

    NASA Astrophysics Data System (ADS)

    García-Rodríguez, M. J.; Malpica, J. A.; Benito, B.; Díaz, M.

    2008-03-01

    This work has evaluated the probability of earthquake-triggered landslide occurrence in the whole of El Salvador, with a Geographic Information System (GIS) and a logistic regression model. Slope gradient, elevation, aspect, mean annual precipitation, lithology, land use, and terrain roughness are the predictor variables used to determine the dependent variable of occurrence or non-occurrence of landslides within an individual grid cell. The results illustrate the importance of terrain roughness and soil type as key factors within the model — using only these two variables the analysis returned a significance level of 89.4%. The results obtained from the model within the GIS were then used to produce a map of relative landslide susceptibility.

  12. Three quantitative approaches to the diagnosis of abdominal pain in children: practical applications of decision theory.

    PubMed

    Klein, M D; Rabbani, A B; Rood, K D; Durham, T; Rosenberg, N M; Bahr, M J; Thomas, R L; Langenburg, S E; Kuhns, L R

    2001-09-01

    The authors compared 3 quantitative methods for assisting clinicians in the differential diagnosis of abdominal pain in children, where the most common important endpoint is whether the patient has appendicitis. Pretest probabilities in different age and sex groups were determined to perform Bayesian analysis, binary logistic regression was used to determine which variables were statistically significantly likely to contribute to a diagnosis, and recursive partitioning was used to build decision trees with quantitative endpoints. The records of all children (1,208) seen at a large urban emergency department (ED) with a chief complaint of abdominal pain were immediately reviewed retrospectively (24 to 72 hours after the encounter). Attempts were made to contact all the patients' families to determine an accurate final diagnosis. A total of 1,008 (83%) families were contacted. Data were analyzed by calculation of the posttest probability, recursive partitioning, and binary logistic regression. In all groups the most common diagnosis was abdominal pain (ICD-9 Code 789). After this, however, the order of the most common final diagnoses for abdominal pain varied significantly. The entire group had a pretest probability of appendicitis of 0.06. This varied with age and sex from 0.02 in boys 2 to 5 years old to 0.16 in boys older than 12 years. In boys age 5 to 12, recursive partitioning and binary logistic regression agreed on guarding and anorexia as important variables. Guarding and tenderness were important in girls age 5 to 12. In boys age greater than 12, both agreed on guarding and anorexia. Using sensitivities and specificities from the literature, computed tomography improved the posttest probability for the group from 0.06 to 0.33; ultrasound improved it from 0.06 to 0.48; and barium enema improved it from 0.06 to 0.58. Knowing the pretest probabilities in a specific population allows the physician to evaluate the likely diagnoses first. Other quantitative methods can help judge how much importance a certain criterion should have in the decision making and how much a particular test is likely to influence the probability of a correct diagnosis. It now should be possible to make these sophisticated quantitative methods readily available to clinicians via the computer. Copyright 2001 by W.B. Saunders Company.
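
    The Bayesian step described in this record, moving from a pretest to a posttest probability given a test's sensitivity and specificity, follows directly from Bayes' theorem. The sketch below is a minimal illustration; the function name is hypothetical, and the sensitivity and specificity in the usage lines are placeholder values, since the abstract does not state the literature figures the authors used.

```python
def posttest_probability(pretest, sensitivity, specificity, positive=True):
    """Probability of disease after a test result, by Bayes' theorem.

    positive=True  -> probability of disease given a positive test
    positive=False -> probability of disease given a negative test
    """
    if positive:
        true_pos = sensitivity * pretest
        false_pos = (1.0 - specificity) * (1.0 - pretest)
        return true_pos / (true_pos + false_pos)
    false_neg = (1.0 - sensitivity) * pretest
    true_neg = specificity * (1.0 - pretest)
    return false_neg / (false_neg + true_neg)

# Pretest probability 0.06 is the whole-group figure from the abstract;
# the 0.90 sensitivity and specificity are assumed placeholder values.
after_positive = posttest_probability(0.06, 0.90, 0.90)
after_negative = posttest_probability(0.06, 0.90, 0.90, positive=False)
print(round(after_positive, 3), round(after_negative, 3))
```

    With a low pretest probability, even a fairly accurate test leaves the posttest probability well below certainty, which is the pattern the reported 0.33 to 0.58 values reflect.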

  13. Administrative Climate and Novices' Intent to Remain Teaching

    ERIC Educational Resources Information Center

    Pogodzinski, Ben; Youngs, Peter; Frank, Kenneth A.; Belman, Dale

    2012-01-01

    Using survey data from novice teachers at the elementary and middle school level across 11 districts, multilevel logistic regressions were estimated to examine the association between novices' perceptions of the administrative climate and their desire to remain teaching within their schools. We find that the probability that a novice teacher…

  14. The Effect of Urban Sprawls on Timber Harvesting

    Treesearch

    Stephen A. Barlow; Ian A Munn; David A. Cleaves; David L. Evans

    1998-01-01

    In Mississippi and Alabama, urban population growth is pushing development into rural areas. To study the impact of urbanization on timber harvesting, census and forest inventory data were combined in a geographic information system, and a logistic regression model was used to estimate the relationship between several variables and harvest probabilities....

  15. Outdoor Recreation Constraints: An Examination of Race, Gender, and Rural Dwelling

    Treesearch

    Cassandra Y. Johnson; J. Michael Bowker; H. Ken Cordell

    2001-01-01

    We assess whether traditionally marginalized groups in American society (African-Americans, women, rural dwellers) perceive more constraints to outdoor recreation participation than other groups. A series of logistic regressions are applied to a national recreation survey and used to model the probability that individuals perceive certain constraints to...

  16. Oak regeneration and overstory density in the Missouri Ozarks

    Treesearch

    David R. Larsen; Monte A. Metzger

    1997-01-01

    Reducing overstory density is a commonly recommended method of increasing the regeneration potential of oak (Quercus) forests. However, recommendations seldom specify the probable increase in density or the size of reproduction associated with a given residual overstory density. This paper presents logistic regression models that describe this...

  17. Building a Decision Support System for Inpatient Admission Prediction With the Manchester Triage System and Administrative Check-in Variables.

    PubMed

    Zlotnik, Alexander; Alfaro, Miguel Cuchí; Pérez, María Carmen Pérez; Gallardo-Antolín, Ascensión; Martínez, Juan Manuel Montero

    2016-05-01

    The usage of decision support tools in emergency departments, based on predictive models, capable of estimating the probability of admission for patients in the emergency department may give nursing staff the possibility of allocating resources in advance. We present a methodology for developing and building one such system for a large specialized care hospital using a logistic regression and an artificial neural network model using nine routinely collected variables available right at the end of the triage process. A database of 255.668 triaged nonobstetric emergency department presentations from the Ramon y Cajal University Hospital of Madrid, from January 2011 to December 2012, was used to develop and test the models, with 66% of the data used for derivation and 34% for validation, with an ordered nonrandom partition. On the validation dataset areas under the receiver operating characteristic curve were 0.8568 (95% confidence interval, 0.8508-0.8583) for the logistic regression model and 0.8575 (95% confidence interval, 0.8540-0.8610) for the artificial neural network model. χ² values for Hosmer-Lemeshow fixed "deciles of risk" were 65.32 for the logistic regression model and 17.28 for the artificial neural network model. A nomogram was generated upon the logistic regression model and an automated software decision support system with a Web interface was built based on the artificial neural network model.
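
    A Hosmer-Lemeshow statistic of the kind quoted above compares observed and expected event counts within groups of cases sorted by predicted probability. The sketch below uses ten equal-sized groups, which is an assumption: the abstract's "fixed" grouping may instead use fixed probability cut-points, and the function name and toy inputs are illustrative only.

```python
import numpy as np

def hosmer_lemeshow(y_true, probs, groups=10):
    """Hosmer-Lemeshow chi-square over equal-sized groups sorted by risk.

    For each group: (observed - expected)^2 / (n * mean_p * (1 - mean_p)),
    where expected is the sum of predicted probabilities in the group.
    """
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(probs, dtype=float)
    order = np.argsort(p)                    # sort cases by predicted risk
    stat = 0.0
    for idx in np.array_split(order, groups):
        n = len(idx)
        observed = y[idx].sum()
        expected = p[idx].sum()
        mean_p = expected / n
        stat += (observed - expected) ** 2 / (n * mean_p * (1.0 - mean_p))
    return stat

# Toy check with a single group: 2 events observed, 1 expected.
print(hosmer_lemeshow([1, 1], [0.5, 0.5], groups=1))  # → 2.0
```

    Larger values indicate worse agreement between predicted and observed risk; the statistic is usually referred to a chi-square distribution with groups minus 2 degrees of freedom.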

  18. Integration of logistic regression, Markov chain and cellular automata models to simulate urban expansion

    NASA Astrophysics Data System (ADS)

    Jokar Arsanjani, Jamal; Helbich, Marco; Kainz, Wolfgang; Darvishi Boloorani, Ali

    2013-04-01

    This research analyses the suburban expansion in the metropolitan area of Tehran, Iran. A hybrid model consisting of logistic regression model, Markov chain (MC), and cellular automata (CA) was designed to improve the performance of the standard logistic regression model. Environmental and socio-economic variables dealing with urban sprawl were operationalised to create a probability surface of spatiotemporal states of built-up land use for the years 2006, 2016, and 2026. For validation, the model was evaluated by means of relative operating characteristic values for different sets of variables. The approach was calibrated for 2006 by cross comparing of actual and simulated land use maps. The achieved outcomes represent a match of 89% between simulated and actual maps of 2006, which was satisfactory to approve the calibration process. Thereafter, the calibrated hybrid approach was implemented for forthcoming years. Finally, future land use maps for 2016 and 2026 were predicted by means of this hybrid approach. The simulated maps illustrate a new wave of suburban development in the vicinity of Tehran at the western border of the metropolis during the next decades.

  19. GIS-based rare events logistic regression for mineral prospectivity mapping

    NASA Astrophysics Data System (ADS)

    Xiong, Yihui; Zuo, Renguang

    2018-02-01

    Mineralization is a special type of singularity event, and can be considered as a rare event, because within a specific study area the number of prospective locations (1s) are considerably fewer than the number of non-prospective locations (0s). In this study, GIS-based rare events logistic regression (RELR) was used to map the mineral prospectivity in the southwestern Fujian Province, China. An odds ratio was used to measure the relative importance of the evidence variables with respect to mineralization. The results suggest that formations, granites, and skarn alterations, followed by faults and aeromagnetic anomaly are the most important indicators for the formation of Fe-related mineralization in the study area. The prediction rate and the area under the curve (AUC) values show that areas with higher probability have a strong spatial relationship with the known mineral deposits. Comparing the results with original logistic regression (OLR) demonstrates that the GIS-based RELR performs better than OLR. The prospectivity map obtained in this study benefits the search for skarn Fe-related mineralization in the study area.

  20. Predictive landslide susceptibility mapping using spatial information in the Pechabun area of Thailand

    NASA Astrophysics Data System (ADS)

    Oh, Hyun-Joo; Lee, Saro; Chotikasathien, Wisut; Kim, Chang Hwan; Kwon, Ju Hyoung

    2009-04-01

    For predictive landslide susceptibility mapping, this study applied and verified a probability model (the frequency ratio) and a statistical model (logistic regression) at Pechabun, Thailand, using a geographic information system (GIS) and remote sensing. Landslide locations were identified in the study area from interpretation of aerial photographs and field surveys, and maps of the topography, geology and land cover were compiled into a spatial database. The factors that influence landslide occurrence, such as slope gradient, slope aspect and curvature of topography and distance from drainage, were calculated from the topographic database. Lithology and distance from fault were extracted and calculated from the geology database. Land cover was classified from a Landsat TM satellite image. The frequency ratio and logistic regression coefficient were overlaid for landslide susceptibility mapping as each factor’s ratings. The landslide susceptibility map was then verified and compared using the existing landslide locations. In the verification, the frequency ratio model showed a prediction accuracy of 76.39% and the logistic regression model 70.42%. The method can be used to reduce hazards associated with landslides and to plan land cover.

  1. Sparse Logistic Regression for Diagnosis of Liver Fibrosis in Rat by Using SCAD-Penalized Likelihood

    PubMed Central

    Yan, Fang-Rong; Lin, Jin-Guan; Liu, Yu

    2011-01-01

    The objective of the present study is to find the quantitative relationship between progression of liver fibrosis and the levels of certain serum markers using a mathematical model. We provide a sparse logistic regression using the smoothly clipped absolute deviation (SCAD) penalty function to diagnose liver fibrosis in rats. Not only does it give a sparse solution with high accuracy, it also provides the users with the precise probabilities of classification with the class information. In the simulative case and the experiment case, the proposed method is comparable to the stepwise linear discriminant analysis (SLDA) and the sparse logistic regression with least absolute shrinkage and selection operator (LASSO) penalty, by using receiver operating characteristic (ROC) with Bayesian bootstrap estimating area under the curve (AUC) diagnostic sensitivity for selected variable. Results show that the new approach provides a good correlation between the serum marker levels and the liver fibrosis induced by thioacetamide (TAA) in rats. Meanwhile, this approach might also be used in predicting the development of liver cirrhosis. PMID:21716672

  2. Prospective risk factors for new-onset post-traumatic stress disorder in National Guard soldiers deployed to Iraq.

    PubMed

    Polusny, M A; Erbes, C R; Murdoch, M; Arbisi, P A; Thuras, P; Rath, M B

    2011-04-01

    National Guard troops are at increased risk for post-traumatic stress disorder (PTSD); however, little is known about risk and resilience in this population. The Readiness and Resilience in National Guard Soldiers Study is a prospective, longitudinal investigation of 522 Army National Guard troops deployed to Iraq from March 2006 to July 2007. Participants completed measures of PTSD symptoms and potential risk/protective factors 1 month before deployment. Of these, 81% (n=424) completed measures of PTSD, deployment stressor exposure and post-deployment outcomes 2-3 months after returning from Iraq. New onset of probable PTSD 'diagnosis' was measured by the PTSD Checklist - Military (PCL-M). Independent predictors of new-onset probable PTSD were identified using hierarchical logistic regression analyses. At baseline prior to deployment, 3.7% had probable PTSD. Among soldiers without PTSD symptoms at baseline, 13.8% reported post-deployment new-onset probable PTSD. Hierarchical logistic regression adjusted for gender, age, race/ethnicity and military rank showed that reporting more stressors prior to deployment predicted new-onset probable PTSD [odds ratio (OR) 2.20] as did feeling less prepared for deployment (OR 0.58). After accounting for pre-deployment factors, new-onset probable PTSD was predicted by exposure to combat (OR 2.19) and to combat's aftermath (OR 1.62). Reporting more stressful life events after deployment (OR 1.96) was associated with increased odds of new-onset probable PTSD, while post-deployment social support (OR 0.31) was a significant protective factor in the etiology of PTSD. Combat exposure may be unavoidable in military service members, but other vulnerability and protective factors also predict PTSD and could be targets for prevention strategies.

  3. Logistic regression model for detecting radon prone areas in Ireland.

    PubMed

    Elío, J; Crowley, Q; Scanlon, R; Hodgson, J; Long, S

    2017-12-01

    A new high spatial resolution radon risk map of Ireland has been developed, based on a combination of indoor radon measurements (n=31,910) and relevant geological information (i.e. Bedrock Geology, Quaternary Geology, soil permeability and aquifer type). Logistic regression was used to predict the probability of having an indoor radon concentration above the national reference level of 200 Bq m⁻³ in Ireland. The four geological datasets evaluated were found to be statistically significant, and, based on combinations of these four variables, the predicted probabilities ranged from 0.57% to 75.5%. Results show that the Republic of Ireland may be divided into three main radon risk categories: High (HR), Medium (MR) and Low (LR). The probability of having an indoor radon concentration above 200 Bq m⁻³ in each area was found to be 19%, 8% and 3%, respectively. In the Republic of Ireland, the population affected by radon concentrations above 200 Bq m⁻³ is estimated at ca. 460k (about 10% of the total population). Of these, 57% (265k), 35% (160k) and 8% (35k) are in High, Medium and Low Risk Areas, respectively. Our results provide a high spatial resolution utility that permits customised radon-awareness information to be targeted at specific geographic areas. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. Predicting the probability of elevated nitrate concentrations in the Puget Sound Basin: Implications for aquifer susceptibility and vulnerability

    USGS Publications Warehouse

    Tesoriero, A.J.; Voss, F.D.

    1997-01-01

    The occurrence and distribution of elevated nitrate concentrations (≥ 3 mg/l) in ground water in the Puget Sound Basin, Washington, were determined by examining existing data from more than 3000 wells. Models that estimate the probability that a well has an elevated nitrate concentration were constructed by relating the occurrence of elevated nitrate concentrations to both natural and anthropogenic variables using logistic regression. The variables that best explain the occurrence of elevated nitrate concentrations were well depth, surficial geology, and the percentage of urban and agricultural land within a radius of 3.2 kilometers of the well. From these relations, logistic regression models were developed to assess aquifer susceptibility (relative ease with which contaminants will reach aquifer) and ground-water vulnerability (relative ease with which contaminants will reach aquifer for a given set of land-use practices). Both models performed well at predicting the probability of elevated nitrate concentrations in an independent data set. This approach to assessing aquifer susceptibility and ground-water vulnerability has the advantages of having both model variables and coefficient values determined on the basis of existing water quality information and does not depend on the assignment of variables and weighting factors based on qualitative criteria.

  5. Predicting mortality for five California conifers following wildfire

    Treesearch

    Sharon M. Hood; Sheri L. Smith; Daniel R. Cluck

    2010-01-01

    Fire injury was characterized and survival monitored for 5677 trees >25 cm DBH from five wildfires in California that occurred between 2000 and 2004. Logistic regression models for predicting the probability of mortality 5 years after fire were developed for incense cedar (Calocedrus decurrens (Torr.) Florin), white fir (Abies concolor (Gord. & Glend.) Lindl. ex...

  6. Predicting and Managing Turnover in Human Service Agencies: A Case Study of an Organization in Crisis.

    ERIC Educational Resources Information Center

    Balfour, Danny L.; Neff, Donna M.

    1993-01-01

    A logistic regression model applied to data from 171 child service caseworkers identified variables determining job turnover during times of intense external criticism of the agency (length of service, professional commitment, level of education). A special training program did not significantly reduce the probability of turnover. (SK)

  7. Logistic Regression Modeling for Predicting Task-Related ICT Use in Teaching

    ERIC Educational Resources Information Center

    Askar, Petek; Usluel, Yasemin Kocak; Mumcu, Filiz Kuskaya

    2006-01-01

    The main goal of this study is to estimate the extent to which perceived innovation characteristics are associated with the probability of task related ICT use among secondary school teachers. The tasks were categorized as teaching preparation, teaching delivery, and management. Four hundred and sixteen teachers from secondary schools in Turkey,…

  8. Using occupancy modeling and logistic regression to assess the distribution of shrimp species in lowland streams, Costa Rica: Does regional groundwater create favorable habitat?

    USGS Publications Warehouse

    Snyder, Marcia; Freeman, Mary C.; Purucker, S. Thomas; Pringle, Catherine M.

    2016-01-01

    Freshwater shrimps are an important biotic component of tropical ecosystems. However, they can have a low probability of detection when abundances are low. We sampled 3 of the most common freshwater shrimp species, Macrobrachium olfersii, Macrobrachium carcinus, and Macrobrachium heterochirus, and used occupancy modeling and logistic regression models to improve our limited knowledge of distribution of these cryptic species by investigating both local- and landscape-scale effects at La Selva Biological Station in Costa Rica. Local-scale factors included substrate type and stream size, and landscape-scale factors included presence or absence of regional groundwater inputs. Capture rates for 2 of the sampled species (M. olfersii and M. carcinus) were sufficient to compare the fit of occupancy models. Occupancy models did not converge for M. heterochirus, but M. heterochirus had high enough occupancy rates that logistic regression could be used to model the relationship between occupancy rates and predictors. The best-supported models for M. olfersii and M. carcinus included conductivity, discharge, and substrate parameters. Stream size was positively correlated with occupancy rates of all 3 species. High stream conductivity, which reflects the quantity of regional groundwater input into the stream, was positively correlated with M. olfersii occupancy rates. Boulder substrates increased occupancy rate of M. carcinus and decreased the detection probability of M. olfersii. Our models suggest that shrimp distribution is driven by factors that function at local (substrate and discharge) and landscape (conductivity) scales.

  9. The quest for conditional independence in prospectivity modeling: weights-of-evidence, boost weights-of-evidence, and logistic regression

    NASA Astrophysics Data System (ADS)

    Schaeben, Helmut; Semmler, Georg

    2016-09-01

    The objective of prospectivity modeling is prediction of the conditional probability of the presence T = 1 or absence T = 0 of a target T given favorable or prohibitive predictors B, or construction of a two classes 0,1 classification of T. A special case of logistic regression called weights-of-evidence (WofE) is geologists' favorite method of prospectivity modeling due to its apparent simplicity. However, the numerical simplicity is deceiving as it is implied by the severe mathematical modeling assumption of joint conditional independence of all predictors given the target. General weights of evidence are explicitly introduced which are as simple to estimate as conventional weights, i.e., by counting, but do not require conditional independence. Complementary to the regression view is the classification view on prospectivity modeling. Boosting is the construction of a strong classifier from a set of weak classifiers. From the regression point of view it is closely related to logistic regression. Boost weights-of-evidence (BoostWofE) was introduced into prospectivity modeling to counterbalance violations of the assumption of conditional independence even though relaxation of modeling assumptions with respect to weak classifiers was not the (initial) purpose of boosting. In the original publication of BoostWofE a fabricated dataset was used to "validate" this approach. Using the same fabricated dataset it is shown that BoostWofE cannot generally compensate lacking conditional independence whatever the consecutively processing order of predictors. Thus the alleged features of BoostWofE are disproved by way of counterexamples, while theoretical findings are confirmed that logistic regression including interaction terms can exactly compensate violations of joint conditional independence if the predictors are indicators.
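
    The "estimate by counting" idea behind conventional weights of evidence can be sketched as follows. This is a minimal illustration for a single binary predictor B and target T; the function names and the 2x2 counts are hypothetical, not taken from the paper. With one predictor the posterior is exact; with several predictors, simply summing the weights is exact only under the joint conditional independence assumption the abstract discusses.

```python
import math

def weights_of_evidence(n11, n10, n01, n00):
    """Return (W_plus, W_minus) from 2x2 counts, where nBT is the count with predictor B and target T."""
    t1 = n11 + n01  # cells where the target is present (T = 1)
    t0 = n10 + n00  # cells where the target is absent (T = 0)
    w_plus = math.log((n11 / t1) / (n10 / t0))   # weight applied when B = 1
    w_minus = math.log((n01 / t1) / (n00 / t0))  # weight applied when B = 0
    return w_plus, w_minus

def posterior_probability(prior_logit, weights):
    """Add weights to the prior log-odds and map the result back to a probability."""
    logit = prior_logit + sum(weights)
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical counts: B=1,T=1; B=1,T=0; B=0,T=1; B=0,T=0
n11, n10, n01, n00 = 30, 70, 10, 890
w_plus, w_minus = weights_of_evidence(n11, n10, n01, n00)
prior_logit = math.log((n11 + n01) / (n10 + n00))
print(posterior_probability(prior_logit, [w_plus]))   # P(T=1 | B=1)
print(posterior_probability(prior_logit, [w_minus]))  # P(T=1 | B=0)
```

    For a single predictor the result reduces to the observed fractions n11 / (n11 + n10) and n01 / (n01 + n00), which is a quick sanity check on the weights.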

  10. Estimating the Probability of Elevated Nitrate Concentrations in Ground Water in Washington State

    USGS Publications Warehouse

    Frans, Lonna M.

    2008-01-01

    Logistic regression was used to relate anthropogenic (manmade) and natural variables to the occurrence of elevated nitrate concentrations in ground water in Washington State. Variables that were analyzed included well depth, ground-water recharge rate, precipitation, population density, fertilizer application amounts, soil characteristics, hydrogeomorphic regions, and land-use types. Two models were developed: one with and one without the hydrogeomorphic regions variable. The variables in both models that best explained the occurrence of elevated nitrate concentrations (defined as concentrations of nitrite plus nitrate as nitrogen greater than 2 milligrams per liter) were the percentage of agricultural land use in a 4-kilometer radius of a well, population density, precipitation, soil drainage class, and well depth. Based on the relations between these variables and measured nitrate concentrations, logistic regression models were developed to estimate the probability of nitrate concentrations in ground water exceeding 2 milligrams per liter. Maps of Washington State were produced that illustrate these estimated probabilities for wells drilled to 145 feet below land surface (median well depth) and the estimated depth to which wells would need to be drilled to have a 90-percent probability of drawing water with a nitrate concentration less than 2 milligrams per liter. Maps showing the estimated probability of elevated nitrate concentrations indicated that the agricultural regions are most at risk followed by urban areas. The estimated depths to which wells would need to be drilled to have a 90-percent probability of obtaining water with nitrate concentrations less than 2 milligrams per liter exceeded 1,000 feet in the agricultural regions; whereas, wells in urban areas generally would need to be drilled to depths in excess of 400 feet.
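
    The depth maps described above amount to inverting a fitted logistic model for well depth. The sketch below shows that inversion under loud assumptions: well depth is assumed to enter the linear predictor linearly, and `base_linear` and `depth_coef` are hypothetical values, since the abstract reports neither the model form nor its coefficients.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def depth_for_target(target_prob_elevated, base_linear, depth_coef):
    """Depth at which P(elevated nitrate) equals the target.

    Assumes logit(P) = base_linear + depth_coef * depth, where base_linear
    collects the intercept and all non-depth terms for a given location.
    """
    if depth_coef >= 0:
        raise ValueError("depth_coef must be negative for risk to fall with depth")
    return (logit(target_prob_elevated) - base_linear) / depth_coef

# Hypothetical values: a 90% chance of low nitrate means P(elevated) = 0.10
depth = depth_for_target(0.10, base_linear=1.0, depth_coef=-0.01)
print(round(depth, 1), "feet (illustrative only)")
```

    Locations with a larger `base_linear` (for example, more agricultural land nearby) need deeper wells to reach the same target, which is consistent with the pattern the abstract reports.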

  11. Alternative approach to modeling bacterial lag time, using logistic regression as a function of time, temperature, pH, and sodium chloride concentration.

    PubMed

    Koseki, Shige; Nonaka, Junko

    2012-09-01

    The objective of this study was to develop a probabilistic model to predict the end of lag time (λ) during the growth of Bacillus cereus vegetative cells as a function of temperature, pH, and salt concentration using logistic regression. The developed λ model was subsequently combined with a logistic differential equation to simulate bacterial numbers over time. To develop a novel model for λ, we determined whether bacterial growth had begun, i.e., whether λ had ended, at each time point during the growth kinetics. The growth of B. cereus was evaluated by optical density (OD) measurements in culture media for various pHs (5.5 ∼ 7.0) and salt concentrations (0.5 ∼ 2.0%) at static temperatures (10 ∼ 20°C). The probability of the end of λ was modeled using dichotomous judgments obtained at each OD measurement point concerning whether a significant increase had been observed. The probability of the end of λ was described as a function of time, temperature, pH, and salt concentration and showed a high goodness of fit. The λ model was validated with independent data sets of B. cereus growth in culture media and foods, indicating acceptable performance. Furthermore, the λ model, in combination with a logistic differential equation, enabled a simulation of the population of B. cereus in various foods over time at static and/or fluctuating temperatures with high accuracy. Thus, this newly developed modeling procedure enables the description of λ using observable environmental parameters without any conceptual assumptions and the simulation of bacterial numbers over time with the use of a logistic differential equation.
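
    How a lag-time estimate can be combined with a logistic differential equation is sketched below. This is an assumption-laden illustration, not the authors' model: it uses the textbook logistic equation dN/dt = r N (1 - N / N_max), holds the population constant until a given `lag_end`, and integrates with a simple Euler step. All parameter values are hypothetical.

```python
def simulate_growth(n0, n_max, rate, lag_end, dt, steps):
    """Euler integration of dN/dt = rate * N * (1 - N / n_max), with N held at n0 until lag_end."""
    n = n0
    trajectory = [n]
    for i in range(steps):
        t = i * dt
        if t >= lag_end:  # growth starts only once the lag phase has ended
            n += dt * rate * n * (1.0 - n / n_max)
        trajectory.append(n)
    return trajectory

# Hypothetical parameters: 1e3 cells initially, 1e9 carrying capacity,
# growth rate 0.5 per hour, lag ending at 5 hours, 0.1-hour time step
trajectory = simulate_growth(1e3, 1e9, 0.5, 5.0, 0.1, 1000)
print(trajectory[0], trajectory[-1])
```

    In the study, the end of the lag phase is described probabilistically as a function of time, temperature, pH, and salt concentration; here it is reduced to a single fixed time for brevity.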

  12. Non-ignorable missingness in logistic regression.

    PubMed

    Wang, Joanna J J; Bartlett, Mark; Ryan, Louise

    2017-08-30

    Nonresponses and missing data are common in observational studies. Ignoring or inadequately handling missing data may lead to biased parameter estimation, incorrect standard errors and, as a consequence, incorrect statistical inference and conclusions. We present a strategy for modelling non-ignorable missingness where the probability of nonresponse depends on the outcome. Using a simple case of logistic regression, we quantify the bias in regression estimates and show the observed likelihood is non-identifiable under non-ignorable missing data mechanism. We then adopt a selection model factorisation of the joint distribution as the basis for a sensitivity analysis to study changes in estimated parameters and the robustness of study conclusions against different assumptions. A Bayesian framework for model estimation is used as it provides a flexible approach for incorporating different missing data assumptions and conducting sensitivity analysis. Using simulated data, we explore the performance of the Bayesian selection model in correcting for bias in a logistic regression. We then implement our strategy using survey data from the 45 and Up Study to investigate factors associated with worsening health from the baseline to follow-up survey. Our findings have practical implications for the use of the 45 and Up Study data to answer important research questions relating to health and quality-of-life. Copyright © 2017 John Wiley & Sons, Ltd.

  13. Ensemble of trees approaches to risk adjustment for evaluating a hospital's performance.

    PubMed

    Liu, Yang; Traskin, Mikhail; Lorch, Scott A; George, Edward I; Small, Dylan

    2015-03-01

    A commonly used method for evaluating a hospital's performance on an outcome is to compare the hospital's observed outcome rate to the hospital's expected outcome rate given its patient (case) mix and service. The process of calculating the hospital's expected outcome rate given its patient mix and service is called risk adjustment (Iezzoni 1997). Risk adjustment is critical for accurately evaluating and comparing hospitals' performances since we would not want to unfairly penalize a hospital just because it treats sicker patients. The key to risk adjustment is accurately estimating the probability of an outcome given patient characteristics. For cases with binary outcomes, the method that is commonly used in risk adjustment is logistic regression. In this paper, we consider ensemble of trees methods as alternatives for risk adjustment, including random forests and Bayesian additive regression trees (BART). Both random forests and BART are modern machine learning methods that have been shown recently to have excellent performance for prediction of outcomes in many settings. We apply these methods to carry out risk adjustment for the performance of neonatal intensive care units (NICU). We show that these ensemble of trees methods outperform logistic regression in predicting mortality among babies treated in NICU, and provide a superior method of risk adjustment compared to logistic regression.

  14. Derivation of data-driven triggers for palliative care consultation in critically ill patients.

    PubMed

    Hua, May S; Ma, Xiaoyue; Li, Guohua; Wunsch, Hannah

    2018-04-30

    To examine the ability of existing triggers for intensive care unit (ICU) palliative care consultation to predict 6-month mortality, and derive new triggers for consultation based on risk factors for 6-month mortality. Retrospective cohort study of NY state residents who received intensive care, 2008-2013. We examined sensitivity and specificity of existing triggers for predicting 6-month mortality and used logistic regression to generate patient subgroups at high risk for 6-month mortality as potential novel triggers for ICU palliative care consultation. Of 1,019,849 patients, 195,847 (19.2%) died within 6 months of admission. Existing triggers were specific but not sensitive for predicting 6-month mortality (sensitivity 0.3%-11.1%, specificity 96.5%-99.9% for individual triggers). Using logistic regression, patient subgroups with the highest predicted probability of 6-month mortality were older patients admitted with sepsis (age 70-79 probability 49.7%, [49.5-50.0]) or cancer (non-metastatic cancer, age 70-79 probability 51.5%, [51.1-51.9]; metastatic cancer, age 70-79 probability 60.3%, [59.9-60.6]). Sensitivity and specificity of novel triggers ranged from 0.05% to 9.2% and 98.6% to 99.9%, respectively. Existing triggers for palliative care consultation are specific, but insensitive for 6-month mortality. Using a data-driven approach to derive novel triggers may identify subgroups of patients at high risk of 6-month mortality. Copyright © 2018 Elsevier Inc. All rights reserved.
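
    For readers less familiar with the metrics in this record, the sketch below shows how sensitivity and specificity are computed and checks the reported 6-month mortality fraction. The patient and death totals come from the abstract; the confusion-matrix counts for a single trigger are hypothetical, because the abstract does not report them.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of patients who died that the trigger flagged."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of survivors that the trigger did not flag."""
    return true_neg / (true_neg + false_pos)

# Counts reported in the abstract
patients = 1_019_849
deaths = 195_847
mortality = deaths / patients  # fraction who died within 6 months

# Hypothetical confusion-matrix counts for one trigger (not from the study)
tp, fn, tn, fp = 50, 950, 3960, 40
print(round(mortality, 3), sensitivity(tp, fn), specificity(tn, fp))
```

    A trigger like the hypothetical one above rarely flags survivors but also misses most patients who die, which is the "specific but not sensitive" pattern the abstract describes.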

  15. Mobile phone use during driving: Effects on speed and effectiveness of driver compensatory behaviour.

    PubMed

    Choudhary, Pushpa; Velaga, Nagendra R

    2017-09-01

    This study analysed and modelled the effects of conversation and texting (each with two difficulty levels) on driving performance of Indian drivers in terms of their mean speed and accident avoiding abilities; and further explored the relationship between speed reduction strategy of the drivers and their corresponding accident frequency. 100 drivers of three different age groups (young, mid-age and old-age) participated in the simulator study. Two sudden events of Indian context, unexpected crossing of pedestrians and joining of parked vehicles from the road side, were simulated for estimating the accident probabilities. A generalized linear mixed models approach was used for developing linear regression models for mean speed and binary logistic regression models for accident probability. The results of the models showed that the drivers significantly compensated for the increased workload by reducing their mean speed by 2.62 m/s and 5.29 m/s in the presence of conversation and texting tasks, respectively. The logistic models for accident probabilities showed that the accident probabilities increased by 3 and 4 times, respectively, when the drivers were conversing or texting on a phone during driving. Further, the relationship between the speed reduction patterns and their corresponding accident frequencies showed that all the drivers compensated differently; but, among all the drivers, only a few drivers, who compensated by reducing the speed by 30% or more, were able to fully offset the increased accident risk associated with the phone use. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. Distribution of cavity trees in midwestern old-growth and second-growth forests

    Treesearch

    Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R. Thompson; David R. Larsen

    2003-01-01

    We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...

  17. Distribution of cavity trees in midwestern old-growth and second-growth forests

    Treesearch

    Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R., III Thompson; David R. Larsen

    2003-01-01

    We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...

  18. Educational Subculture and Dropping out in Higher Education: A Longitudinal Case Study

    ERIC Educational Resources Information Center

    Venuleo, C.; Mossi, P.; Salvatore, S.

    2016-01-01

    The paper tests longitudinally the hypothesis that educational subcultures in terms of which students interpret their role and their educational setting affect the probability of dropping out of higher education. A logistic regression model was performed to predict drop out at the beginning of the second academic year for the 823 freshmen of a…

  19. A comparison of two modeling approaches for evaluating wildlife--habitat relationships

    Treesearch

    Ryan A. Long; Jonathan D. Muir; Janet L. Rachlow; John G. Kie

    2009-01-01

    Studies of resource selection form the basis for much of our understanding of wildlife habitat requirements, and resource selection functions (RSFs), which predict relative probability of use, have been proposed as a unifying concept for analysis and interpretation of wildlife habitat data. Logistic regression that contrasts used and available or unused resource units...

  20. Delayed conifer tree mortality following fire in California

    Treesearch

    Sharon M. Hood; Sheri L. Smith; Daniel R. Cluck

    2007-01-01

    Fire injury was characterized and survival monitored for 5,246 trees from five wildfires in California that occurred between 1999 and 2002. Logistic regression models for predicting the probability of mortality were developed for incense-cedar, Jeffrey pine, ponderosa pine, red fir and white fir. Two-year post-fire preliminary models were developed for incense-cedar,...

  1. Ten-year risk-rating systems for California red fir and white fir: development and use

    Treesearch

    George T. Ferrell

    1989-01-01

    Logistic regression equations predicting the probability that a tree will die from natural causes--insects, diseases, intertree competition--within 10 years have been developed for California red fir (Abies magnifica) and white fir (A. concolor). The equations, like those with a 5-year prediction period already developed for these...

  2. A Method for Calculating the Probability of Successfully Completing a Rocket Propulsion Ground Test

    NASA Technical Reports Server (NTRS)

    Messer, Bradley P.

    2004-01-01

    Propulsion ground test facilities face the daily challenges of scheduling multiple customers into limited facility space and successfully completing their propulsion test projects. Due to budgetary and schedule constraints, NASA and industry customers are pushing to test more components, for less money, in a shorter period of time. As these new rocket engine component test programs are undertaken, the lack of technology maturity in the test articles, combined with pushing the test facilities' capabilities to their limits, tends to lead to an increase in facility breakdowns and unsuccessful tests. Over the last five years Stennis Space Center's propulsion test facilities have performed hundreds of tests, collected thousands of seconds of test data, and broken numerous test facility and test article parts. While various initiatives have been implemented to provide better propulsion test techniques and improve the quality, reliability, and maintainability of goods and parts used in the propulsion test facilities, unexpected failures during testing still occur quite regularly due to the harsh environment in which the propulsion test facilities operate. Previous attempts at modeling the lifecycle of a propulsion component test project have met with little success. Each of the attempts suffered from incomplete or inconsistent data on which to base the models. By focusing on the actual test phase of the test project rather than the formulation, design or construction phases, the quality and quantity of available data increase dramatically. A logistic regression model has been developed from the data collected over the last five years, allowing the probability of successfully completing a rocket propulsion component test to be calculated. A logistic regression model is a mathematical modeling approach that can be used to describe the relationship of several independent predictor variables X_1, X_2, ..., X_k to a binary or dichotomous dependent variable Y, where Y can take only one of two possible outcomes, in this case Success or Failure. Logistic regression has primarily been used in the fields of epidemiology and biomedical research, but lends itself to many other applications. As indicated, the use of logistic regression is not new; however, modeling propulsion ground test facilities using logistic regression is both a new and unique application of the statistical technique. Results from the models provide project managers with insight and confidence into the effectiveness of rocket engine component ground test projects. The initial success in modeling rocket propulsion ground test projects clears the way for more complex models to be developed in this area.
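
    The relationship between the predictors and the binary outcome described above can be sketched in a few lines. This is a minimal illustration of the standard logistic model, not the model developed in the report: the function name `success_probability`, the coefficients, and the predictor values are all hypothetical.

```python
import math

def success_probability(intercept, coefficients, predictors):
    """Return P(Y = Success | X_1..X_k) under a logistic regression model."""
    # Linear predictor: b0 + b1*x1 + ... + bk*xk
    linear = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    # The logistic function maps the linear predictor into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients and predictor values, for illustration only
p = success_probability(0.5, [1.2, -0.8], [1.0, 2.0])
print(round(p, 3))
```

    The probability of Failure is simply one minus the returned value, since Y has only two possible outcomes.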

  3. Modeling potential distribution of Oligoryzomys longicaudatus, the Andes virus (Genus: Hantavirus) reservoir, in Argentina.

    PubMed

    Andreo, Verónica; Glass, Gregory; Shields, Timothy; Provensal, Cecilia; Polop, Jaime

    2011-09-01

    We constructed a model to predict the potential distribution of Oligoryzomys longicaudatus, the reservoir of Andes virus (Genus: Hantavirus), in Argentina. We developed an extensive database of occurrence records from published studies and our own surveys and compared two methods to model the probability of O. longicaudatus presence: logistic regression and the MaxEnt algorithm. The environmental variables used were tree, grass and bare soil cover from MODIS imagery, and altitude and 19 bioclimatic variables from the WorldClim database. The models' performance was evaluated and compared using both threshold-dependent and threshold-independent measures. The best models included tree and grass cover, mean diurnal temperature range, and precipitation of the warmest and coldest seasons. The potential distribution maps for O. longicaudatus predicted the highest occurrence probabilities along the Andes range, from 32°S and narrowing southwards. They also predicted high probabilities for the south-central area of Argentina, reaching the Atlantic coast. The Hantavirus Pulmonary Syndrome cases coincided with mean occurrence probabilities of 95 and 77% for logistic and MaxEnt models, respectively. HPS transmission zones in Argentine Patagonia matched the areas with the highest probability of presence. Therefore, colilargos presence probability may provide an approximate risk of transmission and act as an early tool to guide control and prevention plans.

  4. Reanalysis of the start of the UK 1967 to 1968 foot-and-mouth disease epidemic to calculate airborne transmission probabilities.

    PubMed

    Sanson, R L; Gloster, J; Burgin, L

    2011-09-24

    The aims of this study were to statistically reassess the likelihood that windborne spread of foot-and-mouth disease (FMD) virus (FMDV) occurred at the start of the UK 1967 to 1968 FMD epidemic at Oswestry, Shropshire, and to derive dose-response probability of infection curves for farms exposed to airborne FMDV. To enable this, data on all farms present in 1967 in the parishes near Oswestry were assembled. Cases were infected premises whose date of appearance of first clinical signs was within 14 days of the depopulation of the index farm. Logistic regression was used to evaluate the association between infection status and distance and direction from the index farm. The UK Met Office's NAME atmospheric dispersion model (ADM) was used to generate plumes for each day that FMDV was excreted from the index farm based on actual historical weather records from October 1967. Daily airborne FMDV exposure rates for all farms in the study area were calculated using a geographical information system. Probit analyses were used to calculate dose-response probability of infection curves to FMDV, using relative exposure rates on case and control farms. Both the logistic regression and probit analyses gave strong statistical support to the hypothesis that airborne spread occurred. There was some evidence that incubation period was inversely proportional to the exposure rate.

  5. Power and type I error results for a bias-correction approach recently shown to provide accurate odds ratios of genetic variants for the secondary phenotypes associated with primary diseases.

    PubMed

    Wang, Jian; Shete, Sanjay

    2011-11-01

    We recently proposed a bias correction approach to evaluate accurate estimation of the odds ratio (OR) of genetic variants associated with a secondary phenotype, in which the secondary phenotype is associated with the primary disease, based on the original case-control data collected for the purpose of studying the primary disease. As reported in this communication, we further investigated the type I error probabilities and powers of the proposed approach, and compared the results to those obtained from logistic regression analysis (with or without adjustment for the primary disease status). We performed a simulation study based on a frequency-matching case-control study with respect to the secondary phenotype of interest. We examined the empirical distribution of the natural logarithm of the corrected OR obtained from the bias correction approach and found it to be normally distributed under the null hypothesis. On the basis of the simulation study results, we found that the logistic regression approaches that adjust or do not adjust for the primary disease status had low power for detecting secondary phenotype associated variants and highly inflated type I error probabilities, whereas our approach was more powerful for identifying the SNP-secondary phenotype associations and had better-controlled type I error probabilities. © 2011 Wiley Periodicals, Inc.

  6. Optimization of Game Formats in U-10 Soccer Using Logistic Regression Analysis

    PubMed Central

    Amatria, Mario; Arana, Javier; Anguera, M. Teresa; Garzón, Belén

    2016-01-01

    Small-sided games provide young soccer players with better opportunities to develop their skills and progress as individual and team players. There is, however, little evidence on the effectiveness of different game formats in different age groups, and furthermore, these formats can vary between and even within countries. The Royal Spanish Soccer Association replaced the traditional grassroots 7-a-side format (F-7) with the 8-a-side format (F-8) in the 2011-12 season and the country’s regional federations gradually followed suit. The aim of this observational methodology study was to investigate which of these formats best suited the learning needs of U-10 players transitioning from 5-a-side futsal. We built a multiple logistic regression model to predict the success of offensive moves depending on the game format and the area of the pitch in which the move was initiated. Success was defined as a shot at the goal. We also built two simple logistic regression models to evaluate how the game format influenced the acquisition of technical-tactical skills. It was found that the probability of a shot at the goal was higher in F-7 than in F-8 for moves initiated in the Creation Sector-Own Half (0.08 vs 0.07) and the Creation Sector-Opponent's Half (0.18 vs 0.16). The probability was the same (0.04) in the Safety Sector. Children also had more opportunities to control the ball and pass or take a shot in the F-7 format (0.24 vs 0.20), and these were also more likely to be successful in this format (0.28 vs 0.19). PMID:28031768

  7. Predicting Grade 3 Acute Diarrhea During Radiation Therapy for Rectal Cancer Using a Cutoff-Dose Logistic Regression Normal Tissue Complication Probability Model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Robertson, John M., E-mail: jrobertson@beaumont.ed; Soehn, Matthias; Yan Di

    Purpose: Understanding the dose-volume relationship of small bowel irradiation and severe acute diarrhea may help reduce the incidence of this side effect during adjuvant treatment for rectal cancer. Methods and Materials: Consecutive patients treated curatively for rectal cancer were reviewed, and the maximum grade of acute diarrhea was determined. The small bowel was outlined on the treatment planning CT scan, and a dose-volume histogram was calculated for the initial pelvic treatment (45 Gy). Logistic regression models were fitted for varying cutoff-dose levels from 5 to 45 Gy in 5-Gy increments. The model with the highest LogLikelihood was used to develop a cutoff-dose normal tissue complication probability (NTCP) model. Results: There were a total of 152 patients (48% preoperative, 47% postoperative, 5% other), predominantly treated prone (95%) with a three-field technique (94%) and a protracted venous infusion of 5-fluorouracil (78%). Acute Grade 3 diarrhea occurred in 21%. The largest LogLikelihood was found for the cutoff-dose logistic regression model with 15 Gy as the cutoff-dose, although the models for 20 Gy and 25 Gy had similar significance. According to this model, highly significant correlations (p <0.001) between small bowel volumes receiving at least 15 Gy and toxicity exist in the considered patient population. Similar findings applied to both the preoperatively (p = 0.001) and postoperatively irradiated groups (p = 0.001). Conclusion: The incidence of Grade 3 diarrhea was significantly correlated with the volume of small bowel receiving at least 15 Gy using a cutoff-dose NTCP model.

  8. Principal component analysis-based pattern analysis of dose-volume histograms and influence on rectal toxicity.

    PubMed

    Söhn, Matthias; Alber, Markus; Yan, Di

    2007-09-01

    The variability of dose-volume histogram (DVH) shapes in a patient population can be quantified using principal component analysis (PCA). We applied this to rectal DVHs of prostate cancer patients and investigated the correlation of the PCA parameters with late bleeding. PCA was applied to the rectal wall DVHs of 262 patients, who had been treated with a four-field box, conformal adaptive radiotherapy technique. The correlated changes in the DVH pattern were revealed as "eigenmodes," which were ordered by their importance to represent data set variability. Each DVH is uniquely characterized by its principal components (PCs). The correlation of the first three PCs and chronic rectal bleeding of Grade 2 or greater was investigated with uni- and multivariate logistic regression analyses. Rectal wall DVHs in four-field conformal RT can primarily be represented by the first two or three PCs, which describe approximately 94% or 96% of the DVH shape variability, respectively. The first eigenmode models the total irradiated rectal volume; thus, PC1 correlates to the mean dose. Mode 2 describes the interpatient differences of the relative rectal volume in the two- or four-field overlap region. Mode 3 reveals correlations of volumes with intermediate doses (approximately 40-45 Gy) and volumes with doses >70 Gy; thus, PC3 is associated with the maximal dose. According to univariate logistic regression analysis, only PC2 correlated significantly with toxicity. However, multivariate logistic regression analysis with the first two or three PCs revealed an increased probability of bleeding for DVHs with more than one large PC. PCA can reveal the correlation structure of DVHs for a patient population as imposed by the treatment technique and provide information about its relationship to toxicity. It proves useful for augmenting normal tissue complication probability modeling approaches.

  9. Landscape characteristics influence pond occupancy by frogs after accounting for detectability

    USGS Publications Warehouse

    Mazerolle, M.J.; Desrochers, A.; Rochefort, L.

    2005-01-01

    Many investigators have hypothesized that landscape attributes such as the amount and proximity of habitat are important for amphibian spatial patterns. This has produced a number of studies focusing on the effects of landscape characteristics on amphibian patterns of occurrence in patches or ponds, most of which conclude that the landscape is important. We identified two concerns associated with these studies: one deals with their applicability to other landscape types, as most have been conducted in agricultural landscapes; the other highlights the need to account for the probability of detection. We tested the hypothesis that landscape characteristics influence spatial patterns of amphibian occurrence at ponds after accounting for the probability of detection in little-studied peatland landscapes undergoing peat mining. We also illustrated the costs of not accounting for the probability of detection by comparing our results to conventional logistic regression analyses. Results indicate that frog occurrence increased with the percent cover of ponds within 100, 250, and 1000 m, as well as the amount of forest cover within 1000 m. However, forest cover at 250 m had a negative influence on frog presence at ponds. Not accounting for the probability of detection resulted in underestimating the influence of most variables on frog occurrence, whereas a few were overestimated. Regardless, we show that conventional logistic regression can lead to different conclusions than analyses accounting for detectability. Our study is consistent with the hypothesis that landscape characteristics are important in determining the spatial patterns of frog occurrence at ponds. We strongly recommend estimating the probability of detection in field surveys, as this will increase the quality and conservation potential of models derived from such data. © 2005 by the Ecological Society of America.

  10. Impact of prostate weight on probability of positive surgical margins in patients with low-risk prostate cancer after robotic-assisted laparoscopic radical prostatectomy.

    PubMed

    Marchetti, Pablo E; Shikanov, Sergey; Razmaria, Aria A; Zagaja, Gregory P; Shalhav, Arieh L

    2011-03-01

    To evaluate the impact of prostate weight (PW) on probability of positive surgical margin (PSM) in patients undergoing robotic-assisted radical prostatectomy (RARP) for low-risk prostate cancer. The cohort consisted of 690 men with low-risk prostate cancer (clinical stage T1c, prostate-specific antigen <10 ng/mL, biopsy Gleason score ≤6) who underwent RARP with bilateral nerve-sparing at our institution by 1 of 2 surgeons from 2003 to 2009. PW was obtained from the pathologic specimen. The association between probability of PSM and PW was assessed with univariate and multivariate logistic regression analysis. A PSM was identified in 105 patients (15.2%). Patients with PSM had significantly higher prostate-specific antigen (P = .04), smaller prostates (P = .0001), higher Gleason score (P = .004), and higher pathologic stage (P < .0001). After logistic regression, we found a significant inverse relation between PSM and PW (OR 0.97; 95% confidence interval [CI] 0.96, 0.99; P = .0003) in univariate analysis. This remained significant in the multivariate model (OR 0.98; 95% CI 0.96, 0.99; P = .006) adjusting for age, body mass index, surgeon experience, pathologic Gleason score, and pathologic stage. In this multivariate model, the predicted probabilities of PSM for 25-, 50-, 100-, and 150-g prostates were 22% (95% CI 16%, 30%), 13% (95% CI 11%, 16%), 5% (95% CI 1%, 8%), and 1% (95% CI 0%, 3%), respectively. Lower PW is independently associated with higher probability of PSM in low-risk patients undergoing RARP with bilateral nerve-sparing. Copyright © 2011 Elsevier Inc. All rights reserved.
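
    As a hedged illustration of how a per-unit odds ratio compounds, the sketch below assumes that the multivariate OR of 0.98 applies per 1-g increase in prostate weight (the abstract does not state the unit). Under that assumption the odds ratio for a difference of d grams is 0.98^d. The function names are hypothetical and this is not the authors' computation.

```python
def odds_ratio_for_difference(or_per_unit, delta):
    """Odds ratio implied by a change of `delta` units, given a per-unit OR."""
    return or_per_unit ** delta


def shift_probability(p_ref, or_per_unit, delta):
    """Move a reference probability by `delta` units along a logistic model."""
    odds_ref = p_ref / (1.0 - p_ref)  # probability -> odds
    odds_new = odds_ref * odds_ratio_for_difference(or_per_unit, delta)
    return odds_new / (1.0 + odds_new)  # odds -> probability


if __name__ == "__main__":
    # Assumption: start from the 13% reported for a 50-g prostate, move to 100 g.
    p_100 = shift_probability(0.13, 0.98, 50)
    print(f"approximate predicted probability at 100 g: {p_100:.3f}")
```

    With the rounded values from the abstract this lands close to the 5% reported for 100-g prostates. Exact agreement is not expected, because the published odds ratio and probabilities are rounded.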

  11. Relationships between common forest metrics and realized impacts of Hurricane Katrina on forest resources in Mississippi

    Treesearch

    Sonja N. Oswalt; Christopher M. Oswalt

    2008-01-01

    This paper compares and contrasts hurricane-related damage recorded across the Mississippi landscape in the 2 years following Katrina with initial damage assessments based on modeled parameters by the USDA Forest Service. Logistic and multiple regressions are used to evaluate the influence of stand characteristics on tree damage probability. Specifically, this paper...

  12. Can Statistical Modeling Increase Annual Fund Performance? An Experiment at the University of Maryland, College Park.

    ERIC Educational Resources Information Center

    Porter, Stephen R.

    Annual funds face pressures to contact all alumni to maximize participation, but these efforts are costly. This paper uses a logistic regression model to predict likely donors among alumni from the College of Arts & Humanities at the University of Maryland, College Park. Alumni were grouped according to their predicted probability of donating…

  13. Modeling Tree Mortality Following Wildfire in Pinus ponderosa Forests in the Central Sierra Nevada of California

    Treesearch

    Jon C. Regelbrugge

    1993-01-01

    Abstract. We modeled tree mortality occurring two years following wildfire in Pinus ponderosa forests using data from 1275 trees in 25 stands burned during the 1987 Stanislaus Complex fires. We used logistic regression analysis to develop models relating the probability of wildfire-induced mortality with tree size and fire severity for Pinus ponderosa, Calocedrus...

  14. Modelling landscape change in paddy fields using logistic regression and GIS

    NASA Astrophysics Data System (ADS)

    Franjaya, E. E.; Syartinilia; Setiawan, Y.

    2018-05-01

    Paddy field area in Karawang District, an important agricultural region of West Java, has been decreasing since 1994. According to a previous study, paddy fields were predominantly converted into built-up area. Most of the changes occurred in the middle of the district, where roadways, industries, settlements, and commercial buildings exist. These were presumed to be the driving forces, but this still needed to be tested. This study aimed to construct a probability model of paddy field change, from which the driving forces could then be identified. GIS combined with logistic regression on environmental variables was the main method. The ten environmental variables were elevation 0–500 m, elevation >500 m, slope <8%, slope >8%, CBD, built-up area, river, irrigation, toll and national roadway, and collector and local roadway. The results indicated that four variables acted as significant driving forces (slope >8%, CBD area, built-up area, and collector and local roadway). Paddy fields with high, medium, and low probability of change covered about 27.8%, 7.8%, and 64.4% of the area in Karawang, respectively. From a landscape ecology perspective, the recommended response to this landscape change is adaptive management.

  15. [Risk factor analysis of the patients with solitary pulmonary nodules and establishment of a prediction model for the probability of malignancy].

    PubMed

    Wang, X; Xu, Y H; Du, Z Y; Qian, Y J; Xu, Z H; Chen, R; Shi, M H

    2018-02-23

    Objective: This study aims to analyze the relationship among the clinical features, radiologic characteristics and pathological diagnosis in patients with solitary pulmonary nodules (SPNs), and to establish a prediction model for the probability of malignancy. Methods: Clinical data of 372 patients with solitary pulmonary nodules who underwent surgical resection with a definite postoperative pathological diagnosis were retrospectively analyzed. For these cases, we collected clinical and radiologic features including gender, age, smoking history, history of tumor, family history of cancer, the location of the lesion, ground-glass opacity, maximum diameter, calcification, vessel convergence sign, vacuole sign, pleural indentation, spiculation and lobulation. The cases were divided into a modeling group (268 cases) and a validation group (104 cases). A new prediction model was established by logistic regression analysis of the data from the modeling group. The data of the validation group were then used to validate the new model and to compare it with three classical models (Mayo model, VA model and LiYun model). With the probability values calculated by each model for the validation group, SPSS 22.0 was used to draw the receiver operating characteristic curve and assess the predictive value of the new model. Results: 112 benign SPNs and 156 malignant SPNs were included in the modeling group. Multivariable logistic regression analysis showed that gender, age, history of tumor, ground-glass opacity, maximum diameter, and spiculation were independent predictors of malignancy in patients with SPN (P < 0.05). The resulting prediction model for the probability of malignancy is: p = e^x/(1 + e^x), x = -4.8029 - 0.743×gender + 0.057×age + 1.306×history of tumor + 1.305×ground-glass opacity + 0.051×maximum diameter + 1.043×spiculation. 
When the data of the validation group were entered into the four prediction models, the area under the curve of our model was 0.742, which was greater than that of the other models (Mayo 0.696, VA 0.634, LiYun 0.681), although the differences between any two of the four models were not significant (P > 0.05). Conclusions: Age, gender, history of tumor, ground-glass opacity, maximum diameter and spiculation are independent predictors of malignancy in patients with solitary pulmonary nodules. This logistic regression prediction model is not inferior to the classical models in estimating the prognosis of SPNs.
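
    The prediction equation quoted in this abstract can be evaluated directly. The sketch below is a minimal illustration that uses the printed coefficients. The variable coding is an assumption, because the abstract does not give it: binary features are taken as 0/1 (which gender is coded 1 is not stated), age is taken in years, and maximum diameter in millimetres. The function and key names are hypothetical.

```python
import math

# Coefficients copied from the equation printed in the abstract.
INTERCEPT = -4.8029
COEFFICIENTS = {
    "gender": -0.743,               # assumed 0/1 coding; direction not stated
    "age": 0.057,                   # assumed to be in years
    "history_of_tumor": 1.306,      # assumed 0/1
    "ground_glass_opacity": 1.305,  # assumed 0/1
    "max_diameter": 0.051,          # assumed to be in millimetres
    "spiculation": 1.043,           # spiculated margin, the last term of x; assumed 0/1
}


def linear_predictor(**features):
    """Compute x = intercept + sum(coefficient * feature value)."""
    return INTERCEPT + sum(
        coef * features[name] for name, coef in COEFFICIENTS.items()
    )


def malignancy_probability(**features):
    """Apply p = e^x / (1 + e^x), written in the equivalent form 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(**features)))


if __name__ == "__main__":
    # Hypothetical patient, for illustration only.
    p = malignancy_probability(
        gender=1, age=60, history_of_tumor=0,
        ground_glass_opacity=1, max_diameter=20, spiculation=1,
    )
    print(f"estimated probability of malignancy: {p:.3f}")
```

    Passing the features by name makes a missing predictor fail with a KeyError instead of silently counting as zero.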

  16. Sensitivity of Alpine and Subalpine Lakes to Atmospheric Deposition in Grand Teton National Park and Yellowstone National Park, Wyoming

    NASA Astrophysics Data System (ADS)

    Nanus, L.; Campbell, D. H.; Williams, M. W.

    2004-12-01

    Acidification of high-elevation lakes in the Western United States is of concern because of the storage and release of pollutants in snowmelt runoff combined with steep topography, granitic bedrock, and limited soils and biota. Land use managers have limited resources for sampling and thus need direction on how best to design monitoring programs. We evaluated the sensitivity of 400 lakes in Grand Teton (GRTE) and Yellowstone (YELL) National Parks to acidification from atmospheric deposition of nitrogen and sulfur based on statistical relations between acid-neutralizing capacity (ANC) concentrations and basin characteristics to aid in the design of a long-term monitoring plan for Outstanding Natural Resource Waters. ANC concentrations that were measured at 52 lakes in GRTE and 23 lakes in YELL during synoptic surveys were used to calibrate the statistical models. Basin-characteristic information was derived from Geographic Information System data sets. The explanatory variables that were considered included bedrock type, basin slope, basin aspect, basin elevation, lake area, basin area, inorganic nitrogen (N) deposition, sulfate deposition, hydrogen ion deposition, basin precipitation, soil type, and vegetation type. A logistic regression model was developed and applied to lake basins greater than 1 hectare (ha) in GRTE (n=106) and YELL (n=294). For GRTE, 36 percent of lakes had a greater than 60-percent probability of having ANC concentrations less than 100 microequivalents per liter, and 14 percent of lakes had a greater than 80-percent probability of having ANC concentrations less than 100 microequivalents per liter. The elevation of the lake outlet and the area of the basin with northeast aspects were determined to be statistically significant and were used as the explanatory variables in the multivariate logistic regression model. 
For YELL, results indicated that 13 percent of lakes had a greater than 60-percent probability of having ANC concentrations less than 100 microequivalents per liter, and 9 percent of lakes had a greater than 80-percent probability of having ANC concentrations less than 100 microequivalents per liter. Only the elevation of the lake outlet was determined to be statistically significant and was used as the explanatory variable in the multivariate logistic regression model. The lakes that exceeded 80-percent probability of having an ANC concentration less than 100 microequivalents per liter, and therefore had the greatest sensitivity to acidification from atmospheric deposition, are located at elevations greater than 2,810 meters (m) in GRTE, and greater than 2,655 m in YELL.

  17. Investigation of shipping accident injury severity and mortality.

    PubMed

    Weng, Jinxian; Yang, Dong

    2015-03-01

    Shipping movements are operated in a complex and high-risk environment. Fatal shipping accidents are the nightmare of seafarers. Using ten years of worldwide ship accident data, this study develops a binary logistic regression model and a zero-truncated binomial regression model to predict the probability of fatal shipping accidents and the corresponding mortalities. The model results show that both the probability of a fatal accident and the number of mortalities are greater for collision, fire/explosion, contact, grounding, and sinking accidents occurring in adverse weather and in darkness. Sinking has the largest effect on the increase in fatal accident probability and mortalities. The results also show that a larger number of mortalities is associated with shipping accidents occurring far from the coastal area/harbor/port. In addition, cruise ships are found to have more mortalities than non-cruise ships. The results of this study can help policy-makers propose efficient strategies to prevent fatal shipping accidents. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. Breast arterial calcification is associated with reproductive factors in asymptomatic postmenopausal women.

    PubMed

    Bielak, Lawrence F; Whaley, Dana H; Sheedy, Patrick F; Peyser, Patricia A

    2010-09-01

    The etiology of breast arterial calcification (BAC) is not well understood. We examined reproductive history and cardiovascular disease (CVD) risk factor associations with the presence of detectable BAC in asymptomatic postmenopausal women. Reproductive history and CVD risk factors were obtained in 240 asymptomatic postmenopausal women from a community-based research study who had a screening mammogram within 2 years of their participation in the study. The mammograms were reviewed for the presence of detectable BAC. Age-adjusted logistic regression models were fit to assess the association between each risk factor and the presence of BAC. Multiple variable logistic regression models were used to identify the most parsimonious model for the presence of BAC. The prevalence of BAC increased with increased age (p < 0.0001). The most parsimonious logistic regression model for BAC presence included age at time of examination, increased parity (p = 0.01), earlier age at first birth (p = 0.002), weight, and an age-by-weight interaction term (p = 0.004). Older women with a smaller body size had a higher probability of having BAC than women of the same age with a larger body size. The presence or absence of BAC at mammography may provide an assessment of a postmenopausal woman's lifetime estrogen exposure and indicate women who could be at risk for hormonally related conditions.

  19. Development and validation of a mortality risk model for pediatric sepsis.

    PubMed

    Chen, Mengshi; Lu, Xiulan; Hu, Li; Liu, Pingping; Zhao, Wenjiao; Yan, Haipeng; Tang, Liang; Zhu, Yimin; Xiao, Zhenghui; Chen, Lizhang; Tan, Hongzhuan

    2017-05-01

    Pediatric sepsis is a burdensome public health problem. Assessing the mortality risk of pediatric sepsis patients, offering effective treatment guidance, and improving prognosis to reduce mortality rates, are crucial. We extracted data derived from electronic medical records of pediatric sepsis patients that were collected during the first 24 hours after admission to the pediatric intensive care unit (PICU) of the Hunan Children's hospital from January 2012 to June 2014. A total of 788 children were randomly divided into a training (592, 75%) and validation group (196, 25%). The risk factors for mortality among these patients were identified by conducting multivariate logistic regression in the training group. Based on the established logistic regression equation, the logit probabilities for all patients (in both groups) were calculated to verify the model's internal and external validities. According to the training group, 6 variables (brain natriuretic peptide, albumin, total bilirubin, D-dimer, lactate levels, and mechanical ventilation in 24 hours) were included in the final logistic regression model. The areas under the curves of the model were 0.854 (0.826, 0.881) and 0.844 (0.816, 0.873) in the training and validation groups, respectively. The Mortality Risk Model for Pediatric Sepsis we established in this study showed acceptable accuracy to predict the mortality risk in pediatric sepsis patients.

  20. Development and validation of a mortality risk model for pediatric sepsis

    PubMed Central

    Chen, Mengshi; Lu, Xiulan; Hu, Li; Liu, Pingping; Zhao, Wenjiao; Yan, Haipeng; Tang, Liang; Zhu, Yimin; Xiao, Zhenghui; Chen, Lizhang; Tan, Hongzhuan

    2017-01-01

    Abstract Pediatric sepsis is a burdensome public health problem. Assessing the mortality risk of pediatric sepsis patients, offering effective treatment guidance, and improving prognosis to reduce mortality rates, are crucial. We extracted data derived from electronic medical records of pediatric sepsis patients that were collected during the first 24 hours after admission to the pediatric intensive care unit (PICU) of the Hunan Children's hospital from January 2012 to June 2014. A total of 788 children were randomly divided into a training (592, 75%) and validation group (196, 25%). The risk factors for mortality among these patients were identified by conducting multivariate logistic regression in the training group. Based on the established logistic regression equation, the logit probabilities for all patients (in both groups) were calculated to verify the model's internal and external validities. According to the training group, 6 variables (brain natriuretic peptide, albumin, total bilirubin, D-dimer, lactate levels, and mechanical ventilation in 24 hours) were included in the final logistic regression model. The areas under the curves of the model were 0.854 (0.826, 0.881) and 0.844 (0.816, 0.873) in the training and validation groups, respectively. The Mortality Risk Model for Pediatric Sepsis we established in this study showed acceptable accuracy to predict the mortality risk in pediatric sepsis patients. PMID:28514310

  1. Environmental factors and flow paths related to Escherichia coli concentrations at two beaches on Lake St. Clair, Michigan, 2002–2005

    USGS Publications Warehouse

    Holtschlag, David J.; Shively, Dawn; Whitman, Richard L.; Haack, Sheridan K.; Fogarty, Lisa R.

    2008-01-01

    Regression analyses and hydrodynamic modeling were used to identify environmental factors and flow paths associated with Escherichia coli (E. coli) concentrations at Memorial and Metropolitan Beaches on Lake St. Clair in Macomb County, Mich. Lake St. Clair is part of the binational waterway between the United States and Canada that connects Lake Huron with Lake Erie in the Great Lakes Basin. Linear regression, regression-tree, and logistic regression models were developed from E. coli concentration and ancillary environmental data. Linear regression models on log10 E. coli concentrations indicated that rainfall prior to sampling, water temperature, and turbidity were positively associated with bacteria concentrations at both beaches. Flow from Clinton River, changes in water levels, wind conditions, and log10 E. coli concentrations 2 days before or after the target bacteria concentrations were statistically significant at one or both beaches. In addition, various interaction terms were significant at Memorial Beach. Linear regression models for both beaches explained only about 30 percent of the variability in log10 E. coli concentrations. Regression-tree models were developed from data from both Memorial and Metropolitan Beaches but were found to have limited predictive capability in this study. The results indicate that too few observations were available to develop reliable regression-tree models. Linear logistic models were developed to estimate the probability of E. coli concentrations exceeding 300 most probable number (MPN) per 100 milliliters (mL). Rainfall amounts before bacteria sampling were positively associated with exceedance probabilities at both beaches. Flow of Clinton River, turbidity, and log10 E. coli concentrations measured before or after the target E. coli measurements were related to exceedances at one or both beaches. The linear logistic models were effective in estimating bacteria exceedances at both beaches. 
A receiver operating characteristic (ROC) analysis was used to determine cut points for maximizing the true positive rate prediction while minimizing the false positive rate. A two-dimensional hydrodynamic model was developed to simulate horizontal current patterns on Lake St. Clair in response to wind, flow, and water-level conditions at model boundaries. Simulated velocity fields were used to track hypothetical massless particles backward in time from the beaches along flow paths toward source areas. Reverse particle tracking for idealized steady-state conditions shows changes in expected flow paths and traveltimes with wind speeds and directions from 24 sectors. The results indicate that three to four sets of contiguous wind sectors have similar effects on flow paths in the vicinity of the beaches. In addition, reverse particle tracking was used for transient conditions to identify expected flow paths for 10 E. coli sampling events in 2004. These results demonstrate the ability to track hypothetical particles from the beaches, backward in time, to likely source areas. This ability, coupled with a greater frequency of bacteria sampling, may provide insight into changes in bacteria concentrations between source and sink areas.
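
    The ROC step described in this abstract, choosing a cut point that trades true positives against false positives, can be sketched as follows. The abstract does not say which criterion was used; maximizing Youden's J (true positive rate minus false positive rate) is an assumption here, and the function names and sample values are hypothetical.

```python
def roc_points(scores, labels):
    """Return (cut_point, tpr, fpr) for every distinct score used as a cut point.

    An observation is predicted positive when its score >= cut_point.
    Assumes `labels` holds 0/1 values and that both classes are present.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        points.append((cut, tp / n_pos, fp / n_neg))
    return points


def best_cut_point(scores, labels):
    """Pick the cut point with the largest Youden's J = TPR - FPR."""
    return max(roc_points(scores, labels), key=lambda point: point[1] - point[2])


if __name__ == "__main__":
    # Made-up exceedance probabilities and observed exceedances (1 = exceeded).
    probabilities = [0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
    exceeded = [0, 0, 0, 1, 0, 1, 1, 1]
    cut, tpr, fpr = best_cut_point(probabilities, exceeded)
    print(f"cut point {cut:.2f}: TPR {tpr:.2f}, FPR {fpr:.2f}")
```

    Other criteria, such as weighting false negatives more heavily for public-health warnings, would select a different cut point from the same curve.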

  2. Gender Differential Item Functioning on a National Field-Specific Test: The Case of PhD Entrance Exam of TEFL in Iran

    ERIC Educational Resources Information Center

    Ahmadi, Alireza; Bazvand, Ali Darabi

    2016-01-01

    Differential Item Functioning (DIF) exists when examinees of equal ability from different groups have different probabilities of successful performance in a certain item. This study examined gender differential item functioning across the PhD Entrance Exam of TEFL (PEET) in Iran, using both logistic regression (LR) and one-parameter item response…

  3. A model-based approach to estimating forest area

    Treesearch

    Ronald E. McRoberts

    2006-01-01

    A logistic regression model based on forest inventory plot data and transformations of Landsat Thematic Mapper satellite imagery was used to predict the probability of forest for 15 study areas in Indiana, USA, and 15 in Minnesota, USA. Within each study area, model-based estimates of forest area were obtained for circular areas with radii of 5 km, 10 km, and 15 km and...

  4. Speed and Cardiac Recovery Variables Predict the Probability of Elimination in Equine Endurance Events

    PubMed Central

    Younes, Mohamed; Robert, Céline; Cottin, François; Barrey, Eric

    2015-01-01

    Nearly 50% of the horses participating in endurance events are eliminated at a veterinary examination (a vet gate). Detecting unfit horses before a health problem occurs and treatment is required is a challenge for veterinarians but is essential for improving equine welfare. We hypothesized that it would be possible to detect unfit horses earlier in the event by measuring heart rate recovery variables. Hence, the objective of the present study was to compute logistic regressions of heart rate, cardiac recovery time and average speed data recorded at the previous vet gate (n-1) and thus predict the probability of elimination during successive phases (n and following) in endurance events. Speed and heart rate data were extracted from an electronic database of endurance events (80–160 km in length) organized in four countries. Overall, 39% of the horses that started an event were eliminated, mostly due to lameness (64%) or metabolic disorders (15%). For each vet gate, logistic regressions of explanatory variables (average speed, cardiac recovery time and heart rate measured at the previous vet gate) and categorical variables (age and/or event distance) were computed to estimate the probability of elimination. The predictive logistic regressions for vet gates 2 to 5 correctly classified between 62% and 86% of the eliminated horses. The robustness of these results was confirmed by high areas under the receiver operating characteristic curves (0.68–0.84). Overall, a horse has a 70% chance of being eliminated at the next gate if its cardiac recovery time is longer than 11 min at vet gate 1 or 2, or longer than 13 min at vet gates 3 or 4. Heart rate recovery and average speed variables measured at the previous vet gate(s) enabled us to predict elimination at the following vet gate. These variables should be checked at each veterinary examination, in order to detect unfit horses as early as possible. 
Our predictive method may help to improve equine welfare and ethical considerations in endurance events. PMID:26322506

  5. Speed and Cardiac Recovery Variables Predict the Probability of Elimination in Equine Endurance Events.

    PubMed

    Younes, Mohamed; Robert, Céline; Cottin, François; Barrey, Eric

    2015-01-01

    Nearly 50% of the horses participating in endurance events are eliminated at a veterinary examination (a vet gate). Detecting unfit horses before a health problem occurs and treatment is required is a challenge for veterinarians but is essential for improving equine welfare. We hypothesized that it would be possible to detect unfit horses earlier in the event by measuring heart rate recovery variables. Hence, the objective of the present study was to compute logistic regressions of heart rate, cardiac recovery time and average speed data recorded at the previous vet gate (n-1) and thus predict the probability of elimination during successive phases (n and following) in endurance events. Speed and heart rate data were extracted from an electronic database of endurance events (80-160 km in length) organized in four countries. Overall, 39% of the horses that started an event were eliminated, mostly due to lameness (64%) or metabolic disorders (15%). For each vet gate, logistic regressions of explanatory variables (average speed, cardiac recovery time and heart rate measured at the previous vet gate) and categorical variables (age and/or event distance) were computed to estimate the probability of elimination. The predictive logistic regressions for vet gates 2 to 5 correctly classified between 62% and 86% of the eliminated horses. The robustness of these results was confirmed by high areas under the receiver operating characteristic curves (0.68-0.84). Overall, a horse has a 70% chance of being eliminated at the next gate if its cardiac recovery time is longer than 11 min at vet gate 1 or 2, or longer than 13 min at vet gates 3 or 4. Heart rate recovery and average speed variables measured at the previous vet gate(s) enabled us to predict elimination at the following vet gate. These variables should be checked at each veterinary examination, in order to detect unfit horses as early as possible. 
Our predictive method may help to improve equine welfare and ethical considerations in endurance events.

  6. Binary logistic regression modelling: Measuring the probability of relapse cases among drug addict

    NASA Astrophysics Data System (ADS)

    Ismail, Mohd Tahir; Alias, Siti Nor Shadila

    2014-07-01

    For many years Malaysia has faced drug addiction issues. The most serious is relapse among treated drug addicts (addicts who have undergone the rehabilitation programme at the Narcotic Addiction Rehabilitation Centre, PUSPEN). Thus, the main objective of this study is to find the most significant factors contributing to relapse. Binary logistic regression analysis was employed to model the relationship between the independent variables (predictors) and the dependent variable. The dependent variable is the relapse status of the drug addict, coded as 1 (relapse) or 0 (no relapse). The predictors are age, age at first drug use, family history, education level, family crisis, community support and self-motivation. The sample comprises 200 cases, with data provided by AADK (National Antidrug Agency). The study found that age and self-motivation are statistically significant predictors of relapse.

  7. Quantitative Analyses of Pediatric Cervical Spine Ossification Patterns Using Computed Tomography

    PubMed Central

    Yoganandan, Narayan; Pintar, Frank A.; Lew, Sean M.; Rao, Raj D.; Rangarajan, Nagarajan

    2011-01-01

    The objective of the present study was to quantify ossification processes of the human pediatric cervical spine. Computed tomography images were obtained from a high resolution scanner according to clinical protocols. Bone window images were used to identify the presence of the primary synchondroses of the atlas, axis, and C3 vertebrae in 101 children. Principles of logistic regression were used to determine probability distributions as a function of subject age for each synchondrosis for each vertebra. The mean and 95% upper and 95% lower confidence intervals are given for each dataset delineating probability curves. Posterior ossifications preceded bilateral anterior closures of the synchondroses in all vertebrae. However, ossifications occurred at different ages. Logistic regression results for closures of different synchondroses indicated p-values of <0.001 for the atlas, ranging from 0.002 to <0.001 for the axis, and 0.021 to 0.005 for the C3 vertebra. Fifty percent probability of three, two, and one synchondroses occurred at 2.53, 6.97, and 7.57 years of age for the atlas; 3.59, 4.74, and 5.7 years of age for the axis; and 1.28, 2.22, and 3.17 years of age for the third cervical vertebra, respectively. Ossifications occurring at different ages indicate non-uniform maturations of bone growth/strength. They provide an anatomical rationale to reexamine dummies, scaling processes, and injury metrics for improved understanding of pediatric neck injuries. PMID:22105393
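    As a rough illustration of how an age at 50% probability falls out of a fitted one-predictor logistic curve: with p(age) = 1 / (1 + exp(-(b0 + b1*age))), the probability equals 0.5 where b0 + b1*age = 0, that is, at age = -b0/b1. The coefficients and function names below are hypothetical, chosen only for illustration; they are not estimates from the study.

```python
import math


def logistic_probability(age, b0, b1):
    """Probability predicted by a one-predictor logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))


def age_at_probability(p, b0, b1):
    """Invert the logistic curve: age at which the model predicts probability p."""
    logit = math.log(p / (1.0 - p))
    return (logit - b0) / b1


# Hypothetical coefficients (NOT taken from the paper).
B0, B1 = -3.0, 1.2

# At p = 0.5 the logit is 0, so the age reduces to -B0 / B1.
age50 = age_at_probability(0.5, B0, B1)
print(f"age at 50% probability: {age50:.2f} years")
```

    The same inversion gives the age for any other probability level (for example 0.05 or 0.95), which is one way to read ages off a fitted probability curve.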

  8. Spatial patterns of drought persistence in East China

    NASA Astrophysics Data System (ADS)

    Meng, L.; Ford, T.

    2017-12-01

    East China has experienced a number of severe droughts in recent decades. Understanding the characteristics of droughts and their persistence will provide operational guidelines for water resource management and agricultural production. This study uses a logistic regression model to measure the probability of drought occurrence in the current season given the previous season's Standardized Precipitation Index (SPI) and Southern Oscillation Index (SOI) as well as drought persistence. Results reveal large spatial and seasonal variations in the relationship between the previous season's SPI and the drought occurrence probability in a given season. The drought persistence averaged over the entire study area for all four seasons is approximately 34%, with large variations from season to season and from region to region. The East and Northeast regions have the largest summer drought persistence (~40%) and lowest fall drought persistence (~28%). The spatial pattern in winter and spring drought persistence is dissimilar, with stronger winter and weaker spring drought persistence in the Southwest and Northeast relative to other regions. Logistic regression analysis indicates a stronger negative relationship in summer-to-fall (or between fall drought occurrence and summer SPI) than other inter-season relationships. This study demonstrates that the impact of previous season SPI and SOI on current season drought varies substantially from region to region and from season to season. This study also shows stronger drought persistence in summer than in other seasons. In other words, the probability of fall drought occurrence is closely related to summer moisture conditions in East China.

  9. Why credit risk markets are predestined for exhibiting log-periodic power law structures

    NASA Astrophysics Data System (ADS)

    Wosnitza, Jan Henrik; Leker, Jens

    2014-01-01

    Recent research has established the existence of log-periodic power law (LPPL) patterns in financial institutions’ credit default swap (CDS) spreads. The main purpose of this paper is to clarify why credit risk markets are predestined for exhibiting LPPL structures. To this end, the credit risk prediction of two variants of logistic regression, i.e. polynomial logistic regression (PLR) and kernel logistic regression (KLR), are firstly compared to the standard logistic regression (SLR). In doing so, the question whether the performances of rating systems based on balance sheet ratios can be improved by nonlinear transformations of the explanatory variables is resolved. Building on the result that nonlinear balance sheet ratio transformations hardly improve the SLR’s predictive power in our case, we secondly compare the classification performance of a multivariate SLR to the discriminative powers of probabilities of default derived from three different capital market data, namely bonds, CDSs, and stocks. Benefiting from the prompt inclusion of relevant information, the capital market data in general and CDSs in particular increasingly outperform the SLR while approaching the time of the credit event. Due to the higher classification performances, it seems plausible for creditors to align their investment decisions with capital market-based default indicators, i.e., to imitate the aggregate opinion of the market participants. Since imitation is considered to be the source of LPPL structures in financial time series, it is highly plausible to scan CDS spread developments for LPPL patterns. By establishing LPPL patterns in governmental CDS spread trajectories of some European crisis countries, the LPPL’s application to credit risk markets is extended. This novel piece of evidence further strengthens the claim that credit risk markets are adequate breeding grounds for LPPL patterns.

  10. Impact of Colic Pain as a Significant Factor for Predicting the Stone Free Rate of One-Session Shock Wave Lithotripsy for Treating Ureter Stones: A Bayesian Logistic Regression Model Analysis

    PubMed Central

    Chung, Doo Yong; Cho, Kang Su; Lee, Dae Hun; Han, Jang Hee; Kang, Dong Hyuk; Jung, Hae Do; Kown, Jong Kyou; Ham, Won Sik; Choi, Young Deuk; Lee, Joo Yong

    2015-01-01

    Purpose This study was conducted to evaluate colic pain as a prognostic pretreatment factor that can influence ureter stone clearance and to estimate the probability of stone-free status in shock wave lithotripsy (SWL) patients with a ureter stone. Materials and Methods We retrospectively reviewed the medical records of 1,418 patients who underwent their first SWL between 2005 and 2013. Among these patients, 551 had a ureter stone measuring 4–20 mm and were thus eligible for our analyses. Colic pain as the chief complaint was defined as subjective flank pain during history taking and physical examination. Propensity scores for colic pain were calculated for each patient using multivariate logistic regression based upon the following covariates: age, maximal stone length (MSL), and mean stone density (MSD). Each factor was evaluated as a predictor for stone-free status by Bayesian and non-Bayesian logistic regression models. Results After propensity-score matching, 217 patients were extracted in each group from the total patient cohort. There were no statistical differences in variables used in propensity-score matching. One-session success and stone-free rate were also higher in the painful group (73.7% and 71.0%, respectively) than in the painless group (63.6% and 60.4%, respectively). In multivariate non-Bayesian and Bayesian logistic regression models, a painful stone, shorter MSL, and lower MSD were significant factors for one-session stone-free status in patients who underwent SWL. Conclusions Colic pain in patients with ureter calculi was one of the significant predicting factors, along with MSL and MSD, for one-session stone-free status of SWL. PMID:25902059

  11. Predicting Rotator Cuff Tears Using Data Mining and Bayesian Likelihood Ratios

    PubMed Central

    Lu, Hsueh-Yi; Huang, Chen-Yuan; Su, Chwen-Tzeng; Lin, Chen-Chiang

    2014-01-01

    Objectives Rotator cuff tear is a common cause of shoulder diseases. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone. Methods In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation followed by confirmatory MRI between 2007 and 2011 were identified. MRI was used as a reference standard to classify rotator cuff tears. The predictor variables were the clinical assessment results, which consisted of 16 attributes. This study employed 2 data mining methods (ANN and the decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into “tear” and “no tear” groups. Likelihood ratio and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models. Results Our proposed data mining procedures outperformed the classic statistical method. The correction rate, sensitivity, specificity and area under the ROC curve of predicting a rotator cuff tear were statistically better in the ANN and decision tree models compared to logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability of a patient who has a rotator cuff tear using a pretest probability and a prediction result (tear or no tear). Conclusions Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools to classify rotator cuff tears as well as determine the probability of the presence of the disease to enhance diagnostic decision making for rotator cuff tears. PMID:24733553
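    The arithmetic that Fagan's nomogram performs graphically can be sketched in a few lines: convert the pretest probability to odds, multiply by the likelihood ratio of the test result, and convert back. The sensitivity, specificity and pretest probability below are hypothetical placeholders, not values reported in the study, and the function names are illustrative.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = P(test positive | disease) / P(test positive | no disease)."""
    return sensitivity / (1.0 - specificity)


def negative_likelihood_ratio(sensitivity, specificity):
    """LR- = P(test negative | disease) / P(test negative | no disease)."""
    return (1.0 - sensitivity) / specificity


def post_test_probability(pretest_probability, likelihood_ratio):
    """Bayes' rule in odds form: post-test odds = pretest odds * LR."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical inputs (NOT from the paper).
SENS, SPEC, PRETEST = 0.80, 0.90, 0.50

lr_pos = positive_likelihood_ratio(SENS, SPEC)
lr_neg = negative_likelihood_ratio(SENS, SPEC)
p_tear_if_positive = post_test_probability(PRETEST, lr_pos)
p_tear_if_negative = post_test_probability(PRETEST, lr_neg)
print(f"LR+ = {lr_pos:.2f}, P(tear | predicted tear) = {p_tear_if_positive:.3f}")
print(f"LR- = {lr_neg:.2f}, P(tear | predicted no tear) = {p_tear_if_negative:.3f}")
```

    A likelihood ratio of 1 leaves the pretest probability unchanged, which is a quick sanity check on any implementation of this update.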

  12. The association between hyperoxia and patient outcomes after cardiac arrest: Analysis of a high-resolution database

    PubMed Central

    Elmer, Jonathan; Scutella, Michael; Pullalarevu, Raghevesh; Wang, Bo; Vaghasia, Nishit; Trzeciak, Stephen; Rosario-Rivera, Bedda L.; Guyette, Francis X.; Rittenberger, Jon C.; Dezfulian, Cameron

    2014-01-01

    Purpose Previous observational studies have inconsistently associated early hyperoxia with worse outcomes after cardiac arrest and have methodological limitations. We tested this association using a high-resolution database controlling for multiple disease-specific markers of severity of illness and care processes. Methods This was a retrospective analysis of a single-center, prospective registry of consecutive cardiac arrest patients. We included patients who survived and were mechanically ventilated ≥24h after arrest. Our main exposure was arterial oxygen tension (PaO2), which we categorized hourly for 24 hours as severe hyperoxia (>300mmHg), moderate or probable hyperoxia (101-299mmHg), normoxia (60-100mmHg) or hypoxia (<60mmHg). We controlled for Utstein-style covariates, markers of disease severity and markers of care responsiveness. We performed unadjusted and multiple logistic regression to test the association between oxygen exposure and survival to discharge, and used ordered logistic regression to test the association of oxygen exposure with neurological outcome and Sequential Organ Failure Assessment (SOFA) score at 24h. Results Of 184 patients, 36% were exposed to severe hyperoxia and overall mortality was 54%. Severe hyperoxia, but not moderate or probable hyperoxia, was associated with decreased survival in both unadjusted and adjusted analysis (adjusted odds ratio (OR) for survival 0.83 per hour exposure, P=0.04). Moderate or probable hyperoxia was not associated with survival but was associated with improved SOFA score 24h (OR 0.92, P<0.01). Conclusion Severe hyperoxia was independently associated with decreased survival to hospital discharge. Moderate or probable hyperoxia was not associated with decreased survival and was associated with improved organ function at 24h. PMID:25472570
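    To make the "odds ratio per hour" concrete: in a logistic model the log-odds are linear in the exposure, so an odds ratio of 0.83 per hour compounds multiplicatively over several hours. The sketch below uses the reported per-hour odds ratio, but the number of hours and the baseline survival probability are hypothetical, chosen only to show the conversion from odds back to probability.

```python
def cumulative_odds_ratio(or_per_unit, units):
    """Odds ratio for `units` of exposure when log-odds are linear in exposure."""
    return or_per_unit ** units


def apply_odds_ratio(baseline_probability, odds_ratio):
    """Scale the baseline odds by an odds ratio and convert back to a probability."""
    odds = baseline_probability / (1.0 - baseline_probability) * odds_ratio
    return odds / (1.0 + odds)


OR_PER_HOUR = 0.83        # adjusted OR for survival per hour, as reported in the abstract
HOURS = 3                 # hypothetical exposure duration
BASELINE_SURVIVAL = 0.50  # hypothetical survival probability with no exposure

or_total = cumulative_odds_ratio(OR_PER_HOUR, HOURS)
survival_exposed = apply_odds_ratio(BASELINE_SURVIVAL, or_total)
print(f"OR over {HOURS} h: {or_total:.3f}; survival probability: {survival_exposed:.3f}")
```

    Note that an odds ratio is not a risk ratio: the change in probability depends on the baseline, which is why the baseline has to be supplied explicitly here.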

  13. Applicability of the Ricketts' posteroanterior cephalometry for sex determination using logistic regression analysis in Hispano American Peruvians.

    PubMed

    Perez, Ivan; Chavez, Allison K; Ponce, Dario

    2016-01-01

    The Ricketts' posteroanterior (PA) cephalometry seems to be the most widely used and it has not been tested by multivariate statistics for sex determination. The objective was to determine the applicability of Ricketts' PA cephalometry for sex determination using logistic regression analysis. The logistic models were estimated at distinct age cutoffs (all ages, 11 years, 13 years, and 15 years) in a database from 1,296 Hispano American Peruvians between 5 years and 44 years of age. The logistic models were composed of six cephalometric measurements; the accuracy achieved by resubstitution varied between 60% and 70% and all the variables, with one exception, exhibited a direct relationship with the probability of being classified as male; the nasal width exhibited an indirect relationship. The maxillary and facial widths were present in all models and may represent a sexual dimorphism indicator. The accuracy found was lower than that reported in the literature, and the Ricketts' PA cephalometry may not be adequate for sex determination. The indirect relationship of the nasal width in models with data from patients of 12 years of age or less may be a trait related to age or a characteristic of the studied population, which could be better studied and confirmed.

  14. Decoding and modelling of time series count data using Poisson hidden Markov model and Markov ordinal logistic regression models.

    PubMed

    Sebastian, Tunny; Jeyaseelan, Visalakshi; Jeyaseelan, Lakshmanan; Anandan, Shalini; George, Sebastian; Bangdiwala, Shrikant I

    2018-01-01

    Hidden Markov models are stochastic models in which the observations are assumed to follow a mixture distribution, but the parameters of the components are governed by a Markov chain which is unobservable. The issues related to the estimation of Poisson-hidden Markov models, in which the observations come from a mixture of Poisson distributions and the parameters of the component Poisson distributions are governed by an m-state Markov chain with an unknown transition probability matrix, are explained here. These methods were applied to the data on Vibrio cholerae counts reported every month over an 11-year span at Christian Medical College, Vellore, India. Using the Viterbi algorithm, the best estimate of the state sequence was obtained and hence the transition probability matrix. The mean passage time between the states was estimated. The 95% confidence interval for the mean passage time was estimated via Monte Carlo simulation. The three hidden states of the estimated Markov chain are labelled as 'Low', 'Moderate' and 'High' with the mean counts of 1.4, 6.6 and 20.2 and the estimated average duration of stay of 3, 3 and 4 months, respectively. Environmental risk factors were studied using Markov ordinal logistic regression analysis. No significant association was found between disease severity levels and climate components.

  15. Predicting Salmonella populations from biological, chemical, and physical indicators in Florida surface waters.

    PubMed

    McEgan, Rachel; Mootian, Gabriel; Goodridge, Lawrence D; Schaffner, Donald W; Danyluk, Michelle D

    2013-07-01

    Coliforms, Escherichia coli, and various physicochemical water characteristics have been suggested as indicators of microbial water quality or index organisms for pathogen populations. The relationship between the presence and/or concentration of Salmonella and biological, physical, or chemical indicators in Central Florida surface water samples over 12 consecutive months was explored. Samples were taken monthly for 12 months from 18 locations throughout Central Florida (n = 202). Air and water temperature, pH, oxidation-reduction potential (ORP), turbidity, and conductivity were measured. Weather data were obtained from nearby weather stations. Aerobic plate counts and most probable numbers (MPN) for Salmonella, E. coli, and coliforms were performed. Weak linear relationships existed between biological indicators (E. coli/coliforms) and Salmonella levels (R^2 < 0.1) and between physicochemical indicators and Salmonella levels (R^2 < 0.1). The average rainfall (previous day, week, and month) before sampling did not correlate well with bacterial levels. Logistic regression analysis showed that E. coli concentration can predict the probability of enumerating selected Salmonella levels. The lack of good correlations between biological indicators and Salmonella levels and between physicochemical indicators and Salmonella levels shows that the relationship between pathogens and indicators is complex. However, Escherichia coli provides a reasonable way to predict Salmonella levels in Central Florida surface water through logistic regression.

  16. Predicting Salmonella Populations from Biological, Chemical, and Physical Indicators in Florida Surface Waters

    PubMed Central

    McEgan, Rachel; Mootian, Gabriel; Goodridge, Lawrence D.; Schaffner, Donald W.

    2013-01-01

    Coliforms, Escherichia coli, and various physicochemical water characteristics have been suggested as indicators of microbial water quality or index organisms for pathogen populations. The relationship between the presence and/or concentration of Salmonella and biological, physical, or chemical indicators in Central Florida surface water samples over 12 consecutive months was explored. Samples were taken monthly for 12 months from 18 locations throughout Central Florida (n = 202). Air and water temperature, pH, oxidation-reduction potential (ORP), turbidity, and conductivity were measured. Weather data were obtained from nearby weather stations. Aerobic plate counts and most probable numbers (MPN) for Salmonella, E. coli, and coliforms were performed. Weak linear relationships existed between biological indicators (E. coli/coliforms) and Salmonella levels (R^2 < 0.1) and between physicochemical indicators and Salmonella levels (R^2 < 0.1). The average rainfall (previous day, week, and month) before sampling did not correlate well with bacterial levels. Logistic regression analysis showed that E. coli concentration can predict the probability of enumerating selected Salmonella levels. The lack of good correlations between biological indicators and Salmonella levels and between physicochemical indicators and Salmonella levels shows that the relationship between pathogens and indicators is complex. However, Escherichia coli provides a reasonable way to predict Salmonella levels in Central Florida surface water through logistic regression. PMID:23624476

  17. The Integrative Weaning Index in Elderly ICU Subjects.

    PubMed

    Azeredo, Leandro M; Nemer, Sérgio N; Barbas, Carmen Sv; Caldeira, Jefferson B; Noé, Rosângela; Guimarães, Bruno L; Caldas, Célia P

    2017-03-01

    With increasing life expectancy and ICU admission of elderly patients, mechanical ventilation and weaning trials have increased worldwide. We evaluated a cohort with 479 subjects in the ICU. Patients younger than 18 y, tracheostomized, or with neurologic diseases were excluded, resulting in 331 subjects. Subjects ≥70 y old were considered elderly, whereas those <70 y old were considered non-elderly. Besides the conventional weaning indexes, we evaluated the performance of the integrative weaning index (IWI). The probability of successful weaning was investigated using relative risk and logistic regression. The Hosmer-Lemeshow goodness-of-fit test was used to assess calibration, and the C statistic was calculated to evaluate the association between predicted probabilities and observed proportions in the logistic regression model. Prevalence of successful weaning in the sample was 83.7%. There was no difference in mortality between elderly and non-elderly subjects (P = .16), in days of mechanical ventilation (P = .22) and days of weaning (P = .55). In elderly subjects, the IWI was the only respiratory variable associated with mechanical ventilation weaning in this population (P < .001). The IWI was the independent variable found in weaning of elderly subjects that may contribute to the critical moment of this population in intensive care. Copyright © 2017 by Daedalus Enterprises.

  18. Interrelationships Between Receiver/Relative Operating Characteristics Display, Binomial, Logit, and Bayes' Rule Probability of Detection Methodologies

    NASA Technical Reports Server (NTRS)

    Generazio, Edward R.

    2014-01-01

    Unknown risks are introduced into failure critical systems when probability of detection (POD) capabilities are accepted without a complete understanding of the statistical method applied and the interpretation of the statistical results. The presence of this risk in the nondestructive evaluation (NDE) community is revealed in common statements about POD. These statements are often interpreted in a variety of ways and therefore, the very existence of the statements identifies the need for a more comprehensive understanding of POD methodologies. Statistical methodologies have data requirements to be met, procedures to be followed, and requirements for validation or demonstration of adequacy of the POD estimates. Risks are further enhanced due to the wide range of statistical methodologies used for determining the POD capability. Receiver/Relative Operating Characteristics (ROC) Display, simple binomial, logistic regression, and Bayes' rule POD methodologies are widely used in determining POD capability. This work focuses on Hit-Miss data to reveal the framework of the interrelationships between Receiver/Relative Operating Characteristics Display, simple binomial, logistic regression, and Bayes' Rule methodologies as they are applied to POD. Knowledge of these interrelationships leads to an intuitive and global understanding of the statistical data, procedural and validation requirements for establishing credible POD estimates.

  19. Use of ACE-inhibitors and falls in patients with Parkinson's disease.

    PubMed

    Laudisio, Alice; Lo Monaco, Maria Rita; Silveri, Maria Caterina; Bentivoglio, Anna Rita; Vetrano, Davide L; Pisciotta, Maria Stella; Brandi, Vincenzo; Bernabei, Roberto; Zuccalà, Giuseppe

    2017-05-01

    Falls represent a major concern in patients with Parkinson's disease (PD); however, currently acknowledged treatments for PD are not effective in reducing the risk of falling. The aim was to assess the association of use of ACE-inhibitors (ACEIs) and angiotensin receptor blockers (ARBs) with falls among patients with PD. We analysed data of 194 elderly with PD attending a geriatric Day Hospital. Self-reported history of falls that occurred over the last year, as well as use of drugs, including ACEIs and angiotensin II receptor blockers (ARBs) were recorded. The association of the occurrence of any falls with use of ACEIs, and ARBs was assessed by logistic regression analysis. The association between the number of falls and use of ACEIs, and ARBs was assessed according to Poisson regression. In logistic regression, after adjusting for potential confounders, use of ACEIs was associated with a reduced probability of falling over the last year (OR=0.15, 95% CI=0.03-0.81; P=0.028). This association did not vary with blood pressure levels (P for the interaction term=0.528). Also, using Poisson regression, use of ACEIs predicted a reduced number of falls among participants who fell (PR=0.31; 95% CI=0.10-0.94; P=0.039). No association was found between use of ARBs and falls. Our results indicate that use of ACEIs might be independently associated with reduced probability, and a reduced number of falls among patients with PD. Dedicated studies are needed to define the single agents and dosages that might most effectively reduce the risk of falling in clinical practice. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. Prediction of Emergency Department Hospital Admission Based on Natural Language Processing and Neural Networks.

    PubMed

    Zhang, Xingyu; Kim, Joyce; Patzer, Rachel E; Pitts, Stephen R; Patzer, Aaron; Schrager, Justin D

    2017-10-26

    To describe and compare logistic regression and neural network modeling strategies to predict hospital admission or transfer following initial presentation to Emergency Department (ED) triage with and without the addition of natural language processing elements. Using data from the National Hospital Ambulatory Medical Care Survey (NHAMCS), a cross-sectional probability sample of United States EDs from 2012 and 2013 survey years, we developed several predictive models with the outcome being admission to the hospital or transfer vs. discharge home. We included patient characteristics immediately available after the patient has presented to the ED and undergone a triage process. We used this information to construct logistic regression (LR) and multilayer neural network models (MLNN) which included natural language processing (NLP) and principal component analysis from the patient's reason for visit. Ten-fold cross validation was used to test the predictive capacity of each model, and the area under the receiver operating characteristic curve (AUC) was then calculated for each model. Of the 47,200 ED visits from 642 hospitals, 6,335 (13.42%) resulted in hospital admission (or transfer). A total of 48 principal components were extracted by NLP from the reason for visit fields, which explained 75% of the overall variance for hospitalization. In the model including only structured variables, the AUC was 0.824 (95% CI 0.818-0.830) for logistic regression and 0.823 (95% CI 0.817-0.829) for MLNN. Models including only free-text information generated AUC of 0.742 (95% CI 0.731-0.753) for logistic regression and 0.753 (95% CI 0.742-0.764) for MLNN. When both structured variables and free text variables were included, the AUC reached 0.846 (95% CI 0.839-0.853) for logistic regression and 0.844 (95% CI 0.836-0.852) for MLNN.
    The predictive accuracy of hospital admission or transfer for patients who presented to ED triage overall was good, and was improved with the inclusion of free text data from a patient's reason for visit regardless of modeling approach. Natural language processing and neural networks that incorporate patient-reported outcome free text may increase predictive accuracy for hospital admission.
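    For readers unfamiliar with the AUC values quoted above, a minimal sketch of the statistic may help: the AUC equals the probability that a randomly chosen positive case (here, an admitted patient) receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The scores and labels below are made-up toy data, and the pairwise loop is a simple illustration rather than the procedure used in the study.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy predicted probabilities and outcomes (1 = admitted, 0 = discharged); not real data.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
print(f"AUC = {toy_auc:.3f}")
```

    The pairwise loop is quadratic in the number of cases; for data sets of the size described in the abstract a rank-based formulation or a library routine would be the practical choice.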

  1. A comparative analysis of predictive models of morbidity in intensive care unit after cardiac surgery - part II: an illustrative example.

    PubMed

    Cevenini, Gabriele; Barbini, Emanuela; Scolletta, Sabino; Biagioli, Bonizella; Giomarelli, Pierpaolo; Barbini, Paolo

    2007-11-22

    Popular predictive models for estimating morbidity probability after heart surgery are compared critically in a unitary framework. The study is divided into two parts. In the first part modelling techniques and intrinsic strengths and weaknesses of different approaches were discussed from a theoretical point of view. In this second part the performances of the same models are evaluated in an illustrative example. Eight models were developed: Bayes linear and quadratic models, k-nearest neighbour model, logistic regression model, Higgins and direct scoring systems and two feed-forward artificial neural networks with one and two layers. Cardiovascular, respiratory, neurological, renal, infectious and hemorrhagic complications were defined as morbidity. Training and testing sets each of 545 cases were used. The optimal set of predictors was chosen among a collection of 78 preoperative, intraoperative and postoperative variables by a stepwise procedure. Discrimination and calibration were evaluated by the area under the receiver operating characteristic curve and Hosmer-Lemeshow goodness-of-fit test, respectively. Scoring systems and the logistic regression model required the largest set of predictors, while Bayesian and k-nearest neighbour models were much more parsimonious. In testing data, all models showed acceptable discrimination capacities, however the Bayes quadratic model, using only three predictors, provided the best performance. All models showed satisfactory generalization ability: again the Bayes quadratic model exhibited the best generalization, while artificial neural networks and scoring systems gave the worst results. Finally, poor calibration was obtained when using scoring systems, k-nearest neighbour model and artificial neural networks, while Bayes (after recalibration) and logistic regression models gave adequate results. 
    Although all the predictive models showed acceptable discrimination performance in the example considered, the Bayes and logistic regression models seemed better than the others, because they also had good generalization and calibration. The Bayes quadratic model seemed to be a convincing alternative to the much more usual Bayes linear and logistic regression models. It showed its capacity to identify a minimum core of predictors generally recognized as essential to pragmatically evaluate the risk of developing morbidity after heart surgery.

  2. Antimicrobial Activity of Aroma Compounds against Saccharomyces cerevisiae and Improvement of Microbiological Stability of Soft Drinks as Assessed by Logistic Regression

    PubMed Central

    Belletti, Nicoletta; Kamdem, Sylvain Sado; Patrignani, Francesca; Lanciotti, Rosalba; Covelli, Alessandro; Gardini, Fausto

    2007-01-01

    The combined effects of a mild heat treatment (55°C) and the presence of three aroma compounds [citron essential oil, citral, and (E)-2-hexenal] on the spoilage of noncarbonated beverages inoculated with different amounts of a Saccharomyces cerevisiae strain were evaluated. The results, expressed as growth/no growth, were analyzed using a logistic regression in order to assess the probability of beverage spoilage as a function of thermal treatment length, concentration of flavoring agents, and yeast inoculum. The logit models obtained for the three substances were extremely precise. The thermal treatment alone, even if prolonged for 20 min, was not able to prevent yeast growth. However, the presence of increasing concentrations of aroma compounds improved the stability of the products. The inhibiting effect of the compounds was enhanced by a prolonged thermal treatment. In fact, it influenced the vapor pressure of the molecules, which can easily interact within microbial membranes when they are in gaseous form. (E)-2-Hexenal showed a threshold level, related to initial inoculum and thermal treatment length, over which yeast growth was rapidly inhibited. Concentrations over 100 ppm of citral and thermal treatment longer than 16 min allowed a 90% probability of stability for bottles inoculated with 10^5 CFU/bottle. Citron gave the most interesting responses: beverages with 500 ppm of essential oil needed only 3 min of treatment to prevent yeast growth. In this framework, the logistic regression proved to be an important tool to study alternative hurdle strategies for the stabilization of noncarbonated beverages. PMID:17616627

  3. The effect of high leverage points on the logistic ridge regression estimator having multicollinearity

    NASA Astrophysics Data System (ADS)

    Ariffin, Syaiba Balqish; Midi, Habshah

    2014-06-01

    This article is concerned with the performance of the logistic ridge regression estimation technique in the presence of multicollinearity and high leverage points. In logistic regression, multicollinearity exists among predictors and in the information matrix. The maximum likelihood estimator suffers a huge setback in the presence of multicollinearity, which causes regression estimates to have unduly large standard errors. To remedy this problem, a logistic ridge regression estimator is put forward. It is evident that the logistic ridge regression estimator outperforms the maximum likelihood approach for handling multicollinearity. The effect of high leverage points on the performance of the logistic ridge regression estimator is then investigated through a real data set and a simulation study. The findings signify that the logistic ridge regression estimator fails to provide better parameter estimates in the presence of both high leverage points and multicollinearity.

  4. Breast Arterial Calcification Is Associated with Reproductive Factors in Asymptomatic Postmenopausal Women

    PubMed Central

    Whaley, Dana H.; Sheedy, Patrick F.; Peyser, Patricia A.

    2010-01-01

    Objective: The etiology of breast arterial calcification (BAC) is not well understood. We examined reproductive history and cardiovascular disease (CVD) risk factor associations with the presence of detectable BAC in asymptomatic postmenopausal women. Methods: Reproductive history and CVD risk factors were obtained in 240 asymptomatic postmenopausal women from a community-based research study who had a screening mammogram within 2 years of their participation in the study. The mammograms were reviewed for the presence of detectable BAC. Age-adjusted logistic regression models were fit to assess the association between each risk factor and the presence of BAC. Multiple variable logistic regression models were used to identify the most parsimonious model for the presence of BAC. Results: The prevalence of BAC increased with increased age (p < 0.0001). The most parsimonious logistic regression model for BAC presence included age at time of examination, increased parity (p = 0.01), earlier age at first birth (p = 0.002), weight, and an age-by-weight interaction term (p = 0.004). Older women with a smaller body size had a higher probability of having BAC than women of the same age with a larger body size. Conclusions: The presence or absence of BAC at mammography may provide an assessment of a postmenopausal woman's lifetime estrogen exposure and indicate women who could be at risk for hormonally related conditions. PMID:20629578

  5. Sample size determination for logistic regression on a logit-normal distribution.

    PubMed

    Kim, Seongho; Heath, Elisabeth; Heilbrun, Lance

    2017-06-01

    Although the sample size for simple logistic regression can be readily determined using currently available methods, the sample size calculation for multiple logistic regression requires some additional information, such as the coefficient of determination ([Formula: see text]) of a covariate of interest with other covariates, which is often unavailable in practice. The response variable of logistic regression follows a logit-normal distribution which can be generated from a logistic transformation of a normal distribution. Using this property of logistic regression, we propose new methods of determining the sample size for simple and multiple logistic regressions using a normal transformation of outcome measures. Simulation studies and a motivating example show several advantages of the proposed methods over the existing methods: (i) no need for [Formula: see text] for multiple logistic regression, (ii) available interim or group-sequential designs, and (iii) much smaller required sample size.
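
    The logit-normal construction mentioned in this abstract (a logistic transformation applied to a normal variate) can be illustrated with a minimal sketch. This is not the authors' sample-size method; the function name and the parameter values (mu = 0, sigma = 1, n = 10,000) are assumptions chosen only for illustration.

```python
import math
import random


def sample_logit_normal(mu, sigma, n, seed=0):
    """Draw n logit-normal values: expit(Z) with Z ~ Normal(mu, sigma).

    Hypothetical helper for illustration; not taken from the cited paper.
    """
    rng = random.Random(seed)
    # The logistic (expit) transform maps any real number into (0, 1).
    return [1.0 / (1.0 + math.exp(-rng.gauss(mu, sigma))) for _ in range(n)]


# Assumed illustrative parameters.
samples = sample_logit_normal(mu=0.0, sigma=1.0, n=10_000)
mean_p = sum(samples) / len(samples)
```

    With mu = 0 the distribution is symmetric about 0.5, so the sample mean should land close to 0.5; every draw lies strictly between 0 and 1, as a probability must.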

  6. Statin, testosterone and phosphodiesterase 5-inhibitor treatments and age related mortality in diabetes

    PubMed Central

    Hackett, Geoffrey; Jones, Peter W; Strange, Richard C; Ramachandran, Sudarshan

    2017-01-01

    AIM To determine how statins, testosterone (T) replacement therapy (TRT) and phosphodiesterase 5-inhibitors (PDE5I) influence age related mortality in diabetic men. METHODS We studied 857 diabetic men screened for the BLAST study, stratifying them (mean follow-up = 3.8 years) into: (1) Normal T levels/untreated (total T > 12 nmol/L and free T > 0.25 nmol/L), Low T/untreated and Low T/treated; (2) PDE5I/untreated and PDE5I/treated; and (3) statin/untreated and statin/treated groups. The relationship between age and mortality, alone and with T/TRT, statin and PDE5I treatment was studied using logistic regression. Mortality probability and 95%CI were calculated from the above models for each individual. RESULTS Age was associated with mortality (logistic regression, OR = 1.10, 95%CI: 1.08-1.13, P < 0.001). With all factors included, age (OR = 1.08, 95%CI: 1.06-1.11, P < 0.001), Low T/treated (OR = 0.38, 95%CI: 0.15-0.92, P = 0.033), PDE5I/treated (OR = 0.17, 95%CI: 0.053-0.56, P = 0.004) and statin/treated (OR = 0.59, 95%CI: 0.36-0.97, P = 0.038) were associated with lower mortality. Age related mortality was as described by Gompertz, r2 = 0.881 when Ln (mortality) was plotted against age. The probability of mortality and 95%CI (from logistic regression) of individuals, treated/untreated with the drugs, alone and in combination was plotted against age. Overlap of 95%CI lines was evident with statins and TRT. No overlap was evident with PDE5I alone and with statins and TRT, this suggesting a change in the relationship between age and mortality. CONCLUSION We show that statins, PDE5I and TRT reduce mortality in diabetes. PDE5I, alone and with the other treatments significantly alter age related mortality in diabetic men. PMID:28344753
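
    The per-year odds ratio reported above (OR = 1.10 for age) acts multiplicatively on the odds, not on the probability. The sketch below shows the arithmetic; the baseline mortality probability of 0.05 is a hypothetical value, not a figure from the study, and the function name is likewise an assumption.

```python
def apply_odds_ratio(p_baseline, odds_ratio, units=1.0):
    """Return the probability after multiplying the baseline odds by odds_ratio**units."""
    if not 0.0 < p_baseline < 1.0:
        raise ValueError("p_baseline must lie strictly between 0 and 1")
    odds = p_baseline / (1.0 - p_baseline)      # probability -> odds
    shifted_odds = odds * odds_ratio ** units   # odds ratios compound per unit
    return shifted_odds / (1.0 + shifted_odds)  # odds -> probability


# Hypothetical baseline of 5% mortality; OR = 1.10 per year is from the abstract.
p_ten_years_older = apply_odds_ratio(0.05, 1.10, units=10)
```

    Under these assumptions, ten additional years multiply the odds by about 2.59 (1.10 raised to the tenth power), which moves a 5% baseline probability to roughly 12%.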

  7. Risk Factors for Venous Thromboembolism After Spine Surgery

    PubMed Central

    Tominaga, Hiroyuki; Setoguchi, Takao; Tanabe, Fumito; Kawamura, Ichiro; Tsuneyoshi, Yasuhiro; Kawabata, Naoya; Nagano, Satoshi; Abematsu, Masahiko; Yamamoto, Takuya; Yone, Kazunori; Komiya, Setsuro

    2015-01-01

    Abstract The efficacy and safety of chemical prophylaxis to prevent the development of deep venous thrombosis (DVT) or pulmonary embolism (PE) following spine surgery are controversial because of the possibility of epidural hematoma formation. Postoperative venous thromboembolism (VTE) after spine surgery occurs at a frequency similar to that seen after joint operations, so it is important to identify the risk factors for VTE formation following spine surgery. We therefore retrospectively studied data from patients who had undergone spinal surgery and developed postoperative VTE to identify those risk factors. We conducted a retrospective clinical study with logistic regression analysis of a group of 80 patients who had undergone spine surgery at our institution from June 2012 to August 2013. All patients had been screened by ultrasonography for DVT in the lower extremities. Parameters of the patients with VTE were compared with those without VTE using the Mann–Whitney U-test and Fisher exact probability test. Logistic regression analysis was used to analyze the risk factors associated with VTE. A value of P < 0.05 was used to denote statistical significance. The prevalence of VTE was 25.0% (20/80 patients). One patient had sensed some incongruity in the chest area, but the vital signs of all patients were stable. VTEs had developed in the pulmonary artery in one patient, in the superficial femoral vein in one patient, in the popliteal vein in two patients, and in the soleal vein in 18 patients. The Mann–Whitney U-test and Fisher exact probability test showed that, except for preoperative walking disability, none of the parameters showed a significant difference between patients with and without VTE. Risk factors identified in the multivariate logistic regression analysis were preoperative walking disability and age. The prevalence of VTE after spine surgery was relatively high. 
The most important risk factor for developing postoperative VTE was preoperative walking disability. Gait training during the early postoperative period is required to prevent VTE. PMID:25654385

  8. Identifying predictors of childhood anaemia in north-east India.

    PubMed

    Dey, Sanku; Goswami, Sankar; Dey, Tanujit

    2013-12-01

    The objective of this study is to examine the factors that influence the occurrence of childhood anaemia in North-East India by exploring dataset of the Reproductive and Child Health-II Survey (RCH-II). The study population consisted of 10,137 children in the age-group of 0-6 year(s) from North-East India to explore the predictors of childhood anaemia by means of different background characteristics, such as place of residence, religion, household standard of living, literacy of mother, total children ever born to a mother, age of mother at marriage. Prevalence of anaemia among children was taken as a polytomous variable. The predicted probabilities of anaemia were established via multinomial logistic regression model. These probabilities provided the degree of assessment of the contribution of predictors in the prevalence of childhood anaemia. The mean haemoglobin concentration in children aged 0-6 year(s) was found to be 11.85 g/dL, with a standard deviation of 5.61 g/dL. The multiple logistic regression analysis showed that rural children were at greater risk of severe (OR = 2.035; p = 0.003) and moderate (OR = 1.23; p = 0.003) anaemia. All types of anaemia (severe, moderate, and mild) were more prevalent among Hindu children (OR = 2.971; p = 0.000), (OR = 1.195; p = 0.010), and (OR = 1.201; p = 0.011) than among children of other religions whereas moderate (OR = 1.406; p = 0.001) and mild (OR = 1.857; p=0.000) anaemia were more prevalent among Muslim children. The fecundity of the mother was found to have significant effect on anaemia. Women with multiple children were prone to greater risk of anaemia. The multiple logistic regression analysis also confirmed that children of literate mothers were comparatively at lesser risk of severe anaemia. Mother's age at marriage had a significant effect on anaemia of their children as well.

  9. Acute convexity subarachnoid haemorrhage and cortical superficial siderosis in probable cerebral amyloid angiopathy without lobar haemorrhage.

    PubMed

    Charidimou, Andreas; Boulouis, Grégoire; Fotiadis, Panagiotis; Xiong, Li; Ayres, Alison M; Schwab, Kristin M; Gurol, Mahmut Edip; Rosand, Jonathan; Greenberg, Steve M; Viswanathan, Anand

    2018-04-01

    Acute non-traumatic convexity subarachnoid haemorrhage (cSAH) is increasingly recognised in cerebral amyloid angiopathy (CAA). We investigated: (a) the overlap between acute cSAH and cortical superficial siderosis-a new CAA haemorrhagic imaging signature and (b) whether acute cSAH presents with particular clinical symptoms in patients with probable CAA without lobar intracerebral haemorrhage. MRI scans of 130 consecutive patients meeting modified Boston criteria for probable CAA were analysed for cortical superficial siderosis (focal, ≤3 sulci; disseminated, ≥4 sulci), and key small vessel disease markers. We compared clinical, imaging and cortical superficial siderosis topographical mapping data between subjects with versus without acute cSAH, using multivariable logistic regression. We included 33 patients with probable CAA presenting with acute cSAH and 97 without cSAH at presentation. Patients with acute cSAH were more commonly presenting with transient focal neurological episodes (76% vs 34%; p<0.0001) compared with patients with CAA without cSAH. Patients with acute cSAH were also more often clinically presenting with transient focal neurological episodes compared with cortical superficial siderosis-positive, but cSAH-negative subjects with CAA (76% vs 30%; p<0.0001). Cortical superficial siderosis prevalence (but no other CAA severity markers) was higher among patients with cSAH versus those without, especially disseminated cortical superficial siderosis (49% vs 19%; p<0.0001). In multivariable logistic regression, cortical superficial siderosis burden (OR 5.53; 95% CI 2.82 to 10.8, p<0.0001) and transient focal neurological episodes (OR 11.7; 95% CI 2.70 to 50.6, p=0.001) were independently associated with acute cSAH. This probable CAA cohort provides additional evidence for distinct disease phenotypes, determined by the presence of cSAH and cortical superficial siderosis. 

  10. A comparison of Cox and logistic regression for use in genome-wide association studies of cohort and case-cohort design.

    PubMed

    Staley, James R; Jones, Edmund; Kaptoge, Stephen; Butterworth, Adam S; Sweeting, Michael J; Wood, Angela M; Howson, Joanna M M

    2017-06-01

    Logistic regression is often used instead of Cox regression to analyse genome-wide association studies (GWAS) of single-nucleotide polymorphisms (SNPs) and disease outcomes with cohort and case-cohort designs, as it is less computationally expensive. Although Cox and logistic regression models have been compared previously in cohort studies, this work does not completely cover the GWAS setting nor extend to the case-cohort study design. Here, we evaluated Cox and logistic regression applied to cohort and case-cohort genetic association studies using simulated data and genetic data from the EPIC-CVD study. In the cohort setting, there was a modest improvement in power to detect SNP-disease associations using Cox regression compared with logistic regression, which increased as the disease incidence increased. In contrast, logistic regression had more power than (Prentice weighted) Cox regression in the case-cohort setting. Logistic regression yielded inflated effect estimates (assuming the hazard ratio is the underlying measure of association) for both study designs, especially for SNPs with greater effect on disease. Given that logistic regression is substantially more computationally efficient than Cox regression in both settings, we propose a two-step approach to GWAS in cohort and case-cohort studies: first, analyse all SNPs with logistic regression to identify associated variants below a pre-defined P-value threshold; second, fit Cox regression (appropriately weighted in case-cohort studies) to those identified SNPs to ensure accurate estimation of association with disease.

  11. The crux of the method: assumptions in ordinary least squares and logistic regression.

    PubMed

    Long, Rebecca G

    2008-10-01

    Logistic regression has increasingly become the tool of choice when analyzing data with a binary dependent variable. While resources relating to the technique are widely available, clear discussions of why logistic regression should be used in place of ordinary least squares regression are difficult to find. The current paper compares and contrasts the assumptions of ordinary least squares with those of logistic regression and explains why logistic regression's looser assumptions make it adept at handling violations of the more important assumptions in ordinary least squares.

  12. Mental health status and healthcare utilization among community dwelling older adults.

    PubMed

    Adepoju, Omolola; Lin, Szu-Hsuan; Mileski, Michael; Kruse, Clemens Scott; Mask, Andrew

    2018-04-27

    Shifts in mental health utilization patterns are necessary to allow for meaningful access to care for vulnerable populations. There have been long-standing issues in how mental health care is provided, which have limited the efficacy of that care for those seeking it. The objective was to assess the relationship between mental health status and healthcare utilization among adults ≥65 years. A negative binomial regression model was used to assess the relationship between mental health status and healthcare utilization related to office-based physician visits, while a two-part model, consisting of logistic regression and negative binomial regression, was used to separately model emergency visits and inpatient services. The receipt of care in office-based settings was marginally higher for subjects with mental health difficulties. Both probabilities and counts of inpatient hospitalizations were similar across mental health categories. The count of ER visits was similar across mental health categories; however, the probability of having an emergency department visit was marginally higher for older adults who reported mental health difficulties in 2012. These findings are encouraging and lend promise to the recent initiatives on addressing gaps in mental healthcare services.

  13. Probability of foliar injury for Acer sp. based on foliar fluoride concentrations.

    PubMed

    McDonough, Andrew M; Dixon, Murray J; Terry, Debbie T; Todd, Aaron K; Luciani, Michael A; Williamson, Michele L; Roszak, Danuta S; Farias, Kim A

    2016-12-01

    Fluoride is considered one of the most phytotoxic elements to plants, and indicative fluoride injury has been associated with a wide range of foliar fluoride concentrations. The aim of this study was to determine the probability of indicative foliar fluoride injury based on Acer sp. foliar fluoride concentrations using a logistic regression model. Foliage from Acer negundo, Acer saccharinum, Acer saccharum and Acer platanoides was collected along a distance gradient from three separate brick manufacturing facilities in southern Ontario as part of a long-term monitoring programme between 1995 and 2014. Hydrogen fluoride is the major emission associated with the manufacturing facilities, resulting in highly elevated foliar fluoride close to the facilities that decreases with distance. Consistent with other studies, indicative fluoride injury was observed over a wide range of foliar concentrations (9.9-480.0 μg F⁻ g⁻¹). The logistic regression model was statistically significant for the Acer sp. group, A. negundo and A. saccharinum, with A. negundo being the most sensitive species among the group. In contrast, A. saccharum and A. platanoides were not statistically significant within the model. We are unaware of published foliar fluoride values for Acer sp. within Canada, and this research provides policy makers and scientists with probabilities of indicative foliar injury for common urban Acer sp. trees that can help guide decisions about emissions controls. Further research should focus on mechanisms driving indicative fluoride injury over wide ranging foliar fluoride concentrations and help determine foliar fluoride thresholds for damage.

  14. Prediction models for clustered data: comparison of a random intercept and standard regression model

    PubMed Central

    2013-01-01

    Background When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Methods Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. Results The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. 
Conclusion The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters. PMID:23414436

  15. Prediction models for clustered data: comparison of a random intercept and standard regression model.

    PubMed

    Bouwmeester, Walter; Twisk, Jos W R; Kappen, Teus H; van Klei, Wilton A; Moons, Karel G M; Vergouwe, Yvonne

    2013-02-15

    When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. 
The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters.

  16. Binary Logistic Regression Versus Boosted Regression Trees in Assessing Landslide Susceptibility for Multiple-Occurring Regional Landslide Events: Application to the 2009 Storm Event in Messina (Sicily, southern Italy).

    NASA Astrophysics Data System (ADS)

    Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.

    2014-12-01

    This study aims at comparing the performances of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams, both stretching for a few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian sea. This area was struck on the 1st October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly of debris flow and debris avalanche types involving the weathered layer of a low to high grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; in addition, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-fold tests so as to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performances, with a general congruence between BLR and BRT in the predictor importance. In particular, the research highlighted that BRT-models reached a higher prediction performance with respect to BLR-models, for RP based modelling, whilst for the SP-based models the difference in predictive skills between the two methods dropped drastically, converging to an analogous excellent performance. 
However, when looking at the precision of the probability estimates, BLR proved to produce more robust models in terms of selected predictors and coefficients, as well as the dispersion of the estimated probabilities around the mean value for each mapped pixel. The difference in the behaviour could be interpreted as the result of overfitting effects, which affect decision tree classification more heavily than logistic regression techniques.

  17. Using Dominance Analysis to Determine Predictor Importance in Logistic Regression

    ERIC Educational Resources Information Center

    Azen, Razia; Traxel, Nicole

    2009-01-01

    This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R² analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…

  18. A model to predict progression in brain-injured patients.

    PubMed

    Tommasino, N; Forteza, D; Godino, M; Mizraji, R; Alvarez, I

    2014-11-01

    The study of brain death (BD) epidemiology and the acute brain injury (ABI) progression profile is important to improve public health programs, organ procurement strategies, and intensive care unit (ICU) protocols. The purpose of this study was to analyze the ABI progression profile among patients admitted to ICUs with a Glasgow Coma Score (GCS) ≤8, as well as establishing a prediction model of probability of death and BD. This was a retrospective analysis of prospective data that included all brain-injured patients with GCS ≤8 admitted to a total of four public and private ICUs in Uruguay (N = 1447). The independent predictor factors of death and BD were studied using logistic regression analysis. A hierarchical model consisting of 2 nested logit regression models was then created. With these models, the probabilities of death, BD, and death by cardiorespiratory arrest were analyzed. In the first regression, we observed that as the GCS decreased and age increased, the probability of death rose. Each additional year of age increased the probability of death by 0.014. In the second model, however, BD risk decreased with each year of age. The presence of swelling, mass effect, and/or space-occupying lesion increased BD risk for the same given GCS. In the presence of injuries compatible with intracranial hypertension, age behaved as a protective factor that reduced the probability of BD. Based on the analysis of the local epidemiology, a model to predict the probability of death and BD can be developed. The organ potential donation of a country, region, or hospital can be predicted on the basis of this model, customizing it to each specific situation.
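
    One way to read the two nested logit models described above is as a first model for death and a second model for brain death conditional on death. That conditional structure is an assumption made for this sketch (the abstract does not spell it out), and the linear-predictor values below are hypothetical rather than the study's fitted coefficients.

```python
import math


def expit(x):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def nested_outcome_probabilities(lp_death, lp_bd_given_death):
    """Combine two nested logit models into three mutually exclusive outcomes.

    lp_death: linear predictor of the (assumed) first model, P(death).
    lp_bd_given_death: linear predictor of the (assumed) second model, P(BD | death).
    """
    p_death = expit(lp_death)
    p_bd_given_death = expit(lp_bd_given_death)
    return {
        "survival": 1.0 - p_death,
        "brain_death": p_death * p_bd_given_death,
        "cardiorespiratory_death": p_death * (1.0 - p_bd_given_death),
    }


# Hypothetical linear predictors for a single patient.
probs = nested_outcome_probabilities(lp_death=0.4, lp_bd_given_death=-0.8)
```

    Because the second model only splits the probability mass already assigned to death, the three outcome probabilities always sum to 1.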

  19. Hospital of diagnosis and probability of having surgical treatment for resectable gastric cancer.

    PubMed

    van Putten, M; Verhoeven, R H A; van Sandick, J W; Plukker, J T M; Lemmens, V E P P; Wijnhoven, B P L; Nieuwenhuijzen, G A P

    2016-02-01

    Gastric cancer surgery is increasingly being centralized in the Netherlands, whereas the diagnosis is often made in hospitals where gastric cancer surgery is not performed. The aim of this study was to assess whether hospital of diagnosis affects the probability of undergoing surgery and its impact on overall survival. All patients with potentially curable gastric cancer according to stage (cT1/1b-4a, cN0-2, cM0) diagnosed between 2005 and 2013 were selected from The Netherlands Cancer Registry. Multilevel logistic regression was used to examine the probability of undergoing surgery according to hospital of diagnosis. The effect of variation in probability of undergoing surgery among hospitals of diagnosis on overall survival during the intervals 2005-2009 and 2010-2013 was examined by using Cox regression analysis. A total of 5620 patients with potentially curable gastric cancer, diagnosed in 91 hospitals, were included. The proportion of patients who underwent surgery ranged from 53.1 to 83.9 per cent according to hospital of diagnosis (P < 0.001); after multivariable adjustment for patient and tumour characteristics it ranged from 57.0 to 78.2 per cent (P < 0.001). Multivariable Cox regression showed that patients diagnosed between 2010 and 2013 in hospitals with a low probability of patients undergoing curative treatment had worse overall survival (hazard ratio 1.21; P < 0.001). The large variation in probability of receiving surgery for gastric cancer between hospitals of diagnosis and its impact on overall survival indicates that gastric cancer decision-making is suboptimal.

  20. Methodology for constructing a colour-difference acceptability scale.

    PubMed

    Laborie, Baptiste; Viénot, Françoise; Langlois, Sabine

    2010-09-01

    Observers were invited to report their degree of satisfaction on a 6-point semantic scale with respect to the conformity of a test colour with a white reference colour, simultaneously presented on a PDP display. Eight test patches were chosen along each of the +a*, -a*, +b*, -b* axes of the CIELAB chromaticity plane, at Y = 80 ± 2 cd·m⁻². Experimental conditions reliably represented the automotive environment (patch size, angular distance between patches) and observers could move their head and eyes freely. We have compared several methods of category scaling: the Torgerson-DMT method (Torgerson, W. S. (1958). Theory and methods of scaling. Wiley, New York, USA); two versions of the regression method, i.e. Bonnet's (Bonnet, C. (1986). Manuel pratique de psychophysique. Armand Colin, Paris, France) and logistic regression; and the medians method. We describe in detail a case where all methods yield substantial but slightly different results. The solution proposed by the regression method, which works with incomplete matrices and yields results directly on a colorimetric scale, is probably the most useful in this industrial context. Finally we summarize the implementation of the logistic regression method over four hues and for one experimental condition.

  1. Electrofishing capture probability of smallmouth bass in streams

    USGS Publications Warehouse

    Dauwalter, D.C.; Fisher, W.L.

    2007-01-01

    Abundance estimation is an integral part of understanding the ecology and advancing the management of fish populations and communities. Mark-recapture and removal methods are commonly used to estimate the abundance of stream fishes. Alternatively, abundance can be estimated by dividing the number of individuals sampled by the probability of capture. We conducted a mark-recapture study and used multiple repeated-measures logistic regression to determine the influence of fish size, sampling procedures, and stream habitat variables on the cumulative capture probability for smallmouth bass Micropterus dolomieu in two eastern Oklahoma streams. The predicted capture probability was used to adjust the number of individuals sampled to obtain abundance estimates. The observed capture probabilities were higher for larger fish and decreased with successive electrofishing passes for larger fish only. Model selection suggested that the number of electrofishing passes, fish length, and mean thalweg depth affected capture probabilities the most; there was little evidence for any effect of electrofishing power density and woody debris density on capture probability. Leave-one-out cross validation showed that the cumulative capture probability model predicts smallmouth abundance accurately. © Copyright by the American Fisheries Society 2007.
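
    The adjustment described in this abstract (number of individuals sampled divided by the probability of capture) can be sketched as follows. This is a minimal illustration only: the function name and the example numbers are assumptions, not values from the study.

```python
def abundance_estimate(n_sampled: int, capture_probability: float) -> float:
    """Adjust a raw count by the capture probability: N_hat = n / p."""
    if not 0.0 < capture_probability <= 1.0:
        raise ValueError("capture_probability must be in (0, 1]")
    return n_sampled / capture_probability


# Hypothetical example: 20 fish sampled with a predicted cumulative
# capture probability of 0.5 suggests roughly 40 fish were present.
print(abundance_estimate(20, 0.5))
```

    In the study the capture probability itself comes from a fitted logistic model; here it is simply passed in as a number.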

  2. Validation of a temperature prediction model for heat deaths in undocumented border crossers.

    PubMed

    Ruttan, Tim; Stolz, Uwe; Jackson-Vance, Sara; Parks, Bruce; Keim, Samuel M

    2013-04-01

    Heat exposure is a leading cause of death in undocumented border crossers along the Arizona-Mexico border. We performed a validation study of a weather prediction model that predicts the probability of heat-related deaths among undocumented border crossers. We analyzed a medical examiner registry cohort of undocumented border crosser heat-related deaths from January 1, 2002 to August 31, 2009 and used logistic regression to model the probability of one or more heat deaths on a given day using daily high temperature (DHT) as the predictor. At a critical threshold DHT of 40 °C, the probability of at least one heat death was 50%. The probability of a heat death along the Arizona-Mexico border for suspected undocumented border crossers is strongly associated with ambient temperature. These results can be used in prevention and response efforts to assess the daily risk of deaths among undocumented border crossers in the region.
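
    For a single-predictor logistic model, the predictor value at which the predicted probability equals 50% is -b0/b1, because the log-odds are zero there. The sketch below illustrates that relationship. The coefficients are hypothetical, chosen only so that the threshold falls at 40; the abstract does not report the fitted values.

```python
import math

# Hypothetical coefficients (NOT from the study), chosen so that the
# 50% threshold lands at a daily high temperature of 40 degrees C.
B0 = -10.0   # intercept (assumed)
B1 = 0.25    # slope per degree C (assumed)


def probability(dht_celsius: float) -> float:
    """Logistic model: P = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * dht_celsius)))


def fifty_percent_threshold(b0: float, b1: float) -> float:
    """Predictor value where the log-odds are zero, i.e. P = 0.5."""
    return -b0 / b1


print(fifty_percent_threshold(B0, B1))  # threshold temperature
print(probability(40.0))                # probability at the threshold
```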

  3. Probabilistic Nowcasting of Low-Visibility Procedure States at Vienna International Airport During Cold Season

    NASA Astrophysics Data System (ADS)

    Kneringer, Philipp; Dietz, Sebastian J.; Mayr, Georg J.; Zeileis, Achim

    2018-04-01

    Airport operations are sensitive to visibility conditions. Low-visibility events may lead to capacity reduction, delays and economic losses. Different levels of low-visibility procedures (lvp) are enacted to ensure aviation safety. A nowcast of the probabilities for each of the lvp categories helps decision makers to optimally schedule their operations. An ordered logistic regression (OLR) model is used to forecast these probabilities directly. It is applied to cold season forecasts at Vienna International Airport for lead times of 30-min out to 2 h. Model inputs are standard meteorological measurements. The skill of the forecasts is assessed by the ranked probability score. OLR outperforms persistence, which is a strong contender at the shortest lead times. The ranked probability score of the OLR is even better than the one of nowcasts from human forecasters. The OLR-based nowcasting system is computationally fast and can be updated instantaneously when new data become available.
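
    The ranked probability score mentioned above compares the cumulative forecast distribution over ordered categories with the cumulative observation. A minimal sketch of one common (unnormalized) form is shown below; some authors divide by the number of categories minus one, and the abstract does not say which convention the study used. The function name and example forecasts are illustrative assumptions.

```python
from itertools import accumulate
from typing import Sequence


def ranked_probability_score(forecast_probs: Sequence[float],
                             observed_index: int) -> float:
    """Sum of squared differences between cumulative forecast and
    cumulative observation over ordered categories (lower is better)."""
    cum_forecast = list(accumulate(forecast_probs))
    # The cumulative observation is a step: 0 before the observed
    # category, 1 from the observed category onward.
    cum_observed = [1.0 if k >= observed_index else 0.0
                    for k in range(len(forecast_probs))]
    return sum((f - o) ** 2 for f, o in zip(cum_forecast, cum_observed))


# Three ordered categories, category 1 observed.
print(ranked_probability_score([0.0, 1.0, 0.0], 1))  # perfect forecast
print(ranked_probability_score([1.0, 0.0, 0.0], 2))  # confident and wrong
```

    Because the score works on cumulative probabilities, a forecast that misses by one category is penalized less than one that misses by two, which suits ordered outcomes such as lvp states.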

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yan, H; Chen, Z; Nath, R

    Purpose: kV fluoroscopic imaging combined with MV treatment beam imaging has been investigated for intrafractional motion monitoring and correction. It is, however, subject to additional kV imaging dose to normal tissue. To balance tracking accuracy and imaging dose, we previously proposed an adaptive imaging strategy to dynamically decide future imaging type and moments based on motion tracking uncertainty. kV imaging may be used continuously for maximal accuracy or only when the position uncertainty (probability of out of threshold) is high if a preset imaging dose limit is considered. In this work, we propose more accurate methods to estimate tracking uncertainty through analyzing acquired data in real-time. Methods: We simulated motion tracking process based on a previously developed imaging framework (MV + initial seconds of kV imaging) using real-time breathing data from 42 patients. Motion tracking errors for each time point were collected together with the time point’s corresponding features, such as tumor motion speed and 2D tracking error of previous time points, etc. We tested three methods for error uncertainty estimation based on the features: conditional probability distribution, logistic regression modeling, and support vector machine (SVM) classification to detect errors exceeding a threshold. Results: For conditional probability distribution, polynomial regressions on three features (previous tracking error, prediction quality, and cosine of the angle between the trajectory and the treatment beam) showed strong correlation with the variation (uncertainty) of the mean 3D tracking error and its standard deviation: R-square = 0.94 and 0.90, respectively. The logistic regression and SVM classification successfully identified about 95% of tracking errors exceeding 2.5 mm threshold. Conclusion: The proposed methods can reliably estimate the motion tracking uncertainty in real-time, which can be used to guide adaptive additional imaging to confirm the tumor is within the margin or initialize motion compensation if it is out of the margin.

  5. Combined Endoscopic/Sonographic-Based Risk Matrix Model for Predicting One-Year Risk of Surgery: A Prospective Observational Study of a Tertiary Center Severe/Refractory Crohn's Disease Cohort.

    PubMed

    Rispo, Antonio; Imperatore, Nicola; Testa, Anna; Bucci, Luigi; Luglio, Gaetano; De Palma, Giovanni Domenico; Rea, Matilde; Nardone, Olga Maria; Caporaso, Nicola; Castiglione, Fabiana

    2018-03-08

    In the management of Crohn's Disease (CD) patients, having a simple score combining clinical, endoscopic and imaging features to predict the risk of surgery could help to tailor treatment more effectively. AIMS: to prospectively evaluate the one-year risk factors for surgery in refractory/severe CD and to generate a risk matrix for predicting the probability of surgery at one year. CD patients needing a disease re-assessment at our tertiary IBD centre underwent clinical, laboratory, endoscopy and bowel sonography (BS) examinations within one week. The optimal cut-off values in predicting surgery were identified using ROC curves for Simple Endoscopic Score for CD (SES-CD), bowel wall thickness (BWT) at BS, and small bowel CD extension at BS. Binary logistic regression and Cox's regression were then carried out. Finally, the probabilities of surgery were calculated for selected baseline levels of covariates and results were arranged in a prediction matrix. Of 100 CD patients, 30 underwent surgery within one year. SES-CD>9 (OR 15.3; p<0.001), BWT>7 mm (OR 15.8; p<0.001), small bowel CD extension at BS>33 cm (OR 8.23; p<0.001) and stricturing/penetrating behavior (OR 4.3; p<0.001) were the only independent factors predictive of surgery at one-year based on binary logistic and Cox's regressions. Our matrix model combined these risk factors and the probability of surgery ranged from 0.48% to 87.5% (sixteen combinations). Our risk matrix combining clinical, endoscopic and ultrasonographic findings can accurately predict the one-year risk of surgery in patients with severe/refractory CD requiring a disease re-evaluation. This tool could be of value in clinical practice, serving as the basis for a tailored management of CD patients.

  6. On estimating probability of presence from use-availability or presence-background data.

    PubMed

    Phillips, Steven J; Elith, Jane

    2013-06-01

    A fundamental ecological modeling task is to estimate the probability that a species is present in (or uses) a site, conditional on environmental variables. For many species, available data consist of "presence" data (locations where the species [or evidence of it] has been observed), together with "background" data, a random sample of available environmental conditions. Recently published papers disagree on whether probability of presence is identifiable from such presence-background data alone. This paper aims to resolve the disagreement, demonstrating that additional information is required. We defined seven simulated species representing various simple shapes of response to environmental variables (constant, linear, convex, unimodal, S-shaped) and ran five logistic model-fitting methods using 1000 presence samples and 10 000 background samples; the simulations were repeated 100 times. The experiment revealed a stark contrast between two groups of methods: those based on a strong assumption that species' true probability of presence exactly matches a given parametric form had highly variable predictions and much larger RMS error than methods that take population prevalence (the fraction of sites in which the species is present) as an additional parameter. For six species, the former group grossly under- or overestimated probability of presence. The cause was not model structure or choice of link function, because all methods were logistic with linear and, where necessary, quadratic terms. Rather, the experiment demonstrates that an estimate of prevalence is not just helpful, but is necessary (except in special cases) for identifying probability of presence. We therefore advise against use of methods that rely on the strong assumption, due to Lele and Keim (recently advocated by Royle et al.) and Lancaster and Imbens. The methods are fragile, and their strong assumption is unlikely to be true in practice. 
We emphasize, however, that we are not arguing against standard statistical methods such as logistic regression, generalized linear models, and so forth, none of which requires the strong assumption. If probability of presence is required for a given application, there is no panacea for lack of data. Presence-background data must be augmented with an additional datum, e.g., species' prevalence, to reliably estimate absolute (rather than relative) probability of presence.

  7. The association between rainfall rate and occurrence of an enterovirus epidemic due to a contaminated well.

    PubMed

    Jean, J-S; Guo, H-R; Chen, S-H; Liu, C-C; Chang, W-T; Yang, Y-J; Huang, M-C

    2006-12-01

    To determine the association between rainfall rate and occurrence of enterovirus infection related to contamination of drinking water. One fatality case and three cases of severe illness were observed during the enterovirus epidemic in a village in southern Taiwan from 16 September to 3 October 1998. Groundwater samples were collected from the public well in the village after heavy rainfall to test for enterovirus using the reverse transcription-polymerase chain reaction (RT-PCR) assay. The RT-PCR assay detected the enterovirus in the groundwater sample collected on 26 September 1998. The logistic regression model also revealed a statistically significant association between the rainfall rate and the observation of cases of enterovirus infection. According to the fitted logistic regression model, the probability of detecting cases of enterovirus infection was greater than 50% at rainfall rates >31 mm h(-1). The higher the rainfall rate, the higher the probability of enterovirus epidemic. Contamination of drinking water by the enterovirus may lead to epidemics that cause deaths and severe illness, and such contamination may be caused by heavy rainfall. The major finding in this study is that the enterovirus could be flushed to groundwater in an unconfined aquifer after a heavy rainfall. This work allows for a warning level so that an action can be taken to minimize future outbreaks and so protect public health.

  8. Predictive occurrence models for coastal wetland plant communities: Delineating hydrologic response surfaces with multinomial logistic regression

    NASA Astrophysics Data System (ADS)

    Snedden, Gregg A.; Steyer, Gregory D.

    2013-02-01

    Understanding plant community zonation along estuarine stress gradients is critical for effective conservation and restoration of coastal wetland ecosystems. We related the presence of plant community types to estuarine hydrology at 173 sites across coastal Louisiana. Percent relative cover by species was assessed at each site near the end of the growing season in 2008, and hourly water level and salinity were recorded at each site Oct 2007-Sep 2008. Nine plant community types were delineated with k-means clustering, and indicator species were identified for each of the community types with indicator species analysis. An inverse relation between salinity and species diversity was observed. Canonical correspondence analysis (CCA) effectively segregated the sites across ordination space by community type, and indicated that salinity and tidal amplitude were both important drivers of vegetation composition. Multinomial logistic regression (MLR) and Akaike's Information Criterion (AIC) were used to predict the probability of occurrence of the nine vegetation communities as a function of salinity and tidal amplitude, and probability surfaces obtained from the MLR model corroborated the CCA results. The weighted kappa statistic, calculated from the confusion matrix of predicted versus actual community types, was 0.7 and indicated good agreement between observed community types and model predictions. Our results suggest that models based on a few key hydrologic variables can be valuable tools for predicting vegetation community development when restoring and managing coastal wetlands.

  9. Calibration power of the Braden scale in predicting pressure ulcer development.

    PubMed

    Chen, Hong-Lin; Cao, Ying-Juan; Wang, Jing; Huai, Bao-Sha

    2016-11-02

    Calibration is the degree of correspondence between the estimated probability produced by a model and the actual observed probability. The aim of this study was to investigate the calibration power of the Braden scale in predicting pressure ulcer (PU) development. A retrospective analysis was performed among consecutive patients in 2013. The patients were separated into a training group and a validation group. The predicted incidence was calculated using a logistic regression model in the training group and the Hosmer-Lemeshow test was used for assessing the goodness of fit. In the validation cohort, the observed and the predicted incidence were compared by the Chi-square (χ²) goodness of fit test for calibration power. We included 2585 patients in the study; of these, 78 patients (3.0%) developed a PU. Differences in patient characteristics between the training and validation groups were non-significant (p>0.05). In the training group, the logistic regression model for predicting pressure ulcer was Logit(P) = -0.433*Braden score+2.616. The Hosmer-Lemeshow test showed poor goodness of fit (χ²=13.472; p=0.019). In the validation group, the predicted pressure ulcer incidence also did not fit well with the observed incidence (χ²=42.154, p=0.000 by Braden scores; and χ²=17.223, p=0.001 by Braden scale risk classification). The Braden scale has low calibration power in predicting PU formation.
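
    The fitted equation quoted in this abstract, Logit(P) = -0.433*Braden score+2.616, can be turned into a predicted probability by applying the inverse logit. The sketch below does only that; the function name is an assumption, and since the abstract itself reports poor calibration, the outputs illustrate the arithmetic rather than usable clinical risk estimates.

```python
import math


def predicted_pu_probability(braden_score: float) -> float:
    """Inverse logit of the equation reported in the abstract:
    Logit(P) = -0.433 * Braden score + 2.616."""
    logit = -0.433 * braden_score + 2.616
    return 1.0 / (1.0 + math.exp(-logit))


# Braden scores run from 6 (highest risk) to 23 (lowest risk).
for score in (6, 12, 18, 23):
    print(score, round(predicted_pu_probability(score), 4))
```

    The negative slope means the predicted probability falls as the Braden score rises, which matches the scale's direction (lower scores indicate higher risk).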

  10. Objective Lightning Probability Forecasting for Kennedy Space Center and Cape Canaveral Air Force Station

    NASA Technical Reports Server (NTRS)

    Lambert, Winifred; Wheeler, Mark

    2005-01-01

    Five logistic regression equations were created that predict the probability of cloud-to-ground lightning occurrence for the day in the KSC/CCAFS area for each month in the warm season. These equations integrated the results from several studies over recent years to improve thunderstorm forecasting at KSC/CCAFS. All of the equations outperform persistence, which is known to outperform NPTI, the current objective tool used in 45 WS lightning forecasting operations. The equations also performed well in other tests. As a result, the new equations will be added to the current set of tools used by the 45 WS to determine the probability of lightning for their daily planning forecast. The results from these equations are meant to be used as first-guess guidance when developing the lightning probability forecast for the day. They provide an objective base from which forecasters can use other observations, model data, consultation with other forecasters, and their own experience to create the final lightning probability for the 1100 UTC briefing.

  11. Chronic probable PTSD in police responders in the world trade center health registry ten to eleven years after 9/11.

    PubMed

    Cone, James E; Li, Jiehui; Kornblith, Erica; Gocheva, Vihra; Stellman, Steven D; Shaikh, Annum; Schwarzer, Ralf; Bowler, Rosemarie M

    2015-05-01

    Police enrolled in the World Trade Center Health Registry (WTCHR) demonstrated increased probable posttraumatic stress disorder (PTSD) after the terrorist attack of 9/11/2001. Police enrollees without pre-9/11 PTSD were studied. Probable PTSD was assessed by Posttraumatic Stress Check List (PCL). Risk factors for chronic, new onset or resolved PTSD were assessed using multinomial logistic regression. Half of police with probable PTSD in 2003-2007 continued to have probable PTSD in 2011-2012. Women had higher prevalence of PTSD than men (15.5% vs. 10.3%, P = 0.008). Risk factors for chronic PTSD included decreased social support, unemployment, 2+ life stressors in last 12 months, 2+ life-threatening events since 9/11, 2+ injuries during the 9/11 attacks, and unmet mental health needs. Police responders to the WTC attacks continue to bear a high mental health burden. Improved early access to mental health treatment for police exposed to disasters may be needed. © 2015 Wiley Periodicals, Inc.

  12. Study on optimization method of test conditions for fatigue crack detection using lock-in vibrothermography

    NASA Astrophysics Data System (ADS)

    Min, Qing-xu; Zhu, Jun-zhen; Feng, Fu-zhou; Xu, Chao; Sun, Ji-wei

    2017-06-01

    In this paper, the lock-in vibrothermography (LVT) is utilized for defect detection. Specifically, for a metal plate with an artificial fatigue crack, the temperature rise of the defective area is used for analyzing the influence of different test conditions, i.e. engagement force, excitation intensity, and modulated frequency. The multivariate nonlinear and logistic regression models are employed to estimate the POD (probability of detection) and POA (probability of alarm) of fatigue crack, respectively. The resulting optimal selection of test conditions is presented. The study aims to provide an optimized selection method of the test conditions in the vibrothermography system with the enhanced detection ability.

  13. Comparison of multinomial logistic regression and logistic regression: which is more efficient in allocating land use?

    NASA Astrophysics Data System (ADS)

    Lin, Yingzhi; Deng, Xiangzheng; Li, Xing; Ma, Enjun

    2014-12-01

    Spatially explicit simulation of land use change is the basis for estimating the effects of land use and cover change on energy fluxes, ecology and the environment. At the pixel level, logistic regression is one of the most common approaches used in spatially explicit land use allocation models to determine the relationship between land use and its causal factors in driving land use change, and thereby to evaluate land use suitability. However, these models have a drawback in that they do not determine/allocate land use based on the direct relationship between land use change and its driving factors. Consequently, a multinomial logistic regression method was introduced to address this flaw, and thereby, judge the suitability of a type of land use in any given pixel in a case study area of the Jiangxi Province, China. A comparison of the two regression methods indicated that the proportion of correctly allocated pixels using multinomial logistic regression was 92.98%, which was 8.47% higher than that obtained using logistic regression. Paired t-test results also showed that pixels were more clearly distinguished by multinomial logistic regression than by logistic regression. In conclusion, multinomial logistic regression is a more efficient and accurate method for the spatial allocation of land use changes. The application of this method in future land use change studies may improve the accuracy of predicting the effects of land use and cover change on energy fluxes, ecology, and environment.

  14. Sperm Retrieval in Patients with Klinefelter Syndrome: A Skewed Regression Model Analysis.

    PubMed

    Chehrazi, Mohammad; Rahimiforoushani, Abbas; Sabbaghian, Marjan; Nourijelyani, Keramat; Sadighi Gilani, Mohammad Ali; Hoseini, Mostafa; Vesali, Samira; Yaseri, Mehdi; Alizadeh, Ahad; Mohammad, Kazem; Samani, Reza Omani

    2017-01-01

    The most common chromosomal abnormality due to non-obstructive azoospermia (NOA) is Klinefelter syndrome (KS) which occurs in 1-1.72 out of 500-1000 male infants. The probability of retrieving sperm as the outcome could be asymmetrically different between patients with and without KS, therefore logistic regression analysis is not a well-qualified test for this type of data. This study has been designed to evaluate skewed regression model analysis for data collected from microsurgical testicular sperm extraction (micro-TESE) among azoospermic patients with and without non-mosaic KS syndrome. This cohort study compared the micro-TESE outcome between 134 men with classic KS and 537 men with NOA and normal karyotype who were referred to Royan Institute between 2009 and 2011. In addition to our main outcome, which was sperm retrieval, we also used logistic and skewed regression analyses to compare the following demographic and hormonal factors: age, level of follicle stimulating hormone (FSH), luteinizing hormone (LH), and testosterone between the two groups. A comparison of the micro-TESE between the KS and control groups showed a success rate of 28.4% (38/134) for the KS group and 22.2% (119/537) for the control group. In the KS group, a significant difference (P<0.001) existed between testosterone levels for the successful sperm retrieval group (3.4 ± 0.48 mg/mL) compared to the unsuccessful sperm retrieval group (2.33 ± 0.23 mg/mL). The index for quasi Akaike information criterion (QAIC) had a goodness of fit of 74 for the skewed model which was lower than logistic regression (QAIC=85). According to the results, skewed regression is more efficient in estimating sperm retrieval success when the data from patients with KS are analyzed. This finding should be investigated by conducting additional studies with different data structures.

  15. Logistic regression of family data from retrospective study designs.

    PubMed

    Whittemore, Alice S; Halpern, Jerry

    2003-11-01

    We wish to study the effects of genetic and environmental factors on disease risk, using data from families ascertained because they contain multiple cases of the disease. To do so, we must account for the way participants were ascertained, and for within-family correlations in both disease occurrences and covariates. We model the joint probability distribution of the covariates of ascertained family members, given family disease occurrence and pedigree structure. We describe two such covariate models: the random effects model and the marginal model. Both models assume a logistic form for the distribution of one person's covariates that involves a vector beta of regression parameters. The components of beta in the two models have different interpretations, and they differ in magnitude when the covariates are correlated within families. We describe ascertainment assumptions needed to estimate consistently the parameters beta(RE) in the random effects model and the parameters beta(M) in the marginal model. Under the ascertainment assumptions for the random effects model, we show that conditional logistic regression (CLR) of matched family data gives a consistent estimate beta(RE) for beta(RE) and a consistent estimate for the covariance matrix of beta(RE). Under the ascertainment assumptions for the marginal model, we show that unconditional logistic regression (ULR) gives a consistent estimate for beta(M), and we give a consistent estimator for its covariance matrix. The random effects/CLR approach is simple to use and to interpret, but it can use data only from families containing both affected and unaffected members. The marginal/ULR approach uses data from all individuals, but its variance estimates require special computations. A C program to compute these variance estimates is available at http://www.stanford.edu/dept/HRP/epidemiology. 
We illustrate these pros and cons by application to data on the effects of parity on ovarian cancer risk in mother/daughter pairs, and use simulations to study the performance of the estimates. Copyright 2003 Wiley-Liss, Inc.

  16. Factors related to the joint probability of flooding on paired streams

    USGS Publications Warehouse

    Koltun, G.F.; Sherwood, J.M.

    1998-01-01

    The factors related to the joint probability of flooding on paired streams were investigated and quantified to provide information to aid in the design of hydraulic structures where the joint probability of flooding is an element of the design criteria. Stream pairs were considered to have flooded jointly at the design-year flood threshold (corresponding to the 2-, 10-, 25-, or 50-year instantaneous peak streamflow) if peak streamflows at both streams in the pair were observed or predicted to have equaled or exceeded the threshold on a given calendar day. Daily mean streamflow data were used as a substitute for instantaneous peak streamflow data to determine which flood thresholds were equaled or exceeded on any given day. Instantaneous peak streamflow data, when available, were used preferentially to assess flood-threshold exceedance. Daily mean streamflow data for each stream were paired with concurrent daily mean streamflow data at the other streams. Observed probabilities of joint flooding, determined for the 2-, 10-, 25-, and 50-year flood thresholds, were computed as the ratios of the total number of days when streamflows at both streams concurrently equaled or exceeded their flood thresholds (events) to the total number of days where streamflows at either stream equaled or exceeded its flood threshold (trials). A combination of correlation analyses, graphical analyses, and logistic-regression analyses was used to identify and quantify factors associated with the observed probabilities of joint flooding (event-trial ratios). The analyses indicated that the distance between drainage area centroids, the ratio of the smaller to larger drainage area, the mean drainage area, and the centroid angle adjusted 30 degrees were the basin characteristics most closely associated with the joint probability of flooding on paired streams in Ohio. 
In general, the analyses indicated that the joint probability of flooding decreases with an increase in centroid distance and increases with increases in drainage area ratio, mean drainage area, and centroid angle adjusted 30 degrees. Logistic-regression equations were developed, which can be used to estimate the probability that streamflows at two streams jointly equal or exceed the 2-year flood threshold given that the streamflow at one of the two streams equals or exceeds the 2-year flood threshold. The logistic-regression equations are applicable to stream pairs in Ohio (and border areas of adjacent states) that are unregulated, free of significant urban influences, and have characteristics similar to those of the 304 gaged stream pairs used in the logistic-regression analyses. Contingency tables were constructed and analyzed to provide information about the bivariate distribution of floods on paired streams. The contingency tables showed that the percentage of trials in which both streams in the pair concurrently flood at identical recurrence-interval ranges generally increased as centroid distances decreased and was greatest for stream pairs with adjusted centroid angles greater than or equal to 60 degrees and drainage area ratios greater than or equal to 0.01. Also, as centroid distance increased, streamflow at one stream in the pair was more likely to be in a less than 2-year recurrence-interval range when streamflow at the second stream was in a 2-year or greater recurrence-interval range.
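
    The event-trial ratio defined in this abstract (days on which both streams equal or exceed their thresholds, divided by days on which either does) can be sketched as below. The function name, the daily-flow lists, and the threshold values are hypothetical inputs used only to show the counting rule.

```python
from typing import Sequence


def joint_flood_ratio(flow_a: Sequence[float], flow_b: Sequence[float],
                      threshold_a: float, threshold_b: float) -> float:
    """Observed joint-flooding probability: events / trials, where an
    event is a day both streams reach their thresholds and a trial is
    a day on which at least one stream does."""
    if len(flow_a) != len(flow_b):
        raise ValueError("flow series must cover the same days")
    events = trials = 0
    for qa, qb in zip(flow_a, flow_b):
        a_floods = qa >= threshold_a
        b_floods = qb >= threshold_b
        if a_floods or b_floods:
            trials += 1
        if a_floods and b_floods:
            events += 1
    if trials == 0:
        raise ValueError("no day reached either flood threshold")
    return events / trials


# Hypothetical four-day record: one joint flood in three trial days.
print(joint_flood_ratio([120, 130, 40, 30], [90, 20, 85, 10], 100, 80))
```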

  17. Exploring unobserved heterogeneity in bicyclists' red-light running behaviors at different crossing facilities.

    PubMed

    Guo, Yanyong; Li, Zhibin; Wu, Yao; Xu, Chengcheng

    2018-06-01

    Bicyclists running the red light at crossing facilities increase the potential of colliding with motor vehicles. Exploring the contributing factors could improve the prediction of running red-light probability and develop countermeasures to reduce such behaviors. However, individuals could have unobserved heterogeneities in running a red light, which make the accurate prediction more challenging. Traditional models assume that factor parameters are fixed and cannot capture the varying impacts on red-light running behaviors. In this study, we employed the full Bayesian random parameters logistic regression approach to account for the unobserved heterogeneous effects. Two types of crossing facilities were considered which were the signalized intersection crosswalks and the road segment crosswalks. Electric and conventional bikes were distinguished in the modeling. Data were collected from 16 crosswalks in urban area of Nanjing, China. Factors such as individual characteristics, road geometric design, environmental features, and traffic variables were examined. Model comparison indicates that the full Bayesian random parameters logistic regression approach is statistically superior to the standard logistic regression model. More red-light runners are predicted at signalized intersection crosswalks than at road segment crosswalks. Factors affecting red-light running behaviors are gender, age, bike type, road width, presence of raised median, separation width, signal type, green ratio, bike and vehicle volume, and average vehicle speed. Factors associated with the unobserved heterogeneity are gender, bike type, signal type, separation width, and bike volume. Copyright © 2018 Elsevier Ltd. All rights reserved.

  18. Estimating the susceptibility of surface water in Texas to nonpoint-source contamination by use of logistic regression modeling

    USGS Publications Warehouse

    Battaglin, William A.; Ulery, Randy L.; Winterstein, Thomas; Welborn, Toby

    2003-01-01

    In the State of Texas, surface water (streams, canals, and reservoirs) and ground water are used as sources of public water supply. Surface-water sources of public water supply are susceptible to contamination from point and nonpoint sources. To help protect sources of drinking water and to aid water managers in designing protective yet cost-effective and risk-mitigated monitoring strategies, the Texas Commission on Environmental Quality and the U.S. Geological Survey developed procedures to assess the susceptibility of public water-supply source waters in Texas to the occurrence of 227 contaminants. One component of the assessments is the determination of susceptibility of surface-water sources to nonpoint-source contamination. To accomplish this, water-quality data at 323 monitoring sites were matched with geographic information system-derived watershed-characteristic data for the watersheds upstream from the sites. Logistic regression models then were developed to estimate the probability that a particular contaminant will exceed a threshold concentration specified by the Texas Commission on Environmental Quality. Logistic regression models were developed for 63 of the 227 contaminants. Of the remaining contaminants, 106 were not modeled because monitoring data were available at less than 10 percent of the monitoring sites; 29 were not modeled because there were less than 15 percent detections of the contaminant in the monitoring data; 27 were not modeled because of the lack of any monitoring data; and 2 were not modeled because threshold values were not specified.

  19. Basic Diagnosis and Prediction of Persistent Contrail Occurrence using High-resolution Numerical Weather Analyses/Forecasts and Logistic Regression. Part I: Effects of Random Error

    NASA Technical Reports Server (NTRS)

    Duda, David P.; Minnis, Patrick

    2009-01-01

    Straightforward application of the Schmidt-Appleman contrail formation criteria to diagnose persistent contrail occurrence from numerical weather prediction data is hindered by significant bias errors in the upper tropospheric humidity. Logistic models of contrail occurrence have been proposed to overcome this problem, but basic questions remain about how random measurement error may affect their accuracy. A set of 5000 synthetic contrail observations is created to study the effects of random error in these probabilistic models. The simulated observations are based on distributions of temperature, humidity, and vertical velocity derived from Advanced Regional Prediction System (ARPS) weather analyses. The logistic models created from the simulated observations were evaluated using two common statistical measures of model accuracy, the percent correct (PC) and the Hanssen-Kuipers discriminant (HKD). To convert the probabilistic results of the logistic models into a dichotomous yes/no choice suitable for the statistical measures, two critical probability thresholds are considered. The HKD scores are higher when the climatological frequency of contrail occurrence is used as the critical threshold, while the PC scores are higher when the critical probability threshold is 0.5. For both thresholds, typical random errors in temperature, relative humidity, and vertical velocity are found to be small enough to allow for accurate logistic models of contrail occurrence. The accuracy of the models developed from synthetic data is over 85 percent for both the prediction of contrail occurrence and non-occurrence, although in practice, larger errors would be anticipated.
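
    The two verification measures named in this abstract can be sketched from a 2x2 contingency table. The snippet below is a minimal illustration, not the authors' code: the function names and the toy probabilities are assumptions made here, and the HKD is computed under its usual definition as hit rate minus false-alarm rate.

```python
import numpy as np


def contingency(prob, observed, threshold):
    """Count hits, misses, false alarms and correct negatives.

    A forecast counts as "yes" when the modelled probability reaches
    the critical threshold.
    """
    prob = np.asarray(prob, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    forecast = prob >= threshold
    hits = int(np.sum(forecast & observed))
    misses = int(np.sum(~forecast & observed))
    false_alarms = int(np.sum(forecast & ~observed))
    correct_negatives = int(np.sum(~forecast & ~observed))
    return hits, misses, false_alarms, correct_negatives


def percent_correct(prob, observed, threshold):
    """Fraction of all cases classified correctly (PC)."""
    h, m, f, c = contingency(prob, observed, threshold)
    return (h + c) / (h + m + f + c)


def hanssen_kuipers(prob, observed, threshold):
    """Hit rate minus false-alarm rate (HKD).

    Assumes at least one observed event and one observed non-event.
    """
    h, m, f, c = contingency(prob, observed, threshold)
    return h / (h + m) - f / (f + c)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    prob = [0.9, 0.6, 0.4, 0.2, 0.3, 0.1, 0.1, 0.05]
    observed = [1, 1, 1, 0, 0, 0, 0, 0]
    climatology = float(np.mean(observed))  # observed event frequency
    for threshold in (0.5, climatology):
        print(threshold,
              percent_correct(prob, observed, threshold),
              hanssen_kuipers(prob, observed, threshold))
```

    Evaluating both scores at 0.5 and at the climatological frequency mirrors the comparison described in the abstract; which threshold scores better depends on the data.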

  20. Quantifying prognosis with risk predictions.

    PubMed

    Pace, Nathan L; Eberhart, Leopold H J; Kranke, Peter R

    2012-01-01

    Prognosis is a forecast, based on present observations in a patient, of their probable outcome from disease, surgery and so on. Research methods for the development of risk probabilities may not be familiar to some anaesthesiologists. We briefly describe methods for identifying risk factors and risk scores. A probability prediction rule assigns a risk probability to a patient for the occurrence of a specific event. Probability reflects the continuum between absolute certainty (Pi = 1) and certified impossibility (Pi = 0). Biomarkers and clinical covariates that modify risk are known as risk factors. The Pi as modified by risk factors can be estimated by identifying the risk factors and their weighting; these are usually obtained by stepwise logistic regression. The accuracy of probabilistic predictors can be separated into the concepts of 'overall performance', 'discrimination' and 'calibration'. Overall performance is the mathematical distance between predictions and outcomes. Discrimination is the ability of the predictor to rank order observations with different outcomes. Calibration is the correctness of prediction probabilities on an absolute scale. Statistical methods include the Brier score, coefficient of determination (Nagelkerke R2), C-statistic and regression calibration. External validation is the comparison of the actual outcomes to the predicted outcomes in a new and independent patient sample. External validation uses the statistical methods of overall performance, discrimination and calibration and is uniformly recommended before acceptance of the prediction model. Evidence from randomised controlled clinical trials should be obtained to show the effectiveness of risk scores for altering patient management and patient outcomes.
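
    The three accuracy concepts in this abstract each have a simple numerical counterpart. The sketch below is an illustration under stated assumptions, not code from the article: the function names are invented here, the C-statistic is computed by direct pair counting, and calibration is shown only as "calibration-in-the-large" (mean predicted risk minus observed event rate), which is a simpler check than the regression calibration the abstract mentions.

```python
import numpy as np


def brier_score(prob, outcome):
    """Overall performance: mean squared distance between prediction and outcome."""
    prob = np.asarray(prob, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    return float(np.mean((prob - outcome) ** 2))


def c_statistic(prob, outcome):
    """Discrimination: share of (event, non-event) pairs ranked correctly.

    Tied predictions count as half a correct pair.
    """
    prob = np.asarray(prob, dtype=float)
    outcome = np.asarray(outcome, dtype=bool)
    events = prob[outcome]
    non_events = prob[~outcome]
    diff = events[:, None] - non_events[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


def calibration_in_the_large(prob, outcome):
    """Calibration: mean predicted risk minus observed event rate (0 is ideal)."""
    prob = np.asarray(prob, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    return float(np.mean(prob) - np.mean(outcome))


if __name__ == "__main__":
    # Toy predictions and outcomes, invented for illustration only.
    prob = [0.9, 0.7, 0.4, 0.2]
    outcome = [1, 0, 1, 0]
    print(brier_score(prob, outcome))
    print(c_statistic(prob, outcome))
    print(calibration_in_the_large(prob, outcome))
```

    For external validation, the same three functions would be applied to predictions made for a new, independent sample rather than the sample used to fit the model.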

  1. Probability of detecting atrazine/desethyl-atrazine and elevated concentrations of nitrite plus nitrate as nitrogen in ground water in the Idaho part of the western Snake River Plain

    USGS Publications Warehouse

    Donato, Mary M.

    2000-01-01

    As ground water continues to provide an ever-growing proportion of Idaho's drinking water, concerns about the quality of that resource are increasing. Pesticides (most commonly, atrazine/desethyl-atrazine, hereafter referred to as atrazine) and nitrite plus nitrate as nitrogen (hereafter referred to as nitrate) have been detected in many aquifers in the State. To provide a sound hydrogeologic basis for atrazine and nitrate management in southern Idaho, the largest region of land and water use in the State, the U.S. Geological Survey produced maps showing the probability of detecting these contaminants in ground water in the upper Snake River Basin (published in a 1998 report) and the western Snake River Plain (published in this report). The atrazine probability map for the western Snake River Plain was constructed by overlaying ground-water quality data with hydrogeologic and anthropogenic data in a geographic information system (GIS). A data set was produced in which each well had corresponding information on land use, geology, precipitation, soil characteristics, regional depth to ground water, well depth, water level, and atrazine use. These data were analyzed by logistic regression using a statistical software package. Several preliminary multivariate models were developed and those that best predicted the detection of atrazine were selected. The multivariate models then were entered into a GIS and the probability maps were produced. Land use, precipitation, soil hydrologic group, and well depth were significantly correlated with atrazine detections in the western Snake River Plain. These variables also were important in the 1998 probability study of the upper Snake River Basin. The effectiveness of the probability models for atrazine might be improved if more detailed data were available for atrazine application. A preliminary atrazine probability map for the entire Snake River Plain in Idaho, based on a data set representing that region, also was produced. In areas where this map overlaps the 1998 map of the upper Snake River Basin, the two maps show broadly similar probabilities of detecting atrazine. Logistic regression also was used to develop a preliminary statistical model that predicts the probability of detecting elevated nitrate in the western Snake River Plain. A nitrate probability map was produced from this model. Results showed that elevated nitrate concentrations were correlated with land use, soil organic content, well depth, and water level. Detailed information on nitrate input, specifically fertilizer application, might have improved the effectiveness of this model.

  2. Probability models for growth and aflatoxin B1 production as affected by intraspecies variability in Aspergillus flavus.

    PubMed

    Aldars-García, Laila; Berman, María; Ortiz, Jordi; Ramos, Antonio J; Marín, Sonia

    2018-06-01

    The probability of growth and aflatoxin B1 (AFB1) production of 20 isolates of Aspergillus flavus were studied using a full factorial design with eight water activity levels (0.84-0.98 aw) and six temperature levels (15-40 °C). Binary data obtained from growth studies were modelled using linear logistic regression analysis as a function of temperature, water activity and time for each isolate. In parallel, AFB1 was extracted at different times from newly formed colonies (up to 20 mm in diameter). Although a total of 950 AFB1 values over time for all conditions studied were recorded, they were not considered to be enough to build probability models over time, and therefore, only models at 30 days were built. The confidence intervals of the regression coefficients of the probability of growth models showed some differences among the 20 growth models. Further, to assess the growth/no growth and AFB1/no-AFB1 production boundaries, 0.05 and 0.5 probabilities were plotted at 30 days for all of the isolates. The boundaries for growth and AFB1 showed that, in general, the conditions for growth were wider than those for AFB1 production. The probability of growth and AFB1 production seemed to be less variable among isolates than AFB1 accumulation. Apart from the AFB1 production probability models, using growth probability models for AFB1 probability predictions could be, although conservative, a suitable alternative. Predictive mycology should include a number of isolates to generate data to build predictive models and take into account the genetic diversity of the species and thus make predictions as similar as possible to real fungal food contamination. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. High Prevalence of Post-Traumatic Stress Symptoms in Relation to Social Factors in Affected Population One Year after the Fukushima Nuclear Disaster.

    PubMed

    Tsujiuchi, Takuya; Yamaguchi, Maya; Masuda, Kazutaka; Tsuchida, Marisa; Inomata, Tadashi; Kumano, Hiroaki; Kikuchi, Yasushi; Augusterfer, Eugene F; Mollica, Richard F

    2016-01-01

    This study investigated post-traumatic stress symptoms in the population affected by the Fukushima Nuclear Disaster, one year after the disaster. Additionally, we investigated social factors, such as forced displacement, which we hypothesize contributed to the high prevalence of post-traumatic stress. Finally, we report on written narratives that were collected from the impacted population. Using the Impact of Event Scale-Revised (IES-R), questionnaires were sent to 2,011 households of those displaced from Fukushima prefecture living temporarily in Saitama prefecture. Of the 490 replies, 350 met the criteria for inclusion in the study. Multiple logistic regression analysis was performed to examine several characteristics and variables of social factors as predictors of probable post-traumatic stress disorder (PTSD). The mean score of IES-R was 36.15±21.55, with 59.4% having scores of 30 or higher, indicating probable PTSD. No significant differences in percentages of high-risk subjects were found among sex, age, evacuation area, housing damages, tsunami affected, family split-up, and acquaintance support. In the multiple logistic regression analysis, the significant predictors of probable PTSD were chronic physical diseases (OR = 1.97), chronic mental diseases (OR = 6.25), worries about livelihood (OR = 2.27), lost jobs (OR = 1.71), lost social ties (OR = 2.27), and concerns about compensation (OR = 3.74). Although there are limitations in assuming a diagnosis of PTSD based on self-report IES-R, our findings indicate that there was a high risk of PTSD strongly related to the nuclear disaster and its consequent evacuation and displacement. Therefore, recovery efforts must focus not only on medical and psychological treatment but also on social and economic issues related to the displacement.

  4. Mental health status and related characteristics of Chinese male rural-urban migrant workers.

    PubMed

    Yang, Tingzhong; Xu, Xiaochao; Li, Mu; Rockett, Ian R H; Zhu, Waner; Ellison-Barnes, Alejandra

    2012-06-01

    To explore mental health status and related characteristics in a sample of Chinese male rural-urban migrants. Subjects were 1,595 male rural-urban migrant workers selected through a multi-stage sample survey conducted in two cities (Hangzhou and Guangzhou). Data were collected by means of a self-administered questionnaire. Both life and work stressors were examined. Stress and mental health status were measured by the Chinese Perceived Stress Scale (CPSS) and the Chinese Health Questionnaire (CHQ), respectively. Unconditional logistic regression analysis was performed to identify factors associated with probable mental disorders. There are approximately 120 million rural-urban migrants in China. The prevalence of probable mental disorders in the sample population was 24.4% (95% CI: 23.3-25.5%), which was higher than among urban residents (20.2%, 95% CI: 18.8-21.7%). Logistic regression analysis revealed that five characteristics were positively associated with risk for probable mental disorders: originating in the South (OR = 2.00; 95% CI = 1.02, 4.00), higher life stress (OR = 7.63; 95% CI = 5.88, 10.00), staying in the city for 5-9 months each year (OR = 2.56; 95% CI = 1.67, 3.85), higher work stress (OR = 2.56; 95% CI = 1.96, 3.33), and separation from wife (OR = 2.43; 95% CI = 1.61, 3.57). Employment in machinery and transportation (OR = 0.54; 95% CI = 0.36, 0.81) and higher self-worth (OR = 0.42; 95% CI = 0.28, 0.62) were negatively associated. Findings support an urgent need to develop specific policies and programs to address mental health problems among Chinese rural-urban migrants.

  5. No compelling positive association between ovarian hormones and wearing red clothing when using multinomial analyses.

    PubMed

    Blake, Khandis R; Dixson, Barnaby J W; O'Dean, Siobhan M; Denson, Thomas F

    2017-04-01

    Several studies report that wearing red clothing enhances women's attractiveness and signals sexual proceptivity to men. The associated hypothesis that women will choose to wear red clothing when fertility is highest, however, has received mixed support from empirical studies. One possible cause of these mixed findings may be methodological. The current study aimed to replicate recent findings suggesting a positive association between hormonal profiles associated with high fertility (high estradiol to progesterone ratios) and the likelihood of wearing red. We compared the effect of the estradiol to progesterone ratio on the probability of wearing: red versus non-red (binary logistic regression); red versus neutral, black, blue, green, orange, multi-color, and gray (multinomial logistic regression); and each of these same colors in separate binary models (e.g., green versus non-green). Red versus non-red analyses showed a positive trend between a high estradiol to progesterone ratio and wearing red, but the effect only arose for younger women and was not robust across samples. We found no compelling evidence for ovarian hormones increasing the probability of wearing red in the other analyses. However, we did find that the probability of wearing neutral was positively associated with the estradiol to progesterone ratio, though the effect did not reach conventional levels of statistical significance. Findings suggest that although ovarian hormones may affect younger women's preference for red clothing under some conditions, the effect is not robust when differentiating amongst other colors of clothing. In addition, the effect of ovarian hormones on clothing color preference may not be specific to the color red. Copyright © 2017 Elsevier Inc. All rights reserved.

  6. Patient education-level affects treatment allocation and prognosis in esophageal- and gastroesophageal junctional cancer in Sweden.

    PubMed

    Linder, Gustav; Sandin, Fredrik; Johansson, Jan; Lindblad, Mats; Lundell, Lars; Hedberg, Jakob

    2018-02-01

    Low socioeconomic status and poor education elevate the risk of developing esophageal- and junctional cancer. High education level also increases survival after curative surgery. The present study aimed to investigate associations, if any, between patient education-level and treatment allocation after diagnosis of esophageal- and junctional cancer and its subsequent impact on survival. A nation-wide cohort study was undertaken. Data from a Swedish national quality register for esophageal cancer (NREV) was linked to the National Cancer Register, National Patient Register, Prescribed Drug Register, Cause of Death Register and educational data from Statistics Sweden. The effect of education level (low: ≤9 years; intermediate: 10-12 years; high: >12 years) on the probability of allocation to curative treatment was analyzed with logistic regression. The Kaplan-Meier-method and Cox proportional hazard models were used to assess the effect of education on survival. A total of 4112 patients were included. In a multivariate logistic regression model, high education level was associated with greater probability of allocation to curative treatment (adjusted OR: 1.48, 95% CI: 1.08-2.03, p = 0.014) as was adherence to a multidisciplinary treatment-conference (adjusted OR: 3.13, 95% CI: 2.40-4.08, p < 0.001). High education level was associated with improved survival in the patients allocated to curative treatment (HR: 0.82, 95% CI: 0.69-0.99, p = 0.036). In this nation-wide cohort of esophageal- and junctional cancer patients, including data regarding many confounders, high education level was associated with greater probability of being offered curative treatment and improved survival. Copyright © 2017 Elsevier Ltd. All rights reserved.
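
    An odds ratio reported with a 95% confidence interval carries enough information to recover the approximate standard error and Wald p-value on the log-odds scale. The sketch below is an illustration only, assuming a symmetric Wald interval on the log scale; the function name is invented here and this is not the authors' procedure.

```python
import math


def wald_from_odds_ratio(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover log-odds coefficient, standard error, z and two-sided p.

    Assumes the reported interval is exp(beta +/- z_crit * se),
    i.e. a Wald interval that is symmetric on the log scale.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return log_or, se, z, p_value


if __name__ == "__main__":
    # Adjusted odds ratios quoted in the abstract above.
    for label, args in [("high education", (1.48, 1.08, 2.03)),
                        ("treatment conference", (3.13, 2.40, 4.08))]:
        log_or, se, z, p = wald_from_odds_ratio(*args)
        print(f"{label}: beta={log_or:.3f} se={se:.3f} z={z:.2f} p={p:.4f}")
```

    For the education effect this back-calculation gives a p-value of roughly 0.015, in line with the reported 0.014 once rounding of the published interval is taken into account.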

  7. Accounting for geophysical information in geostatistical characterization of unexploded ordnance (UXO) sites.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Saito, Hirotaka; Goovaerts, Pierre; McKenna, Sean Andrew

    2003-06-01

    Efficient and reliable unexploded ordnance (UXO) site characterization is needed for decisions regarding future land use. There are several types of data available at UXO sites and geophysical signal maps are one of the most valuable sources of information. Incorporation of such information into site characterization requires a flexible and reliable methodology. Geostatistics allows one to account for exhaustive secondary information (i.e., known at every location within the field) in many different ways. Kriging and logistic regression were combined to map the probability of occurrence of at least one geophysical anomaly of interest, such as UXO, from a limited number of indicator data. Logistic regression is used to derive the trend from a geophysical signal map, and kriged residuals are added to the trend to estimate the probabilities of the presence of UXO at unsampled locations (simple kriging with varying local means or SKlm). Each location is identified for further remedial action if the estimated probability is greater than a given threshold. The technique is illustrated using a hypothetical UXO site generated by a UXO simulator, and a corresponding geophysical signal map. Indicator data are collected along two transects located within the site. Classification performances are then assessed by computing proportions of correct classification, false positive, false negative, and Kappa statistics. Two common approaches, one of which does not take any secondary information into account (ordinary indicator kriging) and a variant of common cokriging (collocated cokriging), were used for comparison purposes. Results indicate that accounting for exhaustive secondary information improves the overall characterization of UXO sites if an appropriate methodology, SKlm in this case, is used.
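
    The classification measures listed in this abstract can be computed from the four cells of a 2x2 table once locations have been flagged by the probability threshold. The sketch below is an assumption-laden illustration: the function name and the counts are invented here, the proportions are taken over all classified locations, and Kappa is computed as Cohen's kappa.

```python
def classification_summary(tp, fp, fn, tn):
    """Summarise a 2x2 classification table.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives.
    """
    n = tp + fp + fn + tn
    observed_agreement = (tp + tn) / n
    # Agreement expected by chance from the row and column totals.
    expected_agreement = (
        (tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)
    ) / (n * n)
    kappa = (observed_agreement - expected_agreement) / (1.0 - expected_agreement)
    return {
        "correct": observed_agreement,
        "false_positive": fp / n,
        "false_negative": fn / n,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    print(classification_summary(tp=40, fp=10, fn=5, tn=45))
```

    Kappa corrects the proportion of correct classifications for agreement expected by chance, which matters when one class (for example, locations without UXO) dominates the site.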

  8. Establishment of a mathematic model for predicting malignancy in solitary pulmonary nodules.

    PubMed

    Zhang, Man; Zhuo, Na; Guo, Zhanlin; Zhang, Xingguang; Liang, Wenhua; Zhao, Sheng; He, Jianxing

    2015-10-01

    The aim of this study was to establish a model for predicting the probability of malignancy in solitary pulmonary nodules (SPNs) and provide guidance for the diagnosis and follow-up intervention of SPNs. We retrospectively analyzed the clinical data and computed tomography (CT) images of 294 patients with a clear pathological diagnosis of SPN. Multivariate logistic regression analysis was used to screen independent predictors of the probability of malignancy in the SPN and to establish a model for predicting malignancy in SPNs. Then, another 120 SPN patients who did not participate in the model establishment were chosen as group B and used to verify the accuracy of the prediction model. Multivariate logistic regression analysis showed that there were significant differences in age, smoking history, maximum diameter of nodules, spiculation, clear borders, and Cyfra21-1 levels between subgroups with benign and malignant SPNs (P<0.05). These factors were identified as independent predictors of malignancy in SPNs. The area under the curve (AUC) was 0.910 [95% confidence interval (CI), 0.857-0.963] for the model with Cyfra21-1, significantly better than the 0.812 (95% CI, 0.763-0.861) for the model without Cyfra21-1 (P=0.008). The area under the receiver operating characteristic (ROC) curve of our model was significantly higher than that of the Mayo model, the VA model and the Peking University People's Hospital (PKUPH) model. The difference between our model (AUC = 0.910) and the Brock model (AUC = 0.878) was not statistically significant (P=0.350). Adding Cyfra21-1 to the model could improve prediction. The prediction model established in this study can be used to assess the probability of malignancy in SPNs, thereby providing help for the diagnosis of SPNs and the selection of follow-up interventions.

  9. An evaluation of agreement between pectoral spines and otoliths for estimating ages of catfishes

    USGS Publications Warehouse

    Olive, J.A.; Schramm, Harold; Gerard, Patrick D.; Irwin, E.

    2011-01-01

    Otoliths have been shown to provide more accurate ages than pectoral spine sections for several catfish populations; but sampling otoliths requires euthanizing the specimen, whereas spines can be sampled non-lethally. To evaluate whether, and under what conditions, spines provide the same or similar age estimates as otoliths, we examined data sets of individual fish aged from pectoral spines and otoliths for six blue catfish Ictalurus furcatus populations (n=420), 14 channel catfish Ictalurus punctatus populations (n=997), and 10 flathead catfish Pylodictis olivaris populations (n=947) from lotic and lentic waters throughout the central and eastern U.S. Logistic regression determined that agreement between ages estimated from otoliths and spines was consistently related to age, but inconsistently related to growth rate. When modeled at mean growth rate, we found at least 80% probability of no difference in spine- and otolith-assigned ages up to ages 4 and 5 for blue and channel catfish, respectively. For flathead catfish, an 80% probability of agreement between spine- and otolith-assigned ages did not occur at any age due to high incidence of differences in assigned ages even for age-1 fish. Logistic regression models predicted at least 80% probability that spine and otolith ages differed by ≤1 year up to ages 13, 16, and 9 for blue, channel, and flathead catfish, respectively. Age-bias assessment found mean spine-assigned age differed by less than 1 year from otolith-assigned age up to ages 19, 9, and 17 for blue catfish, channel catfish, and flathead catfish, respectively. These results can be used to help guide decisions about which structure is most appropriate for estimating catfish ages for particular populations and management objectives.

  10. Standards for Standardized Logistic Regression Coefficients

    ERIC Educational Resources Information Center

    Menard, Scott

    2011-01-01

    Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…

  11. Using automated texture features to determine the probability for masking of a tumor on mammography, but not ultrasound.

    PubMed

    Häberle, Lothar; Hack, Carolin C; Heusinger, Katharina; Wagner, Florian; Jud, Sebastian M; Uder, Michael; Beckmann, Matthias W; Schulz-Wendtland, Rüdiger; Wittenberg, Thomas; Fasching, Peter A

    2017-08-30

    Tumors in radiologically dense breasts were overlooked on mammograms more often than tumors in low-density breasts. A fast reproducible and automated method of assessing percentage mammographic density (PMD) would be desirable to support decisions on whether ultrasonography should be provided for women in addition to mammography in diagnostic mammography units. PMD assessment has still not been included in clinical routine work, as there are issues of interobserver variability and the procedure is quite time consuming. This study investigated whether fully automatically generated texture features of mammograms can replace time-consuming semi-automatic PMD assessment to predict a patient's risk of having an invasive breast tumor that is visible on ultrasound but masked on mammography (mammography failure). This observational study included 1334 women with invasive breast cancer treated at a hospital-based diagnostic mammography unit. Ultrasound was available for the entire cohort as part of routine diagnosis. Computer-based threshold PMD assessments ("observed PMD") were carried out and 363 texture features were obtained from each mammogram. Several variable selection and regression techniques (univariate selection, lasso, boosting, random forest) were applied to predict PMD from the texture features. The predicted PMD values were each used as a new predictor for masking in logistic regression models together with clinical predictors. These four logistic regression models with predicted PMD were compared among themselves and with a logistic regression model with observed PMD. The most accurate masking prediction was determined by cross-validation. About 120 of the 363 texture features were selected for predicting PMD. Density predictions with boosting were the best substitute for observed PMD to predict masking. Overall, the corresponding logistic regression model performed better (cross-validated AUC, 0.747) than one without mammographic density (0.734), but less well than the one with the observed PMD (0.753). However, in patients with an assigned mammography failure risk >10%, covering about half of all masked tumors, the boosting-based model performed at least as accurately as the original PMD model. Automatically generated texture features can replace semi-automatically determined PMD in a prediction model for mammography failure, such that more than 50% of masked tumors could be discovered.

  12. Contact and contagion: Probability of transmission given contact varies with demographic state in bighorn sheep

    PubMed Central

    Manlove, Kezia R.; Cassirer, E. Frances; Plowright, Raina K.; Cross, Paul C.; Hudson, Peter J.

    2018-01-01

    Understanding both contact and probability of transmission given contact are key to managing wildlife disease. However, wildlife disease research tends to focus on contact heterogeneity, in part because the probability of transmission given contact is notoriously difficult to measure. Here, we present a first step towards empirically investigating the probability of transmission given contact in free-ranging wildlife. We used measured contact networks to test whether bighorn sheep demographic states vary systematically in infectiousness or susceptibility to Mycoplasma ovipneumoniae, an agent responsible for bighorn sheep pneumonia. We built covariates using contact network metrics, demographic information and infection status, and used logistic regression to relate those covariates to lamb survival. The covariate set contained degree, a classic network metric describing node centrality, but also included covariates breaking the network metrics into subsets that differentiated between contacts with yearlings, ewes with lambs, and ewes without lambs, and animals with and without active infections. Yearlings, ewes with lambs, and ewes without lambs showed similar group membership patterns, but direct interactions involving touch occurred at a rate two orders of magnitude higher between lambs and reproductive ewes than between any classes of adults or yearlings, and one order of magnitude higher than direct interactions between multiple lambs. Although yearlings and non-reproductive bighorn ewes regularly carried M. ovipneumoniae, our models suggest that a contact with an infected reproductive ewe had approximately five times the odds of producing a lamb mortality event of an identical contact with an infected dry ewe or yearling. Consequently, management actions targeting infected animals might lead to unnecessary removal of young animals that carry pathogens but rarely transmit. This analysis demonstrates a simple logistic regression approach for testing a priori hypotheses about variation in the odds of transmission given contact for free-ranging hosts, and may be broadly applicable for investigations in wildlife disease ecology. PMID:28317104

  13. Contact and contagion: Probability of transmission given contact varies with demographic state in bighorn sheep.

    PubMed

    Manlove, Kezia R; Cassirer, E Frances; Plowright, Raina K; Cross, Paul C; Hudson, Peter J

    2017-07-01

    Understanding both contact and probability of transmission given contact are key to managing wildlife disease. However, wildlife disease research tends to focus on contact heterogeneity, in part because the probability of transmission given contact is notoriously difficult to measure. Here, we present a first step towards empirically investigating the probability of transmission given contact in free-ranging wildlife. We used measured contact networks to test whether bighorn sheep demographic states vary systematically in infectiousness or susceptibility to Mycoplasma ovipneumoniae, an agent responsible for bighorn sheep pneumonia. We built covariates using contact network metrics, demographic information and infection status, and used logistic regression to relate those covariates to lamb survival. The covariate set contained degree, a classic network metric describing node centrality, but also included covariates breaking the network metrics into subsets that differentiated between contacts with yearlings, ewes with lambs, and ewes without lambs, and animals with and without active infections. Yearlings, ewes with lambs, and ewes without lambs showed similar group membership patterns, but direct interactions involving touch occurred at a rate two orders of magnitude higher between lambs and reproductive ewes than between any classes of adults or yearlings, and one order of magnitude higher than direct interactions between multiple lambs. Although yearlings and non-reproductive bighorn ewes regularly carried M. ovipneumoniae, our models suggest that a contact with an infected reproductive ewe had approximately five times the odds of producing a lamb mortality event of an identical contact with an infected dry ewe or yearling. Consequently, management actions targeting infected animals might lead to unnecessary removal of young animals that carry pathogens but rarely transmit. This analysis demonstrates a simple logistic regression approach for testing a priori hypotheses about variation in the odds of transmission given contact for free-ranging hosts, and may be broadly applicable for investigations in wildlife disease ecology. © 2017 The Authors. Journal of Animal Ecology © 2017 British Ecological Society.

  14. Calculating the individual probability of successful ocriplasmin treatment in eyes with VMT syndrome: a multivariable prediction model from the EXPORT study.

    PubMed

    Paul, Christoph; Heun, Christine; Müller, Hans-Helge; Hoerauf, Hans; Feltgen, Nicolas; Wachtlin, Joachim; Kaymak, Hakan; Mennel, Stefan; Koss, Michael Janusz; Fauser, Sascha; Maier, Mathias M; Schumann, Ricarda G; Mueller, Simone; Chang, Petrus; Schmitz-Valckenberg, Steffen; Kazerounian, Sara; Szurman, Peter; Lommatzsch, Albrecht; Bertelmann, Thomas

    2017-10-31

    To evaluate predictive factors for the treatment success of ocriplasmin and to use these factors to generate a multivariate model to calculate the individual probability of successful treatment. Data were collected in a retrospective, multicentre cohort study. Patients with vitreomacular traction (VMT) syndrome without a full-thickness macular hole were included if they received an intravitreal injection (IVI) of ocriplasmin. Five factors (age, gender, lens status, presence of epiretinal membrane (ERM) formation and horizontal diameter of VMT) were assessed on their association with VMT resolution. A multivariable logistic regression model was employed to further analyse these factors and calculate the individual probability of successful treatment. 167 eyes of 167 patients were included. Univariate analysis revealed a significant correlation to VMT resolution for all analysed factors: age (years) (OR 0.9208; 95% CI 0.8845 to 0.9586; p<0.0001), gender (male) (OR 0.480; 95% CI 0.241 to 0.957; p=0.0371), lens status (phakic) (OR 2.042; 95% CI 1.054 to 3.958; p=0.0344), ERM formation (present) (OR 0.384; 95% CI 0.179 to 0.821; p=0.0136) and horizontal VMT diameter (µm) (OR 0.99812; 95% CI 0.99684 to 0.99941, p=0.0042). A significant multivariable logistic regression model was established with age and VMT diameter. Known predictive factors for VMT resolution after ocriplasmin IVI were confirmed in our study. We were able to combine them into a formula, ultimately allowing the calculation of an individual probability of treatment success with ocriplasmin in patients with VMT syndrome without FTHM. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
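
    A fitted logistic model turns patient characteristics into an individual probability through the inverse-logit transform. The sketch below only illustrates that mechanism: the abstract does not report the multivariable coefficients, so the intercept is hypothetical and the slopes are taken from the univariate odds ratios quoted above purely as stand-ins. The function names are also assumptions made here.

```python
import math


def success_probability(intercept, coefficients, values):
    """Inverse-logit of a linear predictor: p = 1 / (1 + exp(-eta))."""
    linear_predictor = intercept + sum(
        b * x for b, x in zip(coefficients, values)
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def odds_ratio_over(odds_ratio_per_unit, units):
    """Odds ratio implied by a change of several units in a covariate."""
    return odds_ratio_per_unit ** units


if __name__ == "__main__":
    # Stand-in slopes: logs of the univariate odds ratios from the abstract.
    beta_age = math.log(0.9208)        # per year of age
    beta_diameter = math.log(0.99812)  # per micrometre of VMT diameter
    intercept = 6.0                    # HYPOTHETICAL, not from the study

    for age, diameter in [(60, 300), (75, 300), (75, 900)]:
        p = success_probability(
            intercept, [beta_age, beta_diameter], [age, diameter]
        )
        print(f"age={age} diameter={diameter} um -> p={p:.2f}")

    # A per-year odds ratio compounds multiplicatively over a decade.
    print(odds_ratio_over(0.9208, 10))
```

    Because the odds ratios are per unit, a value close to 1 (such as 0.99812 per micrometre) can still imply a large change in odds across the clinically relevant range of the covariate.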

  15. Contact and contagion: Probability of transmission given contact varies with demographic state in bighorn sheep

    USGS Publications Warehouse

    Manlove, Kezia R.; Cassirer, E. Frances; Plowright, Raina K.; Cross, Paul C.; Hudson, Peter J.

    2017-01-01

    Understanding both contact and probability of transmission given contact is key to managing wildlife disease. However, wildlife disease research tends to focus on contact heterogeneity, in part because the probability of transmission given contact is notoriously difficult to measure. Here, we present a first step towards empirically investigating the probability of transmission given contact in free-ranging wildlife. We used measured contact networks to test whether bighorn sheep demographic states vary systematically in infectiousness or susceptibility to Mycoplasma ovipneumoniae, an agent responsible for bighorn sheep pneumonia. We built covariates using contact network metrics, demographic information and infection status, and used logistic regression to relate those covariates to lamb survival. The covariate set contained degree, a classic network metric describing node centrality, but also included covariates breaking the network metrics into subsets that differentiated between contacts with yearlings, ewes with lambs, and ewes without lambs, and animals with and without active infections. Yearlings, ewes with lambs, and ewes without lambs showed similar group membership patterns, but direct interactions involving touch occurred at a rate two orders of magnitude higher between lambs and reproductive ewes than between any classes of adults or yearlings, and one order of magnitude higher than direct interactions between multiple lambs. Although yearlings and non-reproductive bighorn ewes regularly carried M. ovipneumoniae, our models suggest that a contact with an infected reproductive ewe had approximately five times the odds of producing a lamb mortality event of an identical contact with an infected dry ewe or yearling. 
Consequently, management actions targeting infected animals might lead to unnecessary removal of young animals that carry pathogens but rarely transmit. This analysis demonstrates a simple logistic regression approach for testing a priori hypotheses about variation in the odds of transmission given contact for free-ranging hosts, and may be broadly applicable for investigations in wildlife disease ecology.

  16. A Bayesian goodness of fit test and semiparametric generalization of logistic regression with measurement data.

    PubMed

    Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E

    2013-06-01

    Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework. 
© 2013, The International Biometric Society.

  17. Socioeconomic Factors Associated with Post-Mastectomy Immediate Reconstruction in a Contemporary Cohort of Breast Cancer Survivors.

    PubMed

    Schumacher, Jessica R; Taylor, Lauren J; Tucholka, Jennifer L; Poore, Samuel; Eggen, Amanda; Steiman, Jennifer; Wilke, Lee G; Greenberg, Caprice C; Neuman, Heather B

    2017-10-01

    Post-mastectomy reconstruction is a critical component of high-quality breast cancer care. Prior studies demonstrate socioeconomic disparity in receipt of reconstruction. Our objective was to evaluate trends in receipt of immediate reconstruction and examine socioeconomic factors associated with reconstruction in a contemporary cohort. Using the National Cancer Database, we identified women <75 years of age with stage 0-1 breast cancer treated with mastectomy (n = 297,121). Trends in immediate reconstruction rates (2004-2013) for the overall cohort and stratified by socioeconomic factors were examined using Join-point regression analysis, and annual percentage change (APC) was calculated. We then restricted our sample to a contemporary cohort (2010-2013, n = 145,577). Multivariable logistic regression identified socioeconomic factors associated with immediate reconstruction. Average adjusted predicted probabilities of receiving reconstruction were calculated. Immediate reconstruction rates increased from 27 to 48%. Although absolute rates of reconstruction for each stratification group increased, similar APCs across strata led to persistent gaps in receipt of reconstruction. On multivariable logistic regression using our contemporary cohort, race, income, education, and insurance type were all strongly associated with immediate reconstruction. Patients with the lowest predicted probability of receiving reconstruction were patients with Medicaid who lived in areas with the lowest rates of high-school graduation (Black 42.4% [95% CI 40.5-44.3], White 45.7% [95% CI 43.9-47.4]). Although reconstruction rates have increased dramatically over the past decade, lower rates persist for disadvantaged patients. Understanding how socioeconomic factors influence receipt of reconstruction, and identifying modifiable factors, are critical next steps towards identifying interventions to reduce disparities in breast cancer surgical care.
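    One common way to obtain an average adjusted predicted probability from a fitted logistic model is to set the variable of interest to a fixed value for every patient, predict each patient's probability, and average the predictions. The Python sketch below shows that calculation under stated assumptions: the coefficients, intercept, variable names, and patient rows are all hypothetical and are not taken from the study, and the abstract does not say which software or exact procedure the authors used.

```python
import math

def inv_logit(z):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def average_adjusted_probability(rows, coefs, intercept, variable, value):
    """Mean predicted probability if every row had `variable` set to `value`."""
    total = 0.0
    for row in rows:
        # Copy the row with the variable of interest overridden; the input is not modified.
        counterfactual = dict(row, **{variable: value})
        z = intercept + sum(coefs[name] * counterfactual[name] for name in coefs)
        total += inv_logit(z)
    return total / len(rows)

# Hypothetical coefficients and patients (not taken from the study).
coefs = {"medicaid": -0.6, "age_decades": -0.3}
intercept = 1.5
rows = [
    {"medicaid": 0, "age_decades": 4.5},
    {"medicaid": 1, "age_decades": 5.5},
    {"medicaid": 0, "age_decades": 6.5},
]
p_medicaid = average_adjusted_probability(rows, coefs, intercept, "medicaid", 1)
p_other = average_adjusted_probability(rows, coefs, intercept, "medicaid", 0)
print(f"{p_medicaid:.3f} vs {p_other:.3f}")
```

    Averaging over the observed covariate mix, rather than plugging in mean covariate values, keeps the result on the probability scale for the actual cohort.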

  18. The role of gender in long-term sickness absence and transition to permanent disability benefits. Results from a multiregister based, prospective study in Norway 1990-1995.

    PubMed

    Gjesdal, Sturla; Bratberg, Espen

    2002-09-01

    The aim of the study was to identify predictors for the transition from long-term sickness absence into disability pension with a special focus on gender. The study used data from a national database containing a 10% random sample of the Norwegian adult population (The KIRUT database). The study population were all individuals in the database who on 1 January 1990 were eligible for sick pay from the Norwegian National Insurance System: 83,398 men and 75,586 women. Individuals below 60 years with long-term sickness absence starting in 1990 and 1991 were identified, 6,434 men and 8,233 women, and followed up for three years. Background data were used as independent variables in a logistic regression of the probability for receiving disability pension during follow-up. Annual cumulative incidence of long-term sickness absence was 6.5% for women and 4.9% for men. During follow-up, 12.4% of the women and 12.6% of the men received disability pension. Among full-time employed women only 10.3% had become disability pensioners, while the corresponding proportion for women working part-time was 15.5%. For men the figures were 12.1% (full-time) and 18.1% (part-time). In the logistic regression of the whole sample the female odds ratio was insignificant. The dominant predictive factors for disability pension were age and duration of the sickness spells. Working part-time also increased the risk. Higher levels of education and having children below 7 years reduced the probability for disability pension. Separate regressions for men and women showed that the 'protective' effect of having small children only remained for women.

  19. Factors determining the smooth flow and the non-operative time in a one-induction room to one-operating room setting

    PubMed Central

    Mulier, Jan P; De Boeck, Liesje; Meulders, Michel; Beliën, Jeroen; Colpaert, Jan; Sels, Annabel

    2015-01-01

    Rationale, aims and objectives: What factors determine the use of an anaesthesia preparation room and shorten non-operative time? Methods: A logistic regression is applied to 18 751 surgery records from AZ Sint-Jan Brugge AV, Belgium, where each operating room has its own anaesthesia preparation room. Surgeries in which the patient's induction has already started when the preceding patient's surgery has ended belong to a first group, where the preparation room is used as an induction room. Surgeries not fulfilling this property belong to a second group. A logistic regression model tries to predict the probability that a surgery will be classified into a specific group. Non-operative time is calculated as the time between the end of the previous surgery and the incision of the next surgery. A log-linear regression of this non-operative time is performed. Results: It was found that switches in surgeons, being a non-elective surgery, and the previous surgery being non-elective increase the probability of being classified into the second group. Only a few surgery types, anaesthesiologists and operating rooms can be found exclusively in one of the two groups. Analysis of variance demonstrates that the first group has significantly lower non-operative times. Switches in surgeons or anaesthesiologists and longer scheduled durations of the previous surgery increase the non-operative time. A switch in both surgeon and anaesthesiologist strengthens this negative effect. Only a few operating rooms and surgery types influence the non-operative time. Conclusion: The use of the anaesthesia preparation room shortens the non-operative time and is determined by several human and structural factors. PMID:25496600

  20. The Mantel-Haenszel procedure revisited: models and generalizations.

    PubMed

    Fidler, Vaclav; Nagelkerke, Nico

    2013-01-01

    Several statistical methods have been developed for adjusting the Odds Ratio of the relation between two dichotomous variables X and Y for some confounders Z. With the exception of the Mantel-Haenszel method, commonly used methods, notably binary logistic regression, are not symmetrical in X and Y. The classical Mantel-Haenszel method however only works for confounders with a limited number of discrete strata, which limits its utility, and appears to have no basis in statistical models. Here we revisit the Mantel-Haenszel method and propose an extension to continuous and vector valued Z. The idea is to replace the observed cell entries in strata of the Mantel-Haenszel procedure by subject specific classification probabilities for the four possible values of (X,Y) predicted by a suitable statistical model. For situations where X and Y can be treated symmetrically we propose and explore the multinomial logistic model. Under the homogeneity hypothesis, which states that the odds ratio does not depend on Z, the logarithm of the odds ratio estimator can be expressed as a simple linear combination of three parameters of this model. Methods for testing the homogeneity hypothesis are proposed. The relationship between this method and binary logistic regression is explored. A numerical example using survey data is presented.
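    For reference, the classical Mantel-Haenszel estimator that the abstract takes as its starting point pools stratum-specific 2x2 tables as the sum of a*d/n divided by the sum of b*c/n. The Python sketch below implements only that classical estimator, not the authors' proposed extension to continuous Z; the function name and the example counts are hypothetical.

```python
def mantel_haenszel_odds_ratio(strata):
    """Pooled odds ratio over 2x2 tables given as (a, b, c, d) per stratum.

    a: X=1,Y=1   b: X=1,Y=0   c: X=0,Y=1   d: X=0,Y=0
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("Mantel-Haenszel estimator undefined: sum of b*c/n is zero")
    return numerator / denominator

# Hypothetical counts for two strata of a discrete confounder Z.
example_strata = [(10, 20, 5, 40), (15, 10, 10, 20)]
pooled_or = mantel_haenszel_odds_ratio(example_strata)
print(round(pooled_or, 3))
```

    With a single stratum the estimator reduces to the ordinary cross-product ratio a*d/(b*c), which is a quick sanity check on any implementation.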

  1. The Mantel-Haenszel Procedure Revisited: Models and Generalizations

    PubMed Central

    Fidler, Vaclav; Nagelkerke, Nico

    2013-01-01

    Several statistical methods have been developed for adjusting the Odds Ratio of the relation between two dichotomous variables X and Y for some confounders Z. With the exception of the Mantel-Haenszel method, commonly used methods, notably binary logistic regression, are not symmetrical in X and Y. The classical Mantel-Haenszel method however only works for confounders with a limited number of discrete strata, which limits its utility, and appears to have no basis in statistical models. Here we revisit the Mantel-Haenszel method and propose an extension to continuous and vector valued Z. The idea is to replace the observed cell entries in strata of the Mantel-Haenszel procedure by subject specific classification probabilities for the four possible values of (X,Y) predicted by a suitable statistical model. For situations where X and Y can be treated symmetrically we propose and explore the multinomial logistic model. Under the homogeneity hypothesis, which states that the odds ratio does not depend on Z, the logarithm of the odds ratio estimator can be expressed as a simple linear combination of three parameters of this model. Methods for testing the homogeneity hypothesis are proposed. The relationship between this method and binary logistic regression is explored. A numerical example using survey data is presented. PMID:23516463

  2. Propensity score estimation: machine learning and classification methods as alternatives to logistic regression

    PubMed Central

    Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson

    2010-01-01

    Summary. Objective: Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this Review was to assess machine learning alternatives to logistic regression which may accomplish the same goals but with fewer assumptions or greater accuracy. Study Design and Setting: We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. Results: We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (CART), and meta-classifiers (in particular, boosting). Conclusion: While the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and to a lesser extent decision trees (particularly CART) appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. PMID:20630332

  3. Fungible weights in logistic regression.

    PubMed

    Jones, Jeff A; Waller, Niels G

    2016-06-01

    In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  4. Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression.

    PubMed

    Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson

    2010-08-01

    Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. Copyright (c) 2010 Elsevier Inc. All rights reserved.

  5. Predicting the risk of patients with biopsy Gleason score 6 to harbor a higher grade cancer.

    PubMed

    Gofrit, Ofer N; Zorn, Kevin C; Taxy, Jerome B; Lin, Shang; Zagaja, Gregory P; Steinberg, Gary D; Shalhav, Arieh L

    2007-11-01

    Prostate cancer Gleason score 3 + 3 = 6 is currently the most common score assigned on prostatic biopsies. We analyzed the clinical variables that predict the likelihood of a patient with biopsy Gleason score 6 to harbor a higher grade tumor. The study population consisted of 448 patients with a mean age of 59.1 years who underwent radical prostatectomy between February 2003 and October 2006 for Gleason score 6 adenocarcinoma. The effect of preoperative variables on the probability of a Gleason score upgrade on final pathological evaluation was evaluated using logistic regression, and classification and regression tree analysis. Gleason score upgrade was found in 91 of 448 patients (20.3%). Logistic regression showed that only serum prostate specific antigen and the greatest percent of cancer in a core were significantly associated with a score upgrade (p = 0.0014 and 0.023, respectively). Classification and regression tree analysis showed that the risk of a Gleason score upgrade was 62% when serum prostate specific antigen was higher than 12 ng/ml and 18% when serum prostate specific antigen was 12 ng/ml or less. In patients with serum prostate specific antigen lower than 12 ng/ml the risk of a score upgrade could be dichotomized at a greatest percent of cancer in a core of 5%. The risk was 22.6% and 10.5% when the greatest percent of cancer in a core was higher than 5% and 5% or lower, respectively. The probability of patients with a prostate biopsy Gleason score of 6 to conceal a Gleason score of 7 or higher can be predicted using serum prostate specific antigen and the greatest percent of cancer in a core. With these parameters it is possible to predict upgrade rates as high as 62% and as low as 10.5%.

  6. Should metacognition be measured by logistic regression?

    PubMed

    Rausch, Manuel; Zehetleitner, Michael

    2017-03-01

    Are logistic regression slopes suitable to quantify metacognitive sensitivity, i.e. the efficiency with which subjective reports differentiate between correct and incorrect task responses? We analytically show that logistic regression slopes are independent from rating criteria in one specific model of metacognition, which assumes (i) that rating decisions are based on sensory evidence generated independently of the sensory evidence used for primary task responses and (ii) that the distributions of evidence are logistic. Given a hierarchical model of metacognition, logistic regression slopes depend on rating criteria. According to all considered models, regression slopes depend on the primary task criterion. A reanalysis of previous data revealed that massive numbers of trials are required to distinguish between hierarchical and independent models with tolerable accuracy. It is argued that researchers who wish to use logistic regression as measure of metacognitive sensitivity need to control the primary task criterion and rating criteria. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. Automatic seed selection for segmentation of liver cirrhosis in laparoscopic sequences

    NASA Astrophysics Data System (ADS)

    Sinha, Rahul; Marcinczak, Jan Marek; Grigat, Rolf-Rainer

    2014-03-01

    For computer aided diagnosis based on laparoscopic sequences, image segmentation is one of the basic steps which define the success of all further processing. However, many image segmentation algorithms require prior knowledge which is given by interaction with the clinician. We propose an automatic seed selection algorithm for segmentation of liver cirrhosis in laparoscopic sequences which assigns each pixel a probability of being cirrhotic liver tissue or background tissue. Our approach is based on a trained classifier using SIFT and RGB features with PCA. Due to the unique illumination conditions in laparoscopic sequences of the liver, a very low dimensional feature space can be used for classification via logistic regression. The methodology is evaluated on 718 cirrhotic liver and background patches that are taken from laparoscopic sequences of 7 patients. Using a linear classifier we achieve a precision of 91% in a leave-one-patient-out cross-validation. Furthermore, we demonstrate that with logistic probability estimates, seeds with high certainty of being cirrhotic liver tissue can be obtained. For example, our precision of liver seeds increases to 98.5% if only seeds with more than 95% probability of being liver are used. Finally, these automatically selected seeds can be used as priors in Graph Cuts which is demonstrated in this paper.
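    The seed-selection step described above, keeping only patches whose logistic probability exceeds a high threshold, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the two-dimensional features, weights, and bias are hypothetical stand-ins for the SIFT/RGB/PCA features and trained classifier, and treating patches with probability below 5% as background seeds is an added assumption, since the abstract only reports the 95% threshold for liver seeds.

```python
import math

def logistic_probability(features, weights, bias):
    """Probability of the positive class (liver) under a linear logistic model."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

def select_seeds(patches, weights, bias, threshold=0.95):
    """Return indices of patches that are confidently liver or confidently background."""
    liver, background = [], []
    for index, features in enumerate(patches):
        p = logistic_probability(features, weights, bias)
        if p > threshold:
            liver.append(index)
        elif p < 1.0 - threshold:  # symmetric rule for background: an assumption
            background.append(index)
    return liver, background

# Hypothetical 2-D features and model parameters, for illustration only.
patches = [(4.0, 1.0), (0.1, 0.2), (-3.0, -2.0)]
weights, bias = (1.0, 1.0), 0.0
liver_seeds, background_seeds = select_seeds(patches, weights, bias)
print(liver_seeds, background_seeds)
```

    Patches with intermediate probabilities are left unlabeled, which is what allows a later segmentation step such as Graph Cuts to decide them.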

  8. London Measure of Unplanned Pregnancy: guidance for its use as an outcome measure

    PubMed Central

    Hall, Jennifer A; Barrett, Geraldine; Copas, Andrew; Stephenson, Judith

    2017-01-01

    Background: The London Measure of Unplanned Pregnancy (LMUP) is a psychometrically validated measure of the degree of intention of a current or recent pregnancy. The LMUP is increasingly being used worldwide, and can be used to evaluate family planning or preconception care programs. However, beyond recommending the use of the full LMUP scale, there is no published guidance on how to use the LMUP as an outcome measure. Ordinal logistic regression has been recommended informally, but studies published to date have all used binary logistic regression and dichotomized the scale at different cut points. There is thus a need for evidence-based guidance to provide a standardized methodology for multivariate analysis and to enable comparison of results. This paper makes recommendations for the regression method for analysis of the LMUP as an outcome measure. Materials and methods: Data collected from 4,244 pregnant women in Malawi were used to compare five regression methods: linear, logistic with two cut points, and ordinal logistic with either the full or grouped LMUP score. The recommendations were then tested on the original UK LMUP data. Results: There were small but no important differences in the findings across the regression models. Logistic regression resulted in the largest loss of information, and assumptions were violated for the linear and ordinal logistic regression. Consequently, robust standard errors were used for linear regression and a partial proportional odds ordinal logistic regression model attempted. The latter could only be fitted for grouped LMUP score. Conclusion: We recommend the linear regression model with robust standard errors to make full use of the LMUP score when analyzed as an outcome measure. Ordinal logistic regression could be considered, but a partial proportional odds model with grouped LMUP score may be required. Logistic regression is the least-favored option, due to the loss of information. 
For logistic regression, the cut point for un/planned pregnancy should be between nine and ten. These recommendations will standardize the analysis of LMUP data and enhance comparability of results across studies. PMID:28435343

  9. Modelling long-term fire occurrence factors in Spain by accounting for local variations with geographically weighted regression

    NASA Astrophysics Data System (ADS)

    Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.

    2013-02-01

    Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R² = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19. 
The results from GWR indicated a significant spatial variation in the local parameter estimates for all the variables and an important reduction of the autocorrelation in the residuals of the GW linear model. Despite the fitting improvement of local models, GW regression, more than an alternative to "global" or traditional regression modelling, seems to be a valuable complement to explore the non-stationary relationships between the response variable and the explanatory variables. The synergy of global and local modelling provides insights into fire management and policy and helps further our understanding of the fire problem over large areas while at the same time recognizing its local character.

  10. Logistic models--an odd(s) kind of regression.

    PubMed

    Jupiter, Daniel C

    2013-01-01

    The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression. Copyright © 2013 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.

  11. Probability of detecting atrazine/desethyl-atrazine and elevated concentrations of nitrate in ground water in Colorado

    USGS Publications Warehouse

    Rupert, Michael G.

    2003-01-01

    Draft Federal regulations may require that each State develop a State Pesticide Management Plan for the herbicides atrazine, alachlor, metolachlor, and simazine. Maps were developed that the State of Colorado could use to predict the probability of detecting atrazine and desethyl-atrazine (a breakdown product of atrazine) in ground water in Colorado. These maps can be incorporated into the State Pesticide Management Plan and can help provide a sound hydrogeologic basis for atrazine management in Colorado. Maps showing the probability of detecting elevated nitrite plus nitrate as nitrogen (nitrate) concentrations in ground water in Colorado also were developed because nitrate is a contaminant of concern in many areas of Colorado. Maps showing the probability of detecting atrazine and(or) desethyl-atrazine (atrazine/DEA) at or greater than concentrations of 0.1 microgram per liter and nitrate concentrations in ground water greater than 5 milligrams per liter were developed as follows: (1) Ground-water quality data were overlaid with anthropogenic and hydrogeologic data using a geographic information system to produce a data set in which each well had corresponding data on atrazine use, fertilizer use, geology, hydrogeomorphic regions, land cover, precipitation, soils, and well construction. These data then were downloaded to a statistical software package for analysis by logistic regression. (2) Relations were observed between ground-water quality and the percentage of land-cover categories within circular regions (buffers) around wells. Several buffer sizes were evaluated; the buffer size that provided the strongest relation was selected for use in the logistic regression models. 
(3) Relations between concentrations of atrazine/DEA and nitrate in ground water and atrazine use, fertilizer use, geology, hydrogeomorphic regions, land cover, precipitation, soils, and well-construction data were evaluated, and several preliminary multivariate models with various combinations of independent variables were constructed. (4) The multivariate models that best predicted the presence of atrazine/DEA and elevated concentrations of nitrate in ground water were selected. (5) The accuracy of the multivariate models was confirmed by validating the models with an independent set of ground-water quality data. (6) The multivariate models were entered into a geographic information system and the probability maps were constructed.

  12. The use of auxiliary variables in capture-recapture and removal experiments

    USGS Publications Warehouse

    Pollock, K.H.; Hines, J.E.; Nichols, J.D.

    1984-01-01

    The dependence of animal capture probabilities on auxiliary variables is an important practical problem which has not been considered in the development of estimation procedures for capture-recapture and removal experiments. In this paper the linear logistic binary regression model is used to relate the probability of capture to continuous auxiliary variables. The auxiliary variables could be environmental quantities such as air or water temperature, or characteristics of individual animals, such as body length or weight. Maximum likelihood estimators of the population parameters are considered for a variety of models which all assume a closed population. Testing between models is also considered. The models can also be used when one auxiliary variable is a measure of the effort expended in obtaining the sample.

  13. Comparative analysis on the probability of being a good payer

    NASA Astrophysics Data System (ADS)

    Mihova, V.; Pavlov, V.

    2017-10-01

    Credit risk assessment is crucial for the banking industry. The current practice uses various approaches for the calculation of credit risk. The core of these approaches is the use of multiple regression models, applied in order to assess the risk associated with the approval of people applying for certain products (loans, credit cards, etc.). Based on data from the past, these models try to predict what will happen in the future. Different data require different types of models. This work studies the causal link between the conduct of an applicant upon payment of the loan and the data that the applicant provided at the time of application. A database of 100 borrowers from a commercial bank is used for the purposes of the study. The available data includes information from the time of application and credit history while paying off the loan. Customers are divided into two groups, based on the credit history: Good and Bad payers. Linear and logistic regression are applied in parallel to the data in order to estimate the probability of being good for new borrowers. A variable that takes the value 1 for Good borrowers and 0 for Bad borrowers is modeled as the dependent variable. To decide which of the variables listed in the database should be used in the modelling process (as independent variables), a correlation analysis is made. Based on its results, several combinations of independent variables are tested as initial models, with both linear and logistic regression. The best linear and logistic models are obtained after initial transformation of the data and following a set of standard and robust statistical criteria. A comparative analysis between the two final models is made and scorecards are obtained from both models to assess new customers at the time of application. 
A cut-off level of points, below which applications are rejected and above which they are accepted, has been suggested for both models, applying the strategy of keeping the same Accept Rate as in the current data.
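
    The cut-off strategy mentioned at the end of the abstract can be sketched in a few lines of Python. This is a minimal illustration, assuming that higher scorecard points mean a better applicant and that ties at the cut-off are accepted; the scores and the 70 % accept rate are hypothetical and not taken from the study.

```python
def cutoff_for_accept_rate(scores, accept_rate):
    """Return the lowest score that is still accepted at the given accept rate.

    Applicants with score >= cut-off are accepted.
    """
    if not 0 < accept_rate <= 1:
        raise ValueError("accept_rate must be in (0, 1]")
    ranked = sorted(scores, reverse=True)            # best scores first
    n_accept = max(1, round(accept_rate * len(ranked)))
    return ranked[n_accept - 1]


# Hypothetical scorecard points for ten applicants (illustrative only).
scores = [512, 430, 610, 388, 575, 455, 640, 402, 530, 498]
cutoff = cutoff_for_accept_rate(scores, accept_rate=0.7)
accepted = [s for s in scores if s >= cutoff]
print(cutoff, len(accepted))
```

    Applying the same function to the scores from each model would give one cut-off per scorecard while holding the accept rate fixed.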

  14. Gene selection in cancer classification using sparse logistic regression with Bayesian regularization.

    PubMed

    Cawley, Gavin C; Talbot, Nicola L C

    2006-10-01

    Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) incorporating a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned in order to optimize performance. This normally involves a model selection stage, based on a computationally intensive search for the minimizer of the cross-validation error. In this paper, we demonstrate that a simple Bayesian approach can be taken to eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffreys prior. The improved algorithm (BLogReg) is then typically two or three orders of magnitude faster than the original algorithm, as there is no longer a need for a model selection step. The BLogReg algorithm is also free from selection bias in performance estimation, a common pitfall in the application of machine learning algorithms in cancer classification. The SLogReg, BLogReg and Relevance Vector Machine (RVM) gene selection algorithms are evaluated over the well-studied colon cancer and leukaemia benchmark datasets. The leave-one-out estimates of the probability of test error and cross-entropy of the BLogReg and SLogReg algorithms are very similar; however, the BLogReg algorithm is found to be considerably faster than the original SLogReg algorithm. Using nested cross-validation to avoid selection bias, performance estimation for SLogReg on the leukaemia dataset takes almost 48 h, whereas the corresponding result for BLogReg is obtained in only 1 min 24 s, making BLogReg by far the more practical algorithm. 
BLogReg also demonstrates better estimates of conditional probability than the RVM, which are of great importance in medical applications, with similar computational expense. A MATLAB implementation of the sparse logistic regression algorithm with Bayesian regularization (BLogReg) is available from http://theoval.cmp.uea.ac.uk/~gcc/cbl/blogreg/

  15. The association between second-hand smoke exposure and depressive symptoms among pregnant women.

    PubMed

    Huang, Jingya; Wen, Guoming; Yang, Weikang; Yao, Zhenjiang; Wu, Chuan'an; Ye, Xiaohua

    2017-10-01

    Tobacco smoking and depression are strongly associated, but the possible association between second-hand smoke (SHS) exposure and depression is unclear. This study aimed to examine the possible relation between SHS exposure and depressive symptoms among pregnant women. A cross-sectional survey was conducted in Shenzhen, China, using a multistage sampling method. The univariable and multivariable logistic regression models were used to explore the associations between SHS exposure and depressive symptoms. Among 2176 pregnant women, 10.5% and 2.0% were classified as having probable and severe depressive symptoms. Both binary and multinomial logistic regression revealed that there were significantly increased risks of severe depressive symptoms corresponding to SHS exposure in homes or regular SHS exposure in workplaces using no exposure as reference. In addition, greater frequency of SHS exposure was significantly associated with the increased risk of severe depressive symptoms. Our findings suggest that SHS exposure is positively associated with depressive symptoms in a dose-response manner among the pregnant women. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Fisher Scoring Method for Parameter Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

    NASA Astrophysics Data System (ADS)

    Widyaningsih, Purnami; Retno Sari Saputro, Dewi; Nugrahani Putri, Aulia

    2017-06-01

    The GWOLR model combines geographically weighted regression (GWR) and ordinal logistic regression (OLR) models. Its parameter estimation employs maximum likelihood estimation. Such parameter estimation, however, yields a difficult-to-solve system of nonlinear equations, and therefore a numerical approximation approach is required. The iterative approximation approach, in general, uses the Newton-Raphson (NR) method. The NR method has a disadvantage: its Hessian matrix consists of the second derivatives evaluated at each iteration, so it does not always produce converging results. With regard to this matter, the NR method is modified by replacing its Hessian matrix with the Fisher information matrix, which is termed Fisher scoring (FS). The present research seeks to determine GWOLR model parameter estimation using the Fisher scoring method and to apply the estimation to data on the level of vulnerability to Dengue Hemorrhagic Fever (DHF) in Semarang. The research concludes that health facilities give the greatest contribution to the probability of the number of DHF sufferers in both villages. Based on the number of the sufferers, the IR category of DHF in both villages can be determined.
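
    To make the Fisher scoring update concrete, here is a minimal Python sketch for the much simpler case of binary logistic regression with one predictor, not the GWOLR model of the paper. The update is beta_new = beta + I(beta)^(-1) U(beta), where U is the score vector and I the Fisher information matrix. For the binary logit model the expected and observed information coincide, so this particular sketch is numerically identical to Newton-Raphson; the two methods differ only in models where that is not the case. The data are hypothetical.

```python
import math


def fisher_scoring_logistic(x, y, n_iter=25, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x by Fisher scoring (two parameters)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        u0 = u1 = 0.0                 # score vector U
        i00 = i01 = i11 = 0.0         # Fisher information matrix I (symmetric 2x2)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            u0 += yi - p
            u1 += (yi - p) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        # Step = I^{-1} U, with the 2x2 inverse written out explicitly.
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (-i01 * u0 + i00 * u1) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical data with overlapping classes, so a finite estimate exists.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fisher_scoring_logistic(x, y)
print(round(b0, 4), round(b1, 4))
```

    A geographically weighted version would, under the usual GWR formulation, multiply each observation's contribution to U and I by a location-specific kernel weight; that extension is not shown here.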

  17. To resuscitate or not to resuscitate: a logistic regression analysis of physician-related variables influencing the decision.

    PubMed

    Einav, Sharon; Alon, Gady; Kaufman, Nechama; Braunstein, Rony; Carmel, Sara; Varon, Joseph; Hersch, Moshe

    2012-09-01

    To determine whether variables in physicians' backgrounds influenced their decision to forego resuscitating a patient they did not previously know. Questionnaire survey of a convenience sample of 204 physicians working in the departments of internal medicine, anaesthesiology and cardiology in 11 hospitals in Israel. Twenty per cent of the participants had elected to forego resuscitating a patient they did not previously know without additional consultation. Physicians who had more frequently elected to forego resuscitation had practised medicine for more than 5 years (p=0.013), estimated the number of resuscitations they had performed as being higher (p=0.009), and perceived their experience in resuscitation as sufficient (p=0.001). The variable that predicted the outcome of always performing resuscitation in the logistic regression model was less than 5 years of experience in medicine (OR 0.227, 95% CI 0.065 to 0.793; p=0.02). Physicians' level of experience may affect the probability of a patient's receiving resuscitation, whereas the physicians' personal beliefs and values did not seem to affect this outcome.
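
    The reported odds ratio and confidence interval can be translated back to the log-odds scale on which logistic regression coefficients are estimated. The Python sketch below assumes the interval is a standard Wald interval (estimate plus or minus 1.96 standard errors on the log scale); the abstract does not state how the interval was computed, so this is a consistency check and not a reconstruction of the authors' analysis.

```python
import math


def log_odds_from_or(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover coefficient, standard error and two-sided p-value (Wald assumption)."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    z_stat = beta / se
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))   # two-sided normal tail
    return beta, se, p_value


# Values quoted in the abstract: OR 0.227, 95% CI 0.065 to 0.793.
beta, se, p_value = log_odds_from_or(0.227, 0.065, 0.793)
print(round(beta, 3), round(se, 3), round(p_value, 3))
```

    Under the Wald assumption the recovered p-value is close to the reported p=0.02, which suggests the quoted figures are internally consistent.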

  18. Childhood growth and development associated with need for full-time special education at school age.

    PubMed

    Mannerkoski, Minna; Aberg, Laura; Hoikkala, Marianne; Sarna, Seppo; Kaski, Markus; Autti, Taina; Heiskala, Hannu

    2009-01-01

    To explore how growth measurements and attainment of developmental milestones in early childhood reflect the need for full-time special education (SE). After stratification in this population-based study, 900 pupils in full-time SE groups (age-range 7-16 years, mean 12 years 8 months) at three levels and 301 pupils in mainstream education (age-range 7-16, mean 12 years 9 months) provided data on height and weight from birth to age 7 years and head circumference to age 1 year. Developmental screening was evaluated from age 1 month to 48 months. Statistical methods included a general linear model (growth measurements), binary logistic regression analysis (odds ratios for growth), and multinomial logistic regression analysis (odds ratios for developmental milestones). At 1 year, a 1 standard deviation score (SDS) decrease in height raised the probability of SE placement by 40%, and a 1 SDS decrease in head size by 28%. In developmental screening, during the first months of life the gross motor milestones, especially head support, differentiated the children at levels 0-3. Thereafter, the fine motor milestones and those related to speech and social skills became more important. Children whose growth is mildly impaired, though in the normal range, and who fail to attain certain developmental milestones have an increased probability for SE and thus a need for special attention when toddlers age. Similar to the growth curves, these children seem to have consistent developmental curves (patterns).

  19. Daytime identification of summer hailstorm cells from MSG data

    NASA Astrophysics Data System (ADS)

    Merino, A.; López, L.; Sánchez, J. L.; García-Ortega, E.; Cattani, E.; Levizzani, V.

    2014-04-01

    Identifying deep convection is of paramount importance, as it may be associated with extreme weather phenomena that have significant impact on the environment, property and populations. A new method, the hail detection tool (HDT), is described for identifying hail-bearing storms using multispectral Meteosat Second Generation (MSG) data. HDT was conceived as a two-phase method, in which the first step is the convective mask (CM) algorithm devised for detection of deep convection, and the second a hail mask algorithm (HM) for the identification of hail-bearing clouds among cumulonimbus systems detected by CM. Both CM and HM are based on logistic regression models trained with multispectral MSG data sets comprised of summer convective events in the middle Ebro Valley (Spain) between 2006 and 2010, and detected by the RGB (red-green-blue) visualization technique (CM) or C-band weather radar system of the University of León. By means of the logistic regression approach, the probability of identifying a cumulonimbus event with CM or a hail event with HM are computed by exploiting a proper selection of MSG wavelengths or their combination. A number of cloud physical properties (liquid water path, optical thickness and effective cloud drop radius) were used to physically interpret results of statistical models from a meteorological perspective, using a method based on these "ingredients". Finally, the HDT was applied to a new validation sample consisting of events during summer 2011. The overall probability of detection was 76.9 % and the false alarm ratio 16.7 %.
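
    The two verification scores quoted at the end of the abstract follow the usual contingency-table definitions in forecast verification: probability of detection = hits / (hits + misses) and false alarm ratio = false alarms / (hits + false alarms). The Python sketch below applies them to hypothetical counts, chosen only because they give ratios similar to the reported values; the study's actual event counts are not given in the abstract.

```python
def pod_far(hits, misses, false_alarms):
    """Probability of detection and false alarm ratio from a 2x2 contingency table."""
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    return pod, far


# Hypothetical counts (NOT from the study), for illustration only.
pod, far = pod_far(hits=10, misses=3, false_alarms=2)
print(f"POD = {100 * pod:.1f} %, FAR = {100 * far:.1f} %")
```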

  20. Normal Tissue Complication Probability (NTCP) Modelling of Severe Acute Mucositis using a Novel Oral Mucosal Surface Organ at Risk.

    PubMed

    Dean, J A; Welsh, L C; Wong, K H; Aleksic, A; Dunne, E; Islam, M R; Patel, A; Patel, P; Petkar, I; Phillips, I; Sham, J; Schick, U; Newbold, K L; Bhide, S A; Harrington, K J; Nutting, C M; Gulliford, S L

    2017-04-01

    A normal tissue complication probability (NTCP) model of severe acute mucositis would be highly useful to guide clinical decision making and inform radiotherapy planning. We aimed to improve upon our previous model by using a novel oral mucosal surface organ at risk (OAR) in place of an oral cavity OAR. Predictive models of severe acute mucositis were generated using radiotherapy dose to the oral cavity OAR or mucosal surface OAR and clinical data. Penalised logistic regression and random forest classification (RFC) models were generated for both OARs and compared. Internal validation was carried out with 100-iteration stratified shuffle split cross-validation, using multiple metrics to assess different aspects of model performance. Associations between treatment covariates and severe mucositis were explored using RFC feature importance. Penalised logistic regression and RFC models using the oral cavity OAR performed at least as well as the models using mucosal surface OAR. Associations between dose metrics and severe mucositis were similar between the mucosal surface and oral cavity models. The volumes of oral cavity or mucosal surface receiving intermediate and high doses were most strongly associated with severe mucositis. The simpler oral cavity OAR should be preferred over the mucosal surface OAR for NTCP modelling of severe mucositis. We recommend minimising the volume of mucosa receiving intermediate and high doses, where possible. Copyright © 2016 The Royal College of Radiologists. Published by Elsevier Ltd. All rights reserved.

  1. Hierarchical faunal filters: An approach to assessing effects of habitat and nonnative species on native fishes

    USGS Publications Warehouse

    Quist, M.C.; Rahel, F.J.; Hubert, W.A.

    2005-01-01

    Understanding factors related to the occurrence of species across multiple spatial and temporal scales is critical to the conservation and management of native fishes, especially for those species at the edge of their natural distribution. We used the concept of hierarchical faunal filters to provide a framework for investigating the influence of habitat characteristics and nonnative piscivores on the occurrence of 10 native fishes in streams of the North Platte River watershed in Wyoming. Three faunal filters were developed for each species: (i) large-scale biogeographic, (ii) local abiotic, and (iii) biotic. The large-scale biogeographic filter, composed of elevation and stream-size thresholds, was used to determine the boundaries within which each species might be expected to occur. Then, a local abiotic filter (i.e., habitat associations), developed using binary logistic-regression analysis, estimated the probability of occurrence of each species from features such as maximum depth, substrate composition, submergent aquatic vegetation, woody debris, and channel morphology (e.g., amount of pool habitat). Lastly, a biotic faunal filter was developed using binary logistic regression to estimate the probability of occurrence of each species relative to the abundance of nonnative piscivores in a reach. Conceptualising fish assemblages within a framework of hierarchical faunal filters is simple and logical, helps direct conservation and management activities, and provides important information on the ecology of fishes in the western Great Plains of North America. © Blackwell Munksgaard, 2004.

  2. Predictive occurrence models for coastal wetland plant communities: delineating hydrologic response surfaces with multinomial logistic regression

    USGS Publications Warehouse

    Snedden, Gregg A.; Steyer, Gregory D.

    2013-01-01

    Understanding plant community zonation along estuarine stress gradients is critical for effective conservation and restoration of coastal wetland ecosystems. We related the presence of plant community types to estuarine hydrology at 173 sites across coastal Louisiana. Percent relative cover by species was assessed at each site near the end of the growing season in 2008, and hourly water level and salinity were recorded at each site Oct 2007–Sep 2008. Nine plant community types were delineated with k-means clustering, and indicator species were identified for each of the community types with indicator species analysis. An inverse relation between salinity and species diversity was observed. Canonical correspondence analysis (CCA) effectively segregated the sites across ordination space by community type, and indicated that salinity and tidal amplitude were both important drivers of vegetation composition. Multinomial logistic regression (MLR) and Akaike's Information Criterion (AIC) were used to predict the probability of occurrence of the nine vegetation communities as a function of salinity and tidal amplitude, and probability surfaces obtained from the MLR model corroborated the CCA results. The weighted kappa statistic, calculated from the confusion matrix of predicted versus actual community types, was 0.7 and indicated good agreement between observed community types and model predictions. Our results suggest that models based on a few key hydrologic variables can be valuable tools for predicting vegetation community development when restoring and managing coastal wetlands.
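
    As a rough illustration of how a multinomial logistic regression turns two hydrologic predictors into occurrence probabilities for several community types, the Python sketch below evaluates the softmax form of the model with one reference class. The three classes and all coefficient values are hypothetical; the study itself modelled nine community types, and its fitted coefficients are not given in the abstract.

```python
import math


def multinomial_probs(features, coefs):
    """Class probabilities for a multinomial logit model.

    coefs holds one (intercept, slopes) pair per non-reference class;
    the reference class has its linear predictor fixed at 0.
    """
    scores = [0.0]
    for intercept, slopes in coefs:
        scores.append(intercept + sum(b * f for b, f in zip(slopes, features)))
    top = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical coefficients for two non-reference classes; features are
# (salinity, tidal amplitude) in arbitrary illustrative units.
coefs = [(-2.0, [0.30, 1.0]), (-5.0, [0.50, 2.0])]
for salinity in (1.0, 8.0, 20.0):
    probs = multinomial_probs([salinity, 0.3], coefs)
    print(salinity, [round(p, 3) for p in probs])
```

    Evaluating such a function over a grid of salinity and tidal amplitude values would give probability surfaces of the kind the abstract mentions.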

  3. Day-time identification of summer hailstorm cells from MSG data

    NASA Astrophysics Data System (ADS)

    Merino, A.; López, L.; Sánchez, J. L.; García-Ortega, E.; Cattani, E.; Levizzani, V.

    2013-10-01

    Identifying deep convection is of paramount importance, as it may be associated with extreme weather that has significant impact on the environment, property and the population. A new method, the Hail Detection Tool (HDT), is described for identifying hail-bearing storms using multi-spectral Meteosat Second Generation (MSG) data. HDT was conceived as a two-phase method, in which the first step is the Convective Mask (CM) algorithm devised for detection of deep convection, and the second a Hail Detection algorithm (HD) for the identification of hail-bearing clouds among cumulonimbus systems detected by CM. Both CM and HD are based on logistic regression models trained with multi-spectral MSG data-sets comprised of summer convective events in the middle Ebro Valley between 2006-2010, and detected by the RGB visualization technique (CM) or C-band weather radar system of the University of León. By means of the logistic regression approach, the probability of identifying a cumulonimbus event with CM or a hail event with HD are computed by exploiting a proper selection of MSG wavelengths or their combination. A number of cloud physical properties (liquid water path, optical thickness and effective cloud drop radius) were used to physically interpret results of statistical models from a meteorological perspective, using a method based on these "ingredients." Finally, the HDT was applied to a new validation sample consisting of events during summer 2011. The overall Probability of Detection (POD) was 76.9% and False Alarm Ratio 16.7%.

  4. On the predictability of outliers in ensemble forecasts

    NASA Astrophysics Data System (ADS)

    Siegert, S.; Bröcker, J.; Kantz, H.

    2012-03-01

    In numerical weather prediction, ensembles are used to retrieve probabilistic forecasts of future weather conditions. We consider events where the verification is smaller than the smallest, or larger than the largest ensemble member of a scalar ensemble forecast. These events are called outliers. In a statistically consistent K-member ensemble, outliers should occur with a base rate of 2/(K+1). In operational ensembles this base rate tends to be higher. We study the predictability of outlier events in terms of the Brier Skill Score and find that forecast probabilities can be calculated which are more skillful than the unconditional base rate. This is shown analytically for statistically consistent ensembles. Using logistic regression, forecast probabilities for outlier events in an operational ensemble are calculated. These probabilities exhibit positive skill which is quantitatively similar to the analytical results. Possible causes of these results as well as their consequences for ensemble interpretation are discussed.
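
    The base rate of 2/(K+1) follows because, in a statistically consistent ensemble, the verification is exchangeable with the K members and is therefore equally likely to occupy any of the K+1 rank positions, two of which lie outside the ensemble range. The Python sketch below checks this by simulation and also defines the Brier skill score relative to a constant base-rate forecast. The Gaussian draws, trial count and seed are assumptions made for illustration only.

```python
import random


def outlier_base_rate(k):
    """Expected outlier frequency for a statistically consistent K-member ensemble."""
    return 2 / (k + 1)


def simulate_outlier_rate(k, n_trials=20000, seed=0):
    """Draw members and verification from the same distribution and count outliers."""
    rng = random.Random(seed)
    outliers = 0
    for _ in range(n_trials):
        ensemble = [rng.gauss(0, 1) for _ in range(k)]
        verification = rng.gauss(0, 1)
        if verification < min(ensemble) or verification > max(ensemble):
            outliers += 1
    return outliers / n_trials


def brier_skill_score(probs, outcomes, base_rate):
    """1 - BS / BS_ref, where the reference forecast always issues base_rate."""
    n = len(outcomes)
    bs = sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / n
    bs_ref = sum((base_rate - o) ** 2 for o in outcomes) / n
    return 1 - bs / bs_ref


k = 9
print(outlier_base_rate(k), simulate_outlier_rate(k))
```

    A positive Brier skill score means the forecast probabilities beat the constant base rate, which is the sense of "skill" used in the abstract.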

  5. Probability of nitrate contamination of recently recharged groundwaters in the conterminous United States

    USGS Publications Warehouse

    Nolan, B.T.; Hitt, K.J.; Ruddy, B.C.

    2002-01-01

    A new logistic regression (LR) model was used to predict the probability of nitrate contamination exceeding 4 mg/L in predominantly shallow, recently recharged ground waters of the United States. The new model contains variables representing (1) N fertilizer loading (p [...] r2 = 0.875), indicating that the LR model fits the data well. The likelihood of nitrate contamination is greater in areas with high N loading and well-drained surficial soils over unconsolidated sand and gravels. The LR model correctly predicted the status of nitrate contamination in 75% of wells in a validation data set. Considering all wells used in both calibration and validation, observed median nitrate concentration increased from 0.24 to 8.30 mg/L as the mapped probability of nitrate exceeding 4 mg/L increased from less than or equal to 0.17 to > 0.83.

  6. On the mishandling of probabilities in Lamotte & Wells' commentary on J.P. Michaud, G. Moreau, Predicting the visitation of carcasses by carrion-related insects under different rates of degree-day accumulation.

    PubMed

    Moreau, Gaétan; Michaud, J-P

    2017-01-01

    LaMotte and Wells re-analyzed and criticized one of our articles in which we proposed a novel statistical test for predicting postmortem interval from insect succession data. Using simple mathematical examples, we demonstrate that LaMotte and Wells erred because their analyses are based on an erroneous interpretation of the nature of probabilities that disregards more than 300 years of scientific literature on probability combination. We also argue that the methods presented in our article, more specifically the use of degree-day-based logistic regression analysis to model succession, were a positive contribution to the fields of forensic entomology and carrion ecology, which LaMotte and Wells failed to mention, focusing instead on issues that were either trivial or did not exist. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  7. Prevalence and risk factors associated with tardive dyskinesia among Indian patients with schizophrenia.

    PubMed

    Achalia, Rashmin M; Chaturvedi, Santosh K; Desai, Geetha; Rao, Girish N; Prakash, Om

    2014-06-01

    Tardive dyskinesia (TD) is one of the most distressing side effects of antipsychotic treatment. As prevalence studies of TD in Asian populations are scarce, a cross-sectional study was performed to assess the frequency of TD in Indian patients with schizophrenia and the risk factors for TD. In this cross-sectional study, 160 Indian patients who fulfilled the DSM-IV TR criteria for schizophrenia and had received antipsychotics for at least one year were examined with two validated scales for TD. Logistic regression analyses were used to examine the relationship between TD and clinical risk factors. The frequency of probable TD in the total sample was 26.4%. The logistic regression yielded significant odds ratios between TD and age, intermittent treatment, and total cumulative antipsychotic dose. The difference in TD between SGA and FGA disappeared after adjusting for important co-variables in regression analysis. Indian patients with schizophrenia and long-term antipsychotic treatment have a high risk of TD, and TD is associated with older age, intermittent antipsychotic treatment, and a high total cumulative antipsychotic dose. Our study findings suggest that there is no significant difference between SGAs and FGAs with regard to the risk of causing TD. Copyright © 2014 Elsevier B.V. All rights reserved.

  8. Utility of inverse probability weighting in molecular pathological epidemiology.

    PubMed

    Liu, Li; Nevo, Daniel; Nishihara, Reiko; Cao, Yin; Song, Mingyang; Twombly, Tyler S; Chan, Andrew T; Giovannucci, Edward L; VanderWeele, Tyler J; Wang, Molin; Ogino, Shuji

    2018-04-01

    As one of the causal inference methodologies, the inverse probability weighting (IPW) method has been utilized to address confounding and account for missing data when subjects with missing data cannot be included in a primary analysis. The transdisciplinary field of molecular pathological epidemiology (MPE) integrates molecular pathological and epidemiological methods, and takes advantage of improved understanding of pathogenesis to generate stronger biological evidence of causality and optimize strategies for precision medicine and prevention. Disease subtyping based on biomarker analysis of biospecimens is essential in MPE research. However, there are nearly always cases that lack subtype information due to the unavailability or insufficiency of biospecimens. To address this missing subtype data issue, we incorporated inverse probability weights into Cox proportional cause-specific hazards regression. The weight was the inverse of the probability of biomarker data availability, estimated from a model for biomarker data availability status. The strategy was illustrated in two example studies; each assessed alcohol intake or family history of colorectal cancer in relation to the risk of developing colorectal carcinoma subtypes classified by tumor microsatellite instability (MSI) status, using a prospective cohort study, the Nurses' Health Study. Logistic regression was used to estimate the probability of MSI data availability for each cancer case with covariates of clinical features and family history of colorectal cancer. This application of IPW can reduce selection bias caused by nonrandom variation in biospecimen data availability. The integration of causal inference methods into the MPE approach will likely have substantial potential to advance the field of epidemiology.
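
    The weighting step described above can be sketched as follows in Python. Each case with biomarker data receives a weight equal to the inverse of its estimated probability of having data; cases without data receive weight zero and so drop out of the weighted analysis. The availability flags and probabilities are hypothetical stand-ins for the output of a fitted logistic model, and the subsequent weighted Cox regression is not shown.

```python
def ipw_weights(available, prob_available):
    """Inverse probability weights for cases with observed biomarker data."""
    weights = []
    for has_data, p in zip(available, prob_available):
        if not 0 < p <= 1:
            raise ValueError("estimated probabilities must lie in (0, 1]")
        weights.append(1.0 / p if has_data else 0.0)
    return weights


# Hypothetical cases: whether MSI data exist and the modelled availability probability.
available = [True, True, False, True]
prob_available = [0.8, 0.5, 0.4, 0.25]
print(ipw_weights(available, prob_available))
```

    Cases that resemble those usually missing data (low availability probability) are weighted up, which is how the method compensates for nonrandom missingness.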

  9. What are hierarchical models and how do we analyze them?

    USGS Publications Warehouse

    Royle, Andy

    2016-01-01

    In this chapter we provide a basic definition of hierarchical models and introduce the two canonical hierarchical models in this book: site occupancy and N-mixture models. The former is a hierarchical extension of logistic regression and the latter is a hierarchical extension of Poisson regression. We introduce basic concepts of probability modeling and statistical inference including likelihood and Bayesian perspectives. We go through the mechanics of maximizing the likelihood and characterizing the posterior distribution by Markov chain Monte Carlo (MCMC) methods. We give a general perspective on topics such as model selection and assessment of model fit, although we demonstrate these topics in practice in later chapters (especially Chapters 5, 6, 7, and 10).

  10. Extrapolating regional probability of drying of headwater streams using discrete observations and gauging networks

    NASA Astrophysics Data System (ADS)

    Beaufort, Aurélien; Lamouroux, Nicolas; Pella, Hervé; Datry, Thibault; Sauquet, Eric

    2018-05-01

    Headwater streams represent a substantial proportion of river systems and many of them have intermittent flows due to their upstream position in the network. These intermittent rivers and ephemeral streams have recently seen a marked increase in interest, especially to assess the impact of drying on aquatic ecosystems. The objective of this paper is to quantify how discrete (in space and time) field observations of flow intermittence help to extrapolate over time the daily probability of drying (defined at the regional scale). Two empirical models based on linear or logistic regressions have been developed to predict the daily probability of intermittence at the regional scale across France. Explanatory variables were derived from available daily discharge and groundwater-level data of a dense gauging/piezometer network, and models were calibrated using discrete series of field observations of flow intermittence. The robustness of the models was tested using an independent, dense regional dataset of intermittence observations and observations of the year 2017 excluded from the calibration. The resulting models were used to extrapolate the daily regional probability of drying in France: (i) over the period 2011-2017 to identify the regions most affected by flow intermittence; (ii) over the period 1989-2017, using a reduced input dataset, to analyse temporal variability of flow intermittence at the national level. The two empirical regression models performed equally well between 2011 and 2017. The accuracy of predictions depended on the number of continuous gauging/piezometer stations and intermittence observations available to calibrate the regressions. Regions with the highest performance were located in sedimentary plains, where the monitoring network was dense and where the regional probability of drying was the highest. Conversely, the worst performances were obtained in mountainous regions. 
Finally, temporal projections (1989-2016) suggested the highest probabilities of intermittence (> 35 %) in 1989-1991, 2003 and 2005. A high density of intermittence observations improved the information provided by gauging stations and piezometers to extrapolate the temporal variability of intermittent rivers and ephemeral streams.

  11. Prevalence of abortion and stillbirth in a beef cattle system in Southeastern Mexico.

    PubMed

    Segura-Correa, José C; Segura-Correa, Victor M

    2009-12-01

    Prenatal mortality is an important cause of production losses in the livestock industry. This study estimates the prevalences of abortion and stillbirth in a beef cattle system in the tropics of Mexico and determines the significance of some risk factors. Data were obtained from a Zebu cattle herd and their crosses with Bos taurus breeds, in Yucatan, Mexico. The logit of the probability of an abortion or stillbirth was modeled using binary logistic regression. The risk factors tested were: year of abortion (or calving), season of abortion (or calving), parity number and dam breed group. The effect of twins on stillbirth was tested using Fisher's exact test. Of the 4175 calvings studied, 49 were abortions (1.17%). Significant factors in the logistic regression analysis for abortions were season of abortion and parity number. The risk of abortion was lower in the dry seasons compared to the rainy and windy seasons (P = 0.009). The risk of abortion was higher in second parity cows followed by the third and first parity cows, as compared to older cows (P = 0.015). Of the 4126 births, 87 were stillbirths (2.11%). Significant factors in the logistic regression analysis for stillbirth were year of calving (P = 0.0001) and parity number (P < 0.001). The risk of stillbirth in first parity cows was 2.6 times that of old cows. Of the total births, 15 were twins (0.36%), of which 7 were born dead. Herd owners must focus on the significant risk factors under their control to reduce the prevalence of prenatal mortality.
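
    The prevalences quoted in the abstract can be reproduced from the reported counts, and the same proportions can be placed on the logit scale that the regression models. The Python sketch below does both; the function name is illustrative. Note that 4175 - 49 = 4126, which matches the number of births reported, presumably because births exclude the aborted pregnancies.

```python
import math


def prevalence_and_logit(events, total):
    """Return the proportion of events and its logit, log(p / (1 - p))."""
    p = events / total
    return p, math.log(p / (1 - p))


# Counts taken from the abstract.
for label, events, total in (
    ("abortion", 49, 4175),
    ("stillbirth", 87, 4126),
    ("twin births", 15, 4126),
):
    p, logit = prevalence_and_logit(events, total)
    print(f"{label}: {100 * p:.2f} %, logit = {logit:.2f}")
```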

  12. PARAMETRIC AND NON PARAMETRIC (MARS: MULTIVARIATE ADDITIVE REGRESSION SPLINES) LOGISTIC REGRESSIONS FOR PREDICTION OF A DICHOTOMOUS RESPONSE VARIABLE WITH AN EXAMPLE FOR PRESENCE/ABSENCE OF AMPHIBIANS

    EPA Science Inventory

    The purpose of this report is to provide a reference manual that could be used by investigators for making informed use of logistic regression using two methods (standard logistic regression and MARS). The details for analyses of relationships between a dependent binary response ...

  13. Analyzing Student Learning Outcomes: Usefulness of Logistic and Cox Regression Models. IR Applications, Volume 5

    ERIC Educational Resources Information Center

    Chen, Chau-Kuang

    2005-01-01

    Logistic and Cox regression methods are practical tools used to model the relationships between certain student learning outcomes and their relevant explanatory variables. The logistic regression model fits an S-shaped curve into a binary outcome with data points of zero and one. The Cox regression model allows investigators to study the duration…

  14. An appraisal of convergence failures in the application of logistic regression model in published manuscripts.

    PubMed

    Yusuf, O B; Bamgboye, E A; Afolabi, R F; Shodimu, M A

    2014-09-01

    The logistic regression model is widely used in health research for descriptive and predictive purposes. Unfortunately, researchers are sometimes not aware that the underlying principles of the technique have failed when the maximum likelihood algorithm does not converge. Young researchers, particularly postgraduate students, may not know why the separation problem, whether quasi or complete, occurs, how to identify it, and how to fix it. This study was designed to critically evaluate convergence issues in articles that employed logistic regression analysis published in the African Journal of Medicine and Medical Sciences between 2004 and 2013. Problems of quasi or complete separation were described and were illustrated with the National Demographic and Health Survey dataset. A critical evaluation of articles that employed logistic regression was conducted. A total of 581 articles were reviewed, of which 40 (6.9%) used binary logistic regression. Twenty-four (60.0%) stated the use of the logistic regression model in the methodology, while none of the articles assessed model fit. Only 3 (12.5%) properly described the procedures. Of the 40 that used the logistic regression model, the problem of convergence occurred in 6 (15.0%) of the articles. Logistic regression tends to be poorly reported in studies published between 2004 and 2013. Our findings showed that the procedure may not be well understood by researchers, since very few described the process in their reports, and that they may be totally unaware of the problem of convergence or how to deal with it.
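The separation problem named in this abstract can be checked directly for a single numeric predictor. The sketch below is an illustration only and is not the authors' procedure: the function name and the one-predictor restriction are assumptions. It reports complete separation when a threshold splits the two outcome classes perfectly, and quasi-complete separation when the classes touch only at the boundary value.

```python
def separation_type(x, y):
    """Return 'complete', 'quasi', or 'none' for one numeric predictor.

    x: predictor values; y: binary outcomes coded 0/1.
    Under complete or quasi-complete separation the maximum likelihood
    estimate of the slope does not exist, so fitting fails to converge.
    """
    zeros = [xi for xi, yi in zip(x, y) if yi == 0]
    ones = [xi for xi, yi in zip(x, y) if yi == 1]
    if not zeros or not ones:
        raise ValueError("both outcome classes must be present")
    if min(x) == max(x):
        # A constant predictor carries no information; not separation.
        return "none"
    # Check both orderings: class 0 below class 1, and the reverse.
    for low, high in ((zeros, ones), (ones, zeros)):
        if max(low) < min(high):
            return "complete"
    for low, high in ((zeros, ones), (ones, zeros)):
        if max(low) == min(high):
            return "quasi"
    return "none"

print(separation_type([1, 2, 3, 4], [0, 0, 1, 1]))
print(separation_type([1, 2, 2, 3], [0, 0, 1, 1]))
print(separation_type([1, 3, 2, 4], [0, 0, 1, 1]))
```

With several predictors, separation can occur along a linear combination of them, which this one-variable check cannot detect.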

  15. Description of Aspergillus flavus growth under the influence of different factors (water activity, incubation temperature, protein and fat concentration, pH, and cinnamon essential oil concentration) by kinetic, probability of growth, and time-to-detection models.

    PubMed

    Kosegarten, Carlos E; Ramírez-Corona, Nelly; Mani-López, Emma; Palou, Enrique; López-Malo, Aurelio

    2017-01-02

    A Box-Behnken design was used to determine the effect of protein concentration (0, 5, or 10 g of casein/100 g), fat (0, 3, or 6 g of corn oil/100 g), water activity (a_w; 0.900, 0.945, or 0.990), pH (3.5, 5.0, or 6.5), concentration of cinnamon essential oil (CEO; 0, 200, or 400 μL/kg), and incubation temperature (15, 25, or 35 °C) on the growth of Aspergillus flavus during 50 days of incubation. Mold response under the evaluated conditions was modeled by the modified Gompertz equation, logistic regression, and a time-to-detection model. The obtained polynomial regression models allow the significant coefficients (p < 0.05) for linear, quadratic, and interaction effects for the Gompertz equation's parameters to be identified, which adequately described (R² > 0.967) the studied mold responses. After 50 days of incubation, every tested model system was classified according to the observed response as 1 (growth) or 0 (no growth); a binary logistic regression was then used to model the A. flavus growth interface, allowing prediction of the probability of mold growth under selected combinations of the tested factors. The time-to-detection model was used to estimate the time at which visible A. flavus growth begins. Water activity, temperature, and CEO concentration were the most important factors affecting fungal growth. It was observed that there is a range of possible combinations that may induce growth, such that incubation conditions and the amount of essential oil necessary for fungal growth inhibition strongly depend on protein and fat concentrations as well as on the pH of the studied model systems. The probabilistic model and the time-to-detection model constitute another option to determine appropriate storage/processing conditions and accurately predict the probability and/or the time at which A. flavus growth occurs. Copyright © 2016 Elsevier B.V. All rights reserved.

  16. Logistic Regression: Concept and Application

    ERIC Educational Resources Information Center

    Cokluk, Omay

    2010-01-01

    The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…

  17. Remote sensing and GIS-based landslide hazard analysis and cross-validation using multivariate logistic regression model on three test areas in Malaysia

    NASA Astrophysics Data System (ADS)

    Pradhan, Biswajeet

    2010-05-01

    This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis on the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (NDVI), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases of the application of logistic regression coefficients in the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), whereas Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases from the cross application of logistic regression coefficients in the other two areas, the case of Selangor based on the logistic coefficients of Cameron showed the highest prediction accuracy (90%), whereas the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%).
Qualitatively, the cross application model yields reasonable results which can be used for preliminary landslide hazard mapping.

  18. A revised logistic regression equation and an automated procedure for mapping the probability of a stream flowing perennially in Massachusetts

    USGS Publications Warehouse

    Bent, Gardner C.; Steeves, Peter A.

    2006-01-01

    A revised logistic regression equation and an automated procedure were developed for mapping the probability of a stream flowing perennially in Massachusetts. The equation provides city and town conservation commissions and the Massachusetts Department of Environmental Protection a method for assessing whether streams are intermittent or perennial at a specific site in Massachusetts by estimating the probability of a stream flowing perennially at that site. This information could assist the environmental agencies who administer the Commonwealth of Massachusetts Rivers Protection Act of 1996, which establishes a 200-foot-wide protected riverfront area extending from the mean annual high-water line along each side of a perennial stream, with exceptions for some urban areas. The equation was developed by relating the observed intermittent or perennial status of a stream site to selected basin characteristics of naturally flowing streams (defined as having no regulation by dams, surface-water withdrawals, ground-water withdrawals, diversion, wastewater discharge, and so forth) in Massachusetts. This revised equation differs from the equation developed in a previous U.S. Geological Survey study in that it is solely based on visual observations of the intermittent or perennial status of stream sites across Massachusetts and on the evaluation of several additional basin and land-use characteristics as potential explanatory variables in the logistic regression analysis. The revised equation estimated more accurately the intermittent or perennial status of the observed stream sites than the equation from the previous study. Stream sites used in the analysis were identified as intermittent or perennial based on visual observation during low-flow periods from late July through early September 2001. 
The database of intermittent and perennial streams included a total of 351 naturally flowing (no regulation) sites, of which 85 were observed to be intermittent and 266 perennial. Stream sites included in the database had drainage areas that ranged from 0.04 to 10.96 square miles. Of the 66 stream sites with drainage areas greater than 2.00 square miles, 2 sites were intermittent and 64 sites were perennial. Thus, stream sites with drainage areas greater than 2.00 square miles were assumed to flow perennially, and the database used to develop the logistic regression equation included only those stream sites with drainage areas less than 2.00 square miles. The database for the equation included 285 stream sites that had drainage areas less than 2.00 square miles, of which 83 sites were intermittent and 202 sites were perennial. Results of the logistic regression analysis indicate that the probability of a stream flowing perennially at a specific site in Massachusetts can be estimated as a function of four explanatory variables: (1) drainage area (natural logarithm), (2) areal percentage of sand and gravel deposits, (3) areal percentage of forest land, and (4) region of the state (eastern region or western region). Although the equation provides an objective means of determining the probability of a stream flowing perennially at a specific site, the reliability of the equation is constrained by the data used in its development. The equation is not recommended for (1) losing stream reaches or (2) streams whose ground-water contributing areas do not coincide with their surface-water drainage areas, such as many streams draining the Southeast Coastal Region (the southern part of the South Coastal Basin, the eastern part of the Buzzards Bay Basin, and the entire area of the Cape Cod and the Islands Basins). If the equation were used on a regulated stream site, the estimated intermittent or perennial status would reflect the natural flow conditions for that site.
An automated mapping procedure was developed to determine the intermittent or perennial status of stream sites along reaches throughout a basin. The procedure delineates the drainage area boundaries, determines values for the four explanatory variables, and solves the equation for estimating the probability of a stream flowing perennially at two locations on a headwater (first-order) stream reach: one near its confluence or end point and one near its headwaters or start point. The automated procedure then determines the intermittent or perennial status of the reach on the basis of the calculated probability values and a probability cutpoint (a stream is considered to flow perennially at a cutpoint of 0.56 or greater for this study) for the two locations or continues to loop upstream or downstream between locations less than and greater than the cutpoint of 0.56 to determine the transition point from an intermittent to a perennial stream. If the first-order stream reach is determined to be intermittent, the procedure moves to the next downstream reach and repeats the same process. The automated procedure then moves to the next first-order stream and repeats the process until the entire basin is mapped. A map of the intermittent and perennial stream reaches in the Shawsheen River Basin is provided on a CD-ROM that accompanies this report. The CD-ROM also contains ArcReader 9.0, a freeware product, that allows a user to zoom in and out, set a scale, pan, turn on and off map layers (such as a USGS topographic map), and print a map of the stream site with a scale bar. Maps of the intermittent and perennial stream reaches in Massachusetts will provide city and town conservation commissions and the Massachusetts Department of Environmental Protection with an additional method for assessing the intermittent or perennial status of stream sites.
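The classification step described in this record can be sketched in a few lines of Python. The four explanatory variables, the 0.56 cutpoint, and the assumption that sites draining more than 2.00 square miles flow perennially come from the abstract. The coefficient values, the coding of region as a 0/1 indicator, and all function names are hypothetical placeholders, because the abstract does not give the fitted equation.

```python
import math

CUTPOINT = 0.56            # probability cutpoint stated in the abstract
LARGE_BASIN_SQ_MI = 2.00   # larger basins were assumed to flow perennially

# HYPOTHETICAL coefficients for illustration only; NOT the published equation.
COEF = {
    "intercept": 1.0,
    "ln_area": 1.2,
    "sand_gravel_pct": 0.03,
    "forest_pct": 0.01,
    "western_region": 0.5,
}

def perennial_probability(area_sq_mi, sand_gravel_pct, forest_pct, western_region):
    """Logistic probability of perennial flow from the four variables."""
    z = (COEF["intercept"]
         + COEF["ln_area"] * math.log(area_sq_mi)
         + COEF["sand_gravel_pct"] * sand_gravel_pct
         + COEF["forest_pct"] * forest_pct
         + COEF["western_region"] * (1.0 if western_region else 0.0))
    return 1.0 / (1.0 + math.exp(-z))

def status_from_probability(p):
    """Apply the cutpoint: 0.56 or greater counts as perennial."""
    return "perennial" if p >= CUTPOINT else "intermittent"

def classify_site(area_sq_mi, sand_gravel_pct, forest_pct, western_region):
    """Classify one stream site, short-circuiting for large basins."""
    if area_sq_mi > LARGE_BASIN_SQ_MI:
        return "perennial"
    p = perennial_probability(area_sq_mi, sand_gravel_pct,
                              forest_pct, western_region)
    return status_from_probability(p)

print(classify_site(0.05, 0.0, 0.0, False))
print(classify_site(1.0, 10.0, 50.0, True))
```

The automated procedure in the report evaluates two locations per first-order reach and searches between them for the transition point. This sketch covers only the single-site decision.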

  19. An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression

    PubMed Central

    Weiss, Brandi A.; Dardick, William

    2015-01-01

    This article introduces an entropy-based measure of data–model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify the quality of classification and separation of group membership. Entropy complements preexisting measures of data–model fit and provides unique information not contained in other measures. Hypothetical data scenarios, an applied example, and Monte Carlo simulation results are used to demonstrate the application of entropy in logistic regression. Entropy should be used in conjunction with other measures of data–model fit to assess how well logistic regression models classify cases into observed categories. PMID:29795897

  20. Logistic regression applied to natural hazards: rare event logistic regression with replications

    NASA Astrophysics Data System (ADS)

    Guns, M.; Vanacker, V.

    2012-06-01

    Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.

  1. Large unbalanced credit scoring using Lasso-logistic regression ensemble.

    PubMed

    Wang, Hong; Xu, Qingsong; Zhou, Lifeng

    2015-01-01

    Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.

  2. An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression.

    PubMed

    Weiss, Brandi A; Dardick, William

    2016-12-01

    This article introduces an entropy-based measure of data-model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify the quality of classification and separation of group membership. Entropy complements preexisting measures of data-model fit and provides unique information not contained in other measures. Hypothetical data scenarios, an applied example, and Monte Carlo simulation results are used to demonstrate the application of entropy in logistic regression. Entropy should be used in conjunction with other measures of data-model fit to assess how well logistic regression models classify cases into observed categories.

  3. Force Sizing for Stability Operations

    DTIC Science & Technology

    2010-03-01

    recommendations in the low teens, although they also considered a limited number of operations, which included several that weren't COIN. In effect, there is...50 corresponds to an 80% probability of success using the logistic regression. An extremely important caveat in using this approach is the implicit ass ...ifi dargest countries from the subset considered. The version of this chart in the ass e Appendix contains the names of the countries and is, of course, of

  4. The effectiveness of tape playbacks in estimating Black Rail densities

    USGS Publications Warehouse

    Legare, M.; Eddleman, W.R.; Buckley, P.A.; Kelly, C.

    1999-01-01

    Tape playback is often the only efficient technique to survey for secretive birds. We measured the vocal responses and movements of radio-tagged black rails (Laterallus jamaicensis; 26 M, 17 F) to playback of vocalizations at 2 sites in Florida during the breeding seasons of 1992-95. We used coefficients from logistic regression equations to model probability of a response conditional on the birds' sex, nesting status, distance to playback source, and time of survey. With a probability of 0.811, nonnesting male black rails were most likely to respond to playback, while nesting females were the least likely to respond (probability = 0.189). We used linear regression to determine daily, monthly, and annual variation in response from weekly playback surveys along a fixed route during the breeding seasons of 1993-95. Significant sources of variation in the regression model were month (F3,48 = 3.89, P = 0.014), year (F2,48 = 9.37, P < 0.001), temperature (F1,48 = 5.44, P = 0.024), and month × year (F5,48 = 2.69, P = 0.031). The model was highly significant (P < 0.001) and explained 54% of the variation of mean response per survey period (r² = 0.54). We combined response probability data from radio-tagged black rails with playback survey route data to provide a density estimate of 0.25 birds/ha for the St. Johns National Wildlife Refuge. The relation between the number of black rails heard during playback surveys and the actual number present was influenced by a number of variables. We recommend caution when making density estimates from tape playback surveys.

  5. Endoscopic third ventriculostomy in the treatment of childhood hydrocephalus.

    PubMed

    Kulkarni, Abhaya V; Drake, James M; Mallucci, Conor L; Sgouros, Spyros; Roth, Jonathan; Constantini, Shlomi

    2009-08-01

    To develop a model to predict the probability of endoscopic third ventriculostomy (ETV) success in the treatment for hydrocephalus on the basis of a child's individual characteristics. We analyzed 618 ETVs performed consecutively on children at 12 international institutions to identify predictors of ETV success at 6 months. A multivariable logistic regression model was developed on 70% of the dataset (training set) and validated on 30% of the dataset (validation set). In the training set, 305/455 ETVs (67.0%) were successful. The regression model (containing patient age, cause of hydrocephalus, and previous cerebrospinal fluid shunt) demonstrated good fit (Hosmer-Lemeshow, P = .78) and discrimination (C statistic = 0.70). In the validation set, 105/163 ETVs (64.4%) were successful and the model maintained good fit (Hosmer-Lemeshow, P = .45), discrimination (C statistic = 0.68), and calibration (calibration slope = 0.88). A simplified ETV Success Score was devised that closely approximates the predicted probability of ETV success. Children most likely to succeed with ETV can now be accurately identified and spared the long-term complications of CSF shunting.
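The C statistic reported in this abstract measures discrimination: the chance that a randomly chosen success received a higher predicted probability than a randomly chosen failure, with ties counted as one half. The Python below is a minimal sketch of that definition on made-up numbers. The function name and the sample data are illustrative assumptions and do not reproduce the study's data or software.

```python
def c_statistic(predicted, outcomes):
    """Concordance (C) statistic for binary outcomes coded 0/1.

    Compares every success/failure pair: a pair is concordant when the
    success has the higher predicted probability; ties count as 0.5.
    """
    successes = [p for p, y in zip(predicted, outcomes) if y == 1]
    failures = [p for p, y in zip(predicted, outcomes) if y == 0]
    if not successes or not failures:
        raise ValueError("need at least one success and one failure")
    score = 0.0
    for s in successes:
        for f in failures:
            if s > f:
                score += 1.0
            elif s == f:
                score += 0.5
    return score / (len(successes) * len(failures))

# Made-up predictions for four procedures (two successes, two failures).
example_predictions = [0.9, 0.4, 0.6, 0.2]
example_outcomes = [1, 1, 0, 0]
print(c_statistic(example_predictions, example_outcomes))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, so the reported values of 0.70 and 0.68 indicate moderate discrimination. The pairwise loop is quadratic in sample size, which is fine for datasets of a few hundred cases.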

  6. Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning

    ERIC Educational Resources Information Center

    Li, Zhushan

    2014-01-01

    Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…

  7. A Methodology for Generating Placement Rules that Utilizes Logistic Regression

    ERIC Educational Resources Information Center

    Wurtz, Keith

    2008-01-01

    The purpose of this article is to provide the necessary tools for institutional researchers to conduct a logistic regression analysis and interpret the results. Aspects of the logistic regression procedure that are necessary to evaluate models are presented and discussed with an emphasis on cutoff values and choosing the appropriate number of…

  8. Comparison of standard maximum likelihood classification and polytomous logistic regression used in remote sensing

    Treesearch

    John Hogland; Nedret Billor; Nathaniel Anderson

    2013-01-01

    Discriminant analysis, referred to as maximum likelihood classification within popular remote sensing software packages, is a common supervised technique used by analysts. Polytomous logistic regression (PLR), also referred to as multinomial logistic regression, is an alternative classification approach that is less restrictive, more flexible, and easy to interpret. To...

  9. Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble

    PubMed Central

    Wang, Hong; Xu, Qingsong; Zhou, Lifeng

    2015-01-01

    Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988

  10. Contribution of anti-Hsp70.1 IgG antibody levels to the diagnostic certainty of clinically suspected ocular toxoplasmosis.

    PubMed

    Chumpitazi, Bernabé F F; Bouillet, Laurence; Fricker-Hidalgo, Hélène; Lacharme, Tiffany; Romanet, Jean-Paul; Massot, Christian; Chiquet, Christophe; Pelloux, Hervé

    2010-11-01

    Laboratory diagnosis of ocular toxoplasmosis, the major cause of posterior uveitis worldwide, can be improved. Heat shock protein (Hsp) 70 is involved in cellular infection by Toxoplasma gondii but also in the immune response to this parasite. The authors postulate that infected patients may exhibit serum IgG anti-Hsp70.1 antibodies and that determining the presence of these antibodies could improve the diagnosis of suspected ocular toxoplasmosis. This retrospective case-control study included 26 laboratory-confirmed cases of ocular toxoplasmosis (group A), 41 clinically suspected cases (group B), and 67 currently healthy blood donors who were chronically infected with T. gondii (group C). Laboratory and clinical data were analyzed according to the ocular presentation and Goldmann-Witmer's coefficient. Serum and aqueous humor were sampled at the time of uveitis. Serum anti-Hsp70.1 antibody levels were obtained by ELISA. The probability of ocular toxoplasmosis was estimated by a logistic regression analysis that combined data from serum IgG anti-Hsp70.1 and aqueous-humor IgG anti-T. gondii antibody levels. Serum IgG anti-Hsp70.1 antibody levels were significantly increased in groups A and B when compared to the levels in control group C (P ≤ 0.0034). These levels correlated with the retinal lesion size (r = 0.301; P < 0.0349). Logistic probability and anti-Hsp70.1 antibodies in sera confirmed that 10 of 23 cases in group B were true ocular toxoplasmosis. Anti-Hsp70 may play a role in the immunopathogenesis of ocular Toxoplasma infection. This study showed that the anti-Hsp70.1 antibody and the logistic probability test can confirm clinically suspected ocular toxoplasmosis.

  11. Validation of Metrics as Error Predictors

    NASA Astrophysics Data System (ADS)

    Mendling, Jan

    In this chapter, we test the validity of metrics that were defined in the previous chapter for predicting errors in EPC business process models. In Section 5.1, we provide an overview of how the analysis data is generated. Section 5.2 describes the sample of EPCs from practice that we use for the analysis. Here we discuss a disaggregation by the EPC model group and by error as well as a correlation analysis between metrics and error. Based on this sample, we calculate a logistic regression model for predicting error probability with the metrics as input variables in Section 5.3. In Section 5.4, we then test the regression function for an independent sample of EPC models from textbooks as a cross-validation. Section 5.5 summarizes the findings.

  12. Predicting anthropogenic soils across the Amazonia

    NASA Astrophysics Data System (ADS)

    Mcmichael, C.; Palace, M. W.; Bush, M. B.; Braswell, B. H.; Hagen, S. C.; Silman, M.; Neves, E.; Czarnecki, C.

    2012-12-01

    Hidden under the forest canopy in lowland Amazonia are nutrient-enriched soils, called terra pretas (or Amazonian black earths), which were formed by prehistoric indigenous populations. These anthrosols are in stark contrast to typical nutrient-poor Amazonian soils, and have retained increased nutrient levels for hundreds of years. Because of their long-term nutrient retaining ability, terra pretas may be crucial for developing sustainable agricultural practices in Amazonia, especially given the deforestation necessary for traditional slash-and-burn systems. However, the frequency and distribution of terra preta soils across the landscape remains debatable, and archaeologists have estimated that terra pretas cover anywhere from 0.1% to 10% of the lowland Amazonian forests. The highest concentration of terra preta soils has been found along the central and eastern portions of the Amazon River and its major tributaries, but whether this is a true pattern or simply reflects sampling bias remains unknown. A possible explanation is that specific environmental or biotic conditions were preferred for human settlement and terra preta formation. Here, we use environmental parameters to predict the probabilities of terra preta soils across lowland Amazonian forests. We compiled a database of 2708 sites across Amazonia, including locations that contain terra pretas (n = 917), and those that are known to be terra preta-free (n = 1791). More than 20 environmental variables, including precipitation, elevation, slope, soil fertility, and distance to river were converted into 90-m resolution raster images across Amazonia and used to model the probability of terra preta occurrence. The relationship between the predictor variables and the occurrence of terra preta was examined using three modeling techniques: logistic regression, auto-logistic regression, and maximum entropy estimations. 
All three techniques provided similar predictions for terra preta distributions and the amount of area covered by terra preta. Distance to river, locations of bluffs, elevation, and soil fertility were important factors in determining distributions of terra preta, while other environmental variables had less effect. Terra pretas were most likely to be found in central and eastern Amazonia near the confluences of the Amazon River and its major tributaries. Within this general area of higher probability, terra pretas are most likely found atop the bluffs overlooking the rivers as opposed to lying on the floodplain. Interestingly, terra pretas are more probable in areas with less-fertile and more highly weathered soils. Although all three modeling techniques provided similar predictions of terra preta across Amazonia, we suggest that maximum entropy modeling is the best technique to predict anthropogenic soils across the vast Amazonian landscape. The auto-logistic regression corrects for spatial autocorrelation inherent to archaeological surveys, but still requires absence data, which were collected at different times and on different spatial scales than the presence data. The maximum entropy model requires presence-only data, accounts for spatial autocorrelation, and is not affected by the differential soil sampling techniques.

  13. A logistic regression equation for estimating the probability of a stream flowing perennially in Massachusetts

    USGS Publications Warehouse

    Bent, Gardner C.; Archfield, Stacey A.

    2002-01-01

    A logistic regression equation was developed for estimating the probability of a stream flowing perennially at a specific site in Massachusetts. The equation provides city and town conservation commissions and the Massachusetts Department of Environmental Protection with an additional method for assessing whether streams are perennial or intermittent at a specific site in Massachusetts. This information is needed to assist these environmental agencies, who administer the Commonwealth of Massachusetts Rivers Protection Act of 1996, which establishes a 200-foot-wide protected riverfront area extending along the length of each side of the stream from the mean annual high-water line along each side of perennial streams, with exceptions in some urban areas. The equation was developed by relating the verified perennial or intermittent status of a stream site to selected basin characteristics of naturally flowing streams (no regulation by dams, surface-water withdrawals, ground-water withdrawals, diversion, waste-water discharge, and so forth) in Massachusetts. Stream sites used in the analysis were identified as perennial or intermittent on the basis of review of measured streamflow at sites throughout Massachusetts and on visual observation at sites in the South Coastal Basin, southeastern Massachusetts. Measured or observed zero flow(s) during months of extended drought as defined by the 310 Code of Massachusetts Regulations (CMR) 10.58(2)(a) were not considered when designating the perennial or intermittent status of a stream site. The database used to develop the equation included a total of 305 stream sites (84 intermittent- and 89 perennial-stream sites in the State, and 50 intermittent- and 82 perennial-stream sites in the South Coastal Basin). 
Stream sites included in the database had drainage areas that ranged from 0.14 to 8.94 square miles in the State and from 0.02 to 7.00 square miles in the South Coastal Basin. Results of the logistic regression analysis indicate that the probability of a stream flowing perennially at a specific site in Massachusetts can be estimated as a function of (1) drainage area (cube root), (2) drainage density, (3) areal percentage of stratified-drift deposits (square root), (4) mean basin slope, and (5) location in the South Coastal Basin or the remainder of the State. Although the equation developed provides an objective means for estimating the probability of a stream flowing perennially at a specific site, the reliability of the equation is constrained by the data used to develop the equation. The equation may not be reliable for (1) drainage areas less than 0.14 square mile in the State or less than 0.02 square mile in the South Coastal Basin, (2) streams with losing reaches, or (3) streams draining the southern part of the South Coastal Basin and the eastern part of the Buzzards Bay Basin and the entire area of Cape Cod and the Islands Basins.
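
    As a rough illustration of how an equation of this form might be evaluated, the sketch below applies the transformations named in the abstract (cube root of drainage area, square root of the percentage of stratified-drift deposits) inside a logistic function. The coefficient values, the function name, and the argument names are hypothetical placeholders; the abstract does not report the fitted coefficients.

```python
import math

# Hypothetical coefficients for illustration only; the abstract does not
# report the fitted values, so these numbers are placeholders.
COEFFICIENTS = {
    "intercept": -3.0,
    "cbrt_drainage_area": 2.0,         # cube root of drainage area (square miles)
    "drainage_density": 0.5,           # drainage density
    "sqrt_pct_stratified_drift": 0.3,  # square root of areal percentage
    "mean_basin_slope": 0.1,           # mean basin slope
    "south_coastal_basin": -0.8,       # 1 if in the South Coastal Basin, else 0
}


def perennial_probability(drainage_area, drainage_density,
                          pct_stratified_drift, mean_basin_slope,
                          in_south_coastal_basin):
    """Return the modeled probability that a stream site flows perennially."""
    c = COEFFICIENTS
    z = (c["intercept"]
         + c["cbrt_drainage_area"] * drainage_area ** (1.0 / 3.0)
         + c["drainage_density"] * drainage_density
         + c["sqrt_pct_stratified_drift"] * math.sqrt(pct_stratified_drift)
         + c["mean_basin_slope"] * mean_basin_slope
         + c["south_coastal_basin"] * (1.0 if in_south_coastal_basin else 0.0))
    # The logistic (inverse-logit) transform maps the linear predictor to (0, 1).
    return 1.0 / (1.0 + math.exp(-z))
```

    With real coefficients substituted, a call such as `perennial_probability(1.0, 2.0, 10.0, 5.0, False)` would return the estimated probability for a site outside the South Coastal Basin; the reliability limits listed in the abstract (for example, drainage areas below 0.14 square mile) would still apply.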

  14. An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression

    ERIC Educational Resources Information Center

    Weiss, Brandi A.; Dardick, William

    2016-01-01

    This article introduces an entropy-based measure of data-model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify…

  15. What Are the Odds of that? A Primer on Understanding Logistic Regression

    ERIC Educational Resources Information Center

    Huang, Francis L.; Moon, Tonya R.

    2013-01-01

    The purpose of this Methodological Brief is to present a brief primer on logistic regression, a commonly used technique when modeling dichotomous outcomes. Using data from the National Education Longitudinal Study of 1988 (NELS:88), logistic regression techniques were used to investigate student-level variables in eighth grade (i.e., enrolled in a…

  16. Emergency Assessment of Debris-Flow Hazards from Basins Burned by the Piru, Simi, and Verdale Fires of 2003, Southern California

    USGS Publications Warehouse

    Cannon, Susan H.; Gartner, Joseph E.; Rupert, Michael G.; Michael, John A.

    2003-01-01

    These maps present preliminary assessments of the probability of debris-flow activity and estimates of peak discharges that can potentially be generated by debris-flows issuing from basins burned by the Piru, Simi and Verdale Fires of October 2003 in southern California in response to the 25-year, 10-year, and 2-year 1-hour rain storms. The probability maps are based on the application of a logistic multiple regression model that describes the percent chance of debris-flow production from an individual basin as a function of burned extent, soil properties, basin gradients and storm rainfall. The peak discharge maps are based on application of a multiple-regression model that can be used to estimate debris-flow peak discharge at a basin outlet as a function of basin gradient, burn extent, and storm rainfall. Probabilities of debris-flow occurrence for the Piru Fire range between 2 and 94% and estimates of debris flow peak discharges range between 1,200 and 6,640 ft3/s (34 to 188 m3/s). Basins burned by the Simi Fire show probabilities for debris-flow occurrence between 1 and 98%, and peak discharge estimates between 1,130 and 6,180 ft3/s (32 and 175 m3/s). The probabilities for debris-flow activity calculated for the Verdale Fire range from negligible values to 13%. Peak discharges were not estimated for this fire because of these low probabilities. These maps are intended to identify those basins that are most prone to the largest debris-flow events and provide information for the preliminary design of mitigation measures and for the planning of evacuation timing and routes.

  17. Hospital of Diagnosis Influences the Probability of Receiving Curative Treatment for Esophageal Cancer.

    PubMed

    van Putten, Margreet; Koëter, Marijn; van Laarhoven, Hanneke W M; Lemmens, Valery E P P; Siersema, Peter D; Hulshof, Maarten C C M; Verhoeven, Rob H A; Nieuwenhuijzen, Grard A P

    2018-02-01

    The aim of this article was to study the influence of hospital of diagnosis on the probability of receiving curative treatment and its impact on survival among patients with esophageal cancer (EC). Although EC surgery is centralized in the Netherlands, the disease is often diagnosed in hospitals that do not perform this procedure. Patients with esophageal or gastroesophageal junction tumors diagnosed between 2005 and 2013 who were potentially curable (cT1-3,X, any N, M0,X) were selected from the Netherlands Cancer Registry. Multilevel logistic regression was performed to examine the probability of undergoing curative treatment (resection with or without neoadjuvant treatment, definitive chemoradiotherapy, or local tumor excision) according to hospital of diagnosis. Effects of variation in probability of undergoing curative treatment among these hospitals on survival were investigated by Cox regression. All 13,017 patients with potentially curable EC, diagnosed in 91 hospitals, were included. The proportion of patients receiving curative treatment ranged from 37% to 83% and from 45% to 86% in the periods 2005-2009 and 2010-2013, respectively, depending on hospital of diagnosis. After adjustment for patient- and hospital-related characteristics these proportions ranged from 41% to 77% and from 50% to 82%, respectively (both P < 0.001). Multivariable survival analyses showed that patients diagnosed in hospitals with a low probability of undergoing curative treatment had a worse overall survival (hazard ratio = 1.13, 95% confidence interval 1.06-1.20; hazard ratio = 1.15, 95% confidence interval 1.07-1.24). The variation in probability of undergoing potentially curative treatment for EC between hospitals of diagnosis and its impact on survival indicates that treatment decision making in EC may be improved.

  18. Predicting Dural Tear in Compound Depressed Skull Fractures: A Prospective Multicenter Correlational Study.

    PubMed

    Salia, Shemsedin Musefa; Mersha, Hagos Biluts; Aklilu, Abenezer Tirsit; Baleh, Abat Sahlu; Lund-Johansen, Morten

    2018-06-01

    Compound depressed skull fracture (DSF) is a neurosurgical emergency. Preoperative knowledge of dural status is indispensable for treatment decision making. This study aimed to determine predictors of dural tear from clinical and imaging characteristics in patients with compound DSF. This prospective, multicenter correlational study in neurosurgical hospitals in Addis Ababa, Ethiopia, included 128 patients operated on from January 1, 2016, to October 31, 2016. Clinical, imaging, and intraoperative findings were evaluated. Univariate and multivariate analyses were used to establish predictors of dural tear. A logistic regression model was developed to predict probability of dural tear. Model validation was done using the receiver operating characteristic curve. Dural tear was seen in 55.5% of 128 patients. Demographics, injury mechanism, clinical presentation, and site of DSF had no significant correlation with dural tear. In univariate and multivariate analyses, depth of fracture depression (odds ratio 1.3, P < 0.001), pneumocephalus (odds ratio 2.8, P = 0.005), and brain contusions/intracerebral hematoma (odds ratio 5.5, P < 0.001) were significantly correlated with dural tear. We developed a logistic regression model (diagnostic test) to calculate probability of dural tear. Using the receiver operating characteristic curve, we determined the cutoff value for a positive test giving the highest accuracy to be 30% with a corresponding sensitivity of 93.0% and specificity of 43.9%. Dural tear in compound DSF can be predicted with 93.0% sensitivity using preoperative findings and may guide treatment decision making in resource-limited settings where risk of extensive cranial surgery outweighs the benefit. Copyright © 2018 Elsevier Inc. All rights reserved.
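
    The cutoff-based evaluation described above can be sketched as follows. This is a minimal illustration, assuming a test counts as positive when the predicted probability is at or above the cutoff; the function name and the toy data are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(probabilities, labels, cutoff):
    """Return (sensitivity, specificity) for a probability cutoff.

    labels use 1 for the event (e.g. dural tear) and 0 for no event.
    Assumption: a prediction is positive when probability >= cutoff.
    """
    tp = fn = tn = fp = 0
    for p, y in zip(probabilities, labels):
        predicted_positive = p >= cutoff
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy data for illustration only (not study data).
toy_probabilities = [0.10, 0.20, 0.35, 0.40, 0.80, 0.90]
toy_labels = [0, 0, 0, 1, 1, 1]
sens, spec = sensitivity_specificity(toy_probabilities, toy_labels, cutoff=0.30)
```

    A low cutoff such as 30% tends to trade specificity for sensitivity, which matches the pattern reported in the abstract (high sensitivity, lower specificity).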

  19. Datamining approaches for modeling tumor control probability.

    PubMed

    Naqa, Issam El; Deasy, Joseph O; Mu, Yi; Huang, Ellen; Hope, Andrew J; Lindsay, Patricia E; Apte, Aditya; Alaly, James; Bradley, Jeffrey D

    2010-11-01

    Tumor control probability (TCP) to radiotherapy is determined by complex interactions between tumor biology, tumor microenvironment, radiation dosimetry, and patient-related variables. The complexity of these heterogeneous variable interactions constitutes a challenge for building predictive models for routine clinical practice. We describe a datamining framework that can unravel the higher order relationships among dosimetric dose-volume prognostic variables, interrogate various radiobiological processes, and generalize to unseen data when applied prospectively. Several datamining approaches are discussed that include dose-volume metrics, equivalent uniform dose, mechanistic Poisson model, and model building methods using statistical regression and machine learning techniques. Institutional datasets of non-small cell lung cancer (NSCLC) patients are used to demonstrate these methods. The performance of the different methods was evaluated using bivariate Spearman rank correlations (rs). Over-fitting was controlled via resampling methods. Using a dataset of 56 patients with primary NSCLC tumors and 23 candidate variables, we estimated GTV volume and V75 to be the best model parameters for predicting TCP using statistical resampling and a logistic model. Using these variables, the support vector machine (SVM) kernel method provided superior performance for TCP prediction with an rs=0.68 on leave-one-out testing compared to logistic regression (rs=0.4), Poisson-based TCP (rs=0.33), and cell kill equivalent uniform dose model (rs=0.17). The prediction of treatment response can be improved by utilizing datamining approaches, which are able to unravel important non-linear complex interactions among model variables and have the capacity to predict on unseen data for prospective clinical applications.
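
    The Spearman rank correlation (rs) used above as a performance measure can be computed as the Pearson correlation of ranks. The sketch below is a plain-Python illustration with average ranks for ties; the function names and the toy inputs are hypothetical and do not reproduce the authors' analysis.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rs(predictions, outcomes):
    """Spearman rank correlation between predictions and observed outcomes."""
    return pearson(average_ranks(predictions), average_ranks(outcomes))


# Toy example: predicted control probabilities vs. binary outcomes.
rs = spearman_rs([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

    In a leave-one-out setting, each prediction passed to `spearman_rs` would come from a model fitted without that patient, so the correlation reflects performance on held-out cases.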

  20. Clinical and sonographic risk factors and complications of shoulder dystocia - a case-control study with parity and gestational age matched controls.

    PubMed

    Parantainen, Jukka; Palomäki, Outi; Talola, Nina; Uotila, Jukka

    2014-06-01

    To examine the clinical risk factors and complications of shoulder dystocia today and to evaluate ultrasound methods predicting it. Retrospective, matched case-control study at a University Hospital with 5000 annual deliveries. The study population consisted of 152 deliveries complicated by shoulder dystocia over a period of 8.5 years (January 2004-June 2012) and 152 controls matched for gestational age and parity. The data was collected from the medical records of mothers and children and analyzed by conditional logistic regression. Incidences and odds ratios were calculated for risk factors and complications. Antenatal ultrasound data was analyzed when available by conditional logistic regression to test for significant differences between study groups. Birthweight (OR 12.1 for ≥4000 g; 95% CI 4.18-35.0) and vacuum extraction (OR 3.98; 95% CI 1.25-12.7) remained the most significant clinical risk factors. Only a trend of an association of pregestational or gestational diabetes was noticed (OR 1.87; 95% CI 0.997-3.495, probability of type II error 51%). Of the complications of shoulder dystocia the incidence of brachial plexus palsies was high (40%). Antenatal ultrasound method based on the difference between abdominal and biparietal diameters had a significant difference between cases and controls. The impact of diabetes as a risk factor has diminished, which may reflect improved screening and treatment. Antenatal ultrasound methods are showing some promise, but the predictive value of ultrasound alone is probably low. Copyright © 2014. Published by Elsevier Ireland Ltd.

  1. Developing and Testing a Model to Predict Outcomes of Organizational Change

    PubMed Central

    Gustafson, David H; Sainfort, François; Eichler, Mary; Adams, Laura; Bisognano, Maureen; Steudel, Harold

    2003-01-01

    Objective: To test the effectiveness of a Bayesian model employing subjective probability estimates for predicting success and failure of health care improvement projects. Data Sources: Experts' subjective assessment data for model development and independent retrospective data on 221 healthcare improvement projects in the United States, Canada, and the Netherlands collected between 1996 and 2000 for validation. Methods: A panel of theoretical and practical experts and literature in organizational change were used to identify factors predicting the outcome of improvement efforts. A Bayesian model was developed to estimate probability of successful change using subjective estimates of likelihood ratios and prior odds elicited from the panel of experts. A subsequent retrospective empirical analysis of change efforts in 198 health care organizations was performed to validate the model. Logistic regression and ROC analysis were used to evaluate the model's performance using three alternative definitions of success. Data Collection: For the model development, experts' subjective assessments were elicited using an integrative group process. For the validation study, a staff person intimately involved in each improvement project responded to a written survey asking questions about model factors and project outcomes. Results: Logistic regression chi-square statistics and areas under the ROC curve demonstrated a high level of model performance in predicting success. Chi-square statistics were significant at the 0.001 level and areas under the ROC curve were greater than 0.84. Conclusions: A subjective Bayesian model was effective in predicting the outcome of actual improvement projects. Additional prospective evaluations as well as testing the impact of this model as an intervention are warranted. PMID:12785571

  2. Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches.

    PubMed

    Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W

    2015-08-01

    Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.

  3. Who cares? A comparison of informal and formal care provision in Spain, England and the USA.

    PubMed

    Solé-Auró, Aïda; Crimmins, Eileen M

    2014-03-01

    This paper investigates the prevalence of incapacity in performing daily activities and the associations between household composition and availability of family members and receipt of care among older adults with functioning problems in Spain, England and the United States of America (USA). We examine how living arrangements, marital status, child availability, limitations in functioning ability, age and gender affect the probability of receiving formal care and informal care from household members and from others in three countries with different family structures, living arrangements and policies supporting care of the incapacitated. Data sources include the 2006 Survey of Health, Ageing and Retirement in Europe for Spain, the third wave of the English Longitudinal Study of Ageing (2006), and the eighth wave of the USA Health and Retirement Study (2006). Logistic and multinomial logistic regressions are used to estimate the probability of receiving care and the sources of care among persons age 50 and older. The percentage of people with functional limitations receiving care is higher in Spain. More care comes from outside the household in the USA and England than in Spain. The use of formal care among the incapacitated is lowest in the USA and highest in Spain.

  4. Bias in logistic regression due to imperfect diagnostic test results and practical correction approaches.

    PubMed

    Valle, Denis; Lima, Joanna M Tucker; Millar, Justin; Amratia, Punam; Haque, Ubydul

    2015-11-04

    Logistic regression is a statistical model widely used in cross-sectional and cohort studies to identify and quantify the effects of potential disease risk factors. However, the impact of imperfect tests on adjusted odds ratios (and thus on the identification of risk factors) is under-appreciated. The purpose of this article is to draw attention to the problem associated with modelling imperfect diagnostic tests, and propose simple Bayesian models to adequately address this issue. A systematic literature review was conducted to determine the proportion of malaria studies that appropriately accounted for false-negatives/false-positives in a logistic regression setting. Inference from the standard logistic regression was also compared with that from three proposed Bayesian models using simulations and malaria data from the western Brazilian Amazon. A systematic literature review suggests that malaria epidemiologists are largely unaware of the problem of using logistic regression to model imperfect diagnostic test results. Simulation results reveal that statistical inference can be substantially improved when using the proposed Bayesian models versus the standard logistic regression. Finally, analysis of original malaria data with one of the proposed Bayesian models reveals that microscopy sensitivity is strongly influenced by how long people have lived in the study region, and an important risk factor (i.e., participation in forest extractivism) is identified that would have been missed by standard logistic regression. Given the numerous diagnostic methods employed by malaria researchers and the ubiquitous use of logistic regression to model the results of these diagnostic tests, this paper provides critical guidelines to improve data analysis practice in the presence of misclassification error. Easy-to-use code that can be readily adapted to WinBUGS is provided, enabling straightforward implementation of the proposed Bayesian models.

  5. Confirming the validity of the CONUT system for early detection and monitoring of clinical undernutrition: comparison with two logistic regression models developed using SGA as the gold standard.

    PubMed

    González-Madroño, A; Mancha, A; Rodríguez, F J; Culebras, J; de Ulibarri, J I

    2012-01-01

    To ratify previous validations of the CONUT nutritional screening tool by the development of two probabilistic models using the parameters included in the CONUT, to see if the CONUT's effectiveness could be improved. This is a two-step prospective study. In Step 1, 101 patients were randomly selected, and SGA and CONUT were performed. With the data obtained, an unconditional logistic regression model was developed, and two variants of CONUT were constructed: Model 1 was made by a method of logistic regression. Model 2 was made by dividing the probabilities of undernutrition obtained in Model 1 into seven regular intervals. In Step 2, 60 patients were selected and underwent the SGA, the original CONUT and the new models developed. The diagnostic efficacy of the original CONUT and the new models was tested by means of ROC curves. Samples 1 and 2 were combined to measure the degree of agreement between the original CONUT and SGA, and diagnostic efficacy parameters were calculated. No statistically significant differences were found between samples 1 and 2 regarding age, sex and medical/surgical distribution, and undernutrition rates were similar (over 40%). The AUC for the ROC curves were 0.862 for the original CONUT, and 0.839 and 0.874 for Models 1 and 2, respectively. The kappa index for the CONUT and SGA was 0.680. The CONUT, with the original scores assigned by the authors, is as good as the mathematical models and thus is a valuable tool, highly useful and efficient for the purpose of clinical undernutrition screening.

  6. The impact of the 2008 financial crisis on food security and food expenditures in Mexico: a disproportionate effect on the vulnerable.

    PubMed

    Vilar-Compte, Mireya; Sandoval-Olascoaga, Sebastian; Bernal-Stuart, Ana; Shimoga, Sandhya; Vargas-Bustamante, Arturo

    2015-11-01

    The present paper investigated the impact of the 2008 financial crisis on food security in Mexico and how it disproportionally affected vulnerable households. A generalized ordered logistic regression was estimated to assess the impact of the crisis on households' food security status. An ordinary least squares and a quantile regression were estimated to evaluate the effect of the financial crisis on a continuous proxy measure of food security defined as the share of a household's current income devoted to food expenditures. Both analyses were performed using pooled cross-sectional data from the Mexican National Household Income and Expenditure Survey 2008 and 2010. The analytical sample included 29,468 households in 2008 and 27,654 in 2010. The generalized ordered logistic model showed that the financial crisis significantly (P<0.05) decreased the probability of being food secure, mildly or moderately food insecure, compared with being severely food insecure (OR=0.74). A similar but smaller effect was found when comparing severely and moderately food-insecure households with mildly food-insecure and food-secure households (OR=0.81). The ordinary least squares model showed that the crisis significantly (P<0.05) increased the share of total income spent on food (β coefficient of 0.02). The quantile regression confirmed the findings suggested by the generalized ordered logistic model, showing that the effects of the crisis were more profound among poorer households. The results suggest that households that were more vulnerable before the financial crisis saw a worsened effect in terms of food insecurity with the crisis. Findings were consistent with both measures of food security--one based on self-reported experience and the other based on food spending.
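
    To make the reported odds ratios more concrete, the sketch below shows how an odds ratio shifts a probability. It is an arithmetic illustration only: the baseline probability of 0.5 is a hypothetical value chosen for the example, not a figure from the study, and the function name is an assumption.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Return the probability obtained by multiplying the baseline odds by an odds ratio."""
    if not 0.0 < baseline_probability < 1.0:
        raise ValueError("baseline_probability must be strictly between 0 and 1")
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical baseline of 0.5 combined with the reported OR of 0.74.
shifted = apply_odds_ratio(0.5, 0.74)
```

    Because odds and probabilities are related nonlinearly, the same odds ratio produces a different change in probability at a different baseline, which is why odds ratios should not be read directly as changes in probability.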

  7. Landslide susceptibility mapping for a part of North Anatolian Fault Zone (Northeast Turkey) using logistic regression model

    NASA Astrophysics Data System (ADS)

    Demir, Gökhan; Aytekin, Mustafa; Banu Ikizler, Sabriye; Angın, Zekai

    2013-04-01

    The North Anatolian Fault is known as one of the most active and destructive fault zones, having produced many high-magnitude earthquakes. Along this fault zone, the morphology and the lithological features are prone to landsliding. Many earthquake-induced landslides have been recorded by several studies along this fault zone, and these landslides caused both injuries and loss of life. Therefore, a detailed landslide susceptibility assessment for this area is indispensable. In this context, this study undertook a landslide susceptibility assessment for the 1445 km2 area in the Kelkit River valley, a part of the North Anatolian Fault zone (Eastern Black Sea region of Turkey), and the results are summarized here. For this purpose, a geographical information system (GIS) and a bivariate statistical model were used. Initially, landslide inventory maps were prepared using landslide data determined by field surveys and landslide data taken from the General Directorate of Mineral Research and Exploration. The landslide conditioning factors are considered to be lithology, slope gradient, slope aspect, topographical elevation, distance to streams, distance to roads, distance to faults, drainage density, and fault density. The ArcGIS package was used to manipulate and analyze all the collected data. The logistic regression method was applied to create a landslide susceptibility map. Landslide susceptibility maps were divided into five susceptibility regions: very low, low, moderate, high, and very high. The result of the analysis was verified using the inventoried landslide locations and compared with the produced probability model. For this purpose, the Area Under the Curve (AUC) approach was applied, and an AUC value was obtained. Based on this AUC value, the obtained landslide susceptibility map was judged satisfactory. Keywords: North Anatolian Fault Zone, Landslide susceptibility map, Geographical Information Systems, Logistic Regression Analysis.
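
    The AUC check mentioned above can be illustrated with a small rank-based calculation: the AUC equals the fraction of (landslide, non-landslide) pairs in which the landslide location receives the higher predicted probability, with ties counted as one half. The function name and the toy values below are hypothetical; the abstract does not report the AUC obtained.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels use 1 for inventoried landslide locations and 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy susceptibility scores for illustration only.
toy_auc = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

    The pairwise form is quadratic in the number of cells, so it suits small validation samples; for full raster maps a sort-based implementation would be the more usual choice.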

  8. Forage site selection by lesser snow geese during autumn staging on the Arctic National Wildlife Refuge, Alaska

    USGS Publications Warehouse

    Hupp, Jerry W.; Robertson, Donna G.

    1998-01-01

    Lesser snow geese (Chen caerulescens caerulescens) of the Western Canadian Arctic Population feed intensively for 2-4 weeks on the coastal plain of the Beaufort Sea in Canada and Alaska at the beginning of their autumn migration. Petroleum leasing proposed for the Alaskan portion of the staging area on the Arctic National Wildlife Refuge (ANWR) could affect staging habitats and their use by geese. Therefore we studied availability, distribution, and use by snow geese of tall and russet cotton-grass (Eriophorum angustifolium and E. russeolum, respectively) feeding habitats on the ANWR. We studied selection of feeding habitats at 3 spatial scales (feeding sites [0.06 m2], feeding patches [ca. 100 m2], and feeding areas [>1 ha]) during 1990-93. We used logistic regression analysis to discriminate differences in soil moisture and vegetation between 1,548 feeding sites where snow geese exploited individual cotton-grass plants and 1,143 unexploited sites at 61 feeding patches in 1990. Feeding likelihood increased with greater soil moisture and decreased where nonforage species were present. We tested the logistic regression model in 1991 by releasing human-imprinted snow geese into 4 10 × 20-m enclosed plots where plant communities had been mapped, habitats sampled, and feeding probabilities calculated. Geese selected more feeding sites per square meter in areas of predicted high quality feeding habitat (feeding probability ≥ 0.6) than in medium (feeding probability = 0.3-0.59) or poor (feeding probability < 0.3) quality habitat (P < 0.0001). Geese increasingly used medium quality areas and spent more time feeding as trials progressed and forage was presumably reduced in high quality habitats. We examined relationships between underground biomass of plants, feeding probability, and surface microrelief at 474 0.06-m2 sites in 20 thermokarst pits in 1992. 
Feeding probability was correlated with the percentage of underground biomass composed of cotton-grass (r = 0.56). Feeding probability and relative availability of cotton-grass forage were highest in flooded soils along the ecotone of flooded and upland habitats. In 1992, we also used the logistic regression model to estimate availability of high quality feeding sites on 192 80 × 90-m plots that were randomly located on 24 study areas. A mean of 1.6% of the area sampled in each plot was classified as high quality feeding habitat at 23 of the study areas. Relative availability of high quality sites was highest in troughs, thermokarst pits, and water tracks because saturated soils in those microreliefs were dominated by cotton-grass. Relative availability of high quality sites was lower in saturated soils of basins (low-centered polygons, wet meadows, and strangmoor) because that microrelief was dominated by Carex spp. Most (63%) of the saturated area on the ANWR coastal plain was in basins. We examined distribution of feeding patches relative to microrelief in 49 snow goose feeding areas in 1993. Only 2.5% of the tundra in each feeding area was exploited by snow geese. Snow geese preferentially fed in thermokarst pits, water tracks, and troughs, and avoided basins and uplands. Feeding areas had more thermokarst pit but less basin microrelief than adjacent randomly-selected areas. Thermokarst pits and water tracks occurred most frequently in regions of the coastal plain where geese were observed most often during aerial surveys (1982-93). Microrelief influenced selection of feeding patches and feeding areas and may have affected snow goose distribution on the ANWR. Potential feeding patches were widely distributed but composed a small percentage (≤2.5%) of the tundra landscape and were highly interspersed with less suitable habitat. 
The Western Canadian Arctic Population probably used a large staging area on the Beaufort Sea coastal plain because snow geese exploited a spatially and temporally heterogeneous resource.
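
    The habitat-quality classes in this abstract are defined by thresholds on the modeled feeding probability (high: ≥ 0.6; medium: 0.3-0.59; poor: < 0.3). A minimal sketch of that classification follows; the function name is hypothetical, and treating values between 0.59 and 0.6 as medium is an assumption made so the classes cover the whole range.

```python
def habitat_quality(feeding_probability):
    """Classify a site by its modeled feeding probability."""
    if not 0.0 <= feeding_probability <= 1.0:
        raise ValueError("feeding_probability must lie in [0, 1]")
    if feeding_probability >= 0.6:
        return "high"
    if feeding_probability >= 0.3:
        return "medium"
    return "poor"


def fraction_high_quality(feeding_probabilities):
    """Share of sampled sites classified as high quality feeding habitat."""
    classes = [habitat_quality(p) for p in feeding_probabilities]
    return classes.count("high") / len(classes)
```

    Applied to the probabilities computed for sampled sites on a plot, `fraction_high_quality` would give the kind of per-plot availability figure the abstract reports.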

  9. Multinomial Logistic Regression & Bootstrapping for Bayesian Estimation of Vertical Facies Prediction in Heterogeneous Sandstone Reservoirs

    NASA Astrophysics Data System (ADS)

    Al-Mudhafar, W. J.

    2013-12-01

    Precise prediction of rock facies leads to adequate reservoir characterization by improving the porosity-permeability relationships used to estimate properties in non-cored intervals. It also helps to accurately identify the spatial facies distribution, supporting an accurate reservoir model for optimal future reservoir performance. In this paper, facies estimation was carried out through multinomial logistic regression (MLR) using well logs and core data from a well in the upper sandstone formation of the South Rumaila oil field. The independent variables are gamma rays, formation density, water saturation, shale volume, log porosity, core porosity, and core permeability. First, the Robust Sequential Imputation Algorithm was used to impute the missing data. This algorithm starts from a complete subset of the dataset and estimates sequentially the missing values in an incomplete observation by minimizing the determinant of the covariance of the augmented data matrix. Then, the observation is added to the complete data matrix and the algorithm continues with the next observation with missing values. The MLR was chosen to estimate the maximum likelihood and minimize the standard error for the nonlinear relationships between facies and the core and log data. The MLR is used to predict the probabilities of the different possible facies given each independent variable by constructing a linear predictor function having a set of weights that are linearly combined with the independent variables by using a dot product. A beta distribution of facies was taken as prior knowledge, and the resulting predicted probability (posterior) was estimated from MLR based on Bayes' theorem, which relates the predicted probability (posterior) to the conditional probability and the prior knowledge. 
To assess the statistical accuracy of the model, the bootstrap should be carried out to estimate extra-sample prediction error by randomly drawing datasets with replacement from the training data. Each sample has the same size of the original training set and it can be conducted N times to produce N bootstrap datasets to re-fit the model accordingly to decrease the squared difference between the estimated and observed categorical variables (facies) leading to decrease the degree of uncertainty.
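
    The linear predictor and dot product described in this abstract can be illustrated with a minimal sketch. It assumes the usual softmax form of multinomial logistic regression; the function name, the three classes, the two predictors, and every coefficient value below are hypothetical and are not taken from the study.

```python
import math


def facies_probabilities(weights, intercepts, x):
    """Return softmax class probabilities for one observation.

    weights[k] is a hypothetical coefficient vector for facies k,
    intercepts[k] is its intercept, and x is the vector of predictors.
    """
    # Linear predictor for each class: intercept + dot(weights_k, x)
    scores = [b + sum(w_j * x_j for w_j, x_j in zip(w, x))
              for w, b in zip(weights, intercepts)]
    # Subtract the largest score before exponentiating for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up coefficients for three facies and two standardized log readings;
# the first class acts as the reference class (all coefficients zero).
weights = [[0.0, 0.0], [1.2, -0.7], [-0.5, 0.9]]
intercepts = [0.0, 0.3, -0.2]
x = [0.4, 1.1]

probs = facies_probabilities(weights, intercepts, x)
# Index of the most probable facies for this observation
predicted = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted)
```

    The probabilities sum to one by construction, and the predicted facies is simply the class with the largest linear predictor.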

  10. Applying additive logistic regression to data derived from sensors monitoring behavioral and physiological characteristics of dairy cows to detect lameness.

    PubMed

    Kamphuis, C; Frank, E; Burke, J K; Verkerk, G A; Jago, J G

    2013-01-01

    The hypothesis was that sensors currently available on farm that monitor behavioral and physiological characteristics have potential for the detection of lameness in dairy cows. This was tested by applying additive logistic regression to variables derived from sensor data. Data were collected between November 2010 and June 2012 on 5 commercial pasture-based dairy farms. Sensor data from weigh scales (liveweight), pedometers (activity), and milk meters (milking order, unadjusted and adjusted milk yield in the first 2 min of milking, total milk yield, and milking duration) were collected at every milking from 4,904 cows. Lameness events were recorded by farmers who were trained in detecting lameness before the study commenced. A total of 318 lameness events affecting 292 cows were available for statistical analyses. For each lameness event, the lame cow's sensor data for a time period of 14 d before observation date were randomly matched by farm and date to 10 healthy cows (i.e., cows that were not lame and had no other health event recorded for the matched time period). Sensor data relating to the 14-d time periods were used for developing univariable (using one source of sensor data) and multivariable (using multiple sources of sensor data) models. Model development involved the use of additive logistic regression by applying the LogitBoost algorithm with a regression tree as base learner. The model's output was a probability estimate for lameness, given the sensor data collected during the 14-d time period. Models were validated using leave-one-farm-out cross-validation and, as a result of this validation, each cow in the data set (318 lame and 3,180 nonlame cows) received a probability estimate for lameness. Based on the area under the curve (AUC), results indicated that univariable models had low predictive potential, with the highest AUC values found for liveweight (AUC=0.66), activity (AUC=0.60), and milking order (AUC=0.65). 
Combining these 3 sensors improved AUC to 0.74. Detection performance of this combined model varied between farms but it consistently and significantly outperformed univariable models across farms at a fixed specificity of 80%. Still, detection performance was not high enough to be implemented in practice on large, pasture-based dairy farms. Future research may improve performance by developing variables based on sensor data of liveweight, activity, and milking order, but that better describe changes in sensor data patterns when cows go lame. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
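
    As a rough illustration of the two performance measures quoted in this abstract, the sketch below computes an AUC and the sensitivity at a fixed specificity from probability estimates. The scores are invented for the example, the function names are hypothetical, and the pairwise AUC shown is one standard way to compute it, not necessarily the procedure the authors used.

```python
import math


def auc(pos_scores, neg_scores):
    """AUC as the share of (lame, healthy) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sensitivity_at_specificity(pos_scores, neg_scores, specificity=0.80):
    """Sensitivity when the alert threshold keeps at least `specificity` of negatives quiet."""
    ordered = sorted(neg_scores)
    # Smallest threshold such that the required share of negatives is at or below it
    index = math.ceil(specificity * len(ordered)) - 1
    threshold = ordered[index]
    # An alert is raised only when the score exceeds the threshold
    detected = sum(1 for p in pos_scores if p > threshold)
    return detected / len(pos_scores)


# Invented probability estimates for 4 lame and 10 healthy cows
lame = [0.9, 0.8, 0.35, 0.6]
healthy = [0.1, 0.2, 0.3, 0.4, 0.7, 0.25, 0.15, 0.05, 0.5, 0.45]

print(auc(lame, healthy))                          # 0.875
print(sensitivity_at_specificity(lame, healthy))   # 0.75
```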

  11. An EM-based semi-parametric mixture model approach to the regression analysis of competing-risks data.

    PubMed

    Ng, S K; McLachlan, G J

    2003-04-15

    We consider a mixture model approach to the regression analysis of competing-risks data. Attention is focused on inference concerning the effects of factors on both the probability of occurrence and the hazard rate conditional on each of the failure types. These two quantities are specified in the mixture model using the logistic model and the proportional hazards model, respectively. We propose a semi-parametric mixture method to estimate the logistic and regression coefficients jointly, whereby the component-baseline hazard functions are completely unspecified. Estimation is based on maximum likelihood on the basis of the full likelihood, implemented via an expectation-conditional maximization (ECM) algorithm. Simulation studies are performed to compare the performance of the proposed semi-parametric method with a fully parametric mixture approach. The results show that when the component-baseline hazard is monotonic increasing, the semi-parametric and fully parametric mixture approaches are comparable for mildly and moderately censored samples. When the component-baseline hazard is not monotonic increasing, the semi-parametric method consistently provides less biased estimates than a fully parametric approach and is comparable in efficiency in the estimation of the parameters for all levels of censoring. The methods are illustrated using a real data set of prostate cancer patients treated with different dosages of the drug diethylstilbestrol. Copyright 2003 John Wiley & Sons, Ltd.

  12. Development of an Algorithm for Stroke Prediction: A National Health Insurance Database Study in Korea.

    PubMed

    Min, Seung Nam; Park, Se Jin; Kim, Dong Joon; Subramaniyam, Murali; Lee, Kyung-Sun

    2018-01-01

    Stroke is the second leading cause of death worldwide and remains an important health burden both for the individuals and for the national healthcare systems. Potentially modifiable risk factors for stroke include hypertension, cardiac disease, diabetes, and dysregulation of glucose metabolism, atrial fibrillation, and lifestyle factors. We aimed to derive a model equation for developing a stroke pre-diagnosis algorithm with the potentially modifiable risk factors. We used logistic regression for model derivation, together with data from the database of the Korea National Health Insurance Service (NHIS). We reviewed the NHIS records of 500,000 enrollees. For the regression analysis, data regarding 367 stroke patients were selected. The control group consisted of 500 patients followed up for 2 consecutive years and with no history of stroke. We developed a logistic regression model based on information regarding several well-known modifiable risk factors. The developed model could correctly discriminate between normal subjects and stroke patients in 65% of cases. The model developed in the present study can be applied in the clinical setting to estimate the probability of stroke in a year and thus improve the stroke prevention strategies in high-risk patients. The approach used to develop the stroke prevention algorithm can be applied for developing similar models for the pre-diagnosis of other diseases. © 2018 S. Karger AG, Basel.

  13. The relationship between problem gambling and mental and physical health correlates among a nationally representative sample of Canadian women.

    PubMed

    Afifi, Tracie O; Cox, Brian J; Martens, Patricia J; Sareen, Jitender; Enns, Murray W

    2010-01-01

    Gambling has become an increasingly common activity among women since the widespread growth of the gambling industry. Currently, our knowledge of the relationship between problem gambling among women and mental and physical correlates is limited. Therefore, important relationships between problem gambling and health and functioning, mental disorders, physical health conditions, and help-seeking behaviours among women were examined using a nationally representative Canadian sample. Data were from the nationally representative Canadian Community Health Survey Cycle 1.2 (CCHS 1.2; n = 10,056 women aged 15 years and older; data collected in 2002). The statistical analysis included binary logistic regression, multinomial logistic regression, and linear regression models. Past 12-month problem gambling was associated with a significantly higher probability of current lower general health, suicidal ideation and attempts, decreased psychological well-being, increased distress, depression, mania, panic attacks, social phobia, agoraphobia, alcohol dependence, any mental disorder, comorbidity of mental disorders, chronic bronchitis, fibromyalgia, migraine headaches, help-seeking from a professional, attending a self-help group, and calling a telephone help line (odds ratios ranged from 1.5 to 8.2). Problem gambling was associated with a broad range of negative health correlates among women. Problem gambling is an important public health concern. These findings can be used to inform healthy public policies on gambling.

  14. The implementation of rare events logistic regression to predict the distribution of mesophotic hard corals across the main Hawaiian Islands.

    PubMed

    Veazey, Lindsay M; Franklin, Erik C; Kelley, Christopher; Rooney, John; Frazer, L Neil; Toonen, Robert J

    2016-01-01

    Predictive habitat suitability models are powerful tools for cost-effective, statistically robust assessment of the environmental drivers of species distributions. The aim of this study was to develop predictive habitat suitability models for two genera of scleractinian corals (Leptoseris and Montipora) found within the mesophotic zone across the main Hawaiian Islands. The mesophotic zone (30-180 m) is challenging to reach, and therefore historically understudied, because it falls between the maximum limit of SCUBA divers and the minimum typical working depth of submersible vehicles. Here, we implement a logistic regression with rare events corrections to account for the scarcity of presence observations within the dataset. These corrections reduced the coefficient error and improved overall prediction success (73.6% and 74.3%) for both original regression models. The final models included depth, rugosity, slope, mean current velocity, and wave height as the best environmental covariates for predicting the occurrence of the two genera in the mesophotic zone. Using an objectively selected theta ("presence") threshold, the predicted presence probability values (average of 0.051 for Leptoseris and 0.040 for Montipora) were translated to spatially-explicit habitat suitability maps of the main Hawaiian Islands at 25 m grid cell resolution. Our maps are the first of their kind to use extant presence and absence data to examine the habitat preferences of these two dominant mesophotic coral genera across Hawai'i.

  15. Logistic regression for risk factor modelling in stuttering research.

    PubMed

    Reed, Phil; Wu, Yaqionq

    2013-06-01

    To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed is demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.

  16. The New York Sepsis Severity Score: Development of a Risk-Adjusted Severity Model for Sepsis.

    PubMed

    Phillips, Gary S; Osborn, Tiffany M; Terry, Kathleen M; Gesten, Foster; Levy, Mitchell M; Lemeshow, Stanley

    2018-05-01

    In accordance with Rory's Regulations, hospitals across New York State developed and implemented protocols for sepsis recognition and treatment to reduce variations in evidence informed care and preventable mortality. The New York Department of Health sought to develop a risk assessment model for accurate and standardized hospital mortality comparisons of adult septic patients across institutions using case-mix adjustment. Retrospective evaluation of prospectively collected data. Data from 43,204 severe sepsis and septic shock patients from 179 hospitals across New York State were evaluated. Prospective data were submitted to a database from January 1, 2015, to December 31, 2015. None. Maximum likelihood logistic regression was used to estimate model coefficients used in the New York State risk model. The mortality probability was estimated using a logistic regression model. Variables to be included in the model were determined as part of the model-building process. Interactions between variables were included if they made clinical sense and if their p values were less than 0.05. Model development used a random sample of 90% of available patients and was validated using the remaining 10%. Hosmer-Lemeshow goodness of fit p values were considerably greater than 0.05, suggesting good calibration. Areas under the receiver operator curve in the developmental and validation subsets were 0.770 (95% CI, 0.765-0.775) and 0.773 (95% CI, 0.758-0.787), respectively, indicating good discrimination. Development and validation datasets had similar distributions of estimated mortality probabilities. Mortality increased with rising age, comorbidities, and lactate. The New York Sepsis Severity Score accurately estimated the probability of hospital mortality in severe sepsis and septic shock patients. It performed well with respect to calibration and discrimination. 
This sepsis-specific model provides an accurate, comprehensive method for standardized mortality comparison of adult patients with severe sepsis and septic shock.

  17. High Prevalence of Post-Traumatic Stress Symptoms in Relation to Social Factors in Affected Population One Year after the Fukushima Nuclear Disaster

    PubMed Central

    Tsujiuchi, Takuya; Yamaguchi, Maya; Masuda, Kazutaka; Tsuchida, Marisa; Inomata, Tadashi; Kumano, Hiroaki; Kikuchi, Yasushi; Augusterfer, Eugene F.; Mollica, Richard F.

    2016-01-01

    Objective: This study investigated post-traumatic stress symptoms in the population affected by the Fukushima Nuclear Disaster, one year after the disaster. Additionally, we investigated social factors, such as forced displacement, which we hypothesize contributed to the high prevalence of post-traumatic stress. Finally, we report on written narratives that were collected from the impacted population. Design and Settings: Using the Impact of Event Scale-Revised (IES-R), questionnaires were sent to 2,011 households of those displaced from Fukushima prefecture living temporarily in Saitama prefecture. Of the 490 replies, 350 met the criteria for inclusion in the study. Multiple logistic regression analysis was performed to examine several characteristics and variables of social factors as predictors of probable post-traumatic stress disorder (PTSD). Results: The mean score of IES-R was 36.15±21.55, with 59.4% having scores of 30 or higher, thus indicating probable PTSD. No significant differences in percentages of high-risk subjects were found among sex, age, evacuation area, housing damages, tsunami affected, family split-up, and acquaintance support. In the multiple logistic regression analysis, the significant predictors of probable PTSD were chronic physical diseases (OR = 1.97), chronic mental diseases (OR = 6.25), worries about livelihood (OR = 2.27), lost jobs (OR = 1.71), lost social ties (OR = 2.27), and concerns about compensation (OR = 3.74). Conclusion: Although there are limitations in assuming a diagnosis of PTSD based on self-report IES-R, our findings indicate that there was a high risk of PTSD strongly related to the nuclear disaster and its consequent evacuation and displacement. Therefore, recovery efforts must focus not only on medical and psychological treatment, but also on social and economic issues related to the displacement. PMID:27002324
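
    Odds ratios such as those reported in this abstract are multipliers on odds, not on probabilities. The sketch below shows the conversion under stated assumptions: the baseline probabilities are hypothetical (the abstract does not report risk in the unexposed groups), and the function name is illustrative only.

```python
def probability_from_odds_ratio(baseline_probability, odds_ratio):
    """Probability in the exposed group implied by a baseline probability and an odds ratio."""
    # Convert the baseline probability to odds, scale by the odds ratio, convert back
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    exposed_odds = odds_ratio * baseline_odds
    return exposed_odds / (1.0 + exposed_odds)


# Hypothetical baselines combined with two odds ratios quoted in the abstract
for baseline in (0.2, 0.5):
    for odds_ratio in (1.97, 6.25):
        p = probability_from_odds_ratio(baseline, odds_ratio)
        print(f"baseline={baseline:.2f}  OR={odds_ratio:.2f}  implied probability={p:.3f}")
```

    With a baseline of 0.5, an odds ratio of 6.25 implies a probability of about 0.86, not a 6.25-fold increase in probability; the gap between the two readings grows as the baseline probability rises.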

  18. Comparative effectiveness of echinocandins versus fluconazole therapy for the treatment of adult candidaemia due to Candida parapsilosis: a retrospective observational cohort study of the Mycoses Study Group (MSG-12).

    PubMed

    Chiotos, Kathleen; Vendetti, Neika; Zaoutis, Theoklis E; Baddley, John; Ostrosky-Zeichner, Luis; Pappas, Peter; Fisher, Brian T

    2016-12-01

    A polymorphism in the gene encoding β-1,3-glucan synthase, the target of the echinocandin class of antifungals, results in increased in vitro MICs of the echinocandins. This has resulted in controversy surrounding use of the echinocandins for treatment of Candida parapsilosis candidaemia. We aimed to compare 30 day mortality in adults with C. parapsilosis candidaemia treated with echinocandins versus fluconazole. This is a retrospective observational cohort study. We used the Premier Perspective Database to identify adult patients with C. parapsilosis candidaemia treated with only fluconazole or only an echinocandin as definitive therapy. The primary outcome was 30 day mortality. Propensity scores were derived to estimate the probability the patient would have received either an echinocandin or fluconazole. Inverse probability of treatment weighting (IPTW) was used in a weighted logistic regression to calculate odds of 30 day mortality. There were 307 unique patients with C. parapsilosis candidaemia. One hundred and twenty-six (41%) received fluconazole and 181 (59%) received an echinocandin. Age, gender, race, year of admission, need for ICU resources in the week prior to candidaemia onset, and receipt of vasopressors on the day of candidaemia onset were included in the propensity score model used to calculate inverse probability of treatment weights. Weighted logistic regression demonstrated no difference in 30 day mortality between patients receiving an echinocandin as compared with fluconazole (OR 0.82, 95% CI 0.33-2.07). Our result supports the 2016 IDSA invasive candidiasis guidelines, which no longer clearly favour treatment with fluconazole over an echinocandin for C. parapsilosis candidaemia. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
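
    The weighting step described in this abstract can be sketched as follows. This is a simplified illustration, not the authors' analysis: the eight patients and their propensity scores are invented, the function names are hypothetical, and the weighted logistic regression is replaced by the cross-product ratio of the weighted 2x2 table, which gives the same odds ratio only when treatment is the sole covariate in the outcome model.

```python
def iptw_weights(treated, propensity):
    """Inverse probability of treatment weights.

    propensity[i] is the estimated probability that patient i receives the
    treatment coded as 1; treated patients get 1/p, the others 1/(1-p).
    """
    return [1.0 / p if t == 1 else 1.0 / (1.0 - p)
            for t, p in zip(treated, propensity)]


def weighted_odds_ratio(treated, died, weights):
    """Odds ratio of death (treated vs. untreated) from the weighted 2x2 table."""
    cells = {(1, 1): 0.0, (1, 0): 0.0, (0, 1): 0.0, (0, 0): 0.0}
    for t, d, w in zip(treated, died, weights):
        cells[(t, d)] += w
    odds_treated = cells[(1, 1)] / cells[(1, 0)]
    odds_untreated = cells[(0, 1)] / cells[(0, 0)]
    return odds_treated / odds_untreated


# Invented data: 1 = echinocandin, 0 = fluconazole; died = 30 day mortality
treated = [1, 1, 1, 1, 0, 0, 0, 0]
propensity = [0.8, 0.8, 0.5, 0.5, 0.5, 0.5, 0.2, 0.2]
died = [1, 0, 0, 1, 1, 0, 0, 0]

weights = iptw_weights(treated, propensity)
print(weights)
print(weighted_odds_ratio(treated, died, weights))
```

    Patients who received a treatment that their covariates made unlikely get larger weights, so the weighted sample resembles one in which treatment was assigned independently of the measured covariates.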

  19. Bridging non-human primate correlates of protection to reassess the Anthrax Vaccine Adsorbed booster schedule in humans.

    PubMed

    Schiffer, Jarad M; Chen, Ligong; Dalton, Shannon; Niemuth, Nancy A; Sabourin, Carol L; Quinn, Conrad P

    2015-07-17

    Anthrax Vaccine Adsorbed (AVA, BioThrax) is approved for use in humans as a priming series of 3 intramuscular (i.m.) injections (0, 1, 6 months; 3-IM) with boosters at 12 and 18 months, and annually thereafter for those at continued risk of infection. A reduction in AVA booster frequency would lessen the burden of vaccination, reduce the cumulative frequency of vaccine associated adverse events and potentially expand vaccine coverage by requiring fewer doses per schedule. Because human inhalation anthrax studies are neither feasible nor ethical, AVA efficacy estimates are determined using cross-species bridging of immune correlates of protection (COP) identified in animal models. We have previously reported that the AVA 3-IM priming series provided high levels of protection in non-human primates (NHP) against inhalation anthrax for up to 4 years after the first vaccination. Penalized logistic regressions of those NHP immunological data identified that anti-protective antigen (anti-PA) IgG concentration measured just prior to infectious challenge was the most accurate single COP. In the present analysis, cross-species logistic regression models of this COP were used to predict probability of survival during a 43 month study in humans receiving the current 3-dose priming and 4 boosters (12, 18, 30 and 42 months; 7-IM) and reduced schedules with boosters at months 18 and 42 only (5-IM), or at month 42 only (4-IM). All models predicted high survival probabilities for the reduced schedules from 7 to 43 months. The predicted survival probabilities for the reduced schedules were 86.8% (4-IM) and 95.8% (5-IM) at month 42 when antibody levels were lowest. The data indicated that 4-IM and 5-IM are both viable alternatives to the current AVA pre-exposure prophylaxis schedule. Published by Elsevier Ltd.

  20. Dynamic Dimensionality Selection for Bayesian Classifier Ensembles

    DTIC Science & Technology

    2015-03-19

    learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very competitive to Logistic Regression but much more...classifier, Generative learning, Discriminative learning, Naïve Bayes, Feature selection, Logistic regression, higher order attribute independence 16...discriminative learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very competitive to Logistic Regression but

  1. A review of logistic regression models used to predict post-fire tree mortality of western North American conifers

    Treesearch

    Travis Woolley; David C. Shaw; Lisa M. Ganio; Stephen Fitzgerald

    2012-01-01

    Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed burns and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate...

  2. Preserving Institutional Privacy in Distributed binary Logistic Regression.

    PubMed

    Wu, Yuan; Jiang, Xiaoqian; Ohno-Machado, Lucila

    2012-01-01

    Privacy is becoming a major concern when sharing biomedical data across institutions. Although methods for protecting privacy of individual patients have been proposed, it is not clear how to protect the institutional privacy, which is often a critical concern of data custodians. Built upon our previous work, Grid Binary LOgistic REgression (GLORE), we developed an Institutional Privacy-preserving Distributed binary Logistic Regression model (IPDLR) that considers both individual and institutional privacy for building a logistic regression model in a distributed manner. We tested our method using both simulated and clinical data, showing how it is possible to protect the privacy of individuals and of institutions using a distributed strategy.

  3. Covariate Imbalance and Adjustment for Logistic Regression Analysis of Clinical Trial Data

    PubMed Central

    Ciolino, Jody D.; Martin, Reneé H.; Zhao, Wenle; Jauch, Edward C.; Hill, Michael D.; Palesch, Yuko Y.

    2014-01-01

    In logistic regression analysis for binary clinical trial data, adjusted treatment effect estimates are often not equivalent to unadjusted estimates in the presence of influential covariates. This paper uses simulation to quantify the benefit of covariate adjustment in logistic regression. However, International Conference on Harmonization guidelines suggest that covariate adjustment be pre-specified. Unplanned adjusted analyses should be considered secondary. Results suggest that if adjustment is not possible or unplanned in a logistic setting, balance in continuous covariates can alleviate some (but never all) of the shortcomings of unadjusted analyses. The case of log binomial regression is also explored. PMID:24138438

  4. Differentially private distributed logistic regression using private and public data.

    PubMed

    Ji, Zhanglong; Jiang, Xiaoqian; Wang, Shuang; Xiong, Li; Ohno-Machado, Lucila

    2014-01-01

    Privacy protection is an important issue in medical informatics and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise and make outputs less useful. Given available public data in medical research (e.g. from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. In this paper, we modify the update step in the Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. We try our algorithm on three different data sets, and show its advantage over: (1) a logistic regression model based solely on public data, and (2) a differentially private distributed logistic regression model based on private data under various scenarios. Logistic regression models built with our new algorithm based on both private and public datasets demonstrate better utility than models trained on private or public datasets alone, without sacrificing the rigorous privacy guarantee.
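
    For orientation, the sketch below shows the ordinary, non-private Newton-Raphson update for a logistic regression with an intercept and one predictor. It is not the modified, differentially private update proposed in the paper, whose details the abstract does not give; the data and all names are invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def newton_raphson_logistic(xs, ys, iterations=25):
    """Fit y ~ intercept + slope * x by Newton-Raphson; returns (intercept, slope)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient of the log-likelihood (g) and the 2x2 information matrix (h)
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Update step beta <- beta + h^{-1} g, with the 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Invented, non-separable data so that a finite maximum-likelihood estimate exists
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = newton_raphson_logistic(xs, ys)
print(b0, b1)
```

    At convergence the gradient is numerically zero, which is the usual check that the iteration has reached the maximum-likelihood estimate.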

  5. Logistic regression analysis of conventional ultrasonography, strain elastosonography, and contrast-enhanced ultrasound characteristics for the differentiation of benign and malignant thyroid nodules

    PubMed Central

    Deng, Yingyuan; Wang, Tianfu; Chen, Siping; Liu, Weixiang

    2017-01-01

    The aim of the study is to screen the significant sonographic features by logistic regression analysis and fit a model to diagnose thyroid nodules. A total of 525 pathological thyroid nodules were retrospectively analyzed. All the nodules underwent conventional ultrasonography (US), strain elastosonography (SE), and contrast-enhanced ultrasound (CEUS). Those nodules' 12 suspicious sonographic features were used to assess thyroid nodules. The significant features of diagnosing thyroid nodules were picked out by logistic regression analysis. All variables that were statistically related to diagnosis of thyroid nodules, at a level of p < 0.05, were embodied in a logistic regression analysis model. The significant features in the logistic regression model of diagnosing thyroid nodules were calcification, suspected cervical lymph node metastasis, hypoenhancement pattern, margin, shape, vascularity, posterior acoustic, echogenicity, and elastography score. According to the results of logistic regression analysis, the formula that could predict whether or not thyroid nodules are malignant was established. The area under the receiver operating characteristic (ROC) curve was 0.930 and the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 83.77%, 89.56%, 87.05%, 86.04%, and 87.79% respectively. PMID:29228030

  6. Logistic regression analysis of conventional ultrasonography, strain elastosonography, and contrast-enhanced ultrasound characteristics for the differentiation of benign and malignant thyroid nodules.

    PubMed

    Pang, Tiantian; Huang, Leidan; Deng, Yingyuan; Wang, Tianfu; Chen, Siping; Gong, Xuehao; Liu, Weixiang

    2017-01-01

    The aim of the study is to screen the significant sonographic features by logistic regression analysis and fit a model to diagnose thyroid nodules. A total of 525 pathological thyroid nodules were retrospectively analyzed. All the nodules underwent conventional ultrasonography (US), strain elastosonography (SE), and contrast-enhanced ultrasound (CEUS). Those nodules' 12 suspicious sonographic features were used to assess thyroid nodules. The significant features of diagnosing thyroid nodules were picked out by logistic regression analysis. All variables that were statistically related to diagnosis of thyroid nodules, at a level of p < 0.05, were embodied in a logistic regression analysis model. The significant features in the logistic regression model of diagnosing thyroid nodules were calcification, suspected cervical lymph node metastasis, hypoenhancement pattern, margin, shape, vascularity, posterior acoustic, echogenicity, and elastography score. According to the results of logistic regression analysis, the formula that could predict whether or not thyroid nodules are malignant was established. The area under the receiver operating characteristic (ROC) curve was 0.930 and the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 83.77%, 89.56%, 87.05%, 86.04%, and 87.79% respectively.
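
    The five diagnostic measures quoted in this abstract all derive from a single 2x2 confusion table. The abstract does not report the table itself; the counts used below (191 true positives, 37 false negatives, 266 true negatives, 31 false positives, totalling 525 nodules) are an assumption, back-calculated here so that they agree with the reported percentages, and the function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic accuracy measures from confusion-table counts."""
    return {
        "sensitivity": tp / (tp + fn),               # malignant nodules correctly flagged
        "specificity": tn / (tn + fp),               # benign nodules correctly cleared
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "ppv": tp / (tp + fp),                       # positive predictive value
        "npv": tn / (tn + fn),                       # negative predictive value
    }


# Assumed counts, inferred from the reported percentages rather than taken from the paper
metrics = diagnostic_metrics(tp=191, fn=37, tn=266, fp=31)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

    Under these assumed counts the five printed values match the percentages in the abstract, which is a quick consistency check a reader can apply to any set of reported diagnostic measures.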

  7. Prevalence and Determinants of Preterm Birth in Tehran, Iran: A Comparison between Logistic Regression and Decision Tree Methods.

    PubMed

    Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi

    2017-06-01

    Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB (p < 0.05). Identifying and training mothers at risk as well as improving prenatal care may reduce the PTB rate. We also recommend that statisticians utilize the logistic regression model for the classification of risk groups for PTB.

  8. Probability and predictors of cannabis use disorders relapse: results of the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC).

    PubMed

    Flórez-Salamanca, Ludwing; Secades-Villa, Roberto; Budney, Alan J; García-Rodríguez, Olaya; Wang, Shuai; Blanco, Carlos

    2013-09-01

    This study aims to estimate the odds and predictors of Cannabis Use Disorders (CUD) relapse among individuals in remission. Analyses were done on the subsample of individuals with lifetime history of a CUD (abuse or dependence) who were in full remission at baseline (Wave 1) of the National Epidemiological Survey of Alcohol and Related Conditions (NESARC) (n=2350). Univariate logistic regression models and a hierarchical logistic regression model were implemented to estimate odds of relapse and identify predictors of relapse at the 3-year follow-up (Wave 2). The relapse rate of CUD was 6.63% over an average 3.6-year follow-up period. In the multivariable model, the odds of relapse were inversely related to time in remission, whereas having a history of conduct disorder or a major depressive disorder after Wave 1 increased the risk of relapse. Our findings suggest that maintenance of remission is the most common outcome for individuals in remission from a CUD. Treatment approaches may improve rates of sustained remission of individuals with CUD and conduct disorder or major depressive disorder. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  9. Prediction of Return-to-original-work after an Industrial Accident Using Machine Learning and Comparison of Techniques

    PubMed Central

    2018-01-01

    Background Many studies have tried to develop predictors for return-to-work (RTW). However, since complex factors have been demonstrated to predict RTW, it is difficult to use them practically. This study investigated whether factors used in previous studies could predict whether an individual had returned to his/her original work by four years after termination of the worker's recovery period. Methods An initial logistic regression analysis of 1,567 participants of the fourth Panel Study of Worker's Compensation Insurance yielded odds ratios. The participants were divided into two subsets, a training dataset and a test dataset. Using the training dataset, logistic regression, decision tree, random forest, and support vector machine models were established, and important variables of each model were identified. The predictive abilities of the different models were compared. Results The analysis showed that only earned income and company-related factors significantly affected return-to-original-work (RTOW). The random forest model showed the best accuracy among the tested machine learning models; however, the difference was not prominent. Conclusion It is possible to predict a worker's probability of RTOW using machine learning techniques with moderate accuracy. PMID:29736160

  10. Social determinants of cataract surgery utilization in south India. The Operations Research Group.

    PubMed

    Brilliant, G E; Lepkowski, J M; Zurita, B; Thulasiraj, R D

    1991-04-01

    A field trial was conducted to compare the effects of eight health education and economic incentive interventions on the awareness and acceptance of cataract surgery. Cataract screening and follow-up surgery were offered to more than 19,000 residents age 40 years and older in a probability sample of 90 villages in south India. Eight months after intervention, an evaluation was conducted to identify those in need of surgery who had been operated on. Two principal measures of program effectiveness are examined: awareness of cataract surgery and acceptance of the surgery. The type of intervention had a negligible effect on awareness of cataract surgery. A multiple logistic regression analysis revealed that individuals who were aware of surgery tended to be male, literate, and more affluent than those who were unaware of that option. Interventions that covered the complete costs of surgery had higher surgery acceptance rates. One health education strategy, house-to-house visits by a subject with aphakia, increased acceptance of the procedure more than others. In a multiple logistic regression analysis of acceptance rates, persons accepting surgery tended to be male; other factors were not important in explaining variation in acceptance rates.

  11. Bayesian logistic regression approaches to predict incorrect DRG assignment.

    PubMed

    Suleiman, Mani; Demirhan, Haydar; Boyd, Leanne; Girosi, Federico; Aksakalli, Vural

    2018-05-07

    Episodes of care involving similar diagnoses and treatments and requiring similar levels of resource utilisation are grouped to the same Diagnosis-Related Group (DRG). In jurisdictions which implement DRG based payment systems, DRGs are a major determinant of funding for inpatient care. Hence, service providers often dedicate auditing staff to the task of checking that episodes have been coded to the correct DRG. The use of statistical models to estimate an episode's probability of DRG error can significantly improve the efficiency of clinical coding audits. This study implements Bayesian logistic regression models with weakly informative prior distributions to estimate the likelihood that episodes require a DRG revision, comparing these models with each other and to classical maximum likelihood estimates. All Bayesian approaches had more stable model parameters than maximum likelihood. The best performing Bayesian model improved overall classification performance by 6% compared to maximum likelihood, with a 34% gain compared to random classification. We found that the original DRG, coder and the day of coding all have a significant effect on the likelihood of DRG error. Use of Bayesian approaches has improved model parameter stability and classification accuracy. This method has already led to improved audit efficiency in an operational capacity.

  12. Insurance premiums and insurance coverage of near-poor children.

    PubMed

    Hadley, Jack; Reschovsky, James D; Cunningham, Peter; Kenney, Genevieve; Dubay, Lisa

    States increasingly are using premiums for near-poor children in their public insurance programs (Medicaid/SCHIP) to limit private insurance crowd-out and constrain program costs. Using national data from four rounds of the Community Tracking Study Household Surveys spanning the seven years from 1996 to 2003, this study estimates a multinomial logistic regression model examining how public and private insurance premiums affect insurance coverage outcomes (Medicaid/SCHIP coverage, private coverage, and no coverage). Higher public premiums are significantly associated with a lower probability of public coverage and higher probabilities of private coverage and uninsurance; higher private premiums are significantly related to a lower probability of private coverage and higher probabilities of public coverage and uninsurance. The results imply that uninsurance rates will rise if both public and private premiums increase, and suggest that states that impose or increase public insurance premiums for near-poor children will succeed in discouraging crowd-out of private insurance, but at the expense of higher rates of uninsurance. Sustained increases in private insurance premiums will continue to create enrollment pressures on state insurance programs for children.
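    As a rough illustration of how a multinomial logistic model turns linear predictors into the three coverage probabilities discussed above, the following minimal sketch applies a softmax. The numeric predictors are hypothetical and are not taken from the study; "none" is treated as the reference outcome.

```python
import math

def multinomial_probs(linear_predictors):
    """Convert per-outcome linear predictors into probabilities (softmax)."""
    m = max(linear_predictors.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in linear_predictors.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

# Hypothetical linear predictors; the reference outcome "none" is fixed at 0.
eta = {"public": 0.4, "private": 1.1, "none": 0.0}
probs = multinomial_probs(eta)

# Hypothetical effect of a higher public premium: the public predictor drops.
eta_higher_premium = {"public": -0.1, "private": 1.1, "none": 0.0}
probs_higher_premium = multinomial_probs(eta_higher_premium)
```

    In this sketch, lowering the public-coverage predictor reduces the probability of public coverage and raises both of the other probabilities, which matches the direction of the associations the abstract reports.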

  13. A Competing Risk Model of First Failure Site after Definitive Chemoradiation Therapy for Locally Advanced Non-Small Cell Lung Cancer.

    PubMed

    Nygård, Lotte; Vogelius, Ivan R; Fischer, Barbara M; Kjær, Andreas; Langer, Seppo W; Aznar, Marianne C; Persson, Gitte F; Bentzen, Søren M

    2018-04-01

    The aim of the study was to build a model of first failure site- and lesion-specific failure probability after definitive chemoradiotherapy for inoperable NSCLC. We retrospectively analyzed 251 patients receiving definitive chemoradiotherapy for NSCLC at a single institution between 2009 and 2015. All patients were scanned by fludeoxyglucose positron emission tomography/computed tomography for radiotherapy planning. Clinical patient data and fludeoxyglucose positron emission tomography standardized uptake values from primary tumor and nodal lesions were analyzed by using multivariate cause-specific Cox regression. In patients experiencing locoregional failure, multivariable logistic regression was applied to assess risk of each lesion being the first site of failure. The two models were used in combination to predict probability of lesion failure accounting for competing events. Adenocarcinoma had a lower hazard ratio (HR) of locoregional failure than squamous cell carcinoma (HR = 0.45, 95% confidence interval [CI]: 0.26-0.76, p = 0.003). Distant failures were more common in the adenocarcinoma group (HR = 2.21, 95% CI: 1.41-3.48, p < 0.001). Multivariable logistic regression of individual lesions at the time of first failure showed that primary tumors were more likely to fail than lymph nodes (OR = 12.8, 95% CI: 5.10-32.17, p < 0.001). Increasing peak standardized uptake value was significantly associated with lesion failure (OR = 1.26 per unit increase, 95% CI: 1.12-1.40, p < 0.001). The electronic model is available at http://bit.ly/LungModelFDG. We developed a failure site-specific competing risk model based on patient- and lesion-level characteristics. Failure patterns differed between adenocarcinoma and squamous cell carcinoma, illustrating the limitation of aggregating them into NSCLC. Failure site-specific models add complementary information to conventional prognostic models. Copyright © 2018 International Association for the Study of Lung Cancer. Published by Elsevier Inc. All rights reserved.

  14. Parameter estimation in Cox models with missing failure indicators and the OPPERA study.

    PubMed

    Brownstein, Naomi C; Cai, Jianwen; Slade, Gary D; Bair, Eric

    2015-12-30

    In a prospective cohort study, examining all participants for incidence of the condition of interest may be prohibitively expensive. For example, the "gold standard" for diagnosing temporomandibular disorder (TMD) is a physical examination by a trained clinician. In large studies, examining all participants in this manner is infeasible. Instead, it is common to use questionnaires to screen for incidence of TMD and perform the "gold standard" examination only on participants who screen positively. Unfortunately, some participants may leave the study before receiving the "gold standard" examination. Within the framework of survival analysis, this results in missing failure indicators. Motivated by the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) study, a large cohort study of TMD, we propose a method for parameter estimation in survival models with missing failure indicators. We estimate the probability of being an incident case for those lacking a "gold standard" examination using logistic regression. These estimated probabilities are used to generate multiple imputations of case status for each missing examination that are combined with observed data in appropriate regression models. The variance introduced by the procedure is estimated using multiple imputation. The method can be used to estimate both regression coefficients in Cox proportional hazard models as well as incidence rates using Poisson regression. We simulate data with missing failure indicators and show that our method performs as well as or better than competing methods. Finally, we apply the proposed method to data from the OPPERA study. Copyright © 2015 John Wiley & Sons, Ltd.
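    The imputation step described above can be sketched in a few lines. This is a simplified illustration, not the authors' procedure: it pools a plain incidence proportion rather than Cox or Poisson regression estimates, the predicted probabilities are hypothetical stand-ins for logistic regression output, and the pooling uses Rubin's rules.

```python
import random
import statistics

def impute_and_pool(observed, missing_probs, m=20, seed=0):
    """Impute missing case status m times and pool a proportion with Rubin's rules.

    observed      -- list of 0/1 case indicators that were actually examined
    missing_probs -- predicted case probabilities for participants lacking an exam
    """
    rng = random.Random(seed)
    n = len(observed) + len(missing_probs)
    estimates, within = [], []
    for _ in range(m):
        # Draw one Bernoulli imputation per missing indicator.
        imputed = [1 if rng.random() < p else 0 for p in missing_probs]
        prop = (sum(observed) + sum(imputed)) / n
        estimates.append(prop)
        within.append(prop * (1 - prop) / n)  # binomial variance of the proportion
    pooled = statistics.fmean(estimates)
    between = statistics.variance(estimates)
    # Rubin's rules: total variance = within + (1 + 1/m) * between.
    total_var = statistics.fmean(within) + (1 + 1 / m) * between
    return pooled, total_var

# Hypothetical data: eight examined participants, four with missing indicators.
observed_cases = [1, 0, 0, 1, 0, 0, 0, 1]
predicted_probs = [0.9, 0.2, 0.5, 0.1]
pooled_incidence, pooled_variance = impute_and_pool(observed_cases, predicted_probs)
```

    The between-imputation term is what carries the extra uncertainty introduced by not knowing the true case status.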

  15. Evaluating the effect of a third-party implementation of resolution recovery on the quality of SPECT bone scan imaging using visual grading regression.

    PubMed

    Hay, Peter D; Smith, Julie; O'Connor, Richard A

    2016-02-01

    The aim of this study was to evaluate the benefits to SPECT bone scan image quality when applying resolution recovery (RR) during image reconstruction using software provided by a third-party supplier. Bone SPECT data from 90 clinical studies were reconstructed retrospectively using software supplied independent of the gamma camera manufacturer. The current clinical datasets contain 120×10 s projections and are reconstructed using an iterative method with a Butterworth postfilter. Five further reconstructions were created with the following characteristics: 10 s projections with a Butterworth postfilter (to assess intraobserver variation); 10 s projections with a Gaussian postfilter with and without RR; and 5 s projections with a Gaussian postfilter with and without RR. Two expert observers were asked to rate image quality on a five-point scale relative to our current clinical reconstruction. Datasets were anonymized and presented in random order. The benefits of RR on image scores were evaluated using ordinal logistic regression (visual grading regression). The application of RR during reconstruction increased the probability of both observers of scoring image quality as better than the current clinical reconstruction even where the dataset contained half the normal counts. Type of reconstruction and observer were both statistically significant variables in the ordinal logistic regression model. Visual grading regression was found to be a useful method for validating the local introduction of technological developments in nuclear medicine imaging. RR, as implemented by the independent software supplier, improved bone SPECT image quality when applied during image reconstruction. In the majority of clinical cases, acquisition times for bone SPECT intended for the purposes of localization can safely be halved (from 10 s projections to 5 s) when RR is applied.
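    For readers unfamiliar with ordinal logistic regression, the sketch below shows how a proportional-odds model maps a linear predictor onto probabilities for a five-point scale. The cutpoints and the effect size for resolution recovery are hypothetical, chosen only to illustrate the mechanics, and the parameterisation P(Y <= k) = logistic(c_k - eta) is one common convention.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def ordinal_probs(cutpoints, eta):
    """Category probabilities under a proportional-odds model.

    Cumulative probabilities are P(Y <= k) = logistic(c_k - eta); each category
    probability is the difference between successive cumulative values.
    """
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs

# Hypothetical cutpoints for a five-point image-quality scale (four thresholds).
cutpoints = [-2.0, -0.5, 0.5, 2.0]
without_rr = ordinal_probs(cutpoints, eta=0.0)
with_rr = ordinal_probs(cutpoints, eta=1.0)  # hypothetical positive RR effect

# Probability of a score above the midpoint (categories 4 and 5).
better_without = without_rr[3] + without_rr[4]
better_with = with_rr[3] + with_rr[4]
```

    A positive coefficient shifts probability mass toward the higher categories, which is the sense in which a reconstruction can increase the probability of being scored as better.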

  16. Self-perceived health among Eastern European immigrants over 50 living in Western Europe.

    PubMed

    Lanari, D; Bussini, O; Minelli, L

    2015-01-01

    This paper examines whether Eastern European immigrants aged 50 and over living in Northern and Western Europe face a health disadvantage in terms of self-perceived health, with respect to the native-born. We also examined health changes over time (2004-2006-2010) through the probabilities of transition among self-perceived health states, and how they vary according to nativity status and age group. Data were obtained from the Survey of Health, Ageing and Retirement in Europe (SHARE). Logistic regressions and probabilities of transition were used. Results emphasise the health disadvantage of Eastern European immigrants living in Germany, France and  Sweden with respect to the native-born, even after controlling for socio-economic status. Probabilities of transition also evidenced that people born in Eastern Europe were more likely to experience worsening health and less likely to recover from sickness. This paper suggests that health inequalities do not affect immigrant groups in equal measure and confirm the poorer and more steeply deteriorating health status of Eastern European immigrants.

  17. Modeling rural landowners' hunter access policies in East Texas, USA

    NASA Astrophysics Data System (ADS)

    Wright, Brett A.; Fesenmaier, Daniel R.

    1988-03-01

    Private landowners in East Texas, USA, were aggregated into one of four policy categories according to the degree of access allowed to their lands for hunting. Based on these categories, a logistic regression model of possible determinants of access policy was developed and probabilities of policy adoption were calculated. Overwhelmingly, attitudes toward hunting as a sport, incentives, and control over the actions of hunters were most predictive of landowners' policies. Additionally, the availability of deer was found to be negatively correlated with access, thereby suggesting management efforts to increase deer populations may be counter to increasing access. Further, probabilities derived from the model indicated that there was almost a 7 in 10 chance (0.66) that landowners would adopt policies commensurate with allowing family and personal acquaintances to hunt on their property. However, the probability of increasing access beyond this level, where access was provided for the general public, dropped off drastically to less than 5% (0.04).

  18. Maximum likelihood estimation for the double-count method with independent observers

    USGS Publications Warehouse

    Manly, Bryan F.J.; McDonald, Lyman L.; Garner, Gerald W.

    1996-01-01

    Data collected under a double-count protocol during line transect surveys were analyzed using new maximum likelihood methods combined with Akaike's information criterion to provide estimates of the abundance of polar bear (Ursus maritimus Phipps) in a pilot study off the coast of Alaska. Visibility biases were corrected by modeling the detection probabilities using logistic regression functions. Independent variables that influenced the detection probabilities included perpendicular distance of bear groups from the flight line and the number of individuals in the groups. A series of models were considered which vary from (1) the simplest, where the probability of detection was the same for both observers and was not affected by either distance from the flight line or group size, to (2) models where probability of detection is different for the two observers and depends on both distance from the transect and group size. Estimation procedures are developed for the case when additional variables may affect detection probabilities. The methods are illustrated using data from the pilot polar bear survey and some recommendations are given for design of a survey over the larger Chukchi Sea between Russia and the United States.
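    A minimal sketch of the detection model described above: each observer's detection probability is a logistic function of perpendicular distance and group size, and independence gives the probability that at least one observer sees the group. All coefficients are hypothetical, and the sketch does not reproduce the paper's likelihood or abundance estimator.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def detection_prob(intercept, b_distance, b_group, distance_km, group_size):
    """Detection probability for one observer from a logistic function."""
    return logistic(intercept + b_distance * distance_km + b_group * group_size)

def detected_by_either(p1, p2):
    """Probability that at least one of two independent observers detects a group."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)

# Hypothetical coefficients: observers differ only in their intercepts.
p_obs1 = detection_prob(1.0, -1.5, 0.3, distance_km=0.5, group_size=2)
p_obs2 = detection_prob(0.5, -1.5, 0.3, distance_km=0.5, group_size=2)
p_any = detected_by_either(p_obs1, p_obs2)

# The same group farther from the flight line is less likely to be seen.
p_far = detection_prob(1.0, -1.5, 0.3, distance_km=1.5, group_size=2)
```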

  19. [Investigation of health literacy and enterprise provided health service utilization among migrants in construction site].

    PubMed

    Jiang, Ying; Zeng, Qingqi; Ji, Ying; Wang, Yanling; Zheng, Yunting; Chang, Chun

    2015-01-01

    To investigate health literacy and enterprise provided health service utilization among migrants in construction sites and explore the influencing factors of enterprise provided health service utilization. All 652 migrants in 10 construction sites in Xi'an and Tongchuan were selected using stratified cluster sampling method, and health literacy level, occupational health awareness and enterprise provided health service utilization of migrants were investigated from April to June 2013. Score and pass rate were used to describe the status of health literacy and occupational health awareness of migrants. Chi-square was used to analyze the difference of occupational health awareness and enterprise provided health service utilization between migrants of different levels of health literacy. Logistic regression was used to analyze the influencing factors of enterprise provided health service utilization. Average score of health literacy among migrants in construction sites was (3.75 ± 2.17) (9 score totally). Migrants who knew enterprise should provide health training, physical examination, safety training, occupational protection and pay health insurance for workers accounted for 28.2% (174/616), 43.5% (268/616), 52.8% (325/616), 54.9% (338/616) and 37.7% (230/616) respectively, and the percentage of migrants who thought there were noise and dust in their working environment were 46.4% (201/627) and 44.8% (281/627) respectively. In total, 61.1% (373/610) received none of health training, occupational training, physical examination and first-aid kit, and only 0.8% (5/610) had utilized all of the above health services in the workplace. Logistic regression showed that migrants whose health literacy score was higher than 5 had 1.82 times the odds of utilizing enterprise provided health service (OR = 1.82, 95% CI: 1.13-2.92), and migrants who were educated for more than 13 years had 3.81 times the odds of utilizing enterprise provided health service compared with those who were educated for less than 6 years (OR = 3.81, 95% CI: 1.75-8.31). However, occupational health awareness had no significant influence on enterprise provided health service utilization in logistic regression (χ² = 3.50, P = 0.061). Occupational health awareness and enterprise provided health service utilization were both low among migrants in construction sites; level of health literacy and years of schooling were the main factors influencing enterprise provided health service utilization.
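    An odds ratio such as the OR = 1.82 reported above is a ratio of odds, not of probabilities, so the implied change in probability depends on the baseline. The sketch below makes the conversion explicit; the baseline utilization probability of 0.30 is a hypothetical value, not a figure from the study.

```python
def prob_from_odds_ratio(baseline_prob, odds_ratio):
    """Probability in the comparison group implied by a baseline and an odds ratio."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    comparison_odds = baseline_odds * odds_ratio
    return comparison_odds / (1.0 + comparison_odds)

baseline = 0.30   # hypothetical baseline probability of utilization
reported_or = 1.82  # odds ratio reported in the abstract
comparison = prob_from_odds_ratio(baseline, reported_or)
probability_ratio = comparison / baseline
```

    With this assumed baseline the comparison probability is about 0.44, a probability ratio of roughly 1.46 rather than 1.82; the two ratios only approach each other when the baseline probability is small.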

  20. Automatic prediction of solar flares and super geomagnetic storms

    NASA Astrophysics Data System (ADS)

    Song, Hui

    Space weather is the response of our space environment to the constantly changing Sun. As technology advances, mankind has become more and more dependent on space systems and satellite-based services. A geomagnetic storm, a disturbance in Earth's magnetosphere, may produce many harmful effects on Earth. Solar flares and Coronal Mass Ejections (CMEs) are believed to be the major causes of geomagnetic storms. Thus, establishing a real time forecasting method for them is very important in space weather study. The topics covered in this dissertation are: the relationship between magnetic gradient and magnetic shear of solar active regions; the relationship between solar flare index and magnetic features of solar active regions; based on these relationships a statistical ordinal logistic regression model is developed to predict the probability of solar flare occurrences in the next 24 hours; and finally the relationship between magnetic structures of CME source regions and geomagnetic storms, in particular, the super storms when the Dst index decreases below -200 nT is studied and proved to be able to predict those super storms. The results are briefly summarized as follows: (1) There is a significant correlation between magnetic gradient and magnetic shear of active regions. Furthermore, compared with magnetic shear, magnetic gradient might be a better proxy to locate where a large flare occurs. It appears to be more accurate in identification of sources of X-class flares than M-class flares; (2) Flare index, defined by weighting the SXR flares, is proved to have positive correlation with three magnetic features of active regions; (3) A statistical ordinal logistic regression model is proposed for solar flare prediction. The results are much better than those data published in the NASA/SDAC service, and comparable to the data provided by the NOAA/SEC complicated expert system. To our knowledge, this is the first time that a logistic regression model has been applied in solar physics to predict flare occurrences; (4) The magnetic orientation angle θ, determined from a potential field model, is proved to be able to predict the probability of super geomagnetic storms (Dst ≤ -200 nT). The results show that those active regions associated with |θ| < 90° are more likely to cause a super geomagnetic storm.

  1. Prediction of major complications after hepatectomy using liver stiffness values determined by magnetic resonance elastography.

    PubMed

    Sato, N; Kenjo, A; Kimura, T; Okada, R; Ishigame, T; Kofunato, Y; Shimura, T; Abe, K; Ohira, H; Marubashi, S

    2018-04-23

    Liver fibrosis is a risk factor for hepatectomy but cannot be determined accurately before hepatectomy because diagnostic procedures are too invasive. Magnetic resonance elastography (MRE) can determine liver stiffness (LS), a surrogate marker for assessing liver fibrosis, non-invasively. The aim of this study was to investigate whether the LS value determined by MRE is predictive of major complications after hepatectomy. This prospective study enrolled consecutive patients who underwent hepatic resection between April 2013 and August 2016. LS values were measured by imaging shear waves by MRE in the liver before hepatectomy. The primary endpoint was major complications, defined as Clavien-Dindo grade IIIa or above. Logistic regression analysis identified independent predictive factors, from which a logistic model to estimate the probability of major complications was constructed. A total of 96 patients were included in the study. Major complications were observed in 15 patients (16 per cent). Multivariable logistic analysis confirmed that higher LS value (P = 0·021) and serum albumin level (P = 0·009) were independent predictive factors for major complications after hepatectomy. Receiver operating characteristic (ROC) analysis showed that the best LS cut-off value was 4·3 kPa for detecting major complications, comparable to liver fibrosis grade F4, with a sensitivity of 80 per cent and specificity of 82 per cent. A logistic model using the LS value and serum albumin level to estimate the probability of major complications was constructed; the area under the ROC curve for predicting major complications was 0·84. The LS value determined by MRE in patients undergoing hepatectomy was an independent predictive factor for major complications. © 2018 BJS Society Ltd Published by John Wiley & Sons Ltd.
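    To make the cut-off analysis above concrete, the following sketch computes sensitivity and specificity for a given liver stiffness threshold. The patient values and outcomes are invented for illustration; only the 4.3 kPa cut-off comes from the abstract, and the rule "value at or above the cut-off counts as positive" is an assumption.

```python
def sensitivity_specificity(values, outcomes, cutoff):
    """Sensitivity and specificity when values >= cutoff are called positive."""
    tp = sum(1 for v, y in zip(values, outcomes) if v >= cutoff and y == 1)
    fn = sum(1 for v, y in zip(values, outcomes) if v < cutoff and y == 1)
    tn = sum(1 for v, y in zip(values, outcomes) if v < cutoff and y == 0)
    fp = sum(1 for v, y in zip(values, outcomes) if v >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical liver stiffness values (kPa) and major-complication outcomes (1 = yes).
stiffness_kpa = [2.1, 3.0, 4.5, 5.2, 3.8, 4.4, 2.9, 6.0]
complication = [0, 0, 1, 1, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(stiffness_kpa, complication, cutoff=4.3)
```

    Repeating the calculation over a grid of cut-offs traces out the ROC curve from which a best threshold can be chosen.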

  2. Teenage smoking, attempts to quit, and school performance.

    PubMed Central

    Hu, T W; Lin, Z; Keeler, T E

    1998-01-01

    OBJECTIVES: This study examined the relationship between school performance, smoking, and quitting attempts among teenagers. METHODS: A logistic regression model was used to predict the probability of being a current smoker or a former smoker. Data were derived from the 1990 California Youth Tobacco Survey. RESULTS: Students' school performance was a key factor in predicting smoking and quitting attempts when other sociodemographic and family income factors were controlled. CONCLUSIONS: Developing academic or remedial classes designed to improve students' school performance may lead to a reduction in smoking rates among teenagers while simultaneously providing a human capital investment in their futures. PMID:9618625

  3. Migration intentions and illicit substance use among youth in central Mexico.

    PubMed

    Marsiglia, Flavio Francisco; Kulis, Stephen; Hoffman, Steven; Calderón-Tena, Carlos Orestes; Becerra, David; Alvarez, Diana

    2011-01-01

    This study explored intentions to emigrate and substance use among youth (ages 14-24) from a central Mexico state with high emigration rates. Questionnaires were completed in 2007 by 702 students attending a probability sample of alternative secondary schools serving remote or poor communities. Linear and logistic regression analyses indicated that stronger intentions to emigrate predicted greater access to drugs, drug offers, and use of illicit drugs (marijuana, cocaine, inhalants), but not alcohol or cigarettes. Results are related to the healthy migrant theory and its applicability to youth with limited educational opportunities. The study's limitations are noted.

  4. Sublethal foot-predation on Donacidae (Mollusca: Bivalvia)

    NASA Astrophysics Data System (ADS)

    Salas, Carmen; Tirado, Cristina; Manjón-Cabeza, Maria Eugenia

    2001-08-01

    The incidence of foot nipping was studied on the Donax spp. of the littoral of Málaga (Southern Spain, 2875 specimens collected from February 1990 to January 1991) and of Ré island (French Atlantic coast, 262 specimens of Donax vittatus (Da Costa, 1778) collected in May 1996). In Málaga, Donax trunculus L., 1758 was the species most regularly nipped (18% of individuals), with peaks in summer (25% in August and 48% in September) and winter (34% in December). In Ré island, 27% of the specimens showed a nipped foot. Logistic regression shows that in D. trunculus length is the variable that most influences the probability of foot nipping, followed by weight and chlorophyll a. However, the difference in length between damaged and undamaged individuals was not significant (Mann-Whitney U test). The size class frequency and the values of Ivlev's index show that the small size classes were avoided, while for the other size classes predation remained balanced throughout the year. Therefore, the avoidance of the small size classes makes length the most influential variable. The logistic regression indicated a coefficient B = -0.03 for weight. This implies a slightly negative influence on the probability of foot nipping. However, without the data of September, there is a positive correlation (r = 0.76, p < 0.01) between the monthly percentages of predation and the flesh dry weight of a standard individual (25 mm long). The peak in September could be due to the recruitment peak of bivalves, which may have attracted more predators to the area, and/or to the recruitment of predators such as crabs to the swash zone. Logistic regression and test of comparison of percentages indicate that there was not any influence of the sex of an animal on the probability of foot nipping. Only in February was a significantly higher percentage (p < 0.05) of females nipped (44.44%) than the total of females in the sample (20.20%). The biomass (as flesh dry weight) of D. trunculus lost by foot nipping amounts to more than 20% in most of the size classes. There was an increase from the small sizes to the largest ones, in which it reaches 37%, with a positive correlation (r = 0.84; p < 0.005) between size class and loss of biomass. Possible predators responsible for the foot nipping are crabs. Crab species usually found together with the donacids were Portumnus latipes (Pennant, 1777), Liocarcinus vernalis (Risso, 1816) and Atelecyclus undecimdentatus (Herbst, 1783). In aquarium experiments, they demonstrated an ability to nip the foot of clams. Portumnus latipes was the most active foot nipper, but left alive all the damaged clams. Therefore, we conclude that crabs are the most likely foot-nipping predators in the field.

  5. Logistic regression for dichotomized counts.

    PubMed

    Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W

    2016-12-01

    Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.

  6. Predicting 30-day Hospital Readmission with Publicly Available Administrative Database. A Conditional Logistic Regression Modeling Approach.

    PubMed

    Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P

    2015-01-01

    This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. The objective was to explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of more than 10% over the standard classification models, which can be translated to correct labeling of additional 400 - 500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictors identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise the awareness of collecting data on additional markers and developing necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.

  7. Interpretation of commonly used statistical regression models.

    PubMed

    Kasza, Jessica; Wolfe, Rory

    2014-01-01

    A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.

  8. Patient-Centered Research

    PubMed Central

    Wicki, J; Perneger, TV; Junod, AF; Bounameaux, H; Perrier, A

    2000-01-01

    PURPOSE We aimed to develop a simple standardized clinical score to stratify emergency ward patients with clinically suspected PE into groups with a high, intermediate, or low probability of PE, in order to improve and simplify the diagnostic approach. METHODS Analysis of a database of 1090 consecutive patients admitted to the emergency ward for suspected PE, in whom diagnosis of PE was ruled in or out by a standard diagnostic algorithm. Logistic regression was used to predict clinical parameters associated with PE. RESULTS 296 out of 1090 patients (27%) were found to have PE. The optimal estimate of clinical probability was based on eight variables: recent surgery, previous thromboembolic event, older age, hypocapnia, hypoxemia, tachycardia, band atelectasis or elevation of a hemidiaphragm on chest X-ray. A probability score was calculated by adding points assigned to these variables. A cut-off score of 4 best identified patients with low probability of PE. 486 patients (49%) had a low clinical probability of PE (score ≤ 4), of which 50 (10.3%) had a proven PE. The prevalence of PE was 38% in the 437 patients with an intermediate probability (score 5–8) and 81% in the 63 patients with a high probability (score ≥ 9). CONCLUSION This clinical score, based on easily available and objective variables, provides a standardized assessment of the clinical probability of PE. Applying this score to emergency ward patients suspected of PE could allow a more efficient diagnostic process.
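
    The banding step described in this abstract can be sketched as follows. This is only an illustration: the abstract does not give the points assigned to each variable, so the function takes an already-summed score, and the band edges (0 to 4 low, 5 to 8 intermediate, 9 or above high) are an assumption read off the reported intermediate band of 5–8. The function and constant names are hypothetical.

```python
def classify_pe_probability(score: int) -> str:
    """Map a summed clinical score to a probability band.

    Band edges are assumed (0-4 low, 5-8 intermediate, >=9 high);
    the per-variable points are not given in the abstract.
    """
    if score < 0:
        raise ValueError("score must be non-negative")
    if score <= 4:
        return "low"
    if score <= 8:
        return "intermediate"
    return "high"


# Observed PE prevalence in each band, as reported in the abstract.
REPORTED_PREVALENCE = {"low": 0.103, "intermediate": 0.38, "high": 0.81}

for s in (2, 6, 10):
    band = classify_pe_probability(s)
    print(s, band, REPORTED_PREVALENCE[band])
```

    The prevalence figures are observed proportions from the study cohort, not model output, so they should be read as a rough guide to what each band meant in that population.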

  9. Evaluation of logistic regression models and effect of covariates for case-control study in RNA-Seq analysis.

    PubMed

    Choi, Seung Hoan; Labadorf, Adam T; Myers, Richard H; Lunetta, Kathryn L; Dupuis, Josée; DeStefano, Anita L

    2017-02-06

    Next generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate the data adaptive method in order to compare false positive rates. When the sample size is small or the expression levels of a gene are highly dispersed, the NB regression shows inflated Type-I error rates but the Classical logistic and Bayes logistic (BL) regressions are conservative. Firth's logistic (FL) regression performs well or is slightly conservative. Large sample size and low dispersion generally make Type-I error rates of all methods close to nominal alpha levels of 0.05 and 0.01. However, Type-I error rates are controlled after applying the data adaptive method. The NB, BL, and FL regressions gain increased power with large sample size, large log2 fold-change, and low dispersion. The FL regression has comparable power to NB regression. We conclude that implementing the data adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth's logistic regression provides a concise statistical inference process and reduces spurious associations from inaccurately estimated dispersion parameters in the negative binomial framework.

  10. Statistical primer: propensity score matching and its alternatives.

    PubMed

    Benedetto, Umberto; Head, Stuart J; Angelini, Gianni D; Blackstone, Eugene H

    2018-06-01

    Propensity score (PS) methods offer certain advantages over more traditional regression methods to control for confounding by indication in observational studies. Although multivariable regression models adjust for confounders by modelling the relationship between covariates and outcome, the PS methods estimate the treatment effect by modelling the relationship between confounders and treatment assignment. Therefore, methods based on the PS are not limited by the number of events, and their use may be warranted when the number of confounders is large, or the number of outcomes is small. The PS is the probability for a subject to receive a treatment conditional on a set of baseline characteristics (confounders). The PS is commonly estimated using logistic regression, and it is used to match patients with similar distribution of confounders so that difference in outcomes gives unbiased estimate of treatment effect. This review summarizes basic concepts of the PS matching and provides guidance in implementing matching and other methods based on the PS, such as stratification, weighting and covariate adjustment.
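
    As a rough illustration of the matching step, the sketch below pairs treated and control subjects on propensity scores that are assumed to have been estimated already (for example by logistic regression). Greedy 1:1 nearest-neighbour matching without replacement and the caliper of 0.05 are illustrative choices, not recommendations from the review; the function name and the subject ids are hypothetical.

```python
def match_on_propensity(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, control: dicts mapping subject id -> propensity score in [0, 1].
    Returns a list of (treated_id, control_id) pairs whose scores differ
    by at most `caliper`.
    """
    available = dict(control)  # controls not yet used in a pair
    pairs = []
    # Visit treated subjects in score order so the result is reproducible.
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control on the propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Hypothetical scores for illustration only.
treated = {"t1": 0.30, "t2": 0.70}
control = {"c1": 0.32, "c2": 0.90, "c3": 0.69}
print(match_on_propensity(treated, control))
```

    Treated subjects with no control inside the caliper are left unmatched, which is why matched analyses can end up with a smaller sample than the original cohort.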

  11. Differentially private distributed logistic regression using private and public data

    PubMed Central

    2014-01-01

    Background Privacy protection is an important issue in medical informatics and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise and make outputs less useful. Given available public data in medical research (e.g. from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. Methodology In this paper, we modify the update step in the Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. Experiments and results We try our algorithm on three different data sets, and show its advantage over: (1) a logistic regression model based solely on public data, and (2) a differentially private distributed logistic regression model based on private data under various scenarios. Conclusion Logistic regression models built with our new algorithm based on both private and public datasets demonstrate better utility than models trained on private or public datasets alone, without sacrificing the rigorous privacy guarantee. PMID:25079786

  12. A retrospective analysis to identify the factors affecting infection in patients undergoing chemotherapy.

    PubMed

    Park, Ji Hyun; Kim, Hyeon-Young; Lee, Hanna; Yun, Eun Kyoung

    2015-12-01

    This study compares the performance of the logistic regression and decision tree analysis methods for assessing the risk factors for infection in cancer patients undergoing chemotherapy. The subjects were 732 cancer patients who were receiving chemotherapy at K university hospital in Seoul, Korea. The data were collected between March 2011 and February 2013 and were processed for descriptive analysis, logistic regression and decision tree analysis using the IBM SPSS Statistics 19 and Modeler 15.1 programs. The most common risk factors for infection in cancer patients receiving chemotherapy were identified as alkylating agents, vinca alkaloid and underlying diabetes mellitus. The logistic regression achieved a sensitivity of 66.7% and a specificity of 88.9%. The decision tree analysis achieved a sensitivity of 55.0% and a specificity of 89.0%. The overall classification accuracy was 88.0% for the logistic regression and 87.2% for the decision tree analysis. The logistic regression analysis showed a higher degree of sensitivity and classification accuracy. Therefore, logistic regression analysis is concluded to be the more effective and useful method for establishing an infection prediction model for patients undergoing chemotherapy. Copyright © 2015 Elsevier Ltd. All rights reserved.
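
    For readers comparing the two classifiers on these measures, the sketch below shows how sensitivity, specificity and overall accuracy follow from confusion-matrix counts. The counts used are invented for illustration; the abstract reports only the resulting percentages, not the underlying table. The function name is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    tp/fn: infected patients predicted infected / not infected.
    tn/fp: uninfected patients predicted not infected / infected.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / (tp + fn),   # share of true cases detected
        "specificity": tn / (tn + fp),   # share of non-cases correctly cleared
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts, not taken from the study.
metrics = classification_metrics(tp=40, fn=20, tn=160, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

    Because infections are the minority class, accuracy is dominated by specificity, which is one reason two models can have near-identical accuracy while differing noticeably in sensitivity.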

  13. Performance and strategy comparisons of human listeners and logistic regression in discriminating underwater targets.

    PubMed

    Yang, Lixue; Chen, Kean

    2015-11-01

    To improve the design of underwater target recognition systems based on auditory perception, this study compared human listeners with automatic classifiers. Performance measures and strategies in three discrimination experiments, including discriminations between man-made and natural targets, between ships and submarines, and among three types of ships, were used. In the experiments, the subjects were asked to assign a score to each sound based on how confident they were about the category to which it belonged, and logistic regression, which represents linear discriminative models, also completed three similar tasks by utilizing many auditory features. The results indicated that the performance of logistic regression improved as the ratio between inter- and intra-class differences became larger, whereas the performance of the human subjects was limited by their unfamiliarity with the targets. Logistic regression performed better than the human subjects in all tasks but the discrimination between man-made and natural targets, and the strategies employed by excellent human subjects were similar to that of logistic regression. Logistic regression and several human subjects demonstrated similar performance when discriminating man-made and natural targets, but in this case, their strategies were not similar. An appropriate fusion of their strategies led to further improvement in recognition accuracy.

  14. Factors affecting the probability of first year medical student dropout in the UK: a logistic analysis for the intake cohorts of 1980-92.

    PubMed

    Arulampalam, Wiji; Naylor, Robin; Smith, Jeremy

    2004-05-01

    In the context of the 1997 Report of the Medical Workforce Standing Advisory Committee, it is important that we develop an understanding of the factors influencing medical school retention rates. To analyse the determinants of the probability that an individual medical student will drop out of medical school during their first year of study. Binomial and multinomial logistic regression analysis of individual-level administrative data on 51 810 students in 21 medical schools in the UK for the intake cohorts of 1980-92 was performed. The overall average first year dropout rate over the period 1980-92 was calculated to be 3.8%. We found that the probability that a student would drop out of medical school during their first year of study was influenced significantly by both the subjects studied at A-level and by the scores achieved. For example, achieving 1 grade higher in biology, chemistry or physics reduced the dropout probability by 0.38% points, equivalent to a fall of 10%. We also found that males were about 8% more likely to drop out than females. The medical school attended also had a significant effect on the estimated dropout probability. Indicators of both the social class and the previous school background of the student were largely insignificant. Policies aimed at increasing the size of the medical student intake in the UK and of widening access to students from non-traditional backgrounds should be informed by evidence that student dropout probabilities are sensitive to measures of A-level attainment, such as subject studied and scores achieved. If traditional entry requirements or standards are relaxed, then this is likely to have detrimental effects on medical schools' retention rates unless accompanied by appropriate measures such as focussed student support.
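
    The abstract moves between percentage points and relative change, which is easy to misread. The short sketch below reproduces the conversion using only the two figures given (a 3.8% baseline dropout rate and a 0.38 percentage-point fall); the function name is hypothetical.

```python
def relative_change(baseline_rate, change_in_points):
    """Convert an absolute change in percentage points to a relative change.

    Both arguments are in percent, e.g. 3.8 for 3.8% and 0.38 for 0.38 points.
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return change_in_points / baseline_rate


# Figures reported in the abstract: 3.8% baseline, 0.38-point fall.
fall = relative_change(3.8, 0.38)
print(f"relative fall in dropout probability: {fall:.0%}")
```

    The same distinction applies to the statement that males were about 8% more likely to drop out: read as a relative change, that is a small shift in percentage points on a 3.8% baseline.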

  15. Probability Models Based on Soil Properties for Predicting Presence-Absence of Pythium in Soybean Roots.

    PubMed

    Zitnick-Anderson, Kimberly K; Norland, Jack E; Del Río Mendoza, Luis E; Fortuna, Ann-Marie; Nelson, Berlin D

    2017-10-01

    Associations between soil properties and Pythium groups on soybean roots were investigated in 83 commercial soybean fields in North Dakota. A data set containing 2877 isolates of Pythium, which included 26 known spp. and 1 unknown sp., and 13 soil properties from each field was analyzed. A Pearson correlation analysis was performed with all soil properties to observe any significant correlation between properties. Hierarchical clustering, indicator spp., and multi-response permutation procedures were used to identify groups of Pythium. Logistic regression analysis using stepwise selection was employed to calculate probability models for presence of groups based on soil properties. Three major Pythium groups were identified and three soil properties were associated with these groups. Group 1, characterized by P. ultimum, was associated with zinc levels; as zinc increased, the probability of group 1 being present increased (α = 0.05). Pythium group 2, characterized by Pythium kashmirense and an unknown Pythium sp., was associated with cation exchange capacity (CEC) (α < 0.05); as CEC increased, these spp. increased. Group 3, characterized by Pythium heterothallicum and Pythium irregulare, was associated with CEC and calcium carbonate exchange (CCE); as CCE increased and CEC decreased, these spp. increased (α = 0.05). The regression models may have value in predicting pathogenic Pythium spp. in soybean fields in North Dakota and adjacent states.

  16. Emergency Assessment of Debris-Flow Hazards from Basins Burned by the Padua Fire of 2003, Southern California

    USGS Publications Warehouse

    Cannon, Susan H.; Gartner, Joseph E.; Rupert, Michael G.; Michael, John A.

    2004-01-01

    This preliminary assessment presents the probability of debris-flow activity and estimates of peak discharges that can potentially be generated by debris flows issuing from basins burned by the Padua Fire of October 2003 in southern California in response to 25-year, 10-year, and 2-year recurrence, 1-hour duration rain storms. The resulting probability maps are based on the application of a logistic multiple-regression model (Cannon and others, 2004) that describes the percent chance of debris-flow production from an individual basin as a function of burned extent, soil properties, basin gradients, and storm rainfall. The resulting peak discharge maps are based on application of a multiple-regression model (Cannon and others, 2004) that can be used to estimate debris-flow peak discharge at a basin outlet as a function of basin gradient, burn extent, and storm rainfall. Probabilities of debris-flow occurrence for the Padua Fire range between 0 and 99% and estimates of debris-flow peak discharges range between 1,211 and 6,096 ft³/s (34 to 173 m³/s). These maps are intended to identify those basins that are most prone to the largest debris-flow events and provide information for the preliminary design of mitigation measures and for the planning of evacuation timing and routes.

  17. Unitary Response Regression Models

    ERIC Educational Resources Information Center

    Lipovetsky, S.

    2007-01-01

    The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…

  18. Binary logistic regression-Instrument for assessing museum indoor air impact on exhibits.

    PubMed

    Bucur, Elena; Danet, Andrei Florin; Lehr, Carol Blaziu; Lehr, Elena; Nita-Lazar, Mihai

    2017-04-01

    This paper presents a new way to assess the environmental impact on historical artifacts using binary logistic regression. The prediction of the impact on the exhibits during certain pollution scenarios (environmental impact) was calculated by a mathematical model based on the binary logistic regression; it allows the identification of those environmental parameters from a multitude of possible parameters with a significant impact on exhibitions and ranks them according to their severity effect. Air quality (NO2, SO2, O3 and PM2.5) and microclimate parameters (temperature, humidity) monitoring data from a case study conducted within exhibition and storage spaces of the Romanian National Aviation Museum Bucharest have been used for developing and validating the binary logistic regression method and the mathematical model. The logistic regression analysis was used on 794 data combinations (715 to develop the model and 79 to validate it) by a Statistical Package for Social Sciences (SPSS 20.0). The results from the binary logistic regression analysis demonstrated that from six parameters taken into consideration, four of them present a significant effect upon exhibits in the following order: O3 > PM2.5 > NO2 > humidity, followed at a significant distance by the effects of SO2 and temperature. The mathematical model, developed in this study, correctly predicted 95.1% of the cumulated effect of the environmental parameters upon the exhibits. Moreover, this model could also be used in the decisional process regarding the preventive preservation measures that should be implemented within the exhibition space. 
The mathematical model developed on the environmental parameters analyzed by the binary logistic regression method could be useful in a decision-making process establishing the best measures for pollution reduction and preventive preservation of exhibits.

  19. Determining factors influencing survival of breast cancer by fuzzy logistic regression model.

    PubMed

    Nikbakht, Roya; Bahrampour, Abbas

    2017-01-01

    The fuzzy logistic regression model can be used for determining influential factors of disease. This study explores the important predictive factors of survival in patients with breast cancer. We used breast cancer data which were collected by the cancer registry of Kerman University of Medical Sciences during the period of 2000-2007. Variables such as morphology, grade, age, and treatments (surgery, radiotherapy, and chemotherapy) were applied in the fuzzy logistic regression model. Performance of the model was determined in terms of mean degree of membership (MDM). The study results showed that almost 41% of patients were in the neoplasm and malignant group and more than two-thirds of them were still alive after 5-year follow-up. Based on the fuzzy logistic model, the most important factors influencing survival were chemotherapy, morphology, and radiotherapy, respectively. Furthermore, the MDM criterion shows that the fuzzy logistic regression has a good fit on the data (MDM = 0.86). The fuzzy logistic regression model showed that chemotherapy is more important than radiotherapy in survival of patients with breast cancer. In addition, another ability of this model is calculating possibilistic odds of survival in cancer patients. The results of this study can be applied in clinical research. Furthermore, there are few studies which have applied fuzzy logistic models. We therefore recommend using this model in various research areas.

  20. A comparison between univariate probabilistic and multivariate (logistic regression) methods for landslide susceptibility analysis: the example of the Febbraro valley (Northern Alps, Italy)

    NASA Astrophysics Data System (ADS)

    Rossi, M.; Apuani, T.; Felletti, F.

    2009-04-01

    The aim of this paper is to compare the results of two statistical methods for landslide susceptibility analysis: 1) univariate probabilistic method based on landslide susceptibility index, 2) multivariate method (logistic regression). The study area is the Febbraro valley, located in the central Italian Alps, where different types of metamorphic rocks crop out. On the eastern part of the studied basin a Quaternary cover, represented by colluvial and, secondarily, by glacial deposits, is dominant. In this study 110 earth flows, mainly located toward the NE portion of the catchment, were analyzed. They involve only the colluvial deposits and their extension mainly ranges from 36 to 3173 m². Both statistical methods require establishing a spatial database, in which each landslide is described by several parameters that can be assigned using the main scarp central point of the landslide. The spatial database is constructed using a Geographical Information System (GIS). Each landslide is described by several parameters corresponding to the value at the main scarp central point of the landslide. Based on a bibliographic review, a total of 15 predisposing factors were utilized. The width of the intervals, in which the maps of the predisposing factors have to be reclassified, has been defined assuming constant intervals for: elevation (100 m), slope (5°), solar radiation (0.1 MJ/cm²/year), profile curvature (1.2 1/m), tangential curvature (2.2 1/m), drainage density (0.5), lineament density (0.00126). For the other parameters, the results of the probability-probability plot analysis and the statistical indexes of the landslide sites were used. In particular: slope length (0 ÷ 2, 2 ÷ 5, 5 ÷ 10, 10 ÷ 20, 20 ÷ 35, 35 ÷ 260), accumulation flow (0 ÷ 1, 1 ÷ 2, 2 ÷ 5, 5 ÷ 12, 12 ÷ 60, 60 ÷ 27265), Topographic Wetness Index (0 ÷ 0.74, 0.74 ÷ 1.94, 1.94 ÷ 2.62, 2.62 ÷ 3.48, 3.48 ÷ 6.00, 6.00 ÷ 9.44), Stream Power Index (0 ÷ 0.64, 0.64 ÷ 1.28, 1.28 ÷ 1.81, 1.81 ÷ 4.20, 4.20 ÷ 9.40). 
Geological map and land use map were also used, considering geological and land use properties as categorical variables. Applying the univariate probabilistic method, the Landslide Susceptibility Index (LSI) is defined as the sum of the ratio Ra/Rb calculated for each predisposing factor, where Ra is the ratio between the number of pixels of the class and the total number of pixels of the study area, and Rb is the ratio between the number of landslides and the number of pixels of the interval area. From the analysis of the Ra/Rb ratio the relationships between landslide occurrence and predisposing factors were defined. Then the equation of LSI was used in GIS to trace the landslide susceptibility maps. The multivariate method for landslide susceptibility analysis, based on logistic regression, was performed starting from the density maps of the predisposing factors, calculated with the intervals defined above using the equation Rb/Rbtot, where Rbtot is the sum of all Rb values. Using stepwise forward algorithms the logistic regression was performed in two successive steps: first a univariate logistic regression is used to choose the most significant predisposing factors, then the multivariate logistic regression can be performed. The univariate regression highlighted the importance of the following factors: elevation, accumulation flow, drainage density, lineament density, geology and land use. When the multivariate regression was applied the number of controlling factors was reduced, neglecting the geological properties. The resulting final susceptibility equation is: P = 1 / (1 + exp(-(6.46 - 22.34*elevation - 5.33*accumulation flow - 7.99*drainage density - 4.47*lineament density - 17.31*land use))) and using this equation the susceptibility maps were obtained. To easily compare the results of the two methodologies, the susceptibility maps were reclassified in five susceptibility intervals (very high, high, moderate, low and very low) using natural breaks. 
Then the maps were validated using two cumulative distribution curves, one related to the landslides (number of landslides in each susceptibility class) and one to the basin (number of pixels covering each class). Comparing the curves for each method shows that the two approaches (univariate and multivariate) are appropriate, providing acceptable results. In both maps the distribution of the high susceptibility condition is mainly localized on the left slope of the catchment, in agreement with the field evidence. The comparison between the methods was obtained by subtraction of the two maps. This operation shows that about 40% of the basin is classified by the same class of susceptibility. In general the univariate probabilistic method tends to overestimate the areal extension of the high susceptibility class with respect to the maps obtained by the logistic regression method.
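
    The final susceptibility equation quoted in this abstract can be evaluated per pixel as sketched below. The coefficients are copied from the abstract; reading the printed "exp-(...)" as exp(-(...)) is an assumption, as is treating each input as the density value Rb/Rbtot of the class the pixel falls in. The names `INTERCEPT`, `COEFFICIENTS` and `susceptibility` are hypothetical.

```python
import math

# Coefficients as printed in the abstract's final equation.
INTERCEPT = 6.46
COEFFICIENTS = {
    "elevation": -22.34,
    "accumulation_flow": -5.33,
    "drainage_density": -7.99,
    "lineament_density": -4.47,
    "land_use": -17.31,
}


def susceptibility(factors):
    """Evaluate P = 1 / (1 + exp(-z)) for one pixel.

    factors: dict with one density value (assumed to be Rb/Rbtot, so in
    [0, 1]) for each key of COEFFICIENTS.
    """
    z = INTERCEPT + sum(COEFFICIENTS[name] * factors[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative input only: every density set to zero leaves just the intercept.
zero_densities = {name: 0.0 for name in COEFFICIENTS}
print(round(susceptibility(zero_densities), 4))
```

    With every coefficient negative as printed, P falls as any density value rises; whether that matches the authors' intended sign convention cannot be confirmed from the abstract alone.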

  1. Impact of literacy and years of education on the diagnosis of dementia: A population-based study.

    PubMed

    Contador, Israel; Del Ser, Teodoro; Llamas, Sara; Villarejo, Alberto; Benito-León, Julián; Bermejo-Pareja, Félix

    2017-03-01

    The effect of different educational indices on clinical diagnosis of dementia requires more investigation. We compared the differential influence of two educational indices (EIs): years of schooling and level of education (i.e., null/low literacy, can read and write, primary school, and secondary school) on global cognition, functional performance, and the probability of having a dementia diagnosis. A total of 3,816 participants were selected from the population-based study of older adults "Neurological Disorders in Central Spain" (NEDICES). The 37-item version of the Mini-Mental State Examination (MMSE-37) and the Pfeffer's questionnaire were applied to assess cognitive and functional performance, respectively. The diagnosis of dementia was performed by expert neurologists according to Diagnostic and Statistical Manual of Mental Disorders-Fourth Edition (DSM-IV) criteria. Logistic regression models adjusted for potential confounders were carried out to test the association between the two EIs and dementia diagnosis. Both EIs were significantly associated with cognitive and functional scores, but individuals with null/low literacy performed significantly worse on MMSE-37 than literates when these groups were compared in terms of years of schooling. The two EIs were also related to an increased probability of dementia diagnosis in logistic models, but the association was stronger for level of education than for years of schooling. Literacy predicted cognitive performance over and above the years of schooling. Lower education increases the probability of having a dementia diagnosis but the impact of different EIs is not uniform.

  2. Forecasting Lightning at Kennedy Space Center/Cape Canaveral Air Force Station, Florida

    NASA Technical Reports Server (NTRS)

    Lambert, Winfred; Wheeler, Mark; Roeder, William

    2005-01-01

    The Applied Meteorology Unit (AMU) developed a set of statistical forecast equations that provide a probability of lightning occurrence on Kennedy Space Center (KSC)/Cape Canaveral Air Force Station (CCAFS) for the day during the warm season (May-September). The 45th Weather Squadron (45 WS) forecasters at CCAFS in Florida include a probability of lightning occurrence in their daily 24-hour and weekly planning forecasts, which are briefed at 1100 UTC (0700 EDT). This information is used for general scheduling of operations at CCAFS and KSC. Forecasters at the Spaceflight Meteorology Group also make thunderstorm forecasts for the KSC/CCAFS area during Shuttle flight operations. Much of the current lightning probability forecast at both groups is based on a subjective analysis of model and observational data. The objective tool currently available is the Neumann-Pfeffer Thunderstorm Index (NPTI, Neumann 1971), developed specifically for the KSC/CCAFS area over 30 years ago. However, recent studies have shown that 1-day persistence provides a better forecast than the NPTI, indicating that the NPTI needed to be upgraded or replaced. Because they require a tool that provides a reliable estimate of the daily thunderstorm probability forecast, the 45 WS forecasters requested that the AMU develop a new lightning probability forecast tool using recent data and more sophisticated techniques now possible through more computing power than that available over 30 years ago. The equation development incorporated results from two research projects that investigated causes of lightning occurrence near KSC/CCAFS and over the Florida peninsula. One proved that logistic regression outperformed the linear regression method used in NPTI, even when the same predictors were used. The other study found relationships between large scale flow regimes and spatial lightning distributions over Florida. 
Lightning probabilities based on these flow regimes were used as candidate predictors in the equation development. Fifteen years (1989-2003) of warm season data were used to develop the forecast equations. The data sources included a local network of cloud-to-ground lightning sensors called the Cloud-to-Ground Lightning Surveillance System (CGLSS), 1200 UTC Florida synoptic soundings, and the 1000 UTC CCAFS sounding. Data from CGLSS were used to determine lightning occurrence for each day. The 1200 UTC soundings were used to calculate the synoptic-scale flow regimes and the 1000 UTC soundings were used to calculate local stability parameters, which were used as candidate predictors of lightning occurrence. Five logistic regression forecast equations were created through careful selection and elimination of the candidate predictors. The resulting equations contain five to six predictors each. Results from four performance tests indicated that the equations showed an increase in skill over several standard forecasting methods, good reliability, an ability to distinguish between non-lightning and lightning days, and good accuracy measures and skill scores. Given the overall good performance, the 45 WS requested that the equations be transitioned to operations and added to the current set of tools used to determine the daily lightning probability of occurrence.

  3. Classification of pregnancies of unknown location according to four different hCG-based protocols.

    PubMed

    Fistouris, J; Bergh, C; Strandell, A

    2016-10-01

    How do four protocols based on serial human chorionic gonadotropin (hCG) measurements perform when classifying pregnancies of unknown location (PULs) as low or high risk of being an ectopic pregnancy (EP)? The use of cut-offs in hCG level changes published by NICE, and a logistic regression model, M4, correctly classify more PULs as high risk, compared with two other protocols. A logistic regression model, M4, based on the mean of two consecutive hCG values and the hCG ratio (hCG 48 h/hCG 0 h), which classifies PULs into low- and high-risk groups for triage purposes, identifies more EPs than a protocol using the cut-offs between a 13% decline and a 66% rise in hCG levels over 48 h. A retrospective comparative study of four different hCG-based protocols classifying PULs as low or high risk of being an EP was performed at a gynaecological emergency unit over 3 years. We identified 915 women with a PUL. Initial transvaginal ultrasonography (TVS) findings categorised 187 of the PULs as probable intrauterine pregnancies (IUPs) and 16 as probable EPs. The rate of change in hCG levels over 48 h was calculated for each patient and subjected to three different hCG threshold intervals and a logistic regression model for outcome prediction. Each PUL was subsequently dichotomised to either low-risk (i.e. failed PUL/IUP) or high-risk (i.e. EP) classification, which allowed us to compare the diagnostic performance. In 'Protocol A', a PUL was classified as low risk if >13% hCG level decline or >66% hCG level rise was achieved; otherwise, the PUL was classified as high risk of being an EP. 'Protocol B' classified a PUL as low or high risk using cut-offs of 35-50% declining hCG levels and of 53% rising hCG levels. Similarly, 'Protocol C' used hCG level cut-offs published by NICE, 50% for declining hCG levels and 63% for rising hCG levels. 
Finally, if a logistic regression model 'Protocol M4' calculated a ≥5% risk of the PUL being an EP, it was classified as high risk, and otherwise the PUL was classified as low risk. When the time interval between two hCG measurements failed to meet an exact 48 h, extrapolation and interpolation of hCG values was made, using log linear transformation. Protocols A, B, C and M4 classified 73, 66, 55 and 56% of PULs as low risk. The sensitivity for protocols A, B, C and M4 was 68% (95% confidence interval (CI) 61-75%), 81% (74-86%), 87% (82-92%) and 88% (83-93%), respectively. The specificity was 82% (80-85%), 77% (74-80%), 66% (62-69%) and 67% (63-70%) for protocols A, B, C and M4, respectively. All comparisons of sensitivity and specificity between the protocols were statistically significant except for protocol C versus protocol M4. In protocol C, 87% (66-97%) of misclassified EPs had rising hCG levels, compared with 19% (6-41%) for protocol M4 (P < 0.01). In a secondary analysis excluding probable IUPs and probable EPs, the results for 712 PULs were analysed. The sensitivity subsequently remained stable for all protocols. Protocol M4 reached a 78% (74-81%) specificity, which was significantly higher than 70% (66-74%) for protocol C (P = 0.01) and protocol M4 classified 63% of PULs as low risk compared with 58% for protocol C. The retrospective design of the study is a limitation. The results are derived from a population where laparoscopy played an important role in PUL management and diagnosis of EPs, although it did reflect real clinical practice. Although we tried to adhere to definitions of PUL and final outcomes as in previous studies and a recent consensus statement, potential differences in this regard must be acknowledged. Where the time interval between two serial hCG measurements deviated from 48 h we estimated 48 h hCG values. 
A logistic regression model, M4, classifies more PULs correctly as low risk in a selected PUL population without probable IUPs and EPs and identifies as many EPs, in comparison with the cut-offs available in the NICE guideline. This advantage for model M4 may result in a reduction of unnecessary follow-up visits, when fewer low-risk PULs are misclassified as high risk. These findings, however, ought to be clarified in a randomised controlled trial. The study was supported by LUA/ALF grant No. 70940. There are no competing interests. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
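
The threshold rules described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the log-linear estimate of the 48 h value is one plausible reading of "extrapolation and interpolation ... using log linear transformation", and the M4 coefficients are not given in the abstract, so the predicted EP risk is taken as an input rather than computed.

```python
import math


def hcg_at_48h(hcg_0h, hcg_later, hours_between):
    """Estimate the 48 h hCG value, assuming log(hCG) changes linearly in time.

    Assumption: this is one reading of the abstract's "log linear
    transformation"; the study's exact procedure is not stated.
    """
    if hcg_0h <= 0 or hcg_later <= 0 or hours_between <= 0:
        raise ValueError("hCG values and the time interval must be positive")
    slope = (math.log(hcg_later) - math.log(hcg_0h)) / hours_between
    return math.exp(math.log(hcg_0h) + slope * 48.0)


def protocol_a(hcg_0h, hcg_48h):
    """Protocol A: low risk if hCG declines by >13% or rises by >66% over 48 h."""
    change_pct = (hcg_48h - hcg_0h) / hcg_0h * 100.0
    if change_pct < -13.0 or change_pct > 66.0:
        return "low risk"
    return "high risk"


def protocol_m4(predicted_ep_risk):
    """Protocol M4: high risk if the model-predicted EP risk is at least 5%.

    The predicted risk must come from the fitted M4 model, whose
    coefficients are not reported in the abstract.
    """
    return "high risk" if predicted_ep_risk >= 0.05 else "low risk"
```

For example, a measurement pair taken 24 h apart would first be passed through `hcg_at_48h` and the estimated value then handed to `protocol_a`.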

  4. Mixed conditional logistic regression for habitat selection studies.

    PubMed

    Duchesne, Thierry; Fortin, Daniel; Courbin, Nicolas

    2010-05-01

    1. Resource selection functions (RSFs) are becoming a dominant tool in habitat selection studies. RSF coefficients can be estimated with unconditional (standard) and conditional logistic regressions. While the advantage of mixed-effects models is recognized for standard logistic regression, mixed conditional logistic regression remains largely overlooked in ecological studies. 2. We demonstrate the significance of mixed conditional logistic regression for habitat selection studies. First, we use spatially explicit models to illustrate how mixed-effects RSFs can be useful in the presence of inter-individual heterogeneity in selection and when the assumption of independence from irrelevant alternatives (IIA) is violated. The IIA hypothesis states that the strength of preference for habitat type A over habitat type B does not depend on the other habitat types also available. Secondly, we demonstrate the significance of mixed-effects models to evaluate habitat selection of free-ranging bison Bison bison. 3. When movement rules were homogeneous among individuals and the IIA assumption was respected, fixed-effects RSFs adequately described habitat selection by simulated animals. In situations violating the inter-individual homogeneity and IIA assumptions, however, RSFs were best estimated with mixed-effects regressions, and fixed-effects models could even provide faulty conclusions. 4. Mixed-effects models indicate that bison did not select farmlands, but exhibited strong inter-individual variations in their response to farmlands. Less than half of the bison preferred farmlands over forests. Conversely, the fixed-effect model simply suggested an overall selection for farmlands. 5. Conditional logistic regression is recognized as a powerful approach to evaluate habitat selection when resource availability changes. This regression is increasingly used in ecological studies, but almost exclusively in the context of fixed-effects models. 
Fitness maximization can imply differences in trade-offs among individuals, which can yield inter-individual differences in selection and lead to departure from IIA. These situations are best modelled with mixed-effects models. Mixed-effects conditional logistic regression should become a valuable tool for ecological research.

  5. Advanced colorectal neoplasia risk stratification by penalized logistic regression.

    PubMed

    Lin, Yunzhi; Yu, Menggang; Wang, Sijian; Chappell, Richard; Imperiale, Thomas F

    2016-08-01

    Colorectal cancer is the second leading cause of death from cancer in the United States. To facilitate the efficiency of colorectal cancer screening, there is a need to stratify risk for colorectal cancer among the 90% of US residents who are considered "average risk." In this article, we investigate such risk stratification rules for advanced colorectal neoplasia (colorectal cancer and advanced, precancerous polyps). We use a recently completed large cohort study of subjects who underwent a first screening colonoscopy. Logistic regression models have been used in the literature to estimate the risk of advanced colorectal neoplasia based on quantifiable risk factors. However, logistic regression may be prone to overfitting and instability in variable selection. Since most of the risk factors in our study have several categories, it was tempting to collapse these categories into fewer risk groups. We propose a penalized logistic regression method that automatically and simultaneously selects variables, groups categories, and estimates their coefficients by penalizing the [Formula: see text]-norm of both the coefficients and their differences. Hence, it encourages sparsity in the categories, i.e. grouping of the categories, and sparsity in the variables, i.e. variable selection. We apply the penalized logistic regression method to our data. The important variables are selected, with close categories simultaneously grouped, by penalized regression models with and without the interaction terms. The models are validated with 10-fold cross-validation. The receiver operating characteristic curves of the penalized regression models dominate the receiver operating characteristic curve of naive logistic regressions, indicating a superior discriminative performance. © The Author(s) 2013.

  6. Spatial patterns of high Aedes aegypti oviposition activity in northwestern Argentina.

    PubMed

    Estallo, Elizabet Lilia; Más, Guillermo; Vergara-Cid, Carolina; Lanfri, Mario Alberto; Ludueña-Almeida, Francisco; Scavuzzo, Carlos Marcelo; Introini, María Virginia; Zaidenberg, Mario; Almirón, Walter Ricardo

    2013-01-01

    In Argentina, dengue has affected mainly the Northern provinces, including Salta. The objective of this study was to analyze the spatial patterns of high Aedes aegypti oviposition activity in San Ramón de la Nueva Orán, northwestern Argentina. The location of clusters as hot spot areas should help control programs to identify priority areas and allocate their resources more effectively. Oviposition activity was detected in Orán City (Salta province) using ovitraps replaced weekly (October 2005-2007). Spatial autocorrelation was measured with Moran's Index and depicted through cluster maps to identify hot spots. Total egg numbers were spatially interpolated and a classified map of areas with high Ae. aegypti oviposition activity was produced. Potential breeding and resting (PBR) sites were geo-referenced. A logistic regression analysis of interpolated egg numbers and PBR location was performed to generate a predictive mapping of mosquito oviposition activity. Both cluster maps and predictive map were consistent, identifying in central and southern areas of the city high Ae. aegypti oviposition activity. A logistic regression model was successfully developed to predict Ae. aegypti oviposition activity based on distance to PBR sites, with tire dumps having the strongest association with mosquito oviposition activity. A predictive map reflecting probability of oviposition activity was produced. The predictive map delimited an area of maximum probability of Ae. aegypti oviposition activity in the south of Orán city where tire dumps predominate. The overall fit of the model was acceptable (ROC=0.77), with 99% sensitivity and 75.29% specificity. Distance to tire dumps is inversely associated with high mosquito activity, allowing us to identify hot spots. These methodologies are useful for prevention, surveillance, and control of tropical vector borne diseases and might assist National Health Ministry to focus resources more effectively.

  7. Comparison of two possible routes of pathogen contamination of spinach leaves in a hydroponic cultivation system.

    PubMed

    Koseki, Shigenobu; Mizuno, Yasuko; Yamamoto, Kazutaka

    2011-09-01

    The route of pathogen contamination (from roots versus from leaves) of spinach leaves was investigated with a hydroponic cultivation system. Three major bacterial pathogens, Escherichia coli O157:H7, Salmonella, and Listeria monocytogenes, were inoculated into the hydroponic solution, in which the spinach was grown to give concentrations of 10⁶ and 10³ CFU/ml. In parallel, the pathogens were inoculated onto the growing leaf surface by pipetting, to give concentrations of 10⁶ and 10³ CFU per leaf. Although contamination was observed at a high rate through the root system by the higher inoculum (10⁶ CFU) for all the pathogens tested, the contamination was rare when the lower inoculum (10³ CFU) was applied. In contrast, contamination through the leaf occurred at a very low rate, even when the inoculum level was high. For all the pathogens tested in the present study, the probability of contamination was promoted through the roots and with higher inoculum levels. The probability of contamination was analyzed with logistic regression. The logistic regression model showed that the odds ratio of contamination from the roots versus from the leaves was 6.93, which suggested that the risk of contamination from the roots was 6.93 times higher than the risk of contamination from the leaves. In addition, the risk of contamination by L. monocytogenes was about 0.3 times that of Salmonella enterica subsp. enterica serovars Typhimurium and Enteritidis and E. coli O157:H7. The results of the present study indicate that the principal route of pathogen contamination of growing spinach leaves in a hydroponic system is from the plant's roots, rather than from leaf contamination itself.
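
An odds ratio such as the 6.93 reported above is a ratio of odds, and it approximates a ratio of probabilities only when the outcome is rare. The sketch below makes this concrete; the function name and the baseline probabilities in the usage note are hypothetical and are not taken from the study.

```python
def risk_from_odds_ratio(baseline_risk, odds_ratio):
    """Apply an odds ratio to a baseline probability and return the new probability."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must lie strictly between 0 and 1")
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


def risk_ratio(baseline_risk, odds_ratio):
    """Ratio of probabilities implied by an odds ratio at a given baseline."""
    return risk_from_odds_ratio(baseline_risk, odds_ratio) / baseline_risk
```

With a hypothetical baseline probability of 0.001, `risk_ratio(0.001, 6.93)` stays close to 6.93, whereas at a baseline of 0.1 the implied ratio of probabilities is noticeably smaller than the odds ratio.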

  8. Associations between dairy cow inter-service interval and probability of conception.

    PubMed

    Remnant, J G; Green, M J; Huxley, J N; Hudson, C D

    2018-07-01

    Recent research has indicated that the interval between inseminations in modern dairy cattle is often longer than the commonly accepted cycle length of 18-24 days. This study analysed 257,396 inseminations in 75,745 cows from 312 herds in England and Wales. The interval between subsequent inseminations in the same cow in the same lactation (inter-service interval, ISI) was calculated and inseminations were categorised as successful or unsuccessful depending on whether there was a corresponding calving event. Conception risk was calculated for each individual ISI between 16 and 28 days. A random effects logistic regression model was fitted to the data with pregnancy as the outcome variable and ISI (in days) included in the model as a categorical variable. The modal ISI was 22 days and the peak conception risk was 44% for ISIs of 21 days rising from 27% at 16 days. The logistic regression model revealed significant associations of conception risk with ISI as well as 305 day milk yield, insemination number, parity and days in milk. Predicted conception risk was lower for ISIs of 16, 17 and 18 days and higher for ISIs of 20, 21 and 22 days compared to 25 day ISIs. A mixture model was specified to identify clusters in insemination frequency and conception risk for ISIs between 3 and 50 days. A "high conception risk, high insemination frequency" cluster was identified between 19 and 26 days which indicated that this time period was the true latent distribution for ISI with optimal reproductive outcome. These findings suggest that the period of increased numbers of inseminations around 22 days identified in existing work coincides with the period of increased probability of conception and therefore likely represents true return estrus events. Copyright © 2018 Elsevier Inc. All rights reserved.

  9. Impact of grade separator on pedestrian risk taking behavior.

    PubMed

    Khatoon, Mariya; Tiwari, Geetam; Chatterjee, Niladri

    2013-01-01

    Pedestrians on Delhi roads are often exposed to high risks. This is because the basic needs of pedestrians are not recognized as a part of the urban transport infrastructure improvement projects in Delhi. Rather, an ever increasing number of cars and motorized two-wheelers encourage the construction of large numbers of flyovers/grade separators to facilitate signal free movement for motorized vehicles, exposing pedestrians to greater risk. This paper describes the statistical analysis of pedestrian risk taking behavior while crossing the road, before and after the construction of a grade separator at an intersection of Delhi. A significant number of pedestrians are willing to take risks both before and after construction. The results indicate that the absence of signals makes pedestrians behave independently, leading to increased variability in their risk taking behavior. Variability in the speeds of all categories of vehicles has increased after the construction of grade separators. After the construction of the grade separator, the waiting time of pedestrians at the starting point of crossing has increased, and the correlation between waiting times and gaps accepted by pedestrians shows that after a certain waiting time, pedestrians become impatient and accept smaller gaps to cross the road. A logistic regression model is fitted by assuming that the probability of road crossing by pedestrians depends on the gap size (in s) between pedestrian and conflicting vehicles, sex, age, type of pedestrians (single or in a group) and type of conflicting vehicles. The logistic regression results indicate that before the construction of the grade separator the probability of road crossing by the pedestrian depends only on the gap size parameter; however, after the construction of the grade separator, other parameters become significant in determining pedestrian risk taking behavior. Copyright © 2012 Elsevier Ltd. All rights reserved.

  10. Predicting Redox Conditions in Groundwater Using Statistical Techniques: Implications for Nitrate Transport in Groundwater and Streams

    NASA Astrophysics Data System (ADS)

    Tesoriero, A. J.; Terziotti, S.

    2014-12-01

    Nitrate trends in streams often do not match expectations based on recent nitrogen source loadings to the land surface. Groundwater discharge with long travel times has been suggested as the likely cause for these observations. The fate of nitrate in groundwater depends to a large extent on the occurrence of denitrification along flow paths. Because denitrification in groundwater is inhibited when dissolved oxygen (DO) concentrations are high, defining the oxic-suboxic interface has been critical in determining pathways for nitrate transport in groundwater and to streams at the local scale. Predicting redox conditions on a regional scale is complicated by the spatial variability of reaction rates. In this study, logistic regression and boosted classification tree analysis were used to predict the probability of oxic water in groundwater in the Chesapeake Bay watershed. The probability of oxic water (DO > 2 mg/L) was predicted by relating DO concentrations in over 3,000 groundwater samples to indicators of residence time and/or electron donor availability. Variables that describe position in the flow system (e.g., depth to top of the open interval), soil drainage and surficial geology were the most important predictors of oxic water. Logistic regression and boosted classification tree analysis correctly predicted the presence or absence of oxic conditions in over 75% of the samples in both training and validation data sets. Predictions of the percentages of oxic wells in deciles of risk were very accurate (r² > 0.9) in both the training and validation data sets. Depth to the bottom of the oxic layer was predicted and is being used to estimate the effect that groundwater denitrification has on stream nitrate concentrations and the time lag between the application of nitrogen at the land surface and its effect on streams.

  11. The burden of infectious and cardiovascular diseases in India from 2004 to 2014.

    PubMed

    Banerjee, Kajori; Dwivedi, Laxmi Kant

    2016-01-01

    In India, both communicable and non-communicable diseases have been argued to disproportionately affect certain socioeconomic strata of the population. Using the 60th (2004) and 71st (2014) rounds of the National Sample Survey, this study assessed the balance between infectious diseases and cardiovascular diseases (CVD) from 2004 to 2014, as well as changes in the disease burden in various socioeconomic and demographic subpopulations. Prevalence rates, hospitalization rates, case fatality rates, and share of in-patients deaths were estimated to compare the disease burdens at these time points. Logistic regression and multivariate decomposition were used to evaluate changes in disease burden across various socio-demographic and socioeconomic groups. Evidence of stagnation in the infectious disease burden and rapid increase in the CVD burden was observed. Along with the drastic increase in case fatality rate, share of in-patients deaths became more skewed towards CVD from 2004 to 2014. Logistic regression analysis demonstrated a significant shift of the chance of succumbing to CVD from the privileged class, comprising non-Scheduled Castes and Tribes, more highly educated individuals, and households with higher monthly expenditures, towards the underprivileged population. Decomposition indicated that a change in the probability of suffering from CVD among the subcategories of age, social groups, educational status, and monthly household expenditures contributed to the increase in CVD prevalence more than compositional changes of the population from 2004 to 2014. This study provides evidence of the ongoing tendency of CVD to occur in older population segments, and also confirms the theory of diffusion, according to which an increased probability of suffering from CVD has trickled down the socioeconomic gradient.

  12. Use of administrative records to assess pneumococcal conjugate vaccine impact on pediatric meningitis and pneumonia hospitalizations in Rwanda.

    PubMed

    Gatera, Maurice; Uwimana, Jeannine; Manzi, Emmanuel; Ngabo, Fidele; Nwaigwe, Friday; Gessner, Bradford D; Moïsi, Jennifer C

    2016-10-17

    Ongoing surveillance is critical to assessing pneumococcal conjugate vaccine (PCV) impact over time. However, robust prospective studies are difficult to implement in resource-poor settings. We evaluated retrospective use of routinely collected data to estimate PCV impact in Rwanda. We collected data from admission registers at five district hospitals on children aged <5 years admitted for suspected meningitis and pneumonia during 2002-2012. We obtained clinical and laboratory data on meningitis from sentinel surveillance at the national reference hospital in Kigali. We developed multivariable logistic regression models to estimate PCV effectiveness (VE) against severe pneumonia and probable bacterial meningitis and Poisson models to estimate absolute rate reductions. Haemophilus influenzae type b vaccine was introduced in January 2002, PCV7 in April 2009 and PCV13 in August 2011. At the district hospitals, the severe pneumonia and suspected meningitis hospitalization rates decreased by 70/100,000 and 11/100,000 children for 2012 compared to baseline, respectively. VE against severe pneumonia calculated from logistic regression was 54% (95% CI 42-63%). In Kigali, from 2002 to 2012, annual suspected meningitis cases decreased from 170 pre-PCV7 to 40 post-PCV13 and confirmed pneumococcal meningitis cases from 7 to 0. VE against probable bacterial meningitis was 42% (95% CI -4% to 68%). In a resource-poor African setting, analysis of district hospital admission logbooks and routine sentinel surveillance data produced results consistent with more sophisticated impact studies conducted elsewhere. Our findings support applying this methodology in other settings and confirm the benefits of PCV in Rwanda. Copyright © 2016 Elsevier Ltd. All rights reserved.
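
The abstract does not state how VE was derived from the logistic regression. A common convention, assumed here, is VE = (1 - adjusted odds ratio) x 100, where the odds ratio is the exponentiated coefficient of the vaccination term. The sketch below uses a hypothetical function name and shows only that arithmetic.

```python
import math


def vaccine_effectiveness(log_odds_ratio):
    """VE in percent, assuming the convention VE = (1 - OR) * 100.

    log_odds_ratio is the fitted logistic regression coefficient for
    the vaccination term. The convention is an assumption; the
    abstract does not spell out the formula used.
    """
    return (1.0 - math.exp(log_odds_ratio)) * 100.0
```

Under this convention, a VE of 54% corresponds to an odds ratio of 0.46, and a confidence interval that includes negative VE (such as -4% to 68%) corresponds to an odds-ratio interval that includes 1.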

  13. Factors affecting patients' adherence to orthodontic appointments.

    PubMed

    Bukhari, Omair M; Sohrabi, Keyvan; Tavares, Mary

    2016-03-01

    Studies show that attendance at orthodontic appointments affects treatment outcomes, treatment duration, and the probability of side effects. The aim of this study was to predict factors that influence patients' attendance at orthodontic appointments. We conducted a face-to-face guided interview survey of 153 participants from orthodontic clinics in the Greater Boston area. Attendance at scheduled orthodontic appointments was self-reported as always, sometimes, or rarely. Participants' characteristics, including demographics, dental insurance, and oral hygiene practices, were self-reported. Moreover, from dental records, we collected the time that the participants spent undergoing active orthodontic treatment. Multivariable ordered logistic regression was used to report proportional odds ratios and attendance probabilities. A likelihood ratio test was performed to ensure that the proportional odds assumption held. For overall appointment attendance, 76% of the participants reported always attending, 16% reported sometimes attending, and 8% reported rarely attending. Based on multivariable logistic regression (adjusted for age, race, and sex), the participants with optimal oral hygiene practices were almost 6 times (5.9) more likely to attend appointments than those who did not (P = 0.002). The odds of attending appointments decreased significantly (by 23%) for every 6-month increase in treatment duration (P = 0.008). Participants covered by non-Medicaid insurance were 4 times (P = 0.018) more likely to attend appointments than were those with Medicaid insurance. Our findings indicate that adherence to orthodontic treatment follow-up visits was strongly correlated to insurance type, treatment duration, and oral hygiene practices. Unlike previous studies, sex was not a significant predictor of adherence. Copyright © 2016 American Association of Orthodontists. Published by Elsevier Inc. All rights reserved.
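
In a logistic model with a linear term for treatment duration, a fixed percentage change in odds per period compounds multiplicatively over several periods. The sketch below illustrates this with the 23% decrease per 6 months reported above; the function name is hypothetical, and extending the estimate beyond a single 6-month step assumes the linear-in-log-odds form holds over the whole range.

```python
def odds_multiplier(months, change_per_period=-0.23, period_months=6.0):
    """Multiplicative change in odds after `months` of additional treatment.

    Assumes a constant proportional change in odds per period
    (here a 23% decrease per 6 months), compounding over time.
    """
    return (1.0 + change_per_period) ** (months / period_months)
```

For instance, `odds_multiplier(12)` is 0.77 squared, so under these assumptions the odds after 12 additional months are about 59% of the starting odds, not 54% as simple subtraction would suggest.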

  14. Identification and validation of a logistic regression model for predicting serious injuries associated with motor vehicle crashes.

    PubMed

    Kononen, Douglas W; Flannagan, Carol A C; Wang, Stewart C

    2011-01-01

    A multivariate logistic regression model, based upon National Automotive Sampling System Crashworthiness Data System (NASS-CDS) data for calendar years 1999-2008, was developed to predict the probability that a crash-involved vehicle will contain one or more occupants with serious or incapacitating injuries. These vehicles were defined as containing at least one occupant coded with an Injury Severity Score (ISS) of greater than or equal to 15, in planar, non-rollover crash events involving Model Year 2000 and newer cars, light trucks, and vans. The target injury outcome measure was developed by the Centers for Disease Control and Prevention (CDC)-led National Expert Panel on Field Triage in their recent revision of the Field Triage Decision Scheme (American College of Surgeons, 2006). The parameters to be used for crash injury prediction were subsequently specified by the National Expert Panel. Model input parameters included: crash direction (front, left, right, and rear), change in velocity (delta-V), multiple vs. single impacts, belt use, presence of at least one older occupant (≥ 55 years old), presence of at least one female in the vehicle, and vehicle type (car, pickup truck, van, and sport utility). The model was developed using predictor variables that may be readily available, post-crash, from OnStar-like telematics systems. Model sensitivity and specificity were 40% and 98%, respectively, using a probability cutpoint of 0.20. The area under the receiver operator characteristic (ROC) curve for the final model was 0.84. Delta-V (mph), seat belt use and crash direction were the most important predictors of serious injury. Due to the complexity of factors associated with rollover-related injuries, a separate screening algorithm is needed to model injuries associated with this crash mode. Copyright © 2010 Elsevier Ltd. All rights reserved.
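
Sensitivity and specificity at a probability cutpoint, such as the 0.20 used above, come from dichotomising predicted probabilities and counting agreements with the observed outcomes. The following is a minimal sketch: the function name is hypothetical, the data in the usage note are invented for illustration, and treating a probability equal to the cutpoint as positive is an assumption.

```python
def sensitivity_specificity(probabilities, outcomes, cutpoint=0.20):
    """Return (sensitivity, specificity) for predictions dichotomised at `cutpoint`.

    probabilities: predicted probabilities of the outcome.
    outcomes: observed outcomes coded 1 (event) or 0 (no event).
    A probability >= cutpoint counts as a positive prediction (assumption).
    """
    tp = fp = tn = fn = 0
    for p, y in zip(probabilities, outcomes):
        predicted_positive = p >= cutpoint
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one event and one non-event")
    return tp / (tp + fn), tn / (tn + fp)
```

Calling `sensitivity_specificity([0.05, 0.10, 0.25, 0.30, 0.15, 0.60], [0, 0, 0, 1, 1, 1])` on this made-up sample dichotomises at 0.20; raising the cutpoint generally trades sensitivity for specificity.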

  15. Developmental dyslexia: predicting individual risk

    PubMed Central

    Thompson, Paul A; Hulme, Charles; Nash, Hannah M; Gooch, Debbie; Hayiou-Thomas, Emma; Snowling, Margaret J

    2015-01-01

    Background: Causal theories of dyslexia suggest that it is a heritable disorder, which is the outcome of multiple risk factors. However, whether early screening for dyslexia is viable is not yet known. Methods: The study followed children at high risk of dyslexia from preschool through the early primary years assessing them from age 3 years and 6 months (T1) at approximately annual intervals on tasks tapping cognitive, language, and executive-motor skills. The children were recruited to three groups: children at family risk of dyslexia, children with concerns regarding speech, and language development at 3;06 years and controls considered to be typically developing. At 8 years, children were classified as ‘dyslexic’ or not. Logistic regression models were used to predict the individual risk of dyslexia and to investigate how risk factors accumulate to predict poor literacy outcomes. Results: Family-risk status was a stronger predictor of dyslexia at 8 years than low language in preschool. Additional predictors in the preschool years include letter knowledge, phonological awareness, rapid automatized naming, and executive skills. At the time of school entry, language skills become significant predictors, and motor skills add a small but significant increase to the prediction probability. We present classification accuracy using different probability cutoffs for logistic regression models and ROC curves to highlight the accumulation of risk factors at the individual level. Conclusions: Dyslexia is the outcome of multiple risk factors and children with language difficulties at school entry are at high risk. Family history of dyslexia is a predictor of literacy outcome from the preschool years. However, screening does not reach an acceptable clinical level until close to school entry when letter knowledge, phonological awareness, and RAN, rather than family risk, together provide good sensitivity and specificity as a screening battery. PMID:25832320

  16. Prediction of unwanted pregnancies using logistic regression, probit regression and discriminant analysis

    PubMed Central

    Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon

    2015-01-01

    Background: Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. Methods: In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. Results: The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Conclusion: Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended. PMID:26793655

  17. Prediction of unwanted pregnancies using logistic regression, probit regression and discriminant analysis.

    PubMed

    Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon

    2015-01-01

    Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended.

  18. Automated Screening of Children With Obstructive Sleep Apnea Using Nocturnal Oximetry: An Alternative to Respiratory Polygraphy in Unattended Settings

    PubMed Central

    Álvarez, Daniel; Alonso-Álvarez, María L.; Gutiérrez-Tobal, Gonzalo C.; Crespo, Andrea; Kheirandish-Gozal, Leila; Hornero, Roberto; Gozal, David; Terán-Santos, Joaquín; Del Campo, Félix

    2017-01-01

    Study Objectives: Nocturnal oximetry has become known as a simple, readily available, and potentially useful diagnostic tool of childhood obstructive sleep apnea (OSA). However, at-home respiratory polygraphy (HRP) remains the preferred alternative to polysomnography (PSG) in unattended settings. The aim of this study was twofold: (1) to design and assess a novel methodology for pediatric OSA screening based on automated analysis of at-home oxyhemoglobin saturation (SpO2), and (2) to compare its diagnostic performance with HRP. Methods: SpO2 recordings were parameterized by means of time, frequency, and conventional oximetric measures. Logistic regression models were optimized using genetic algorithms (GAs) for three cutoffs for OSA: 1, 3, and 5 events/h. The diagnostic performance of logistic regression models, manual obstructive apnea-hypopnea index (OAHI) from HRP, and the conventional oxygen desaturation index ≥ 3% (ODI3) were assessed. Results: For a cutoff of 1 event/h, the optimal logistic regression model significantly outperformed both conventional HRP-derived ODI3 and OAHI: 85.5% accuracy (HRP 74.6%; ODI3 65.9%) and 0.97 area under the receiver operating characteristics curve (AUC) (HRP 0.78; ODI3 0.75) were reached. For a cutoff of 3 events/h, the logistic regression model achieved 83.4% accuracy (HRP 85.0%; ODI3 74.5%) and 0.96 AUC (HRP 0.93; ODI3 0.85) whereas using a cutoff of 5 events/h, oximetry reached 82.8% accuracy (HRP 85.1%; ODI3 76.7%) and 0.97 AUC (HRP 0.95; ODI3 0.84). Conclusions: Automated analysis of at-home SpO2 recordings provides accurate detection of children with high pretest probability of OSA. Thus, unsupervised nocturnal oximetry may enable a simple and effective alternative to HRP and PSG in unattended settings. Citation: Álvarez D, Alonso-Álvarez ML, Gutiérrez-Tobal GC, Crespo A, Kheirandish-Gozal L, Hornero R, Gozal D, Terán-Santos J, Del Campo F. 
Automated screening of children with obstructive sleep apnea using nocturnal oximetry: an alternative to respiratory polygraphy in unattended settings. J Clin Sleep Med. 2017;13(5):693–702. PMID:28356177

  19. Predictors of course in obsessive-compulsive disorder: logistic regression versus Cox regression for recurrent events.

    PubMed

    Kempe, P T; van Oppen, P; de Haan, E; Twisk, J W R; Sluis, A; Smit, J H; van Dyck, R; van Balkom, A J L M

    2007-09-01

    Two methods for predicting remissions in obsessive-compulsive disorder (OCD) treatment are evaluated. Y-BOCS measurements of 88 patients with a primary OCD (DSM-III-R) diagnosis were performed over a 16-week treatment period, and during three follow-ups. Remission at any measurement was defined as a Y-BOCS score lower than thirteen combined with a reduction of seven points when compared with baseline. Logistic regression models were compared with a Cox regression for recurrent events model. Logistic regression yielded different models at different evaluation times. The recurrent events model remained stable when fewer measurements were used. Higher baseline levels of neuroticism and more severe OCD symptoms were associated with a lower chance of remission, early age of onset and more depressive symptoms with a higher chance. Choice of outcome time affects logistic regression prediction models. Recurrent events analysis uses all information on remissions and relapses. Short- and long-term predictors for OCD remission show overlap.

  20. [Logistic regression model of noninvasive prediction for portal hypertensive gastropathy in patients with hepatitis B associated cirrhosis].

    PubMed

    Wang, Qingliang; Li, Xiaojie; Hu, Kunpeng; Zhao, Kun; Yang, Peisheng; Liu, Bo

    2015-05-12

    To explore the risk factors of portal hypertensive gastropathy (PHG) in patients with hepatitis B associated cirrhosis and establish a Logistic regression model of noninvasive prediction. The clinical data of 234 hospitalized patients with hepatitis B associated cirrhosis from March 2012 to March 2014 were analyzed retrospectively. The dependent variable was the occurrence of PHG while the independent variables were screened by binary Logistic analysis. Multivariate Logistic regression was used for further analysis of significant noninvasive independent variables. Logistic regression model was established and odds ratio was calculated for each factor. The accuracy, sensitivity and specificity of model were evaluated by the curve of receiver operating characteristic (ROC). According to univariate Logistic regression, the risk factors included hepatic dysfunction, albumin (ALB), bilirubin (TB), prothrombin time (PT), platelet (PLT), white blood cell (WBC), portal vein diameter, spleen index, splenic vein diameter, diameter ratio, PLT to spleen volume ratio, esophageal varices (EV) and gastric varices (GV). Multivariate analysis showed that hepatic dysfunction (X1), TB (X2), PLT (X3) and splenic vein diameter (X4) were the major occurring factors for PHG. The established regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4. The accuracy of model for PHG was 79.1% with a sensitivity of 77.2% and a specificity of 80.8%. Hepatic dysfunction, TB, PLT and splenic vein diameter are risk factors for PHG and the noninvasive predicted Logistic regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4.
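
    The reported equation gives the log-odds (logit) of PHG, so a predicted probability follows from the inverse logit. The sketch below is only an illustration of that conversion: the abstract does not say how X1 to X4 are coded or scaled, so the input values and the names `phg_probability`, `p_all_zero`, and `p_example` are hypothetical.

```python
import math

# Coefficients as reported in the abstract:
# Logit P = -2.667 + 2.186*X1 - 2.167*X2 + 0.725*X3 + 0.976*X4
INTERCEPT = -2.667
COEFS = (2.186, -2.167, 0.725, 0.976)


def phg_probability(x1, x2, x3, x4):
    """Apply the inverse logit to the linear predictor."""
    logit = INTERCEPT + sum(b * x for b, x in zip(COEFS, (x1, x2, x3, x4)))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical inputs, chosen only to exercise the formula
# (the abstract does not define the coding of X1..X4).
p_all_zero = phg_probability(0, 0, 0, 0)  # logit = -2.667
p_example = phg_probability(1, 0, 1, 1)   # logit = -2.667 + 2.186 + 0.725 + 0.976 = 1.220
print(round(p_all_zero, 3), round(p_example, 3))
```

    With every predictor at zero, the probability is governed by the intercept alone; each coefficient then shifts the log-odds, not the probability, by a fixed amount per unit of its predictor.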

  1. Variable Selection in Logistic Regression.

    DTIC Science & Technology

    1987-06-01

    Authors: Z. D. Bai, P. R. Krishnaiah and L. C. Zhao. Contract or grant number: F49620-85-C-0008. Performing organization: Center for Multivariate Analysis, University of Pittsburgh.

  2. Comparison of Logistic Regression and Artificial Neural Network in Low Back Pain Prediction: Second National Health Survey

    PubMed Central

    Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H

    2012-01-01

    Background: The purpose of this investigation was to empirically compare the predictive ability of an artificial neural network with that of logistic regression in the prediction of low back pain. Methods: Data from the second national health survey were considered in this investigation. These data include information on low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a set of 17294 data and they were validated in a test set of 17295 data. Hosmer and Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 inputs, 3 hidden and 1 output neurons was employed. The efficiency of the two models was compared by receiver operating characteristic analysis, root mean square and -2 Loglikelihood criteria. Results: The area under the ROC curve (SE), root mean square and -2Loglikelihood of the logistic regression was 0.752 (0.004), 0.3832 and 14769.2, respectively. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the artificial neural network was 0.754 (0.004), 0.3770 and 14757.6, respectively. Conclusions: Based on these three criteria, artificial neural network would give better performance than logistic regression. Although the difference is statistically significant, it does not seem to be clinically significant. PMID:23113198
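
    The three comparison criteria named in this abstract can each be computed from observed outcomes and predicted probabilities. The following is a minimal sketch on made-up data, assuming that "root mean square" means the root mean square error of the predicted probabilities; the function names and the toy vectors are illustrative and do not come from the study.

```python
import math


def rmse(y_true, p_pred):
    """Root mean square error between 0/1 outcomes and predicted probabilities."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, p_pred)) / n)


def neg2_loglik(y_true, p_pred):
    """-2 times the Bernoulli log-likelihood of the predictions."""
    return -2.0 * sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, p_pred)
    )


def auc(y_true, p_pred):
    """Area under the ROC curve as the share of (positive, negative) pairs
    in which the positive case has the higher score; ties count one half."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): four subjects, two with the outcome.
y = [1, 0, 1, 0]
p = [0.8, 0.3, 0.6, 0.7]
print(auc(y, p), round(rmse(y, p), 4), round(neg2_loglik(y, p), 4))
```

    A higher AUC and a lower RMSE or -2 log-likelihood indicate better fit, which is the sense in which the abstract ranks the two models.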

  3. Comparison of logistic regression and artificial neural network in low back pain prediction: second national health survey.

    PubMed

    Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H

    2012-01-01

    The purpose of this investigation was to empirically compare the predictive ability of an artificial neural network with that of logistic regression in the prediction of low back pain. Data from the second national health survey were considered in this investigation. These data include information on low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a set of 17294 data and they were validated in a test set of 17295 data. Hosmer and Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 inputs, 3 hidden and 1 output neurons was employed. The efficiency of the two models was compared by receiver operating characteristic analysis, root mean square and -2 Loglikelihood criteria. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the logistic regression was 0.752 (0.004), 0.3832 and 14769.2, respectively. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the artificial neural network was 0.754 (0.004), 0.3770 and 14757.6, respectively. Based on these three criteria, artificial neural network would give better performance than logistic regression. Although the difference is statistically significant, it does not seem to be clinically significant.

  4. Understanding logistic regression analysis.

    PubMed

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.
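
    The link between the fitted coefficients and the odds ratios described here can be written out explicitly. The notation below is the standard textbook form and is not taken from the article itself; the symbols p, x_j, and beta_j are introduced only for illustration.

```latex
% Logistic model for a binary response with k explanatory variables
\ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k

% Odds ratio for a one-unit increase in x_j, holding the other variables fixed
\mathrm{OR}_j = e^{\beta_j}

% Worked example: a coefficient of \ln 2 \approx 0.693 doubles the odds
\beta_j = 0.693 \;\Rightarrow\; \mathrm{OR}_j = e^{0.693} \approx 2.0
```

    Because every coefficient is estimated with the other variables in the model, each odds ratio is adjusted for them, which is the sense in which the method limits confounding.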

  5. Who cares? A comparison of informal and formal care provision in Spain, England and the USA

    PubMed Central

    SOLÉ-AURÓ, AÏDA; CRIMMINS, EILEEN M.

    2013-01-01

    This paper investigates the prevalence of incapacity in performing daily activities and the associations between household composition and availability of family members and receipt of care among older adults with functioning problems in Spain, England and the United States of America (USA). We examine how living arrangements, marital status, child availability, limitations in functioning ability, age and gender affect the probability of receiving formal care and informal care from household members and from others in three countries with different family structures, living arrangements and policies supporting care of the incapacitated. Data sources include the 2006 Survey of Health, Ageing and Retirement in Europe for Spain, the third wave of the English Longitudinal Study of Ageing (2006), and the eighth wave of the USA Health and Retirement Study (2006). Logistic and multinomial logistic regressions are used to estimate the probability of receiving care and the sources of care among persons age 50 and older. The percentage of people with functional limitations receiving care is higher in Spain. More care comes from outside the household in the USA and England than in Spain. The use of formal care among the incapacitated is lowest in the USA and highest in Spain. PMID:24550574

  6. Exposure-response evaluations of venetoclax efficacy and safety in patients with non-Hodgkin lymphoma.

    PubMed

    Parikh, Apurvasena; Gopalakrishnan, Sathej; Freise, Kevin J; Verdugo, Maria E; Menon, Rajeev M; Mensing, Sven; Salem, Ahmed Hamed

    2018-04-01

    Exposure-response analyses were performed for a venetoclax monotherapy study in 106 patients with varying subtypes of non-Hodgkin lymphoma (NHL) (NCT01328626). Logistic regression, time-to-event, and progression-free survival (PFS) analyses were used to evaluate the relationship between venetoclax exposure, NHL subtype and response, PFS, or occurrence of serious adverse events. Trends for small increases in the probability of response with increasing venetoclax exposures were identified, and became more evident when assessed by NHL subtype. Trends in exposure-PFS were shown for the mantle cell lymphoma (MCL) subtype, but not other subtypes. There was no increase in the probability of experiencing a serious adverse event with increasing exposure. Overall, the results indicate that venetoclax doses of 800-1200 mg as a single agent may be appropriate to maximize efficacy in MCL, follicular lymphoma, and diffuse large B-cell lymphoma subtypes with no expected negative impact on safety.

  7. Predicting significant torso trauma.

    PubMed

    Nirula, Ram; Talmor, Daniel; Brasel, Karen

    2005-07-01

    Identification of motor vehicle crash (MVC) characteristics associated with thoracoabdominal injury would advance the development of automatic crash notification systems (ACNS) by improving triage and response times. Our objective was to determine the relationships between MVC characteristics and thoracoabdominal trauma to develop a torso injury probability model. Drivers involved in crashes from 1993 to 2001 within the National Automotive Sampling System were reviewed. Relationships between torso injury and MVC characteristics were assessed using multivariate logistic regression. Receiver operating characteristic curves were used to compare the model to current ACNS models. There were a total of 56,466 drivers. Age, ejection, braking, avoidance, velocity, restraints, passenger-side impact, rollover, and vehicle weight and type were associated with injury (p < 0.05). The area under the receiver operating characteristic curve (83.9) was significantly greater than current ACNS models. We have developed a thoracoabdominal injury probability model that may improve patient triage when used with ACNS.

  8. Stock price analysis of sustainable foreign investment companies in Indonesia

    NASA Astrophysics Data System (ADS)

    Fachrudin, Khaira Amalia

    2018-03-01

    The stock price is determined by demand and supply in the stock market. Stock price reacts to information. Sustainable investment is an investment that considers environmental sustainability and human rights. This study aims to predict the probability of an above-average stock price by including the sustainability index as one of its variables. The population is all foreign investment companies in Indonesia. The target population is companies that distribute dividends, which also serve as the sample. The analysis tool is logistic regression. At 5% alpha, it was found that the sustainability index did not significantly increase the probability of an above-average stock price. The significant effects are free cash flow and cost of debt. However, the sustainability index can increase the Nagelkerke R square. The implication is that awareness of sustainability still needs to be improved, because the research result shows that investors only consider risk and return.

  9. Ecological correlates of depression and self-esteem in rural youth.

    PubMed

    Smokowski, Paul R; Evans, Caroline B R; Cotter, Katie L; Guo, Shenyang

    2014-10-01

    The current study examines individual-, social-, and school-level characteristics influencing symptoms of depression and self-esteem among a large sample (N = 4,321) of U.S. youth living in two rural counties in the South. Survey data for this sample of middle-school students (Grade 6 to Grade 8) were part of the Rural Adaptation Project. Data were analyzed using ordered logistic regression. Results show that being female, having a low income, and having negative relationships with parents and peers are risk factors that increase the probability of reporting high levels of depressive symptoms and low levels of self-esteem. In contrast, supportive relationships with parents and peers, high religious orientation, ethnic identity, and school satisfaction increased the probability of reporting low levels of depressive symptoms and high levels of self-esteem. There were few school-level characteristics associated with levels of depressive symptoms and self-esteem. Implications are discussed.

  10. [Psychomotor development in offspring of mothers with post partum depression].

    PubMed

    Podestá L, Loreto; Alarcón, Ana María; Muñoz, Sergio; Legüe C, Marcela; Bustos, Luis; Barría P, Mauricio

    2013-04-01

    Postpartum depression (PPD) has adverse effects on psychomotor development of the offspring. To evaluate the relationship between PPD and psychomotor development in children aged 18 months, consulting in primary care. Cross-sectional study with 360 infants and their mothers. Children had their psychomotor evaluation at 18 months and mothers completed the Edinburgh Postnatal Depression Scale at 4 and 12 weeks postpartum. The prevalence of both PPD and psychomotor alteration was estimated. The association between PPD and psychomotor alteration, including confounding variables, was estimated through multiple logistic regression analysis. The prevalence of PPD and psychomotor alteration was 29 and 16%, respectively. Mothers with PPD had twice the probability of having an offspring with psychomotor alteration (Odds ratio = 2.0, confidence intervals = 1.07-3.68). This probability was significantly higher among single mothers or those with an unstable partner. PPD has a detrimental impact on psychomotor development of children.

  11. Comparing Methodologies for Developing an Early Warning System: Classification and Regression Tree Model versus Logistic Regression. REL 2015-077

    ERIC Educational Resources Information Center

    Koon, Sharon; Petscher, Yaacov

    2015-01-01

    The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…

  12. Functional Data Analysis in NTCP Modeling: A New Method to Explore the Radiation Dose-Volume Effects

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Benadjaoud, Mohamed Amine, E-mail: mohamedamine.benadjaoud@gustaveroussy.fr; Université Paris sud, Le Kremlin-Bicêtre; Institut Gustave Roussy, Villejuif

    2014-11-01

    Purpose/Objective(s): To describe a novel method to explore radiation dose-volume effects. Functional data analysis is used to investigate the information contained in differential dose-volume histograms. The method is applied to the normal tissue complication probability modeling of rectal bleeding (RB) for patients irradiated in the prostatic bed by 3-dimensional conformal radiation therapy. Methods and Materials: Kernel density estimation was used to estimate the individual probability density functions from each of the 141 rectum differential dose-volume histograms. Functional principal component analysis was performed on the estimated probability density functions to explore the variation modes in the dose distribution. The functional principal components were then tested for association with RB using logistic regression adapted to functional covariates (FLR). For comparison, 3 other normal tissue complication probability models were considered: the Lyman-Kutcher-Burman model, logistic model based on standard dosimetric parameters (LM), and logistic model based on multivariate principal component analysis (PCA). Results: The incidence rate of grade ≥2 RB was 14%. V65Gy was the most predictive factor for the LM (P=.058). The best fit for the Lyman-Kutcher-Burman model was obtained with n=0.12, m = 0.17, and TD50 = 72.6 Gy. In PCA and FLR, the components that describe the interdependence between the relative volumes exposed at intermediate and high doses were the most correlated to the complication. The FLR parameter function leads to a better understanding of the volume effect by including the treatment specificity in the delivered mechanistic information. For RB grade ≥2, patients with advanced age are significantly at risk (odds ratio, 1.123; 95% confidence interval, 1.03-1.22), and the fits of the LM, PCA, and functional principal component analysis models are significantly improved by including this clinical factor. Conclusion: Functional data analysis provides an attractive method for flexibly estimating the dose-volume effect for normal tissues in external radiation therapy.

  13. Expression of Proteins Involved in Epithelial-Mesenchymal Transition as Predictors of Metastasis and Survival in Breast Cancer Patients

    DTIC Science & Technology

    2013-11-01

    Ptrend 0.78 0.62 0.75 Unconditional logistic regression was used to estimate odds ratios (OR) and 95 % confidence intervals (CI) for risk of node...Ptrend 0.71 0.67 Unconditional logistic regression was used to estimate odds ratios (OR) and 95 % confidence intervals (CI) for risk of high-grade tumors... logistic regression was used to estimate odds ratios (OR) and 95 % confidence intervals (CI) for the associations between each of the seven SNPs and

  14. Investigating flight response of Pacific brant to helicopters at Izembek Lagoon, Alaska by using logistic regression

    USGS Publications Warehouse

    Erickson, Wallace P.; Nick, Todd G.; Ward, David H.; Peck, Roxy; Haugh, Larry D.; Goodman, Arnold

    1998-01-01

    Izembek Lagoon, an estuary in Alaska, is a very important staging area for Pacific brant, a small migratory goose. Each fall, nearly the entire Pacific Flyway population of 130,000 brant flies to Izembek Lagoon and feeds on eelgrass to accumulate fat reserves for nonstop transoceanic migration to wintering areas as distant as Mexico. In the past 10 years, offshore drilling activities in this area have increased, and, as a result, the air traffic in and out of the nearby Cold Bay airport has also increased. There has been a concern that this increased air traffic could affect the brant by disturbing them from their feeding and resting activities, which in turn could result in reduced energy intake and buildup. This may increase the mortality rates during their migratory journey. Because of these concerns, a study was conducted to investigate the flight response of brant to overflights of large helicopters. Response was measured on flocks during experimental overflights of large helicopters flown at varying altitudes and lateral (perpendicular) distances from the flocks. Logistic regression models were developed for predicting probability of flight response as a function of these distance variables. Results of this study may be used in the development of new FAA guidelines for aircraft near Izembek Lagoon.

  15. Surveillance of antimicrobial resistance in clinical isolates of Pasteurella multocida and Streptococcus suis from Ontario swine.

    PubMed

    Glass-Kaastra, Shiona K; Pearl, David L; Reid-Smith, Richard J; McEwen, Beverly; Slavic, Durda; Fairles, Jim; McEwen, Scott A

    2014-10-01

    Susceptibility results for Pasteurella multocida and Streptococcus suis isolated from swine clinical samples were obtained from January 1998 to October 2010 from the Animal Health Laboratory at the University of Guelph, Guelph, Ontario, and used to describe variation in antimicrobial resistance (AMR) to 4 drugs of importance in the Ontario swine industry: ampicillin, tetracycline, tiamulin, and trimethoprim-sulfamethoxazole. Four temporal data-analysis options were used: visualization of trends in 12-month rolling averages, logistic-regression modeling, temporal-scan statistics, and a scan with the "What's strange about recent events?" (WSARE) algorithm. The AMR trends varied among the antimicrobial drugs for a single pathogen and between pathogens for a single antimicrobial, suggesting that pathogen-specific AMR surveillance may be preferable to indicator data. The 4 methods provided complementary and, at times, redundant results. The most appropriate combination of analysis methods for surveillance using these data included temporal-scan statistics with a visualization method (rolling-average or predicted-probability plots following logistic-regression models). The WSARE algorithm provided interesting results for quality control and has the potential to detect new resistance patterns; however, missing data created problems for displaying the results in a way that would be meaningful to all surveillance stakeholders.
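
    The first of the four temporal options, a 12-month rolling average, is simple to sketch. The example below is only an illustration on invented monthly resistance proportions; the function name `rolling_mean` and the data are hypothetical, and the study's own handling of missing months is not described in the abstract.

```python
def rolling_mean(values, window=12):
    """Trailing moving average: one output per position with a full window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical monthly proportions of resistant isolates over 14 months.
monthly_resistance = [0.10, 0.12, 0.11, 0.15, 0.14, 0.13, 0.16,
                      0.18, 0.17, 0.19, 0.20, 0.21, 0.22, 0.24]
smoothed = rolling_mean(monthly_resistance, window=12)
print([round(v, 4) for v in smoothed])
```

    Each smoothed point summarizes the preceding year, which damps month-to-month noise at the cost of delaying the appearance of a genuine change.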

  16. Surveillance of antimicrobial resistance in clinical isolates of Pasteurella multocida and Streptococcus suis from Ontario swine

    PubMed Central

    Glass-Kaastra, Shiona K.; Pearl, David L.; Reid-Smith, Richard J.; McEwen, Beverly; Slavic, Durda; Fairles, Jim; McEwen, Scott A.

    2014-01-01

    Susceptibility results for Pasteurella multocida and Streptococcus suis isolated from swine clinical samples were obtained from January 1998 to October 2010 from the Animal Health Laboratory at the University of Guelph, Guelph, Ontario, and used to describe variation in antimicrobial resistance (AMR) to 4 drugs of importance in the Ontario swine industry: ampicillin, tetracycline, tiamulin, and trimethoprim–sulfamethoxazole. Four temporal data-analysis options were used: visualization of trends in 12-month rolling averages, logistic-regression modeling, temporal-scan statistics, and a scan with the “What’s strange about recent events?” (WSARE) algorithm. The AMR trends varied among the antimicrobial drugs for a single pathogen and between pathogens for a single antimicrobial, suggesting that pathogen-specific AMR surveillance may be preferable to indicator data. The 4 methods provided complementary and, at times, redundant results. The most appropriate combination of analysis methods for surveillance using these data included temporal-scan statistics with a visualization method (rolling-average or predicted-probability plots following logistic-regression models). The WSARE algorithm provided interesting results for quality control and has the potential to detect new resistance patterns; however, missing data created problems for displaying the results in a way that would be meaningful to all surveillance stakeholders. PMID:25355992

  17. Clarifying the convergence between obsessive compulsive personality disorder criteria and obsessive compulsive disorder.

    PubMed

    Eisen, Jane L; Coles, Meredith E; Shea, M Tracie; Pagano, Maria E; Stout, Robert L; Yen, Shirley; Grilo, Carlos M; Rasmussen, Steven A

    2006-06-01

    In this study we examined the convergence between obsessive-compulsive personality disorder (OCPD) criteria and obsessive-compulsive disorder (OCD). Baseline assessments of 629 participants of the Collaborative Longitudinal Personality Disorders Study were used to examine the associations between OCPD criteria and diagnoses of OCD. Three of the eight OCPD criteria--hoarding, perfectionism, and preoccupation with details--were significantly more frequent in subjects with OCD (n = 89) than in subjects without OCD (n = 540). Logistic regressions were used to predict the probability of each OCPD criterion as a function of Axis I diagnoses (OCD, additional anxiety disorders, and major depressive disorder). Associations between OCD and these three OCPD criteria remained significant in the logistic regressions, showing unique associations with OCD and odds ratios ranging from 2.71 to 2.99. In addition, other anxiety disorders and major depressive disorder showed few associations with specific OCPD criteria. This study suggests variability in the strength of the relationships between specific OCPD criteria and OCD. The findings also support a unique relationship between OCPD symptoms and OCD, compared to other anxiety disorders or major depression. Future efforts to explore the link between Axis I and Axis II disorders may be enriched by conducting analyses at the symptom level.

  18. CLARIFYING THE CONVERGENCE BETWEEN OBSESSIVE COMPULSIVE PERSONALITY DISORDER CRITERIA AND OBSESSIVE COMPULSIVE DISORDER

    PubMed Central

    Eisen, Jane L.; Coles, Meredith E.; Shea, M. Tracie; Pagano, Maria E.; Stout, Robert L.; Yen, Shirley; Grilo, Carlos M.; Rasmussen, Steven A.

    2008-01-01

    In this study we examined the convergence between obsessive-compulsive personality disorder (OCPD) criteria and obsessive-compulsive disorder (OCD). Baseline assessments of 629 participants of the Collaborative Longitudinal Personality Disorders Study were used to examine the associations between OCPD criteria and diagnoses of OCD. Three of the eight OCPD criteria—hoarding, perfectionism, and preoccupation with details—were significantly more frequent in subjects with OCD (n = 89) than in subjects without OCD (n = 540). Logistic regressions were used to predict the probability of each OCPD criterion as a function of Axis I diagnoses (OCD, additional anxiety disorders, and major depressive disorder). Associations between OCD and these three OCPD criteria remained significant in the logistic regressions, showing unique associations with OCD and odds ratios ranging from 2.71 to 2.99. In addition, other anxiety disorders and major depressive disorder showed few associations with specific OCPD criteria. This study suggests variability in the strength of the relationships between specific OCPD criteria and OCD. The findings also support a unique relationship between OCPD symptoms and OCD, compared to other anxiety disorders or major depression. Future efforts to explore the link between Axis I and Axis II disorders may be enriched by conducting analyses at the symptom level. PMID:16776557

  19. Correlations between quality ratings of skilled nursing facilities and multidrug-resistant urinary tract infections.

    PubMed

    Gucwa, Azad L; Dolar, Veronika; Ye, Chao; Epstein, Stephanie

    2016-11-01

    The purpose of this study was to determine risk factors for the acquisition of urinary tract infections (UTIs) and multidrug-resistant organisms (MDROs) in residents of skilled nursing facilities (SNFs). Using the informational database provided by the Centers for Medicare and Medicaid Services (CMS), a retrospective logistic regression was performed on 1,523 urine cultures from 12 SNFs located in Long Island, New York. Of the 1,142 positive urine cultures, Escherichia coli was most prevalent. Additionally, 164 (14.4%) of the UTIs were attributed to an MDRO. In multivariate logistic regression, sex and overall quality rating predicted the occurrence of UTIs, whereas identification of MDROs was dependent on the level of nursing care received. The mean predicted probability of UTIs and receipt of contaminated samples was inversely dependent on the facility's rating, where the likelihood increased as overall quality ratings decreased. The CMS's quality rating system may provide some insight into the status of infection control practices in SNFs. The results of this study suggest that potential consumers should focus on the overall star ratings and the competency of the nursing staff in these facilities rather than on individual quality measures. Copyright © 2016 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Elsevier Inc. All rights reserved.

  20. Embedded measures of performance validity using verbal fluency tests in a clinical sample.

    PubMed

    Sugarman, Michael A; Axelrod, Bradley N

    2015-01-01

    The objective of this study was to determine to what extent verbal fluency measures can be used as performance validity indicators during neuropsychological evaluation. Participants were clinically referred for neuropsychological evaluation in an urban-based Veterans Affairs hospital. Participants were placed into 2 groups based on their objectively evaluated effort on performance validity tests (PVTs). Individuals who exhibited credible performance (n = 431) failed 0 PVTs, and those with poor effort (n = 192) failed 2 or more PVTs. All participants completed the Controlled Oral Word Association Test (COWAT) and Animals verbal fluency measures. We evaluated how well verbal fluency scores could discriminate between the 2 groups. Raw scores and T scores for Animals discriminated between the credible performance and poor-effort groups with 90% specificity and greater than 40% sensitivity. COWAT scores had lower sensitivity for detecting poor effort. A combination of FAS and Animals scores into logistic regression models yielded acceptable group classification, with 90% specificity and greater than 44% sensitivity. Verbal fluency measures can yield adequate detection of poor effort during neuropsychological evaluation. We provide suggested cut points and logistic regression models for predicting the probability of poor effort in our clinical setting and offer suggested cutoff scores to optimize sensitivity and specificity.
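
    Sensitivity and specificity at a given cut point, as reported above, follow directly from counting classifications in each group. This is a minimal sketch on invented scores, assuming that a score at or below the cutoff is flagged as poor effort; the cutoff values, the scores, and the name `sens_spec` are hypothetical and are not the study's recommended cut points.

```python
def sens_spec(scores, labels, cutoff):
    """Return (sensitivity, specificity) when scores <= cutoff are flagged.

    labels: 1 = poor-effort group, 0 = credible-performance group.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s > cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical verbal fluency scores for eight examinees.
scores = [5, 8, 12, 9, 15, 18, 11, 20]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

strict = sens_spec(scores, labels, cutoff=8)   # flags fewer examinees
lenient = sens_spec(scores, labels, cutoff=9)  # flags more examinees
print(strict, lenient)
```

    Raising the cutoff flags more examinees, which can only hold or raise sensitivity and hold or lower specificity; fixing specificity near 90% and reporting the resulting sensitivity, as the abstract does, is one common way to pick the cut point.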

  1. Effects of BMI on the risk and frequency of AIS 3+ injuries in motor-vehicle crashes.

    PubMed

    Rupp, Jonathan D; Flannagan, Carol A C; Leslie, Andrew J; Hoff, Carrie N; Reed, Matthew P; Cunningham, Rebecca M

    2013-01-01

    Determine the effects of BMI on the risk of serious-to-fatal injury (Abbreviated Injury Scale ≥ 3 or AIS 3+) to different body regions for adults in frontal, nearside, farside, and rollover crashes. Multivariate logistic regression analysis was applied to a probability sample of adult occupants involved in crashes generated by combining the National Automotive Sampling System (NASS-CDS) with a pseudoweighted version of the Crash Injury Research and Engineering Network database. Logistic regression models were applied to weighted data to estimate the change in the number of occupants with AIS 3+ injuries if no occupants were obese. Increasing BMI increased risk of lower-extremity injury in frontal crashes, decreased risk of lower-extremity injury in nearside impacts, increased risk of upper-extremity injury in frontal and nearside crashes, and increased risk of spine injury in frontal crashes. Several of these findings were affected by interactions with gender and vehicle type. If no occupants in frontal crashes were obese, 7% fewer occupants would sustain AIS 3+ upper-extremity injuries, 8% fewer occupants would sustain AIS 3+ lower-extremity injuries, and 28% fewer occupants would sustain AIS 3+ spine injuries. Results of this study have implications on the design and evaluation of vehicle safety systems. Copyright © 2013 The Obesity Society.

  2. Habitat selection models for Pacific sand lance (Ammodytes hexapterus) in Prince William Sound, Alaska

    USGS Publications Warehouse

    Ostrand, William D.; Gotthardt, Tracey A.; Howlin, Shay; Robards, Martin D.

    2005-01-01

    We modeled habitat selection by Pacific sand lance (Ammodytes hexapterus) by examining their distribution in relation to water depth, distance to shore, bottom slope, bottom type, distance from sand bottom, and shoreline type. Through both logistic regression and classification tree models, we compared the characteristics of 29 known sand lance locations to 58 randomly selected sites. The best models indicated a strong selection of shallow water by sand lance, with weaker association between sand lance distribution and beach shorelines, sand bottoms, distance to shore, bottom slope, and distance to the nearest sand bottom. We applied an information-theoretic approach to the interpretation of the logistic regression analysis and determined importance values of 0.99, 0.54, 0.52, 0.44, 0.39, and 0.25 for depth, beach shorelines, sand bottom, distance to shore, gradual bottom slope, and distance to the nearest sand bottom, respectively. The classification tree model indicated that sand lance selected shallow-water habitats and remained near sand bottoms when located in habitats with depths between 40 and 60 m. All sand lance locations were at depths <60 m and 93% occurred at depths <40 m. Probable reasons for the modeled relationships between the distribution of sand lance and the independent variables are discussed.
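
    Importance values of the kind reported above are commonly obtained by summing Akaike weights over the candidate models that contain each variable. The sketch below assumes that approach (the abstract says only "information-theoretic"); the candidate models and AIC values are hypothetical, not the study's.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values into Akaike weights that sum to 1."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def variable_importance(models, weights):
    """Sum the weights of every model in which a variable appears."""
    importance = {}
    for variables, weight in zip(models, weights):
        for name in variables:
            importance[name] = importance.get(name, 0.0) + weight
    return importance


# Hypothetical candidate set: (variables in the model, AIC).
candidates = [
    (("depth",), 100.0),
    (("depth", "sand_bottom"), 100.0),
    (("sand_bottom",), 110.0),
]
weights = akaike_weights([aic for _, aic in candidates])
print(variable_importance([v for v, _ in candidates], weights))
```

    A variable that appears in every well-supported model ends up with an importance near 1, which is how a value such as 0.99 for depth would be read.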

  3. LASSO NTCP predictors for the incidence of xerostomia in patients with head and neck squamous cell carcinoma and nasopharyngeal carcinoma

    PubMed Central

    Lee, Tsair-Fwu; Liou, Ming-Hsiang; Huang, Yu-Jie; Chao, Pei-Ju; Ting, Hui-Min; Lee, Hsiao-Yi

    2014-01-01

    To predict the incidence of moderate-to-severe patient-reported xerostomia among head and neck squamous cell carcinoma (HNSCC) and nasopharyngeal carcinoma (NPC) patients treated with intensity-modulated radiotherapy (IMRT). Multivariable normal tissue complication probability (NTCP) models were developed by using quality of life questionnaire datasets from 152 patients with HNSCC and 84 patients with NPC. The primary endpoint was defined as moderate-to-severe xerostomia after IMRT. The number of predictive factors for a multivariable logistic regression model was determined using the least absolute shrinkage and selection operator (LASSO) with a bootstrapping technique. Four predictive models were obtained by LASSO, each with the smallest number of factors that preserved predictive value with higher AUC performance. For all models, the mean doses given to the contralateral and ipsilateral parotid glands were selected as the most significant predictors. Different clinical and socio-economic factors, namely age, financial status, T stage, and education, were then selected for the different models. The prediction of xerostomia incidence for HNSCC and NPC patients can be improved by using multivariable logistic regression models with the LASSO technique. The predictive model developed in HNSCC cannot be generalized to the NPC cohort treated with IMRT without validation, and vice versa. PMID:25163814
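
    To show how a LASSO penalty removes factors from a logistic regression model, here is a minimal proximal-gradient sketch. It is not the authors' NTCP pipeline (no bootstrapping, no dosimetric data): the tiny data set, the function names, and the step size are all assumptions chosen for illustration.

```python
import math


def soft_threshold(value, threshold):
    """Shrink a value toward zero; values within the threshold become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam, step=0.1, iterations=2000):
    """L1-penalized logistic regression via proximal gradient descent.

    The intercept is not penalized. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(iterations):
        probs = [
            1.0 / (1.0 + math.exp(-(intercept + sum(b * x for b, x in zip(beta, row)))))
            for row in X
        ]
        residuals = [pi - yi for pi, yi in zip(probs, y)]
        intercept -= step * sum(residuals) / n
        for j in range(p):
            grad_j = sum(r * row[j] for r, row in zip(residuals, X)) / n
            # Gradient step on the log-loss, then shrinkage from the L1 penalty.
            beta[j] = soft_threshold(beta[j] - step * grad_j, step * lam)
    return intercept, beta


# Hypothetical data: column 0 carries signal, column 1 is close to noise.
X = [[-2.0, 0.5], [-1.5, -0.5], [-1.0, 0.5], [-0.5, -0.5], [-0.5, -0.5],
     [0.5, 0.5], [0.5, 0.5], [1.0, -0.5], [1.5, 0.5], [2.0, -0.5]]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

print(fit_lasso_logistic(X, y, lam=0.01))  # light penalty
print(fit_lasso_logistic(X, y, lam=5.0))   # heavy penalty
```

    Raising `lam` drives more coefficients to exactly zero, which is what allows LASSO to determine the number of factors retained in a model.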

  4. Logistic LASSO regression for the diagnosis of breast cancer using clinical demographic data and the BI-RADS lexicon for ultrasonography.

    PubMed

    Kim, Sun Mi; Kim, Yongdai; Jeong, Kuhwan; Jeong, Heeyeong; Kim, Jiyoung

    2018-01-01

    The aim of this study was to compare the performance of image analysis for predicting breast cancer using two distinct regression models and to evaluate the usefulness of incorporating clinical and demographic data (CDD) into the image analysis in order to improve the diagnosis of breast cancer. This study included 139 solid masses from 139 patients who underwent an ultrasonography-guided core biopsy and had available CDD between June 2009 and April 2010. Three breast radiologists retrospectively reviewed 139 breast masses and described each lesion using the Breast Imaging Reporting and Data System (BI-RADS) lexicon. We applied and compared two regression methods, stepwise logistic (SL) regression and logistic least absolute shrinkage and selection operator (LASSO) regression, in which the BI-RADS descriptors and CDD were used as covariates. We investigated the performances of these regression methods and the agreement of radiologists in terms of test misclassification error and the area under the curve (AUC) of the tests. Logistic LASSO regression was superior (P<0.05) to SL regression, regardless of whether CDD was included in the covariates, in terms of test misclassification errors (0.234 vs. 0.253, without CDD; 0.196 vs. 0.258, with CDD) and AUC (0.785 vs. 0.759, without CDD; 0.873 vs. 0.735, with CDD). However, it was inferior (P<0.05) to the agreement of three radiologists in terms of test misclassification errors (0.234 vs. 0.168, without CDD; 0.196 vs. 0.088, with CDD) and the AUC without CDD (0.785 vs. 0.844, P<0.001), but was comparable to the AUC with CDD (0.873 vs. 0.880, P=0.141). Logistic LASSO regression based on BI-RADS descriptors and CDD showed better performance than SL in predicting the presence of breast cancer. The use of CDD as a supplement to the BI-RADS descriptors significantly improved the prediction of breast cancer using logistic LASSO regression.
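
    The two comparison metrics used in this abstract, test misclassification error and AUC, can be computed from predicted probabilities as sketched below. The predictions and labels are invented for illustration, and the 0.5 classification threshold is an assumption.

```python
def auc(scores, labels):
    """Area under the ROC curve as the share of (positive, negative) pairs
    ranked correctly, counting ties as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def misclassification_error(scores, labels, threshold=0.5):
    """Share of cases whose thresholded prediction disagrees with the label."""
    wrong = sum((s >= threshold) != (y == 1) for s, y in zip(scores, labels))
    return wrong / len(labels)


# Hypothetical predicted probabilities of malignancy and biopsy results.
predicted = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
malignant = [1, 1, 1, 0, 0, 0]
print(auc(predicted, malignant), misclassification_error(predicted, malignant))
```

    AUC depends only on how cases are ranked, while misclassification error depends on the chosen threshold, so two models can differ on one metric and not the other.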

  5. Methods for estimating selected low-flow frequency statistics for unregulated streams in Kentucky

    USGS Publications Warehouse

    Martin, Gary R.; Arihood, Leslie D.

    2010-01-01

    This report provides estimates of, and presents methods for estimating, selected low-flow frequency statistics for unregulated streams in Kentucky including the 30-day mean low flows for recurrence intervals of 2 and 5 years (30Q2 and 30Q5) and the 7-day mean low flows for recurrence intervals of 2, 10, and 20 years (7Q2, 7Q10, and 7Q20). Estimates of these statistics are provided for 121 U.S. Geological Survey streamflow-gaging stations with data through the 2006 climate year, which is the 12-month period ending March 31 of each year. Data were screened to identify the periods of homogeneous, unregulated flows for use in the analyses. Logistic-regression equations are presented for estimating the annual probability of the selected low-flow frequency statistics being equal to zero. Weighted-least-squares regression equations were developed for estimating the magnitude of the nonzero 30Q2, 30Q5, 7Q2, 7Q10, and 7Q20 low flows. Three low-flow regions were defined for estimating the 7-day low-flow frequency statistics. The explicit explanatory variables in the regression equations include total drainage area and the mapped streamflow-variability index measured from a revised statewide coverage of this characteristic. The percentage of the station low-flow statistics correctly classified as zero or nonzero by use of the logistic-regression equations ranged from 87.5 to 93.8 percent. The average standard errors of prediction of the weighted-least-squares regression equations ranged from 108 to 226 percent. The 30Q2 regression equations have the smallest standard errors of prediction, and the 7Q20 regression equations have the largest standard errors of prediction. The regression equations are applicable only to stream sites with low flows unaffected by regulation from reservoirs and local diversions of flow and to drainage basins in specified ranges of basin characteristics. Caution is advised when applying the equations for basins with characteristics near the applicable limits and for basins with karst drainage features.

  6. Migration Intentions and Illicit Substance Use among Youth in Central Mexico

    PubMed Central

    Marsiglia, Flavio Francisco; Kulis, Stephen; Hoffman, Steven; Calderón-Tena, Carlos Orestes; Becerra, David; Alvarez, Diana

    2011-01-01

    This study explored intentions to emigrate and substance use among youth (ages 14–24) from a central Mexico state with high emigration rates. Questionnaires were completed in 2007 by 702 students attending a probability sample of alternative secondary schools serving remote or poor communities. Linear and logistic regression analyses indicated that stronger intentions to emigrate predicted greater access to drugs, drug offers, and use of illicit drugs (marijuana, cocaine, inhalants), but not alcohol or cigarettes. Results are related to the healthy migrant theory and its applicability to youth with limited educational opportunities. The study’s limitations are noted. PMID:21955065

  7. When status hurts: dimensions of women's status and domestic abuse in rural Northern India.

    PubMed

    Mogford, Elizabeth

    2011-07-01

    This study is a multiple logistic regression analysis of the relationship between dimensions of women's status and domestic abuse in rural Uttar Pradesh, India, using the 1998-1999 National Family Health Survey (NFHS-2). Findings indicate that the effects of a woman's status on her likelihood of experiencing abuse depend on the social realm within which status operates. Specifically, more "public" dimensions of status are associated with a greater probability of abuse, while "domestic" dimensions are protective. The findings are interpreted in terms of transitioning gender norms in Uttar Pradesh and provide clarity to the literature on the complex relationship between status and abuse.

  8. Modeling summer month hydrological drought probabilities in the United States using antecedent flow conditions

    USGS Publications Warehouse

    Austin, Samuel H.; Nelms, David L.

    2017-01-01

    Climate change raises concern that risks of hydrological drought may be increasing. We estimate hydrological drought probabilities for rivers and streams in the United States (U.S.) using maximum likelihood logistic regression (MLLR). Streamflow data from winter months are used to estimate the chance of hydrological drought during summer months. Daily streamflow data collected from 9,144 stream gages from January 1, 1884 through January 9, 2014 provide hydrological drought streamflow probabilities for July, August, and September as functions of streamflows during October, November, December, January, and February, estimating outcomes 5-11 months ahead of their occurrence. Few drought prediction methods exploit temporal links among streamflows. We find MLLR modeling of drought streamflow probabilities exploits the explanatory power of temporally linked water flows. MLLR models with strong correct classification rates were produced for streams throughout the U.S. One ad hoc test of correct prediction rates of September 2013 hydrological droughts exceeded 90% correct classification. Some of the best-performing models coincide with areas of high concern including the West, the Midwest, Texas, the Southeast, and the Mid-Atlantic. Using hydrological drought MLLR probability estimates in a water management context can inform understanding of drought streamflow conditions, provide warning of future drought conditions, and aid water management decision making.

  9. Social network and individual correlates of sexual risk behavior among homeless young men who have sex with men.

    PubMed

    Tucker, Joan S; Hu, Jianhui; Golinelli, Daniela; Kennedy, David P; Green, Harold D; Wenzel, Suzanne L

    2012-10-01

    There is growing interest in network-based interventions to reduce HIV sexual risk behavior among both homeless youth and men who have sex with men. The goal of this study was to better understand the social network and individual correlates of sexual risk behavior among homeless young men who have sex with men (YMSM) to inform these HIV prevention efforts. A multistage sampling design was used to recruit a probability sample of 121 homeless YMSM (ages: 16-24 years) from shelters, drop-in centers, and street venues in Los Angeles County. Face-to-face interviews were conducted. Because of the different distributions of the three outcome variables, three distinct regression models were needed: ordinal logistic regression for unprotected sex, zero-truncated Poisson regression for number of sex partners, and logistic regression for any sex trade. Homeless YMSM were less likely to engage in unprotected sex and had fewer sex partners if their networks included platonic ties to peers who regularly attended school, and had fewer sex partners if most of their network members were not heavy drinkers. Most other aspects of network composition were unrelated to sexual risk behavior. Individual predictors of sexual risk behavior included older age, Hispanic ethnicity, lower education, depressive symptoms, less positive condom attitudes, and sleeping outdoors because of nowhere else to stay. HIV prevention programs for homeless YMSM may warrant a multipronged approach that helps these youth strengthen their ties to prosocial peers, develop more positive condom attitudes, and access needed mental health and housing services. Copyright © 2012 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.

  10. Determining delayed admission to intensive care unit for mechanically ventilated patients in the emergency department.

    PubMed

    Hung, Shih-Chiang; Kung, Chia-Te; Hung, Chih-Wei; Liu, Ber-Ming; Liu, Jien-Wei; Chew, Ghee; Chuang, Hung-Yi; Lee, Wen-Huei; Lee, Tzu-Chi

    2014-08-23

    The adverse effects of delayed admission to the intensive care unit (ICU) have been recognized in previous studies. However, the definition of delayed admission varies across studies. This study proposed a model to define "delayed admission" and explored the effect of ICU-waiting time on patients' outcomes. This retrospective cohort study included non-traumatic adult patients on mechanical ventilation in the emergency department (ED), from July 2009 to June 2010. The primary outcome measures were 21-ventilator-day mortality and prolonged hospital stay (over 30 days). Cox regression and logistic regression models were used for multivariate analysis. Non-delayed ICU-waiting was defined as a period in which the time effect on mortality was not statistically significant in a Cox regression model. To identify a suitable cut-off point between "delayed" and "non-delayed", subsets of the overall data were formed based on ICU-waiting time, and the hazard ratio of ICU-waiting hour in each subset was iteratively calculated. The cut-off time was then used to evaluate the impact of delayed ICU admission on mortality and prolonged length of hospital stay. The final analysis included 1,242 patients. The time effect on mortality emerged after 4 hours; thus we defined an ICU-waiting time in the ED of > 4 hours as delayed. By logistic regression analysis, delayed ICU admission affected the outcomes of 21-ventilator-day mortality and prolonged hospital stay, with odds ratios of 1.41 (95% confidence interval, 1.05 to 1.89) and 1.56 (95% confidence interval, 1.07 to 2.27), respectively. For patients on mechanical ventilation at the ED, delayed ICU admission is associated with higher probability of mortality and additional resource expenditure. A benchmark waiting time of no more than 4 hours for ICU admission is recommended.
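
    Odds ratios and confidence intervals such as those reported above are obtained by exponentiating a logistic regression coefficient and its Wald interval. The sketch below shows that relation and, in reverse, backs an approximate coefficient and standard error out of the published mortality figures. The function names are hypothetical, and a Wald interval with z = 1.96 is an assumption about how the interval was computed.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence limits from a coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def beta_se_from_or(odds_ratio, lower, upper, z=1.96):
    """Approximate coefficient and SE implied by a reported OR and interval."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported in the abstract for 21-ventilator-day mortality: 1.41 (1.05 to 1.89).
beta, se = beta_se_from_or(1.41, 1.05, 1.89)
print(beta, se)
print(odds_ratio_ci(beta, se))
```

    Because the interval is symmetric on the log scale, the round trip reproduces the published limits to within rounding.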

  11. Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression

    NASA Astrophysics Data System (ADS)

    Khikmah, L.; Wijayanto, H.; Syafitri, U. D.

    2017-04-01

    Multicollinearity is a problem often encountered in logistic regression modeling. Multicollinearity among explanatory variables biases the parameter estimates and also leads to errors in classification. Stepwise regression is generally used to overcome multicollinearity in regression. Another method, which retains all variables for prediction, is Principal Component Analysis (PCA). However, classical PCA applies only to numeric data. When the data are categorical, one method to solve the problem is Categorical Principal Component Analysis (CATPCA). The data used in this research were part of the 2012 Demographic and Population Survey Indonesia (IDHS). This research focuses on the characteristics of women using contraceptive methods. Classification results were evaluated using Area Under Curve (AUC) values; the higher the AUC value, the better. Based on AUC values, classification of contraceptive method use with the stepwise method (58.66%) is better than with the logistic regression model (57.39%) and CATPCA (57.39%). Evaluating the logistic regression results by sensitivity shows the opposite: the CATPCA method (99.79%) is better than the logistic regression method (92.43%) and stepwise (92.05%). Because this study focuses on classifying the major class (using a contraceptive method), the selected model is CATPCA, as it raises the accuracy for the major class.

  12. Stochastic modeling of sunshine number data

    NASA Astrophysics Data System (ADS)

    Brabec, Marek; Paulescu, Marius; Badescu, Viorel

    2013-11-01

    In this paper, we present a unified statistical modeling framework for estimating and forecasting sunshine number (SSN) data. Sunshine number was proposed earlier to describe sunshine time series in qualitative terms (Theor Appl Climatol 72 (2002) 127-136), and it has since been shown to be useful not only for theoretical purposes but also for practical considerations, e.g. those related to the development of photovoltaic energy production. Statistical modeling and prediction of SSN as a binary time series has been a challenging problem, however. Our statistical model for SSN time series is based on an underlying stochastic process formulation of Markov chain type. We show how its transition probabilities can be efficiently estimated within a logistic regression framework. In fact, our logistic Markovian model can be fitted relatively easily via a maximum likelihood approach. This is optimal in many respects, and it also enables us to use formalized statistical inference theory to obtain not only point estimates of the transition probabilities and their functions of interest, but also the related uncertainties, as well as to test various hypotheses of practical interest. It is straightforward to deal with non-homogeneous transition probabilities in this framework. Very importantly from both physical and practical points of view, the logistic Markov model class allows us to test hypotheses about how SSN depends on various external covariates (e.g. elevation angle, solar time, etc.) and about details of the dynamic model (order and functional shape of the Markov kernel, etc.). Therefore, using a generalized additive model (GAM) approach, we can fit and compare models of various complexity while keeping the physical interpretation of the statistical model and its parts. After introducing the Markovian model and the general approach for identification of its parameters, we illustrate its use and performance on high-resolution SSN data from the Solar Radiation Monitoring Station of the West University of Timisoara.
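
    The link between a Markov chain for a binary series and logistic regression can be sketched in its simplest form. The example assumes a homogeneous first-order chain with no covariates, logit P(Y_t = 1 | Y_{t-1}) = alpha + beta * Y_{t-1}; in that saturated case the maximum likelihood estimates follow directly from the transition counts. The paper's model is richer (covariates, GAM terms), and the series below is invented.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def fit_first_order_logistic_markov(series):
    """Maximum likelihood (alpha, beta) for logit P(Y_t=1 | Y_{t-1}) = alpha + beta*Y_{t-1}.

    With a single binary predictor the fitted probabilities equal the empirical
    transition frequencies, so no iterative optimizer is needed. All four
    transition types must occur in the series.
    """
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for previous, current in zip(series, series[1:]):
        counts[(previous, current)] += 1
    p01 = counts[(0, 1)] / (counts[(0, 0)] + counts[(0, 1)])
    p11 = counts[(1, 1)] / (counts[(1, 0)] + counts[(1, 1)])
    alpha = logit(p01)
    beta = logit(p11) - alpha
    return alpha, beta


def transition_probability(alpha, beta, previous):
    """P(Y_t = 1 | Y_{t-1} = previous) under the fitted model."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * previous)))


# Hypothetical sunshine number series (1 = sun visible, 0 = sun covered).
ssn = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1]
alpha, beta = fit_first_order_logistic_markov(ssn)
print(transition_probability(alpha, beta, 0), transition_probability(alpha, beta, 1))
```

    Adding covariates such as solar time to the linear predictor is what makes the transition probabilities non-homogeneous; at that point an iterative maximum likelihood fit replaces the counting shortcut.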

  13. Stochastic modeling of sunshine number data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brabec, Marek, E-mail: mbrabec@cs.cas.cz; Paulescu, Marius; Badescu, Viorel

    2013-11-13

    In this paper, we present a unified statistical modeling framework for estimating and forecasting sunshine number (SSN) data. Sunshine number was proposed earlier to describe sunshine time series in qualitative terms (Theor Appl Climatol 72 (2002) 127-136), and it has since been shown to be useful not only for theoretical purposes but also for practical considerations, e.g. those related to the development of photovoltaic energy production. Statistical modeling and prediction of SSN as a binary time series has been a challenging problem, however. Our statistical model for SSN time series is based on an underlying stochastic process formulation of Markov chain type. We show how its transition probabilities can be efficiently estimated within a logistic regression framework. In fact, our logistic Markovian model can be fitted relatively easily via a maximum likelihood approach. This is optimal in many respects, and it also enables us to use formalized statistical inference theory to obtain not only point estimates of the transition probabilities and their functions of interest, but also the related uncertainties, as well as to test various hypotheses of practical interest. It is straightforward to deal with non-homogeneous transition probabilities in this framework. Very importantly from both physical and practical points of view, the logistic Markov model class allows us to test hypotheses about how SSN depends on various external covariates (e.g. elevation angle, solar time, etc.) and about details of the dynamic model (order and functional shape of the Markov kernel, etc.). Therefore, using a generalized additive model (GAM) approach, we can fit and compare models of various complexity while keeping the physical interpretation of the statistical model and its parts. After introducing the Markovian model and the general approach for identification of its parameters, we illustrate its use and performance on high-resolution SSN data from the Solar Radiation Monitoring Station of the West University of Timisoara.

  14. Logistic regression models of factors influencing the location of bioenergy and biofuels plants

    Treesearch

    T.M. Young; R.L. Zaretzki; J.H. Perdue; F.M. Guess; X. Liu

    2011-01-01

    Logistic regression models were developed to identify significant factors that influence the location of existing wood-using bioenergy/biofuels plants and traditional wood-using facilities. Logistic models provided quantitative insight for variables influencing the location of woody biomass-using facilities. Availability of "thinnings to a basal area of 31.7m2/ha...

  15. Early endoscopic ultrasonography in acute biliary pancreatitis: A prospective pilot study

    PubMed Central

    Anderloni, Andrea; Galeazzi, Marianna; Ballarè, Marco; Pagliarulo, Michela; Orsello, Marco; Del Piano, Mario; Repici, Alessandro

    2015-01-01

    AIM: To investigate the clinical usefulness of early endoscopic ultrasonography (EUS) in the management of acute biliary pancreatitis (ABP). METHODS: All consecutive patients entering the emergency department between January 2010 and December 2012 due to acute abdominal pain and showing biochemical and/or radiological findings consistent with possible ABP were prospectively enrolled. Patients were classified as having a low, moderate, or high probability of common bile duct (CBD) stones, according to the established risk stratification. Exclusion criteria were gastrectomy or a cause of biliary obstruction already identified by ultrasonography. All enrolled patients underwent EUS within 48 h of their admission. Endoscopic retrograde cholangiopancreatography was performed immediately after EUS only in those cases with proven CBD stones or sludge. The following parameters were investigated: (1) clinical: age, sex, fever; (2) radiological: dilated CBD; and (3) biochemical: bilirubin, AST, ALT, gGT, ALP, amylase, lipase, PCR. Associations between the presence of CBD stones at EUS and the individual predictors were assessed by univariate logistic regression. Predictors significantly associated with CBD stones (P < 0.05) were entered in a multivariate logistic regression model. RESULTS: A total of 181 patients with pancreatitis were admitted to the emergency department between January 2010 and December 2012. After applying the exclusion criteria, a total of 71 patients (38 females, 53.5%, mean age 58 ± 20.12 years, range 27-89 years; 33 males, 46.5%, mean age 65 ± 11.86 years, range 41-91 years) were included in the present study. The probability of CBD stones was considered low in 21 cases (29%), moderate in 26 (37%), and high in the remaining 24 (34%). The 71 patients included in the study underwent EUS, which allowed for a complete evaluation of the target sites in all the cases. The procedure was completed in a mean time of 14.7 min (range 9-34 min), without any notable complications. The overall CBD stone frequency was 44% (31 of 71), with a significant increase from the group at low pretest probability to that at moderate (OR = 5.79, P = 0.01) and high (OR = 4.25, P = 0.03) pretest probability. CONCLUSION: Early EUS in ABP allows, if appropriate, immediate endoscopic treatment and significant sparing of unnecessary operative procedures, thus reducing possible related complications. PMID:26420969

  16. [Application of Bayes Probability Model in Differentiation of Yin and Yang Jaundice Syndromes in Neonates].

    PubMed

    Mu, Chun-sun; Zhang, Ping; Kong, Chun-yan; Li, Yang-ning

    2015-09-01

    To study the application of a Bayes probability model in differentiating yin and yang jaundice syndromes in neonates. A total of 107 neonates with jaundice who were admitted to hospital within 10 days after birth were assigned to two groups according to syndrome differentiation, 68 in the yang jaundice syndrome group and 39 in the yin jaundice syndrome group. Data collected for the neonates were factors related to jaundice before, during, and after birth. Blood routines, liver and renal functions, and myocardial enzymes were tested on the admission day or the next day. A logistic regression model and Bayes discriminant analysis were used to screen factors important for yin and yang jaundice syndrome differentiation. Finally, a Bayes probability model for yin and yang jaundice syndromes was established and assessed. Factors important for yin and yang jaundice syndrome differentiation screened by the logistic regression model and Bayes discriminant analysis included mother's age, maternal gestational diabetes mellitus (GDM), gestational age, asphyxia, ABO hemolytic disease, red blood cell distribution width (RDW-SD), platelet-large cell ratio (P-LCR), serum direct bilirubin (DBIL), alkaline phosphatase (ALP), and cholinesterase (CHE). Bayes discriminant analysis was performed with SPSS to obtain the Bayes discriminant function coefficients, from which the Bayes discriminant functions were established. Yang jaundice syndrome: y1 = -21.701 + 2.589 × mother's age + 1.037 × GDM - 17.175 × asphyxia + 13.876 × gestational age + 6.303 × ABO hemolytic disease + 2.116 × RDW-SD + 0.831 × DBIL + 0.012 × ALP + 1.697 × LCR + 0.001 × CHE. Yin jaundice syndrome: y2 = -33.511 + 2.991 × mother's age + 3.960 × GDM - 12.877 × asphyxia + 11.848 × gestational age + 1.820 × ABO hemolytic disease + 2.231 × RDW-SD + 0.999 × DBIL + 0.023 × ALP + 1.916 × LCR + 0.002 × CHE. Hypothesis testing of the Bayes discriminant function gave Wilks' λ = 0.393 (P = 0.000), so the function was statistically significant. When the Bayes probability model was checked for discriminating yin and yang jaundice syndromes, the coincidence rates for yin and yang jaundice syndromes were both above 90%. Yin and yang jaundice syndromes in neonates could be accurately judged by Bayesian discriminant functions.
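
    The two discriminant functions quoted above can be applied by computing both scores and assigning the class with the larger one. The sketch below uses the reported coefficients, but everything else is an assumption: the abstract does not give variable coding or units, so the example neonate is hypothetical, and turning the scores into a posterior probability through a softmax assumes the prior probabilities are already folded into the constants.

```python
import math

# Coefficients as reported in the abstract (y1 = yang, y2 = yin).
YANG = {"const": -21.701, "mother_age": 2.589, "gdm": 1.037, "asphyxia": -17.175,
        "gestational_age": 13.876, "abo": 6.303, "rdw_sd": 2.116, "dbil": 0.831,
        "alp": 0.012, "lcr": 1.697, "che": 0.001}
YIN = {"const": -33.511, "mother_age": 2.991, "gdm": 3.960, "asphyxia": -12.877,
       "gestational_age": 11.848, "abo": 1.820, "rdw_sd": 2.231, "dbil": 0.999,
       "alp": 0.023, "lcr": 1.916, "che": 0.002}


def score(coefficients, x):
    """Linear discriminant score: constant plus coefficient-weighted inputs."""
    return coefficients["const"] + sum(
        c * x[name] for name, c in coefficients.items() if name != "const"
    )


def classify(x):
    """Return (label, P(yang)) by comparing the two discriminant scores."""
    y1, y2 = score(YANG, x), score(YIN, x)
    top = max(y1, y2)  # subtract the maximum so exp() cannot overflow
    e1, e2 = math.exp(y1 - top), math.exp(y2 - top)
    return ("yang" if y1 >= y2 else "yin"), e1 / (e1 + e2)


# Hypothetical neonate; the coding of each variable is an assumption.
neonate = {"mother_age": 1, "gdm": 0, "asphyxia": 0, "gestational_age": 1, "abo": 1,
           "rdw_sd": 16.0, "dbil": 12.0, "alp": 180.0, "lcr": 25.0, "che": 6000.0}
print(classify(neonate))
```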

  17. Discrete post-processing of total cloud cover ensemble forecasts

    NASA Astrophysics Data System (ADS)

    Hemri, Stephan; Haiden, Thomas; Pappenberger, Florian

    2017-04-01

    This contribution presents an approach to post-process ensemble forecasts for the discrete and bounded weather variable of total cloud cover. Two methods for discrete statistical post-processing of ensemble predictions are tested. The first approach is based on multinomial logistic regression, the second involves a proportional odds logistic regression model. Applying them to total cloud cover raw ensemble forecasts from the European Centre for Medium-Range Weather Forecasts improves forecast skill significantly. Based on station-wise post-processing of raw ensemble total cloud cover forecasts for a global set of 3330 stations over the period from 2007 to early 2014, the more parsimonious proportional odds logistic regression model proved to slightly outperform the multinomial logistic regression model. Reference Hemri, S., Haiden, T., & Pappenberger, F. (2016). Discrete post-processing of total cloud cover ensemble forecasts. Monthly Weather Review 144, 2565-2577.
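
    A proportional odds logistic regression model turns a single linear predictor into probabilities for ordered categories through a set of increasing thresholds. The sketch below shows that step only. It assumes nine ordered cloud cover categories (as with oktas 0 to 8), and the threshold values and linear predictor are hypothetical, not fitted values from the study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def proportional_odds_probabilities(thresholds, eta):
    """Category probabilities from P(Y <= k) = sigmoid(threshold_k - eta).

    `thresholds` must be strictly increasing; K thresholds give K + 1 categories.
    `eta` is the linear predictor, shared by all categories.
    """
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    probabilities, previous = [], 0.0
    for c in cumulative:
        probabilities.append(c - previous)
        previous = c
    return probabilities


# Hypothetical thresholds for 9 categories and a hypothetical linear predictor.
thresholds = [-3.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
print(proportional_odds_probabilities(thresholds, eta=0.8))
```

    Only the thresholds distinguish the categories, which is why this model is more parsimonious than multinomial logistic regression, where each category has its own coefficient vector.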

  18. A Primer on Logistic Regression.

    ERIC Educational Resources Information Center

    Woldbeck, Tanya

    This paper introduces logistic regression as a viable alternative when the researcher is faced with variables that are not continuous. If one is to use simple regression, the dependent variable must be measured on a continuous scale. In the behavioral sciences, it may not always be appropriate or possible to have a measured dependent variable on a…

  19. [Lifestyle and probability of dementia in the elderly].

    PubMed

    León-Ortiz, Pablo; Ruiz-Flores, Manuel Leonardo; Ramírez-Bermúdez, Jesús; Sosa-Ortiz, Ana Luisa

    2013-01-01

    There is evidence of a relationship between physical and cognitive activity and the development of dementia, although this hypothesis has not been tested in the Mexican population. The objective was to analyze the association between increased participation in physical and cognitive activities and the probability of having dementia, using a Mexican open population sample. We conducted a cross-sectional survey in an open Mexican population of residents of urban and rural areas aged 65 years and older; we performed cognitive assessments to identify subjects with dementia, as well as questionnaires to assess the level of participation in physical and cognitive activities. We performed a binary logistic regression analysis to establish the association between participation and the probability of having dementia. We included 2003 subjects, 180 with a diagnosis of dementia. Subjects with dementia were older, had less education, and had a higher prevalence of some chronic diseases. Low participation in cognitive activities was associated with a higher probability of developing dementia. Patients with dementia had significantly lower scores on physical activity scales. This study supports the hypothesis of a relationship between low cognitive and physical activity and the presentation of dementia.

  20. Predicting redox conditions in groundwater at a regional scale

    USGS Publications Warehouse

    Tesoriero, Anthony J.; Terziotti, Silvia; Abrams, Daniel B.

    2015-01-01

    Defining the oxic-suboxic interface is often critical for determining pathways for nitrate transport in groundwater and to streams at the local scale. Defining this interface on a regional scale is complicated by the spatial variability of reaction rates. The probability of oxic groundwater in the Chesapeake Bay watershed was predicted by relating dissolved O2 concentrations in groundwater samples to indicators of residence time and/or electron donor availability using logistic regression. Variables that describe surficial geology, position in the flow system, and soil drainage were important predictors of oxic water. The probability of encountering oxic groundwater at a 30 m depth and the depth to the bottom of the oxic layer were predicted for the Chesapeake Bay watershed. The influence of depth to the bottom of the oxic layer on stream nitrate concentrations and time lags (i.e., time period between land application of nitrogen and its effect on streams) are illustrated using model simulations for hypothetical basins. Regional maps of the probability of oxic groundwater should prove useful as indicators of groundwater susceptibility and stream susceptibility to contaminant sources derived from groundwater.

  1. Risk factors for low birth weight according to the multiple logistic regression model. A retrospective cohort study in José María Morelos municipality, Quintana Roo, Mexico.

    PubMed

    Franco Monsreal, José; Tun Cobos, Miriam Del Ruby; Hernández Gómez, José Ricardo; Serralta Peraza, Lidia Esther Del Socorro

    2018-01-17

    Low birth weight has been an enigma for science over time. There has been much research on its causes and effects. Low birth weight is an indicator that predicts the probability of a child surviving. In fact, there is an exponential relationship between weight deficit, gestational age, and perinatal mortality. Multiple logistic regression is one of the most expressive and versatile statistical instruments available for the analysis of data in both clinical and epidemiology settings, as well as in public health. The objective was to assess in a multivariate fashion the importance of 17 independent variables in low birth weight (dependent variable) of children born in the Mayan municipality of José María Morelos, Quintana Roo, Mexico. The design was an analytical observational epidemiological cohort study with retrospective temporality. Births that met the inclusion criteria occurred in the "Hospital Integral Jose Maria Morelos" of the Ministry of Health corresponding to the Maya municipality of Jose Maria Morelos during the period from August 1, 2014 to July 31, 2015. The total number of newborns recorded was 1,147; 84 of which (7.32%) had low birth weight. To estimate the independent association between the explanatory variables (potential risk factors) and the response variable, a multiple logistic regression analysis was performed using the IBM SPSS Statistics 22 software.
In ascending numerical order values of odds ratio > 1 indicated the positive contribution of explanatory variables or possible risk factors: "unmarried" marital status (1.076, 95% confidence interval: 0.550 to 2.104); age at menarche ≤ 12 years (1.08, 95% confidence interval: 0.64 to 1.84); history of abortion(s) (1.14, 95% confidence interval: 0.44 to 2.93); maternal weight < 50 kg (1.51, 95% confidence interval: 0.83 to 2.76); number of prenatal consultations ≤ 5 (1.86, 95% confidence interval: 0.94 to 3.66); maternal age ≥ 36 years (3.5, 95% confidence interval: 0.40 to 30.47); maternal age ≤ 19 years (3.59, 95% confidence interval: 0.43 to 29.87); number of deliveries = 1 (3.86, 95% confidence interval: 0.33 to 44.85); personal pathological history (4.78, 95% confidence interval: 2.16 to 10.59); pathological obstetric history (5.01, 95% confidence interval: 1.66 to 15.18); maternal height < 150 cm (5.16, 95% confidence interval: 3.08 to 8.65); number of births ≥ 5 (5.99, 95% confidence interval: 0.51 to 69.99); and smoking (15.63, 95% confidence interval: 1.07 to 227.97). Four of the independent variables (personal pathological history, obstetric pathological history, maternal stature <150 centimeters and smoking) showed a significant positive contribution, thus they can be considered as clear risk factors for low birth weight. The use of the logistic regression model in the Mayan municipality of José María Morelos, will allow estimating the probability of low birth weight for each pregnant woman in the future, which will be useful for the health authorities of the region.
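
    The link between the reported confidence intervals and the four variables called significant can be checked directly: an odds ratio differs from 1 at the 5% level when its 95% confidence interval excludes 1. The sketch below copies the values from the abstract; the function name and the tuple layout are illustrative assumptions, not part of the study.

```python
# (label, odds ratio, CI lower bound, CI upper bound), copied from the abstract.
REPORTED = [
    ("unmarried marital status", 1.076, 0.550, 2.104),
    ("age at menarche <= 12 years", 1.08, 0.64, 1.84),
    ("history of abortion(s)", 1.14, 0.44, 2.93),
    ("maternal weight < 50 kg", 1.51, 0.83, 2.76),
    ("prenatal consultations <= 5", 1.86, 0.94, 3.66),
    ("maternal age >= 36 years", 3.5, 0.40, 30.47),
    ("maternal age <= 19 years", 3.59, 0.43, 29.87),
    ("number of deliveries = 1", 3.86, 0.33, 44.85),
    ("personal pathological history", 4.78, 2.16, 10.59),
    ("pathological obstetric history", 5.01, 1.66, 15.18),
    ("maternal height < 150 cm", 5.16, 3.08, 8.65),
    ("number of births >= 5", 5.99, 0.51, 69.99),
    ("smoking", 15.63, 1.07, 227.97),
]


def significant_risk_factors(rows):
    """Return labels whose 95% CI lies entirely above 1 (hypothetical helper)."""
    return [label for label, _odds_ratio, low, _high in rows if low > 1.0]


if __name__ == "__main__":
    for label in significant_risk_factors(REPORTED):
        print(label)
```

    Under this rule the wide intervals for maternal age and parity explain why odds ratios above 3 are still not counted as significant.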

  2. Estimating the probability of elevated nitrate (NO2+NO3-N) concentrations in ground water in the Columbia Basin Ground Water Management Area, Washington

    USGS Publications Warehouse

    Frans, Lonna M.

    2000-01-01

    Logistic regression was used to relate anthropogenic (man-made) and natural factors to the occurrence of elevated concentrations of nitrite plus nitrate as nitrogen in ground water in the Columbia Basin Ground Water Management Area, eastern Washington. Variables that were analyzed included well depth, depth of well casing, ground-water recharge rates, presence of canals, fertilizer application amounts, soils, surficial geology, and land-use types. The variables that best explain the occurrence of nitrate concentrations above 3 milligrams per liter in wells were the amount of fertilizer applied annually within a 2-kilometer radius of a well and the depth of the well casing; the variables that best explain the occurrence of nitrate above 10 milligrams per liter included the amount of fertilizer applied annually within a 3-kilometer radius of a well, the depth of the well casing, and the mean soil hydrologic group, which is a measure of soil infiltration rate. Based on the relations between these variables and elevated nitrate concentrations, models were developed using logistic regression that predict the probability that ground water will exceed a nitrate concentration of either 3 milligrams per liter or 10 milligrams per liter. Maps were produced that illustrate the predicted probability that ground-water nitrate concentrations will exceed 3 milligrams per liter or 10 milligrams per liter for wells cased to 78 feet below land surface (median casing depth) and the predicted depth to which wells would need to be cased in order to have an 80-percent probability of drawing water with a nitrate concentration below either 3 milligrams per liter or 10 milligrams per liter. Maps showing the predicted probability for the occurrence of elevated nitrate concentrations indicate that the irrigated agricultural regions are most at risk. 
The predicted depths to which wells need to be cased in order to have an 80-percent chance of obtaining low nitrate ground water exceed 600 feet in the irrigated agricultural regions, whereas wells in dryland agricultural areas generally need a casing in excess of 400 feet. The predicted depth to which wells need to be cased to have at least an 80-percent chance to draw water with a nitrate concentration less than 10 milligrams per liter generally did not exceed 800 feet, with a 200-foot casing depth typical of the majority of the area.
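
    The casing-depth maps described above amount to inverting a fitted logistic model for one predictor. A minimal sketch follows, assuming a model with an intercept, a fertilizer term and a casing-depth term; the coefficient values and function names are hypothetical and are not taken from the report.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def exceedance_probability(b0, b_fert, b_depth, fertilizer, casing_depth):
    """P(nitrate exceeds the threshold) under an assumed logistic model."""
    z = b0 + b_fert * fertilizer + b_depth * casing_depth
    return 1.0 / (1.0 + math.exp(-z))


def casing_depth_for_target(b0, b_fert, b_depth, fertilizer, p_exceed):
    """Solve logit(p_exceed) = b0 + b_fert*fertilizer + b_depth*depth for depth."""
    if b_depth >= 0:
        raise ValueError("depth must lower the exceedance probability")
    return (logit(p_exceed) - b0 - b_fert * fertilizer) / b_depth


if __name__ == "__main__":
    # Hypothetical coefficients, for illustration only.
    b0, b_fert, b_depth = 1.0, 0.01, -0.005
    # An 80% chance of low-nitrate water means a 20% exceedance probability.
    depth = casing_depth_for_target(b0, b_fert, b_depth, fertilizer=100.0, p_exceed=0.2)
    print(round(depth, 1))
```

    The inversion only makes sense when the depth coefficient is negative, which is why the helper rejects other signs.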

  3. A Solution to Separation and Multicollinearity in Multiple Logistic Regression

    PubMed Central

    Shen, Jianzhao; Gao, Sujuan

    2010-01-01

    In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27–38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. Ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither method solves the problem addressed by the other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth’s penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using current screening data from a community-based dementia study. PMID:20376286
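
    The combination described above can be written as a single objective. This is a sketch under an assumption: the ridge term is taken to be a squared (L2) penalty with tuning parameter λ, which the abstract does not spell out, so the exact form used by the authors may differ. The Firth term is the usual half log-determinant of the Fisher information.

```latex
\ell^{*}(\beta) \;=\; \ell(\beta) \;+\; \tfrac{1}{2}\,\log\bigl|I(\beta)\bigr| \;-\; \lambda \sum_{j=1}^{p} \beta_j^{2},
\qquad I(\beta) = X^{\top} W(\beta)\, X,
\quad W(\beta) = \operatorname{diag}\bigl\{\pi_i(\beta)\,\bigl(1-\pi_i(\beta)\bigr)\bigr\}
```

    Here ℓ(β) is the ordinary logistic log-likelihood and π_i(β) is the fitted probability for observation i. The first penalty keeps estimates finite under separation, and the second shrinks correlated coefficients.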

  4. A Solution to Separation and Multicollinearity in Multiple Logistic Regression.

    PubMed

    Shen, Jianzhao; Gao, Sujuan

    2008-10-01

    In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. Ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither method solves the problem addressed by the other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using current screening data from a community-based dementia study.

  5. [Influences of environmental factors and interaction of several chemokines gene-environmental on systemic lupus erythematosus].

    PubMed

    Ye, Dong-qing; Hu, Yi-song; Li, Xiang-pei; Huang, Fen; Yang, Shi-gui; Hao, Jia-hu; Yin, Jing; Zhang, Guo-qing; Liu, Hui-hui

    2004-11-01

    To explore the impact of environmental factors, daily lifestyle, psycho-social factors and the interactions between environmental factors and chemokine genes on systemic lupus erythematosus (SLE). A case-control study was carried out, and environmental factors for SLE were analyzed by univariate and multivariate unconditional logistic regression. Interactions between environmental factors and chemokine polymorphisms contributing to SLE were also analyzed by a logistic regression model. Nineteen factors were associated with SLE when univariate unconditional logistic regression was used. However, when multivariate unconditional logistic regression was used, only five factors showed an impact on the disease: drinking well water (OR=0.099) was a protective factor for SLE, while multiple drug allergy (OR=8.174), over-exposure to sunshine (OR=18.339), taking antibiotics (OR=9.630) and oral contraceptives were risk factors for SLE. The unconditional logistic regression model showed an interaction between eating irritable food and the -2518MCP-1G/G genotype (OR=4.387). No interaction between environmental factors that contributed to SLE was found in this study. Many environmental factors were related to SLE, and there was an interaction between the -2518MCP-1G/G genotype and eating irritable food.

  6. A deeper look at two concepts of measuring gene-gene interactions: logistic regression and interaction information revisited.

    PubMed

    Mielniczuk, Jan; Teisseyre, Paweł

    2018-03-01

    Detection of gene-gene interactions is one of the most important challenges in genome-wide case-control studies. Besides traditional logistic regression analysis, entropy-based methods have recently attracted significant attention. Among entropy-based methods, interaction information is one of the most promising measures, having many desirable properties. Although both logistic regression and interaction information have been used in several genome-wide association studies, the relationship between them has not been thoroughly investigated theoretically. The present paper attempts to fill this gap. We show that although certain connections between the two methods exist, in general they refer to two different concepts of dependence, and looking for interactions in those two senses leads to different approaches to interaction detection. We introduce an ordering between interaction measures and specify conditions for independent and dependent genes under which interaction information is a more discriminative measure than logistic regression. Moreover, we show that for so-called perfect distributions those measures are equivalent. The numerical experiments illustrate the theoretical findings, indicating that interaction information and its modified version are more universal tools for detecting various types of interaction than logistic regression and linkage disequilibrium measures.
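
    For readers unfamiliar with interaction information, the sketch below computes it for two predictors and a binary outcome from a joint probability table. It assumes the sign convention II(X1; X2; Y) = I(X1, X2; Y) - I(X1; Y) - I(X2; Y), which is one of two conventions in use; the function names are illustrative and the abstract does not give the authors' modified version.

```python
import math
from collections import defaultdict


def marginal(joint, axes):
    """Marginalize a joint table {(x1, x2, y): prob} onto the given axes."""
    out = defaultdict(float)
    for key, prob in joint.items():
        out[tuple(key[a] for a in axes)] += prob
    return dict(out)


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def mutual_information(joint, a_axes, b_axes):
    """I(A; B) = H(A) + H(B) - H(A, B)."""
    return (entropy(marginal(joint, a_axes))
            + entropy(marginal(joint, b_axes))
            - entropy(marginal(joint, a_axes + b_axes)))


def interaction_information(joint):
    """II(X1; X2; Y) for a joint table over (x1, x2, y), assumed convention."""
    return (mutual_information(joint, (0, 1), (2,))
            - mutual_information(joint, (0,), (2,))
            - mutual_information(joint, (1,), (2,)))


if __name__ == "__main__":
    # Pure XOR: neither variable alone informs y, together they determine it.
    xor = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}
    print(interaction_information(xor))
```

    The XOR table is the textbook case in which a main-effects logistic model sees nothing while the joint effect is complete.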

  7. Predicting the dynamics of ascospore maturation of Venturia pirina based on environmental factors.

    PubMed

    Rossi, V; Salinari, F; Pattori, E; Giosuè, S; Bugiani, R

    2009-04-01

    Airborne ascospores of Venturia pirina were trapped at two sites in northern Italy in 2002 to 2008. The cumulative proportion of ascospores trapped at each discharge was regressed against the physiological time. The best fit (R² = 0.90, standard error of estimates [SEest] = 0.11) was obtained using a Gompertz equation and the degree-days (>0 °C) accumulated after the day on which the first ascospore of the season was trapped (biofix day), but only for the days with ≥0.2 mm rain or ≤4 hPa vapor pressure deficit (DDwet). This Italian model performed better than the models developed in Oregon, United States (R² = 0.69, SEest = 0.16) or Victoria, Australia (R² = 0.74, SEest = 0.18), which consider only the effect of temperature. When the Italian model was evaluated against data not used in its elaboration, it accurately predicted ascospore maturation (R² = 0.92, SEest = 0.10). A logistic regression model was also developed to estimate the biofix for initiating the accumulation of degree-days (biofix model). The probability of the first ascospore discharge of the season increased as DDwet (calculated from 1 January) increased. Based on this model, there is low probability of the first ascospore discharge when DDwet ≤268.5 (P = 0.03) and high probability (P = 0.83) of discharge on the first day with >0.2 mm rain after such a DDwet threshold.
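
    The DDwet variable described above can be sketched as a running sum. The version below assumes that degree-days are taken from the daily mean temperature above 0 °C, which the abstract does not state; the function name and the input layout are hypothetical.

```python
def accumulate_dd_wet(days, base_temp=0.0, min_rain_mm=0.2, max_vpd_hpa=4.0):
    """Cumulative degree-days counted only on 'wet' days.

    days: iterable of (mean_temp_c, rain_mm, vpd_hpa) tuples, one per day.
    A day counts when rain >= min_rain_mm or vapor pressure deficit <= max_vpd_hpa.
    """
    total = 0.0
    running = []
    for mean_temp_c, rain_mm, vpd_hpa in days:
        is_wet = rain_mm >= min_rain_mm or vpd_hpa <= max_vpd_hpa
        if is_wet:
            # Temperatures at or below the base contribute nothing.
            total += max(mean_temp_c - base_temp, 0.0)
        running.append(total)
    return running


if __name__ == "__main__":
    sample = [(10.0, 0.0, 6.0), (12.0, 1.0, 6.0), (8.0, 0.0, 3.0), (-2.0, 5.0, 1.0)]
    print(accumulate_dd_wet(sample))
```

    The running total would then serve as the physiological time axis for the Gompertz curve or as the predictor in the biofix model.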

  8. Controlling Type I Error Rates in Assessing DIF for Logistic Regression Method Combined with SIBTEST Regression Correction Procedure and DIF-Free-Then-DIF Strategy

    ERIC Educational Resources Information Center

    Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung

    2014-01-01

    The simultaneous item bias test (SIBTEST) method regression procedure and the differential item functioning (DIF)-free-then-DIF strategy are applied to the logistic regression (LR) method simultaneously in this study. These procedures are used to adjust the effects of matching true score on observed score and to better control the Type I error…

  9. Explaining Match Outcome During The Men’s Basketball Tournament at The Olympic Games

    PubMed Central

    Leicht, Anthony S.; Gómez, Miguel A.; Woods, Carl T.

    2017-01-01

    In preparation for the Olympics, there is a limited opportunity for coaches and athletes to interact regularly with team performance indicators providing important guidance to coaches for enhanced match success at the elite level. This study examined the relationship between match outcome and team performance indicators during men’s basketball tournaments at the Olympic Games. Twelve team performance indicators were collated from all men’s teams and matches during the basketball tournament of the 2004-2016 Olympic Games (n = 156). Linear and non-linear analyses examined the relationship between match outcome and team performance indicator characteristics; namely, binary logistic regression and a conditional interference (CI) classification tree. The most parsimonious logistic regression model retained ‘assists’, ‘defensive rebounds’, ‘field-goal percentage’, ‘fouls’, ‘fouls against’, ‘steals’ and ‘turnovers’ (delta AIC <0.01; Akaike weight = 0.28) with a classification accuracy of 85.5%. Conversely, four performance indicators were retained with the CI classification tree with an average classification accuracy of 81.4%. However, it was the combination of ‘field-goal percentage’ and ‘defensive rebounds’ that provided the greatest probability of winning (93.2%). Match outcome during the men’s basketball tournaments at the Olympic Games was identified by a unique combination of performance indicators. Despite the average model accuracy being marginally higher for the logistic regression analysis, the CI classification tree offered a greater practical utility for coaches through its resolution of non-linear phenomena to guide team success. Key points A unique combination of team performance indicators explained 93.2% of winning observations in men’s basketball at the Olympics. Monitoring of these team performance indicators may provide coaches with the capability to devise multiple game plans or strategies to enhance their likelihood of winning. 
Incorporation of machine learning techniques with team performance indicators may provide a valuable and strategic approach to explain patterns within multivariate datasets in sport science. PMID:29238245
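
    The Akaike weight quoted for the retained model comes from a standard normalization of AIC differences across the candidate set. A minimal sketch follows; the AIC values in the usage example are made up for illustration and are not from the study.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values for candidate models into Akaike weights."""
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference from the best.
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


if __name__ == "__main__":
    # Hypothetical AIC values for three candidate models.
    print(akaike_weights([100.0, 102.0, 110.0]))
```

    A weight well below 1 for the best model, as reported above, signals that several candidate models fit almost equally well.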

  10. The impact of the 2008 financial crisis on food security and food expenditures in Mexico: a disproportionate effect on the vulnerable

    PubMed Central

    Vilar-Compte, Mireya; Sandoval-Olascoaga, Sebastian; Bernal-Stuart, Ana; Shimoga, Sandhya; Vargas-Bustamante, Arturo

    2015-01-01

    Objective The present paper investigated the impact of the 2008 financial crisis on food security in Mexico and how it disproportionally affected vulnerable households. Design A generalized ordered logistic regression was estimated to assess the impact of the crisis on households’ food security status. An ordinary least squares and a quantile regression were estimated to evaluate the effect of the financial crisis on a continuous proxy measure of food security defined as the share of a household’s current income devoted to food expenditures. Setting Both analyses were performed using pooled cross-sectional data from the Mexican National Household Income and Expenditure Survey 2008 and 2010. Subjects The analytical sample included 29 468 households in 2008 and 27 654 in 2010. Results The generalized ordered logistic model showed that the financial crisis significantly (P < 0·05) decreased the probability of being food secure, mildly or moderately food insecure, compared with being severely food insecure (OR = 0·74). A similar but smaller effect was found when comparing severely and moderately food-insecure households with mildly food-insecure and food-secure households (OR = 0·81). The ordinary least squares model showed that the crisis significantly (P < 0·05) increased the share of total income spent on food (β coefficient of 0·02). The quantile regression confirmed the findings suggested by the generalized ordered logistic model, showing that the effects of the crisis were more profound among poorer households. Conclusions The results suggest that households that were more vulnerable before the financial crisis saw a worsened effect in terms of food insecurity with the crisis. Findings were consistent with both measures of food security – one based on self-reported experience and the other based on food spending. PMID:25428800

  11. Logistic regression models for predicting physical and mental health-related quality of life in rheumatoid arthritis patients.

    PubMed

    Alishiri, Gholam Hossein; Bayat, Noushin; Fathi Ashtiani, Ali; Tavallaii, Seyed Abbas; Assari, Shervin; Moharamzad, Yashar

    2008-01-01

    The aim of this work was to develop two logistic regression models capable of predicting physical and mental health related quality of life (HRQOL) among rheumatoid arthritis (RA) patients. In this cross-sectional study which was conducted during 2006 in the outpatient rheumatology clinic of our university hospital, Short Form 36 (SF-36) was used for HRQOL measurements in 411 RA patients. A cutoff point to define poor versus good HRQOL was calculated using the first quartiles of SF-36 physical and mental component scores (33.4 and 36.8, respectively). Two distinct logistic regression models were used to derive predictive variables including demographic, clinical, and psychological factors. The sensitivity, specificity, and accuracy of each model were calculated. Poor physical HRQOL was positively associated with pain score, disease duration, monthly family income below 300 US$, comorbidity, patient global assessment of disease activity or PGA, and depression (odds ratios: 1.1; 1.004; 15.5; 1.1; 1.02; 2.08, respectively). The variables that entered into the poor mental HRQOL prediction model were monthly family income below 300 US$, comorbidity, PGA, and bodily pain (odds ratios: 6.7; 1.1; 1.01; 1.01, respectively). Optimal sensitivity and specificity were achieved at a cutoff point of 0.39 for the estimated probability of poor physical HRQOL and 0.18 for mental HRQOL. Sensitivity, specificity, and accuracy of the physical and mental models were 73.8, 87, 83.7% and 90.38, 70.36, 75.43%, respectively. The results show that the suggested models can be used to predict poor physical and mental HRQOL separately among RA patients using simple variables with acceptable accuracy. These models can be of use in the clinical decision-making of RA patients and to recognize patients with poor physical or mental HRQOL in advance, for better management.
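
    Sensitivity, specificity and accuracy at a chosen probability cutoff, as reported above, follow from a simple confusion-matrix count. The sketch below assumes that a predicted probability at or above the cutoff is classified as poor HRQOL; the data in the usage example are invented.

```python
def classification_metrics(probabilities, labels, cutoff):
    """Return (sensitivity, specificity, accuracy) at a probability cutoff.

    labels: 1 for the event (e.g. poor HRQOL), 0 otherwise.
    """
    tp = fp = tn = fn = 0
    for prob, label in zip(probabilities, labels):
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Invented predictions, evaluated at the 0.39 cutoff mentioned in the abstract.
    print(classification_metrics([0.9, 0.2, 0.5, 0.1], [1, 1, 0, 0], cutoff=0.39))
```

    Scanning the cutoff over a grid and picking the value that best balances the first two numbers is one common way to arrive at cutoffs such as 0.39 and 0.18.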

  12. A low-cost landslide displacement activity assessment from time-lapse photogrammetry and rainfall data: Application to the Tessina landslide site

    NASA Astrophysics Data System (ADS)

    Gabrieli, F.; Corain, L.; Vettore, L.

    2016-09-01

    Acquiring useful and reliable displacement data from a complex landslide site is often a problem because of large, localized and scattered erosive processes and deformations; the inaccessibility of the site; the high cost of instrumentation and maintenance. However, these data are of fundamental importance not only to hazard assessments but also to understanding the processes at the basis of slope evolution. In this framework, time-lapse photogrammetry can represent a good compromise; the low accuracy is compensated for by the wide-ranging and dense spatial displacement information that can be obtained with inexpensive equipment. Nevertheless, when large displacement monitoring data sets become available, the problem becomes the choice of the most suitable statistical model to describe the probability of movement and adequately simplify the complexity of a scattered, intermittent, and spatially inhomogeneous displacement field. In this paper, an automated displacement detection method, which is based on the absolute image differences and digital correlations from a sequence of photos, was developed and applied to a photographic survey activity at the head of the Tessina landslide (northeastern Italy). The method allowed us to simplify and binarize the displacement field and to recognize the intermittent activity and the peculiar behaviours of different parts of the landslide, which were identified and classified by combining geomorphological and geological information. Moreover, for the first time, sliding correlations between these areas were quantitatively estimated using time-series-based binary logistic regression and the definition of a probability-based directed graph of displacement occurrence that connected the source zones to the lower depletion basin and the main collector channel. 
Using rainfall data, event-based logistic and Poisson regression models were applied to the upper zones of the landslide to estimate the probability of movement of each scarp and the persistence of the displacement as a result of certain rainfall events. The results of these statistical analyses highlighted the capability of this approach to quantitatively evaluate the pattern of displacement occurrences and to assess the evolution of a landslide site to gain insight into geomorphological processes.

  13. Development of Decision Support Formulas for the Prediction of Bladder Outlet Obstruction and Prostatic Surgery in Patients With Lower Urinary Tract Symptom/Benign Prostatic Hyperplasia: Part I, Development of the Formula and its Internal Validation.

    PubMed

    Choo, Min Soo; Yoo, Changwon; Cho, Sung Yong; Jeong, Seong Jin; Jeong, Chang Wook; Ku, Ja Hyeon; Oh, Seung-June

    2017-04-01

    As the elderly population increases, a growing number of patients have lower urinary tract symptom (LUTS)/benign prostatic hyperplasia (BPH). The aim of this study was to develop decision support formulas and nomograms for the prediction of bladder outlet obstruction (BOO) and for BOO-related surgical decision-making, and to validate them in patients with LUTS/BPH. Patients with LUTS/BPH between October 2004 and May 2014 were enrolled as a development cohort. The available variables included age, International Prostate Symptom Score, free uroflowmetry, postvoid residual volume, total prostate volume, and the results of a pressure-flow study. A causal Bayesian network analysis was used to identify relevant parameters. Using multivariate logistic regression analysis, formulas were developed to calculate the probabilities of having BOO and requiring prostatic surgery. Patients between June 2014 and December 2015 were prospectively enrolled for internal validation. Receiver operating characteristic curve analysis, calibration plots, and decision curve analysis were performed. A total of 1,179 male patients with LUTS/BPH, with a mean age of 66.1 years, were included as a development cohort. Another 253 patients were enrolled as an internal validation cohort. Using multivariate logistic regression analysis, 2 and 4 formulas were established to estimate the probabilities of having BOO and requiring prostatic surgery, respectively. Our analysis of the predictive accuracy of the model revealed area under the curve values of 0.82 for BOO and 0.87 for prostatic surgery. The sensitivity and specificity were 53.6% and 87.0% for BOO, and 91.6% and 50.0% for prostatic surgery, respectively. The calibration plot indicated that these prediction models showed a good correspondence. In addition, the decision curve analysis showed a high net benefit across the entire spectrum of probability thresholds.
We established nomograms for the prediction of BOO and BOO-related prostatic surgery in patients with LUTS/BPH. Internal validation of the nomograms demonstrated that they predicted both having BOO and requiring prostatic surgery very well.
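
    The area under the ROC curve reported above equals the probability that a randomly chosen case receives a higher predicted probability than a randomly chosen non-case. A small sketch of that pairwise definition follows; it is quadratic in sample size and intended only as an illustration, with invented data.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    Ties between a case and a non-case count as half a correct pair.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes must be present")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

    Unlike sensitivity and specificity, this summary does not depend on any single probability threshold.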

  14. A multicenter mortality prediction model for patients receiving prolonged mechanical ventilation

    PubMed Central

    Carson, Shannon S.; Kahn, Jeremy M.; Hough, Catherine L.; Seeley, Eric J.; White, Douglas B.; Douglas, Ivor S.; Cox, Christopher E.; Caldwell, Ellen; Bangdiwala, Shrikant I.; Garrett, Joanne M.; Rubenfeld, Gordon D.

    2012-01-01

    Objective Significant deficiencies exist in the communication of prognosis for patients requiring prolonged mechanical ventilation after acute illness, in part because of clinician uncertainty about long-term outcomes. We sought to refine a mortality prediction model for patients requiring prolonged ventilation using a multicentered study design. Design Cohort study. Setting Five geographically diverse tertiary care medical centers in the United States (California, Colorado, North Carolina, Pennsylvania, Washington). Patients Two hundred sixty adult patients who received at least 21 days of mechanical ventilation after acute illness. Interventions None. Measurements and Main Results For the probability model, we included age, platelet count, and requirement for vasopressors and/or hemodialysis, each measured on day 21 of mechanical ventilation, in a logistic regression model with 1-yr mortality as the outcome variable. We subsequently modified a simplified prognostic scoring rule (ProVent score) by categorizing the risk variables (age 18–49, 50–64, and >65 yrs; platelet count 0–150 and >150; vasopressors; hemodialysis) in another logistic regression model and assigning points to variables according to β coefficient values. Overall mortality at 1 yr was 48%. The area under the curve of the receiver operator characteristic curve for the primary ProVent probability model was 0.79 (95% confidence interval, 0.75–0.81), and the p value for the Hosmer-Lemeshow goodness-of-fit statistic was .89. The area under the curve for the categorical model was 0.77, and the p value for the goodness-of-fit statistic was .34. The area under the curve for the ProVent score was 0.76, and the p value for the Hosmer-Lemeshow goodness-of-fit statistic was .60. For the 50 patients with a ProVent score >2, only one patient was able to be discharged directly home, and 1-yr mortality was 86%. 
Conclusion The ProVent probability model is a simple and reproducible model that can accurately identify patients requiring prolonged mechanical ventilation who are at high risk of 1-yr mortality. PMID:22080643
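
    Turning β coefficients into an integer scoring rule, as described above, is often done by scaling each coefficient by the smallest one and rounding. The abstract does not say which rule was used, so the sketch below is one common approach under that assumption, and the coefficient values are hypothetical rather than the published ones.

```python
def points_from_coefficients(coefficients):
    """Map each coefficient to integer points relative to the smallest one."""
    reference = min(abs(b) for b in coefficients.values())
    if reference == 0:
        raise ValueError("coefficients must be non-zero")
    return {name: round(b / reference) for name, b in coefficients.items()}


def risk_score(points, patient):
    """Sum the points for every risk variable present in the patient."""
    return sum(points[name] for name, present in patient.items() if present)


if __name__ == "__main__":
    # Hypothetical coefficients, for illustration only.
    betas = {
        "age_50_64": 0.6,
        "age_65_plus": 1.3,
        "low_platelets": 0.7,
        "vasopressors": 0.65,
        "hemodialysis": 0.6,
    }
    points = points_from_coefficients(betas)
    patient = {"age_65_plus": True, "vasopressors": True, "hemodialysis": False}
    print(points, risk_score(points, patient))
```

    Rounding trades a little discrimination for a rule that can be applied at the bedside, which is consistent with the slightly lower area under the curve reported for the score.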

  15. [Predicting the probability of development and progression of primary open angle glaucoma by regression modeling].

    PubMed

    Likhvantseva, V G; Sokolov, V A; Levanova, O N; Kovelenova, I V

    2018-01-01

    Prediction of the clinical course of primary open-angle glaucoma (POAG) is one of the main directions in solving the problem of vision loss prevention and stabilization of the pathological process. Simple statistical methods of correlation analysis show the extent of each risk factor's impact, but do not indicate the total impact of these factors in personalized combinations. The relationships between the risk factors are subject to correlation and regression analysis. The regression equation represents the dependence of the mathematical expectation of the resulting sign on the combination of factor signs. The aim was to develop a technique for predicting the probability of development and progression of primary open-angle glaucoma based on a personalized combination of risk factors by linear multivariate regression analysis. The study included 66 patients (23 female and 43 male; 132 eyes) with newly diagnosed primary open-angle glaucoma. The control group consisted of 14 patients (8 male and 6 female). Standard ophthalmic examination was supplemented with biochemical study of lacrimal fluid. Concentration of matrix metalloproteinase MMP-2 and MMP-9 in tear fluid in both eyes was determined using 'sandwich' enzyme-linked immunosorbent assay (ELISA) method. The study resulted in the development of regression equations and step-by-step multivariate logistic models that can help calculate the risk of development and progression of POAG. Those models are based on expert evaluation of clinical and instrumental indicators of hydrodynamic disturbances (coefficient of outflow ease - C, volume of intraocular fluid secretion - F, fluctuation of intraocular pressure), as well as personalized morphometric parameters of the retina (central retinal thickness in the macular area) and concentration of MMP-2 and MMP-9 in the tear film.
The newly developed regression equations are highly informative and can be a reliable tool for studying the influence vector and assessing the pathogenic potential of the independent risk factors in specific personalized combinations.

  16. Access disparities to Magnet hospitals for patients undergoing neurosurgical operations

    PubMed Central

    Missios, Symeon; Bekelis, Kimon

    2017-01-01

    Background Centers of excellence focusing on quality improvement have demonstrated superior outcomes for a variety of surgical interventions. We investigated the presence of access disparities to hospitals recognized by the Magnet Recognition Program of the American Nurses Credentialing Center (ANCC) for patients undergoing neurosurgical operations. Methods We performed a cohort study of all neurosurgery patients who were registered in the New York Statewide Planning and Research Cooperative System (SPARCS) database from 2009–2013. We examined the association of African-American race and lack of insurance with Magnet status hospitalization for neurosurgical procedures. A mixed effects propensity adjusted multivariable regression analysis was used to control for confounding. Results During the study period, 190,535 neurosurgical patients met the inclusion criteria. Using a multivariable logistic regression, we demonstrate that African-Americans had lower admission rates to Magnet institutions (OR 0.62; 95% CI, 0.58–0.67). This persisted in a mixed effects logistic regression model (OR 0.77; 95% CI, 0.70–0.83) to adjust for clustering at the patient county level, and a propensity score adjusted logistic regression model (OR 0.75; 95% CI, 0.69–0.82). Additionally, lack of insurance was associated with lower admission rates to Magnet institutions (OR 0.71; 95% CI, 0.68–0.73), in a multivariable logistic regression model. This persisted in a mixed effects logistic regression model (OR 0.72; 95% CI, 0.69–0.74), and a propensity score adjusted logistic regression model (OR 0.72; 95% CI, 0.69–0.75). Conclusions Using a comprehensive all-payer cohort of neurosurgery patients in New York State we identified an association of African-American race and lack of insurance with lower rates of admission to Magnet hospitals. PMID:28684152

  17. Tool for Forecasting Cool-Season Peak Winds Across Kennedy Space Center and Cape Canaveral Air Force Station

    NASA Technical Reports Server (NTRS)

    Barrett, Joe H., III; Roeder, William P.

    2010-01-01

    The expected peak wind speed for the day is an important element in the daily morning forecast for ground and space launch operations at Kennedy Space Center (KSC) and Cape Canaveral Air Force Station (CCAFS). The 45th Weather Squadron (45 WS) must issue forecast advisories for KSC/CCAFS when they expect peak gusts for >= 25, >= 35, and >= 50 kt thresholds at any level from the surface to 300 ft. In Phase I of this task, the 45 WS tasked the Applied Meteorology Unit (AMU) to develop a cool-season (October - April) tool to help forecast the non-convective peak wind from the surface to 300 ft at KSC/CCAFS. During the warm season, these wind speeds are rarely exceeded except during convective winds or under the influence of tropical cyclones, for which other techniques are already in use. The tool used single and multiple linear regression equations to predict the peak wind from the morning sounding. The forecaster manually entered several observed sounding parameters into a Microsoft Excel graphical user interface (GUI), and then the tool displayed the forecast peak wind speed, average wind speed at the time of the peak wind, the timing of the peak wind and the probability the peak wind will meet or exceed 35, 50 and 60 kt. The 45 WS customers later dropped the requirement for >= 60 kt wind warnings. During Phase II of this task, the AMU expanded the period of record (POR) by six years to increase the number of observations used to create the forecast equations. A large number of possible predictors were evaluated from archived soundings, including inversion depth and strength, low-level wind shear, mixing height, temperature lapse rate and winds from the surface to 3000 ft. Each day in the POR was stratified in a number of ways, such as by low-level wind direction, synoptic weather pattern, precipitation and Bulk Richardson number. The most accurate Phase II equations were then selected for an independent verification. 
The Phase I and II forecast methods were compared using an independent verification data set. The two methods were compared to climatology, wind warnings and advisories issued by the 45 WS, and North American Mesoscale (NAM) model (MesoNAM) forecast winds. The performance of the Phase I and II methods was similar with respect to mean absolute error. Since the Phase I data were not stratified by precipitation, this method's peak wind forecasts had a large negative bias on days with precipitation and a small positive bias on days with no precipitation. Overall, the climatology methods performed the worst while the MesoNAM performed the best. Since the MesoNAM winds were the most accurate in the comparison, the final version of the tool was based on the MesoNAM winds. The probabilities that the peak wind will meet or exceed the warning thresholds were based on the one standard deviation error bars from the linear regression. For example, the linear regression might forecast the most likely peak speed to be 35 kt, and the error bars might then be used to calculate that the probability of >= 25 kt = 76%, the probability of >= 35 kt = 50%, and the probability of >= 50 kt = 19%. The authors have not seen this application of linear regression error bars in any other meteorological applications. Although probability forecast tools should usually be developed with logistic regression, this technique could be easily generalized to any linear regression forecast tool to estimate the probability of exceeding any desired threshold. This could be useful for previously developed linear regression forecast tools or new forecast applications where statistical analysis software to perform logistic regression is not available. The tool was delivered in two formats - a Microsoft Excel GUI and a Tool Command Language/Tool Kit (Tcl/Tk) GUI in the Meteorological Interactive Data Display System (MIDDS). 
The Microsoft Excel GUI reads a MesoNAM text file containing hourly forecasts from 0 to 84 hours, from one model run (00 or 12 UTC). The GUI then displays the peak wind speed, average wind speed, and the probability the peak wind will meet or exceed the 25-, 35- and 50-kt thresholds. The user can display the Day-1 through Day-3 peak wind forecasts, and separate forecasts are made for precipitation and non-precipitation days. The MIDDS GUI uses data from the NAM and Global Forecast System (GFS), instead of the MesoNAM. It can display Day-1 and Day-2 forecasts using NAM data, and Day-1 through Day-5 forecasts using GFS data. The timing of the peak wind is not displayed, since the independent verification showed that none of the forecast methods performed significantly better than climatology. The forecaster should use the climatological timing of the peak wind (2248 UTC) as a first guess and then adjust it based on the movement of weather features.
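
    The idea of turning a linear-regression forecast and its error bars into threshold-exceedance probabilities can be sketched as follows. This is only an illustration: it assumes the forecast errors are normally distributed, which the abstract does not state, and the standard deviation used here is a hypothetical value rather than one from the tool.

```python
import math

def prob_exceed(forecast, sigma, threshold):
    """P(peak wind >= threshold), assuming normal errors about the forecast."""
    z = (threshold - forecast) / sigma
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

forecast_kt = 35.0  # most likely peak speed from the regression
sigma_kt = 15.0     # hypothetical one-standard-deviation error (assumption)

for threshold_kt in (25, 35, 50):
    p = prob_exceed(forecast_kt, sigma_kt, threshold_kt)
    print(f">= {threshold_kt} kt: {p:.0%}")
```

    Under this assumption the probability of meeting or exceeding the forecast value itself is always 50%, which agrees with the 35-kt figure in the example above; the other two probabilities depend on the error standard deviation chosen.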

  18. On the use and misuse of scalar scores of confounders in design and analysis of observational studies.

    PubMed

    Pfeiffer, R M; Riedl, R

    2015-08-15

    We assess the asymptotic bias of estimates of exposure effects conditional on covariates when summary scores of confounders, instead of the confounders themselves, are used to analyze observational data. First, we study regression models for cohort data that are adjusted for summary scores. Second, we derive the asymptotic bias for case-control studies when cases and controls are matched on a summary score, and then analyzed either using conditional logistic regression or by unconditional logistic regression adjusted for the summary score. Two scores, the propensity score (PS) and the disease risk score (DRS), are studied in detail. For cohort analysis, when regression models are adjusted for the PS, the estimated conditional treatment effect is unbiased only for linear models, or at the null for non-linear models. Adjustment of cohort data for DRS yields unbiased estimates only for linear regression; all other estimates of exposure effects are biased. Matching cases and controls on DRS and analyzing them using conditional logistic regression yields unbiased estimates of exposure effect, whereas adjusting for the DRS in unconditional logistic regression yields biased estimates, even under the null hypothesis of no association. Matching cases and controls on the PS yields unbiased estimates only under the null for both conditional and unconditional logistic regression, adjusted for the PS. We study the bias for various confounding scenarios and compare our asymptotic results with those from simulations with limited sample sizes. To create realistic correlations among multiple confounders, we also based simulations on a real dataset. Copyright © 2015 John Wiley & Sons, Ltd.

  19. [Application of SAS macro to evaluated multiplicative and additive interaction in logistic and Cox regression in clinical practices].

    PubMed

    Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q

    2016-05-01

    Conditional logistic regression analysis and unconditional logistic regression analysis are commonly used in case-control studies, whereas the Cox proportional hazards model is often used in survival data analysis. Most of the literature refers only to main-effect models; however, the generalized linear model differs from the general linear model, and interaction comprises multiplicative interaction and additive interaction. The former has only statistical significance, but the latter has biological significance. In this paper, macros were written using SAS 9.4, and the contrast ratio, attributable proportion due to interaction and synergy index were calculated while calculating the interaction terms of logistic and Cox regression, and Wald, delta and profile likelihood confidence intervals were used to evaluate additive interaction, for reference in big data analysis in clinical epidemiology and in the analysis of genetic multiplicative and additive interactions.
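
    The additive-interaction measures named above can be sketched from three odds ratios (or hazard ratios) relative to the doubly unexposed group. This is an illustrative Python sketch, not the SAS macro from the paper; it assumes that "contrast ratio" refers to the relative excess risk due to interaction (RERI), and the input values are hypothetical.

```python
def additive_interaction(or10, or01, or11):
    """Return (RERI, AP, S) from ratios relative to the unexposed group.

    or10: exposed to factor A only; or01: exposed to factor B only;
    or11: exposed to both factors.
    """
    reri = or11 - or10 - or01 + 1.0                   # excess risk due to interaction
    ap = reri / or11                                  # attributable proportion
    s = (or11 - 1.0) / ((or10 - 1.0) + (or01 - 1.0))  # synergy index
    return reri, ap, s

def multiplicative_interaction(or10, or01, or11):
    """Ratio of the joint effect to the product of the separate effects."""
    return or11 / (or10 * or01)

# Hypothetical odds ratios, for illustration only.
reri, ap, s = additive_interaction(or10=2.0, or01=3.0, or11=7.0)
print(reri, ap, s, multiplicative_interaction(2.0, 3.0, 7.0))
```

    No additive interaction corresponds to RERI = 0, AP = 0 and S = 1; no multiplicative interaction corresponds to a ratio of 1.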

  20. Morphological Awareness and Children's Writing: Accuracy, Error, and Invention

    PubMed Central

    McCutchen, Deborah; Stull, Sara

    2014-01-01

    This study examined the relationship between children's morphological awareness and their ability to produce accurate morphological derivations in writing. Fifth-grade U.S. students (n = 175) completed two writing tasks that invited or required morphological manipulation of words. We examined both accuracy and error, specifically errors in spelling and errors of the sort we termed morphological inventions, which entailed inappropriate, novel pairings of stems and suffixes. Regressions were used to determine the relationship between morphological awareness, morphological accuracy, and spelling accuracy, as well as between morphological awareness and morphological inventions. Linear regressions revealed that morphological awareness uniquely predicted children's generation of accurate morphological derivations, regardless of whether or not accurate spelling was required. A logistic regression indicated that morphological awareness was also uniquely predictive of morphological invention, with higher morphological awareness increasing the probability of morphological invention. These findings suggest that morphological knowledge may not only assist children with spelling during writing, but may also assist with word production via generative experimentation with morphological rules during sentence generation. Implications are discussed for the development of children's morphological knowledge and relationships with writing. PMID:25663748

  1. An Analysis of the Number of Medical Malpractice Claims and Their Amounts

    PubMed Central

    Bonetti, Marco; Cirillo, Pasquale; Musile Tanzi, Paola; Trinchero, Elisabetta

    2016-01-01

    Starting from an extensive database, pooling 9 years of data from the top three insurance brokers in Italy, and containing 38125 reported claims due to alleged cases of medical malpractice, we use an inhomogeneous Poisson process to model the number of medical malpractice claims in Italy. The intensity of the process is allowed to vary over time, and it depends on a set of covariates, like the size of the hospital, the medical department and the complexity of the medical operations performed. We choose the combination medical department by hospital as the unit of analysis. Together with the number of claims, we also model the associated amounts paid by insurance companies, using a two-stage regression model. In particular, we use logistic regression for the probability that a claim is closed with a zero payment, whereas, conditionally on the fact that an amount is strictly positive, we make use of lognormal regression to model it as a function of several covariates. The model produces estimates and forecasts that are relevant to both insurance companies and hospitals, for quality assurance, service improvement and cost reduction. PMID:27077661
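
    The two-stage structure described above (a logistic model for a zero payment, then a lognormal model for strictly positive amounts) implies a simple formula for the expected payment per claim. The sketch below illustrates that formula only; the linear-predictor value and lognormal parameters are hypothetical and would in practice come from the fitted regressions.

```python
import math

def sigmoid(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def expected_payment(eta_zero, mu, sigma):
    """Expected amount paid on a claim under a two-part model.

    eta_zero: linear predictor of the logistic model for P(payment = 0)
    mu, sigma: mean and standard deviation of log(amount) given amount > 0
    """
    p_zero = sigmoid(eta_zero)
    mean_if_positive = math.exp(mu + 0.5 * sigma ** 2)  # lognormal mean
    return (1.0 - p_zero) * mean_if_positive

# Hypothetical parameter values, for illustration only.
print(expected_payment(eta_zero=0.4, mu=9.0, sigma=1.2))
```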

  2. Logistic Regression for Seismically Induced Landslide Predictions: Using Uniform Hazard and Geophysical Layers as Predictor Variables

    NASA Astrophysics Data System (ADS)

    Nowicki, M. A.; Hearne, M.; Thompson, E.; Wald, D. J.

    2012-12-01

    Seismically induced landslides present a costly and often fatal threat in many mountainous regions. Substantial effort has been invested to understand where seismically induced landslides may occur in the future. Both slope-stability methods and, more recently, statistical approaches to the problem are described throughout the literature. Though some regional efforts have succeeded, no uniformly agreed-upon method is available for predicting the likelihood and spatial extent of seismically induced landslides. For use in the U.S. Geological Survey (USGS) Prompt Assessment of Global Earthquakes for Response (PAGER) system, we would like to routinely make such estimates, in near-real time, around the globe. Here we use the recently produced USGS ShakeMap Atlas of historic earthquakes to develop an empirical landslide probability model. We focus on recent events, yet include any digitally-mapped landslide inventories for which well-constrained ShakeMaps are also available. We combine these uniform estimates of the input shaking (e.g., peak acceleration and velocity) with broadly available susceptibility proxies, such as topographic slope and surface geology. The resulting database is used to build a predictive model of the probability of landslide occurrence with logistic regression. The landslide database includes observations from the Northridge, California (1994); Wenchuan, China (2008); ChiChi, Taiwan (1999); and Chuetsu, Japan (2004) earthquakes; we also provide ShakeMaps for moderate-sized events without landslide for proper model testing and training. The performance of the regression model is assessed with both statistical goodness-of-fit metrics and a qualitative review of whether or not the model is able to capture the spatial extent of landslides for each event. 
Part of our goal is to determine which variables can be employed based on globally-available data or proxies, and whether or not modeling results from one region are transferrable to geomorphologically-similar regions that lack proper calibration events. Combined with near-real time ShakeMaps, we anticipate using our model to make generalized predictions of whether or not (and if so, where) landslides are likely to occur for earthquakes around the globe; we also intend to incorporate this functionality into the USGS PAGER system.

  3. What Is Threatening the Effectiveness of Insecticide-Treated Bednets? A Case-Control Study of Environmental, Behavioral, and Physical Factors Associated with Prevention Failure.

    PubMed

    Obala, Andrew A; Mangeni, Judith Nekesa; Platt, Alyssa; Aswa, Daniel; Abel, Lucy; Namae, Jane; Prudhomme O'Meara, Wendy

    2015-01-01

    Insecticide-treated nets are the cornerstone of global malaria control and have been shown to reduce malaria morbidity by 50-60%. However, some areas are experiencing a resurgence in malaria following successful control. We describe an efficacy decay framework to understand why high malaria burden persists even under high ITN coverage in a community in western Kenya. We enrolled 442 children hospitalized with malaria and paired them with age, time, village and gender-matched controls. We completed comprehensive household and neighborhood assessments including entomological surveillance. The indicators are grouped into five domains in an efficacy decay framework: ITN ownership, compliance, physical integrity, vector susceptibility and facilitating factors. After variable selection, case-control data were analyzed using conditional logistic regression models and mosquito data were analyzed using negative binomial regression. Predictive margins were calculated from logistic regression models. Measures of ITN coverage and physical integrity were not correlated with hospitalized malaria in our study. However, consistent ITN use (Adjusted Odds Ratio (AOR) = 0.23, 95%CI: 0.12-0.43), presence of nearby larval sites (AOR = 1.137, 95%CI: 1.02-1.27), and specific types of crops (AOR (grains) = 0.446, 95%CI: 0.24-0.82) were significantly correlated with malaria amongst children who owned an ITN. The odds of hospitalization for febrile malaria nearly tripled when one other household member had symptomatic malaria infection (AOR = 2.76, 95%CI: 1.83-4.18). Overall, perfect household adherence could reduce the probability of hospitalization for malaria to less than 30% (95%CI: 0.12-0.46) and adjusting environmental factors such as elimination of larval sites and growing grains nearby could reduce the probability of hospitalization for malaria to less than 20% (95%CI: 0.04-0.31). Availability of ITNs is not the bottleneck for malaria prevention in this community. 
Behavior change interventions to improve compliance and environmental management of mosquito breeding habitats may greatly enhance ITN efficacy. A better understanding of the relationship between agriculture and mosquito survival and feeding success is needed.

  4. Nowcasting sunshine number using logistic modeling

    NASA Astrophysics Data System (ADS)

    Brabec, Marek; Badescu, Viorel; Paulescu, Marius

    2013-04-01

    In this paper, we present a formalized approach to statistical modeling of the sunshine number, a binary indicator of whether the Sun is covered by clouds introduced previously by Badescu (Theor Appl Climatol 72:127-136, 2002). Our statistical approach is based on Markov chains and logistic regression and yields fully specified probability models that are relatively easily identified (and their unknown parameters estimated) from a set of empirical data (observed sunshine number and sunshine stability number series). We discuss the general structure of the model and its advantages, demonstrate its performance on real data and compare its results with those of a competing classical ARIMA approach. Since the model parameters have a clear interpretation, we also illustrate how, e.g., their inter-seasonal stability can be tested. We conclude with an outlook on future developments oriented toward the construction of models allowing for a practically desirable smooth transition between data observed with different frequencies, and with a short discussion of the technical problems that such a goal brings.
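
    One simple way to combine a Markov chain with logistic regression for a binary series is to let the logit of the next state depend on the current state. The sketch below shows a first-order version of that idea; it is not the authors' model, and the coefficients b0 and b1 are hypothetical.

```python
import math

def sigmoid(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def transition_probs(b0, b1):
    """Return (P(next=1 | current=0), P(next=1 | current=1))."""
    return sigmoid(b0), sigmoid(b0 + b1)

def stationary_prob_of_one(b0, b1):
    """Long-run share of time the two-state chain spends in state 1."""
    p_up, p_stay = transition_probs(b0, b1)
    p_down = 1.0 - p_stay
    return p_up / (p_up + p_down)

# Hypothetical coefficients: b1 > 0 means the current state tends to persist.
print(transition_probs(-1.0, 2.0), stationary_prob_of_one(-1.0, 2.0))
```

    With b1 = 0 the series has no memory and the stationary probability reduces to the ordinary logistic probability sigmoid(b0).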

  5. Differential diagnosis of degenerative dementias using basic neuropsychological tests: multivariable logistic regression analysis of 301 patients.

    PubMed

    Jiménez-Huete, Adolfo; Riva, Elena; Toledano, Rafael; Campo, Pablo; Esteban, Jesús; Barrio, Antonio Del; Franch, Oriol

    2014-12-01

    The validity of neuropsychological tests for the differential diagnosis of degenerative dementias may depend on the clinical context. We constructed a series of logistic models taking into account this factor. We retrospectively analyzed the demographic and neuropsychological data of 301 patients with probable Alzheimer's disease (AD), frontotemporal degeneration (FTLD), or dementia with Lewy bodies (DLB). Nine models were constructed taking into account the diagnostic question (eg, AD vs DLB) and subpopulation (incident vs prevalent). The AD versus DLB model for all patients, including memory recovery and phonological fluency, was highly accurate (area under the curve = 0.919, sensitivity = 90%, and specificity = 80%). The results were comparable in incident and prevalent cases. The FTLD versus AD and DLB versus FTLD models were both inaccurate. The models constructed from basic neuropsychological variables allowed an accurate differential diagnosis of AD versus DLB but not of FTLD versus AD or DLB. © The Author(s) 2014.

  6. No rationale for 1 variable per 10 events criterion for binary logistic regression analysis.

    PubMed

    van Smeden, Maarten; de Groot, Joris A H; Moons, Karel G M; Collins, Gary S; Altman, Douglas G; Eijkemans, Marinus J C; Reitsma, Johannes B

    2016-11-24

    Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth's correction, are compared. The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect ('separation'). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
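
    The two quantities discussed above, EPV and separation, can be checked directly on a data set. The sketch below assumes the common convention that "events" means the size of the smaller outcome group, and it tests complete separation for a single covariate only; both function names are hypothetical.

```python
def events_per_variable(outcomes, n_predictors):
    """EPV: size of the smaller outcome group per candidate predictor."""
    n_events = sum(outcomes)
    n_smaller = min(n_events, len(outcomes) - n_events)
    return n_smaller / n_predictors

def is_completely_separated(x, y):
    """True if one covariate splits the 0/1 outcomes without overlap."""
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        return False  # only one outcome class is present
    return max(x0) < min(x1) or max(x1) < min(x0)

# Hypothetical toy data, for illustration only.
y = [0, 0, 0, 1, 1, 0, 1, 0]
x = [1.0, 1.5, 2.0, 3.5, 4.0, 2.5, 5.0, 0.5]
print(events_per_variable(y, n_predictors=1), is_completely_separated(x, y))
```

    When separation is present, the maximum likelihood estimate of the coefficient does not exist as a finite value, which is the situation that Firth's correction is meant to handle.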

  7. 4D-Fingerprint Categorical QSAR Models for Skin Sensitization Based on Classification Local Lymph Node Assay Measures

    PubMed Central

    Li, Yi; Tseng, Yufeng J.; Pan, Dahua; Liu, Jianzhong; Kern, Petra S.; Gerberick, G. Frank; Hopfinger, Anton J.

    2008-01-01

    Currently, the only validated methods to identify skin sensitization effects are in vivo models, such as the Local Lymph Node Assay (LLNA) and guinea pig studies. There is a tremendous need, in particular due to novel legislation, to develop animal alternatives, e.g., Quantitative Structure-Activity Relationship (QSAR) models. Here, QSAR models for skin sensitization using LLNA data have been constructed. The descriptors used to generate these models are derived from the 4D-molecular similarity paradigm and are referred to as universal 4D-fingerprints. A training set of 132 structurally diverse compounds and a test set of 15 structurally diverse compounds were used in this study. The statistical methodologies used to build the models are logistic regression (LR), and partial least square coupled logistic regression (PLS-LR), which prove to be effective tools for studying skin sensitization measures expressed in the two categorical terms of sensitizer and non-sensitizer. QSAR models with low values of the Hosmer-Lemeshow goodness-of-fit statistic, χ²_HL, are significant and predictive. For the training set, the cross-validated prediction accuracy of the logistic regression models ranges from 77.3% to 78.0%, while that of PLS-logistic regression models ranges from 87.1% to 89.4%. For the test set, the prediction accuracy of logistic regression models ranges from 80.0% to 86.7%, while that of PLS-logistic regression models ranges from 73.3% to 80.0%. The QSAR models are made up of 4D-fingerprints related to aromatic atoms, hydrogen bond acceptors and negatively partially charged atoms. PMID:17226934
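
    The Hosmer-Lemeshow statistic mentioned above compares observed and expected event counts within groups of observations ordered by predicted probability. The sketch below is a generic illustration of that calculation, not the procedure used in the paper; the grouping rule (equal-sized groups by rank) and the toy data are assumptions.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow chi-square statistic for 0/1 outcomes y and
    predicted probabilities p, using equal-sized groups by rank of p."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(pi for pi, _ in chunk)  # expected number of events
        observed = sum(yi for _, yi in chunk)  # observed number of events
        variance = expected * (1.0 - expected / n_g)
        if variance > 0.0:
            stat += (observed - expected) ** 2 / variance
    return stat

# Hypothetical predictions, for illustration only.
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
print(hosmer_lemeshow(y, p, groups=4))
```

    Small values indicate that predicted and observed event rates agree within groups; the statistic is usually referred to a chi-square distribution with (groups - 2) degrees of freedom.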

  8. A New Piece of the Puzzle: Sexual Orientation, Gender, and Physical Health Status.

    PubMed

    Gorman, Bridget K; Denney, Justin T; Dowdy, Hilary; Medeiros, Rose Anne

    2015-08-01

    Although research has long documented the relevance of gender for health, studies that simultaneously incorporate the relevance of disparate sexual orientation groups are sparse. We address these shortcomings by applying an intersectional perspective to evaluate how sexual orientation and gender intersect to pattern self-rated health status among U.S. adults. Our project aggregated probability samples from the Behavioral Risk Factor Surveillance System (BRFSS) across seven U.S. states between 2005 and 2010, resulting in an analytic sample of 10,128 sexual minority (gay, lesbian, and bisexual) and 405,145 heterosexual adults. Logistic regression models and corresponding predicted probabilities examined how poor self-rated health differed across sexual orientation-by-gender groups, before and after adjustment for established health risk factors. Results reveal distinct patterns among sexual minorities. Initially, bisexual men and women reported the highest--and gay and lesbian adults reported the lowest--rates of poor self-rated health, with heterosexuals in between. Distinct socioeconomic status profiles accounted for large portions of these differences. Furthermore, in baseline and fully adjusted regression models, only among heterosexuals did women report significantly different health from men. Importantly, the findings highlight elevated rates of poor health experienced by bisexual men and women, which are partially attributable to their heightened economic, behavioral, and social disadvantages relative to other groups.

  9. MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION

    EPA Science Inventory

    Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...

  10. A general regression framework for a secondary outcome in case-control studies.

    PubMed

    Tchetgen Tchetgen, Eric J

    2014-01-01

    Modern case-control studies typically involve the collection of data on a large number of outcomes, often at considerable logistical and monetary expense. These data are of potentially great value to subsequent researchers, who, although not necessarily concerned with the disease that defined the case series in the original study, may want to use the available information for a regression analysis involving a secondary outcome. Because cases and controls are selected with unequal probability, regression analysis involving a secondary outcome generally must acknowledge the sampling design. In this paper, the author presents a new framework for the analysis of secondary outcomes in case-control studies. The approach is based on a careful re-parameterization of the conditional model for the secondary outcome given the case-control outcome and regression covariates, in terms of (a) the population regression of interest of the secondary outcome given covariates and (b) the population regression of the case-control outcome on covariates. The error distribution for the secondary outcome given covariates and case-control status is otherwise unrestricted. For a continuous outcome, the approach sometimes reduces to extending model (a) by including a residual of (b) as a covariate. However, the framework is general in the sense that models (a) and (b) can take any functional form, and the methodology allows for an identity, log or logit link function for model (a).

  11. Selecting risk factors: a comparison of discriminant analysis, logistic regression and Cox's regression model using data from the Tromsø Heart Study.

    PubMed

    Brenn, T; Arnesen, E

    1985-01-01

    For comparative evaluation, discriminant analysis, logistic regression and Cox's model were used to select risk factors for total and coronary deaths among 6595 men aged 20-49 followed for 9 years. Groups with mortality between 5 and 93 per 1000 were considered. Discriminant analysis selected variable sets only marginally different from the logistic and Cox methods which always selected the same sets. A time-saving option, offered for both the logistic and Cox selection, showed no advantage compared with discriminant analysis. Analysing more than 3800 subjects, the logistic and Cox methods consumed, respectively, 80 and 10 times more computer time than discriminant analysis. When including the same set of variables in non-stepwise analyses, all methods estimated coefficients that in most cases were almost identical. In conclusion, discriminant analysis is advocated for preliminary or stepwise analysis, otherwise Cox's method should be used.

  12. Modification of the Mantel-Haenszel and Logistic Regression DIF Procedures to Incorporate the SIBTEST Regression Correction

    ERIC Educational Resources Information Center

    DeMars, Christine E.

    2009-01-01

    The Mantel-Haenszel (MH) and logistic regression (LR) differential item functioning (DIF) procedures have inflated Type I error rates when there are large mean group differences, short tests, and large sample sizes. When there are large group differences in mean score, groups matched on the observed number-correct score differ on true score,…

  13. Practical Session: Logistic Regression

    NASA Astrophysics Data System (ADS)

    Clausel, M.; Grégoire, G.

    2014-12-01

    An exercise is proposed to illustrate logistic regression. It investigates the different risk factors for the occurrence of coronary heart disease. The exercise was proposed in Chapter 5 of the book by D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science Business Media, LLC (2010) and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr341.pdf). The example is based on data given in the file evans.txt from http://www.sph.emory.edu/dkleinb/logreg3.htm#data.
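
    For readers without the exercise materials at hand, the fitting step can be sketched in a few lines. The example below fits a one-covariate logistic model by Newton-Raphson on a small made-up data set; it does not use evans.txt, and the function name and data are hypothetical.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit of P(y=1) = 1 / (1 + exp(-(b0 + b1*x)))."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Entries of the 2x2 information matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up data (not from evans.txt): outcome tends to be 1 for larger x.
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
print(b0, b1, math.exp(b1))  # exp(b1) is the odds ratio per unit of x
```

    The sketch assumes the data are not perfectly separated; with separated data the Newton iterations would not converge to finite coefficients.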

  14. The cross-validated AUC for MCP-logistic regression with high-dimensional data.

    PubMed

    Jiang, Dingfeng; Huang, Jian; Zhang, Ying

    2013-10-01

    We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and its comparison with the existing methods including the Akaike information criterion (AIC), Bayesian information criterion (BIC) or Extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and smaller classification error than those with tuning parameters selected using the AIC, BIC or EBIC. We illustrate the application of the MCP-logistic regression with the CV-AUC criterion on three microarray datasets from the studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
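
    Two ingredients of this approach, the MCP penalty and the AUC used as the selection criterion, are easy to write down. The sketch below shows the standard MCP formula for one coefficient and a rank-based AUC; it does not reproduce the coordinate descent algorithm or the cross-validation loop from the paper, and the toy scores are hypothetical.

```python
def mcp_penalty(beta, lam, gamma):
    """Minimax concave penalty for one coefficient (lam > 0, gamma > 1)."""
    a = abs(beta)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam  # constant beyond gamma * lam

def auc(scores, labels):
    """Probability that a random positive outranks a random negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted scores and true labels, for illustration only.
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
print(mcp_penalty(0.5, lam=1.0, gamma=3.0), mcp_penalty(10.0, lam=1.0, gamma=3.0))
```

    Unlike the lasso penalty, the MCP stops growing once a coefficient exceeds gamma * lam in absolute value, so large coefficients are not shrunk further.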

  15. Refining Stimulus Parameters in Assessing Infant Speech Perception Using Visual Reinforcement Infant Speech Discrimination: Sensation Level.

    PubMed

    Uhler, Kristin M; Baca, Rosalinda; Dudas, Emily; Fredrickson, Tammy

    2015-01-01

    Speech perception measures have long been considered an integral piece of the audiological assessment battery. Currently, a prelinguistic, standardized measure of speech perception is missing in the clinical assessment battery for infants and young toddlers. Such a measure would allow systematic assessment of speech perception abilities of infants as well as the potential to investigate the impact early identification of hearing loss and early fitting of amplification have on the auditory pathways. This study investigated the impact of sensation level (SL) on the ability of infants with normal hearing (NH) to discriminate /a-i/ and /ba-da/ and sought to determine whether performance on the two contrasts is significantly different in predicting the discrimination criterion. The design was based on a survival analysis model for event occurrence and a repeated measures logistic model for binary outcomes. The outcome for survival analysis was the minimum SL for criterion and the outcome for the logistic regression model was the presence/absence of achieving the criterion. Criterion achievement was designated when an infant's proportion correct score was >0.75 on the discrimination performance task. Twenty-two infants with NH sensitivity participated in this study. There were 9 males and 13 females, aged 6-14 mo. Testing took place over two to three sessions. The first session consisted of a hearing test, threshold assessment of the two speech sounds (/a/ and /i/), and if time and attention allowed, visual reinforcement infant speech discrimination (VRISD). The second session consisted of VRISD assessment for the two test contrasts (/a-i/ and /ba-da/). The presentation level started at 50 dBA. If the infant was unable to successfully achieve criterion (>0.75) at 50 dBA, the presentation level was increased to 70 dBA followed by 60 dBA. Data examination included an event analysis, which provided the probability of criterion distribution across SL. 
The second stage of the analysis was a repeated measures logistic regression where SL and contrast were used to predict the likelihood of speech discrimination criterion. Infants were able to reach criterion for the /a-i/ contrast at statistically lower SLs when compared to /ba-da/. There were six infants who never reached criterion for /ba-da/ and one never reached criterion for /a-i/. The conditional probability of not reaching criterion by 70 dB SL was 0% for /a-i/ and 21% for /ba-da/. The predictive logistic regression model showed that children were more likely to discriminate the /a-i/ even when controlling for SL. Nearly all normal-hearing infants can demonstrate discrimination criterion of a vowel contrast at 60 dB SL, while a level of ≥70 dB SL may be needed to allow all infants to demonstrate discrimination criterion of a difficult consonant contrast. American Academy of Audiology.

  16. Parent’s Socioeconomic Status, Adolescents’ Disposable Income, and Adolescents’ Smoking Status in Massachusetts

    PubMed Central

    Soteriades, Elpidoforos S.; DiFranza, Joseph R.

    2003-01-01

    Objectives. This study examined the association between parental socioeconomic status (SES) and adolescent smoking. Methods. We conducted telephone interviews with a probability sample of 1308 Massachusetts adolescents aged 12 to 17 years. We used multiple-variable-adjusted logistic regression models. Results. The risk of adolescent smoking increased by 28% with each step down in parental education and increased by 30% for each step down in parental household income. These associations persisted after adjustment for age, sex, race/ethnicity, and adolescent disposable income. Parental smoking status was a mediator of these associations. Conclusions. Parental SES is inversely associated with adolescent smoking. Parental smoking is a mediator but does not fully explain the association. PMID:12835202

  17. When women tell: intimate partner violence and the factors related to police notification.

    PubMed

    Novisky, Meghan A; Peralta, Robert L

    2015-01-01

    We analyze how victim perceptions of mandatory arrest policies, perpetrator substance use, and presence of children are related to decisions to invoke law enforcement assistance. Logistic regression was used on survey responses from women receiving care in domestic violence shelters. Results suggest that as victim support for mandatory arrest increases, the odds of law enforcement notification of the abuse also increase. Accordingly, mandatory arrest may simply be reducing the probability of reporting intimate partner violence (IPV) among those who do not support the policy, instead of reducing IPV. Results also suggest that perpetrator substance use plays a significant role in law enforcement notification. © The Author(s) 2014.

  18. Constructing a consumption model of fine dining from the perspective of behavioral economics

    PubMed Central

    Tsai, Sang-Bing

    2018-01-01

    Numerous factors affect how people choose a fine dining restaurant, including food quality, service quality, food safety, and hedonic value. A conceptual framework for evaluating restaurant selection behavior has not yet been developed. This study surveyed 150 individuals with fine dining experience and proposed the use of mental accounting and axiomatic design to construct a consumer economic behavior model. Linear and logistic regressions were employed to determine model correlations and the probability of each factor affecting behavior. The most crucial factor was food quality, followed by service and dining motivation, particularly regarding family dining. Safe ingredients, high cooking standards, and menu innovation all increased the likelihood of consumers choosing fine dining restaurants. PMID:29641554

  19. Constructing a consumption model of fine dining from the perspective of behavioral economics.

    PubMed

    Hsu, Sheng-Hsun; Hsiao, Cheng-Fu; Tsai, Sang-Bing

    2018-01-01

    Numerous factors affect how people choose a fine dining restaurant, including food quality, service quality, food safety, and hedonic value. A conceptual framework for evaluating restaurant selection behavior has not yet been developed. This study surveyed 150 individuals with fine dining experience and proposed the use of mental accounting and axiomatic design to construct a consumer economic behavior model. Linear and logistic regressions were employed to determine model correlations and the probability of each factor affecting behavior. The most crucial factor was food quality, followed by service and dining motivation, particularly regarding family dining. Safe ingredients, high cooking standards, and menu innovation all increased the likelihood of consumers choosing fine dining restaurants.

  20. Empirical Behavioral Models to Support Alternative Tools for the Analysis of Mixed-Priority Pedestrian-Vehicle Interaction in a Highway Capacity Context

    PubMed Central

    Rouphail, Nagui M.

    2011-01-01

    This paper presents behavioral-based models for describing pedestrian gap acceptance at unsignalized crosswalks in a mixed-priority environment, where some drivers yield and some pedestrians cross in gaps. Logistic regression models are developed to predict the probability of pedestrian crossings as a function of vehicle dynamics, pedestrian assertiveness, and other factors. In combination with prior work on probabilistic yielding models, the results can be incorporated in a simulation environment, where they can more fully describe the interaction of these two modes. The approach is intended to supplement HCM analytical procedure for locations where significant interaction occurs between drivers and pedestrians, including modern roundabouts. PMID:21643488

  1. Features and prevalence of patients with probable adult attention deficit hyperactivity disorder who request treatment for cocaine use disorders.

    PubMed

    Pérez de Los Cobos, José; Siñol, Núria; Puerta, Carmen; Cantillano, Vanessa; López Zurita, Cristina; Trujols, Joan

    2011-01-30

    To characterize those patients with probable adult attention deficit hyperactivity disorder (ADHD) who ask for treatment of cocaine use disorders; to estimate the prevalence of probable adult ADHD among these patients. This is a cross-sectional and multi-center study performed at outpatient resources of 12 addiction treatment centers in Spain. Participants were treatment-seeking primary cocaine abusers recruited consecutively at one center and through convenience sampling at the other centers. Assessments included semi-structured clinical interview focused on Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV) ADHD criteria adapted to adulthood, and the Wender-Utah Rating Scale (WURS) for screening childhood history of ADHD according to patients. Probable adult ADHD was diagnosed when patients met DSM-IV criteria of ADHD in adulthood and scored WURS>32. All participants were diagnosed with current cocaine dependence (n=190) or abuse (n=15). Patients with probable adult ADHD, compared with patients having no lifetime ADHD, were more frequently male, reported higher impulsivity, and began to use nicotine, alcohol, cannabis, or cocaine earlier. Before starting the current treatment, patients with probable adult ADHD also showed higher cocaine craving for the previous day, less frequent cocaine abstinence throughout the previous week, and higher use of cocaine and tobacco during the previous month. Impulsivity and male gender were the only independent risk factors of probable adult ADHD in a logistic regression analysis. The prevalence of probable adult ADHD was 20.5% in the sub-sample of patients consecutively recruited (n=78). A diagnosis of probable adult ADHD strongly distinguishes among treatment-seeking cocaine primary abusers regarding past and current key aspects of their addictive disorder; one-fifth of these patients present with probable adult ADHD. Copyright © 2009 Elsevier Ireland Ltd. All rights reserved.

  2. The Probability of Neonatal Respiratory Distress Syndrome as a Function of Gestational Age and Lecithin/Sphingomyelin Ratio

    PubMed Central

    St. Clair, Caryn; Norwitz, Errol R.; Woensdregt, Karlijn; Cackovic, Michael; Shaw, Julia A.; Malkus, Herbert; Ehrenkranz, Richard A.; Illuzzi, Jessica L.

    2011-01-01

    We sought to define the risk of neonatal respiratory distress syndrome (RDS) as a function of both lecithin/sphingomyelin (L/S) ratio and gestational age. Amniotic fluid L/S ratio data were collected from consecutive women undergoing amniocentesis for fetal lung maturity at Yale-New Haven Hospital from January 1998 to December 2004. Women were included in the study if they delivered a live-born, singleton, nonanomalous infant within 72 hours of amniocentesis. The probability of RDS was modeled using multivariate logistic regression with L/S ratio and gestational age as predictors. A total of 210 mother-neonate pairs (8 RDS, 202 non-RDS) met criteria for analysis. Both gestational age and L/S ratio were independent predictors of RDS. A probability of RDS of 3% or less was noted at an L/S ratio cutoff of ≥3.4 at 34 weeks, ≥2.6 at 36 weeks, ≥1.6 at 38 weeks, and ≥1.2 at term. Under 34 weeks of gestation, the prevalence of RDS was so high that a probability of 3% or less was not observed by this model. These data describe a means of stratifying the probability of neonatal RDS using both gestational age and the L/S ratio and may aid in clinical decision making concerning the timing of delivery. PMID:18773379
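
    As an illustration of how a two-predictor logistic model of this kind can be turned into gestational-age-specific L/S cutoffs, the following minimal sketch inverts the model for a target probability. The coefficients `B0`, `B_GA`, and `B_LS` are hypothetical placeholders chosen only so the code runs; they are not the fitted values from the study, which the abstract does not report.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# HYPOTHETICAL coefficients for illustration only (not from the study).
B0 = 30.0     # intercept
B_GA = -0.8   # change in log-odds per week of gestational age
B_LS = -2.0   # change in log-odds per unit of L/S ratio


def rds_probability(ga_weeks: float, ls_ratio: float) -> float:
    """Predicted probability of RDS under the assumed model."""
    return sigmoid(B0 + B_GA * ga_weeks + B_LS * ls_ratio)


def ls_cutoff(ga_weeks: float, target: float = 0.03) -> float:
    """L/S ratio at which the predicted probability equals `target`.

    Because B_LS is negative, any L/S ratio at or above this value
    gives a predicted probability at or below `target`.
    """
    return (logit(target) - B0 - B_GA * ga_weeks) / B_LS


if __name__ == "__main__":
    for weeks in (34, 36, 38):
        print(weeks, round(ls_cutoff(weeks), 2))
```

    With a negative gestational-age coefficient, the cutoff falls as gestational age rises, which is the qualitative pattern the abstract describes.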

  3. Mental health and well-being among type 1 diabetes caregivers in India: Evidence from the IDREAM study.

    PubMed

    Capistrant, Benjamin D; Friedemann-Sánchez, Greta; Novak, Lindsey K; Zuijdwijk, Caroline; Ogle, Graham D; Pendsey, Sharad

    2017-12-01

    Although more than half of the world's children with T1D live in developing countries, still little is known about how caregiving for children with T1D affects the parent/caregivers' health in low- and middle-income country settings. Caregivers of 178 children with T1D from a specialized diabetes clinic in Maharashtra, India were surveyed. Ordered and standard logistic regression models adjusted for caregiver, household and child characteristics, were fit to estimate the association of caregiving burden (objective caregiving burden and subjective caregiving burden (Zarit Burden Inventory - tertiles)) with caregiver depression (Patient Health Questionnaire [PHQ-9]) and well-being (CDC Unhealthy Days) outcomes. Caregivers with high subjective caregiving burden had a 41% probability of most severe depression category (probability: 0.41, 95% CI: 0.25, 0.57) and a 39% probability of low well-being (probability: 0.39, 95% CI: 0.27, 0.51), compared to caregivers with low subjective burden. Caregivers with high subjective caregiving burden and high objective direct caregiving burden had an adjusted 30% probability of elevated depressive symptoms (PHQ≥10). Among Indian T1D caregivers, high subjective caregiving burden and objective direct caregiving burden were associated with a high risk for caregiver depression and poorer well-being. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. High lifetime probability of screen-detected cervical abnormalities.

    PubMed

    Pankakoski, Maiju; Heinävaara, Sirpa; Sarkeala, Tytti; Anttila, Ahti

    2017-12-01

    Objective. Regular screening and follow-up is an important key to cervical cancer prevention; however, screening inevitably detects mild or borderline abnormalities that would never progress to a more severe stage. We analysed the cumulative probability and recurrence of cervical abnormalities in the Finnish organized screening programme during a 22-year follow-up. Methods. Screening histories were collected for 364,487 women born between 1950 and 1965. Data consisted of 1,207,017 routine screens and 88,143 follow-up screens between 1991 and 2012. Probabilities of cervical abnormalities by age were estimated using logistic regression and generalized estimating equations methodology. Results. The probability of experiencing any abnormality at least once at ages 30-64 was 34.0% (95% confidence interval [CI]: 33.3-34.6%). Probability was 5.4% (95% CI: 5.0-5.8%) for results warranting referral and 2.2% (95% CI: 2.0-2.4%) for results with histologically confirmed findings. Previous occurrences were associated with an increased risk of detecting new ones, specifically in older women. Conclusion. A considerable proportion of women experience at least one abnormal screening result during their lifetime, and yet very few eventually develop an actual precancerous lesion. Re-evaluation of diagnostic criteria concerning mild abnormalities might improve the balance of harms and benefits of screening. Special monitoring of women with recurrent abnormalities especially at older ages may also be needed.

  5. Relationship of lead, mercury, mirex, dichlorodiphenyldichloroethylene, hexachlorobenzene, and polychlorinated biphenyls to timing of menarche among Akwesasne Mohawk girls.

    PubMed

    Denham, Melinda; Schell, Lawrence M; Deane, Glenn; Gallo, Mia V; Ravenscroft, Julia; DeCaprio, Anthony P

    2005-02-01

    Children are commonly exposed at background levels to several ubiquitous environmental pollutants, such as lead and persistent organic pollutants, that have been linked to neurologic and endocrine effects. These effects have prompted concern about alterations in human reproductive development. Few studies have examined the effects of these toxicants on human sexual maturation at levels commonly found in the general population, and none has been able to examine multiple toxicant exposures. The aim of the current investigation was to examine the relationship between attainment of menarche and levels of 6 environmental pollutants to which children are commonly exposed at low levels, ie, dichlorodiphenyldichloroethylene (p,p'-DDE), hexachlorobenzene (HCB), polychlorinated biphenyls (PCBs), mirex, lead, and mercury. This study was conducted with residents of the Akwesasne Mohawk Nation, a sovereign territory that spans the St Lawrence River and the boundaries of New York State and Ontario and Quebec, Canada. Since the 1950s, the St Lawrence River has been a site of substantial industrial development, and the Nation is currently adjacent to a US National Priority Superfund site. PCB, p,p'-DDE, HCB, and mirex levels exceeding the US Food and Drug Administration recommended tolerance limits for human consumption have been found in local animal species. The present analysis included 138 Akwesasne Mohawk Nation girls 10 to 16.9 years of age. Blood samples and sociodemographic data were collected by Akwesasne community members, without prior knowledge of participants' exposure status. Attainment of menses (menarche) was assessed as present or absent at the time of the interview. Congener-specific PCB analysis was available, and all 16 PCB congeners detected in >50% of the sample were included in analyses (International Union of Pure and Applied Chemistry numbers 52, 70, 74, 84, 87, 95, 99, 101 [+90], 105, 110, 118, 138 [+163 and 164], 149 [+123], 153, 180, and 187). 
Probit analysis was used to determine the median age at menarche for the sample. Binary logistic regression analysis was used to determine predictors of menarcheal status. Six toxicants (p,p'-DDE, HCB, PCBs, mirex, lead, and mercury) were entered into the logistic regression model. Age, socioeconomic status (SES), and BMI were tested as potential confounders and were included in the model at P < .05. Interactions among toxicants were also evaluated. Toxicant levels were measured in blood for this sample and were consistent with long-term exposure to a variety of toxicants in multiple media. Mercury levels were at or below background levels, all lead levels were well below the Centers for Disease Control and Prevention action limit of 10 µg/dL, and PCB levels were consistent with a cumulative, continuing exposure pattern. The median age at menarche for the total sample was 12.2 years. The predicted age at menarche for girls with lead levels above the median (1.2 µg/dL) was 10.5 months later than that for girls with lead levels below the median. In the logistic regression analysis, age was the strongest predictor of menarcheal status and SES was also a significant predictor but BMI was not. The logistic regression analysis that corrected for age, SES, and other pollutants (p,p'-DDE, HCB, mirex, and mercury) indicated that, at their respective geometric means, lead (geometric mean: 0.49 µg/dL) was associated with a significantly lower probability of having reached menarche (beta = -1.29) and a group of 4 potentially estrogenic PCB congeners (E-PCB) (geometric mean: 0.12 ppb; International Union of Pure and Applied Chemistry numbers 52, 70, 101 [+90], and 187) was associated with a significantly greater probability of having reached menarche (beta = 2.13). Predicted probabilities at different levels of lead and PCBs were calculated on the basis of the logistic regression model. 
At the respective means of all toxicants and SES, 69% of 12-year-old girls were predicted to have reached menarche. However, at the 75th percentile of lead levels, only 10% of 12-year-old Mohawk girls were predicted to have reached menarche; at the 75th percentile of E-PCB levels, 86% of 12-year-old Mohawk girls were predicted to have reached menarche. No association was observed between mirex, p,p'-DDE, or HCB and menarcheal status. Although BMI was not a significant predictor, we tested BMI in the logistic regression model; it had little effect on the relationships between menarcheal status and either lead or E-PCB. In models testing toxicant interactions, age, SES, lead levels, and PCB levels continued to be significant predictors of menarcheal status. When each toxicant was tested in a logistic regression model correcting only for age and SES, we observed little change in the effects of lead or E-PCB on menarcheal status. The analysis of multichemical exposure among Akwesasne Mohawk Nation adolescent girls suggests that the attainment of menarche may be sensitive to relatively low levels of lead and certain PCB congeners. This study is distinguished by the ability to test many toxicants simultaneously and thus to exclude effects from unmeasured but coexisting exposures. By testing several PCB congener groupings, we were able to determine that specifically a group of potentially estrogenic PCB congeners affected the odds of reaching menarche. The lead and PCB findings are consistent with the literature and are biologically plausible. The sample size, cross-sectional study design, and possible occurrence of confounders beyond those tested suggest that results should be interpreted cautiously. Additional investigation to determine whether such low toxicant levels may affect reproduction and disorders of the reproductive system is warranted.
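
    The predicted probabilities quoted in this abstract follow from shifting a baseline probability on the log-odds scale. The sketch below shows that arithmetic with the reported values (69% baseline, beta = -1.29 for lead, beta = 2.13 for E-PCB). It assumes a plain logistic model, and the abstract does not state the scale (raw or transformed) on which each toxicant entered the model, so a "unit" here is an assumption rather than a measured quantity.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def shifted_probability(baseline_p: float, beta: float, delta: float) -> float:
    """Probability after moving a predictor by `delta` units.

    The predictor's coefficient `beta` is on the log-odds scale, so the
    shift is added to logit(baseline_p) and then mapped back.
    """
    return sigmoid(logit(baseline_p) + beta * delta)


def implied_shift(baseline_p: float, new_p: float, beta: float) -> float:
    """Units of predictor change needed to move baseline_p to new_p."""
    return (logit(new_p) - logit(baseline_p)) / beta


if __name__ == "__main__":
    # Reported values: 69% baseline, beta = -1.29 (lead), 10% at the 75th percentile.
    print(shifted_probability(0.69, -1.29, 1.0))
    print(implied_shift(0.69, 0.10, -1.29))
```

    A negative coefficient lowers the predicted probability and a positive one raises it, matching the directions reported for lead and E-PCB.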

  6. Assessing DSM-5 latent subtypes of acute stress disorder dissociative or intrusive?

    PubMed

    Armour, Cherie; Hansen, Maj

    2015-02-28

    Acute Stress Disorder (ASD) was first included in the DSM-IV in 1994. It was proposed to account for traumatic responding in the early post trauma phase and to act as an identifier for later Posttraumatic Stress Disorder (PTSD). Unlike PTSD it included a number of dissociative indicators. The revised DSM-5 PTSD criterion included a dissociative-PTSD subtype. The current study assessed if a dissociative-ASD subtype may be present for DSM-5 ASD. Moreover, we assessed if a number of risk factors resulted in an increased probability of membership in symptomatic compared to a baseline ASD profile. We used data from 450 bank robbery victims. Latent profile analysis (LPA) was used to uncover latent profiles of ASD. Multinomial logistic regression was used to determine if female gender, age, social support, peritraumatic panic, somatization, and number of trauma exposures increased or decreased the probability of profile membership. Four latent profiles were uncovered and included an intrusion rather than dissociative subtype. Increased age and social support decreased the probability of individuals being grouped into the intrusion subtype whereas increased peritraumatic panic and somatization increased the probability of individuals being grouped into the intrusion subtype. Findings are discussed in regard to the ICD-11 and the DSM-5. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  7. Sexual abuse, residential schooling and probable pathological gambling among Indigenous Peoples.

    PubMed

    Dion, Jacinthe; Cantinotti, Michael; Ross, Amélie; Collin-Vézina, Delphine

    2015-06-01

    Sexual abuse leads to short-term and long-lasting pervasive outcomes, including addictions. Among Indigenous Peoples, sexual abuse experienced in the context of residential schooling may have led to unresolved grief that is contributing to social problems, such as pathological (disordered) gambling. The aim of this study is to investigate the link between child sexual abuse, residential schooling and probable pathological gambling. The participants were 358 Indigenous persons (54.2% women) aged between 18 and 87 years, from two communities and two semi-urban centers in Quebec (Canada). Probable pathological gambling was evaluated using the South Oaks Gambling Screen (SOGS), and sexual abuse and residential schooling were assessed with dichotomous questions (yes/no). The results indicate an 8.7% past-year prevalence rate of pathological gambling problems among participants, which is high compared with the general Canadian population. Moreover, 35.4% were sexually abused, while 28.1% reported having been schooled in a residential setting. The results of a logistic regression also indicate that experiences of child sexual abuse and residential schooling are associated with probable pathological gambling among Indigenous Peoples. These findings underscore the importance of using an ecological approach when treating gambling, to address childhood traumas alongside current addiction problems. Copyright © 2015 Elsevier Ltd. All rights reserved.

  8. A comparison of the fertility of Dominican, Puerto Rican and mainland Puerto Rican adolescents.

    PubMed

    Fennelly, K; Cornwell, G; Casper, L

    1992-01-01

    Data from three fertility surveys are used to examine the probabilities and determinants of adolescent births among Dominican and Puerto Rican women. Young women in the Dominican Republic are the most likely to have had a child by each year of age from 14 through 24, followed by young women on the Island of Puerto Rico; the probability of an early birth is lowest for Puerto Rican women on the U.S. mainland. Eighteen percent of Dominican women have had a child before their 18th birthday, compared with 13% of women living in Puerto Rico, and 10% of Puerto Rican women in metropolitan New York. The cumulative probabilities that Puerto Rican women will have borne a child before their 20th birthday are almost identical, whether the women live on the island or the U.S. mainland, but the difference between Puerto Rican and Dominican women widens. The order is reversed, however, in the analysis of premarital births: The probability of a premarital birth during adolescence is highest for Puerto Rican women in New York, and lowest for Dominican women. In a separate logistic regression analysis, education and age at first sexual intercourse are shown to be important determinants of adolescent fertility in all three populations.

  9. Raising the speed limit from 75 to 80mph on Utah rural interstates: Effects on vehicle speeds and speed variance.

    PubMed

    Hu, Wen

    2017-06-01

    In November 2010 and October 2013, Utah increased speed limits on sections of rural interstates from 75 to 80mph. Effects on vehicle speeds and speed variance were examined. Speeds were measured in May 2010 and May 2014 within the new 80mph zones, and at a nearby spillover site and at more distant control sites where speed limits remained 75mph. Log-linear regression models estimated percentage changes in speed variance and mean speeds for passenger vehicles and large trucks associated with the speed limit increase. Logistic regression models estimated effects on the probability of passenger vehicles exceeding 80, 85, or 90mph and large trucks exceeding 80mph. Within the 80mph zones and at the spillover location in 2014, mean passenger vehicle speeds were significantly higher (4.1% and 3.5%, respectively), as were the probabilities that passenger vehicles exceeded 80mph (122.3% and 88.5%, respectively), than would have been expected without the speed limit increase. Probabilities that passenger vehicles exceeded 85 and 90mph were non-significantly higher than expected within the 80mph zones. For large trucks, the mean speed and probability of exceeding 80mph were higher than expected within the 80mph zones. Only the increase in mean speed was significant. Raising the speed limit was associated with non-significant increases in speed variance. The study adds to the wealth of evidence that increasing speed limits leads to higher travel speeds and an increased probability of exceeding the new speed limit. Results moreover contradict the claim that increasing speed limits reduces speed variance. Although the estimated increases in mean vehicle speeds may appear modest, prior research suggests such increases would be associated with substantial increases in fatal or injury crashes. This should be considered by lawmakers considering increasing speed limits. Copyright © 2017 Elsevier Ltd and National Safety Council. All rights reserved.

  10. Factors associated with blood transfusion in donor hepatectomy: results from 2344 donors at a large single center.

    PubMed

    Choi, Seong-Soo; Cho, Seong-Sik; Kim, Sung-Hoon; Jun, In-Gu; Hwang, Gyu-Sam; Kim, Young-Kug

    2013-12-15

    The safety of healthy living donors undergoing hepatic resection for living-donor liver transplantation is of paramount concern. Although blood transfusions have been associated with morbidity and mortality after hepatectomy, there is limited information about the risk factors associated with blood transfusion in living liver donors. We retrospectively analyzed 2344 donors who underwent a hepatectomy for living-donor liver transplantation. Logistic regression analysis was performed to determine blood transfusion predictors in living-donor hepatectomy. Of these donors, 48 (2.0%) and 97 (4.1%) were transfused with packed red blood cell (PRBC) and fresh-frozen plasma (FFP), respectively. The amount of PRBC and FFP administered to donors transfused with blood products were 1.9±0.8 and 3.7±2.5 units, respectively. In multivariate logistic regression analysis, a low preoperative hemoglobin level was found to be an independent predictor of PRBC transfusion in donor hepatectomy (odds ratio=0.585; 95% confidence interval=0.451-0.758; P<0.001). A high graft-to-donor weight ratio predicted an FFP transfusion in donor hepatectomy (odds ratio=2.997; 95% confidence interval=1.226-7.327; P=0.016). These results indicate that, in donor hepatectomy, the preoperative hemoglobin value and graft-to-donor weight ratio can provide useful information on the probability of PRBC and FFP transfusion, respectively.
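
    The odds ratios and confidence intervals reported above can be mapped back to the underlying logistic regression coefficients. The sketch below does this for both reported results, assuming the intervals are Wald intervals (symmetric on the log-odds scale), which the abstract does not state explicitly.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def coef_from_odds_ratio(odds_ratio: float, ci_low: float, ci_high: float):
    """Recover (beta, standard error, z, two-sided p) from an OR and its 95% CI.

    Assumes a Wald interval, i.e. exp(beta +/- Z_95 * se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta, se, z, p


if __name__ == "__main__":
    # Preoperative hemoglobin and PRBC transfusion (values from the abstract).
    print(coef_from_odds_ratio(0.585, 0.451, 0.758))
    # Graft-to-donor weight ratio and FFP transfusion (values from the abstract).
    print(coef_from_odds_ratio(2.997, 1.226, 7.327))
```

    An odds ratio below 1 corresponds to a negative coefficient (higher hemoglobin, lower odds of transfusion), and the recovered p-values can be checked against the reported P<0.001 and P=0.016.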

  11. Logistic regression model can reduce unnecessary artificial liver support in hepatitis B virus-associated acute-on-chronic liver failure: decision curve analysis.

    PubMed

    Qin, Gang; Bian, Zhao-Lian; Shen, Yi; Zhang, Lei; Zhu, Xiao-Hong; Liu, Yan-Mei; Shao, Jian-Guo

    2016-06-04

    Several models have been proposed to predict the short-term outcome of acute-on-chronic liver failure (ACLF) after treatment. We aimed to determine whether better decisions for artificial liver support system (ALSS) treatment could be made with a model than without, through decision curve analysis (DCA). The medical profiles of a cohort of 232 patients with hepatitis B virus (HBV)-associated ACLF were retrospectively analyzed to explore the role of plasma prothrombin activity (PTA), model for end-stage liver disease (MELD) and logistic regression model (LRM) in identifying patients who could benefit from ALSS. The accuracy and reliability of PTA, MELD and LRM were evaluated with previously reported cutoffs. DCA was performed to evaluate the clinical role of these models in predicting the treatment outcome. With the cut-off value of 0.2, LRM had a sensitivity of 92.6%, a specificity of 42.3% and an area under the receiver operating characteristic curve (AUC) of 0.68, showing better discrimination than PTA and MELD. DCA revealed that LRM-guided ALSS treatment was superior to other strategies, including "treating all" and MELD-guided therapy, for midrange threshold probabilities of 16 to 64%. The use of LRM-guided ALSS treatment could increase both the accuracy and efficiency of this procedure, allowing the avoidance of unnecessary ALSS.
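
    Decision curve analysis compares strategies by their net benefit at a chosen threshold probability. The sketch below applies the standard net-benefit formula to the sensitivity and specificity reported for the LRM at the 0.2 cut-off. The outcome prevalence is not given in the abstract, so `ASSUMED_PREVALENCE` is a hypothetical value used only to make the arithmetic concrete.

```python
def net_benefit(sensitivity: float, specificity: float,
                prevalence: float, threshold: float) -> float:
    """Net benefit per patient at a given threshold probability.

    Net benefit = TP/n - (FP/n) * threshold / (1 - threshold), where
    TP/n and FP/n are the true- and false-positive proportions.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos - false_pos * threshold / (1.0 - threshold)


def net_benefit_treat_all(prevalence: float, threshold: float) -> float:
    """Treating everyone: sensitivity 1, specificity 0."""
    return net_benefit(1.0, 0.0, prevalence, threshold)


# HYPOTHETICAL prevalence; the abstract does not report the outcome rate.
ASSUMED_PREVALENCE = 0.5

if __name__ == "__main__":
    threshold = 0.2
    print(net_benefit(0.926, 0.423, ASSUMED_PREVALENCE, threshold))
    print(net_benefit_treat_all(ASSUMED_PREVALENCE, threshold))
```

    A full decision curve repeats this calculation across a range of thresholds, recomputing sensitivity and specificity at each one; the single-threshold version here only illustrates the formula.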

  12. Partial meniscectomy is associated with increased risk of incident radiographic osteoarthritis and worsening cartilage damage in the following year

    PubMed Central

    Roemer, Frank W.; Kwoh, C. Kent; Hannon, Michael J.; Hunter, David J.; Eckstein, Felix; Grago, Jason; Boudreau, Robert M.; Englund, Martin; Guermazi, Ali

    2016-01-01

    Objectives. To assess whether partial meniscectomy is associated with increased risk of radiographic osteoarthritis (ROA) and worsening cartilage damage in the following year. Methods. We studied 355 knees from the Osteoarthritis Initiative that developed ROA (Kellgren-Lawrence grade ≥ 2), which were matched with control knees. The MR images were assessed using the semi-quantitative MOAKS system. Conditional logistic regression was applied to estimate risk of incident ROA. Logistic regression was used to assess the risk of worsening cartilage damage in knees with partial meniscectomy that developed ROA. Results. In the group with incident ROA, 4.4% underwent partial meniscectomy during the year prior to the case-defining visit, compared with none of the knees that did not develop ROA. All (n=31) knees that had partial meniscectomy and 58.9% (n=165) of the knees with prevalent meniscal damage developed ROA (OR=2.51, 95% CI [1.73, 3.64]). In knees that developed ROA, partial meniscectomy was associated with an increased risk of worsening cartilage damage (OR=4.51, 95% CI [1.53, 13.33]). Conclusions. The probability of having had partial meniscectomy was higher in knees that developed ROA. When looking only at knees that developed ROA, partial meniscectomy was associated with greater risk of worsening cartilage damage. PMID:27121931

  13. The effect of playing tactics and situational variables on achieving score-box possessions in a professional soccer team.

    PubMed

    Lago-Ballesteros, Joaquin; Lago-Peñas, Carlos; Rey, Ezequiel

    2012-01-01

    The aim of this study was to analyse the influence of playing tactics, opponent interaction and situational variables on achieving score-box possessions in professional soccer. The sample consisted of 908 possessions obtained by a team from the Spanish soccer league in 12 matches played during the 2009-2010 season. Multidimensional qualitative data obtained from 12 ordered categorical variables were used. Sampled matches were registered by the AMISCO PRO system. Data were analysed using chi-square analysis and multiple logistic regression analysis. Of 908 possessions, 303 (33.4%) produced score-box possessions, 477 (52.5%) achieved progression and 128 (14.1%) failed to reach any sort of progression. Multiple logistic regression showed that, for the main variable "team possession type", direct attacks and counterattacks were three times more effective than elaborate attacks for producing a score-box possession (P < 0.05). Team possessions originating from the middle zones and played against fewer than six defending players (P < 0.001) registered a higher success rate than those started in the defensive zone against a balanced defence. When the team was drawing or winning, the probability of reaching the score-box decreased by 43 and 53 percent, respectively, compared with the losing situation (P < 0.05). Accounting for opponent interactions and situational variables is critical to evaluate the effectiveness of offensive playing tactics on producing score-box possessions.

  14. Induced abortion: risk factors for adolescent female students, a Brazilian study.

    PubMed

    Correia, Divanise S; Cavalcante, Jairo C; Maia, Eulália M C

    2009-12-16

    The purpose of this study was to analyze risk factors for abortion among female teenagers from 12 to 19 years of age in the city of Maceió, Brazil. This is a cross-sectional study, conducted in ten schools. The sample was calculated by considering the number of admissions for postabortion curettage, obtained from the Information System of Hospitalization. Data were obtained through a semi-structured questionnaire divided into three basic blocks of data: sociodemographic, sexual life, and pregnancy/abortion. To analyze the data, the logistic regression model was used. The Forward Method was chosen to set the final model that minimizes the number of variables and maximizes the accuracy of the model. The significance analysis of the dichotomous variables yielded eight significant variables. Two of them are protective against abortion: being aged 12-14 years and talking with parents about sex. After the logistic regression, the receipt of support for abortion was the most significant variable of all. An adolescent who has an active sexual life and a previous pregnancy, is married, and has received support for an abortion has a 99.74% probability of an abortion. The results demonstrate the importance of the group in adolescence, and having a partner who supports and approves of the pregnancy appears as a statistically significant preventive factor for abortion. This shows the importance of support and companionship for adolescent women.
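    A probability such as the 99.74% quoted above is obtained by passing a fitted linear predictor through the logistic function. The following is a minimal sketch of that calculation only; the function name `predicted_probability` and all coefficient values are hypothetical and are not taken from the study.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical intercept and coefficients for four binary risk factors,
# all present (x = 1). These numbers are illustrative assumptions only.
p_all_factors = predicted_probability(-2.0, [2.0, 2.0, 2.0, 2.0], [1, 1, 1, 1])
```

    With every risk factor present the linear predictor is large and positive, so the predicted probability approaches 1.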

  15. Identifying patterns of item missing survey data using latent groups: an observational study

    PubMed Central

    McElwee, Paul; Nathan, Andrea; Burton, Nicola W; Turrell, Gavin

    2017-01-01

    Objectives To examine whether respondents to a survey of health and physical activity and potential determinants could be grouped according to the questions they missed, known as ‘item missing’. Design Observational study of longitudinal data. Setting Residents of Brisbane, Australia. Participants 6901 people aged 40–65 years in 2007. Materials and methods We used a latent class model with a mixture of multinomial distributions and chose the number of classes using the Bayesian information criterion. We used logistic regression to examine if participants’ characteristics were associated with their modal latent class. We used logistic regression to examine whether the amount of item missing in a survey predicted wave missing in the following survey. Results Four per cent of participants missed almost one-fifth of the questions, and this group missed more questions in the middle of the survey. Eighty-three per cent of participants completed almost every question, but had a relatively high missing probability for a question on sleep time, a question which had an inconsistent presentation compared with the rest of the survey. Participants who completed almost every question were generally younger and more educated. Participants who completed more questions were less likely to miss the next longitudinal wave. Conclusions Examining patterns in item missing data has improved our understanding of how missing data were generated and has informed future survey design to help reduce missing data. PMID:29084795

  16. Bayesian data fusion for spatial prediction of categorical variables in environmental sciences

    NASA Astrophysics Data System (ADS)

    Gengler, Sarah; Bogaert, Patrick

    2014-12-01

    First developed to predict continuous variables, Bayesian Maximum Entropy (BME) has become a complete framework in the context of space-time prediction since it has been extended to predict categorical variables and mixed random fields. This method proposes solutions to combine several sources of data whatever the nature of the information. However, the various attempts that were made to adapt the BME methodology to categorical variables and mixed random fields faced some limitations, such as a high computational burden. The main objective of this paper is to overcome this limitation by generalizing the Bayesian Data Fusion (BDF) theoretical framework to categorical variables, which is in some sense a simplification of the BME method through the convenient conditional independence hypothesis. The BDF methodology for categorical variables is first described and then applied to a practical case study: the estimation of soil drainage classes using a soil map and point observations in the sandy area of Flanders around the city of Mechelen (Belgium). The BDF approach is compared to BME along with more classical approaches, such as Indicator CoKriging (ICK) and logistic regression. Estimators are compared using various indicators, namely the Percentage of Correctly Classified locations (PCC) and the Average Highest Probability (AHP). Although the BDF methodology for categorical variables is in some sense a simplification of the BME approach, both methods lead to similar results and have strong advantages compared to ICK and logistic regression.

  17. Postoperative complications of contemporary open and robot-assisted laparoscopic radical prostatectomy using standardized reporting systems.

    PubMed

    Pompe, Raisa S; Beyer, Burkhard; Haese, Alexander; Preisser, Felix; Michl, Uwe; Steuber, Thomas; Graefen, Markus; Huland, Hartwig; Karakiewicz, Pierre I; Tilki, Derya

    2018-05-04

    To analyze time trends and contemporary rates of postoperative complications after RP and to compare the complication profile of ORP and RALP using standardized reporting systems. Retrospective analysis of 13,924 RP patients in a single institution (2005 to 2015). Complications were collected during hospital stay and via standardized questionnaire 3 months after and grouped into eight schemes. Since 2013, the revised Clavien-Dindo classification was used (n = 4,379). Annual incidence rates of different complications were graphically displayed. Multivariable logistic regression analyses compared complications between ORP and RALP after inverse probability of treatment weighting (IPTW). After introduction of standardized classification systems, complication rates have increased with a contemporary rate of 20.6% (2013 - 2015). While minor Clavien-Dindo grades represented the majority (I: 10.6%; II: 7.9%), severe complications (grades IV-V) were rare (<1%). In logistic regression analyses after IPTW, RALP was associated with less blood loss, shorter catheterization time and lower risk for Clavien-Dindo grade II and III complications. Our results emphasize the importance of standardized reporting systems for quality control and comparison across approaches or institutions. Contemporary complication rates in a high volume center remain low and are most frequently minor Clavien-Dindo grades. RALP had a slightly better complication profile compared to ORP. This article is protected by copyright. All rights reserved.
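    The weighting step named in this abstract can be illustrated with a short sketch. It is not the authors' procedure: the function name `iptw_weights`, the example propensity scores, and the choice of unstabilized weights are all assumptions made for illustration.

```python
def iptw_weights(treated, propensity):
    """Unstabilized IPTW weights (assumed form).

    A treated subject gets 1/p and an untreated subject gets 1/(1 - p),
    where p is the estimated probability of receiving the treatment.
    """
    weights = []
    for t, p in zip(treated, propensity):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / p if t else 1.0 / (1.0 - p))
    return weights

# Hypothetical example: two treated and two untreated patients.
example_weights = iptw_weights([1, 1, 0, 0], [0.5, 0.25, 0.5, 0.75])
```

    In practice the propensity scores would themselves come from a model (often a logistic regression of treatment on baseline covariates), and the weights would then be passed to the outcome regression.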

  18. Predicting Visual Distraction Using Driving Performance Data

    PubMed Central

    Kircher, Katja; Ahlstrom, Christer

    2010-01-01

    Behavioral variables are often used as performance indicators (PIs) of visual or internal distraction induced by secondary tasks. The objective of this study is to investigate whether visual distraction can be predicted by driving performance PIs in a naturalistic setting. Visual distraction is here defined by a gaze based real-time distraction detection algorithm called AttenD. Seven drivers used an instrumented vehicle for one month each in a small scale field operational test. For each of the visual distraction events detected by AttenD, seven PIs such as steering wheel reversal rate and throttle hold were calculated. Corresponding data were also calculated for time periods during which the drivers were classified as attentive. For each PI, means for the distracted and attentive states were compared using t-tests for different time-window sizes (2 – 40 s), and the window width with the smallest resulting p-value was selected as optimal. Based on the optimized PIs, logistic regression was used to predict whether the drivers were attentive or distracted. The logistic regression resulted in predictions which were 76 % correct (sensitivity = 77 % and specificity = 76 %). The conclusion is that there is a relationship between behavioral variables and visual distraction, but the relationship is not strong enough to accurately predict visual driver distraction. Instead, behavioral PIs are probably best suited as complementary to eye tracking based algorithms in order to make them more accurate and robust. PMID:21050615
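    The percentage correct, sensitivity and specificity reported above all derive from a two-by-two confusion matrix. A minimal sketch of the arithmetic follows; the function name `classification_rates` and the counts are hypothetical and are not data from the study.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: distracted events predicted as distracted / as attentive.
    tn/fp: attentive periods predicted as attentive / as distracted.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts, chosen only to illustrate the calculation.
sens, spec, acc = classification_rates(tp=77, fn=23, tn=76, fp=24)
```

    Accuracy is a weighted average of sensitivity and specificity, so it always lies between the two.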

  19. Peers, tobacco advertising, and secondhand smoke exposure influences smoking initiation in diverse adolescents.

    PubMed

    Voorhees, Carolyn C; Ye, Cong; Carter-Pokras, Olivia; MacPherson, Laura; Kanamori, Mariano; Zhang, Guangyu; Chen, Lu; Fiedler, Robert

    2011-01-01

    Identify demographic, social, and environmental factors associated with smoking initiation in a large, racially and ethnically diverse sample of underage youth participating in the 2006 Maryland Youth Tobacco Survey. Cross-sectional, multistage, probability sample survey. Schools (308 middle and high schools) in Maryland. Subjects were 12- to 17-year-old adolescents participating in a school-based survey. New smokers and nonsmokers were included in the analysis (n  =  57,072). Social and media influence, secondhand smoke exposure, tobacco product use, and demographic information including age, race/ethnicity, and geographic region. Chi-square and multiple logistic regression analyses controlling for clustering. Hispanic and Hawaiian/Pacific Islander youth were most likely and Asian and Black youth were least likely to be new smokers. Smoking initiation was positively associated with higher age, living with a current smoker, secondhand smoke exposure, exposure to advertisements for tobacco products, having more friends that smoke, tobacco products offered by friends, risk perceptions, and use of other tobacco products such as smokeless tobacco and cigars. Multivariate logistic regression results suggested that composite measures of peer influence, advertising exposure, and secondhand smoke exposure were independently associated with smoking initiation. Media, peer influence, and secondhand smoke exposure were the most important factors influencing smoking initiation and were common to all racial/ethnic groups in this study. Interventions combining targeted public awareness, education, and media campaigns directed at parents/guardians should be investigated.

  20. Religious variations in perceived infertility and inconsistent contraceptive use among unmarried young adults in the United States.

    PubMed

    Burdette, Amy M; Haynes, Stacy H; Hill, Terrence D; Bartkowski, John P

    2014-06-01

    In this paper, we examine associations among personal religiosity, perceived infertility, and inconsistent contraceptive use among unmarried young adults (ages 18-29). The data for this investigation came from the National Survey of Reproductive and Contraceptive Knowledge (n = 1,695). We used multinomial logistic regression to model perceived infertility, adjusted probabilities to model rationales for perceived infertility, and binary logistic regression to model inconsistent contraceptive use. Evangelical Protestants were more likely than non-affiliates to believe that they were infertile. Among the young women who indicated some likelihood of infertility, evangelical Protestants were also more likely than their other Protestant or non-Christian faith counterparts to believe that they were infertile because they had unprotected sex without becoming pregnant. Although evangelical Protestants were more likely to exhibit inconsistent contraception use than non-affiliates, we were unable to attribute any portion of this difference to infertility perceptions. Whereas most studies of religion and health emphasize the salubrious role of personal religiosity, our results suggest that evangelical Protestants may be especially likely to hold misconceptions about their fertility. Because these misconceptions fail to explain higher rates of inconsistent contraception use among evangelical Protestants, additional research is needed to understand the principles and motives of this unique religious community. Copyright © 2014 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.

  1. Robust logistic regression to narrow down the winner's curse for rare and recessive susceptibility variants.

    PubMed

    Kesselmeier, Miriam; Lorenzo Bermejo, Justo

    2017-11-01

    Logistic regression is the most common technique used for genetic case-control association studies. A disadvantage of standard maximum likelihood estimators of the genotype relative risk (GRR) is their strong dependence on outlier subjects, for example, patients diagnosed at unusually young age. Robust methods are available to constrain outlier influence, but they are scarcely used in genetic studies. This article provides a non-intimidating introduction to robust logistic regression, and investigates its benefits and limitations in genetic association studies. We applied the bounded Huber and extended the R package 'robustbase' with the re-descending Hampel functions to down-weight outlier influence. Computer simulations were carried out to assess the type I error rate, mean squared error (MSE) and statistical power according to major characteristics of the genetic study and investigated markers. Simulations were complemented with the analysis of real data. Both standard and robust estimation controlled type I error rates. Standard logistic regression showed the highest power but standard GRR estimates also showed the largest bias and MSE, in particular for associated rare and recessive variants. For illustration, a recessive variant with a true GRR=6.32 and a minor allele frequency=0.05 investigated in a 1000 case/1000 control study by standard logistic regression resulted in power=0.60 and MSE=16.5. The corresponding figures for Huber-based estimation were power=0.51 and MSE=0.53. Overall, Hampel- and Huber-based GRR estimates did not differ much. Robust logistic regression may represent a valuable alternative to standard maximum likelihood estimation when the focus lies on risk prediction rather than identification of susceptibility variants. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  2. CUSUM-Logistic Regression analysis for the rapid detection of errors in clinical laboratory test results.

    PubMed

    Sampson, Maureen L; Gounden, Verena; van Deventer, Hendrik E; Remaley, Alan T

    2016-02-01

    The main drawback of the periodic analysis of quality control (QC) material is that test performance is not monitored in time periods between QC analyses, potentially leading to the reporting of faulty test results. The objective of this study was to develop a patient based QC procedure for the more timely detection of test errors. Results from a Chem-14 panel measured on the Beckman LX20 analyzer were used to develop the model. Each test result was predicted from the other 13 members of the panel by multiple regression, which resulted in correlation coefficients between the predicted and measured result of >0.7 for 8 of the 14 tests. A logistic regression model, which utilized the measured test result, the predicted test result, the day of the week and time of day, was then developed for predicting test errors. The output of the logistic regression was tallied by a daily CUSUM approach and used to predict test errors, with a fixed specificity of 90%. The mean average run length (ARL) before error detection by CUSUM-Logistic Regression (CSLR) was 20 with a mean sensitivity of 97%, which was considerably shorter than the mean ARL of 53 (sensitivity 87.5%) for a simple prediction model that only used the measured result for error detection. A CUSUM-Logistic Regression analysis of patient laboratory data can be an effective approach for the rapid and sensitive detection of clinical laboratory errors. Published by Elsevier Inc.
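    The idea of tallying logistic regression output with a CUSUM can be sketched as below. This is an assumed, generic one-sided CUSUM, not the CSLR procedure itself: the function name `cusum_first_alarm`, the reference value and the decision threshold are illustrative choices that the abstract does not specify.

```python
def cusum_first_alarm(probabilities, reference=0.5, threshold=1.0):
    """Return the index of the first alarm, or None if no alarm occurs.

    Each predicted error probability adds (p - reference) to a running sum
    that is floored at zero; an alarm is raised once the sum exceeds threshold.
    """
    running = 0.0
    for index, p in enumerate(probabilities):
        running = max(0.0, running + (p - reference))
        if running > threshold:
            return index
    return None

# Hypothetical stream: a run of elevated predicted error probabilities.
alarm_index = cusum_first_alarm([0.75] * 10)
```

    Low probabilities pull the sum back toward zero, so isolated high values do not trigger an alarm while a sustained run does.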

  3. Determinants of unmet need for family planning in rural Burkina Faso: a multilevel logistic regression analysis.

    PubMed

    Wulifan, Joseph K; Jahn, Albrecht; Hien, Hervé; Ilboudo, Patrick Christian; Meda, Nicolas; Robyn, Paul Jacob; Saidou Hamadou, T; Haidara, Ousmane; De Allegri, Manuela

    2017-12-19

    Unmet need for family planning has implications for women and their families, such as unsafe abortion, physical abuse, and poor maternal health. Contraceptive knowledge has increased across low-income settings, yet unmet need remains high with little information on the factors explaining it. This study assessed factors associated with unmet need among pregnant women in rural Burkina Faso. We collected data on pregnant women through a population-based survey conducted in 24 rural districts between October 2013 and March 2014. Multivariate multilevel logistic regression was used to assess the association between unmet need for family planning and a selection of relevant demand- and supply-side factors. Of the 1309 pregnant women covered in the survey, 239 (18.26%) reported experiencing unmet need for family planning. Pregnant women with more than three living children [OR = 1.80; 95% CI (1.11-2.91)], those with a child younger than 1 year [OR = 1.75; 95% CI (1.04-2.97)], pregnant women whose partners disapprove of contraceptive use [OR = 1.51; 95% CI (1.03-2.21)] and women who desired fewer children than their partners' preferred number of children [OR = 1.907; 95% CI (1.361-2.672)] were significantly more likely to experience unmet need for family planning, while health staff training in family planning logistics management [OR = 0.46; 95% CI (0.24-0.73)] was associated with a lower probability of experiencing unmet need for family planning. Findings suggest the need to strengthen family planning interventions in Burkina Faso to ensure greater uptake of contraceptive use and thus reduce unmet need for family planning.
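    Odds ratios and confidence intervals of the kind listed above are conventionally obtained by exponentiating a logistic regression coefficient and its Wald limits. The sketch below shows that standard calculation only; the function name `odds_ratio_with_ci`, the coefficient and the standard error are hypothetical, and it ignores the multilevel structure of the study's model.

```python
import math

def odds_ratio_with_ci(beta, standard_error, z=1.96):
    """Return (OR, lower, upper) from a log-odds coefficient and its standard error.

    Uses the Wald interval exp(beta +/- z * SE); z = 1.96 gives roughly 95% coverage.
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * standard_error)
    upper = math.exp(beta + z * standard_error)
    return odds_ratio, lower, upper

# Hypothetical coefficient and standard error, not estimates from the study.
or_value, ci_lower, ci_upper = odds_ratio_with_ci(0.588, 0.246)
```

    An interval that excludes 1 corresponds to a coefficient that differs from zero at about the 5% level.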

  4. Nonconvex Sparse Logistic Regression With Weakly Convex Regularization

    NASA Astrophysics Data System (ADS)

    Shen, Xinyue; Gu, Yuantao

    2018-06-01

    In this work we propose to fit a sparse logistic regression model by a weakly convex regularized nonconvex optimization problem. The idea is based on the finding that a weakly convex function as an approximation of the $\ell_0$ pseudo norm is able to better induce sparsity than the commonly used $\ell_1$ norm. For a class of weakly convex sparsity inducing functions, we prove the nonconvexity of the corresponding sparse logistic regression problem, and study its local optimality conditions and the choice of the regularization parameter to exclude trivial solutions. Despite the nonconvexity, a method based on proximal gradient descent is used to solve the general weakly convex sparse logistic regression, and its convergence behavior is studied theoretically. Then the general framework is applied to a specific weakly convex function, and a necessary and sufficient local optimality condition is provided. The solution method is instantiated in this case as an iterative firm-shrinkage algorithm, and its effectiveness is demonstrated in numerical experiments by both randomly generated and real datasets.
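    To make the idea of proximal gradient descent with firm shrinkage concrete, here is a rough sketch. It is not the paper's algorithm: the firm-thresholding form used (zero below a lower threshold, identity above an upper threshold, linear in between), the fixed step size, all parameter values, and the names `firm_shrink` and `fit_sparse_logistic` are assumptions, and the toy data are invented.

```python
import math

def firm_shrink(x, lam, mu):
    """Firm thresholding with 0 < lam < mu (assumed form, see lead-in)."""
    a = abs(x)
    if a <= lam:
        return 0.0
    if a <= mu:
        return math.copysign(mu * (a - lam) / (mu - lam), x)
    return x

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_gradient(w, X, y):
    """Gradient of the mean logistic loss; labels y are 0 or 1."""
    n = len(X)
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
        for j, xj in enumerate(xi):
            grad[j] += err * xj / n
    return grad

def fit_sparse_logistic(X, y, step=0.5, lam=0.05, mu=0.5, iters=200):
    """Gradient step on the loss, then firm shrinkage on each weight."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        g = logistic_gradient(w, X, y)
        w = [firm_shrink(wj - step * gj, step * lam, mu) for wj, gj in zip(w, g)]
    return w

# Toy data (invented): feature 0 separates the classes, feature 1 is uninformative.
X_toy = [[1.0, 0.1], [2.0, -0.1], [-1.0, 0.1], [-2.0, -0.1]]
y_toy = [1, 1, 0, 0]
w_toy = fit_sparse_logistic(X_toy, y_toy)
```

    On this toy problem the shrinkage step keeps the weight of the uninformative feature at exactly zero, while large weights pass through unchanged, unlike soft thresholding, which shrinks every weight.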

  5. A comparative study on entrepreneurial attitudes modeled with logistic regression and Bayes nets.

    PubMed

    López Puga, Jorge; García García, Juan

    2012-11-01

    Entrepreneurship research is receiving increasing attention in our context, as entrepreneurs are key social agents involved in economic development. We compare the success of the dichotomic logistic regression model and the Bayes simple classifier to predict entrepreneurship, after manipulating the percentage of missing data and the level of categorization in predictors. A sample of undergraduate university students (N = 1230) completed five scales (motivation, attitude towards business creation, obstacles, deficiencies, and training needs) and we found that each of them predicted different aspects of the tendency to business creation. Additionally, our results show that the receiver operating characteristic (ROC) curve is affected by the rate of missing data in both techniques, but logistic regression seems to be more vulnerable when faced with missing data, whereas Bayes nets underperform slightly when categorization has been manipulated. Our study sheds light on the potential entrepreneur profile and we propose to use Bayesian networks as an additional alternative to overcome the weaknesses of logistic regression when missing data are present in applied research.

  6. Epidemiologic programs for computers and calculators. A microcomputer program for multiple logistic regression by unconditional and conditional maximum likelihood methods.

    PubMed

    Campos-Filho, N; Franco, E L

    1989-02-01

    A frequent procedure in matched case-control studies is to report results from the multivariate unmatched analyses if they do not differ substantially from the ones obtained after conditioning on the matching variables. Although conceptually simple, this rule requires that an extensive series of logistic regression models be evaluated by both the conditional and unconditional maximum likelihood methods. Most computer programs for logistic regression employ only one maximum likelihood method, which requires that the analyses be performed in separate steps. This paper describes a Pascal microcomputer (IBM PC) program that performs multiple logistic regression by both maximum likelihood estimation methods, which obviates the need for switching between programs to obtain relative risk estimates from both matched and unmatched analyses. The program calculates most standard statistics and allows factoring of categorical or continuous variables by two distinct methods of contrast. A built-in, descriptive statistics option allows the user to inspect the distribution of cases and controls across categories of any given variable.

  7. Comparison of cranial sex determination by discriminant analysis and logistic regression.

    PubMed

    Amores-Ampuero, Anabel; Alemán, Inmaculada

    2016-04-05

    Various methods have been proposed for estimating dimorphism. The objective of this study was to compare sex determination results from cranial measurements using discriminant analysis or logistic regression. The study sample comprised 130 individuals (70 males) of known sex, age, and cause of death from San José cemetery in Granada (Spain). Measurements of 19 neurocranial dimensions and 11 splanchnocranial dimensions were subjected to discriminant analysis and logistic regression, and the percentages of correct classification were compared between the sex functions obtained with each method. The discriminant capacity of the selected variables was evaluated with a cross-validation procedure. The percentage accuracy with discriminant analysis was 78.2% for the neurocranium (82.4% in females and 74.6% in males) and 73.7% for the splanchnocranium (79.6% in females and 68.8% in males). These percentages were higher with logistic regression analysis: 85.7% for the neurocranium (in both sexes) and 94.1% for the splanchnocranium (100% in females and 91.7% in males).

  8. Modelling the regional variability of the probability of high trihalomethane occurrence in municipal drinking water.

    PubMed

    Cool, Geneviève; Lebel, Alexandre; Sadiq, Rehan; Rodriguez, Manuel J

    2015-12-01

    The regional variability of the probability of occurrence of high total trihalomethane (TTHM) levels was assessed using multilevel logistic regression models that incorporate environmental and infrastructure characteristics. The models were structured in a three-level hierarchical configuration: samples (first level), drinking water utilities (DWUs, second level) and natural regions, an ecological hierarchical division from the Quebec ecological framework of reference (third level). They considered six independent variables: precipitation, temperature, source type, seasons, treatment type and pH. The average probability of TTHM concentrations exceeding the targeted threshold was 18.1%. The probability was influenced by seasons, treatment type, precipitations and temperature. The variance at all levels was significant, showing that the probability of TTHM concentrations exceeding the threshold is most likely to be similar if located within the same DWU and within the same natural region. However, most of the variance initially attributed to natural regions was explained by treatment types and clarified by spatial aggregation on treatment types. Nevertheless, even after controlling for treatment type, there was still significant regional variability of the probability of TTHM concentrations exceeding the threshold. Regional variability was particularly important for DWUs using chlorination alone since they lack the appropriate treatment required to reduce the amount of natural organic matter (NOM) in source water prior to disinfection. Results presented herein could be of interest to authorities in identifying regions with specific needs regarding drinking water quality and for epidemiological studies identifying geographical variations in population exposure to disinfection by-products (DBPs).

  9. Fruit and vegetable intake, as reflected by serum carotenoid concentrations, predicts reduced probability of PCB-associated risk for type 2 diabetes: NHANES 2003–2004

    PubMed Central

    Hofe, Carolyn R.; Feng, Limin; Zephyr, Dominique; Stromberg, Arnold J.; Hennig, Bernhard; Gaetke, Lisa M.

    2014-01-01

    Type 2 diabetes has been shown to occur in response to environmental and genetic influences, among them nutrition, food intake patterns, sedentary lifestyle, body mass index (BMI), and exposure to persistent organic pollutants (POPs), such as polychlorinated biphenyls (PCBs). Nutrition is essential in the prevention and management of type 2 diabetes and has been shown to modulate the toxicity of PCBs. Serum carotenoid concentrations, considered a reliable biomarker of fruit and vegetable intake, are associated with the reduced probability of chronic diseases, such as type 2 diabetes and cardiovascular disease. Our hypothesis is that fruit and vegetable intake, reflected by serum carotenoid concentrations, is associated with the reduced probability of developing type 2 diabetes in US adults with elevated serum concentrations of PCBs 118, 126, and 153. This cross-sectional study utilized the CDC database, National Health and Nutrition Examination Survey (NHANES) 2003–2004 in logistic regression analyses. Overall prevalence of type 2 diabetes was approximately 11.6% depending on the specific PCB. All three PCBs were positively associated with the probability of type 2 diabetes. For participants at higher PCB percentiles (e.g., 75th and 90th) for PCB 118 and 126, increasing serum carotenoid concentrations were associated with a smaller probability of type 2 diabetes. Fruit and vegetable intake, as reflected by serum carotenoid concentrations, predicted notably reduced probability of dioxin-like PCB-associated risk for type 2 diabetes. PMID:24774064

  10. Stepwise Distributed Open Innovation Contests for Software Development: Acceleration of Genome-Wide Association Analysis

    PubMed Central

    Hill, Andrew; Loh, Po-Ru; Bharadwaj, Ragu B.; Pons, Pascal; Shang, Jingbo; Guinan, Eva; Lakhani, Karim; Kilty, Iain

    2017-01-01

    Abstract Background: The association of differing genotypes with disease-related phenotypic traits offers great potential to both help identify new therapeutic targets and support stratification of patients who would gain the greatest benefit from specific drug classes. Development of low-cost genotyping and sequencing has made collecting large-scale genotyping data routine in population and therapeutic intervention studies. In addition, a range of new technologies is being used to capture numerous new and complex phenotypic descriptors. As a result, genotype and phenotype datasets have grown exponentially. Genome-wide association studies associate genotypes and phenotypes using methods such as logistic regression. As existing tools for association analysis limit the efficiency by which value can be extracted from increasing volumes of data, there is a pressing need for new software tools that can accelerate association analyses on large genotype-phenotype datasets. Results: Using open innovation (OI) and contest-based crowdsourcing, the logistic regression analysis in a leading, community-standard genetics software package (PLINK 1.07) was substantially accelerated. OI allowed us to do this in <6 months by providing rapid access to highly skilled programmers with specialized, difficult-to-find skill sets. Through a crowd-based contest a combination of computational, numeric, and algorithmic approaches was identified that accelerated the logistic regression in PLINK 1.07 by 18- to 45-fold. Combining contest-derived logistic regression code with coarse-grained parallelization, multithreading, and associated changes to data initialization code further developed through distributed innovation, we achieved an end-to-end speedup of 591-fold for a data set size of 6678 subjects by 645 863 variants, compared to PLINK 1.07's logistic regression. This represents a reduction in run time from 4.8 hours to 29 seconds. 
Accelerated logistic regression code developed in this project has been incorporated into the PLINK2 project. Conclusions: Using iterative competition-based OI, we have developed a new, faster implementation of logistic regression for genome-wide association studies analysis. We present lessons learned and recommendations on running a successful OI process for bioinformatics. PMID:28327993

  11. Stepwise Distributed Open Innovation Contests for Software Development: Acceleration of Genome-Wide Association Analysis.

    PubMed

    Hill, Andrew; Loh, Po-Ru; Bharadwaj, Ragu B; Pons, Pascal; Shang, Jingbo; Guinan, Eva; Lakhani, Karim; Kilty, Iain; Jelinsky, Scott A

    2017-05-01

    The association of differing genotypes with disease-related phenotypic traits offers great potential to both help identify new therapeutic targets and support stratification of patients who would gain the greatest benefit from specific drug classes. Development of low-cost genotyping and sequencing has made collecting large-scale genotyping data routine in population and therapeutic intervention studies. In addition, a range of new technologies is being used to capture numerous new and complex phenotypic descriptors. As a result, genotype and phenotype datasets have grown exponentially. Genome-wide association studies associate genotypes and phenotypes using methods such as logistic regression. As existing tools for association analysis limit the efficiency by which value can be extracted from increasing volumes of data, there is a pressing need for new software tools that can accelerate association analyses on large genotype-phenotype datasets. Using open innovation (OI) and contest-based crowdsourcing, the logistic regression analysis in a leading, community-standard genetics software package (PLINK 1.07) was substantially accelerated. OI allowed us to do this in <6 months by providing rapid access to highly skilled programmers with specialized, difficult-to-find skill sets. Through a crowd-based contest a combination of computational, numeric, and algorithmic approaches was identified that accelerated the logistic regression in PLINK 1.07 by 18- to 45-fold. Combining contest-derived logistic regression code with coarse-grained parallelization, multithreading, and associated changes to data initialization code further developed through distributed innovation, we achieved an end-to-end speedup of 591-fold for a data set size of 6678 subjects by 645 863 variants, compared to PLINK 1.07's logistic regression. This represents a reduction in run time from 4.8 hours to 29 seconds. 
Accelerated logistic regression code developed in this project has been incorporated into the PLINK2 project. Using iterative competition-based OI, we have developed a new, faster implementation of logistic regression for genome-wide association studies analysis. We present lessons learned and recommendations on running a successful OI process for bioinformatics. © The Author 2017. Published by Oxford University Press.
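
    For readers unfamiliar with the computation being accelerated, the following is a minimal sketch of a single-predictor logistic regression fitted by Newton-Raphson. It is not the PLINK or contest code; the function name `fit_logistic` and the toy data are assumptions made for illustration. In a genome-wide scan, a fit of this kind is repeated once per variant, which is why per-fit speed matters.

```python
import math

def fit_logistic(x, y, iterations=25, tol=1e-10):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson; returns (b0, b1).

    Hypothetical helper for illustration; x could hold allele counts
    and y case/control status (1/0).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the gradient (g) and the 2x2 information matrix (h).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system h * delta = g in closed form.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Toy data (made up): 1 of 4 cases when x = 0, 3 of 4 cases when x = 1.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)  # for a binary predictor this equals the 2x2 odds ratio
```

    With a binary predictor the fitted slope equals the log of the sample odds ratio, which gives a simple check on the sketch.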

  12. Easy and low-cost identification of metabolic syndrome in patients treated with second-generation antipsychotics: artificial neural network and logistic regression models.

    PubMed

    Lin, Chao-Cheng; Bai, Ya-Mei; Chen, Jen-Yeu; Hwang, Tzung-Jeng; Chen, Tzu-Ting; Chiu, Hung-Wen; Li, Yu-Chuan

    2010-03-01

    Metabolic syndrome (MetS) is an important side effect of second-generation antipsychotics (SGAs). However, many SGA-treated patients with MetS remain undetected. In this study, we trained and validated artificial neural network (ANN) and multiple logistic regression models without biochemical parameters to rapidly identify MetS in patients with SGA treatment. A total of 383 patients with a diagnosis of schizophrenia or schizoaffective disorder (DSM-IV criteria) with SGA treatment for more than 6 months were investigated to determine whether they met the MetS criteria according to the International Diabetes Federation. The data for these patients were collected between March 2005 and September 2005. The input variables of ANN and logistic regression were limited to demographic and anthropometric data only. All models were trained by randomly selecting two-thirds of the patient data and were internally validated with the remaining one-third of the data. The models were then externally validated with data from 69 patients from another hospital, collected between March 2008 and June 2008. The area under the receiver operating characteristic curve (AUC) was used to measure the performance of all models. Both the final ANN and logistic regression models had high accuracy (88.3% vs 83.6%), sensitivity (93.1% vs 86.2%), and specificity (86.9% vs 83.8%) to identify MetS in the internal validation set. The mean +/- SD AUC was high for both the ANN and logistic regression models (0.934 +/- 0.033 vs 0.922 +/- 0.035, P = .63). During external validation, high AUC was still obtained for both models. Waist circumference and diastolic blood pressure were the common variables that were left in the final ANN and logistic regression models. Our study developed accurate ANN and logistic regression models to detect MetS in patients with SGA treatment. The models are likely to provide a noninvasive tool for large-scale screening of MetS in this group of patients. 
(c) 2010 Physicians Postgraduate Press, Inc.
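
    The performance measures named in this abstract (accuracy, sensitivity, specificity, AUC) can be computed as in the sketch below. The function names and the example counts are assumptions for illustration and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    return accuracy, sensitivity, specificity

def auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case; ties count as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Made-up counts and scores, for illustration only.
acc, sens, spec = classification_metrics(tp=8, fp=1, tn=9, fn=2)
area = auc([0.9, 0.8], [0.1, 0.85])
```

    The pairwise AUC is quadratic in sample size; it is used here only because it mirrors the definition directly.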

  13. Bayesian logistic regression in detection of gene-steroid interaction for cancer at PDLIM5 locus.

    PubMed

    Wang, Ke-Sheng; Owusu, Daniel; Pan, Yue; Xie, Changchun

    2016-06-01

    The PDZ and LIM domain 5 (PDLIM5) gene may play a role in cancer, bipolar disorder, major depression, alcohol dependence and schizophrenia; however, little is known about the interaction effect of steroid and PDLIM5 gene on cancer. This study examined 47 single-nucleotide polymorphisms (SNPs) within the PDLIM5 gene in the Marshfield sample with 716 cancer patients (any diagnosed cancer, excluding minor skin cancer) and 2848 noncancer controls. A multiple logistic regression model in PLINK software was used to examine the association of each SNP with cancer. Bayesian logistic regression in PROC GENMOD in SAS statistical software, ver. 9.4 was used to detect gene-steroid interactions influencing cancer. Single marker analysis using PLINK identified 12 SNPs associated with cancer (P < 0.05); in particular, SNP rs6532496 showed the strongest association with cancer (P = 6.84 × 10⁻³), while the next best signal was rs951613 (P = 7.46 × 10⁻³). Classic logistic regression in PROC GENMOD showed that both rs6532496 and rs951613 revealed strong gene-steroid interaction effects (OR=2.18, 95% CI=1.31-3.63 with P = 2.9 × 10⁻³ for rs6532496 and OR=2.07, 95% CI=1.24-3.45 with P = 5.43 × 10⁻³ for rs951613, respectively). Results from Bayesian logistic regression showed stronger interaction effects (OR=2.26, 95% CI=1.2-3.38 for rs6532496 and OR=2.14, 95% CI=1.14-3.2 for rs951613, respectively). All the 12 SNPs associated with cancer revealed significant gene-steroid interaction effects (P < 0.05), whereas 13 SNPs showed gene-steroid interaction effects without main effect on cancer. SNP rs4634230 revealed the strongest gene-steroid interaction effect (OR=2.49, 95% CI=1.5-4.13 with P = 4.0 × 10⁻⁴ based on the classic logistic regression and OR=2.59, 95% CI=1.4-3.97 from Bayesian logistic regression, respectively). This study provides evidence of common genetic variants within the PDLIM5 gene and interactions between PDLIM5 gene polymorphisms and steroid use influencing cancer.

  14. Moving Beyond Maximum Tolerated Dose for Targeted Oncology Drugs: Use of Clinical Utility Index to Optimize Venetoclax Dosage in Multiple Myeloma Patients.

    PubMed

    Freise, K J; Jones, A K; Verdugo, M E; Menon, R M; Maciag, P C; Salem, A H

    2017-12-01

    Exposure-response analyses of venetoclax in combination with bortezomib and dexamethasone in previously treated patients with multiple myeloma (MM) were performed on a phase Ib venetoclax dose-ranging study. Logistic regression models were utilized to determine relationships, identify subpopulations with different responses, and optimize the venetoclax dosage that balanced both efficacy and safety. Bortezomib refractory status and number of prior treatments were identified to impact the efficacy response to venetoclax treatment. Higher venetoclax exposures were estimated to increase the probability of achieving a very good partial response (VGPR) or better through venetoclax doses of 1,200 mg. However, the probability of neutropenia (grade ≥3) was estimated to increase at doses >800 mg. Using a clinical utility index, a venetoclax dosage of 800 mg daily was selected to optimally balance the VGPR or better rates and neutropenia rates in MM patients administered 1-3 prior lines of therapy and nonrefractory to bortezomib. © 2017 American Society for Clinical Pharmacology and Therapeutics.

  15. Atmospheric conditions, lunar phases, and childbirth: a multivariate analysis

    NASA Astrophysics Data System (ADS)

    Ochiai, Angela Megumi; Gonçalves, Fabio Luiz Teixeira; Ambrizzi, Tercio; Florentino, Lucia Cristina; Wei, Chang Yi; Soares, Alda Valeria Neves; De Araujo, Natalucia Matos; Gualda, Dulce Maria Rosa

    2012-07-01

    Our objective was to assess extrinsic influences upon childbirth. In a cohort of 1,826 days containing 17,417 childbirths, among them 13,252 spontaneous labor admissions, we studied the influence of environment upon the high incidence of labor (defined by 75th percentile or higher), analyzed by logistic regression. The predictors of high labor admission included increases in outdoor temperature (odds ratio: 1.742, P = 0.045, 95%CI: 1.011 to 3.001), and decreases in atmospheric pressure (odds ratio: 1.269, P = 0.029, 95%CI: 1.055 to 1.483). In contrast, increases in tidal range were associated with a lower probability of high admission (odds ratio: 0.762, P = 0.030, 95%CI: 0.515 to 0.999). Lunar phase was not a predictor of high labor admission (P = 0.339). Using multivariate analysis, increases in temperature and decreases in atmospheric pressure predicted high labor admission, and increases of tidal range, as a measurement of the lunar gravitational force, predicted a lower probability of high admission.

  16. Calculating the weight of evidence in low-template forensic DNA casework.

    PubMed

    Lohmueller, Kirk E; Rudin, Norah

    2013-01-01

    Interpreting and assessing the weight of low-template DNA evidence presents a formidable challenge in forensic casework. This report describes a case in which a similar mixed DNA profile was obtained from four different bloodstains. The defense proposed that the low-level minor profile came from an alternate suspect, the defendant's mistress. The strength of the evidence was assessed using a probabilistic approach that employed likelihood ratios incorporating the probability of allelic drop-out. Logistic regression was used to model the probability of drop-out using empirical validation data from the government laboratory. The DNA profile obtained from the bloodstain described in this report is at least 47 billion times more likely if, in addition to the victim, the alternate suspect was the minor contributor, than if another unrelated individual was the minor contributor. This case illustrates the utility of the probabilistic approach for interpreting complex low-template DNA profiles. © 2012 American Academy of Forensic Sciences.

  17. Nomogram to predict successful placement in surgical subspecialty fellowships using applicant characteristics.

    PubMed

    Muffly, Tyler M; Barber, Matthew D; Karafa, Matthew T; Kattan, Michael W; Shniter, Abigail; Jelovsek, J Eric

    2012-01-01

    The purpose of the study was to develop a model that predicts an individual applicant's probability of successful placement into a surgical subspecialty fellowship program. Candidates who applied to surgical fellowships during a 3-year period were identified in a set of databases that included the electronic application materials. Of the 1281 applicants who were available for analysis, 951 applicants (74%) successfully placed into a colon and rectal surgery, thoracic surgery, vascular surgery, or pediatric surgery fellowship. The optimal final prediction model, which was based on a logistic regression, included 14 variables. This model, with a c statistic of 0.74, allowed for the determination of a useful estimate of the probability of placement for an individual candidate. Of the factors that are available at the time of fellowship application, 14 were used to predict accurately the proportion of applicants who will successfully gain a fellowship position. Copyright © 2012 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.

  18. Understanding the universality of the immigrant health paradox: the Spanish perspective.

    PubMed

    Speciale, Anna Maria; Regidor, Enrique

    2011-06-01

    This study sought the existence of an immigrant health paradox by evaluating the relationship between region of origin and the perinatal indicators of low birth weight and preterm birth in Spain. The data consist of individual records from the 2006 National Birth Registry of Spain. Mother's origin was divided into eleven groups based on geographic region. We calculated the frequency of Low Birth Weight (LBW) and Prematurity. Logistic regressions were conducted to evaluate the relationship between origin and LBW and between origin and prematurity. After adjusting for socio-demographic variables, mothers from Sub-Saharan Africa had a higher probability of having an LBW neonate than Spanish mothers, whereas for mothers from the remaining regions the probability was lower. No differences were found in prematurity in babies born to foreign mothers when compared to babies born to Spanish mothers. While our findings largely support an immigrant paradox with regard to low birth weight, they also suggest that region of origin may play an important role.

  19. Comparison of Annoyance from Railway Noise and Railway Vibration.

    PubMed

    Ögren, Mikael; Gidlöf-Gunnarsson, Anita; Smith, Michael; Gustavsson, Sara; Persson Waye, Kerstin

    2017-07-19

    The aim of this study is to compare vibration exposure to noise exposure from railway traffic in terms of equal annoyance, i.e., to determine when a certain noise level is equally annoying as a corresponding vibration velocity. Based on questionnaire data from the Train Vibration and Noise Effects (TVANE) research project from residential areas exposed to railway noise and vibration, the dose-response relationship for annoyance was estimated. By comparing the relationships between exposure and annoyance for areas both with and without significant vibration exposure, the noise levels and vibration velocities that had an equal probability of causing annoyance were determined using logistic regression. The comparison gives a continuous mapping between vibration velocity in the ground and a corresponding noise level at the facade that are equally annoying. For equivalent noise level at the facade compared to maximum weighted vibration velocity in the ground the probability of annoyance is approximately 20% for 59 dB or 0.48 mm/s, and about 40% for 63 dB or 0.98 mm/s.

  20. [An evaluation of a continuing medical education program for primary care services in the prescription of hypoglycemic agents in diabetes mellitus type 2].

    PubMed

    Castro-Ríos, Angélica; Reyes-Morales, Hortensia; Pérez-Cuevas, Ricardo

    2008-01-01

    To evaluate the impact of a continuing medical education program on family doctors to improve prescription of hypoglycemic drugs. An observational study was conducted with two comparison groups (with-without program) and before-after periods. The unit of analysis was the visit. The period of evaluation comprised six months before and six after implementing the program. The outcome variable was the appropriateness of prescription, which was based upon two criteria: appropriate selection and proper indication of the drug. Logistic regression models and the double differences technique were used to analyze the information. Models were adjusted for independent variables related to the patient, the visit and the PCC; the most relevant were sex, obesity, conditions other than diabetes, number of visits in the analyzed period, number of drugs prescribed, size of the PCC and period. The program increased the probability of appropriate prescription by 0.6% and the probability of appropriate choice of the hypoglycemic drug in obese patients by 11%.

  1. Predictors of posttraumatic stress-related impairment in victims of terrorism and ongoing conflict in Israel.

    PubMed

    Chipman, Katie J; Palmieri, Patrick A; Canetti, Daphna; Johnson, Robert J; Hobfoll, Stevan E

    2011-05-01

    The present study aimed to investigate the prevalence of self-reported impairment (Criterion F) as part of a probable DSM-IV diagnosis of posttraumatic stress disorder (PTSD) within a sample of 1001 Israeli Jews subjected to direct and indirect exposure to rocket attacks. Further, the present study aimed to investigate predictors of endorsing posttraumatic stress (PTS)-related impairment, with specific attention to the influence of resources and resource loss. Data were collected via phone surveys. Twenty-nine percent of the sample reported impairment; however, only 19% of those reporting impairment met criteria for probable PTSD. Logistic regression results indicated that psychosocial resource losses, experiencing personal injury or injury to a family member or close friend, experiencing other major life stressors in the past year, having poorer health, having significant sleep difficulty, and having traditional (moderate) religious practices, significantly predicted PTS-related impairment. Results suggest that addressing impairment only within the context of full PTSD misses many individuals experiencing significant PTS-related impairment.

  2. Deletion Diagnostics for Alternating Logistic Regressions

    PubMed Central

    Preisser, John S.; By, Kunthel; Perin, Jamie; Qaqish, Bahjat F.

    2013-01-01

    Deletion diagnostics are introduced for the regression analysis of clustered binary outcomes estimated with alternating logistic regressions, an implementation of generalized estimating equations (GEE) that estimates regression coefficients in a marginal mean model and in a model for the intracluster association given by the log odds ratio. The diagnostics are developed within an estimating equations framework that recasts the estimating functions for association parameters based upon conditional residuals into equivalent functions based upon marginal residuals. Extensions of earlier work on GEE diagnostics follow directly, including computational formulae for one-step deletion diagnostics that measure the influence of a cluster of observations on the estimated regression parameters and on the overall marginal mean or association model fit. The diagnostic formulae are evaluated with simulation studies and with an application concerning an assessment of factors associated with health maintenance visits in primary care medical practices. The application and the simulations demonstrate that the proposed cluster-deletion diagnostics for alternating logistic regressions are good approximations of their exact fully iterated counterparts. PMID:22777960

  3. Estimating interaction on an additive scale between continuous determinants in a logistic regression model.

    PubMed

    Knol, Mirjam J; van der Tweel, Ingeborg; Grobbee, Diederick E; Numans, Mattijs E; Geerlings, Mirjam I

    2007-10-01

    To determine the presence of interaction in epidemiologic research, typically a product term is added to the regression model. In linear regression, the regression coefficient of the product term reflects interaction as departure from additivity. However, in logistic regression it refers to interaction as departure from multiplicativity. Rothman has argued that interaction estimated as departure from additivity better reflects biologic interaction. So far, literature on estimating interaction on an additive scale using logistic regression only focused on dichotomous determinants. The objective of the present study was to provide the methods to estimate interaction between continuous determinants and to illustrate these methods with a clinical example. From the existing literature we derived the formulas to quantify interaction as departure from additivity between one continuous and one dichotomous determinant and between two continuous determinants using logistic regression. Bootstrapping was used to calculate the corresponding confidence intervals. To illustrate the theory with an empirical example, data from the Utrecht Health Project were used, with age and body mass index as risk factors for elevated diastolic blood pressure. The methods and formulas presented in this article are intended to assist epidemiologists to calculate interaction on an additive scale between two variables on a certain outcome. The proposed methods are included in a spreadsheet which is freely available at: http://www.juliuscenter.nl/additive-interaction.xls.
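
    One widely used measure of departure from additivity is the relative excess risk due to interaction (RERI), computed from the coefficients of a logistic model with a product term. The sketch below assumes this measure and unit increments in both determinants; the abstract does not give the authors' exact formulas for continuous determinants, so the function name `reri` and its arguments are illustrative assumptions.

```python
import math

def reri(b1, b2, b3):
    """Relative excess risk due to interaction from logistic coefficients.

    b1, b2: main-effect coefficients; b3: product-term coefficient.
    RERI = OR(both) - OR(first only) - OR(second only) + 1,
    evaluated here for a one-unit increase in each determinant (assumption).
    """
    return math.exp(b1 + b2 + b3) - math.exp(b1) - math.exp(b2) + 1.0

# Two determinants that each double the odds, with no product term (b3 = 0):
# exactly multiplicative, yet the additive-scale interaction is not zero.
value = reri(math.log(2), math.log(2), 0.0)  # 4 - 2 - 2 + 1 = 1
```

    The example shows why the two scales differ: a zero product term means no departure from multiplicativity, while RERI is still positive.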

  4. Are there good reasons for inequalities in access to renal transplantation in children?

    PubMed

    Hogan, Julien; Audry, Benoit; Harambat, Jérôme; Dunand, Olivier; Garnier, Arnaud; Salomon, Rémi; Ulinski, Tim; Macher, Marie-Alice; Couchoud, Cécile

    2015-12-01

    Studies in the USA and Europe have demonstrated inequalities in adult access to renal transplants. We previously demonstrated that the centre of treatment affected the time to registration on the renal waiting list. In this study, we sought to ascertain the influence of patient and centre characteristics on the probability of transplantation within 1 year after registration on the waiting list for children. We included patients <18 years awaiting transplantation from the French ESRD National Registry. The effects of patient and centre characteristics were studied by hierarchical logistic regression. Centre effects were assessed by centre-level residual variance. A descriptive survey was performed to investigate differences in the centres' practices, and linear regression was used to confirm findings of different HLA compatibility requirements between centres. The study included 556 patients treated at 54 centres; 450 (80.9%) received transplants in the year after their listing. HLA group scarcity, time of inactive status during the year, pre-emptive listing and listing after age 18 were associated with lower probabilities of transplantation. Patient characteristics explained most of the variability among centres, but patients treated in paediatric centres had a lower probability of transplantation within 1 year because of higher HLA compatibility requirements for transplants. Although patient characteristics explained most of the inter-centre variability, harmonization of some practices might enable us to reduce some inequalities in access to renal transplantation while maintaining optimal transplant survival and chances to get a second transplant when needed. © The Author 2014. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.

  5. Home advantage in high-level volleyball varies according to set number.

    PubMed

    Marcelino, Rui; Mesquita, Isabel; Palao Andrés, José Manuel; Sampaio, Jaime

    2009-01-01

    The aim of the present study was to identify the probability of winning each Volleyball set according to game location (home, away). Archival data was obtained from 275 sets in the 2005 Men's Senior World League and 65,949 actions were analysed. Set result (win, loss), game location (home, away), set number (first, second, third, fourth and fifth) and performance indicators (serve, reception, set, attack, dig and block) were the variables considered in this study. First, performance indicators were used in a logistic model of set result, by binary logistic regression analysis. After finding the adjusted logistic model, the log-odds of winning the set were analysed according to game location and set number. The results showed that winning a set is significantly related to performance indicators (chi-square(18) = 660.97, p < 0.01). Analyses of log-odds of winning a set demonstrate that home teams always have a higher probability of winning the game than away teams, regardless of the set number. Home teams have more advantage at the beginning of the game (first set) and in the two last sets of the game (fourth and fifth sets), probably due to facilities familiarity and crowd effects. Different game actions explain these advantages: to win the first set it is more important to take risks, through a better performance in the attack and block, and to win the final set it is important to manage risk, through a better performance on the reception. These results may suggest intra-game variation in home advantage and can be most useful to better prepare and direct the competition.
Key points: Home teams always have more probability of winning the game than away teams. Home teams have higher performance in reception, set and attack in the total of the sets. The advantage of home teams is more pronounced at the beginning of the game (first set) and in two last sets of the game (fourth and fifth sets), suggesting intra-game variation in home advantage. Analysis by sets showed that home teams have a better performance in the attack and block in the first set and in the reception in the third and fifth sets.

  6. Incorporating detection probability into northern Great Plains pronghorn population estimates

    USGS Publications Warehouse

    Jacques, Christopher N.; Jenks, Jonathan A.; Grovenburg, Troy W.; Klaver, Robert W.; DePerno, Christopher S.

    2014-01-01

    Pronghorn (Antilocapra americana) abundances commonly are estimated using fixed-wing surveys, but these estimates are likely to be negatively biased because of violations of key assumptions underpinning line-transect methodology. Reducing bias and improving precision of abundance estimates through use of detection probability and mark-resight models may allow for more responsive pronghorn management actions. Given their potential application in population estimation, we evaluated detection probability and mark-resight models for use in estimating pronghorn population abundance. We used logistic regression to quantify probabilities that detecting pronghorn might be influenced by group size, animal activity, percent vegetation, cover type, and topography. We estimated pronghorn population size by study area and year using mixed logit-normal mark-resight (MLNM) models. Pronghorn detection probability increased with group size, animal activity, and percent vegetation; overall detection probability was 0.639 (95% CI = 0.612–0.667) with 396 of 620 pronghorn groups detected. Despite model selection uncertainty, the best detection probability models were 44% (range = 8–79%) and 180% (range = 139–217%) greater than traditional pronghorn population estimates. Similarly, the best MLNM models were 28% (range = 3–58%) and 147% (range = 124–180%) greater than traditional population estimates. Detection probability of pronghorn was not constant but depended on both intrinsic and extrinsic factors. When pronghorn detection probability is a function of animal group size, animal activity, landscape complexity, and percent vegetation, traditional aerial survey techniques will result in biased pronghorn abundance estimates. Standardizing survey conditions, increasing resighting occasions, or accounting for variation in individual heterogeneity in mark-resight models will increase the accuracy and precision of pronghorn population estimates.
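
    The overall detection probability reported above follows directly from the group counts, and a detection probability below 1 implies that raw counts understate abundance. The sketch below reproduces the 396/620 figure and applies a simple divide-by-detection correction. That correction is a common textbook adjustment assumed here for illustration, not necessarily the estimator used by the authors, and the raw count of 100 is made up.

```python
def detection_probability(groups_detected, groups_total):
    """Proportion of groups that were detected."""
    return groups_detected / groups_total

def corrected_count(observed_count, p_detect):
    """Scale an observed count by 1 / detection probability (assumed correction)."""
    return observed_count / p_detect

p = detection_probability(396, 620)        # counts taken from the abstract
p_rounded = round(p, 3)                    # matches the reported 0.639
example = corrected_count(100, p)          # hypothetical raw count of 100
```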

  7. Logits and Tigers and Bears, Oh My! A Brief Look at the Simple Math of Logistic Regression and How It Can Improve Dissemination of Results

    ERIC Educational Resources Information Center

    Osborne, Jason W.

    2012-01-01

    Logistic regression is slowly gaining acceptance in the social sciences, and fills an important niche in the researcher's toolkit: being able to predict important outcomes that are not continuous in nature. While OLS regression is a valuable tool, it cannot routinely be used to predict outcomes that are binary or categorical in nature. These…
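
    The basic quantities behind a logistic regression (probability, odds, and logit) are related by simple conversions, sketched below. The function names are chosen for illustration and do not come from the article.

```python
import math

def odds(p):
    """Odds corresponding to probability p (0 < p < 1)."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds of probability p; the scale on which the model is linear."""
    return math.log(odds(p))

def inv_logit(z):
    """Probability corresponding to log-odds z."""
    return 1.0 / (1.0 + math.exp(-z))

three_to_one = odds(0.75)            # probability 0.75 is odds of 3 to 1
even = logit(0.5)                    # even odds have a logit of 0
round_trip = inv_logit(logit(0.2))   # recovers 0.2 up to rounding error
```

    A regression coefficient b on the logit scale multiplies the odds by exp(b) per unit increase in the predictor, which is the odds ratio usually reported.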

  8. Probability of detecting atrazine/desethyl-atrazine and elevated concentrations of nitrate (NO2+NO3-N) in ground water in the Idaho part of the upper Snake River basin

    USGS Publications Warehouse

    Rupert, Michael G.

    1998-01-01

    Draft Federal regulations may require that each State develop a State Pesticide Management Plan for the herbicides atrazine, alachlor, cyanazine, metolachlor, and simazine. This study developed maps that the Idaho State Department of Agriculture might use to predict the probability of detecting atrazine and desethyl-atrazine (a breakdown product of atrazine) in ground water in the Idaho part of the upper Snake River Basin. These maps can be incorporated in the State Pesticide Management Plan and help provide a sound hydrogeologic basis for atrazine management in the study area. Maps showing the probability of detecting atrazine/desethyl-atrazine in ground water were developed as follows: (1) Ground-water monitoring data were overlaid with hydrogeologic and anthropogenic data using a geographic information system to produce a data set in which each well had corresponding data on atrazine use, depth to ground water, geology, land use, precipitation, soils, and well depth. These data then were downloaded to a statistical software package for analysis by logistic regression. (2) Individual (univariate) relations between atrazine/desethyl-atrazine in ground water and atrazine use, depth to ground water, geology, land use, precipitation, soils, and well depth data were evaluated to identify those independent variables significantly related to atrazine/desethyl-atrazine detections. (3) Several preliminary multivariate models with various combinations of independent variables were constructed. (4) The multivariate models which best predicted the presence of atrazine/desethyl-atrazine in ground water were selected. (5) The multivariate models were entered into the geographic information system and the probability maps were constructed. Two models which best predicted the presence of atrazine/desethyl-atrazine in ground water were selected; one with and one without atrazine use.
Correlations of the predicted probabilities of atrazine/desethyl-atrazine in ground water with the percent of actual detections were good; r-squared values were 0.91 and 0.96, respectively. Models were verified using a second set of ground-water quality data. Verification showed that wells with water containing atrazine/desethyl-atrazine had significantly higher probability ratings than wells with water containing no atrazine/desethyl-atrazine (p < 0.002). Logistic regression also was used to develop a preliminary model to predict the probability of nitrite plus nitrate as nitrogen concentrations greater than background levels of 2 milligrams per liter. A direct comparison between the atrazine/desethyl-atrazine and nitrite plus nitrate as nitrogen probability maps was possible because the same ground-water monitoring, hydrogeologic, and anthropogenic data were used to develop both maps. Land use, precipitation, soil hydrologic group, and well depth were significantly related to atrazine/desethyl-atrazine detections. Depth to water, land use, and soil drainage were significantly related to elevated nitrite plus nitrate as nitrogen concentrations. The differences between atrazine/desethyl-atrazine and nitrite plus nitrate as nitrogen relations were attributed to differences in chemical behavior of these compounds in the environment and possibly to differences in the extent of use and rates of their application.

  9. Growth/no growth boundary of Clostridium perfringens from spores in cooked meat: A logistic analysis.

    PubMed

    Huang, Lihan; Li, Changcheng; Hwang, Cheng-An

    2018-02-02

    Clostridium perfringens is a major foodborne health hazard that can cause acute gastroenteritis in consumers, and is often associated with cooked meat and poultry products. Improper cooling after cooking may allow this pathogen to grow in a product, producing an enterotoxin that causes food poisoning. This study was conducted to evaluate the effect of common ingredients, including sodium tripolyphosphate (STPP), sodium lactate (NaL), and sodium chloride (NaCl), on the germination and outgrowth of C. perfringens spores in meat products. The growth/no growth test was conducted in Shahidi Ferguson Perfringens agar mixed with STPP (0-2500ppm), NaL (0-4%), and NaCl (0-4%) in microplates. Turbidity measurements at 600nm were compared before and after anaerobic incubation at 46°C to evaluate growth and no growth conditions. The dichotomous responses were analyzed by logistic regression to develop a model for estimating the growth probability of C. perfringens. The probability model was used to define the threshold of growth (probability >0.1 or 0.2) of C. perfringens and validated using inoculated ground beef under optimum temperature. Inoculated ground beef was mixed with different combinations of STPP, NaL, and NaCl to observe growth or no growth of C. perfringens, and the probability was calculated from the formulation. If the threshold of growth was set to 0.2, the accuracy of the growth and no growth predictions was 95.7%, with 4.3% over-prediction of growth events (fail-safe). The results from this study suggested that proper combinations of STPP, NaL, and NaCl could be used to control the growth of C. perfringens in cooked beef under the optimum temperature. The results may also suggest that proper combinations of STPP, NaL, and NaCl in cooked meat and poultry products could be used to prevent the growth of C. perfringens during cooling. Published by Elsevier B.V.
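
    The threshold step described above turns a predicted growth probability into a growth/no-growth call, and validation then counts how often the call is correct or errs on the fail-safe side. The sketch below shows that bookkeeping. The function name, the predicted probabilities, and the observed outcomes are all made up for illustration and are not the study's data or fitted model.

```python
def evaluate_threshold(probabilities, grew, threshold):
    """Return (accuracy, over_prediction, under_prediction) as fractions.

    over_prediction: growth predicted but not observed (fail-safe error).
    under_prediction: no growth predicted but growth observed (fail-dangerous error).
    """
    n = len(grew)
    correct = over = under = 0
    for p, g in zip(probabilities, grew):
        predicted_growth = p > threshold
        if predicted_growth == g:
            correct += 1
        elif predicted_growth:
            over += 1
        else:
            under += 1
    return correct / n, over / n, under / n

# Hypothetical predicted probabilities and observed outcomes.
probs = [0.05, 0.15, 0.25, 0.60, 0.90]
observed = [False, False, False, True, True]
at_02 = evaluate_threshold(probs, observed, 0.2)   # (0.8, 0.2, 0.0)
at_01 = evaluate_threshold(probs, observed, 0.1)   # (0.6, 0.4, 0.0)
```

    Lowering the threshold trades accuracy for more fail-safe over-predictions, which is the trade-off the choice between 0.1 and 0.2 reflects.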

  10. Modeling of Combined Processing Steps for Reducing Escherichia coli O157:H7 Populations in Apple Cider

    PubMed Central

    Uljas, Heidi E.; Schaffner, Donald W.; Duffy, Siobain; Zhao, Lihui; Ingham, Steven C.

    2001-01-01

    Probabilistic models were used as a systematic approach to describe the response of Escherichia coli O157:H7 populations to combinations of commonly used preservation methods in unpasteurized apple cider. Using a complete factorial experimental design, the effect of pH (3.1 to 4.3), storage temperature and time (5 to 35°C for 0 to 6 h or 12 h), preservatives (0, 0.05, or 0.1% potassium sorbate or sodium benzoate), and freeze-thaw (F-T; −20°C, 48 h and 4°C, 4 h) treatment combinations (a total of 1,600 treatments) on the probability of achieving a 5-log10-unit reduction in a three-strain E. coli O157:H7 mixture in cider was determined. Using logistic regression techniques, pH, temperature, time, and concentration were modeled in separate segments of the data set, resulting in prediction equations for: (i) no preservatives, before F-T; (ii) no preservatives, after F-T; (iii) sorbate, before F-T; (iv) sorbate, after F-T; (v) benzoate, before F-T; and (vi) benzoate, after F-T. Statistical analysis revealed a highly significant (P < 0.0001) effect of all four variables, with cider pH being the most important, followed by temperature and time, and finally by preservative concentration. All models predicted 92 to 99% of the responses correctly. To ensure safety, use of the models is most appropriate at a 0.9 probability level, where the percentage of false positives, i.e., falsely predicting a 5-log10-unit reduction, is the lowest (0 to 4.4%). The present study demonstrates the applicability of logistic regression approaches to describing the effectiveness of multiple treatment combinations in pathogen control in cider making. The resulting models can serve as valuable tools in designing safe apple cider processes. PMID:11133437

  11. Modeling of combined processing steps for reducing Escherichia coli O157:H7 populations in apple cider.

    PubMed

    Uljas, H E; Schaffner, D W; Duffy, S; Zhao, L; Ingham, S C

    2001-01-01

    Probabilistic models were used as a systematic approach to describe the response of Escherichia coli O157:H7 populations to combinations of commonly used preservation methods in unpasteurized apple cider. Using a complete factorial experimental design, the effect of pH (3.1 to 4.3), storage temperature and time (5 to 35 degrees C for 0 to 6 h or 12 h), preservatives (0, 0.05, or 0.1% potassium sorbate or sodium benzoate), and freeze-thaw (F-T; -20 degrees C, 48 h and 4 degrees C, 4 h) treatment combinations (a total of 1,600 treatments) on the probability of achieving a 5-log(10)-unit reduction in a three-strain E. coli O157:H7 mixture in cider was determined. Using logistic regression techniques, pH, temperature, time, and concentration were modeled in separate segments of the data set, resulting in prediction equations for: (i) no preservatives, before F-T; (ii) no preservatives, after F-T; (iii) sorbate, before F-T; (iv) sorbate, after F-T; (v) benzoate, before F-T; and (vi) benzoate, after F-T. Statistical analysis revealed a highly significant (P < 0.0001) effect of all four variables, with cider pH being the most important, followed by temperature and time, and finally by preservative concentration. All models predicted 92 to 99% of the responses correctly. To ensure safety, use of the models is most appropriate at a 0.9 probability level, where the percentage of false positives, i.e., falsely predicting a 5-log(10)-unit reduction, is the lowest (0 to 4.4%). The present study demonstrates the applicability of logistic regression approaches to describing the effectiveness of multiple treatment combinations in pathogen control in cider making. The resulting models can serve as valuable tools in designing safe apple cider processes.

  12. Variability in the prescription of non-biologic disease-modifying antirheumatic drugs for the treatment of spondyloarthritis in Spain.

    PubMed

    Silva-Fernández, Lucía; Pérez-Vicente, Sabina; Martín-Martínez, María Auxiliadora; López-González, Ruth

    2015-06-01

    To describe the variability in the prescription of non-biologic disease-modifying antirheumatic drugs (nbDMARDs) for the treatment of spondyloarthritis (SpA) in Spain and to explore which factors relating to the disease, patient, physician, and/or center contribute to these variations. A retrospective medical record review was performed using a probabilistic sample of 1168 patients with SpA from 45 centers distributed in 15/19 regions in Spain. The sociodemographic and clinical features and the use of drugs were recorded following a standardized protocol. Logistic regression, with nbDMARDs prescriptions as the dependent variable, was used for bivariable analysis. A multilevel logistic regression model was used to study variability. The probability of receiving an nbDMARD was higher in female patients [OR = 1.548; 95% confidence interval (CI): 1.208-1.984], in those with elevated C-reactive protein (OR = 1.039; 95% CI: 1.012-1.066) and erythrocyte sedimentation rate (OR = 1.012; 95% CI: 1.003-1.021), in those with a higher number of affected peripheral joints (OR = 12.921; 95% CI: 2.911-57.347), and in patients with extra-articular manifestations like dactylitis (OR = 2.997; 95% CI: 1.868-4.809), psoriasis (OR = 2.601; 95% CI: 1.870-3.617), and enthesitis (OR = 1.717; 95% CI: 1.224-2.410). There was a marked variability in the prescription of nbDMARDs for SpA patients, depending on the center (14.3%; variance 0.549; standard error 0.161; median odds ratio 2.366; p < 0.001). After adjusting for patient and center variables, this variability fell to 3.8%. A number of factors affecting variability in clinical practice, and which are independent of disease characteristics, are associated with the probability of SpA patients receiving nbDMARDs in Spain. Copyright © 2015 Elsevier Inc. All rights reserved.
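
    The 14.3% between-center figure is numerically consistent with the latent-variable intraclass correlation for a random-intercept logistic model, in which the individual-level variance is fixed at π²/3. The abstract does not state how the percentage was derived, so the sketch below is an assumption about the method rather than a description of the authors' calculation; the function name is illustrative.

```python
import math


def latent_icc(center_variance):
    """Share of latent-scale variance attributable to centers
    in a random-intercept logistic model (latent-variable method)."""
    residual_variance = math.pi ** 2 / 3  # variance of the standard logistic distribution
    return center_variance / (center_variance + residual_variance)


# Center-level variance reported in the abstract.
icc = latent_icc(0.549)
print(f"{icc:.1%}")
```

    With the reported variance of 0.549 this gives about 0.143, matching the 14.3% quoted above.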

  13. Logistic and linear regression model documentation for statistical relations between continuous real-time and discrete water-quality constituents in the Kansas River, Kansas, July 2012 through June 2015

    USGS Publications Warehouse

    Foster, Guy M.; Graham, Jennifer L.

    2016-04-06

    The Kansas River is a primary source of drinking water for about 800,000 people in northeastern Kansas. Source-water supplies are treated by a combination of chemical and physical processes to remove contaminants before distribution. Advanced notification of changing water-quality conditions and cyanobacteria and associated toxin and taste-and-odor compounds provides drinking-water treatment facilities time to develop and implement adequate treatment strategies. The U.S. Geological Survey (USGS), in cooperation with the Kansas Water Office (funded in part through the Kansas State Water Plan Fund), and the City of Lawrence, the City of Topeka, the City of Olathe, and Johnson County Water One, began a study in July 2012 to develop statistical models at two Kansas River sites located upstream from drinking-water intakes. Continuous water-quality monitors have been operated and discrete water-quality samples have been collected on the Kansas River at Wamego (USGS site number 06887500) and De Soto (USGS site number 06892350) since July 2012. Continuous and discrete water-quality data collected during July 2012 through June 2015 were used to develop statistical models for constituents of interest at the Wamego and De Soto sites. Logistic models to continuously estimate the probability of occurrence above selected thresholds were developed for cyanobacteria, microcystin, and geosmin. Linear regression models to continuously estimate constituent concentrations were developed for major ions, dissolved solids, alkalinity, nutrients (nitrogen and phosphorus species), suspended sediment, indicator bacteria (Escherichia coli, fecal coliform, and enterococci), and actinomycetes bacteria. These models will be used to provide real-time estimates of the probability that cyanobacteria and associated compounds exceed thresholds and of the concentrations of other water-quality constituents in the Kansas River. The models documented in this report are useful for characterizing changes in water-quality conditions through time, characterizing potentially harmful cyanobacterial events, and indicating changes in water-quality conditions that may affect drinking-water treatment processes.

  14. Landsat TM as a Tool for Locating Habitat for Cerulean Warblers

    NASA Technical Reports Server (NTRS)

    Kellner, Chris

    2000-01-01

    I believe that I made significant strides in three areas between fall of 1997 and fall of 2000 when I concluded my participation in the JOVE program. First, I acquired skill in digital remote sensing. This was significant to me because it had been 20 years since I had done any work utilizing remote sensing. I used my new skills in two classroom settings (forest ecology and GIS). In addition, I will participate as an instructor of digital remote sensing in a workshop for secondary educators this coming spring. Second, I received funding from the Arkansas Game and Fish Commission and the U.S. Forest Service to supplement JOVE funds. Third, and most importantly, a student and I developed a technique using LandSAT TM for identifying habitat for cerulean warblers. We developed a habitat model using logistic regression to discriminate between pixels that had a high probability of representing good cerulean warbler habitat and pixels that had a low probability of representing cerulean warbler habitat. Using this model, we located five significant populations of cerulean warblers in the Ozark National Forest of Arkansas. These populations were unknown before the initiation of this research and further represent a significant proportion of the known cerulean warblers in Arkansas. Preliminary findings were presented at the Ornithological Societies of America meeting in August of 1999. I also presented findings at the Arkansas Game and Fish Commission Research Symposium held in June of 2000. Finally, one paper is in press: James, D. A., C.J. Kellner, J. Self, and J. Davis., 'Breeding season distribution of cerulean warblers in Arkansas in the 1990's'. In addition, one paper is under construction: 'Population fluctuation and habitat selection by cerulean warblers in upland forests of Arkansas,' and one paper is under consideration: 'LandSAT TM and Logistic regression for identification of cerulean warbler habitat in upland forests of Arkansas.'

  15. [The prevalence of war-related post-traumatic stress disorder in children from Cundinamarca, Colombia].

    PubMed

    Pérez-Olmos, Isabel; Fernández-Piñeres, Patricia E; Rodado-Fuentes, Sonia

    2005-01-01

    Determining the prevalence of post-traumatic stress disorder (PTSD) related to the type of war exposure and associated factors in school-aged children from three Colombian towns. Cross-sectional epidemiological study. Representative randomised sample of 493 children aged 5-14. The children were evaluated during 2002 using semi-structured psychiatric interviews and the clinician-administered PTSD scale. 167 children were evaluated in La Palma who had been chronically exposed to war, 164 in Arbeláez who had had recent war exposure and 162 in Sopó who had not been exposed to war. War-related PTSD prevalence was calculated in each municipality. Odds ratio (OR) and chi-square were used for evaluating the association between exposure to war and PTSD and the related risk. Multivariate analysis used the logistic regression model. The affected children required specialised mental health counselling. The prevalence of PTSD resulting from war was 16.8% in La Palma, 23.2% in Arbeláez and 1.2% in Sopó. An OR of 19.9 (CI 4.7, 119.2), a chi-square of 30.5 and p = 0.000 revealed an association between war exposure and PTSD, and the related risk for children, when comparing the exposed towns to Sopó. The logistic regression showed that geographical closeness to war zone and intense emotional reaction to war increased the probability of war-related PTSD. Vulnerability factors were predominant in war-exposed towns. Poverty, parents' low educational level and child abuse predominated in La Palma. Attention-deficit and psychosomatic disorders were more prevalent in Arbeláez. War affects children's mental health; the children from the exposed towns had 19 times greater probability of war-related PTSD than those from a non-exposed town. Early therapeutic intervention is a public health priority. The results are useful for countries suffering from war, internal conflict and/or terrorism.
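
    The reported odds ratio and chi-square can be approximately reproduced from a 2x2 table. The cell counts below are not given in the abstract; they are an assumption, back-calculated by rounding the reported prevalences against the town sample sizes (28 of 167, 38 of 164, and 2 of 162). The use of a continuity-corrected (Yates) chi-square is likewise an assumption, and the function names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: a, b = exposed cases / non-cases;
    c, d = unexposed cases / non-cases."""
    return (a * d) / (b * c)


def yates_chi_square(a, b, c, d):
    """Pearson chi-square with Yates continuity correction for a 2x2 table."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# ASSUMED counts, reconstructed from the reported percentages:
# exposed towns (La Palma + Arbelaez): 28 + 38 = 66 cases among 331 children
# non-exposed town (Sopo): 2 cases among 162 children
a, b = 66, 331 - 66
c, d = 2, 162 - 2

print(round(odds_ratio(a, b, c, d), 1))
print(round(yates_chi_square(a, b, c, d), 1))
```

    Under these assumed counts the odds ratio comes out near 19.9 and the corrected chi-square near 30.5, in line with the values quoted in the abstract. Note that an odds ratio compares odds, not probabilities, so it is not the same as a ratio of prevalences.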

  16. Sequential PET/CT with [18F]-FDG Predicts Pathological Tumor Response to Preoperative Short Course Radiotherapy with Delayed Surgery in Patients with Locally Advanced Rectal Cancer Using Logistic Regression Analysis

    PubMed Central

    Pecori, Biagio; Lastoria, Secondo; Caracò, Corradina; Celentani, Marco; Tatangelo, Fabiana; Avallone, Antonio; Rega, Daniela; De Palma, Giampaolo; Mormile, Maria; Budillon, Alfredo; Muto, Paolo; Bianco, Francesco; Aloj, Luigi; Petrillo, Antonella; Delrio, Paolo

    2017-01-01

    Previous studies indicate that FDG PET/CT may predict pathological response in patients undergoing neoadjuvant chemo-radiotherapy for locally advanced rectal cancer (LARC). The aim of the current study is to evaluate whether pathological response can be similarly predicted in LARC patients after short course radiation therapy alone. Methods: Thirty-three patients with cT2-3, N0-2, M0 rectal adenocarcinoma treated with hypofractionated short course neoadjuvant RT (5x5 Gy) with delayed surgery (SCRTDS) were prospectively studied. All patients underwent 3 PET/CT studies: at baseline, 10 days from RT end (early), and 53 days from RT end (delayed). Maximal standardized uptake value (SUVmax), mean standardized uptake value (SUVmean) and total lesion glycolysis (TLG) of the primary tumor were measured and recorded at each PET/CT study. We use logistic regression analysis to aggregate different measures of metabolic response to predict the pathological response in the course of SCRTDS. Results: We provide straightforward formulas to classify response and estimate the probability of being a major responder (TRG1-2) or a complete responder (TRG1) for each individual. The formulas are based on the level of TLG at the early PET and on the overall proportional reduction of TLG between baseline and delayed PET studies. Conclusions: This study demonstrates that in the course of SCRTDS it is possible to estimate the probabilities of pathological tumor responses on the basis of PET/CT with FDG. Our formulas make it possible to assess the risks associated with LARC borne by a patient in the course of SCRTDS. These risk assessments can be balanced against other health risks associated with further treatments and can therefore be used to make informed therapy adjustments during SCRTDS. PMID:28060889

  17. Sequential PET/CT with [18F]-FDG Predicts Pathological Tumor Response to Preoperative Short Course Radiotherapy with Delayed Surgery in Patients with Locally Advanced Rectal Cancer Using Logistic Regression Analysis.

    PubMed

    Pecori, Biagio; Lastoria, Secondo; Caracò, Corradina; Celentani, Marco; Tatangelo, Fabiana; Avallone, Antonio; Rega, Daniela; De Palma, Giampaolo; Mormile, Maria; Budillon, Alfredo; Muto, Paolo; Bianco, Francesco; Aloj, Luigi; Petrillo, Antonella; Delrio, Paolo

    2017-01-01

    Previous studies indicate that FDG PET/CT may predict pathological response in patients undergoing neoadjuvant chemo-radiotherapy for locally advanced rectal cancer (LARC). The aim of the current study is to evaluate whether pathological response can be similarly predicted in LARC patients after short course radiation therapy alone. Thirty-three patients with cT2-3, N0-2, M0 rectal adenocarcinoma treated with hypofractionated short course neoadjuvant RT (5x5 Gy) with delayed surgery (SCRTDS) were prospectively studied. All patients underwent 3 PET/CT studies: at baseline, 10 days from RT end (early), and 53 days from RT end (delayed). Maximal standardized uptake value (SUVmax), mean standardized uptake value (SUVmean) and total lesion glycolysis (TLG) of the primary tumor were measured and recorded at each PET/CT study. We use logistic regression analysis to aggregate different measures of metabolic response to predict the pathological response in the course of SCRTDS. We provide straightforward formulas to classify response and estimate the probability of being a major responder (TRG1-2) or a complete responder (TRG1) for each individual. The formulas are based on the level of TLG at the early PET and on the overall proportional reduction of TLG between baseline and delayed PET studies. This study demonstrates that in the course of SCRTDS it is possible to estimate the probabilities of pathological tumor responses on the basis of PET/CT with FDG. Our formulas make it possible to assess the risks associated with LARC borne by a patient in the course of SCRTDS. These risk assessments can be balanced against other health risks associated with further treatments and can therefore be used to make informed therapy adjustments during SCRTDS.

  18. Developmental dyslexia: predicting individual risk.

    PubMed

    Thompson, Paul A; Hulme, Charles; Nash, Hannah M; Gooch, Debbie; Hayiou-Thomas, Emma; Snowling, Margaret J

    2015-09-01

    Causal theories of dyslexia suggest that it is a heritable disorder, which is the outcome of multiple risk factors. However, whether early screening for dyslexia is viable is not yet known. The study followed children at high risk of dyslexia from preschool through the early primary years, assessing them from age 3 years and 6 months (T1) at approximately annual intervals on tasks tapping cognitive, language, and executive-motor skills. The children were recruited to three groups: children at family risk of dyslexia, children with concerns regarding speech and language development at 3;06 years, and controls considered to be typically developing. At 8 years, children were classified as 'dyslexic' or not. Logistic regression models were used to predict the individual risk of dyslexia and to investigate how risk factors accumulate to predict poor literacy outcomes. Family-risk status was a stronger predictor of dyslexia at 8 years than low language in preschool. Additional predictors in the preschool years include letter knowledge, phonological awareness, rapid automatized naming, and executive skills. At the time of school entry, language skills become significant predictors, and motor skills add a small but significant increase to the prediction probability. We present classification accuracy using different probability cutoffs for logistic regression models and ROC curves to highlight the accumulation of risk factors at the individual level. Dyslexia is the outcome of multiple risk factors and children with language difficulties at school entry are at high risk. Family history of dyslexia is a predictor of literacy outcome from the preschool years. However, screening does not reach an acceptable clinical level until close to school entry when letter knowledge, phonological awareness, and RAN, rather than family risk, together provide good sensitivity and specificity as a screening battery. © 2015 The Authors. Journal of Child Psychology and Psychiatry published by John Wiley & Sons Ltd on behalf of Association for Child and Adolescent Mental Health.
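
    To illustrate what reporting classification accuracy at different probability cutoffs involves, the sketch below computes sensitivity and specificity for a few cutoffs applied to predicted probabilities. The probabilities and outcome labels are made-up toy values, not data from the study, and the function name is illustrative.

```python
def sensitivity_specificity(probs, labels, cutoff):
    """Sensitivity and specificity when predicting 'positive' for prob >= cutoff."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= cutoff)
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < cutoff)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < cutoff)
    fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Toy predicted probabilities and true outcomes (1 = positive), invented for illustration.
probs = [0.05, 0.20, 0.35, 0.40, 0.60, 0.70, 0.85, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

for cutoff in (0.3, 0.5, 0.7):
    sens, spec = sensitivity_specificity(probs, labels, cutoff)
    print(f"cutoff={cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

    Raising the cutoff trades sensitivity for specificity, which is why a screening battery is usually judged across a range of cutoffs rather than at a single one.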

  19. Spatial Patterns of High Aedes aegypti Oviposition Activity in Northwestern Argentina

    PubMed Central

    Estallo, Elizabet Lilia; Más, Guillermo; Vergara-Cid, Carolina; Lanfri, Mario Alberto; Ludueña-Almeida, Francisco; Scavuzzo, Carlos Marcelo; Introini, María Virginia; Zaidenberg, Mario; Almirón, Walter Ricardo

    2013-01-01

    Background: In Argentina, dengue has affected mainly the northern provinces, including Salta. The objective of this study was to analyze the spatial patterns of high Aedes aegypti oviposition activity in San Ramón de la Nueva Orán, northwestern Argentina. The location of clusters as hot spot areas should help control programs to identify priority areas and allocate their resources more effectively. Methodology: Oviposition activity was detected in Orán City (Salta province) using ovitraps, replaced weekly (October 2005–2007). Spatial autocorrelation was measured with Moran’s Index and depicted through cluster maps to identify hot spots. Total egg numbers were spatially interpolated and a classified map of areas with high Ae. aegypti oviposition activity was produced. Potential breeding and resting (PBR) sites were geo-referenced. A logistic regression analysis of interpolated egg numbers and PBR location was performed to generate a predictive map of mosquito oviposition activity. Principal Findings: The cluster maps and the predictive map were consistent, both identifying high Ae. aegypti oviposition activity in the central and southern areas of the city. A logistic regression model was successfully developed to predict Ae. aegypti oviposition activity based on distance to PBR sites, with tire dumps having the strongest association with mosquito oviposition activity. A predictive map reflecting probability of oviposition activity was produced. The predictive map delimited an area of maximum probability of Ae. aegypti oviposition activity in the south of Orán city where tire dumps predominate. The overall fit of the model was acceptable (ROC = 0.77), with 99% sensitivity and 75.29% specificity. Conclusions: Distance to tire dumps is inversely associated with high mosquito activity, allowing us to identify hot spots. These methodologies are useful for prevention, surveillance, and control of tropical vector-borne diseases and might assist the National Health Ministry in focusing resources more effectively. PMID:23349813
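
    The "ROC = 0.77" value presumably refers to the area under the ROC curve, although the abstract does not say so explicitly. Under that assumption, the sketch below shows one standard way to compute the area: as the fraction of (positive, negative) pairs in which the positive case receives the higher predicted probability, with ties counted as one half. The scores and labels are invented toy values and the function name is illustrative.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy predicted probabilities and observed outcomes (1 = high activity), invented for illustration.
scores = [0.05, 0.20, 0.35, 0.40, 0.60, 0.70, 0.85, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(roc_auc(scores, labels))
```

    A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, so 0.77 sits between the two.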

  20. Predictive model of third molar eruption after second molar extraction.

    PubMed

    De-la-Rosa-Gay, Cristina; Valmaseda-Castellón, Eduard; Gay-Escoda, Cosme

    2010-03-01

    Extraction of second permanent molars is an option for providing space in orthodontic treatment. Although many articles have described its impact on the outcome, there are few data on the prognosis of the eruption of the adjacent third molars. The aims of this investigation were to provide predictive models of eruption of third molars after second permanent molar extraction and to validate them. A total of 48 patients (ages, 11-23 years) who had 128 second permanent molars (54 maxillary, 74 mandibular) extracted during orthodontic treatment were followed until eruption of the third molars was complete. A linear regression model predicted the final angle of the third molars with the permanent first molar by using the variables of initial angle, jaw, and the developmental stage of the third molar. A logistic regression model predicted the probability of correct eruption by using the variables of initial angle, jaw, sex, age, and the developmental stage of the third molar. 2010 American Association of Orthodontists. Published by Mosby, Inc. All rights reserved.
